1. van den Akker OR, Thibault RT, Ioannidis JPA, Schorr SG, Strech D. Transparency in the secondary use of health data: assessing the status quo of guidance and best practices. Royal Society Open Science 2025; 12:241364. [PMID: 40144285 PMCID: PMC11937929 DOI: 10.1098/rsos.241364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 12/18/2024] [Accepted: 12/31/2024] [Indexed: 03/28/2025]
Abstract
We evaluated what guidance exists in the literature to improve the transparency of studies that make secondary use of health data. To find peer-reviewed papers, we searched PubMed and Google Scholar. To find institutional documents, we used our personal expertise to draft a list of health organizations and searched their websites. We quantitatively and qualitatively coded different types of research transparency: registration, methods reporting, results reporting, data sharing and code sharing. We found 56 documents that provide recommendations to improve the transparency of studies making secondary use of health data, mainly in relation to study registration (n = 27) and/or methods reporting (n = 39). Only three documents made recommendations on data sharing or code sharing. Recommendations for study registration and methods reporting mainly came in the form of structured documents like registration templates and reporting guidelines. Aside from the recommendations aimed directly at researchers, we also found recommendations aimed at the wider research community, typically on how to improve research infrastructure. Limitations or challenges of improving transparency were rarely mentioned, highlighting the need for more nuance in providing transparency guidance for studies that make secondary use of health data.
Affiliation(s)
- Robert T. Thibault
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA
  - Coalition for Aligning Science, Chevy Chase, MD, USA
- John P. A. Ioannidis
  - Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA
  - Departments of Medicine and of Epidemiology and Population Health, Stanford University, Stanford, CA, USA
- Susanne G. Schorr
  - QUEST Center for Responsible Research, Berlin Institute of Health, Berlin, Germany
- Daniel Strech
  - QUEST Center for Responsible Research, Berlin Institute of Health, Berlin, Germany
2. Morozyuk D, Weiner MG. Outliers in diagnosis ratios: A clue toward possibly absent data. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2024; 2023:1175-1182. [PMID: 38222346 PMCID: PMC10785923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
The evaluation of completeness of real-world data is a particularly challenging component of data quality assessment because the degree of truly versus erroneously absent data is unknown. Among inpatient data sets, while absolute counts of admissions having specific categories of diagnoses in the principal or any position may vary depending on hospital size, we hypothesized that the ratio of these parameters will be preserved across sites, with outliers suggesting the potential for erroneously absent data. For several categories of clinical conditions assigned to inpatient admissions, we analyzed the ratio of their recording as the principal diagnosis versus any diagnosis across several hospitals and compared the ratios against a national benchmark. Our analysis showed ratios that matched clinical expectations, with reasonable preservation of ratios across sites. However, some conditions exhibited more variability in the ratios and some sites had many outliers possibly reflecting data quality issues that warrant further attention.
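The ratio check described in this abstract can be sketched as follows. This is a minimal illustration only: the abstract does not state how outliers were defined, so the relative-tolerance rule, the `SiteCounts` type, the function names, and all counts below are hypothetical assumptions rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    site: str
    principal: int      # admissions with the condition as principal diagnosis
    any_position: int   # admissions with the condition in any diagnosis position


def principal_to_any_ratio(counts):
    """Return principal/any ratio, or None when the condition never appears."""
    if counts.any_position == 0:
        return None
    return counts.principal / counts.any_position


def flag_outliers(sites, benchmark_ratio, rel_tolerance=0.25):
    """Flag sites whose ratio deviates from the benchmark by more than
    rel_tolerance (an assumed rule, not taken from the paper)."""
    flagged = []
    for counts in sites:
        ratio = principal_to_any_ratio(counts)
        if ratio is None:
            continue
        if abs(ratio - benchmark_ratio) > rel_tolerance * benchmark_ratio:
            flagged.append((counts.site, round(ratio, 3)))
    return flagged


# Made-up counts for three hypothetical hospitals and one condition category.
sites = [
    SiteCounts("A", 120, 400),
    SiteCounts("B", 30, 95),
    SiteCounts("C", 10, 200),
]
flagged = flag_outliers(sites, benchmark_ratio=0.30)
print(flagged)
```

Because the ratio is scale-free, a small hospital and a large one can be compared directly; a flagged site is only a prompt to inspect the data, not proof that records are missing.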
3. Levitt EB, Patch DA, Hess MC, Terrero A, Jaeger B, Haendel MA, Chute CG, Yeager MT, Ponce BA, Theiss SM, Spitler CA, Johnson JP. Outcomes of SARS-CoV-2 infection among patients with orthopaedic fracture surgery in the National COVID Cohort Collaborative (N3C). Injury 2023; 54:111092. [PMID: 37871347 DOI: 10.1016/j.injury.2023.111092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 10/02/2023] [Indexed: 10/25/2023]
Abstract
BACKGROUND The objective of this study was to investigate the outcomes of COVID-19-positive patients undergoing orthopaedic fracture surgery using data from a national database of U.S. adults with a COVID-19 test for SARS-CoV-2. METHODS This is a retrospective cohort study using data from a national database to compare orthopaedic fracture surgery outcomes between COVID-19-positive and COVID-19-negative patients in the United States. Participants aged 18-99 with orthopaedic fracture surgery between March and December 2020 were included. The main exposure was COVID-19 status. Outcomes included perioperative complications, 30-day all-cause mortality, and overall all-cause mortality. Multivariable adjusted models were fitted to determine the association of COVID-positivity with all-cause mortality. RESULTS The total population of 6.5 million patient records was queried, identifying 76,697 participants with a fracture. There were 7,628 participants in the National COVID Cohort who had a fracture and operative management. The Charlson Comorbidity Index was higher in the COVID-19-positive group (n = 476, 6.2 %) than the COVID-19-negative group (n = 7,152, 93.8 %) (2.2 vs 1.4, p<0.001). The COVID-19-positive group had higher mortality (13.2 % vs 5.2 %, p<0.001) than the COVID-19-negative group with higher odds of death in the fully adjusted model (Odds Ratio=1.59; 95 % Confidence Interval: 1.16-2.18). CONCLUSION COVID-19-positive participants with a fracture requiring surgery had higher mortality and perioperative complications than COVID-19-negative patients in this national cohort of U.S. adults tested for COVID-19. The risks associated with COVID-19 can guide potential treatment options and counseling of patients and their families. Future studies can be conducted as data accumulates. LEVEL OF EVIDENCE Level III.
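As a worked illustration of the reported figures, the sketch below derives a crude (unadjusted) odds ratio from the two mortality percentages given in this abstract (13.2 % vs 5.2 %). It does not reproduce the study's multivariable model, and because it starts from rounded percentages the result is approximate; the function names are illustrative.

```python
def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)


def crude_odds_ratio(p_exposed, p_unexposed):
    """Unadjusted odds ratio from two outcome proportions."""
    return odds(p_exposed) / odds(p_unexposed)


# Mortality proportions as reported in the abstract (rounded values).
crude_or = crude_odds_ratio(0.132, 0.052)
print(f"crude odds ratio: {crude_or:.2f}")
```

The crude value comes out near 2.8, noticeably above the fully adjusted odds ratio of 1.59 reported in the abstract. A gap in that direction is what one would expect if adjustment accounts for differences between the groups, such as the higher Charlson Comorbidity Index in the COVID-19-positive group.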
Affiliation(s)
- Eli B Levitt
  - Department of Orthopaedic Surgery, University of Alabama, Birmingham, AL, USA; Department of Translational Medicine, Florida International University Herbert Wertheim College of Medicine, Miami, FL, USA
- David A Patch
  - Department of Orthopaedic Surgery, University of Alabama, Birmingham, AL, USA
- Matthew C Hess
  - Department of Orthopaedic Surgery, University of Alabama, Birmingham, AL, USA
- Alfredo Terrero
  - Department of Translational Medicine, Florida International University Herbert Wertheim College of Medicine, Miami, FL, USA; Department of Translational Medicine, School of Medicine, University of Miami Miller, Miami, FL, USA
- Byron Jaeger
  - Department of Epidemiology, University of Alabama, Birmingham, AL, USA
- Melissa A Haendel
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Matthew T Yeager
  - Department of Orthopaedic Surgery, University of Alabama, Birmingham, AL, USA
- Steven M Theiss
  - Department of Orthopaedic Surgery, University of Alabama, Birmingham, AL, USA
- Clay A Spitler
  - Department of Orthopaedic Surgery, University of Alabama, Birmingham, AL, USA
- Joey P Johnson
  - Department of Orthopaedic Surgery, University of Alabama, Birmingham, AL, USA
4. Valencia Morales DJ, Bansal V, Heavner SF, Castro JC, Sharma M, Tekin A, Bogojevic M, Zec S, Sharma N, Cartin-Ceba R, Nanchal RS, Sanghavi DK, La Nou AT, Khan SA, Belden KA, Chen JT, Melamed RR, Sayed IA, Reilkoff RA, Herasevich V, Domecq Garces JP, Walkey AJ, Boman K, Kumar VK, Kashyap R. Validation of automated data abstraction for SCCM discovery VIRUS COVID-19 registry: practical EHR export pathways (VIRUS-PEEP). Front Med (Lausanne) 2023; 10:1089087. [PMID: 37859860 PMCID: PMC10583598 DOI: 10.3389/fmed.2023.1089087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 09/14/2023] [Indexed: 10/21/2023] Open
Abstract
Background The gold standard for gathering data from electronic health records (EHR) has been manual data extraction; however, this requires vast resources and personnel. Automation of this process reduces resource burdens and expands research opportunities. Objective This study aimed to determine the feasibility and reliability of automated data extraction in a large registry of adult COVID-19 patients. Materials and methods This observational study included data from sites participating in the SCCM Discovery VIRUS COVID-19 registry. Important demographic, comorbidity, and outcome variables were chosen for manual and automated extraction for the feasibility dataset. We quantified the degree of agreement with Cohen's kappa statistics for categorical variables. The sensitivity and specificity were also assessed. Correlations for continuous variables were assessed with Pearson's correlation coefficient and Bland-Altman plots. The strength of agreement was defined as almost perfect (0.81-1.00), substantial (0.61-0.80), and moderate (0.41-0.60) based on kappa statistics. Pearson correlations were classified as trivial (0.00-0.30), low (0.30-0.50), moderate (0.50-0.70), high (0.70-0.90), and extremely high (0.90-1.00). Measurements and main results The cohort included 652 patients from 11 sites. The agreement between manual and automated extraction for categorical variables was almost perfect in 13 (72.2%) variables (Race, Ethnicity, Sex, Coronary Artery Disease, Hypertension, Congestive Heart Failure, Asthma, Diabetes Mellitus, ICU admission rate, IMV rate, HFNC rate, ICU and Hospital Discharge Status), and substantial in five (27.8%) (COPD, CKD, Dyslipidemia/Hyperlipidemia, NIMV, and ECMO rate). The correlations were extremely high in three (42.9%) variables (age, weight, and hospital LOS) and high in four (57.1%) of the continuous variables (Height, Days to ICU admission, ICU LOS, and IMV days). The average sensitivity and specificity for the categorical data were 90.7 and 96.9%. Conclusion and relevance Our study confirms the feasibility and validity of an automated process to gather data from the EHR.
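The agreement statistics named in this abstract can be sketched in a few lines. The example below computes Cohen's kappa, sensitivity, and specificity for one binary variable, treating manual extraction as the reference, and maps kappa onto the strength bands quoted above. The function names and the ten example values are made up for illustration; the label for kappa below 0.41 and the handling of values that fall between the quoted bands (for example 0.805) are assumptions, since the abstract does not define them.

```python
def cohens_kappa(manual, automated):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    n = len(manual)
    labels = set(manual) | set(automated)
    observed = sum(m == a for m, a in zip(manual, automated)) / n
    expected = sum(
        (manual.count(label) / n) * (automated.count(label) / n)
        for label in labels
    )
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def sensitivity_specificity(manual, automated):
    """Sensitivity and specificity of automated values against manual ones
    (binary 0/1 data; assumes both classes occur in the manual data)."""
    tp = sum(1 for m, a in zip(manual, automated) if m == 1 and a == 1)
    fn = sum(1 for m, a in zip(manual, automated) if m == 1 and a == 0)
    tn = sum(1 for m, a in zip(manual, automated) if m == 0 and a == 0)
    fp = sum(1 for m, a in zip(manual, automated) if m == 0 and a == 1)
    return tp / (tp + fn), tn / (tn + fp)


def agreement_strength(kappa):
    """Map kappa onto the bands quoted in the abstract."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    return "below moderate"  # assumed label; not defined in the abstract


# Hypothetical values for one comorbidity flag in ten patients.
manual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
automated = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

kappa = cohens_kappa(manual, automated)
sens, spec = sensitivity_specificity(manual, automated)
label = agreement_strength(kappa)
print(round(kappa, 3), label, sens, round(spec, 3))
```

Kappa is reported alongside raw agreement because it discounts the agreement that two extraction methods would reach by chance when a variable is rare or very common.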
Affiliation(s)
- Diana J. Valencia Morales
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Vikas Bansal
  - Division of Nephrology and Critical Care Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States
- Smith F. Heavner
  - CURE Drug Repurposing Collaboratory, Critical Path Institute, Tucson, AZ, United States
- Janna C. Castro
  - Department of Information Technology, Mayo Clinic, Scottsdale, AZ, United States
- Mayank Sharma
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Aysun Tekin
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Marija Bogojevic
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Simon Zec
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Nikhil Sharma
  - Division of Nephrology and Critical Care Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States
- Rodrigo Cartin-Ceba
  - Division of Critical Care Medicine, Department of Pulmonary Medicine, Mayo Clinic, Scottsdale, AZ, United States
- Rahul S. Nanchal
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Devang K. Sanghavi
  - Department of Critical Care Medicine, Mayo Clinic Florida, Jacksonville, FL, United States
- Abigail T. La Nou
  - Department of Critical Care Medicine, Mayo Clinic Health System, Eau Claire, WI, United States
- Syed A. Khan
  - Department of Critical Care Medicine, Mayo Clinic Health System, Mankato, MN, United States
- Katherine A. Belden
  - Division of Infectious Diseases, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, PA, United States
- Jen-Ting Chen
  - Division of Critical Care Medicine, Department of Internal Medicine, Montefiore Medical Center, Albert Einstein College of Medicine, Bronx, NY, United States
- Roman R. Melamed
  - Department of Critical Care Medicine, Abbott Northwestern Hospital, Allina Health, Minneapolis, MN, United States
- Imran A. Sayed
  - Department of Pediatrics, Children’s Hospital of Colorado, University of Colorado Anschutz Medical Campus, Colorado Springs, CO, United States
- Ronald A. Reilkoff
  - Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Internal Medicine, University of Minnesota Medical School, Edina, MN, United States
- Vitaly Herasevich
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
- Juan Pablo Domecq Garces
  - Division of Nephrology and Critical Care Medicine, Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States
- Allan J. Walkey
  - Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, Evans Center of Implementation and Improvement Sciences, Boston University School of Medicine, Boston, MA, United States
- Karen Boman
  - Society of Critical Care Medicine, Mount Prospect, IL, United States
- Vishakha K. Kumar
  - Society of Critical Care Medicine, Mount Prospect, IL, United States
- Rahul Kashyap
  - Division of Critical Care Medicine, Department of Anesthesiology and Perioperative Care, Mayo Clinic, Rochester, MN, United States
5. Yogesh MJ, Karthikeyan J. Health Informatics: Engaging Modern Healthcare Units: A Brief Overview. Front Public Health 2022; 10:854688. [PMID: 35570921 PMCID: PMC9099090 DOI: 10.3389/fpubh.2022.854688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/14/2022] [Accepted: 03/31/2022] [Indexed: 11/13/2022] Open
Abstract
In the current scenario, with a large amount of unstructured data, Health Informatics is gaining traction, allowing Healthcare Units to leverage and make meaningful insights for doctors and decision-makers with relevant information to scale operations and predict the future view of treatments via Information Systems Communication. Now, around the world, massive amounts of data are being collected and analyzed for better patient diagnosis and treatment, improving public health systems and assisting government agencies in designing and implementing public health policies, instilling confidence in future generations who want to use better public health systems. This article provides an overview of the HL7 FHIR Architecture, including the workflow state, linkages, and various informatics approaches used in healthcare units. The article discusses future trends and directions in Health Informatics for successful application to provide public health safety. With the advancement of technology, healthcare units face new issues that must be addressed with appropriate adoption policies and standards.
Affiliation(s)
- M. J. Yogesh
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
6. Yin AL, Guo WL, Sholle ET, Rajan M, Alshak MN, Choi JJ, Goyal P, Jabri A, Li HA, Pinheiro LC, Wehmeyer GT, Weiner M, Safford MM, Campion TR, Cole CL. Comparing automated vs. manual data collection for COVID-specific medications from electronic health records. Int J Med Inform 2022; 157:104622. [PMID: 34741892 PMCID: PMC8529289 DOI: 10.1016/j.ijmedinf.2021.104622] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/10/2021] [Revised: 09/19/2021] [Accepted: 10/15/2021] [Indexed: 12/29/2022]
Abstract
INTRODUCTION Data extraction from electronic health record (EHR) systems occurs through manual abstraction, automated extraction, or a combination of both. While each method has its strengths and weaknesses, both are necessary for retrospective observational research as well as sudden clinical events, like the COVID-19 pandemic. Assessing the strengths, weaknesses, and potentials of these methods is important to continue to understand optimal approaches to extracting clinical data. We set out to assess automated and manual techniques for collecting medication use data in patients with COVID-19 to inform future observational studies that extract data from the electronic health record (EHR). MATERIALS AND METHODS For 4,123 COVID-positive patients hospitalized and/or seen in the emergency department at an academic medical center between 03/03/2020 and 05/15/2020, we compared medication use data of 25 medications or drug classes collected through manual abstraction and automated extraction from the EHR. Quantitatively, we assessed concordance using Cohen's kappa to measure interrater reliability, and qualitatively, we audited observed discrepancies to determine causes of inconsistencies. RESULTS For the 16 inpatient medications, 11 (69%) demonstrated moderate or better agreement; 7 of those demonstrated strong or almost perfect agreement. For 9 outpatient medications, 3 (33%) demonstrated moderate agreement, but none achieved strong or almost perfect agreement. We audited 12% of all discrepancies (716/5,790) and, in those audited, observed three principal categories of error: human error in manual abstraction (26%), errors in the extract-transform-load (ETL) or mapping of the automated extraction (41%), and abstraction-query mismatch (33%). CONCLUSION Our findings suggest many inpatient medications can be collected reliably through automated extraction, especially when abstraction instructions are designed with data architecture in mind. We discuss quality issues, concerns, and improvements for institutions to consider when crafting an approach. During crises, institutions must decide how to allocate limited resources. We show that automated extraction of medications is feasible and make recommendations on how to improve future iterations.
Affiliation(s)
- Andrew L Yin
  - Weill Cornell Medical College, Weill Cornell Medicine, New York, NY, United States; Department of Medicine, Weill Cornell Medicine, New York, NY, United States
- Winston L Guo
  - Weill Cornell Medical College, Weill Cornell Medicine, New York, NY, United States
- Evan T Sholle
  - Information Technologies & Services Department, Weill Cornell Medicine, New York, NY, United States
- Mangala Rajan
  - Department of Medicine, Weill Cornell Medicine, New York, NY, United States
- Mark N Alshak
  - Weill Cornell Medical College, Weill Cornell Medicine, New York, NY, United States; Department of Medicine, Weill Cornell Medicine, New York, NY, United States
- Justin J Choi
  - Division of General Internal Medicine, Weill Cornell Medicine, New York, NY, United States
- Parag Goyal
  - Division of General Internal Medicine, Weill Cornell Medicine, New York, NY, United States
- Assem Jabri
  - Division of General Internal Medicine, Weill Cornell Medicine, New York, NY, United States
- Han A Li
  - Weill Cornell Medical College, Weill Cornell Medicine, New York, NY, United States; Department of Medicine, Weill Cornell Medicine, New York, NY, United States
- Laura C Pinheiro
  - Department of Medicine, Weill Cornell Medicine, New York, NY, United States
- Graham T Wehmeyer
  - Weill Cornell Medical College, Weill Cornell Medicine, New York, NY, United States; Department of Medicine, Weill Cornell Medicine, New York, NY, United States
- Mark Weiner
  - Department of Medicine, Weill Cornell Medicine, New York, NY, United States; Information Technologies & Services Department, Weill Cornell Medicine, New York, NY, United States
- Monika M Safford
  - Division of General Internal Medicine, Weill Cornell Medicine, New York, NY, United States
- Thomas R Campion
  - Information Technologies & Services Department, Weill Cornell Medicine, New York, NY, United States; Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States; Clinical and Translational Science Center, Weill Cornell Medicine, New York, NY, United States
- Curtis L Cole
  - Department of Medicine, Weill Cornell Medicine, New York, NY, United States; Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, United States
7. Li R, Niu Y, Scott SR, Zhou C, Lan L, Liang Z, Li J. Using Electronic Medical Record Data for Research in a Healthcare Information and Management Systems Society (HIMSS) Analytics Electronic Medical Record Adoption Model (EMRAM) Stage 7 Hospital in Beijing: Cross-sectional Study. JMIR Med Inform 2021; 9:e24405. [PMID: 34342589 PMCID: PMC8371484 DOI: 10.2196/24405] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/17/2020] [Revised: 12/01/2020] [Accepted: 06/07/2021] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND With the proliferation of electronic medical record (EMR) systems, there is an increasing interest in utilizing EMR data for medical research; yet, there is no quantitative research on EMR data utilization for medical research purposes in China. OBJECTIVE This study aimed to understand how and to what extent EMR data are utilized for medical research purposes in a Healthcare Information and Management Systems Society (HIMSS) Analytics Electronic Medical Record Adoption Model (EMRAM) Stage 7 hospital in Beijing, China. Obstacles and issues in the utilization of EMR data were also explored to provide a foundation for the improved utilization of such data. METHODS For this descriptive cross-sectional study, cluster sampling from Xuanwu Hospital, one of two Stage 7 hospitals in Beijing, was conducted from 2016 to 2019. The utilization of EMR data was described as the number of requests, the proportion of requesters, and the frequency of requests per capita. Comparisons by year, professional title, and age were conducted by double-sided chi-square tests. RESULTS From 2016 to 2019, EMR data utilization was poor, as the proportion of requesters was 5.8% and the frequency was 0.1 times per person per year. The frequency per capita gradually slowed and older senior-level staff more frequently used EMR data compared with younger staff. CONCLUSIONS The value of using EMR data for research purposes is not well studied in China. More research is needed to quantify to what extent EMR data are utilized across all hospitals in Beijing and how these systems can enhance future studies. The results of this study also suggest that young doctors may be less exposed or have less reason to access such research methods.
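The utilization measures named in this abstract (proportion of requesters, frequency of requests per capita) and a group comparison by chi-square can be sketched as below. All counts are invented for illustration and do not come from the study; the function names are hypothetical, and only the Pearson chi-square statistic for a 2x2 table is computed here, without a p-value or continuity correction.

```python
def proportion_of_requesters(requesters, staff):
    """Share of staff who submitted at least one data request."""
    return requesters / staff


def requests_per_capita_per_year(total_requests, staff, years):
    """Average number of requests per staff member per year."""
    return total_requests / staff / years


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts: 800 staff, 50 requesters, 320 requests over 4 years.
share = proportion_of_requesters(50, 800)
frequency = requests_per_capita_per_year(320, 800, 4)

# Hypothetical split: 30 of 200 senior staff and 20 of 600 junior staff requested data.
chi2 = chi_square_2x2(30, 170, 20, 580)
print(share, frequency, round(chi2, 3))
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05, which is how a difference between professional-title groups like the one described in the abstract would typically be judged.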
Affiliation(s)
- Rui Li
  - Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China
- Yue Niu
  - Statistical Procedure Department, Blueballon (Beijing) Medical Research Co, Ltd, Beijing, China
- Sarah Robbins Scott
  - National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Chu Zhou
  - National Center for AIDS/STD Control and Prevention, Chinese Center for Disease Control and Prevention, Beijing, China
- Lan Lan
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Beijing, China
- Zhigang Liang
  - Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China
- Jia Li
  - Information Center, Xuanwu Hospital, Capital Medical University, Beijing, China
8. Haendel MA, Chute CG, Bennett TD, Eichmann DA, Guinney J, Kibbe WA, Payne PRO, Pfaff ER, Robinson PN, Saltz JH, Spratt H, Suver C, Wilbanks J, Wilcox AB, Williams AE, Wu C, Blacketer C, Bradford RL, Cimino JJ, Clark M, Colmenares EW, Francis PA, Gabriel D, Graves A, Hemadri R, Hong SS, Hripscak G, Jiao D, Klann JG, Kostka K, Lee AM, Lehmann HP, Lingrey L, Miller RT, Morris M, Murphy SN, Natarajan K, Palchuk MB, Sheikh U, Solbrig H, Visweswaran S, Walden A, Walters KM, Weber GM, Zhang XT, Zhu RL, Amor B, Girvin AT, Manna A, Qureshi N, Kurilla MG, Michael SG, Portilla LM, Rutter JL, Austin CP, Gersing KR. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. J Am Med Inform Assoc 2021; 28:427-443. [PMID: 32805036 PMCID: PMC7454687 DOI: 10.1093/jamia/ocaa196] [Citation(s) in RCA: 373] [Impact Index Per Article: 93.3] [Received: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 01/12/2023] Open
Abstract
OBJECTIVE Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
Affiliation(s)
- Melissa A Haendel
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
- Translational and Integrative Sciences Center, Department of Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
| | - Christopher G Chute
- Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA
| | - Tellen D Bennett
- Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, Colorado, USA
| | - David A Eichmann
- School of Library and Information Science, The University of Iowa, Iowa City, Iowa, USA
| | | | | | - Philip R O Payne
- Institute for Informatics, Washington University in St. Louis, Saint Louis,Missouri, USA
| | - Emily R Pfaff
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill,North Carolina, USA
| | | | - Joel H Saltz
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, New York, USA
| | - Heidi Spratt
- University of Texas Medical Branch, Galveston, Texas, USA
| | | | | | | | - Andrew E Williams
- Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston,Massachusetts, USA
| | - Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, California, USA
| | - Clair Blacketer
- Janssen Research and Development, LLC, Raritan, New Jersey, USA
| | - Robert L Bradford
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill,North Carolina, USA
| | - James J Cimino
- University of Alabama-Birmingham, Birmingham, Alabama, USA
| | - Marshall Clark
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill,North Carolina, USA
| | - Evan W Colmenares
- Department of Pharmaceutical Outcomes and Policy, University of North Carolina at Chapel Hill, Chapel Hill,North Carolina, USA
| | | | - Davera Gabriel
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Alexis Graves
- University of Iowa Institute for Clinical and Translational Science, The University of Iowa, Iowa City, Iowa, USA
| | - Raju Hemadri
- National Center for Advancing Translational Science, Bethesda, Maryland, USA
| | - Stephanie S Hong
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - George Hripscak
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Dazhi Jiao
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | | | | | - Adam M Lee
- University of North Carolina at Chapel Hill, Chapel Hill,North Carolina, USA
| | - Harold P Lehmann
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | | | - Robert T Miller
- Tufts Clinical and Translational Science Institute, Tufts University, Boston,Massachusetts, USA
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh,Pennsylvania, USA
| | | | | | | | - Usman Sheikh
- National Center for Advancing Translational Science, Bethesda, Maryland, USA
| | - Harold Solbrig
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh,Pennsylvania, USA
| | - Anita Walden
- Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
- Sage Bionetworks, Seattle, Washington, USA
| | - Kellie M Walters
- North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill,North Carolina, USA
| | - Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston,Massachusetts, USA
| | | | - Richard L Zhu
- Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
| | | | | | - Amin Manna
- Palantir Technologies, Palo Alto, California, USA
| | | | - Michael G Kurilla
- Division of Clinical Innovation, National Center for Advancing Translational Science, Bethesda, Maryland, USA
| | - Sam G Michael
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
| | - Lili M Portilla
- Office of Strategic Alliances, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
| | - Joni L Rutter
- Office of the Director, National Center for Advancing Translational Science, Bethesda, Maryland, USA
| | - Christopher P Austin
- National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, Maryland, USA
| | - Ken R Gersing
- National Center for Advancing Translational Science, Bethesda, Maryland, USA
| |
|
9
|
Zayas-Cabán T, Wald JS. Opportunities for the use of health information technology to support research. JAMIA Open 2020; 3:321-325. [PMID: 34541462 PMCID: PMC7660961 DOI: 10.1093/jamiaopen/ooaa037] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/19/2020] [Revised: 06/12/2020] [Accepted: 07/27/2020] [Indexed: 01/28/2023] Open
Abstract
In the last decade, expanding use of health information technology (IT) across the United States has created opportunities for use of electronic health data for health services and biomedical research, but efforts may be hampered by limited data access, data quality, and system functionality. We identify five opportunities to advance the use of health IT for health services and biomedical research, which informed a federal government-led, collaborative effort to develop a relevant policy and development agenda. In particular, the health IT infrastructure should more effectively support the use of electronic health data for research; provide adaptable technologies; incorporate relevant research-related functionality; support patient and caregiver engagement in research; and support effective integration of knowledge into practice. While not exhaustive, these represent important opportunities that the biomedical and health informatics communities can pursue to better leverage health IT and electronic health data for research.
Affiliation(s)
- Teresa Zayas-Cabán
- Office of the National Coordinator for Health Information Technology, Washington, DC, USA
| | | |
|
10
|
Campion TR, Craven CK, Dorr DA, Knosp BM. Understanding enterprise data warehouses to support clinical and translational research. J Am Med Inform Assoc 2020; 27:1352-1358. [PMID: 32679585 PMCID: PMC7647350 DOI: 10.1093/jamia/ocaa089] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/28/2020] [Revised: 04/24/2020] [Accepted: 05/12/2020] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE Among National Institutes of Health Clinical and Translational Science Award (CTSA) hubs, adoption of electronic data warehouses for research (EDW4R) containing data from electronic health record systems is nearly ubiquitous. Although benefits of EDW4R include more effective, efficient support of scientists, little is known about how CTSA hubs have implemented EDW4R services. The goal of this qualitative study was to understand the ways in which CTSA hubs have operationalized EDW4R to support clinical and translational researchers. MATERIALS AND METHODS After conducting semistructured interviews with informatics leaders from 20 CTSA hubs, we performed a directed content analysis of interview notes informed by naturalistic inquiry. RESULTS We identified 12 themes: organization and data; oversight and governance; data access request process; data access modalities; data access for users with different skill sets; engagement, communication, and literacy; service management coordinated with enterprise information technology; service management coordinated within a CTSA hub; service management coordinated between informatics and biostatistics; funding approaches; performance metrics; and future trends and current technology challenges. DISCUSSION This study is a step in developing an improved understanding and creating a common vocabulary about EDW4R operations across institutions. Findings indicate an opportunity for establishing best practices for EDW4R operations in academic medicine. Such guidance could reduce the costs associated with developing an EDW4R by establishing a clear roadmap and maturity path for institutions to follow. CONCLUSIONS CTSA hubs described varying approaches to EDW4R operations that may assist other institutions in better serving investigators with electronic patient data.
Affiliation(s)
- Thomas R Campion
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
| | - Catherine K Craven
- Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - David A Dorr
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA
| | - Boyd M Knosp
- Institute for Clinical and Translational Science, Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, Iowa, USA
| |
|
11
|
Thompson CA, Jin A, Luft HS, Lichtensztajn DY, Allen L, Liang SY, Schumacher BT, Gomez SL. Population-Based Registry Linkages to Improve Validity of Electronic Health Record-Based Cancer Research. Cancer Epidemiol Biomarkers Prev 2020; 29:796-806. [PMID: 32066621 DOI: 10.1158/1055-9965.epi-19-0882] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/24/2019] [Revised: 11/01/2019] [Accepted: 02/12/2020] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND There is tremendous potential to leverage the value gained from integrating electronic health records (EHR) and population-based cancer registry data for research. Registries provide diagnosis details, tumor characteristics, and treatment summaries, while EHRs contain rich clinical detail. A carefully conducted cancer registry linkage may also be used to improve the internal and external validity of inferences made from EHR-based studies. METHODS We linked the EHRs of a large, multispecialty, mixed-payer health care system with the statewide cancer registry and assessed the validity of our linked population. For internal validity, we identify patients that might be "missed" in a linkage, threatening the internal validity of an EHR study population. For generalizability, we compared linked cases with all other cancer patients in the 22-county EHR catchment region. RESULTS From an EHR population of 4.5 million, we identified 306,554 patients with cancer, 26% of the catchment region patients with cancer; 22.7% of linked patients were diagnosed with cancer after they migrated away from our health care system, highlighting an advantage of system-wide linkage. We observed demographic differences between EHR patients and non-EHR patients in the surrounding region and demonstrated use of selection probabilities with model-based standardization to improve generalizability. CONCLUSIONS Our experiences set the foundation to encourage and inform researchers interested in working with EHRs for cancer research as well as provide context for leveraging linkages to assess and improve validity and generalizability. IMPACT Researchers conducting linkages may benefit from considering one or more of these approaches to establish and evaluate the validity of their EHR-based populations. See all articles in this CEBP Focus section, "Modernizing Population Science."
Affiliation(s)
- Caroline A Thompson
- School of Public Health, San Diego State University, San Diego, California
- Sutter Health Palo Alto Medical Foundation Research Institute, Palo Alto, California
- University of California San Diego School of Medicine, San Diego, California
| | - Anqi Jin
- Sutter Health Palo Alto Medical Foundation Research Institute, Palo Alto, California
| | - Harold S Luft
- Sutter Health Palo Alto Medical Foundation Research Institute, Palo Alto, California
| | - Daphne Y Lichtensztajn
- Greater Bay Area Cancer Registry, Department of Epidemiology & Biostatistics, University of California San Francisco School of Medicine, San Francisco, California
- Department of Epidemiology & Biostatistics, University of California San Francisco School of Medicine, San Francisco, California
| | - Laura Allen
- Greater Bay Area Cancer Registry, Department of Epidemiology & Biostatistics, University of California San Francisco School of Medicine, San Francisco, California
- Department of Epidemiology & Biostatistics, University of California San Francisco School of Medicine, San Francisco, California
| | - Su-Ying Liang
- Sutter Health Palo Alto Medical Foundation Research Institute, Palo Alto, California
| | - Benjamin T Schumacher
- School of Public Health, San Diego State University, San Diego, California
- University of California San Diego School of Medicine, San Diego, California
| | - Scarlett Lin Gomez
- Greater Bay Area Cancer Registry, Department of Epidemiology & Biostatistics, University of California San Francisco School of Medicine, San Francisco, California
- Department of Epidemiology & Biostatistics, University of California San Francisco School of Medicine, San Francisco, California
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California
| |
|
12
|
Gotz D, Zhang J, Wang W, Shrestha J, Borland D. Visual Analysis of High-Dimensional Event Sequence Data via Dynamic Hierarchical Aggregation. IEEE Transactions on Visualization and Computer Graphics 2020; 26:440-450. [PMID: 31443007 DOI: 10.1109/tvcg.2019.2934661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Temporal event data are collected across a broad range of domains, and a variety of visual analytics techniques have been developed to empower analysts working with this form of data. These techniques generally display aggregate statistics computed over sets of event sequences that share common patterns. Such techniques are often hindered, however, by the high-dimensionality of many real-world event sequence datasets which can prevent effective aggregation. A common coping strategy for this challenge is to group event types together prior to visualization, as a pre-process, so that each group can be represented within an analysis as a single event type. However, computing these event groupings as a pre-process also places significant constraints on the analysis. This paper presents a new visual analytics approach for dynamic hierarchical dimension aggregation. The approach leverages a predefined hierarchy of dimensions to computationally quantify the informativeness, with respect to a measure of interest, of alternative levels of grouping within the hierarchy at runtime. This information is then interactively visualized, enabling users to dynamically explore the hierarchy to select the most appropriate level of grouping to use at any individual step within an analysis. Key contributions include an algorithm for interactively determining the most informative set of event groupings for a specific analysis context, and a scented scatter-plus-focus visualization design with an optimization-based layout algorithm that supports interactive hierarchical exploration of alternative event type groupings. We apply these techniques to high-dimensional event sequence data from the medical domain and report findings from domain expert interviews.
|
13
|
Lynch KE, Deppen SA, DuVall SL, Viernes B, Cao A, Park D, Hanchrow E, Hewa K, Greaves P, Matheny ME. Incrementally Transforming Electronic Medical Records into the Observational Medical Outcomes Partnership Common Data Model: A Multidimensional Quality Assurance Approach. Appl Clin Inform 2019; 10:794-803. [PMID: 31645076 DOI: 10.1055/s-0039-1697598] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 10/25/2022] Open
Abstract
BACKGROUND The development and adoption of health care common data models (CDMs) has addressed some of the logistical challenges of performing research on data generated from disparate health care systems by standardizing data representations and leveraging standardized terminology to express clinical information consistently. However, transforming a data system into a CDM is not a trivial task, and maintaining an operational, enterprise-capable CDM that is incrementally updated within a data warehouse is challenging. OBJECTIVES To develop a quality assurance (QA) process and code base to accompany our incremental transformation of the Department of Veterans Affairs Corporate Data Warehouse (CDW) health care database into the Observational Medical Outcomes Partnership (OMOP) CDM to prevent incremental load errors. METHODS We designed and implemented a multistage QA approach centered on completeness, value conformance, and relational conformance data-quality elements. For each element we describe key incremental load challenges, our extract, transform, and load (ETL) solution of data to overcome those challenges, and potential impacts of incremental load failure. RESULTS Completeness and value conformance data-quality elements are most affected by incremental changes to the CDW, while updates to source identifiers impact relational conformance. ETL failures surrounding these elements lead to incomplete and inaccurate capture of clinical concepts as well as data fragmentation across patients, providers, and locations. CONCLUSION Development of robust QA processes supporting accurate transformation of OMOP and other CDMs from source data is still in evolution, and opportunities exist to extend the existing QA framework and tools used for incremental ETL QA processes.
Affiliation(s)
- Kristine E Lynch
- VA Salt Lake City Health Care System, Salt Lake City, Utah, United States
- Department of Internal Medicine, Division of Epidemiology, University of Utah, Salt Lake City, Utah, United States
| | - Stephen A Deppen
- Vanderbilt University Medical Center, Nashville, Tennessee, United States
| | - Scott L DuVall
- VA Salt Lake City Health Care System, Salt Lake City, Utah, United States
- Department of Internal Medicine, Division of Epidemiology, University of Utah, Salt Lake City, Utah, United States
| | - Benjamin Viernes
- VA Salt Lake City Health Care System, Salt Lake City, Utah, United States
- Department of Internal Medicine, Division of Epidemiology, University of Utah, Salt Lake City, Utah, United States
| | - Aize Cao
- Vanderbilt University Medical Center, Nashville, Tennessee, United States
| | - Daniel Park
- Tennessee Valley Healthcare System, Nashville, Tennessee, United States
| | - Elizabeth Hanchrow
- Vanderbilt University Medical Center, Nashville, Tennessee, United States
| | - Kushan Hewa
- Vanderbilt University Medical Center, Nashville, Tennessee, United States
| | - Peter Greaves
- Vanderbilt University Medical Center, Nashville, Tennessee, United States
| | - Michael E Matheny
- Vanderbilt University Medical Center, Nashville, Tennessee, United States
- Tennessee Valley Healthcare System, Nashville, Tennessee, United States
| |
|
14
|
Li B, Li J, Jiang Y, Lan X. Experience and reflection from China's Xiangya medical big data project. J Biomed Inform 2019; 93:103149. [PMID: 30878618 DOI: 10.1016/j.jbi.2019.103149] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/05/2018] [Revised: 02/13/2019] [Accepted: 03/07/2019] [Indexed: 01/16/2023]
Abstract
The construction of medical big data includes several problems that need to be solved, such as integration and data sharing of many heterogeneous information systems, efficient processing and analysis of large-scale medical data with complex structure or low degree of structure, and narrow application range of medical data. Therefore, medical big data construction is not only a simple collection and application of medical data but also a complex systematic project. This paper introduces China's experience in the construction of a regional medical big data ecosystem, including the overall goal of the project; establishment of policies to encourage data sharing; handling the relationship between personal privacy, information security, and information availability; establishing a cooperation mechanism between agencies; designing a polycentric medical data acquisition system; and establishing a large data centre. From the experience gained from one of China's earliest established medical big data projects, we outline the challenges encountered during its development and recommend approaches to overcome these challenges to design medical big data projects in China more rationally. A clear and complete top-level design of a project needs to be planned in advance and considered carefully. It is essential to provide a culture of information sharing and to facilitate the opening of data, and changes in ideas and policies need the guidance of the government. The contradiction between data sharing and data security must be handled carefully, which is not to say that data openness can be abandoned. The construction of medical big data involves many institutions, and high-level management and cooperation can significantly improve efficiency and promote innovation. Compared with infrastructure construction, it is more challenging and time-consuming to develop appropriate data standards, data integration tools and data mining tools.
Affiliation(s)
- Bei Li
- Department of Medical Information, Information Security and Big Data Institute, Central South University, Changsha 410013, Hunan, China
| | - Jianbin Li
- Department of Medical Information, Information Security and Big Data Institute, Central South University, Changsha 410013, Hunan, China
- North China Electric Power University, Beijing, China
| | - Yuqiao Jiang
- Department of Medical Information, Information Security and Big Data Institute, Central South University, Changsha 410013, Hunan, China
| | - Xiaoyun Lan
- Department of Medical Information, Information Security and Big Data Institute, Central South University, Changsha 410013, Hunan, China
| |
|
15
|
Delvaux N, Aertgeerts B, van Bussel JC, Goderis G, Vaes B, Vermandere M. Health Data for Research Through a Nationwide Privacy-Proof System in Belgium: Design and Implementation. JMIR Med Inform 2018; 6:e11428. [PMID: 30455164 PMCID: PMC6300317 DOI: 10.2196/11428] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/29/2018] [Revised: 09/03/2018] [Accepted: 09/04/2018] [Indexed: 01/19/2023] Open
Abstract
Background Health data collected during routine care have important potential for reuse for other purposes, especially as part of a learning health system to advance the quality of care. Many sources of bias have been identified through the lifecycle of health data that could compromise the scientific integrity of these data. New data protection legislation requires research facilities to improve safety measures and, thus, ensure privacy. Objective This study aims to address the question on how health data can be transferred from various sources and using multiple systems to a centralized platform, called Healthdata.be, while ensuring the accuracy, validity, safety, and privacy. In addition, the study demonstrates how these processes can be used in various research designs relevant for learning health systems. Methods The Healthdata.be platform urges uniformity of the data registration at the primary source through the use of detailed clinical models. Data retrieval and transfer are organized through end-to-end encrypted electronic health channels, and data are encoded using token keys. In addition, patient identifiers are pseudonymized so that health data from the same patient collected across various sources can still be linked without compromising the deidentification. Results The Healthdata.be platform currently collects data for >150 clinical registries in Belgium. We demonstrated how the data collection for the Belgian primary care morbidity register INTEGO is organized and how the Healthdata.be platform can be used for a cluster randomized trial. Conclusions Collecting health data in various sources and linking these data to a single patient is a promising feature that can potentially address important concerns on the validity and quality of health data. Safe methods of data transfer without compromising privacy are capable of transporting these data from the primary data provider or clinician to a research facility. 
More research is required to demonstrate that these methods improve the quality of data collection, allowing researchers to rely on electronic health records as a valid source for scientific data.
Affiliation(s)
- Nicolas Delvaux
- Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
| | - Bert Aertgeerts
- Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
| | | | - Geert Goderis
- Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
| | - Bert Vaes
- Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
| | - Mieke Vermandere
- Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
| |
|
16
|
Verheij RA, Curcin V, Delaney BC, McGilchrist MM. Possible Sources of Bias in Primary Care Electronic Health Record Data Use and Reuse. J Med Internet Res 2018; 20:e185. [PMID: 29844010 PMCID: PMC5997930 DOI: 10.2196/jmir.9134] [Citation(s) in RCA: 183] [Impact Index Per Article: 26.1] [Received: 10/09/2017] [Revised: 02/11/2018] [Accepted: 03/01/2018] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Enormous amounts of data are recorded routinely in health care as part of the care process, primarily for managing individual patient care. There are significant opportunities to use these data for other purposes, many of which would contribute to establishing a learning health system. This is particularly true for data recorded in primary care settings, as in many countries, these are the first place patients turn to for most health problems. OBJECTIVE In this paper, we discuss whether data that are recorded routinely as part of the health care process in primary care are actually fit to use for other purposes such as research and quality of health care indicators, how the original purpose may affect the extent to which the data are fit for another purpose, and the mechanisms behind these effects. In doing so, we want to identify possible sources of bias that are relevant for the use and reuse of these types of data. METHODS This paper is based on the authors' experience as users of electronic health records data, as general practitioners, health informatics experts, and health services researchers. It is a product of the discussions they had during the Translational Research and Patient Safety in Europe (TRANSFoRm) project, which was funded by the European Commission and sought to develop, pilot, and evaluate a core information architecture for the learning health system in Europe, based on primary care electronic health records. RESULTS We first describe the different stages in the processing of electronic health record data, as well as the different purposes for which these data are used. Given the different data processing steps and purposes, we then discuss the possible mechanisms for each individual data processing step that can generate biased outcomes. We identified 13 possible sources of bias. Four of them are related to the organization of a health care system, whereas some are of a more technical nature.
CONCLUSIONS There are a substantial number of possible sources of bias; very little is known about the size and direction of their impact. However, anyone that uses or reuses data that were recorded as part of the health care process (such as researchers and clinicians) should be aware of the associated data collection process and environmental influences that can affect the quality of the data. Our stepwise, actor- and purpose-oriented approach may help to identify these possible sources of bias. Unless data quality issues are better understood and unless adequate controls are embedded throughout the data lifecycle, data-driven health care will not live up to its expectations. We need a data quality research agenda to devise the appropriate instruments needed to assess the magnitude of each of the possible sources of bias, and then start measuring their impact. The possible sources of bias described in this paper serve as a starting point for this research agenda.
Affiliation(s)
- Robert A Verheij
- Netherlands Institute for Health Services Research, Utrecht, Netherlands
| | - Vasa Curcin
- King's College London, London, United Kingdom
| | - Brendan C Delaney
- Imperial College London, Imperial College Business School, London, United Kingdom
| | - Mark M McGilchrist
- University of Dundee, Department of Public Health Sciences, Dundee, United Kingdom
| |
|
17
|
Richesson RL, Green BB, Laws R, Puro J, Kahn MG, Bauck A, Smerek M, Van Eaton EG, Zozus M, Hammond WE, Stephens KA, Simon GE. Pragmatic (trial) informatics: a perspective from the NIH Health Care Systems Research Collaboratory. J Am Med Inform Assoc 2018; 24:996-1001. [PMID: 28340241 DOI: 10.1093/jamia/ocx016] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 11/20/2016] [Accepted: 02/14/2017] [Indexed: 11/14/2022] Open
Abstract
Pragmatic clinical trials (PCTs) are research investigations embedded in health care settings designed to increase the efficiency of research and its relevance to clinical practice. The Health Care Systems Research Collaboratory, initiated by the National Institutes of Health Common Fund in 2010, is a pioneering cooperative aimed at identifying and overcoming operational challenges to pragmatic research. Drawing from our experience, we present 4 broad categories of informatics-related challenges: (1) using clinical data for research, (2) integrating data from heterogeneous systems, (3) using electronic health records to support intervention delivery or health system change, and (4) assessing and improving data capture to define study populations and outcomes. These challenges impact the validity, reliability, and integrity of PCTs. Achieving the full potential of PCTs and a learning health system will require meaningful partnerships between health system leadership and operations, and federally driven standards and policies to ensure that future electronic health record systems have the flexibility to support research.
Affiliation(s)
- Rachel L Richesson
- Division of Clinical Systems and Analytics, Duke University School of Nursing, Durham, NC, USA
- Duke Center for Health Informatics, Durham, NC, USA
| | - Beverly B Green
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | - Reesa Laws
- Kaiser Permanente Center for Health Research, Portland, OR, USA
| | | | - Michael G Kahn
- Department of Pediatrics, University of Colorado, Denver, CO, USA
| | - Alan Bauck
- Kaiser Permanente Center for Health Research, Portland, OR, USA
| | - Michelle Smerek
- Clinical Research Informatics, Duke Clinical Research Institute, Durham, NC, USA
| | - Erik G Van Eaton
- Department of Surgery, University of Washington, Seattle, WA, USA
| | - Meredith Zozus
- University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - W Ed Hammond
- Duke Center for Health Informatics, Durham, NC, USA
| | - Kari A Stephens
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
| | - Greg E Simon
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| |
|
18
|
Lynch KE, Whitcomb BW, DuVall SL. How Confounder Strength Can Affect Allocation of Resources in Electronic Health Records. Perspectives in Health Information Management 2018; 15:1d. [PMID: 29618960 PMCID: PMC5869441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
When electronic health record (EHR) data are used, multiple approaches may be available for measuring the same variable, introducing potentially confounding factors. While additional information may be gleaned and residual confounding reduced through resource-intensive assessment methods such as natural language processing (NLP), whether the added benefits offset the added cost of the additional resources is not straightforward. We evaluated the implications of misclassification of a confounder when using EHRs. Using a combination of simulations and real data surrounding hospital readmission, we considered smoking as a potential confounder. We compared ICD-9 diagnostic code assignment, which is an easily available measure but has the possibility of substantial misclassification of smoking status, with NLP, a method of determining smoking status that is more expensive and time-consuming than ICD-9 code assignment but has less potential for misclassification. Classification of smoking status with NLP consistently produced less residual confounding than the use of ICD-9 codes; however, when minimal confounding was present, differences between the approaches were small. When considerable confounding is present, investing in a superior measurement tool becomes advantageous.
Affiliation(s)
| | | | - Scott L DuVall
- VA Salt Lake City Health Care System in Salt Lake City, UT
| |
|
19
|
Kennell TI, Willig JH, Cimino JJ. Clinical Informatics Researcher's Desiderata for the Data Content of the Next Generation Electronic Health Record. Appl Clin Inform 2017; 8:1159-1172. [PMID: 29270955 DOI: 10.4338/aci-2017-06-r-0101] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/29/2023] Open
Abstract
OBJECTIVE Clinical informatics researchers depend on the availability of high-quality data from the electronic health record (EHR) to design and implement new methods and systems for clinical practice and research. However, these data are frequently unavailable or present in a format that requires substantial revision. This article reports the results of a review of informatics literature published from 2010 to 2016 that addresses these issues by identifying categories of data content that might be included or revised in the EHR. MATERIALS AND METHODS We used an iterative review process on 1,215 biomedical informatics research articles. We placed them into generic categories, reviewed and refined the categories, and then assigned additional articles, for a total of three iterations. RESULTS Our process identified eight categories of data content issues: Adverse Events, Clinician Cognitive Processes, Data Standards Creation and Data Communication, Genomics, Medication List Data Capture, Patient Preferences, Patient-reported Data, and Phenotyping. DISCUSSION These categories summarize discussions in biomedical informatics literature that concern data content issues restricting clinical informatics research. These barriers to research result from data that are either absent from the EHR or are inadequate (e.g., in narrative text form) for the downstream applications of the data. In light of these categories, we discuss changes to EHR data storage that should be considered in the redesign of EHRs, to promote continued innovation in clinical informatics. CONCLUSION Based on published literature of clinical informaticians' reuse of EHR data, we characterize eight types of data content that, if included in the next generation of EHRs, would find immediate application in advanced informatics tools and techniques.
Affiliation(s)
- Timothy I Kennell
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James H Willig
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States; Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
- James J Cimino
- Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States; Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, United States
|
20
|
Lingren T, Sadhasivam S, Zhang X, Marsolo K. Electronic medical records as a replacement for prospective research data collection in postoperative pain and opioid response studies. Int J Med Inform 2017; 111:45-50. [PMID: 29425633 DOI: 10.1016/j.ijmedinf.2017.12.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/13/2017] [Revised: 09/21/2017] [Accepted: 12/16/2017] [Indexed: 11/30/2022]
Abstract
BACKGROUND AND AIM Many clinical research studies claim to collect data that are also captured in the electronic medical record (EMR). We evaluate the potential for EMR data to replace prospective research data collection. METHODS Using a dataset of 358 surgical patients enrolled in a prospective study, we examined the completeness and agreement of EMR and study entries for several variables, including the patient's stay in the post-operative care unit (PACU), surgical pain relief and pain medication side effects. RESULTS For all variables with a completeness percentage, values were greater than 96%. For the adverse event variables, we found slight to substantial agreement (Cohen's kappa), ranging from 0.19 (nausea) to 0.48 (respiratory depression) to 0.73 (emesis). CONCLUSION The potential to use EMR data as a replacement for prospective research data collection shows promise, but for now, should be evaluated on a variable-by-variable basis.
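The two measures named in this abstract, completeness and Cohen's kappa, can be illustrated with a minimal sketch. The function names and the toy data below are hypothetical and are not taken from the study; the code only shows how such figures are typically computed for a single binary variable recorded in two sources.

```python
from collections import Counter


def completeness(values):
    """Share of entries that are present (not None)."""
    return sum(v is not None for v in values) / len(values)


def cohens_kappa(source_a, source_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(source_a) != len(source_b) or not source_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(source_a)
    # Observed agreement: fraction of records where both sources match.
    observed = sum(a == b for a, b in zip(source_a, source_b)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    counts_a = Counter(source_a)
    counts_b = Counter(source_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        # Convention chosen here: both sources used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: 1 = adverse event recorded, 0 = not recorded.
emr_entries = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
study_entries = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]
kappa = cohens_kappa(emr_entries, study_entries)
share_present = completeness([1, None, 0, 1])
```

With these toy inputs the two sources agree on 8 of 10 records, chance agreement is 0.58, and kappa works out to about 0.52, which illustrates why a high raw agreement can still correspond to only moderate kappa.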
Affiliation(s)
- Todd Lingren
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Xue Zhang
- Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
- Keith Marsolo
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, USA
|
21
|
Abstract
Context: The High Value Healthcare Collaborative (HVHC) sepsis project was a two-year multi-site project in which Member health care delivery systems worked on improving sepsis care using a dissemination & implementation framework designed by HVHC. As part of the project evaluation, participating Members provided 5 data submissions over the project period. Members created data files using a uniform specification, but the data sources and methods used to create the data sets differed. Extensive data cleaning was necessary to get a data set usable for the evaluation analysis. Case Description: HVHC was the coordinating center for the project and received and cleaned all data submissions. Submissions received 3 sequentially more detailed levels of checking by HVHC. The most detailed level evaluated validity by comparing values within Member over time and between Members. For a subset of episodes, Member-submitted data were compared to matched Medicare claims data. Findings: Inconsistencies in data submissions, particularly for length-of-stay variables, were common in early submissions and decreased with subsequent submissions. Multiple resubmissions were sometimes required to get clean data. Data checking also uncovered a systematic difference in the way Medicare and some Members defined intensive care unit stay. Conclusions: Data checking is critical for ensuring valid analytic results for projects using electronic health record data. It is important to budget sufficient resources for data checking. Interim data submissions and checks help find anomalies early. Data resubmissions should be checked, as fixes can introduce new errors. Communicating with those responsible for creating the data set provides critical information.
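The kind of checking described in this abstract can be sketched in a few lines. The abstract does not specify the actual HVHC checks, so everything below is an assumption made for illustration: the field names `hospital_los_days` and `icu_los_days`, the rules applied, and the comparison of mean length of stay between two submissions from the same Member.

```python
def check_episode(episode):
    """Return a list of problems found in one submitted episode (a dict).

    The field names and rules are hypothetical examples of internal
    consistency checks on length-of-stay variables.
    """
    problems = []
    los = episode.get("hospital_los_days")
    icu = episode.get("icu_los_days")
    if los is None:
        problems.append("missing hospital_los_days")
    elif los < 0:
        problems.append("negative hospital_los_days")
    if icu is not None:
        if icu < 0:
            problems.append("negative icu_los_days")
        elif los is not None and los >= 0 and icu > los:
            problems.append("icu_los_days exceeds hospital_los_days")
    return problems


def mean_los_shift(previous, current):
    """Relative change in mean hospital length of stay between two submissions.

    Assumes each submission has at least one non-missing value and that the
    earlier submission has a non-zero mean.
    """
    def mean(rows):
        vals = [r["hospital_los_days"] for r in rows
                if r.get("hospital_los_days") is not None]
        return sum(vals) / len(vals)

    prev_mean = mean(previous)
    return (mean(current) - prev_mean) / prev_mean


# Hypothetical usage: one suspicious episode and one large shift over time.
flags = check_episode({"hospital_los_days": 3, "icu_los_days": 5})
shift = mean_los_shift(
    [{"hospital_los_days": 4}, {"hospital_los_days": 6}],
    [{"hospital_los_days": 10}],
)
```

A large value of `shift` would not prove an error; in the spirit of the abstract's conclusions, it would be a prompt to ask the people who built the data set what changed.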
|
22
|
Meystre SM, Lovis C, Bürkle T, Tognola G, Budrionis A, Lehmann CU. Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress. Yearb Med Inform 2017; 26:38-52. [PMID: 28480475 PMCID: PMC6239225 DOI: 10.15265/iy-2017-007] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Received: 05/08/2017] [Indexed: 12/30/2022]
Abstract
Objective: To perform a review of recent research in clinical data reuse or secondary use, and envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, focusing only on research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summarization of its content was developed. Results: The initial search produced 359 publications, reduced after a manual examination of abstracts and full publications. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realize the potentials for high quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research.
Affiliation(s)
- S. M. Meystre
- Medical University of South Carolina, Charleston, SC, USA
- C. Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Switzerland
- T. Bürkle
- University of Applied Sciences, Bern, Switzerland
- G. Tognola
- Institute of Electronics, Computer and Telecommunication Engineering, Italian Natl. Research Council IEIIT-CNR, Milan, Italy
- A. Budrionis
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
- C. U. Lehmann
- Departments of Biomedical Informatics and Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
|
23
|
Yen PY, Lara B, Lopetegui M, Bharat A, Ardoin S, Johnson B, Mathur P, Embi PJ, Curtis JR. Usability and Workflow Evaluation of "RhEumAtic Disease activitY" (READY). A Mobile Application for Rheumatology Patients and Providers. Appl Clin Inform 2016; 7:1007-1024. [PMID: 27803949 DOI: 10.4338/aci-2016-03-ra-0036] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/24/2016] [Accepted: 09/19/2016] [Indexed: 11/23/2022]
Abstract
BACKGROUND RhEumAtic Disease activitY (READY) is a mobile health (mHealth) application that aims to create a shared platform integrating data from both patients and physicians, with a particular emphasis on arthritis disease activity. METHODS We made READY available on an iPad and pilot implemented it at a rheumatology outpatient clinic. We conducted 1) a usability evaluation study to explore patients' and physicians' interactions with READY, and 2) a time motion study (TMS) to observe the clinical workflow before and after the implementation. RESULTS A total of 33 patients and 15 physicians participated in the usability evaluation. We found usability problems in navigation, data entry, pain assessment, documentation, and instructions along with error messages. Despite these issues, 25 (75.76%) patients reported they liked READY. Physicians provided mixed feedback because they were concerned about the impact of READY on clinical workflow. Six physicians participated in the TMS. We observed 47 patient visits (44.72 hours) in the pre-implementation phase, and 42 patient visits (37.82 hours) in the post-implementation phase. We found that patients spent more time on READY than on paper (4.39 mins vs. 2.26 mins), but overall, READY did not delay the workflow (pre = 52.08 mins vs. post = 45.46 mins). This time difference may be offset by READY eliminating a workflow step for the staff. CONCLUSION Patients preferred READY to paper documents. Many found it easier to input information because of the larger font size and the ease of 'tapping' rather than writing out or circling answers. Even though patients spent more time on READY than on paper documents, the longer usage of READY was mainly due to the need for troubleshooting. Most patients did not have problems after receiving initial support from the staff.
This study not only enabled improvements to the software but also serves as good reference for other researchers or institutional decision makers who are interested in implementing such a technology.
Affiliation(s)
- Jeffrey R Curtis
- Jeffrey R. Curtis, MD, MS, MPH, University of Alabama at Birmingham, Division of Clinical Immunology and Rheumatology, 510 20th Street South, FOT 802D Birmingham AL 35294, Tel. 205-975-2176, E-mail:
|
24
|
Cole AM, Stephens KA, Keppel GA, Estiri H, Baldwin LM. Extracting Electronic Health Record Data in a Practice-Based Research Network: Processes to Support Translational Research across Diverse Practice Organizations. EGEMS 2016; 4:1206. [PMID: 27141519 PMCID: PMC4827782 DOI: 10.13063/2327-9214.1206] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/25/2022]
Abstract
Context: The widespread adoption of electronic health records (EHRs) offers significant opportunities to conduct research with clinical data from patients outside traditional academic research settings. Because EHRs are designed primarily for clinical care and billing, significant challenges are inherent in the use of EHR data for clinical and translational research. Efficient processes are needed for translational researchers to overcome these challenges. The Data QUEST Coordinating Center (DQCC), which oversees Data Query Extraction Standardization Translation (Data QUEST) – a primary-care, EHR data-sharing infrastructure – created processes that guide EHR data extraction for clinical and translational research across these diverse practices. We describe these processes and their application in a case example. Case Description: The DQCC process for developing EHR data extractions not only supports researchers’ access to EHR data, but supports this access for the purpose of answering scientific questions. This process requires complex coordination across multiple domains, including the following: (1) understanding the context of EHR data; (2) creating and maintaining a governance structure to support exchange of EHR data; and (3) defining data parameters that are used in order to extract data from the EHR. We use the Northwest-Alaska Pharmacogenomics Research Network (NWA-PGRN) as a case example that focuses on pharmacogenomic discovery and clinical applications to describe the DQCC process. The NWA-PGRN collaborates with Data QUEST to explore ways to leverage primary-care EHR data to support pharmacogenomics research. Findings: Preliminary analysis on the case example shows that initial decisions about how researchers define the study population can influence study outcomes. 
Major Themes and Conclusions: The experience of the DQCC demonstrates that coordinating centers provide expertise in helping researchers understand the context of EHR data, create and maintain governance structures, and guide the definition of parameters for data extractions. This expertise is critical to supporting research with EHR data. Replication of these strategies through coordinating centers may lead to more efficient translational research. Investigators must also consider the impact of initial decisions in defining study groups that may potentially affect outcomes.
Affiliation(s)
- Allison M Cole
- University of Washington, Institute of Translational Health Sciences
- Kari A Stephens
- University of Washington, Institute of Translational Health Sciences
- Gina A Keppel
- University of Washington, Institute of Translational Health Sciences
- Hossein Estiri
- University of Washington, Institute of Translational Health Sciences
- Laura-Mae Baldwin
- University of Washington, Institute of Translational Health Sciences
|
25
|
Reimer AP, Milinovich A, Madigan EA. Data quality assessment framework to assess electronic medical record data for use in research. Int J Med Inform 2016; 90:40-7. [PMID: 27103196 DOI: 10.1016/j.ijmedinf.2016.03.006] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 10/23/2015] [Revised: 03/07/2016] [Accepted: 03/18/2016] [Indexed: 11/16/2022]
Abstract
INTRODUCTION The proliferation and use of electronic medical records (EMR) in the clinical setting now provide a rich source of clinical data that can be leveraged to support research on patient outcomes, comparative effectiveness, and health systems research. Once the large volume and variety of data that robust clinical EMRs provide is aggregated, the suitability of the data for research purposes must be addressed. Therefore, the purpose of this paper is two-fold. First, we present a stepwise framework capable of guiding initial data quality assessment when matching multiple data sources regardless of context or application. Then, we demonstrate a use case of initial analysis of a longitudinal data repository of electronic health record data that illustrates the first four steps of the framework, and report results. METHODS A six-step data quality assessment framework is proposed and described that includes the following data quality assessment steps: (1) preliminary analysis, (2) documentation-longitudinal concordance, (3) breadth, (4) data element presence, (5) density, and (6) prediction. The six-step framework was applied to the Transport Data Mart, a data repository that contains over 28,000 records for patients who underwent interhospital transfer and that includes EMRs from the sending hospitalization, transport, and receiving hospitalization. RESULTS There were a total of 9557 log entries, of which 8139 were successfully matched to corresponding hospital encounters. 2832 were successfully mapped to both the sending and receiving hospital encounters (resulting in a 93% automatic matching rate), with 590 including air medical transport EMR data representing a complete case for testing. Results from Step 2 indicate that once records are identified and matched, there appears to be relatively limited drop-off of additional records when the criteria for matching increase, indicating that a proportion of records consistently contain nearly complete data. Measures of central tendency used in Steps 3 and 4 exhibit a right skewness, suggesting that a small proportion of records contain the highest number of repeated measures for the measured variables. CONCLUSIONS The proposed six-step data quality assessment framework is useful in establishing the metadata for a longitudinal data repository that can be replicated by other studies. There are practical issues that need to be addressed, including the data quality assessments, with the most prescient being the need to establish data quality metrics for benchmarking acceptable levels of EMR data inclusiveness through testing and application.
Affiliation(s)
- Andrew P Reimer
- Frances Payne Bolton School of Nursing, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, United States; Cleveland Clinic, 10900 Euclid Avenue, Cleveland, OH 44195, United States
- Alex Milinovich
- Cleveland Clinic, 10900 Euclid Avenue, Cleveland, OH 44195, United States
- Elizabeth A Madigan
- Frances Payne Bolton School of Nursing, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106, United States
|
26
|
Cohen B, Vawdrey DK, Liu J, Caplan D, Furuya EY, Mis FW, Larson E. Challenges Associated With Using Large Data Sets for Quality Assessment and Research in Clinical Settings. Policy Polit Nurs Pract 2015; 16:117-24. [PMID: 26351216 DOI: 10.1177/1527154415603358] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 01/07/2023]
Abstract
The rapidly expanding use of electronic records in health-care settings is generating unprecedented quantities of data available for clinical, epidemiological, and cost-effectiveness research. Several challenges are associated with using these data for clinical research, including issues surrounding access and information security, poor data quality, inconsistency of data within and across institutions, and a paucity of staff with expertise to manage and manipulate large clinical data sets. In this article, we describe our experience with assembling a data-mart and conducting clinical research using electronic data from four facilities within a single hospital network in New York City. We culled data from several electronic sources, including the institution's admission-discharge-transfer system, cost accounting system, electronic health record, clinical data warehouse, and departmental records. The final data-mart contained information for more than 760,000 discharges occurring from 2006 through 2012. Using categories identified by the National Institutes of Health Big Data to Knowledge initiative as a framework, we outlined challenges encountered during the development and use of a domain-specific data-mart and recommend approaches to overcome these challenges.
Affiliation(s)
- Bevin Cohen
- Columbia University School of Nursing, New York, NY, USA
- David K Vawdrey
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Jianfang Liu
- Columbia University School of Nursing, New York, NY, USA
- David Caplan
- Department of Information Services, New York-Presbyterian Hospital, New York, NY, USA
- E Yoko Furuya
- Department of Medicine, Columbia University, New York, NY, USA
- Frederick W Mis
- Department of Information Services, New York-Presbyterian Hospital, New York, NY, USA
- Elaine Larson
- Columbia University School of Nursing, New York, NY, USA
|
27
|
O'Malley AS, Rich EC, Maccarone A, DesRoches CM, Reid RJ. Disentangling the Linkage of Primary Care Features to Patient Outcomes: A Review of Current Literature, Data Sources, and Measurement Needs. J Gen Intern Med 2015; 30 Suppl 3:S576-85. [PMID: 26105671 PMCID: PMC4512966 DOI: 10.1007/s11606-015-3311-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 01/18/2023]
Abstract
Primary care plays a central role in the provision of health care, and is an organizing feature for health care delivery systems in most Western industrialized democracies. For a variety of reasons, however, the practice of primary care has been in decline in the U.S. This paper reviews key primary care concepts and their definitions, notes the increasingly complex interplay between primary care and the broader health care system, and offers research priorities to support future measurement, delivery and understanding of the role of primary care features on health care costs and quality.
|
28
|
Hsu W, Gonzalez NR, Chien A, Pablo Villablanca J, Pajukanta P, Viñuela F, Bui AAT. An integrated, ontology-driven approach to constructing observational databases for research. J Biomed Inform 2015; 55:132-42. [PMID: 25817919 DOI: 10.1016/j.jbi.2015.03.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 03/21/2014] [Revised: 02/14/2015] [Accepted: 03/19/2015] [Indexed: 11/28/2022]
Abstract
The electronic health record (EHR) contains a diverse set of clinical observations that are captured as part of routine care, but the incomplete, inconsistent, and sometimes incorrect nature of clinical data poses significant impediments for its secondary use in retrospective studies or comparative effectiveness research. In this work, we describe an ontology-driven approach for extracting and analyzing data from the patient record in a longitudinal and continuous manner. We demonstrate how the ontology helps enforce consistent data representation, integrates phenotypes generated through analyses of available clinical data sources, and facilitates subsequent studies to identify clinical predictors for an outcome of interest. Development and evaluation of our approach are described in the context of studying factors that influence intracranial aneurysm (ICA) growth and rupture. We report our experiences in capturing information on 78 individuals with a total of 120 aneurysms. Two example applications related to assessing the relationship between aneurysm size, growth, gene expression modules, and rupture are described. Our work highlights the challenges with respect to data quality, workflow, and analysis of data and its implications toward a learning health system paradigm.
Affiliation(s)
- William Hsu
- Department of Radiological Sciences, UCLA David Geffen School of Medicine, Los Angeles, CA, United States
- Nestor R Gonzalez
- Department of Radiological Sciences, UCLA David Geffen School of Medicine, Los Angeles, CA, United States; Department of Neurosurgery, UCLA David Geffen School of Medicine, Los Angeles, CA, United States
- Aichi Chien
- Department of Radiological Sciences, UCLA David Geffen School of Medicine, Los Angeles, CA, United States
- J Pablo Villablanca
- Department of Radiological Sciences, UCLA David Geffen School of Medicine, Los Angeles, CA, United States
- Päivi Pajukanta
- Department of Human Genetics, UCLA David Geffen School of Medicine, Los Angeles, CA, United States
- Fernando Viñuela
- Department of Radiological Sciences, UCLA David Geffen School of Medicine, Los Angeles, CA, United States
- Alex A T Bui
- Department of Radiological Sciences, UCLA David Geffen School of Medicine, Los Angeles, CA, United States
|
29
|
Priest EL, Klekar C, Cantu G, Berryman C, Garinger G, Hall L, Kouznetsova M, Kudyakov R, Masica A. Developing electronic data methods infrastructure to participate in collaborative research networks. EGEMS 2014; 2:1126. [PMID: 25848600 PMCID: PMC4371420 DOI: 10.13063/2327-9214.1126] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
Abstract
Context: Collaborative networks support the goals of a learning health system by sharing, aggregating, and analyzing data to facilitate identification of best practices care across delivery organizations. This case study describes the infrastructure and process developed by an integrated health delivery system to successfully prepare and submit a complex data set to a large national collaborative network. Case Description: We submitted four years of data for a diverse population of patients in specific clinical areas: diabetes, chronic heart failure, sepsis, and hip, knee, and spine. The most recent submission included 19 tables, more than 376,000 unique patients, and almost 5 million patient encounters. Data were extracted from multiple clinical and administrative systems. Lessons Learned: We found that a structured process with documentation was key to maintaining communication, timelines, and quality in a large-scale data submission to a national collaborative network. The three key components of this process were the experienced project team, documentation, and communication. We used a formal QA and feedback process to track and review data. Overall, the data submission was resource intensive and required an incremental approach to data quality. Conclusion: Participation in collaborative networks can be time and resource intensive; however, it can serve as a catalyst to increase the technical data available to the learning health system.
|
30
|
Otero P, Hersh W, Jai Ganesh AU. Big Data: Are Biomedical and Health Informatics Training Programs Ready? Contribution of the IMIA Working Group for Health and Medical Informatics Education. Yearb Med Inform 2014; 9:177-81. [PMID: 25123740 DOI: 10.15265/iy-2014-0007] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
Abstract
OBJECTIVE The growing volume and diversity of health and biomedical data indicate that the era of Big Data has arrived for healthcare. This has many implications for informatics, not only in terms of implementing and evaluating information systems, but also for the work and training of informatics researchers and professionals. This article addresses the question: What do biomedical and health informaticians working in analytics and Big Data need to know? METHODS We hypothesize a set of skills that we hope will be discussed among academic and other informaticians. RESULTS The set of skills includes: Programming - especially with data-oriented tools, such as SQL and statistical programming languages; Statistics - working knowledge to apply tools and techniques; Domain knowledge - depending on one's area of work, bioscience or health care; and Communication - being able to understand needs of people and organizations, and articulate results back to them. CONCLUSION Biomedical and health informatics educational programs must introduce concepts of analytics, Big Data, and the underlying skills to use and apply them into their curricula. The development of new coursework should focus on those who will become experts, with training aiming to provide skills in "deep analytical talent" as well as those who need knowledge to support such individuals.
Affiliation(s)
- P Otero
- Dra. Paula Otero, Department of Health Informatics, Hospital Italiano de Buenos Aires, Peron 4190, (1199) Ciudad Autonoma de Buenos, Argentina, E-mail:
|
31
|
Johnson KE, Kamineni A, Fuller S, Olmstead D, Wernli KJ. How the provenance of electronic health record data matters for research: a case example using system mapping. EGEMS 2014; 2:1058. [PMID: 25821838 PMCID: PMC4371416 DOI: 10.13063/2327-9214.1058] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/24/2022]
Abstract
INTRODUCTION The use of electronic health records (EHRs) for research is proceeding rapidly, driven by computational power, analytical techniques, and policy. However, EHR-based research is limited by the complexity of EHR data and a lack of understanding about data provenance, meaning the context under which the data were collected. This paper presents system flow mapping as a method to help researchers more fully understand the provenance of their EHR data as it relates to local workflow. We provide two specific examples of how this method can improve data identification, documentation, and processing. BACKGROUND EHRs store clinical and administrative data, often in unstructured fields. Each clinical system has a unique and dynamic workflow, as well as an EHR customized for local use. The EHR customization may be influenced by a broader context such as documentation required for billing. METHODS We present a case study with two examples of using system flow mapping to characterize EHR data for a local colorectal cancer screening process. FINDINGS System flow mapping demonstrated that information entered into the EHR during clinical practice required interpretation and transformation before it could be accurately applied to research. We illustrate how system flow mapping shaped our knowledge of the quality and completeness of data in two examples: (1) determining colonoscopy indication as recorded in the EHR, and (2) discovering a specific EHR form that captured family history. DISCUSSION Researchers who do not consider data provenance risk compiling data that are systematically incomplete or incorrect. For example, researchers who are not familiar with the clinical workflow under which data were entered might miss or misunderstand patient information or procedure and diagnostic codes. CONCLUSIONS Data provenance is a fundamental characteristic of research data from EHRs. 
Given the diversity of EHR platforms and system workflows, researchers need tools for evaluating and reporting data availability, quality, and transformations. Our case study illustrates how system mapping can inform researchers about the provenance of their data as it pertains to local workflows.
|