1
|
Coutinho-Almeida J, Saez C, Correia R, Rodrigues PP. Development and initial validation of a data quality evaluation tool in obstetrics real-world data through HL7-FHIR interoperable Bayesian networks and expert rules. JAMIA Open 2024; 7:ooae062. [PMID: 39070966 PMCID: PMC11283181 DOI: 10.1093/jamiaopen/ooae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2024] [Revised: 06/05/2024] [Accepted: 06/19/2024] [Indexed: 07/30/2024] Open
Abstract
Background The increasing prevalence of electronic health records (EHRs) in healthcare systems globally has underscored the importance of data quality for clinical decision-making and research, particularly in obstetrics. High-quality data is vital for an accurate representation of patient populations and to avoid erroneous healthcare decisions. However, existing studies have highlighted significant challenges in EHR data quality, necessitating innovative tools and methodologies for effective data quality assessment and improvement. Objective This article addresses the critical need for data quality evaluation in obstetrics by developing a novel tool. The tool utilizes Health Level 7 (HL7) Fast Healthcare Interoperable Resources (FHIR) standards in conjunction with Bayesian Networks and expert rules, offering a novel approach to assessing data quality in real-world obstetrics data. Methods A harmonized framework focusing on completeness, plausibility, and conformance underpins our methodology. We employed Bayesian networks for advanced probabilistic modeling, integrated outlier detection methods, and a rule-based system grounded in domain-specific knowledge. The development and validation of the tool were based on obstetrics data from 9 Portuguese hospitals, spanning the years 2019-2020. Results The developed tool demonstrated strong potential for identifying data quality issues in obstetrics EHRs. Bayesian networks used in the tool showed high performance for various features with area under the receiver operating characteristic curve (AUROC) between 75% and 97%. The tool's infrastructure and interoperable format as a FHIR Application Programming Interface (API) enables a possible deployment of a real-time data quality assessment in obstetrics settings. Our initial assessments show promised, even when compared with physicians' assessment of real records, the tool can reach AUROC of 88%, depending on the threshold defined. Discussion Our results also show that obstetrics clinical records are difficult to assess in terms of quality and assessments like ours could benefit from more categorical approaches of ranking between bad and good quality. Conclusion This study contributes significantly to the field of EHR data quality assessment, with a specific focus on obstetrics. The combination of HL7-FHIR interoperability, machine learning techniques, and expert knowledge presents a robust, adaptable solution to the challenges of healthcare data quality. Future research should explore tailored data quality evaluations for different healthcare contexts, as well as further validation of the tool capabilities, enhancing the tool's utility across diverse medical domains.
Collapse
Affiliation(s)
- João Coutinho-Almeida
- CINTESIS@RISE—Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
- MEDCIDS—Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
- Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
| | - Carlos Saez
- Instituto Universitario de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas, Universitat Politècnica de València, 46022 Valencia, Spain
| | - Ricardo Correia
- CINTESIS@RISE—Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
- MEDCIDS—Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
- Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
| | - Pedro Pereira Rodrigues
- CINTESIS@RISE—Centre for Health Technologies and Services Research, University of Porto, 4200-319 Porto, Portugal
- MEDCIDS—Faculty of Medicine of University of Porto, 4200-319 Porto, Portugal
- Health Data Science PhD Program, Faculty of Medicine of the University of Porto, 4200-319 Porto, Portugal
| |
Collapse
|
2
|
Syed R, Eden R, Makasi T, Chukwudi I, Mamudu A, Kamalpour M, Kapugama Geeganage D, Sadeghianasl S, Leemans SJJ, Goel K, Andrews R, Wynn MT, Ter Hofstede A, Myers T. Digital Health Data Quality Issues: Systematic Review. J Med Internet Res 2023; 25:e42615. [PMID: 37000497 PMCID: PMC10131725 DOI: 10.2196/42615] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2022] [Revised: 12/07/2022] [Accepted: 12/31/2022] [Indexed: 04/01/2023] Open
Abstract
BACKGROUND The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making. However, the ability to effectively harness data has proven elusive, largely because of the quality of the data captured. Despite the importance of data quality (DQ), an agreed-upon DQ taxonomy evades literature. When consolidated frameworks are developed, the dimensions are often fragmented, without consideration of the interrelationships among the dimensions or their resultant impact. OBJECTIVE The aim of this study was to develop a consolidated digital health DQ dimension and outcome (DQ-DO) framework to provide insights into 3 research questions: What are the dimensions of digital health DQ? How are the dimensions of digital health DQ related? and What are the impacts of digital health DQ? METHODS Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a developmental systematic literature review was conducted of peer-reviewed literature focusing on digital health DQ in predominately hospital settings. A total of 227 relevant articles were retrieved and inductively analyzed to identify digital health DQ dimensions and outcomes. The inductive analysis was performed through open coding, constant comparison, and card sorting with subject matter experts to identify digital health DQ dimensions and digital health DQ outcomes. Subsequently, a computer-assisted analysis was performed and verified by DQ experts to identify the interrelationships among the DQ dimensions and relationships between DQ dimensions and outcomes. The analysis resulted in the development of the DQ-DO framework. RESULTS The digital health DQ-DO framework consists of 6 dimensions of DQ, namely accessibility, accuracy, completeness, consistency, contextual validity, and currency; interrelationships among the dimensions of digital health DQ, with consistency being the most influential dimension impacting all other digital health DQ dimensions; 5 digital health DQ outcomes, namely clinical, clinician, research-related, business process, and organizational outcomes; and relationships between the digital health DQ dimensions and DQ outcomes, with the consistency and accessibility dimensions impacting all DQ outcomes. CONCLUSIONS The DQ-DO framework developed in this study demonstrates the complexity of digital health DQ and the necessity for reducing digital health DQ issues. The framework further provides health care executives with holistic insights into DQ issues and resultant outcomes, which can help them prioritize which DQ-related problems to tackle first.
Collapse
Affiliation(s)
- Rehan Syed
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Rebekah Eden
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Tendai Makasi
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Ignatius Chukwudi
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Azumah Mamudu
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Mostafa Kamalpour
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Dakshi Kapugama Geeganage
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Sareh Sadeghianasl
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Sander J J Leemans
- Rheinisch-Westfälische Technische Hochschule, Aachen University, Aachen, Germany
| | - Kanika Goel
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Robert Andrews
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Moe Thandar Wynn
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Arthur Ter Hofstede
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| | - Trina Myers
- School of Information Systems, Faculty of Science, Queensland University of Technology, Brisbane, Australia
| |
Collapse
|
3
|
Quan TP, Lacey B, Peto TEA, Walker AS. Health record hiccups-5,526 real-world time series with change points labelled by crowdsourced visual inspection. Gigascience 2022; 12:giad060. [PMID: 37503960 PMCID: PMC10375518 DOI: 10.1093/gigascience/giad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2023] [Revised: 05/19/2023] [Accepted: 07/07/2023] [Indexed: 07/29/2023] Open
Abstract
BACKGROUND Large routinely collected data such as electronic health records (EHRs) are increasingly used in research, but the statistical methods and processes used to check such data for temporal data quality issues have not moved beyond manual, ad hoc production and visual inspection of graphs. With the prospect of EHR data being used for disease surveillance via automated pipelines and public-facing dashboards, automation of data quality checks will become increasingly valuable. FINDINGS We generated 5,526 time series from 8 different EHR datasets and engaged >2,000 citizen-science volunteers to label the locations of all suspicious-looking change points in the resulting graphs. Consensus labels were produced using density-based clustering with noise, with validation conducted using 956 images containing labels produced by an experienced data scientist. Parameter tuning was done against 670 images and performance calculated against 286 images, resulting in a final sensitivity of 80.4% (95% CI, 77.1%-83.3%), specificity of 99.8% (99.7%-99.8%), positive predictive value of 84.5% (81.4%-87.2%), and negative predictive value of 99.7% (99.6%-99.7%). In total, 12,745 change points were found within 3,687 of the time series. CONCLUSIONS This large collection of labelled EHR time series can be used to validate automated methods for change point detection in real-world settings, encouraging the development of methods that can successfully be applied in practice. It is particularly valuable since change point detection methods are typically validated using synthetic data, so their performance in real-world settings cannot be assumed to be comparable. While the dataset focusses on EHRs and data quality, it should also be applicable in other fields.
Collapse
Affiliation(s)
- T Phuong Quan
- Nuffield Department of Clinical Medicine, University of Oxford, Oxford OX3 9DU, UK
| | - Ben Lacey
- Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
| | - Tim E A Peto
- Nuffield Department of Clinical Medicine, University of Oxford, Oxford OX3 9DU, UK
| | - A Sarah Walker
- Nuffield Department of Clinical Medicine, University of Oxford, Oxford OX3 9DU, UK
| |
Collapse
|
4
|
García-de-León-Chocano R, Sáez C, Muñoz-Soler V, Oliver-Roig A, García-de-León-González R, García-Gómez JM. Robust estimation of infant feeding indicators by data quality assessment of longitudinal electronic health records from birth up to 18 months of life. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021; 207:106147. [PMID: 34020376 DOI: 10.1016/j.cmpb.2021.106147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/03/2020] [Accepted: 04/26/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND AND OBJECTIVE The Baby-Friendly Hospital Initiative (BFHI) is an international strategy aimed at improving breastfeeding practices in health care services. Regular monitoring of indicators is key for BFHI implementation and maintenance. Currently, routine data collected from electronic health records (EHR) is an excellent source for infant feeding monitoring, however data quality (DQ) assessment should be undertaken. The aim of this research is to enable robust estimations of infant feeding indicators through DQ assessment of routine EHR data. MATERIALS AND METHODS We use the longitudinal series of healthcare contacts belonging to 6427 children born from 2009 to 2018 in the Health Area V of Murcia (Spain). Longitudinal data came from EHR at hospital discharge and community infant health reviews up to 18 months. The data of each healthcare contact contained a 24-h recall of infant feeding. We perform a DQ process in three phases: (1) an assessment of each-single-contact and the definition of their infant feeding status; (2) a longitudinal DQ assessment of completeness and consistency of the series of contacts to obtain meta-information that guides the duration calculus, for each case, of the different types of breastfeeding: exclusive breastfeeding (EBF), full breastfeeding (FBF) and any breastfeeding (ABF); and finally (3) a robust estimation of indicators and description of DQ of each indicator. RESULTS We found deficiencies of DQ in 30.42% of single contacts for EBF, 19.02% for FBF and 22.50% for ABF that were used to establish the infant feeding status. However, after longitudinal DQ assessment, we obtained valid and reliable data rates for most indicators such as "median duration of breastfeeding" nearly 90%, both for FBF and ABF, not so for EBF. CONCLUSIONS Despite the DQ deficiencies found in raw data, the DQ assurance approach by indicators proposed in this work, allowed us to obtain a robust estimation of indicators with a significant percentage of subjects with valid information for ABF and FBF monitoring. The estimations were consistent with results previously published. The methodology provided with this study allows a continuous and reliable population monitoring of infant feeding indicators of BFHI from routine EHR data.
Collapse
Affiliation(s)
- Ricardo García-de-León-Chocano
- Department of Information Technology, Hospital Virgen del Castillo, Yecla, Gerencia Área de Salud V - Altiplano, Servicio Murciano de Salud, Spain.
| | - Carlos Sáez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, España
| | - Verónica Muñoz-Soler
- Doctoral student Nursing Department, Faculty of Health Sciences, University of Alicante, Spain
| | - Antonio Oliver-Roig
- Department of Nursing, Faculty of Health Sciences, University of Alicante, Spain
| | - Ricardo García-de-León-González
- Department of Paediatrics, Hospital Virgen del Castillo, Yecla, Gerencia Área de Salud V-Altiplano, Servicio Murciano de Salud, Spain
| | - Juan Miguel García-Gómez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, España
| |
Collapse
|
5
|
Gagalova KK, Leon Elizalde MA, Portales-Casamar E, Görges M. What You Need to Know Before Implementing a Clinical Research Data Warehouse: Comparative Review of Integrated Data Repositories in Health Care Institutions. JMIR Form Res 2020; 4:e17687. [PMID: 32852280 PMCID: PMC7484778 DOI: 10.2196/17687] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2020] [Revised: 06/09/2020] [Accepted: 07/17/2020] [Indexed: 12/23/2022] Open
Abstract
Background Integrated data repositories (IDRs), also referred to as clinical data warehouses, are platforms used for the integration of several data sources through specialized analytical tools that facilitate data processing and analysis. IDRs offer several opportunities for clinical data reuse, and the number of institutions implementing an IDR has grown steadily in the past decade. Objective The architectural choices of major IDRs are highly diverse and determining their differences can be overwhelming. This review aims to explore the underlying models and common features of IDRs, provide a high-level overview for those entering the field, and propose a set of guiding principles for small- to medium-sized health institutions embarking on IDR implementation. Methods We reviewed manuscripts published in peer-reviewed scientific literature between 2008 and 2020, and selected those that specifically describe IDR architectures. Of 255 shortlisted articles, we found 34 articles describing 29 different architectures. The different IDRs were analyzed for common features and classified according to their data processing and integration solution choices. Results Despite common trends in the selection of standard terminologies and data models, the IDRs examined showed heterogeneity in the underlying architecture design. We identified 4 common architecture models that use different approaches for data processing and integration. These different approaches were driven by a variety of features such as data sources, whether the IDR was for a single institution or a collaborative project, the intended primary data user, and purpose (research-only or including clinical or operational decision making). Conclusions IDR implementations are diverse and complex undertakings, which benefit from being preceded by an evaluation of requirements and definition of scope in the early planning stage. Factors such as data source diversity and intended users of the IDR influence data flow and synchronization, both of which are crucial factors in IDR architecture planning.
Collapse
Affiliation(s)
- Kristina K Gagalova
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada.,Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada.,Research Institute, BC Children's Hospital, Vancouver, BC, Canada
| | - M Angelica Leon Elizalde
- Research Institute, BC Children's Hospital, Vancouver, BC, Canada.,School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
| | - Elodie Portales-Casamar
- Research Institute, BC Children's Hospital, Vancouver, BC, Canada.,Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada
| | - Matthias Görges
- Research Institute, BC Children's Hospital, Vancouver, BC, Canada.,Department of Anesthesiology, Pharmacology and Therapeutics, University of British Columbia, Vancouver, BC, Canada
| |
Collapse
|