1
Berman L, Ostchega Y, Giannini J, Anandan LP, Clark E, Spotnitz M, Sulieman L, Volynski M, Ramirez A. Application of a Data Quality Framework to Ductal Carcinoma In Situ Using Electronic Health Record Data From the All of Us Research Program. JCO Clin Cancer Inform 2024;8:e2400052. [PMID: 39178364] [DOI: 10.1200/cci.24.00052]
Abstract
PURPOSE The specific aims of this paper are to (1) develop and operationalize an electronic health record (EHR) data quality framework, (2) apply the dimensions of the framework to the phenotype and treatment pathways of ductal carcinoma in situ (DCIS) using All of Us Research Program data, and (3) propose and apply a checklist to evaluate the application of the framework. METHODS We developed a framework of five data quality dimensions (DQD; completeness, concordance, conformance, plausibility, and temporality). Participants signed a consent and Health Insurance Portability and Accountability Act authorization to share EHR data and responded to demographic questions in the Basics questionnaire. We evaluated the internal characteristics of the data and compared data with external benchmarks with descriptive and inferential statistics. We developed a DQD checklist to evaluate concept selection, internal verification, and external validity for each DQD. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) concept ID codes for DCIS were used to select a cohort of 2,209 females 18 years and older. RESULTS Using the proposed DQD checklist criteria, (1) concepts were selected and internally verified for conformance; (2) concepts were selected and internally verified for completeness; (3) concepts were selected, internally verified, and externally validated for concordance; (4) concepts were selected, internally verified, and externally validated for plausibility; and (5) concepts were selected, internally verified, and externally validated for temporality. CONCLUSION This assessment and evaluation provided insights into data quality for the DCIS phenotype using EHR data from the All of Us Research Program. The review demonstrates that salient clinical measures can be selected, applied, and operationalized within a conceptual framework and evaluated for fitness for use by applying a proposed checklist.
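As a rough illustration of the kind of concept-driven cohort selection this abstract describes, the following Python sketch filters OMOP CDM-style condition records against a concept-ID set and applies the sex and age criteria. The concept ID, record layouts, and reference date are illustrative assumptions, not the study's actual code or values.

```python
# Hypothetical sketch: selecting a cohort from OMOP CDM-style records
# using a set of condition concept IDs. The concept ID used in the
# example and the record layouts are illustrative only.
from datetime import date

def select_cohort(condition_rows, person_rows, concept_ids, as_of=date(2024, 1, 1)):
    """Return person_ids of females aged 18+ with a matching condition record."""
    persons = {p["person_id"]: p for p in person_rows}
    cohort = set()
    for row in condition_rows:
        if row["condition_concept_id"] not in concept_ids:
            continue
        person = persons.get(row["person_id"])
        if person is None or person["gender"] != "F":
            continue
        if as_of.year - person["year_of_birth"] >= 18:
            cohort.add(person["person_id"])
    return cohort
```

In a real OMOP deployment this filter would be a SQL query over `condition_occurrence` and `person`; the in-memory version above only mirrors the selection logic.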
Affiliation(s)
- Lew Berman
- National Institutes of Health, All of Us Research Program, Bethesda, MD
- Yechiam Ostchega
- National Institutes of Health, All of Us Research Program, Bethesda, MD
- John Giannini
- National Institutes of Health, All of Us Research Program, Bethesda, MD
- Matthew Spotnitz
- National Institutes of Health, All of Us Research Program, Bethesda, MD
- Andrea Ramirez
- National Institutes of Health, All of Us Research Program, Bethesda, MD
2
Peng Y, Bathelt F, Gebler R, Gött R, Heidenreich A, Henke E, Kadioglu D, Lorenz S, Vengadeswaran A, Sedlmayr M. Use of Metadata-Driven Approaches for Data Harmonization in the Medical Domain: Scoping Review. JMIR Med Inform 2024;12:e52967. [PMID: 38354027] [PMCID: PMC10902772] [DOI: 10.2196/52967]
Abstract
BACKGROUND Multisite clinical studies increasingly use real-world data to generate real-world evidence. However, because source data are heterogeneous, it is difficult to analyze such data in a unified way across clinics. Implementing Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT) processes to harmonize local health data is therefore necessary to guarantee data quality for research. However, developing such processes is time-consuming and unsustainable. A promising way to ease this is the generalization of ETL/ELT processes. OBJECTIVE In this work, we investigate existing possibilities for developing generic ETL/ELT processes. In particular, we focus on approaches with low development complexity that use descriptive and structural metadata. METHODS We conducted a literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched 4 publication databases (PubMed, IEEE Xplore, Web of Science, and BioMed Central) for relevant publications from 2012 to 2022. The PRISMA flow was visualized using an R-based tool (Evidence Synthesis Hackathon). All relevant content of the publications was extracted into a spreadsheet for further analysis and visualization. RESULTS Following the PRISMA guidelines, we included 33 publications in this literature review. The included publications were categorized into 7 focus groups (medicine, data warehouse, big data, industry, geoinformatics, archaeology, and military). Based on the extracted data, ontology-based and rule-based approaches were the 2 most used approaches across thematic categories. Different approaches and tools were chosen to achieve different purposes within the use cases. CONCLUSIONS Our literature review shows that metadata-driven (MDD) approaches to developing an ETL/ELT process can serve different purposes across thematic categories. The results suggest that it is promising to implement an ETL/ELT process by applying an MDD approach to automate the data transformation from Fast Healthcare Interoperability Resources to the Observational Medical Outcomes Partnership Common Data Model. However, determining an appropriate MDD approach and tool to implement such a process remains a challenge, because comprehensive insight into the characteristics of the MDD approaches presented in this study is still lacking. Our next step is therefore to evaluate these MDD approaches, determine the most appropriate ones, and establish how to integrate them into the ETL/ELT process. This could verify the ability of MDD approaches to generalize the ETL process for harmonizing medical data.
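The core idea of a metadata-driven transformation is that the field mapping lives in declarative metadata rather than in ETL code. A minimal Python sketch of that idea, with invented FHIR-like and OMOP-like field names (not taken from any of the reviewed tools), might look like this:

```python
# Illustrative metadata-driven mapping: the correspondence between
# source paths and target columns is data, so extending the ETL means
# editing the mapping, not the code. All names here are assumptions.

MAPPING = {
    # target OMOP-style column : path into a FHIR-style source dict
    "measurement_source_value": ("code", "text"),
    "value_as_number": ("valueQuantity", "value"),
    "unit_source_value": ("valueQuantity", "unit"),
}

def get_path(record, path):
    """Walk a nested dict along path; return None if any key is missing."""
    cur = record
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur

def transform(record, mapping=MAPPING):
    """Produce a flat target row driven entirely by the mapping metadata."""
    return {target: get_path(record, path) for target, path in mapping.items()}
```

Real implementations surveyed in the review use ontologies or rule engines for this step; the dictionary above stands in for that metadata layer.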
Affiliation(s)
- Yuan Peng
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Richard Gebler
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Robert Gött
- Core Unit Datenintegrationszentrum, University Medicine Greifswald, Greifswald, Germany
- Andreas Heidenreich
- Department for Information and Communication Technology (DICT), Data Integration Center (DIC), Goethe University Frankfurt, University Hospital, Frankfurt am Main, Germany
- Elisa Henke
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Dennis Kadioglu
- Department for Information and Communication Technology (DICT), Data Integration Center (DIC), Goethe University Frankfurt, University Hospital, Frankfurt am Main, Germany
- Institute for Medical Informatics, Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt am Main, Germany
- Stephan Lorenz
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
- Abishaa Vengadeswaran
- Institute for Medical Informatics, Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt am Main, Germany
- Martin Sedlmayr
- Institute for Medical Informatics and Biometry, Carl Gustav Carus Faculty of Medicine, Technische Universität Dresden, Dresden, Germany
3
Zhang S, Benis N, Cornet R. Automated approach for quality assessment of RDF resources. BMC Med Inform Decis Mak 2023;23:90. [PMID: 37165363] [PMCID: PMC10170671] [DOI: 10.1186/s12911-023-02182-8]
Abstract
INTRODUCTION The Semantic Web community provides a common Resource Description Framework (RDF) that allows resources to be represented such that they can be linked. To maximize the potential of linked data - machine-actionable interlinked resources on the Web - a certain level of quality of RDF resources should be established, particularly in the biomedical domain, in which concepts are complex and high-quality biomedical ontologies are in high demand. However, it is unclear which quality metrics for RDF resources can be automated, which is required given the multitude of RDF resources. Therefore, we aim to determine these metrics and demonstrate an automated approach to assessing them. METHODS An initial set of metrics was identified through literature, standards, and existing tooling. From these, metrics were selected that fulfil three criteria: (1) objective, (2) automatable, and (3) foundational. Selected metrics were represented in RDF, semantically aligned to existing standards, and implemented in an open-source tool. To demonstrate the tool, eight commonly used RDF resources were assessed, including data models in the healthcare domain (HL7 RIM, HL7 FHIR, CDISC CDASH), ontologies (DCT, SIO, FOAF, ORDO), and a metadata profile (GRDDL). RESULTS Six objective metrics were identified in 3 categories: Resolvability (1), Parsability (1), and Consistency (4), and represented in RDF. The tool demonstrates that these metrics can be automated; application in the healthcare domain revealed non-resolvable URIs (ranging from 0.3% to 97%) among all eight resources and undefined URIs in HL7 RIM and FHIR. In the tested resources, no errors were found for parsability or for the other three consistency metrics covering correct usage of classes and properties. CONCLUSION We extracted six objective and automatable metrics from the literature as foundational quality requirements of RDF resources to maximize the potential of linked data. Automated tooling has proven effective at identifying quality issues that must be avoided. This approach can be expanded to incorporate more automatable metrics, reflecting additional quality dimensions in the assessment tool.
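Two of the metric categories named above, parsability and resolvability, can be sketched compactly. The sketch below is not the paper's tool: it checks a deliberately simplified N-Triples pattern, and it takes pre-fetched HTTP statuses as input instead of issuing requests (a real resolvability check would dereference each URI over HTTP).

```python
# Toy versions of two RDF quality metrics. The N-Triples pattern is
# simplified (no literals with language tags or datatypes, no blank
# nodes), and resolvability consumes pre-fetched HTTP statuses.
import re

NTRIPLE = re.compile(r'^<[^>]+> <[^>]+> (?:<[^>]+>|"[^"]*") \.$')

def parsability(lines):
    """Fraction of lines matching the simplified N-Triples pattern."""
    ok = sum(1 for line in lines if NTRIPLE.match(line.strip()))
    return ok / len(lines)

def resolvability(uri_status):
    """Fraction of URIs whose (pre-fetched) HTTP status is below 400."""
    ok = sum(1 for status in uri_status.values() if status < 400)
    return ok / len(uri_status)
```

A production implementation would use a full RDF parser (e.g., an RDF library's N-Triples reader) rather than a regular expression, which is why this is only a shape of the metric, not a conformant check.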
Affiliation(s)
- Shuxin Zhang
- Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
- Amsterdam Public Health, Methodology & Digital Health, Amsterdam, The Netherlands
- Nirupama Benis
- Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
- Amsterdam Public Health, Methodology & Digital Health, Amsterdam, The Netherlands
- Ronald Cornet
- Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
- Amsterdam Public Health, Methodology & Digital Health, Amsterdam, The Netherlands
4
Touré V, Krauss P, Gnodtke K, Buchhorn J, Unni D, Horki P, Raisaro JL, Kalt K, Teixeira D, Crameri K, Österle S. FAIRification of health-related data using semantic web technologies in the Swiss Personalized Health Network. Sci Data 2023;10:127. [PMID: 36899064] [PMCID: PMC10006404] [DOI: 10.1038/s41597-023-02028-y]
Abstract
The Swiss Personalized Health Network (SPHN) is a government-funded initiative developing federated infrastructures for responsible and efficient secondary use of health data for research purposes, in compliance with the FAIR principles (Findable, Accessible, Interoperable and Reusable). We built a common standard infrastructure with a fit-for-purpose strategy to bring together health-related data, easing the work of data providers, who can supply data in a standard manner, and of researchers, who benefit from the enhanced quality of the collected data. As a result, the SPHN Resource Description Framework (RDF) schema was implemented together with a data ecosystem that encompasses data integration, validation tools, analysis helpers, training, and documentation for representing health metadata and data consistently and reaching nationwide data interoperability goals. Data providers can now efficiently deliver several types of health data in a standardized and interoperable way, while a high degree of flexibility is retained for the varied demands of individual research projects. Researchers in Switzerland have access to FAIR health data for further use in RDF triplestores.
Affiliation(s)
- Vasundra Touré
- Personalized Health Informatics Group, SIB Swiss Institute of Bioinformatics, 4051, Basel, Switzerland
- Philip Krauss
- Trivadis - Part of Accenture, 4051, Basel, Switzerland
- Kristin Gnodtke
- Personalized Health Informatics Group, SIB Swiss Institute of Bioinformatics, 4051, Basel, Switzerland
- Deepak Unni
- Personalized Health Informatics Group, SIB Swiss Institute of Bioinformatics, 4051, Basel, Switzerland
- Petar Horki
- Personalized Health Informatics Group, SIB Swiss Institute of Bioinformatics, 4051, Basel, Switzerland
- Jean Louis Raisaro
- Health Informatics and Data Privacy Group, Biomedical Data Science Center, Lausanne University Hospital, 1010, Lausanne, Switzerland
- Katie Kalt
- Clinical Data Platform Research, Directorate of Research and Education, Zurich University Hospital, 8091, Zurich, Switzerland
- Daniel Teixeira
- DSI - Data Group, Geneva University Hospital, 1205, Geneva, Switzerland
- Katrin Crameri
- Personalized Health Informatics Group, SIB Swiss Institute of Bioinformatics, 4051, Basel, Switzerland
- Sabine Österle
- Personalized Health Informatics Group, SIB Swiss Institute of Bioinformatics, 4051, Basel, Switzerland
5
Frid S, Pastor Duran X, Bracons Cucó G, Pedrera-Jiménez M, Serrano-Balazote P, Muñoz Carrero A, Lozano-Rubí R. An Ontology-Based Approach for Consolidating Patient Data Standardized With European Norm/International Organization for Standardization 13606 (EN/ISO 13606) Into Joint Observational Medical Outcomes Partnership (OMOP) Repositories: Description of a Methodology. JMIR Med Inform 2023;11:e44547. [PMID: 36884279] [PMCID: PMC10034609] [DOI: 10.2196/44547]
Abstract
BACKGROUND To discover new knowledge from data, the data must be correct and in a consistent format. OntoCR, a clinical repository developed at Hospital Clínic de Barcelona, uses ontologies to represent clinical knowledge and map locally defined variables to health information standards and common data models. OBJECTIVE The aim of the study is to design and implement a scalable methodology based on the dual-model paradigm and the use of ontologies to consolidate clinical data from different organizations in a standardized repository for research purposes without loss of meaning. METHODS First, the relevant clinical variables are defined, and the corresponding European Norm/International Organization for Standardization (EN/ISO) 13606 archetypes are created. Data sources are then identified, and an extract, transform, and load process is carried out. Once the final data set is obtained, the data are transformed to create EN/ISO 13606-normalized electronic health record (EHR) extracts. Afterward, ontologies that represent archetyped concepts and map them to EN/ISO 13606 and Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) standards are created and uploaded to OntoCR. Data stored in the extracts are inserted into their corresponding place in the ontology, thus obtaining instantiated patient data in the ontology-based repository. Finally, data can be extracted via SPARQL queries as OMOP CDM-compliant tables. RESULTS Using this methodology, EN/ISO 13606-standardized archetypes that allow for the reuse of clinical information were created, and the knowledge representation of our clinical repository was extended by modeling and mapping ontologies. Furthermore, EN/ISO 13606-compliant EHR extracts were created for patients (6,803), episodes (13,938), diagnoses (190,878), administered medication (222,225), cumulative drug dose (222,225), prescribed medication (351,247), movements between units (47,817), clinical observations (6,736,745), laboratory observations (3,392,873), limitation of life-sustaining treatment (1,298), and procedures (19,861). Because the application that inserts data from extracts into the ontologies is not yet finished, the queries were tested and the methodology validated by importing data from a random subset of patients into the ontologies using a locally developed Protégé plugin ("OntoLoad"). In total, 10 OMOP CDM-compliant tables ("Condition_occurrence," 864 records; "Death," 110; "Device_exposure," 56; "Drug_exposure," 5609; "Measurement," 2091; "Observation," 195; "Observation_period," 897; "Person," 922; "Visit_detail," 772; and "Visit_occurrence," 971) were successfully created and populated. CONCLUSIONS This study proposes a methodology for standardizing clinical data, thus allowing its reuse without any changes in the meaning of the modeled concepts. Although this paper focuses on health research, our methodology suggests that the data be initially standardized per EN/ISO 13606 to obtain EHR extracts with a high level of granularity that can be used for any purpose. Ontologies constitute a valuable approach for knowledge representation and standardization of health information in a standard-agnostic manner. With the proposed methodology, institutions can go from local raw data to standardized, semantically interoperable EN/ISO 13606 and OMOP repositories.
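The final step the abstract describes, emitting OMOP CDM-compliant tables from standardized records, can be sketched as a routing of typed records into per-table row lists. The record shapes and the type-to-table routing below are illustrative assumptions; in the paper this step is performed by SPARQL queries over the ontology, not by Python.

```python
# Hedged sketch: grouping standardized clinical records into
# OMOP CDM-style tables. The routing map and row shapes are invented
# for illustration and do not reproduce the paper's SPARQL transformers.

OMOP_TABLE_FOR = {
    "diagnosis": "condition_occurrence",
    "lab": "measurement",
    "drug": "drug_exposure",
}

def to_omop_tables(records):
    """Route records into {table_name: [rows]} by their record type."""
    tables = {}
    for rec in records:
        table = OMOP_TABLE_FOR.get(rec["type"])
        if table is None:
            continue  # record types without a mapping are skipped
        tables.setdefault(table, []).append(
            {"person_id": rec["person_id"], "source_value": rec["value"]}
        )
    return tables
```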
Affiliation(s)
- Santiago Frid
- Medical Informatics Unit, Hospital Clínic de Barcelona, Barcelona, Spain
- Clinical Foundations Department, Universitat de Barcelona, Barcelona, Spain
- Xavier Pastor Duran
- Medical Informatics Unit, Hospital Clínic de Barcelona, Barcelona, Spain
- Clinical Foundations Department, Universitat de Barcelona, Barcelona, Spain
- Adolfo Muñoz Carrero
- Unit of Investigation in Telemedicine and Digital Health, Instituto de Salud Carlos III, Madrid, Spain
- Raimundo Lozano-Rubí
- Medical Informatics Unit, Hospital Clínic de Barcelona, Barcelona, Spain
- Clinical Foundations Department, Universitat de Barcelona, Barcelona, Spain
6
Khnaisser C, Lavoie L, Fraikin B, Barton A, Dussault S, Burgun A, Ethier JF. Using an Ontology to Derive a Sharable and Interoperable Relational Data Model for Heterogeneous Healthcare Data and Various Applications. Methods Inf Med 2022;61:e73-e88. [PMID: 35709746] [PMCID: PMC9788910] [DOI: 10.1055/a-1877-9498]
Abstract
BACKGROUND A large volume of heavily fragmented data is generated daily in different healthcare contexts and is stored using various structures with different semantics. This fragmentation and heterogeneity make secondary use of data a challenge. Data integration approaches that derive a common data model from sources or requirements have some advantages. However, these approaches are often built for a specific application where the research questions are known, so the semantic and structural reconciliation is often neither reusable nor reproducible. A recent integration approach using knowledge models has been developed with ontologies that provide a strong semantic foundation. Nonetheless, deriving a data model that captures the richness of the ontology to store data with their full semantics remains a challenging task. OBJECTIVES This article addresses the following question: How can a sharable and interoperable data model be designed for storing heterogeneous healthcare data and their semantics to support various applications? METHOD This article describes a method using an ontological knowledge model to automatically generate a data model for a domain of interest. The model can then be implemented in a relational database, which efficiently enables the collection, storage, and retrieval of data while keeping semantic ontological annotations, so that the same data can be extracted for various applications for further processing. RESULTS This article (1) presents a comparison of existing methods for generating a relational data model from an ontology using 23 criteria, (2) describes standard conversion rules, and (3) presents OntoRela, a prototype developed to demonstrate the conversion rules. CONCLUSION This work is a first step toward automating and refining the generation of sharable and interoperable relational data models using ontologies, with a freely available tool. The remaining challenges in covering the full richness of the ontology in the relational model are pointed out.
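A common conversion rule in this family of methods, though not necessarily OntoRela's exact rule set, maps each ontology class to a table and each datatype property to a column. A minimal sketch, with invented naming conventions:

```python
# Illustrative ontology-to-relational conversion rule: one class becomes
# one table, each datatype property a column. The surrogate-key and
# TEXT-typing conventions are assumptions, not OntoRela's actual rules.

def class_to_ddl(cls_name, datatype_props):
    """Emit a CREATE TABLE statement for one ontology class."""
    cols = [f"  {p} TEXT" for p in datatype_props]
    cols.insert(0, f"  {cls_name.lower()}_uid INTEGER PRIMARY KEY")
    return f"CREATE TABLE {cls_name} (\n" + ",\n".join(cols) + "\n);"
```

A full converter would also handle object properties (as foreign keys or junction tables), class hierarchies, and cardinality constraints, which is where the 23 comparison criteria in the article come into play.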
Affiliation(s)
- Christina Khnaisser
- GRIIS, Université de Sherbrooke, Sherbrooke, Canada
- Address for correspondence: Christina Khnaisser, PhD, GRIIS, Université de Sherbrooke, Sherbrooke J1K 2R1, Canada
- Luc Lavoie
- GRIIS, Université de Sherbrooke, Sherbrooke, Canada
- Anita Burgun
- INSERM UMRS 1138 Team 22, Université de Paris, Paris, France
7
Pedrera-Jiménez M, García-Barrio N, Rubio-Mayo P, Tato-Gómez A, Cruz-Bermúdez JL, Bernal-Sobrino JL, Muñoz-Carrero A, Serrano-Balazote P. TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse. Methods Inf Med 2022;61:e89-e102. [PMID: 36220109] [PMCID: PMC9788916] [DOI: 10.1055/s-0042-1757763]
Abstract
BACKGROUND During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes are often black boxes: clinical researchers are unaware of how the data were recorded, extracted, and transformed. To solve this, it is essential that extract, transform, and load (ETL) processes are based on transparent, homogeneous, and formal methodologies, making them understandable, reproducible, and auditable. OBJECTIVES This study aims to design and implement a methodology, in accordance with the FAIR principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization. METHODS The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, through the paradigm of Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file in XML. RESULTS First, four international projects were analyzed to identify the 17 operations necessary to obtain datasets from the EHR according to the specifications of these projects. Each data operation was then formalized using the ISO 13606 reference model, specifying the valid data types as arguments, inputs and outputs, and their cardinality. Next, an agnostic catalog of data operations was developed in the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file. CONCLUSIONS This study provides a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any previous EHR reuse methodology can incorporate these results.
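Stage 4 of the methodology, instantiating an ETL run from a formal XML configuration, can be sketched with the standard library's XML parser. The element and attribute names in this configuration are invented for illustration; the paper does not publish its schema in the abstract.

```python
# Sketch of driving an ETL run from a formal XML configuration file.
# The <etl>/<operation> vocabulary below is hypothetical.
import xml.etree.ElementTree as ET

def load_operations(xml_text):
    """Return the ordered list of (operation, target) pairs from the config."""
    root = ET.fromstring(xml_text)
    return [(op.get("name"), op.get("target")) for op in root.iter("operation")]
```

An ETL engine would dispatch each pair to the corresponding SQL or R implementation from the operation catalog; keeping the sequence in the configuration file is what makes the process auditable and reproducible.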
Affiliation(s)
- Miguel Pedrera-Jiménez
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain
- Address for correspondence: Miguel Pedrera-Jiménez, Eng, MSc, Health Informatics Department, Hospital Universitario 12 de Octubre, Av. de Córdoba, s/n, 28041 Madrid, Spain
- Noelia García-Barrio
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Paula Rubio-Mayo
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Alberto Tato-Gómez
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Juan Luis Cruz-Bermúdez
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- José Luis Bernal-Sobrino
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
- Pablo Serrano-Balazote
- Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
8
Shang Y, Tian Y, Zhou M, Zhou T, Lyu K, Wang Z, Xin R, Liang T, Zhu S, Li J. EHR-Oriented Knowledge Graph System: Toward Efficient Utilization of Non-Used Information Buried in Routine Clinical Practice. IEEE J Biomed Health Inform 2021;25:2463-2475. [PMID: 34057901] [DOI: 10.1109/jbhi.2021.3085003]
Abstract
Non-used clinical information has negative implications for healthcare quality. Clinicians pay priority attention to clinical information relevant to their specialties during routine clinical practice but may be insensitive or less attentive to information showing disease risks beyond their specialties, resulting in delayed or missed diagnoses or improper management. In this study, we introduce an electronic health record (EHR)-oriented knowledge graph system to efficiently utilize non-used information buried in EHRs. EHR data are transformed into a semantic, patient-centered information model under the ontology structure of a knowledge graph. The knowledge graph then creates an EHR data trajectory and performs reasoning through semantic rules to identify important clinical findings within the EHR data. A graphical reasoning pathway illustrates the reasoning process and explains the clinical significance, helping clinicians better understand the neglected information. An application study evaluated reminders for unconsidered chronic kidney disease (CKD) that help non-nephrology clinicians identify important neglected information. The study covered 71,679 patients in non-nephrology departments. The system identified 2,774 patients meeting CKD diagnosis criteria and 10,377 patients requiring high attention. A follow-up study of 5,439 patients showed that 82.1% of patients who met the diagnosis criteria and 61.4% of patients requiring high attention were confirmed CKD positive during follow-up. The application demonstrated that the proposed approach is feasible and effective for clinical information utilization. Additionally, it is valuable as an explainable artificial intelligence approach that provides interpretable recommendations for specialist physicians, helping them understand the importance of non-used data and make comprehensive decisions.
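To make the semantic-rule idea concrete, here is a toy rule in the spirit of a CKD reminder: flag a patient whose eGFR stays below 60 mL/min/1.73 m² across readings at least 90 days apart. This is a simplified reading of the widely used KDIGO chronicity criterion, not the paper's actual rule set, and the reading format is an assumption.

```python
# Toy semantic rule: sustained low eGFR suggests meeting CKD criteria.
# Simplified stand-in for the system's knowledge-graph reasoning rules.
from datetime import date

def meets_ckd_rule(egfr_readings, threshold=60, min_days=90):
    """egfr_readings: list of (date, value); True if low values span >= min_days."""
    low = sorted(d for d, v in egfr_readings if v < threshold)
    return bool(low) and (low[-1] - low[0]).days >= min_days
```

In the actual system, such rules run over a patient-centered knowledge graph and each firing is accompanied by a graphical reasoning pathway; the function above captures only the triggering condition.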
9
Gaudet-Blavignac C, Raisaro JL, Touré V, Österle S, Crameri K, Lovis C. A National, Semantic-Driven, Three-Pillar Strategy to Enable Health Data Secondary Usage Interoperability for Research Within the Swiss Personalized Health Network: Methodological Study. JMIR Med Inform 2021;9:e27591. [PMID: 34185008] [PMCID: PMC8277320] [DOI: 10.2196/27591]
Abstract
BACKGROUND Interoperability is a well-known challenge in medical informatics. Current trends in interoperability have moved from a technocentric, data-model-driven approach toward sustainable semantics, formal descriptive languages, and processes. Despite decades of initiatives and investment, the interoperability challenge remains crucial. The need to share data for most purposes, ranging from patient care to secondary uses such as public health, research, and quality assessment, faces unmet problems. OBJECTIVE This work was performed in the context of a large Swiss federal initiative, the Swiss Personalized Health Network (SPHN), aiming at building a national infrastructure for reusing consented data acquired in the health care and research system to enable research in the field of personalized medicine in Switzerland. This initiative provides funding to foster the use and exchange of health-related data for research. As part of the initiative, a national strategy to enable a semantically interoperable clinical data landscape was developed and implemented. METHODS A deep analysis of various approaches to interoperability was performed at the start, including large frameworks in health care, such as Health Level Seven (HL7) and Integrating the Healthcare Enterprise (IHE), and in several domains, such as regulatory agencies (eg, Clinical Data Interchange Standards Consortium [CDISC]) and research communities (eg, Observational Medical Outcomes Partnership [OMOP]), to identify bottlenecks and assess sustainability. Based on this research, a strategy composed of three pillars was designed: strong multidimensional semantics, a descriptive formal language for exchanges, and as many data models as needed to comply with the needs of various communities. RESULTS This strategy has been implemented stepwise in Switzerland since mid-2019 and has been adopted by all university hospitals and major research organizations. The initiative is coordinated by a central organization, the SPHN Data Coordination Center of the SIB Swiss Institute of Bioinformatics. The semantics are mapped by domain experts to various existing standards, such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), Logical Observation Identifiers Names and Codes (LOINC), and the International Classification of Diseases (ICD). The Resource Description Framework (RDF) is used for storing and transporting data and for integrating information from different sources and standards. Data transformers based on the SPARQL query language are implemented to convert RDF representations into the numerous data models required by the research community or to bridge with other systems, such as electronic case report forms. CONCLUSIONS The SPHN strategy successfully implemented existing standards in a pragmatic and applicable way. It did not try to build any new standards but used existing ones in a nondogmatic way. It has now been funded for another 4 years, bringing the Swiss landscape into a new dimension to support research in the field of personalized medicine and large interoperable clinical data.
Affiliation(s)
- Christophe Gaudet-Blavignac
- Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Jean Louis Raisaro
- Data Science Group, Division of Information Systems, Lausanne University Hospital, Lausanne, Switzerland
- Precision Medicine Unit, Department of Laboratories, Lausanne University Hospital, Lausanne, Switzerland
- Vasundra Touré
- Personalized Health Informatics Group, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Sabine Österle
- Personalized Health Informatics Group, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Katrin Crameri
- Personalized Health Informatics Group, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Christian Lovis
- Division of Medical Information Sciences, Geneva University Hospitals, Geneva, Switzerland
- Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
10
Hammad R, Barhoush M, Abed-alguni BH. A Semantic-Based Approach for Managing Healthcare Big Data: A Survey. J Healthc Eng 2020;2020:8865808. [PMID: 33489061] [PMCID: PMC7787845] [DOI: 10.1155/2020/8865808]
Abstract
Healthcare information systems can reduce the cost of treatment, anticipate epidemic outbreaks, help avoid preventable illnesses, and improve quality of life. In recent years, considerable volumes of heterogeneous and diverse healthcare data have been produced from various sources, covering hospital records of patients, laboratory results, and wearable devices, making it hard for conventional data processing to handle and manage this amount of data. Confronted with the difficulties and challenges of managing healthcare big data, such as volume, velocity, and variety, healthcare information systems need new methods and techniques for managing and processing such data to extract useful information and knowledge. In the past few years, many organizations and companies have shown enthusiasm for using semantic web technologies with healthcare big data to convert data into knowledge and intelligence. In this paper, we review the state of the art on the semantic web for the healthcare industry. Based on our literature review, we discuss how different techniques, standards, and points of view created by the semantic web community can help address the challenges related to healthcare big data.