1
Riepenhausen S, Blumenstock M, Niklas C, Hegselmann S, Neuhaus P, Meidt A, Püttmann C, Storck M, Ganzinger M, Varghese J, Dugas M. Europe's Largest Research Infrastructure for Curated Medical Data Models with Semantic Annotations. Methods Inf Med 2024. [PMID: 38740374 DOI: 10.1055/s-0044-1786839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
BACKGROUND Structural metadata from the majority of clinical studies and routine health care systems is currently not yet available to the scientific community. OBJECTIVE To provide an overview of available contents in the Portal of Medical Data Models (MDM Portal). METHODS The MDM Portal is a registered European information infrastructure for research and health care, and its contents are curated and semantically annotated by medical experts. It enables users to search, view, discuss, and download existing medical data models. RESULTS The most frequent keyword is "clinical trial" (n = 18,777), and the most frequent disease-specific keyword is "breast neoplasms" (n = 1,943). Most data items are available in English (n = 545,749) and German (n = 109,267). Manually curated semantic annotations are available for 805,308 elements (554,352 items, 58,101 item groups, and 192,855 code list items), which were derived from 25,257 data models. In total, 1,609,225 Unified Medical Language System (UMLS) codes have been assigned, with 66,373 unique UMLS codes. CONCLUSION To our knowledge, the MDM Portal constitutes Europe's largest collection of medical data models with semantically annotated elements. As such, it can be used to increase compatibility of medical datasets and can be utilized as a large expert-annotated medical text corpus for natural language processing.
Affiliation(s)
- Sarah Riepenhausen
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Max Blumenstock
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Christian Niklas
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Stefan Hegselmann
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Philipp Neuhaus
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Alexandra Meidt
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Cornelia Püttmann
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Michael Storck
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Matthias Ganzinger
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Nordrhein-Westfalen, Germany
- Martin Dugas
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
  - European Research Center for Information Systems (ERCIS), Münster, Nordrhein-Westfalen, Germany
2
Adams MCB, Hurley RW, Siddons A, Topaloglu U, Wandner LD. NIH HEAL Clinical Data Elements (CDE) implementation: NIH HEAL Initiative IMPOWR network IDEA-CC. Pain Medicine (Malden, Mass.) 2023; 24:743-749. [PMID: 36799548 PMCID: PMC10321760 DOI: 10.1093/pm/pnad018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 02/14/2023] [Accepted: 02/15/2023] [Indexed: 02/18/2023]
Abstract
OBJECTIVE The National Institutes of Health (NIH) HEAL Initiative is making data findable, accessible, interoperable, and reusable (FAIR) to maximize the value of the unprecedented federal investment in pain and opioid-use disorder research. This involves standardizing the use of common data elements (CDE) for clinical research. METHODS This work describes the process of the selection, processing, harmonization, and design constraints of CDE across a pain and opioid use disorder clinical trials network (NIH HEAL IMPOWR). RESULTS The network alignment allowed for incorporation of newer data standards across the clinical trials. Specific advances included geographic coding (RUCA), deidentified patient identifiers (GUID), shareable clinical survey libraries (REDCap), and concept mapping to standardized concepts (UMLS). CONCLUSIONS While complex, harmonization across a network of chronic pain and opioid use disorder clinical trials with separate interventions can be optimized through use of CDEs and data standardization processes. This standardization process will support robust secondary data analyses. Scaling this process could standardize CDE results across interventions or disease states, which could help inform insurance companies or government organizations about coverage determinations. The development of the HEAL CDE program supports connecting isolated studies and solutions to each other, but the practical aspects may be challenging for some studies to implement. Leveraging tools and technology to simplify the process and create ready-to-use resources may support wider adoption of consistent data standards.
Affiliation(s)
- Meredith C B Adams
  - Departments of Anesthesiology, Biomedical Informatics, and Public Health Sciences, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, NC 27157, United States
- Robert W Hurley
  - Departments of Anesthesiology, Translational Neuroscience, and Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Andrew Siddons
  - National Institute of Neurological Disorders and Stroke, Bethesda, MD, United States
- Umit Topaloglu
  - Department of Cancer Biology, Wake Forest University School of Medicine, Winston-Salem, NC 27157, United States
- Laura D Wandner
  - National Institute of Neurological Disorders and Stroke, Bethesda, MD, United States
3
Rafee A, Riepenhausen S, Neuhaus P, Meidt A, Dugas M, Varghese J. ELaPro, a LOINC-mapped core dataset for top laboratory procedures of eligibility screening for clinical trials. BMC Med Res Methodol 2022; 22:141. [PMID: 35568796 PMCID: PMC9107639 DOI: 10.1186/s12874-022-01611-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 04/20/2022] [Indexed: 12/21/2022] Open
Abstract
Background Screening for eligible patients continues to pose a great challenge for many clinical trials. This has led to a rapidly growing interest in standardizing computable representations of eligibility criteria (EC) in order to develop tools that leverage data from electronic health record (EHR) systems. Although laboratory procedures (LP) represent a common entity of EC that is readily available and retrievable from EHR systems, there is a lack of interoperable data models for this entity of EC. A public, specialized data model that utilizes international, widely-adopted terminology for LP, e.g. Logical Observation Identifiers Names and Codes (LOINC®), is much needed to support automated screening tools. Objective The aim of this study is to establish a core dataset for LP most frequently requested to recruit patients for clinical trials using LOINC terminology. Employing such a core dataset could enhance the interface between study feasibility platforms and EHR systems and significantly improve automatic patient recruitment. Methods We used a semi-automated approach to analyze 10,516 screening forms from the Medical Data Models (MDM) portal’s data repository that are pre-annotated with Unified Medical Language System (UMLS) codes. An automated semantic analysis based on concept frequency is followed by an extensive manual expert review performed by physicians to analyze complex recruitment-relevant concepts not amenable to an automatic approach. Results Based on analysis of 138,225 EC from 10,516 screening forms, 55 laboratory procedures represented 77.87% of all UMLS laboratory concept occurrences identified in the selected EC forms. We identified 26,413 unique UMLS concepts from 118 UMLS semantic types and covered the vast majority of Medical Subject Headings (MeSH) disease domains. Conclusions Only a small set of common LP covers the majority of laboratory concepts in screening EC forms, which supports the feasibility of establishing a focused core dataset for LP.
We present ELaPro, a novel, LOINC-mapped, core dataset for the most frequent 55 LP requested in screening for clinical trials. ELaPro is available in multiple machine-readable data formats like CSV, ODM and HL7 FHIR. The extensive manual curation of this large number of free-text EC as well as the combining of UMLS and LOINC terminologies distinguishes this specialized dataset from previous relevant datasets in the literature. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01611-y.
Affiliation(s)
- Ahmed Rafee
  - Institute of Medical Informatics, University of Münster, Münster, Germany
  - Department of Internal Medicine (D), University Hospital of Münster, Münster, Germany
- Sarah Riepenhausen
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Philipp Neuhaus
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Alexandra Meidt
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Germany
4
Hegselmann S, Storck M, Gessner S, Neuhaus P, Varghese J, Bruland P, Meidt A, Mertens C, Riepenhausen S, Baier S, Stöcker B, Henke J, Schmidt CO, Dugas M. Pragmatic MDR: a metadata repository with bottom-up standardization of medical metadata through reuse. BMC Med Inform Decis Mak 2021; 21:160. [PMID: 34001121 PMCID: PMC8130274 DOI: 10.1186/s12911-021-01524-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/03/2020] [Accepted: 05/09/2021] [Indexed: 11/27/2022] Open
Abstract
Background The variety of medical documentation often leads to incompatible data elements that impede data integration between institutions. A common approach to standardizing and distributing metadata definitions is the use of ISO/IEC 11179 norm-compliant metadata repositories with top-down standardization. To the best of our knowledge, however, it is not yet common practice to reuse the content of publicly accessible metadata repositories for creation of case report forms or routine documentation. We suggest an alternative concept called pragmatic metadata repository, which enables a community-driven bottom-up approach for agreeing on data collection models. A pragmatic metadata repository collects real-world documentation and considers frequent metadata definitions as high quality with potential for reuse. Methods We implemented a pragmatic metadata repository proof of concept application and filled it with medical forms from the Portal of Medical Data Models. We applied this prototype in two use cases to demonstrate its capabilities for reusing metadata: first, integration into a study editor for the suggestion of data elements and, second, metadata synchronization between two institutions. Moreover, we evaluated the emergence of bottom-up standards in the prototype, and two medical data managers assessed their quality for 24 medical concepts. Results The resulting prototype contained 466,569 unique metadata definitions. Integration into the study editor led to a reuse of 1836 items and item groups. During the metadata synchronization, semantic codes of 4608 data elements were transferred. Our evaluation revealed that for less complex medical concepts weak bottom-up standards could be established. However, more diverse disease-related concepts showed no convergence of data elements due to an enormous heterogeneity of metadata. The survey showed fair agreement (Kalpha = 0.50, 95% CI 0.43–0.56) for good item quality of bottom-up standards.
Conclusions We demonstrated the feasibility of the pragmatic metadata repository concept for medical documentation. Applications of the prototype in two use cases suggest that it facilitates the reuse of data elements. Our evaluation showed that bottom-up standardization based on a large collection of real-world metadata can yield useful results. The proposed concept shall not replace existing top-down approaches, rather it complements them by showing what is commonly used in the community to guide other researchers. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-021-01524-8.
Affiliation(s)
- Stefan Hegselmann
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Michael Storck
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Sophia Gessner
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Philipp Neuhaus
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Philipp Bruland
  - University of Applied Sciences Ostwestfalen-Lippe, Lemgo, Germany
- Alexandra Meidt
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Cornelia Mertens
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Sarah Riepenhausen
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Sonja Baier
  - Centre for Clinical Trials, University of Münster, Münster, Germany
- Benedikt Stöcker
  - Centre for Clinical Trials, University of Münster, Münster, Germany
- Jörg Henke
  - Institute of Community Medicine, University Medicine of Greifswald, Greifswald, Germany
- Martin Dugas
  - Institute of Medical Informatics, University of Münster, Münster, Germany
5
Blitz R, Dugas M. Conceptual Design, Implementation, and Evaluation of Generic and Standard-Compliant Data Transfer into Electronic Health Records. Appl Clin Inform 2020; 11:374-386. [PMID: 32462639 PMCID: PMC7253309 DOI: 10.1055/s-0040-1710023] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/21/2023] Open
Abstract
Objectives
The objective of this study is the conceptual design, implementation and evaluation of a system for generic, standard-compliant data transfer into electronic health records (EHRs). This includes patient data from clinical research and medical care that has been semantically annotated and enhanced with metadata. The implementation is based on the single-source approach. Technical and clinical feasibilities, as well as cost-benefit efficiency, were investigated in everyday clinical practice.
Methods
Münster University Hospital is a tertiary care hospital with 1,457 beds and 10,823 staff who treated 548,110 patients in 2018. Single-source metadata architecture transformation (SMA:T) was implemented as an extension to the EHR system. This architecture uses Model Driven Software Development (MDSD) to generate documentation forms according to the Clinical Data Interchange Standards Consortium (CDISC) operational data model (ODM). Clinical data are stored in ODM format in the EHR system database. Documentation forms are based on Google's Material Design Standard. SMA:T was used at a total of five clinics and one administrative department in the period from March 1, 2018 until March 31, 2019 in everyday clinical practice.
Results
The technical and clinical feasibility of SMA:T was demonstrated in the course of the study. Seventeen documentation forms including 373 data items were created with SMA:T. These were created for 2,484 patients by 283 users in everyday clinical practice. A total of 121 documentation forms were examined retrospectively. The Constructive Cost Model (COCOMO II) was used to calculate cost and time savings. Mean form development time was reduced by 83.4%, from 3,357 to 557 hours. Average costs per form went down from EUR 953 to EUR 158.
Conclusion
Automated generic transfer of standard-compliant data and metadata into EHRs is technically and clinically feasible, cost efficient, and a useful method to establish comprehensive and semantically annotated clinical documentation. Savings of time and personnel resources are possible.
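As a quick consistency check on the figures quoted in the Results above, the following minimal sketch recomputes the relative reduction in development time and in cost per form. The function name `relative_reduction` is an illustrative assumption and is not taken from the paper; only the four input numbers come from the abstract.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Figures quoted in the abstract's Results paragraph.
time_saving = relative_reduction(3357, 557)  # form development time in hours
cost_saving = relative_reduction(953, 158)   # average cost per form in EUR

print(f"time: {time_saving:.1f}%  cost: {cost_saving:.1f}%")
```

Both values round to 83.4%, so the reported time and cost savings are consistent with each other.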
Affiliation(s)
- Rogério Blitz
  - Business Unit IT, University Hospital Münster, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, University of Münster, Münster, Germany
6
von Martial S, Brix TJ, Klotz L, Neuhaus P, Berger K, Warnke C, Meuth SG, Wiendl H, Dugas M. EMR-integrated minimal core dataset for routine health care and multiple research settings: A case study for neuroinflammatory demyelinating diseases. PLoS One 2019; 14:e0223886. [PMID: 31613917 PMCID: PMC6793844 DOI: 10.1371/journal.pone.0223886] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/25/2019] [Accepted: 10/01/2019] [Indexed: 11/18/2022] Open
Abstract
Although routine health care and clinical trials usually require the documentation of similar information, data collection is performed independently from each other, resulting in redundant documentation efforts. Standardizing routine documentation can enable secondary use for medical research. Neuroinflammatory demyelinating diseases (NIDs) represent a heterogeneous group of diseases requiring further research to improve patient management. The aim of this work is to develop, implement and evaluate a minimal core dataset in routine health care with a focus on secondary use as case study for NIDs. Therefore, a draft minimal core dataset for NIDs was created by analyzing routine, clinical trial, registry, biobank documentation and existing data standards for NIDs. Data elements (DEs) were converted into the standard format Operational Data Model, semantically annotated and analyzed via frequency analysis. The analysis produced 1958 DEs based on 864 distinct medical concepts. After review and finalization by an interdisciplinary team of neurologists, epidemiologists and medical computer scientists, the minimal core dataset (NID CDEs) consists of 46 common DEs capturing disease-specific information for reuse in the discharge letter and other research settings. It covers the areas of diagnosis, laboratory results, disease progress, expanded disability status scale, therapy and magnetic resonance imaging findings. NID CDEs was implemented in two German university hospitals and a usability study in clinical routine was conducted (participants n = 16) showing a good usability (Mean SUS = 75). From May 2017 to February 2018, 755 patients were documented with the NID CDEs, which indicates the feasibility of developing a minimal core dataset for structured documentation based on previously used documentation standards and integrating the dataset into clinical routine. 
By sharing, translating and reusing the minimal dataset, a transnational harmonized documentation of patients with NIDs might be realized, supporting interoperability in medical research.
Affiliation(s)
- Sophia von Martial
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Tobias J. Brix
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Luisa Klotz
  - Department of Neurology, University of Münster, Münster, Germany
- Philipp Neuhaus
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Klaus Berger
  - Institute of Epidemiology and Social Medicine, University of Münster, Münster, Germany
- Clemens Warnke
  - Department of Neurology, University of Köln, Köln, Germany
- Sven G. Meuth
  - Department of Neurology, University of Münster, Münster, Germany
- Heinz Wiendl
  - Department of Neurology, University of Münster, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, University of Münster, Münster, Germany
7
Kentgen M, Varghese J, Samol A, Waltenberger J, Dugas M. Common Data Elements for Acute Coronary Syndrome: Analysis Based on the Unified Medical Language System. JMIR Med Inform 2019; 7:e14107. [PMID: 31444871 PMCID: PMC6729118 DOI: 10.2196/14107] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/22/2019] [Revised: 06/21/2019] [Accepted: 07/04/2019] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND Standardization in clinical documentation can increase efficiency and can save time and resources. OBJECTIVE The objectives of this work are to compare documentation forms for acute coronary syndrome (ACS), check for standardization, and generate a list of the most common data elements using semantic form annotation with the Unified Medical Language System (UMLS). METHODS Forms from registries, studies, risk scores, quality assurance, official guidelines, and routine documentation from four hospitals in Germany were semantically annotated using UMLS. This allowed for automatic comparison of concept frequencies and the generation of a list of the most common concepts. RESULTS A total of 3710 form items from 86 sources were semantically annotated using 842 unique UMLS concepts. Half of all medical concept occurrences were covered by 60 unique concepts, which suggests the existence of a core dataset of relevant concepts. Overlap percentages between forms were relatively low, hinting at inconsistent documentation structures and lack of standardization. CONCLUSIONS This analysis shows a lack of standardized and semantically enriched documentation for patients with ACS. Efforts made by official institutions like the European Society of Cardiology have not yet been fully implemented. Utilizing a standardized and annotated core dataset of the most important data concepts could make export and automatic reuse of data easier. The generated list of common data elements is an exemplary implementation suggestion of the concepts to use in a standardized approach.
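The concept-frequency analysis described in this abstract can be illustrated with a minimal sketch: count how often each annotated concept occurs, then find how many of the most frequent concepts are needed to cover a target share of all occurrences. This is an assumption-laden illustration and not the authors' implementation; the function name `concepts_for_coverage` and the placeholder codes `C1` to `C4` are invented for the example and are not real UMLS identifiers or study data.

```python
from collections import Counter
from typing import Iterable


def concepts_for_coverage(codes: Iterable[str], target: float = 0.5) -> int:
    """Smallest number of most-frequent concepts whose occurrences reach `target` share."""
    counts = Counter(codes)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk concepts from most to least frequent, accumulating their occurrences.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return rank
    return len(counts)


# Toy annotations: one placeholder code per annotated form item (hypothetical data).
toy_codes = ["C1"] * 5 + ["C2"] * 3 + ["C3"] * 1 + ["C4"] * 1
print(concepts_for_coverage(toy_codes, target=0.5))
```

On real annotations, a small return value relative to the number of unique concepts would point to a compact core dataset, which is the pattern the abstract reports.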
Affiliation(s)
- Markus Kentgen
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Alexander Samol
  - Medical Faculty, University Hospital of Münster, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, University of Münster, Münster, Germany
8
Holz C, Kessler T, Dugas M, Varghese J. Core Data Elements in Acute Myeloid Leukemia: A Unified Medical Language System-Based Semantic Analysis and Experts' Review. JMIR Med Inform 2019; 7:e13554. [PMID: 31407666 PMCID: PMC6709897 DOI: 10.2196/13554] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/30/2019] [Revised: 05/08/2019] [Accepted: 05/31/2019] [Indexed: 01/27/2023] Open
Abstract
Background For cancer domains such as acute myeloid leukemia (AML), a large set of data elements is obtained from different institutions with heterogeneous data definitions within one patient course. The lack of clinical data harmonization impedes cross-institutional electronic data exchange and future meta-analyses. Objective This study aimed to identify and harmonize a semantic core of common data elements (CDEs) in clinical routine and research documentation, based on a systematic metadata analysis of existing documentation models. Methods Lists of relevant data items were collected and reviewed by hematologists from two university hospitals regarding routine documentation and several case report forms of clinical trials for AML. In addition, existing registries and international recommendations were included. Data items were coded to medical concepts via the Unified Medical Language System (UMLS) by a physician and reviewed by another physician. On the basis of the coded concepts, the data sources were analyzed for concept overlaps and identification of most frequent concepts. The most frequent concepts were then implemented as data elements in the standardized format of the Operational Data Model by the Clinical Data Interchange Standards Consortium. Results A total of 3265 medical concepts were identified, of which 1414 were unique. Among the 1414 unique medical concepts, the 50 most frequent ones cover 26.98% of all concept occurrences within the collected AML documentation. The top 100 concepts represent 39.48% of all concepts’ occurrences. Implementation of CDEs is available on a European research infrastructure and can be downloaded in different formats for reuse in different electronic data capture systems. Conclusions Information management is a complex process for research-intense disease entities such as AML, which is associated with a large set of lab-based diagnostics and different treatment options.
Our systematic UMLS-based analysis revealed the existence of a core data set and an exemplary reusable implementation for harmonized data capture is available on an established metadata repository.
Affiliation(s)
- Christian Holz
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Torsten Kessler
  - Department of Medicine A, University Hospital of Münster, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Germany
9
Representing oncology in datasets: Standard or custom biomedical terminology? Informatics in Medicine Unlocked 2019. [DOI: 10.1016/j.imu.2019.100186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022] Open
10
Varghese J, Sandmann S, Dugas M. Web-Based Information Infrastructure Increases the Interrater Reliability of Medical Coders: Quasi-Experimental Study. J Med Internet Res 2018; 20:e274. [PMID: 30322834 PMCID: PMC6231825 DOI: 10.2196/jmir.9644] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/13/2017] [Revised: 05/03/2018] [Accepted: 06/28/2018] [Indexed: 01/05/2023] Open
Abstract
Background Medical coding is essential for standardized communication and integration of clinical data. The Unified Medical Language System by the National Library of Medicine is the largest clinical terminology system for medical coders and Natural Language Processing tools. However, the abundance of ambiguous codes leads to low rates of uniform coding among different coders. Objective The objective of our study was to measure uniform coding among different medical experts in terms of interrater reliability and analyze the effect on interrater reliability using an expert- and Web-based code suggestion system. Methods We conducted a quasi-experimental study in which 6 medical experts coded 602 medical items from structured quality assurance forms or free-text eligibility criteria of 20 different clinical trials. The medical item content was selected on the basis of mortality-leading diseases according to World Health Organization data. The intervention comprised using a semiautomatic code suggestion tool that is linked to a European information infrastructure providing a large medical text corpus of >300,000 medical form items with expert-assigned semantic codes. Krippendorff alpha (Kalpha) with bootstrap analysis was used for the interrater reliability analysis, and coding times were measured before and after the intervention. Results The intervention improved interrater reliability in structured quality assurance form items (from Kalpha=0.50, 95% CI 0.43-0.57 to Kalpha=0.62 95% CI 0.55-0.69) and free-text eligibility criteria (from Kalpha=0.19, 95% CI 0.14-0.24 to Kalpha=0.43, 95% CI 0.37-0.50) while preserving or slightly reducing the mean coding time per item for all 6 coders. 
Regardless of the intervention, precoordination and structured items were associated with significantly high interrater reliability, but the proportion of items that were precoordinated significantly increased after intervention (eligibility criteria: OR 4.92, 95% CI 2.78-8.72; quality assurance: OR 1.96, 95% CI 1.19-3.25). Conclusions The Web-based code suggestion mechanism improved interrater reliability toward moderate or even substantial intercoder agreement. Precoordination and the use of structured versus free-text data elements are key drivers of higher interrater reliability.
Affiliation(s)
- Julian Varghese
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Sarah Sandmann
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, European Research Center for Information Systems, Münster, Germany
11
Varghese J, Fujarski M, Hegselmann S, Neuhaus P, Dugas M. CDEGenerator: an online platform to learn from existing data models to build model registries. Clin Epidemiol 2018; 10:961-970. [PMID: 30127646 PMCID: PMC6089100 DOI: 10.2147/clep.s170075] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/04/2022] Open
Abstract
OBJECTIVE Best-practice data models harmonize semantics and data structure of medical variables in clinical or epidemiological studies. While there exist several published data sets, it remains challenging to find and reuse published eligibility criteria or other data items that match specific needs of a newly planned study or registry. A novel Internet-based method for rapid comparison of published data models was implemented to enable reuse, customization, and harmonization of item catalogs for the early planning and development phase of research databases. METHODS Based on prior work, a European information infrastructure with a large collection of medical data models was established. A newly developed analysis module called CDEGenerator provides systematic comparison of selected data models and user-tailored creation of minimum data sets or harmonized item catalogs. Usability was assessed by eight external medical documentation experts in a workshop by the umbrella organization for networked medical research in Germany with the System Usability Scale. RESULTS The analysis and item-tailoring module provides multilingual comparisons of semantically complex eligibility criteria of clinical trials. The System Usability Scale yielded "good usability" (mean 75.0, range 65.0-92.5). User-tailored models can be exported to several data formats, such as XLS, REDCap or Operational Data Model by the Clinical Data Interchange Standards Consortium, which is supported by the US Food and Drug Administration and European Medicines Agency for metadata exchange of clinical studies. CONCLUSION The online tool provides user-friendly methods to reuse, compare, and thus learn from data items of standardized or published models to design a blueprint for a harmonized research database.
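For readers unfamiliar with the System Usability Scale mentioned in this abstract, the sketch below shows the standard SUS scoring rule for a single respondent: ten items answered on a 1 to 5 scale, with odd-numbered items contributing (response - 1), even-numbered items contributing (5 - response), and the sum multiplied by 2.5 to give a 0 to 100 score. The function name `sus_score` and the example responses are invented for illustration and are not data from the study.

```python
def sus_score(responses: list[int]) -> float:
    """Score one 10-item SUS questionnaire (each response on a 1-5 scale)."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    # Odd-numbered items (1, 3, ...) are positively worded: contribution is response - 1.
    odd = sum(r - 1 for r in responses[0::2])
    # Even-numbered items (2, 4, ...) are negatively worded: contribution is 5 - response.
    even = sum(5 - r for r in responses[1::2])
    return (odd + even) * 2.5


# Invented responses for one hypothetical participant.
invented = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(invented))
```

A study-level figure such as the reported mean of 75.0 would be the average of these per-participant scores.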
Affiliation(s)
- Michael Fujarski
- Faculty of Mathematics and Computer Sciences, University of Münster
- Martin Dugas
- Institute of Medical Informatics, University of Münster
- Institute of Medical Informatics, European Research Center for Information Systems (ERCIS), Münster, Germany
12
Abstract
OBJECTIVES To summarize significant developments in Clinical Research Informatics (CRI) over the past two years and discuss future directions. METHODS Survey of advances, open problems and opportunities in this field based on exploration of current literature. RESULTS Recent advances are structured according to three use cases of clinical research: protocol feasibility, patient identification/recruitment and clinical trial execution. DISCUSSION CRI is an evolving, dynamic field of research. Global collaboration, open metadata, content standards with semantics and computable eligibility criteria are key success factors for future developments in CRI.
Affiliation(s)
- M Dugas
- Prof. Dr. Martin Dugas, Institute of Medical Informatics, University of Münster, Albert-Schweitzer-Campus 1 - A11, D-48149 Münster, Germany, Tel: +49 251 83 55262, E-mail:
13
Bruland P, Dugas M. S2O - A software tool for integrating research data from general purpose statistic software into electronic data capture systems. BMC Med Inform Decis Mak 2017; 17:3. [PMID: 28061771 PMCID: PMC5219713 DOI: 10.1186/s12911-016-0402-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/22/2016] [Accepted: 11/22/2016] [Indexed: 11/28/2022]
Abstract
Background Data capture for clinical registries or pilot studies is often performed in spreadsheet-based applications like Microsoft Excel or IBM SPSS. Usually, data is afterwards transferred into statistical software, such as SAS, R or IBM SPSS Statistics, for analysis. Spreadsheet-based solutions suffer from several drawbacks: it is generally not possible to ensure sufficient rights and role management, and it is not traced who has changed data, when, and why. Therefore, such systems are not able to comply with regulatory requirements for electronic data capture in clinical trials. In contrast, Electronic Data Capture (EDC) software enables a reliable, secure and auditable collection of data. In this regard, most EDC vendors support the CDISC ODM standard to define, communicate and archive clinical trial meta- and patient data. Advantages of EDC systems are support for multi-user and multicenter clinical trials as well as auditable data. Migration from spreadsheet-based data collection to EDC systems is labor-intensive and time-consuming at present. Hence, the objectives of this research work are to develop a mapping model and implement a converter between the IBM SPSS format and the CDISC ODM standard and to evaluate this approach regarding syntactic and semantic correctness. Results A mapping model between IBM SPSS and CDISC ODM data structures was developed. SPSS variables and patient values can be mapped and converted into ODM. Statistical and display attributes from SPSS do not correspond to any ODM elements; study-related ODM elements are not available in SPSS. The S2O conversion tool was implemented as a command-line tool using the SPSS internal Java plugin. Syntactic and semantic correctness was validated with different ODM tools and by reverse transformation from ODM into SPSS format. Clinical data values were also successfully transformed into the ODM structure.
Conclusion Transformation between the spreadsheet format IBM SPSS and the ODM standard for definition and exchange of trial data is feasible. S2O facilitates migration from Excel- or SPSS-based data collections towards reliable EDC systems. Thereby, advantages of EDC systems like reliable software architecture for secure and traceable data collection and particularly compliance with regulatory requirements are achievable.
Affiliation(s)
- Philipp Bruland
- Institute of Medical Informatics, University of Münster, 48149, Münster, Germany.
- Martin Dugas
- Institute of Medical Informatics, University of Münster, 48149, Münster, Germany
14
Design of case report forms based on a public metadata registry: re-use of data elements to improve compatibility of data. Trials 2016; 17:566. [PMID: 27899162 PMCID: PMC5129226 DOI: 10.1186/s13063-016-1691-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/10/2015] [Accepted: 11/10/2016] [Indexed: 11/17/2022]
Abstract
Background Clinical trials use many case report forms (CRFs) per patient. Because of the astronomical number of potential CRFs, data element re-use at the design stage is attractive to foster compatibility of data from different trials. The objective of this work is to assess the technical feasibility of a CRF editor with connection to a public metadata registry (MDR) to support data element re-use. Results Based on the Medical Data Models portal, an ISO/IEC 11179-compliant MDR was implemented and connected to a web-based CRF editor. Three use cases were implemented: re-use at the form, item group and data element levels. Conclusions CRF design with data element re-use from a public MDR is feasible. A prototypic system is available. The main limitation of the system is the amount of available MDR content.