Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Varghese J, Dugas M. Frequency analysis of medical concepts in clinical trials and their coverage in MeSH and SNOMED-CT. Methods Inf Med 2014;54:83-92. [PMID: 25346408 DOI: 10.3414/me14-01-0046] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 04/22/2014] [Accepted: 10/05/2014] [Indexed: 11/09/2022]

Cited by Other Article(s)

Riepenhausen S, Blumenstock M, Niklas C, Hegselmann S, Neuhaus P, Meidt A, Püttmann C, Storck M, Ganzinger M, Varghese J, Dugas M. Europe's Largest Research Infrastructure for Curated Medical Data Models with Semantic Annotations. Methods Inf Med 2024. [PMID: 38740374 DOI: 10.1055/s-0044-1786839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]

Adams MCB, Hurley RW, Siddons A, Topaloglu U, Wandner LD. NIH HEAL Clinical Data Elements (CDE) implementation: NIH HEAL Initiative IMPOWR network IDEA-CC. Pain Medicine (Malden, Mass.) 2023;24:743-749. [PMID: 36799548 PMCID: PMC10321760 DOI: 10.1093/pm/pnad018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Revised: 02/14/2023] [Accepted: 02/15/2023] [Indexed: 02/18/2023]

Rafee A, Riepenhausen S, Neuhaus P, Meidt A, Dugas M, Varghese J. ELaPro, a LOINC-mapped core dataset for top laboratory procedures of eligibility screening for clinical trials. BMC Med Res Methodol 2022;22:141. [PMID: 35568796 PMCID: PMC9107639 DOI: 10.1186/s12874-022-01611-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 04/20/2022] [Indexed: 12/21/2022]

Abstract

Background

Screening for eligible patients continues to pose a great challenge for many clinical trials. This has led to a rapidly growing interest in standardizing computable representations of eligibility criteria (EC) in order to develop tools that leverage data from electronic health record (EHR) systems. Although laboratory procedures (LP) represent a common entity of EC that is readily available and retrievable from EHR systems, there is a lack of interoperable data models for this entity of EC. A public, specialized data model that utilizes international, widely-adopted terminology for LP, e.g. Logical Observation Identifiers Names and Codes (LOINC®), is much needed to support automated screening tools.

Objective

The aim of this study is to establish a core dataset for LP most frequently requested to recruit patients for clinical trials using LOINC terminology. Employing such a core dataset could enhance the interface between study feasibility platforms and EHR systems and significantly improve automatic patient recruitment.

Methods

We used a semi-automated approach to analyze 10,516 screening forms from the Medical Data Models (MDM) portal’s data repository that are pre-annotated with Unified Medical Language System (UMLS) concepts. An automated semantic analysis based on concept frequency was followed by an extensive manual expert review, performed by physicians, to analyze complex recruitment-relevant concepts not amenable to an automatic approach.

Results

Based on analysis of 138,225 EC from 10,516 screening forms, 55 laboratory procedures represented 77.87% of all UMLS laboratory concept occurrences identified in the selected EC forms. We identified 26,413 unique UMLS concepts from 118 UMLS semantic types and covered the vast majority of Medical Subject Headings (MeSH) disease domains.
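The coverage figure above (a small number of concepts accounting for a large share of all occurrences) is the kind of result a concept frequency analysis produces. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `top_k_coverage` and the toy concept list are hypothetical and are not taken from the study.

```python
from collections import Counter


def top_k_coverage(concept_occurrences, k):
    """Return the share of all occurrences covered by the k most frequent concepts."""
    counts = Counter(concept_occurrences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(k))
    return covered / total


# Hypothetical toy data: each entry stands for one annotated concept occurrence.
occurrences = (
    ["hemoglobin"] * 5 + ["creatinine"] * 3 + ["platelets"] + ["bilirubin"]
)
coverage_top2 = top_k_coverage(occurrences, 2)
print(f"Top 2 concepts cover {coverage_top2:.0%} of occurrences")
```

In a real analysis the occurrence list would come from the semantic annotations of the forms, and k would be chosen by inspecting how quickly the cumulative coverage flattens.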

Conclusions

Only a small set of common LP covers the majority of laboratory concepts in screening EC forms, which supports the feasibility of establishing a focused core dataset for LP. We present ELaPro, a novel, LOINC-mapped core dataset for the 55 most frequent LP requested in screening for clinical trials. ELaPro is available in multiple machine-readable data formats such as CSV, ODM and HL7 FHIR. The extensive manual curation of this large number of free-text EC, as well as the combination of UMLS and LOINC terminologies, distinguishes this specialized dataset from previous relevant datasets in the literature.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-022-01611-y.

Hegselmann S, Storck M, Gessner S, Neuhaus P, Varghese J, Bruland P, Meidt A, Mertens C, Riepenhausen S, Baier S, Stöcker B, Henke J, Schmidt CO, Dugas M. Pragmatic MDR: a metadata repository with bottom-up standardization of medical metadata through reuse. BMC Med Inform Decis Mak 2021;21:160. [PMID: 34001121 PMCID: PMC8130274 DOI: 10.1186/s12911-021-01524-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/03/2020] [Accepted: 05/09/2021] [Indexed: 11/27/2022]

Abstract

Background

The variety of medical documentation often leads to incompatible data elements that impede data integration between institutions. A common approach to standardizing and distributing metadata definitions is the use of ISO/IEC 11179 norm-compliant metadata repositories with top-down standardization. To the best of our knowledge, however, it is not yet common practice to reuse the content of publicly accessible metadata repositories for creation of case report forms or routine documentation. We suggest an alternative concept called pragmatic metadata repository, which enables a community-driven bottom-up approach for agreeing on data collection models. A pragmatic metadata repository collects real-world documentation and considers frequent metadata definitions as high quality with potential for reuse.

Methods

We implemented a pragmatic metadata repository proof-of-concept application and filled it with medical forms from the Portal of Medical Data Models. We applied this prototype in two use cases to demonstrate its capabilities for reusing metadata: first, integration into a study editor for the suggestion of data elements and, second, metadata synchronization between two institutions. Moreover, we evaluated the emergence of bottom-up standards in the prototype, and two medical data managers assessed their quality for 24 medical concepts.

Results

The resulting prototype contained 466,569 unique metadata definitions. Integration into the study editor led to a reuse of 1836 items and item groups. During the metadata synchronization, semantic codes of 4608 data elements were transferred. Our evaluation revealed that for less complex medical concepts weak bottom-up standards could be established. However, more diverse disease-related concepts showed no convergence of data elements due to an enormous heterogeneity of metadata. The survey showed fair agreement (K_alpha = 0.50, 95% CI 0.43–0.56) for good item quality of bottom-up standards.

Conclusions

We demonstrated the feasibility of the pragmatic metadata repository concept for medical documentation. Applications of the prototype in two use cases suggest that it facilitates the reuse of data elements. Our evaluation showed that bottom-up standardization based on a large collection of real-world metadata can yield useful results. The proposed concept is not meant to replace existing top-down approaches; rather, it complements them by showing what is commonly used in the community to guide other researchers.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12911-021-01524-8.

Blitz R, Dugas M. Conceptual Design, Implementation, and Evaluation of Generic and Standard-Compliant Data Transfer into Electronic Health Records. Appl Clin Inform 2020;11:374-386. [PMID: 32462639 PMCID: PMC7253309 DOI: 10.1055/s-0040-1710023] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/21/2023]

Abstract

Objectives

The objective of this study is the conceptual design, implementation and evaluation of a system for generic, standard-compliant data transfer into electronic health records (EHRs). This includes patient data from clinical research and medical care that has been semantically annotated and enhanced with metadata. The implementation is based on the single-source approach. Technical and clinical feasibilities, as well as cost-benefit efficiency, were investigated in everyday clinical practice.

Methods

Münster University Hospital is a tertiary care hospital with 1,457 beds and 10,823 staff who treated 548,110 patients in 2018. Single-source metadata architecture transformation (SMA:T) was implemented as an extension to the EHR system. This architecture uses Model Driven Software Development (MDSD) to generate documentation forms according to the Clinical Data Interchange Standards Consortium (CDISC) operational data model (ODM). Clinical data are stored in ODM format in the EHR system database. Documentation forms are based on Google's Material Design Standard. SMA:T was used at a total of five clinics and one administrative department in the period from March 1, 2018 until March 31, 2019 in everyday clinical practice.

Results

The technical and clinical feasibility of SMA:T was demonstrated in the course of the study. Seventeen documentation forms including 373 data items were created with SMA:T. Those were created for 2,484 patients by 283 users in everyday clinical practice. A total of 121 documentation forms were examined retrospectively. The Constructive cost model (COCOMO II) was used to calculate cost and time savings. The form development mean time was reduced by 83.4% from 3,357 to 557 hours. Average costs per form went down from EUR 953 to 158.

Conclusion

Automated generic transfer of standard-compliant data and metadata into EHRs is technically and clinically feasible, cost efficient, and a useful method to establish comprehensive and semantically annotated clinical documentation. Savings of time and personnel resources are possible.
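The time and cost figures reported in the Results of this abstract can be re-derived with a simple relative-reduction calculation. The sketch below only checks the arithmetic on the published numbers; it does not reproduce the COCOMO II estimation itself, and the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, expressed in percent."""
    return (before - after) / before * 100.0


# Figures as quoted in the abstract.
time_saving = percent_reduction(3357, 557)  # form development time in hours
cost_saving = percent_reduction(953, 158)   # average cost per form in EUR

print(f"Time reduction: {time_saving:.1f}%")
print(f"Cost reduction: {cost_saving:.1f}%")
```

Both values round to 83.4%, which matches the reduction in development time stated in the abstract and suggests that the cost figures scale with the same ratio.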

von Martial S, Brix TJ, Klotz L, Neuhaus P, Berger K, Warnke C, Meuth SG, Wiendl H, Dugas M. EMR-integrated minimal core dataset for routine health care and multiple research settings: A case study for neuroinflammatory demyelinating diseases. PLoS One 2019;14:e0223886. [PMID: 31613917 PMCID: PMC6793844 DOI: 10.1371/journal.pone.0223886] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/25/2019] [Accepted: 10/01/2019] [Indexed: 11/18/2022]

Abstract

Although routine health care and clinical trials usually require the documentation of similar information, data collection is performed independently from each other, resulting in redundant documentation efforts. Standardizing routine documentation can enable secondary use for medical research. Neuroinflammatory demyelinating diseases (NIDs) represent a heterogeneous group of diseases requiring further research to improve patient management. The aim of this work is to develop, implement and evaluate a minimal core dataset in routine health care with a focus on secondary use as case study for NIDs. Therefore, a draft minimal core dataset for NIDs was created by analyzing routine, clinical trial, registry, biobank documentation and existing data standards for NIDs. Data elements (DEs) were converted into the standard format Operational Data Model, semantically annotated and analyzed via frequency analysis. The analysis produced 1958 DEs based on 864 distinct medical concepts. After review and finalization by an interdisciplinary team of neurologists, epidemiologists and medical computer scientists, the minimal core dataset (NID CDEs) consists of 46 common DEs capturing disease-specific information for reuse in the discharge letter and other research settings. It covers the areas of diagnosis, laboratory results, disease progress, expanded disability status scale, therapy and magnetic resonance imaging findings. NID CDEs was implemented in two German university hospitals and a usability study in clinical routine was conducted (participants n = 16) showing a good usability (Mean SUS = 75). From May 2017 to February 2018, 755 patients were documented with the NID CDEs, which indicates the feasibility of developing a minimal core dataset for structured documentation based on previously used documentation standards and integrating the dataset into clinical routine. 
By sharing, translating and reusing the minimal dataset, a transnational harmonized documentation of patients with NIDs might be realized, supporting interoperability in medical research.

Kentgen M, Varghese J, Samol A, Waltenberger J, Dugas M. Common Data Elements for Acute Coronary Syndrome: Analysis Based on the Unified Medical Language System. JMIR Med Inform 2019;7:e14107. [PMID: 31444871 PMCID: PMC6729118 DOI: 10.2196/14107] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/22/2019] [Revised: 06/21/2019] [Accepted: 07/04/2019] [Indexed: 01/29/2023]

Holz C, Kessler T, Dugas M, Varghese J. Core Data Elements in Acute Myeloid Leukemia: A Unified Medical Language System-Based Semantic Analysis and Experts' Review. JMIR Med Inform 2019;7:e13554. [PMID: 31407666 PMCID: PMC6709897 DOI: 10.2196/13554] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/30/2019] [Revised: 05/08/2019] [Accepted: 05/31/2019] [Indexed: 01/27/2023]

Abstract

Background

For cancer domains such as acute myeloid leukemia (AML), a large set of data elements is obtained from different institutions with heterogeneous data definitions within one patient course. The lack of clinical data harmonization impedes cross-institutional electronic data exchange and future meta-analyses.

Objective

This study aimed to identify and harmonize a semantic core of common data elements (CDEs) in clinical routine and research documentation, based on a systematic metadata analysis of existing documentation models.

Methods

Lists of relevant data items were collected and reviewed by hematologists from two university hospitals regarding routine documentation and several case report forms of clinical trials for AML. In addition, existing registries and international recommendations were included. Data items were coded to medical concepts via the Unified Medical Language System (UMLS) by a physician and reviewed by another physician. On the basis of the coded concepts, the data sources were analyzed for concept overlaps and identification of most frequent concepts. The most frequent concepts were then implemented as data elements in the standardized format of the Operational Data Model by the Clinical Data Interchange Standards Consortium.

Results

A total of 3265 medical concepts were identified, of which 1414 were unique. Among the 1414 unique medical concepts, the 50 most frequent ones cover 26.98% of all concept occurrences within the collected AML documentation. The top 100 concepts represent 39.48% of all concept occurrences. Implementation of CDEs is available on a European research infrastructure and can be downloaded in different formats for reuse in different electronic data capture systems.

Conclusions

Information management is a complex process for research-intense disease entities such as AML, which is associated with a large set of lab-based diagnostics and different treatment options. Our systematic UMLS-based analysis revealed the existence of a core data set, and an exemplary reusable implementation for harmonized data capture is available on an established metadata repository.

Representing oncology in datasets: Standard or custom biomedical terminology? Informatics in Medicine Unlocked 2019. [DOI: 10.1016/j.imu.2019.100186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]

Varghese J, Sandmann S, Dugas M. Web-Based Information Infrastructure Increases the Interrater Reliability of Medical Coders: Quasi-Experimental Study. J Med Internet Res 2018;20:e274. [PMID: 30322834 PMCID: PMC6231825 DOI: 10.2196/jmir.9644] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/13/2017] [Revised: 05/03/2018] [Accepted: 06/28/2018] [Indexed: 01/05/2023]

Abstract

Background

Medical coding is essential for standardized communication and integration of clinical data. The Unified Medical Language System by the National Library of Medicine is the largest clinical terminology system for medical coders and Natural Language Processing tools. However, the abundance of ambiguous codes leads to low rates of uniform coding among different coders.

Objective

The objective of our study was to measure uniform coding among different medical experts in terms of interrater reliability and analyze the effect on interrater reliability using an expert- and Web-based code suggestion system.

Methods

We conducted a quasi-experimental study in which 6 medical experts coded 602 medical items from structured quality assurance forms or free-text eligibility criteria of 20 different clinical trials. The medical item content was selected on the basis of mortality-leading diseases according to World Health Organization data. The intervention comprised using a semiautomatic code suggestion tool that is linked to a European information infrastructure providing a large medical text corpus of >300,000 medical form items with expert-assigned semantic codes. Krippendorff alpha (K_alpha) with bootstrap analysis was used for the interrater reliability analysis, and coding times were measured before and after the intervention.
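To illustrate the reliability measure named in the Methods, the following is a minimal sketch of Krippendorff's alpha for nominal codes, computed from the coincidence matrix. It is not the authors' analysis code and omits the bootstrap step used for the confidence intervals; the function name `krippendorff_alpha_nominal` and the example codes are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists; each inner list holds the codes assigned to
    one item by the coders who rated it (missing ratings are simply left out).
    """
    coincidences = Counter()
    for codes in units:
        m = len(codes)
        if m < 2:
            continue  # an item coded only once carries no agreement information
        # Every ordered pair of codes from different coders counts 1 / (m - 1).
        for a, b in permutations(codes, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one code in use, so no disagreement is possible
    return 1.0 - observed / expected


# Hypothetical example: three items, each coded by two coders.
ratings = [["C001", "C001"], ["C002", "C002"], ["C001", "C002"]]
alpha = krippendorff_alpha_nominal(ratings)
print(f"alpha = {alpha:.3f}")
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.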

Results

The intervention improved interrater reliability in structured quality assurance form items (from K_alpha=0.50, 95% CI 0.43-0.57 to K_alpha=0.62, 95% CI 0.55-0.69) and free-text eligibility criteria (from K_alpha=0.19, 95% CI 0.14-0.24 to K_alpha=0.43, 95% CI 0.37-0.50) while preserving or slightly reducing the mean coding time per item for all 6 coders. Regardless of the intervention, precoordination and structured items were associated with significantly high interrater reliability, but the proportion of items that were precoordinated significantly increased after intervention (eligibility criteria: OR 4.92, 95% CI 2.78-8.72; quality assurance: OR 1.96, 95% CI 1.19-3.25).

Conclusions

The Web-based code suggestion mechanism improved interrater reliability toward moderate or even substantial intercoder agreement. Precoordination and the use of structured versus free-text data elements are key drivers of higher interrater reliability.

Varghese J, Fujarski M, Hegselmann S, Neuhaus P, Dugas M. CDEGenerator: an online platform to learn from existing data models to build model registries. Clin Epidemiol 2018;10:961-970. [PMID: 30127646 PMCID: PMC6089100 DOI: 10.2147/clep.s170075] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/04/2022]

Dugas M. Clinical Research Informatics: Recent Advances and Future Directions. Yearb Med Inform 2017;10:174-7. [PMID: 26293865 DOI: 10.15265/iy-2015-010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/18/2023]

Bruland P, Dugas M. S2O - A software tool for integrating research data from general purpose statistic software into electronic data capture systems. BMC Med Inform Decis Mak 2017;17:3. [PMID: 28061771 PMCID: PMC5219713 DOI: 10.1186/s12911-016-0402-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/22/2016] [Accepted: 11/22/2016] [Indexed: 11/28/2022]

Abstract

Background

Data capture for clinical registries or pilot studies is often performed in spreadsheet-based applications like Microsoft Excel or IBM SPSS. Usually, data is afterwards transferred into statistical software, such as SAS, R or IBM SPSS Statistics, for analysis. Spreadsheet-based solutions suffer from several drawbacks: it is generally not possible to ensure sufficient rights and role management, and there is no record of who changed data, when, and why. Therefore, such systems are not able to comply with regulatory requirements for electronic data capture in clinical trials. In contrast, Electronic Data Capture (EDC) software enables a reliable, secure and auditable collection of data. In this regard, most EDC vendors support the CDISC ODM standard to define, communicate and archive clinical trial meta- and patient data. Advantages of EDC systems are support for multi-user and multicenter clinical trials as well as auditable data. Migration from spreadsheet-based data collection to EDC systems is labor-intensive and time-consuming at present. Hence, the objectives of this research work are to develop a mapping model and implement a converter between IBM SPSS and the CDISC ODM standard and to evaluate this approach regarding syntactic and semantic correctness.

Results

A mapping model between IBM SPSS and CDISC ODM data structures was developed. SPSS variables and patient values can be mapped and converted into ODM. Statistical and display attributes from SPSS do not correspond to any ODM elements; study-related ODM elements are not available in SPSS. The S2O converting tool was implemented as a command-line tool using the SPSS internal Java plugin. Syntactic and semantic correctness was validated with different ODM tools and reverse transformation from ODM into SPSS format. Clinical data values were also successfully transformed into the ODM structure.
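To make the idea of mapping a variable definition to ODM metadata more concrete, here is a minimal sketch that builds an ODM-style `ItemDef` element from a variable name, label and type. It is not the S2O implementation, which the abstract describes as a Java-based command-line tool; the type mapping `TYPE_MAP`, the OID scheme and the function name `variable_to_itemdef` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified mapping from generic variable types to ODM data types.
TYPE_MAP = {"numeric": "float", "string": "text", "date": "date"}


def variable_to_itemdef(name, label, var_type):
    """Build an ODM-style ItemDef element for a single variable definition."""
    item = ET.Element(
        "ItemDef",
        {"OID": f"I.{name}", "Name": name, "DataType": TYPE_MAP[var_type]},
    )
    # The variable label becomes the question text shown on the form.
    question = ET.SubElement(item, "Question")
    text = ET.SubElement(question, "TranslatedText", {"xml:lang": "en"})
    text.text = label
    return item


# Hypothetical variable, not taken from the paper.
item_xml = ET.tostring(
    variable_to_itemdef("age", "Age in years", "numeric"), encoding="unicode"
)
print(item_xml)
```

A complete converter would also need to emit the surrounding study, metadata version and form structure, and to write the patient values as clinical data, none of which is shown here.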

Conclusion

Transformation between the spreadsheet format IBM SPSS and the ODM standard for definition and exchange of trial data is feasible. S2O facilitates migration from Excel- or SPSS-based data collections towards reliable EDC systems. Thereby, advantages of EDC systems like reliable software architecture for secure and traceable data collection and particularly compliance with regulatory requirements are achievable.

Electronic supplementary material

The online version of this article (doi:10.1186/s12911-016-0402-4) contains supplementary material, which is available to authorized users.

Design of case report forms based on a public metadata registry: re-use of data elements to improve compatibility of data. Trials 2016;17:566. [PMID: 27899162 PMCID: PMC5129226 DOI: 10.1186/s13063-016-1691-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/10/2015] [Accepted: 11/10/2016] [Indexed: 11/17/2022]