1
Kirsten T, Meineke FA, Loeffler-Wirth H, Beger C, Uciteli A, Stäubert S, Löbe M, Hänsel R, Rauscher FG, Schuster J, Peschel T, Herre H, Wagner J, Zachariae S, Engel C, Scholz M, Rahm E, Binder H, Loeffler M. The Leipzig Health Atlas-An Open Platform to Present, Archive, and Share Biomedical Data, Analyses, and Models Online. Methods Inf Med 2022; 61:e103-e115. [PMID: 35915977 PMCID: PMC9788914 DOI: 10.1055/a-1914-1985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
Abstract
BACKGROUND Clinical trials, epidemiological studies, clinical registries, and other prospective research projects, together with patient care services, are the main sources of data in the medical research domain. They often serve as a basis for secondary research in evidence-based medicine and for prediction models of disease and its progression. These data are often neither sufficiently described nor accessible. Related models are often not accessible as a functional program tool for interested users from the health care and biomedical domains. OBJECTIVE The interdisciplinary project Leipzig Health Atlas (LHA) was developed to close this gap. LHA is an online platform that serves as a sustainable archive providing medical data, metadata, models, and novel phenotypes from clinical trials, epidemiological studies, and other medical research projects. METHODS Data, models, and phenotypes are described by semantically rich metadata. The platform prefers to share data and models presented in original publications but is also open to unpublished data. LHA assigns a unique permanent identifier to each dataset and model. Hence, the platform can be used to share prepared, quality-assured datasets and models while they are referenced in publications. All managed data, models, and phenotypes in LHA follow the FAIR principles, with public availability or restricted access for specific user groups. RESULTS The LHA platform is in productive mode (https://www.health-atlas.de/). It is already used by a variety of clinical trial and research groups and is becoming increasingly popular in the biomedical community as well. LHA is an integral part of the forthcoming initiative building a national research data infrastructure for health in Germany.
Affiliation(s)
- Toralf Kirsten: Department of Medical Data Science, Leipzig University Medical Center, Leipzig, Germany; Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany; Interdisciplinary Centre for Bioinformatics, Leipzig University, Leipzig, Germany. Address for correspondence: Toralf Kirsten, Department of Medical Data Science, Leipzig University, Härtelstraße 16-18, 04107 Leipzig, Germany
- Frank A. Meineke: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Henry Loeffler-Wirth: LIFE Research Centre for Civilization Diseases, Leipzig University, Leipzig, Germany
- Christoph Beger: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Alexandr Uciteli: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Sebastian Stäubert: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Matthias Löbe: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany; Interdisciplinary Centre for Bioinformatics, Leipzig University, Leipzig, Germany
- René Hänsel: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Franziska G. Rauscher: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany; Interdisciplinary Centre for Bioinformatics, Leipzig University, Leipzig, Germany
- Judith Schuster: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Thomas Peschel: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Heinrich Herre: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Jonas Wagner: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany; Interdisciplinary Centre for Bioinformatics, Leipzig University, Leipzig, Germany
- Silke Zachariae: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Christoph Engel: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany; Interdisciplinary Centre for Bioinformatics, Leipzig University, Leipzig, Germany
- Markus Scholz: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany
- Erhard Rahm: Department of Computer Sciences, Leipzig University, Leipzig, Germany
- Hans Binder: LIFE Research Centre for Civilization Diseases, Leipzig University, Leipzig, Germany
- Markus Loeffler: Institute for Medical Informatics, Statistics, and Epidemiology, Leipzig University, Leipzig, Germany; Interdisciplinary Centre for Bioinformatics, Leipzig University, Leipzig, Germany; LIFE Research Centre for Civilization Diseases, Leipzig University, Leipzig, Germany
2
Christen V, Häntschel T, Christen P, Rahm E. Privacy-preserving record linkage using autoencoders. Int J Data Sci Anal 2022. [DOI: 10.1007/s41060-022-00377-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Privacy-preserving record linkage (PPRL) is the process of identifying records that represent the same real-world entity across different data sources while guaranteeing the privacy of sensitive information about these entities. A popular PPRL method is to encode sensitive plain-text data into Bloom filters (BFs), bit vectors that enable the efficient calculation of the similarities between records that PPRL requires. However, BF encoding cannot completely prevent the re-identification of plain-text values because sets of BFs can contain bit patterns that can be mapped to plain-text values using cryptanalysis attacks. Various hardening techniques have therefore been proposed that modify the bit patterns in BFs with the aim of preventing such attacks. However, it has been shown that even hardened BFs can still be vulnerable to attacks. To avoid any such attacks, we propose a novel encoding technique for PPRL based on autoencoders that transforms BFs into vectors of real numbers. To achieve a high comparison quality of the generated numerical vectors, we propose a method that guarantees the comparability of encodings generated by the different data owners. Experiments on real-world data sets show that our technique achieves high linkage quality and prevents known cryptanalysis attacks on BF encoding.
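The Bloom-filter baseline that the abstract starts from can be illustrated with a minimal sketch. It covers only the plain BF encoding and Dice similarity, not the autoencoder step proposed by the authors. The filter size, number of hash functions, q-gram length, and the `secret` key are assumptions chosen for illustration.

```python
import hashlib


def qgrams(value, q=2):
    """Split a padded, lower-cased string into its set of q-grams."""
    padded = "_" + value.lower() + "_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value, size=1024, num_hashes=4, secret="shared-key"):
    """Return the set of bit positions set to 1 for this value.

    Assumption: each q-gram is hashed num_hashes times with a keyed
    SHA-256; real PPRL systems may use other hashing schemes.
    """
    bits = set()
    for gram in qgrams(value):
        for k in range(num_hashes):
            data = "{}|{}|{}".format(secret, k, gram).encode("utf-8")
            digest = hashlib.sha256(data).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return frozenset(bits)


def dice(a, b):
    """Dice coefficient of two sets of 1-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


smith = bloom_encode("smith")
smyth = bloom_encode("smyth")
garcia = bloom_encode("garcia")
print(dice(smith, smyth), dice(smith, garcia))
```

Similar names share most q-grams and therefore most 1-bits, which is what makes the encoding usable for linkage and, as the abstract notes, also what exposes it to pattern-based attacks.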
3
Ayala D, Hernández I, Ruiz D, Rahm E. Corrigendum to Multi-source dataset of e-commerce products with attributes for property matching [Data in Brief 41 (2022) 107884]. Data Brief 2022; 42:108134. [PMID: 35450018 PMCID: PMC9018135 DOI: 10.1016/j.dib.2022.108134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
4
Ayala D, Hernández I, Ruiz D, Rahm E. Multi-source dataset of e-commerce products with attributes for property matching. Data Brief 2022; 41:107884. [PMID: 35198667 PMCID: PMC8847803 DOI: 10.1016/j.dib.2022.107884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Revised: 01/12/2022] [Accepted: 01/26/2022] [Indexed: 10/30/2022] Open
Abstract
Schema/ontology matching consists of finding matches between types, properties and entities in heterogeneous sources of data in order to integrate them, which has become increasingly relevant with the development of web technologies and open data initiatives. One of the tasks involved is the matching of data properties, which attempts to find correspondences between the attributes of the entities. This is challenging because equivalent properties sometimes have different names. Furthermore, some properties may not be equivalent, but still match in 1..n relationships. These difficulties create the need for varied evaluation datasets for two reasons. First, they are needed to evaluate existing techniques in a variety of scenarios. Second, they enable the training of supervised techniques that may even become context-independent if trained with data from sufficiently diverse contexts. To support the evaluation and training of data property matching techniques, we present a collection of datasets consisting of product records from four different contexts. These datasets are the result of transforming two different existing datasets. In one of the datasets, some properties were filtered out for being too noisy. The resulting processed dataset consists of JSON files with a listing of the product records and their properties, and a separate grouping of the properties that determines which ones match. It contains information about 2860 entities, with 4386 properties and 13350 pairwise matches.
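The relationship between a grouping of properties and the pairwise matches it implies can be sketched as follows. The property names and the in-memory layout are hypothetical; the actual JSON schema of the published files is not described in this abstract.

```python
from itertools import combinations


def pairwise_matches(groups):
    """Expand groups of matching properties into unordered pairs.

    Assumption: every two properties in the same group match each other.
    """
    pairs = set()
    for group in groups:
        for left, right in combinations(sorted(group), 2):
            pairs.add((left, right))
    return pairs


# Hypothetical property groups from three product sources.
groups = [
    ["shopA.weight", "shopB.item_weight", "shopC.mass"],
    ["shopA.colour", "shopB.color"],
]
matches = pairwise_matches(groups)
print(len(matches))
```

A group of n properties contributes n(n-1)/2 pairs, so a modest number of groups can account for a much larger number of pairwise matches.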
5
6
Rohde F, Franke M, Sehili Z, Lablans M, Rahm E. Optimization of the Mainzelliste software for fast privacy-preserving record linkage. J Transl Med 2021; 19:33. [PMID: 33451317 PMCID: PMC7809773 DOI: 10.1186/s12967-020-02678-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/05/2020] [Accepted: 12/14/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Data analysis for biomedical research often requires a record linkage step to identify records from multiple data sources referring to the same person. Due to the lack of unique personal identifiers across these sources, record linkage relies on the similarity of personal data such as first and last names or birth dates. However, the exchange of such identifying data with a third party, as is the case in record linkage, is generally subject to strict privacy requirements. This problem is addressed by privacy-preserving record linkage (PPRL) and pseudonymization services. Mainzelliste is an open-source record linkage and pseudonymization service used to carry out PPRL processes in real-world use cases. METHODS We evaluate the linkage quality and performance of the linkage process using several real and near-real datasets with different properties with respect to size and error rate of matching records. We compare (plaintext) record linkage with PPRL based on encoded records (Bloom filters). Furthermore, since the Mainzelliste software offers no blocking mechanism, we extend it with phonetic blocking as well as novel blocking schemes based on locality-sensitive hashing (LSH) to improve runtime for both standard and privacy-preserving record linkage. RESULTS Mainzelliste achieves high linkage quality for PPRL using field-level Bloom filters due to the use of an error-tolerant matching algorithm that can handle variances in names, in particular missing or transposed name compounds. However, due to the absence of blocking, the runtimes are unacceptable for real use cases with larger datasets. The newly implemented blocking approaches improve runtimes by orders of magnitude while retaining high linkage quality. CONCLUSION We conduct the first comprehensive evaluation of the record linkage facilities of the Mainzelliste software and extend it with blocking methods to improve its runtime. We observed very high linkage quality for both plaintext and encoded data, even in the presence of errors. The blocking methods improve runtime performance by orders of magnitude, thus facilitating use in research projects with large datasets and many participants.
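To illustrate why blocking reduces runtime, the following is a minimal sketch of Hamming LSH blocking on bit vectors: only records that agree on at least one sampled set of bit positions are compared. It is not the Mainzelliste implementation; the function names, the number of keys, and the key length are assumptions.

```python
import random
from collections import defaultdict
from itertools import combinations


def hlsh_keys(bits, positions_per_key):
    """Build one blocking key per sampled list of bit positions."""
    return [
        (index, "".join(bits[p] for p in positions))
        for index, positions in enumerate(positions_per_key)
    ]


def candidate_pairs(records, length, num_keys=4, key_len=8, seed=42):
    """Return record pairs that share at least one blocking key.

    records maps a record id to a bit string of the given length.
    """
    rng = random.Random(seed)
    positions_per_key = [
        rng.sample(range(length), key_len) for _ in range(num_keys)
    ]
    blocks = defaultdict(list)
    for rec_id, bits in records.items():
        for key in hlsh_keys(bits, positions_per_key):
            blocks[key].append(rec_id)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical 16-bit encodings; "c" is the bitwise complement of "a".
records = {
    "a": "1100110011001100",
    "b": "1100110011001100",
    "c": "0011001100110011",
}
print(candidate_pairs(records, length=16))
```

Without blocking, every record is compared with every other record; with blocking, the comparison step only sees the candidate pairs, which is where the runtime savings come from. Longer keys make blocks more selective, while more keys reduce the chance of missing a true match.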
Affiliation(s)
- Florens Rohde: Database Group, University of Leipzig, Leipzig, Germany
- Martin Franke: Database Group, University of Leipzig, Leipzig, Germany
- Ziad Sehili: Database Group, University of Leipzig, Leipzig, Germany
- Martin Lablans: Federated Information Systems, German Cancer Research Center, Heidelberg, Germany; Complex Data Processing in Medical Informatics, University Medical Center Mannheim, Mannheim, Germany
- Erhard Rahm: Database Group, University of Leipzig, Leipzig, Germany
7
8
9
Cardoso S, Reynaud-Delaître C, Da Silveira M, Lin YC, Groß A, Rahm E, Pruski C. Evolving semantic annotations through multiple versions of controlled medical terminologies. Health Technol 2018. [DOI: 10.1007/s12553-018-0261-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
10
Winter A, Stäubert S, Ammon D, Aiche S, Beyan O, Bischoff V, Daumke P, Decker S, Funkat G, Gewehr JE, de Greiff A, Haferkamp S, Hahn U, Henkel A, Kirsten T, Klöss T, Lippert J, Löbe M, Lowitsch V, Maassen O, Maschmann J, Meister S, Mikolajczyk R, Nüchter M, Pletz MW, Rahm E, Riedel M, Saleh K, Schuppert A, Smers S, Stollenwerk A, Uhlig S, Wendt T, Zenker S, Fleig W, Marx G, Scherag A, Löffler M. Smart Medical Information Technology for Healthcare (SMITH). Methods Inf Med 2018; 57:e92-e105. [PMID: 30016815 PMCID: PMC6193398 DOI: 10.3414/me18-02-0004] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Indexed: 02/03/2023]
Abstract
INTRODUCTION This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. "Smart Medical Information Technology for Healthcare (SMITH)" is one of four consortia funded by the German Medical Informatics Initiative (MI-I) to create an alliance of universities, university hospitals, research institutions and IT companies. SMITH's goals are to establish Data Integration Centers (DICs) at each SMITH partner hospital and to implement use cases which demonstrate the usefulness of the approach. OBJECTIVES To give insight into architectural design issues underlying SMITH data integration and to introduce the use cases to be implemented. GOVERNANCE AND POLICIES SMITH implements a federated approach for both its governance structure and its information system architecture. SMITH has designed a generic concept for its data integration centers. They share identical services and functionalities to take best advantage of the interoperability architectures and of the planned data use and access process. The DICs provide access to the local hospitals' Electronic Medical Records (EMR). This is based on data trustee and privacy management services. DIC staff will curate and amend EMR data in the Health Data Storage. METHODOLOGY AND ARCHITECTURAL FRAMEWORK To share medical and research data, SMITH's information system is based on communication and storage standards. We use the Reference Model of the Open Archival Information System and will consistently implement profiles of Integrating the Health Care Enterprise (IHE) and Health Level Seven (HL7) standards. Standard terminologies will be applied. The SMITH Market Place will be used for devising agreements on data access and distribution. 3LGM2 for enterprise architecture modeling supports a consistent development process. The DIC reference architecture determines the services, applications and the standards-based communication links needed for efficiently supporting the ingesting, data nourishing, trustee, privacy management and data transfer tasks of the SMITH DICs. The reference architecture is adopted at the local sites. Data sharing services and the market place enable interoperability. USE CASES The methodological use case "Phenotype Pipeline" (PheP) constructs algorithms for annotations and analyses of patient-related phenotypes according to classification rules or statistical models based on structured data. Unstructured textual data will be subject to natural language processing to permit integration into the phenotyping algorithms. The clinical use case "Algorithmic Surveillance of ICU Patients" (ASIC) focusses on patients in Intensive Care Units (ICU) with acute respiratory distress syndrome (ARDS). A model-based decision-support system will give advice for mechanical ventilation. The clinical use case HELP develops a "hospital-wide electronic medical record-based computerized decision support system to improve outcomes of patients with blood-stream infections". ASIC and HELP use the PheP. The clinical benefit of the use cases ASIC and HELP will be demonstrated in a change-of-care clinical trial based on a stepped-wedge design. DISCUSSION SMITH's strengths are the modular, reusable IT architecture based on interoperability standards, the integration of the hospitals' information management departments and the public-private partnership. The project aims at sustainability beyond the first 4-year funding period.
Grants
- German Federal Ministry of Education and Research, Grant Nos. 01ZZ1609A, 01ZZ1609B, 01ZZ1609C, 01ZZ1803A, 01ZZ1803B, 01ZZ1803C, 01ZZ1803D, 01ZZ1803E, 01ZZ1803F, 01ZZ1803G, 01ZZ1803H, 01ZZ1803I, 01ZZ1803J, 01ZZ1803K, 01ZZ1803L, 01ZZ1803M, 01ZZ1803N
Affiliation(s)
- Alfred Winter: Leipzig University, Institute of Medical Informatics, Statistics and Epidemiology, Leipzig, Germany. Correspondence to: Prof. Alfred Winter, Leipzig University, Institute of Medical Informatics, Statistics and Epidemiology, Haertelstr. 16–18, 04107 Leipzig, Germany
- Sebastian Stäubert: Leipzig University, Institute of Medical Informatics, Statistics and Epidemiology, Leipzig, Germany
- Danny Ammon: University Medical Center Jena, Central Service Provider For Information Technology, Jena, Germany
- Oya Beyan: RWTH Aachen University, Chair of Computer Science 5, Aachen, Germany
- Verena Bischoff: University of Leipzig Medical Center, Division Staff and Justice, Leipzig, Germany
- Stefan Decker: RWTH Aachen University, Chair of Computer Science 5, Aachen, Germany
- Gert Funkat: University of Leipzig Medical Center, Division Information Management, Leipzig, Germany
- Jan E. Gewehr: University Medical Center Hamburg-Eppendorf, Business Division for Information Technology, Hamburg, Germany
- Armin de Greiff: Essen University Hospital, Central Information Technology, Essen, Germany
- Silke Haferkamp: RWTH Aachen University Hospital, Division Information Technology, Aachen, Germany
- Udo Hahn: Friedrich-Schiller-Universität Jena, Language & Information Engineering Lab (JULIE Lab), Jena, Germany
- Andreas Henkel: University Medical Center Jena, Central Service Provider For Information Technology, Jena, Germany
- Toralf Kirsten: Leipzig University, LIFE Research Centre for Civilization Diseases, Leipzig, Germany
- Thomas Klöss: Martin-Luther-Universität Halle-Wittenberg Medical Center, Medical Director, Halle, Germany
- Matthias Löbe: Leipzig University, Institute of Medical Informatics, Statistics and Epidemiology, Leipzig, Germany
- Volker Lowitsch: RWTH Aachen University Hospital, Division Information Technology, Aachen, Germany
- Oliver Maassen: RWTH Aachen University Hospital, Department of Intensive Care and Intermediate Care, Aachen, Germany
- Jens Maschmann: University Medical Center Jena, Medical Director, Jena, Germany
- Sven Meister: Fraunhofer Institute for Software and Systems Engineering, Dortmund, Germany
- Rafael Mikolajczyk: Martin-Luther-Universität Halle-Wittenberg, Institute of Medical Epidemiology, Biometry and Informatics, Halle, Germany
- Matthias Nüchter: Leipzig University, LIFE Research Centre for Civilization Diseases, Leipzig, Germany
- Mathias W. Pletz: University Medical Center Jena, Institute of Infectious Diseases and Infection Control, Jena, Germany
- Erhard Rahm: Leipzig University, Department of Computer Science – Database Group, Leipzig, Germany
- Morris Riedel: Forschungszentrum Jülich, Jülich Supercomputing Centre, Jülich, Germany
- Kutaiba Saleh: University Medical Center Jena, Central Service Provider For Information Technology, Jena, Germany
- Andreas Schuppert: RWTH Aachen University, Institute for Computational Biomedicine II, Aachen, Germany
- Stefan Smers: University of Leipzig Medical Center, Division Information Management, Leipzig, Germany
- André Stollenwerk: RWTH Aachen University, Informatik 11 – Embedded Software, Aachen, Germany
- Stefan Uhlig: RWTH Aachen University, Medical Faculty, Dean, Aachen, Germany
- Thomas Wendt: University of Leipzig Medical Center, Data Integration Center, Leipzig, Germany
- Sven Zenker: University of Bonn Medical Center, Department of Anesthesiology and Intensive Care Medicine, Bonn, Germany
- Wolfgang Fleig: University of Leipzig Medical Center, Medical Director, Leipzig, Germany
- Gernot Marx: RWTH Aachen University Hospital, Department of Intensive Care and Intermediate Care, Aachen, Germany
- André Scherag: University Medical Center Jena, Center for Sepsis Control and Care, Jena, Germany; University Medical Center Jena, Institute of Medical Statistics, Computer and Data Sciences (IMSID), Jena, Germany
- Markus Löffler: Leipzig University, Institute of Medical Informatics, Statistics and Epidemiology, Leipzig, Germany
11
Mueller R, Rahm E, Ramsch J, Heller B, Loeffler M, Greiner U. AdaptFlow: Protocol-based Medical Treatment Using Adaptive Workflows. Methods Inf Med 2018. [DOI: 10.1055/s-0038-1633926] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 10/17/2022]
Abstract
Summary
Objectives:
In many medical domains investigator-initiated clinical trials are used to introduce new treatments and hence act as implementations of guideline-based therapies. Trial protocols contain detailed instructions to conduct the therapy and additionally specify reactions to exceptional situations (for instance an infection or a toxicity). To increase quality in health care and raise the number of patients treated according to trial protocols, a consultation system is needed that supports the handling of the complex trial therapy processes efficiently. Our objective was to design and evaluate a consultation system that should 1) observe the status of the therapies currently being applied, 2) offer automatic recognition of exceptional situations and appropriate decision support and 3) provide an automatic adaptation of affected therapy processes to handle exceptional situations.
Methods:
We applied a hybrid approach that combines process support for the timely and efficient execution of the therapy processes, as offered by workflow management systems, with a knowledge and rule base and a mechanism for dynamic workflow adaptation that changes running therapy processes when a change in the patient's condition requires it.
Results and Conclusions:
This approach has been implemented in the AdaptFlow prototype. We performed several evaluation studies on the practicability of the approach and the usefulness of the system. These studies show that the AdaptFlow prototype offers adequate support for the execution of real-world investigator-initiated trial protocols and is able to handle a large number of exceptions.
12
Groß A, Pruski C, Rahm E. Evolution of biomedical ontologies and mappings: Overview of recent approaches. Comput Struct Biotechnol J 2016; 14:333-40. [PMID: 27642503 PMCID: PMC5018063 DOI: 10.1016/j.csbj.2016.08.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 05/02/2016] [Revised: 08/19/2016] [Accepted: 08/23/2016] [Indexed: 11/16/2022] Open
Abstract
Biomedical ontologies are heavily used to annotate data, and different ontologies are often interlinked by ontology mappings. These ontology-based mappings and annotations are used in many applications and analysis tasks. Since biomedical ontologies are continuously updated, dependent artifacts can become outdated and need to undergo evolution as well. Hence, there is a need for largely automated approaches to keep ontology-based mappings up-to-date in the presence of evolving ontologies. In this article, we survey current approaches and novel directions in the context of ontology and mapping evolution. We discuss requirements for mapping adaptation and provide a comprehensive overview of existing approaches. We further identify open challenges and outline ideas for future developments.
Affiliation(s)
- Anika Groß: Institute of Computer Science, Universität Leipzig, P.O. Box 100920, 04009 Leipzig, Germany
- Cédric Pruski: Luxembourg Institute of Science and Technology, 5 Avenue des Hauts-Fourneaux, L-4362 Esch-sur-Alzette, Luxembourg
- Erhard Rahm: Institute of Computer Science, Universität Leipzig, P.O. Box 100920, 04009 Leipzig, Germany
13
Affiliation(s)
- Erhard Rahm: Universität Leipzig, Institut für Informatik, Augustusplatz 10, 04109 Leipzig, Germany
14
Abstract
We introduce a novel approach to extract semantic relations (e.g., is-a and part-of relations) from Wikipedia articles. These relations are used to build up a large and up-to-date thesaurus providing background knowledge for tasks such as determining semantic ontology mappings. Our automatic approach uses a comprehensive set of semantic patterns, finite state machines and NLP techniques to extract millions of relations between concepts. An evaluation for different domains shows the high quality and effectiveness of the proposed approach. We also illustrate the value of the newly found relations for improving existing ontology mappings.
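Pattern-based relation extraction of this kind can be sketched with two toy patterns. This is a deliberately naive illustration using regular expressions; the patterns and function name are assumptions, and the approach in the abstract relies on a far larger pattern set, finite state machines and NLP techniques.

```python
import re

# Each entry pairs a relation type with a pattern over a definition
# sentence. The part-of pattern is tried first because "is part of"
# would otherwise be shadowed by the more general is-a pattern.
PATTERNS = [
    ("part-of", re.compile(
        r"^(?:an?\s+|the\s+)?(?P<src>[\w\- ]+?)\s+is\s+(?:a\s+)?part\s+of\s+"
        r"(?:an?\s+|the\s+)?(?P<dst>[\w\- ]+?)[.,;]", re.IGNORECASE)),
    ("is-a", re.compile(
        r"^(?:an?\s+|the\s+)?(?P<src>[\w\- ]+?)\s+is\s+(?:an?|the)\s+"
        r"(?P<dst>[\w\- ]+?)[.,;]", re.IGNORECASE)),
]


def extract_relation(sentence):
    """Return (source, relation, target) or None if no pattern matches."""
    text = sentence.strip()
    for relation, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return (match.group("src").lower(), relation,
                    match.group("dst").lower())
    return None


print(extract_relation("A convertible is a car."))
print(extract_relation("The steering wheel is part of a car."))
```

The sketch takes everything up to the first punctuation mark as the target concept, so a sentence such as "A convertible is a car with a folding roof." would yield an overly long target. Handling such cases is where more elaborate patterns and linguistic preprocessing become necessary.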
Affiliation(s)
- Patrick Arnold: Department of Computer Science, Leipzig University, Augustusplatz 10, Leipzig, 04109, Germany
- Erhard Rahm: Department of Computer Science, Leipzig University, Augustusplatz 10, Leipzig, 04109, Germany
15
16
17
Abstract
OBJECTIVE To address the problem of mapping local laboratory terminologies to Logical Observation Identifiers Names and Codes (LOINC). To study different ontology matching algorithms and investigate how the probability of term combinations in LOINC helps to increase match quality and reduce manual effort. MATERIALS AND METHODS We proposed two matching strategies: full name and multi-part. The multi-part approach also considers the occurrence probability of combined concept parts. It can further recommend possible combinations of concept parts to allow more local terms to be mapped. Three real-world laboratory databases from Taiwanese hospitals were used to validate the proposed strategies with respect to different quality measures and execution run time. A comparison with the commonly used tool, Regenstrief LOINC Mapping Assistant (RELMA) Lab Auto Mapper (LAM), was also carried out. RESULTS The new multi-part strategy yields the best match quality, with F-measure values between 89% and 96%. It can automatically match 70-85% of the laboratory terminologies to LOINC. The recommendation step can further propose mappings to (proposed) LOINC concepts for 9-20% of the local terminology concepts. On average, 91% of the local terminology concepts can be correctly mapped to existing or newly proposed LOINC concepts. CONCLUSIONS The mapping quality of the multi-part strategy is significantly better than that of LAM. It enables domain experts to perform LOINC matching with little manual work. Using the probability of term combinations proved to be a valuable strategy for increasing the quality of match results, providing recommendations for proposed LOINC concepts, and decreasing the run time for match processing.
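The F-measure used to report match quality is the harmonic mean of precision and recall over the predicted mapping. A minimal sketch, with hypothetical local terms and placeholder target codes rather than real LOINC codes:

```python
def f_measure(predicted, gold):
    """Harmonic mean of precision and recall for two sets of mappings."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (local term, target code) correspondences.
predicted = {("glucose", "CODE-1"), ("sodium", "CODE-2"), ("potassium", "CODE-9")}
gold = {("glucose", "CODE-1"), ("sodium", "CODE-2"),
        ("potassium", "CODE-3"), ("hemoglobin", "CODE-4")}
print(round(f_measure(predicted, gold), 4))
```

Here two of three predicted correspondences are correct (precision 2/3) and two of four gold correspondences are found (recall 1/2), giving an F-measure of 4/7.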
Affiliation(s)
- Li-Hui Lee: Department of Computer Science, University of Leipzig, Leipzig, Germany; Institute of Public Health, National Yang-Ming University, Taipei, Taiwan
- Anika Groß: Department of Computer Science, University of Leipzig, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Michael Hartung: Department of Computer Science, University of Leipzig, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
- Der-Ming Liou: Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Erhard Rahm: Department of Computer Science, University of Leipzig, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
18
Abstract
MOTIVATION Ontologies are used in the annotation and analysis of biological data. As knowledge accumulates, ontologies and annotation undergo constant modifications to reflect this new knowledge. These modifications may influence the results of statistical applications such as functional enrichment analyses that describe experimental data in terms of ontological groupings. Here, we investigate to what degree modifications of the Gene Ontology (GO) impact these statistical analyses for both experimental and simulated data. The analysis is based on new measures for the stability of result sets and considers different ontology and annotation changes. RESULTS Our results show that past changes in the GO are non-uniformly distributed over different branches of the ontology. Considering the semantic relatedness of significant categories in analysis results allows a more realistic stability assessment for functional enrichment studies. We observe that the results of term-enrichment analyses tend to be surprisingly stable despite changes in ontology and annotation.
Affiliation(s)
- Anika Groß: Department of Computer Science, University of Leipzig, Leipzig, Germany
19
20
Kirsten T, Gross A, Hartung M, Rahm E. GOMMA: a component-based infrastructure for managing and analyzing life science ontologies and their evolution. J Biomed Semantics 2011; 2:6. [PMID: 21914205 PMCID: PMC3198872 DOI: 10.1186/2041-1480-2-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 12/17/2010] [Accepted: 09/13/2011] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates, resulting in a large set of ontology versions, necessitate efficient management and analysis of this data. RESULTS We present GOMMA, a generic infrastructure for managing and analyzing life science ontologies and their evolution. GOMMA utilizes a generic repository to uniformly and efficiently manage ontology versions and different kinds of mappings. Furthermore, it provides components for ontology matching and for determining evolutionary ontology changes. These components are used by analysis tools, such as the Ontology Evolution Explorer (OnEX) and the detection of unstable ontology regions. We introduce the component-based infrastructure and show analysis results for selected components and life science applications. GOMMA is available at http://dbs.uni-leipzig.de/GOMMA. CONCLUSIONS GOMMA provides a comprehensive and scalable infrastructure to manage large life science ontologies and analyze their evolution. Key functions include generic storage of ontology versions and mappings, and support for ontology matching and for determining ontology changes. The supported features for analyzing ontology changes help to assess their impact on ontology-dependent applications such as term enrichment. GOMMA complements OnEX by providing functionalities to manage various versions of mappings between two ontologies and allows combining different match approaches.
Affiliation(s)
- Toralf Kirsten
- Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.
|
24
|
Abstract
Background Numerous ontologies have recently been developed in life sciences to support a consistent annotation of biological objects, such as genes or proteins. These ontologies undergo continuous changes, which can impact existing annotations. Therefore, it is valuable for users of ontologies to study the stability of ontologies and to see how many and what kind of ontology changes occurred. Results We present OnEX (Ontology Evolution EXplorer), a system for exploring ontology changes. Currently, OnEX provides access to about 560 versions of 16 well-known life science ontologies. The system is based on a three-tier architecture including an ontology version repository, a middleware component and the OnEX web application. Interactive workflows allow a systematic and explorative change analysis of ontologies and their concepts as well as the semi-automatic migration of outdated annotations to the current version of an ontology. Conclusion OnEX provides a user-friendly web interface to explore information about changes in current life science ontologies. It is available at .
Affiliation(s)
- Michael Hartung
- Interdisciplinary Centre for Bioinformatics, University of Leipzig, Härtelstrasse 16-18, 04107 Leipzig, Germany.
|
26
|
Abstract
We introduce the GeWare data warehouse platform for the integrated analysis of clinical information, microarray data and annotations within large biomedical research studies. Clinical data is obtained from a commercial study management system while publicly available data is integrated using a mediator approach. The platform utilizes a generic approach to manage different types of annotations. We outline the overall architecture of the platform, its implementation as well as the main processing and analysis workflows.
Affiliation(s)
- Erhard Rahm
- Dept. of Computer Sciences, University of Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Germany
- Toralf Kirsten
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Germany
- Jörg Lange
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Germany
|
27
|
Prüfer K, Muetzel B, Do HH, Weiss G, Khaitovich P, Rahm E, Pääbo S, Lachmann M, Enard W. FUNC: a package for detecting significant associations between gene sets and ontological annotations. BMC Bioinformatics 2007; 8:41. [PMID: 17284313 PMCID: PMC1800870 DOI: 10.1186/1471-2105-8-41] [Citation(s) in RCA: 149] [Impact Index Per Article: 8.8] [Received: 09/04/2006] [Accepted: 02/06/2007] [Indexed: 11/17/2022]
Abstract
Background Genome-wide expression, sequence and association studies typically yield large sets of gene candidates, which must then be further analysed and interpreted. Information about these genes is increasingly being captured and organized in ontologies, such as the Gene Ontology. Relationships between the gene sets identified by experimental methods and biological knowledge can be made explicit and used in the interpretation of results. However, it is often difficult to assess the statistical significance of such analyses since many inter-dependent categories are tested simultaneously. Results We developed the program package FUNC that includes and expands on currently available methods to identify significant associations between gene sets and ontological annotations. It implements several tests that are particularly well suited for genome-wide sequence comparisons, estimates of the family-wise error rate and the false discovery rate, a sensitive estimator of the global significance of the results, and an algorithm to reduce the complexity of the results. Conclusion FUNC is a versatile and useful tool for the analysis of genome-wide data. It is freely available under the GPL license and also accessible via a web service.
Affiliation(s)
- Kay Prüfer
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Bjoern Muetzel
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Hong-Hai Do
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107, Germany
- Gunter Weiss
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Philipp Khaitovich
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Erhard Rahm
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107, Germany
- Svante Pääbo
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Michael Lachmann
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
- Wolfgang Enard
- Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany
|
28
|
Kirsten T, Lange J, Rahm E. An Integrated Platform for Analyzing Molecular-Biological Data Within Clinical Studies. Current Trends in Database Technology – EDBT 2006 2006. [DOI: 10.1007/11896548_31] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023]
|
29
|
Greiner U, Mueller R, Rahm E, Ramsch J, Heller B, Loeffler M. AdaptFlow: protocol-based medical treatment using adaptive workflows. Methods Inf Med 2005; 44:80-8. [PMID: 15778798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
OBJECTIVES In many medical domains, investigator-initiated clinical trials are used to introduce new treatments and hence act as implementations of guideline-based therapies. Trial protocols contain detailed instructions to conduct the therapy and additionally specify reactions to exceptional situations (for instance an infection or a toxicity). To increase quality in health care and raise the number of patients treated according to trial protocols, a consultation system is needed that efficiently supports the handling of the complex trial therapy processes. Our objective was to design and evaluate a consultation system that 1) observes the status of the therapies currently being applied, 2) offers automatic recognition of exceptional situations and appropriate decision support and 3) provides an automatic adaptation of affected therapy processes to handle exceptional situations. METHODS We applied a hybrid approach that combines process support for the timely and efficient execution of therapy processes, as offered by workflow management systems, with a knowledge and rule base and a mechanism for dynamic workflow adaptation that changes running therapy processes when a patient's condition changes. RESULTS AND CONCLUSIONS This approach has been implemented in the AdaptFlow prototype. We performed several evaluation studies on the practicability of the approach and the usefulness of the system. These studies show that the AdaptFlow prototype offers adequate support for the execution of real-world investigator-initiated trial protocols and is able to handle a large number of exceptions.
Affiliation(s)
- U Greiner
- Department of Computer Science, University of Leipzig, Germany.
|
31
|
Greiner U, Ramsch J, Heller B, Löffler M, Müller R, Rahm E. Adaptive guideline-based treatment workflows with AdaptFlow. Stud Health Technol Inform 2004; 101:113-7. [PMID: 15537211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
One goal in modern medicine is to increase treatment quality. A major step towards this aim is to support the execution of standardized, guideline-based clinical protocols, which are used in many medical domains, e.g., for oncological chemotherapies. Standardized chemotherapy protocols contain detailed and structured therapy plans describing the single therapy steps (e.g., examinations or drug applications). Therefore, workflow management systems offer good support for these processes. However, the treatment of a particular patient often requires modifications due to unexpected infections, toxicities, or social factors. The modifications are described in the treatment protocol but not as part of the standard process. To continue executing the therapy workflows when exceptions occur, running workflows have to be adapted dynamically. Furthermore, the physician should be supported by automated exception detection and by decision support for deriving the necessary modifications. The AdaptFlow prototype offers the required support for the field of oncological chemotherapies by enhancing a workflow system with dynamic workflow adaptation and rule-based decision support for exception detection and handling.
|
35
|
Rahm E, Ferguson D. Cache management for shared sequential data access. Inform Syst 1993. [DOI: 10.1016/0306-4379(93)90017-u] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
37
|
Rahm E. [Fluoride tablets: who has the responsibility?]. Zahntechnik (Zur) 1970; 28:362-3. [PMID: 4250952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
|