1
|
Raboudi A, Allanic M, Balvay D, Hervé PY, Viel T, Yoganathan T, Certain A, Hilbey J, Charlet J, Durupt A, Boutinaud P, Eynard B, Tavitian B. The BMS-LM ontology for biomedical data reporting throughout the lifecycle of a research study: From data model to ontology. J Biomed Inform 2022; 127:104007. [DOI: 10.1016/j.jbi.2022.104007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2021] [Revised: 12/24/2021] [Accepted: 01/28/2022] [Indexed: 11/16/2022]
|
2
|
Schröder M, Staehlke S, Groth P, Nebe JB, Spors S, Krüger F. Structure-based knowledge acquisition from electronic lab notebooks for research data provenance documentation. J Biomed Semantics 2022; 13:4. [PMID: 35101121 PMCID: PMC8802522 DOI: 10.1186/s13326-021-00257-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/01/2021] [Accepted: 12/07/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND: Electronic Laboratory Notebooks (ELNs) are used to document experiments and investigations in the wet-lab. Protocols in ELNs contain a detailed description of the conducted steps, including the information necessary to understand the procedure and the raised research data as well as to reproduce the research investigation. The purpose of this study is to investigate whether such ELN protocols can be used to create semantic documentation of the provenance of research data by the use of ontologies and linked data methodologies. METHODS: Based on an ELN protocol of a biomedical wet-lab experiment, a retrospective provenance model of the raised research data, describing the details of the experiment in a machine-interpretable way, is manually engineered. Furthermore, an automated approach for knowledge acquisition from ELN protocols is derived from these results. This structure-based approach exploits the structure in the experiment's description, such as headings, tables, and links, to translate the ELN protocol into a semantic knowledge representation. To satisfy the Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles, a ready-to-publish bundle is created that contains the research data together with their semantic documentation. RESULTS: While the manual modelling efforts serve as proof of concept by employing one protocol, the automated structure-based approach demonstrates the potential generalisation with seven ELN protocols. For each of those protocols, a ready-to-publish bundle is created and, by employing the SPARQL query language, it is illustrated that questions about the processes and the obtained research data can be answered. CONCLUSIONS: The semantic documentation of research data obtained from the ELN protocols allows for the representation of the retrospective provenance of research data in a machine-interpretable way.
Research Object Crate (RO-Crate) bundles including these models enable researchers to easily share the research data including the corresponding documentation, but also to search the experiments and relate them to each other.
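As a rough illustration of the "ready-to-publish bundle" idea described in this abstract, the sketch below builds a minimal RO-Crate 1.1 style metadata document in Python. It is not taken from the cited paper: the function name `build_ro_crate_metadata` and the file names `protocol.ttl` and `measurements.csv` are hypothetical, and a real crate would normally carry further properties such as a licence and publication date.

```python
import json


def build_ro_crate_metadata(name, description, files):
    """Return a minimal RO-Crate 1.1 style metadata document as a dict.

    `files` is a list of (relative_path, human_readable_name) pairs.
    """
    graph = [
        # Metadata descriptor: points at the root dataset of the crate.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: lists every file bundled in the crate.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": path} for path, _ in files],
        },
    ]
    # One File entity per bundled file.
    for path, label in files:
        graph.append({"@id": path, "@type": "File", "name": label})
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical bundle: research data plus its semantic documentation.
crate = build_ro_crate_metadata(
    name="Example wet-lab experiment",
    description="Research data with provenance documentation",
    files=[
        ("protocol.ttl", "Provenance model of the experiment (RDF)"),
        ("measurements.csv", "Raw measurement data"),
    ],
)
print(json.dumps(crate, indent=2))
```

Writing this dictionary to a file named `ro-crate-metadata.json` next to the listed files is what would turn a plain directory into a crate; the provenance model itself would live in the bundled RDF file.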
Affiliation(s)
- Max Schröder
  - Institute of Communications Engineering, University of Rostock, Rostock, Germany
  - University Library, University of Rostock, Rostock, Germany
- Susanne Staehlke
  - Department of Cell Biology, University Medical Center Rostock, Rostock, Germany
- Paul Groth
  - Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
- J. Barbara Nebe
  - Department of Cell Biology, University Medical Center Rostock, Rostock, Germany
  - Department Life, Light & Matter, University of Rostock, Rostock, Germany
- Sascha Spors
  - Institute of Communications Engineering, University of Rostock, Rostock, Germany
- Frank Krüger
  - Institute of Communications Engineering, University of Rostock, Rostock, Germany
  - Department Knowledge, Culture & Transformation, University of Rostock, Rostock, Germany
|
3
|
Naderi N, Knafou J, Copara J, Ruch P, Teodoro D. Ensemble of Deep Masked Language Models for Effective Named Entity Recognition in Health and Life Science Corpora. Front Res Metr Anal 2021; 6:689803. [PMID: 34870074 PMCID: PMC8640190 DOI: 10.3389/frma.2021.689803] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/01/2021] [Accepted: 10/11/2021] [Indexed: 11/13/2022]
Abstract
The health and life science domains are well known for their wealth of named entities found in large free text corpora, such as scientific literature and electronic health records. To unlock the value of such corpora, named entity recognition (NER) methods are proposed. Inspired by the success of transformer-based pretrained models for NER, we assess how individual deep masked language models and ensembles of them perform across corpora of different health and life science domains (biology, chemistry, and medicine) available in different languages (English and French). Individual deep masked language models, pretrained on external corpora, are fine-tuned on task-specific domain and language corpora and ensembled using classical majority voting strategies. Experiments show statistically significant improvement of the ensemble models over an individual BERT-based baseline model, with an overall best performance of 77% macro F1-score. We further perform a detailed analysis of the ensemble results and show how their effectiveness changes according to entity properties, such as length, corpus frequency, and annotation consistency. The results suggest that ensembles of deep masked language models are an effective strategy for tackling NER across corpora from the health and life science domains.
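The majority-voting step mentioned in this abstract can be sketched as follows. This is a minimal token-level illustration, not the authors' implementation: the abstract does not specify the exact voting scheme, so the function name `majority_vote`, the tie-breaking rule (the earliest model in the input order wins), and the example tags are all assumptions.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token label sequences from several models by majority vote.

    `predictions[i][j]` is the label that model i assigns to token j.
    Ties are broken in favour of the earliest model in the input order
    (an assumption made for this sketch).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must label the same number of tokens")

    voted = []
    for labels in zip(*predictions):  # labels for one token, one per model
        counts = Counter(labels)
        top = max(counts.values())
        # First label (in model order) that reaches the highest vote count.
        winner = next(label for label in labels if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical outputs of three fine-tuned models for a three-token sentence.
model_outputs = [
    ["B-CHEM", "O", "O"],
    ["B-CHEM", "B-CHEM", "O"],
    ["O", "O", "B-DIS"],
]
ensemble = majority_vote(model_outputs)
print(ensemble)
```

Voting on each token independently is the simplest design, but it can yield label sequences that no single model produced (for example an `I-` tag without a preceding `B-` tag), so a practical system may vote on whole entity spans or repair the sequence afterwards.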
Affiliation(s)
- Nona Naderi
  - Information Science Department, University of Applied Sciences and Arts of Western Switzerland (HES-SO), Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland
- Julien Knafou
  - Information Science Department, University of Applied Sciences and Arts of Western Switzerland (HES-SO), Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland; Computer Science Department, University of Geneva, Geneva, Switzerland
- Jenny Copara
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland; Information Science Department, University of Applied Sciences and Arts of Western Switzerland (HES-SO), Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland
- Patrick Ruch
  - Information Science Department, University of Applied Sciences and Arts of Western Switzerland (HES-SO), Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland
- Douglas Teodoro
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland; Information Science Department, University of Applied Sciences and Arts of Western Switzerland (HES-SO), Geneva, Switzerland; Swiss Institute of Bioinformatics, Geneva, Switzerland
|
4
|
Abstract
Experimental protocols are key when planning, performing and publishing research in many disciplines, especially in relation to the reporting of materials and methods. However, they vary in their content, structure and associated data elements. This article presents a guideline for describing key content for reporting experimental protocols in the domain of life sciences, together with the methodology followed in order to develop such a guideline. As part of our work, we propose a checklist that contains 17 data elements that we consider fundamental to facilitate the execution of the protocol. These data elements are formally described in the SMART Protocols ontology. By providing guidance for the key content to be reported, we aim (1) to make it easier for authors to report experimental protocols with necessary and sufficient information that allows others to reproduce an experiment, (2) to promote consistency across laboratories by delivering an adaptable set of data elements, and (3) to make it easier for reviewers and editors to measure the quality of submitted manuscripts against established criteria. Our checklist focuses on the content, that is, on what should be included. Rather than advocating a specific format for protocols in life sciences, the checklist includes a full description of the key data elements that facilitate the execution of the protocol.
Affiliation(s)
- Olga Giraldo
  - Ontology Engineering Group, Campus de Montegancedo, Boadilla del Monte, Universidad Politécnica de Madrid, Madrid, Spain
- Alexander Garcia
  - Ontology Engineering Group, Campus de Montegancedo, Boadilla del Monte, Universidad Politécnica de Madrid, Madrid, Spain
  - Technische Universität Graz, Graz, Austria
- Oscar Corcho
  - Ontology Engineering Group, Campus de Montegancedo, Boadilla del Monte, Universidad Politécnica de Madrid, Madrid, Spain
|
5
|
Abstract
Robotic automation in synthetic biology is especially relevant for liquid handling to facilitate complex experiments. However, research tasks that are not highly standardized are still rarely automated in practice. Two main reasons for this are the substantial investments required to translate molecular biological protocols into robot programs, and the fact that the resulting programs are often too specific to be easily reused and shared. Recent developments of standardized protocols and dedicated programming languages for liquid-handling operations addressed some aspects of ease-of-use and portability of protocols. However, either they focus on simplicity, at the expense of enabling complex protocols, or they entail detailed programming, with corresponding skills and efforts required from the users. To reconcile these trade-offs, we developed Roboliq, a software system that uses artificial intelligence (AI) methods to integrate (i) generic formal, yet intuitive, protocol descriptions, (ii) complete, but usually hidden, programming capabilities, and (iii) user-system interactions to automatically generate executable, optimized robot programs. Roboliq also enables high-level specifications of complex tasks with conditional execution. To demonstrate the system's benefits for experiments that are difficult to perform manually because of their complexity, duration, or time-critical nature, we present three proof-of-principle applications for the reproducible, quantitative characterization of GFP variants.
Affiliation(s)
- Ellis Whitehead
  - Department of Biosystems Science and Engineering, ETH Zurich and SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland
- Fabian Rudolf
  - Department of Biosystems Science and Engineering, ETH Zurich and SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland
- Hans-Michael Kaltenbach
  - Department of Biosystems Science and Engineering, ETH Zurich and SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland
- Jörg Stelling
  - Department of Biosystems Science and Engineering, ETH Zurich and SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland
|
6
|
Abstract
Background: An experimental protocol is a sequence of tasks and operations executed to perform experimental research in biological and biomedical areas, e.g. biology, genetics, immunology, neurosciences, virology. Protocols often include references to equipment, reagents, descriptions of critical steps, troubleshooting and tips, as well as any other information that researchers deem important for facilitating the reusability of the protocol. Although experimental protocols are central to reproducibility, the descriptions are often cursory. There is a need for a unified framework with respect to the syntactic structure and the semantics for representing experimental protocols. Results: In this paper we present the “SMART Protocols ontology”, an ontology for representing experimental protocols. Our ontology represents the protocol as a workflow with domain-specific knowledge embedded within a document. We also present the Sample Instrument Reagent Objective (SIRO) model, which represents the minimal common information shared across experimental protocols. SIRO was conceived in the same realm as the Patient Intervention Comparison Outcome (PICO) model that supports search, retrieval and classification purposes in evidence-based medicine. We evaluate our approach against a set of competency questions modeled as SPARQL queries and processed against a set of published and unpublished protocols modeled with the SP Ontology and the SIRO model. Our approach makes it possible to answer queries such as “Which protocols use tumor tissue as a sample?” Conclusion: Improving reporting structures for experimental protocols requires collective efforts from authors, peer reviewers, editors and funding bodies. The SP Ontology is a contribution towards this goal. We build upon previous experience and bring together the views of researchers managing protocols in their laboratory work. Website: https://smartprotocols.github.io/.
Affiliation(s)
- Olga Giraldo
  - Ontology Engineering Group, Madrid, Universidad Politécnica de Madrid, Madrid, 28660, Spain
- Alexander García
  - Ontology Engineering Group, Madrid, Universidad Politécnica de Madrid, Madrid, 28660, Spain
- Oscar Corcho
  - Ontology Engineering Group, Madrid, Universidad Politécnica de Madrid, Madrid, 28660, Spain
|
7
|
|
8
|
Charlet J, Darmoni SJ. Knowledge Representation and Management. From Ontology to Annotation. Findings from the Yearbook 2015 Section on Knowledge Representation and Management. Yearb Med Inform 2015; 10:134-6. [PMID: 26293860 DOI: 10.15265/iy-2015-038] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
OBJECTIVE: To summarize the best papers in the field of Knowledge Representation and Management (KRM). METHODS: A comprehensive review of medical informatics literature was performed to select some of the most interesting KRM papers published in 2014. RESULTS: Four articles were selected. Two focused on annotation and information retrieval using an ontology. The other two focused mainly on ontologies: one dealt with the use of a temporal ontology to analyze the content of narrative documents, and the other described a methodology for building multilingual ontologies. CONCLUSION: Semantic models began to show their efficiency, coupled with annotation tools.
Affiliation(s)
- J Charlet
  - Dr Jean Charlet, LIMICS - INSERM U1142, Campus des Cordeliers, 15, rue de l'école de médecine, 75006 Paris, France, Tel. +33 1 44 27 91 09, E-mail:
|
9
|
Duncan J, Eilbeck K, Narus SP, Clyde S, Thornton S, Staes C. Building an Ontology for Identity Resolution in Healthcare and Public Health. Online J Public Health Inform 2015; 7:e219. [PMID: 26392849 PMCID: PMC4576444 DOI: 10.5210/ojphi.v7i2.6010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
Abstract
Integration of disparate information from electronic health records, clinical data warehouses, birth certificate registries and other public health information systems offers great potential for clinical care, public health practice, and research. Such integration, however, depends on correctly matching patient-specific records using demographic identifiers. Without standards for these identifiers, record linkage is complicated by issues of structural and semantic heterogeneity. OBJECTIVES: Our objectives were to: 1) identify components of identity and events subsequent to birth that result in creation, change, or sharing of identity information; 2) develop an ontology to facilitate data integration from multiple healthcare and public health sources; and 3) validate the ontology's ability to model identity-changing events over time. METHODS: We interviewed domain experts in area hospitals and public health programs and developed process models describing the creation and transmission of identity information among various organizations for activities subsequent to a birth event. We searched for existing relevant ontologies. We validated the content of our ontology with simulated identity information conforming to scenarios identified in our process models. RESULTS: We chose the Simple Event Model (SEM) to describe events in early childhood and integrated the Clinical Element Model (CEM) for demographic information. We demonstrated the ability of the combined SEM-CEM ontology to model identity events over time. CONCLUSION: The use of an ontology can overcome issues of semantic and syntactic heterogeneity to facilitate record linkage.
Affiliation(s)
- Jeffrey Duncan
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- Karen Eilbeck
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- Scott P. Narus
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
  - Intermountain Healthcare, Salt Lake City, UT USA
- Stephen Clyde
  - Department of Computer Science, Utah State University, Logan, UT USA
- Sidney Thornton
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
  - Intermountain Healthcare, Salt Lake City, UT USA
- Catherine Staes
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
|
10
|
Abstract
The thirteenth NETTAB workshop, NETTAB 2013, was devoted to semantic, social, and mobile applications for bioinformatics and biomedical laboratories. Topics included issues, methods, algorithms, and technologies for the design and development of tools and platforms able to provide semantic, social, and mobile applications supporting bioinformatics and the activities carried out in a biomedical laboratory. About 30 scientific contributions were presented at NETTAB 2013, including keynote and tutorial talks, oral communications, and posters. The best contributions presented at the workshop were later submitted to a special Call for this Supplement. Here, we provide an overview of the workshop and introduce the manuscripts that have been accepted for publication in this Supplement.
|