1
Vogt L, Strömert P, Matentzoglu N, Karam N, Konrad M, Prinz M, Baum R. Suggestions for extending the FAIR Principles based on a linguistic perspective on semantic interoperability. Sci Data 2025; 12:688. [PMID: 40274834 PMCID: PMC12022272 DOI: 10.1038/s41597-025-05011-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 04/15/2025] [Indexed: 04/26/2025] Open
Abstract
FAIR (meta)data presuppose their successful communication between machines and humans while preserving meaning and reference. The FAIR Guiding Principles lack specificity regarding semantic interoperability. We adopt a linguistic perspective on semantic interoperability and investigate the structures and conventions ensuring reliable communication of textual information, drawing parallels with data structures by understanding both as models. We propose a conceptual model of semantic interoperability, comprising intensional and extensional terminological interoperability, as well as logical and schema propositional interoperability. Since there cannot be a universally accepted best vocabulary and best (meta)data schema, establishing semantic interoperability necessitates the provision of comprehensive sets of intensional and extensional entity mappings and schema crosswalks. In accordance with our conceptual model, we suggest additions to the FAIR Guiding Principles that encompass the requirements for semantic interoperability. Additionally, we argue that attaining FAIRness of (meta)data requires not only their organization into FAIR Digital Objects, but also the establishment of a FAIR ecosystem of FAIR Services, that include a terminology, a schema, and an operations service.
Affiliation(s)
- Lars Vogt
  - TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany
- Philip Strömert
  - TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany
- Naouel Karam
  - Institute for Applied Informatics (InfAI), University of Leipzig, Leipzig, Germany
- Marcel Konrad
  - TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany
- Manuel Prinz
  - TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany
- Roman Baum
  - ZB MED - Information Centre for Life Sciences, Gleueler Straße 60, 50931, Cologne, Germany
2
Duesing S, Bennett J, Overton JA, Vita R, Peters B. Standardizing free-text data exemplified by two fields from the Immune Epitope Database. J Biomed Semantics 2025; 16:5. [PMID: 40121509 PMCID: PMC11929277 DOI: 10.1186/s13326-025-00324-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Accepted: 02/25/2025] [Indexed: 03/25/2025] Open
Abstract
BACKGROUND: While unstructured data, such as free text, constitutes a large amount of publicly available biomedical data, it is underutilized in automated analyses due to the difficulty of extracting meaning from it. Normalizing free-text data, i.e., removing inessential variance, enables the use of structured vocabularies like ontologies to represent the data and allow for harmonized queries over it. This paper presents an adaptable tool for free-text normalization and an evaluation of the application of this tool to two different fields curated from the literature in the Immune Epitope Database (IEDB): "age" and "data-location" (the part of a paper in which data was found). RESULTS: Free text entries for the database fields for subject age (4095 distinct values) and publication data-location (251,810 distinct values) in the IEDB were analyzed. Normalization was performed in three steps, namely character normalization, word normalization, and phrase normalization, using generalizable rules developed and applied with the tool presented in this manuscript. For the age dataset, in the character stage, the application of 21 rules resulted in 99.97% output validity; in the word stage, the application of 94 rules resulted in 98.06% output validity; and in the phrase stage, the application of 16 rules resulted in 83.81% output validity. For the data-location dataset, in the character stage, the application of 39 rules resulted in 99.99% output validity; in the word stage, the application of 187 rules resulted in 98.46% output validity; and in the phrase stage, the application of 12 rules resulted in 97.95% output validity. CONCLUSIONS: We developed a generalizable approach for normalization of free text as found in database fields with content on a specific topic. Creating and testing the rules took a one-time effort for a given field that can now be applied to data as it is being curated. The standardization achieved in two datasets tested produces significantly reduced variance in the content which enhances the findability and usability of that data, chiefly by improving search functionality and enabling linkages with formal ontologies.
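The three-stage normalization described in this abstract (character, then word, then phrase rules, followed by a validity check) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' tool: the rule sets, the `VALID` pattern, and all function names are hypothetical and far smaller than the 21/94/16 rules reported for the age field.

```python
import re
import unicodedata

# Hypothetical character-level rules: unify dash variants, collapse whitespace.
CHAR_RULES = [
    (re.compile(r"[\u2010-\u2015]"), "-"),
    (re.compile(r"\s+"), " "),
]

# Hypothetical word-level rules: map abbreviations to one canonical word.
WORD_RULES = {"yrs": "years", "yr": "year", "wks": "weeks", "wk": "week", "mos": "months"}

# Hypothetical phrase-level rules: bring ranges into a single form.
PHRASE_RULES = [
    (re.compile(r"^(\d+) ?- ?(\d+) (\w+)$"), r"\1-\2 \3"),
    (re.compile(r"^(\d+) to (\d+) (\w+)$"), r"\1-\2 \3"),
]

# Hypothetical definition of a "valid" normalized age value.
VALID = re.compile(r"^\d+(-\d+)? (day|week|month|year)s?$")


def normalize_characters(text):
    """Stage 1: Unicode normalization plus character-level rules."""
    text = unicodedata.normalize("NFKC", text)
    for pattern, replacement in CHAR_RULES:
        text = pattern.sub(replacement, text)
    return text.strip().lower()


def normalize_words(text):
    """Stage 2: replace each known word variant with its canonical form."""
    return " ".join(WORD_RULES.get(word, word) for word in text.split(" "))


def normalize_phrase(text):
    """Stage 3: rewrite whole-phrase patterns."""
    for pattern, replacement in PHRASE_RULES:
        text = pattern.sub(replacement, text)
    return text


def normalize(text):
    """Apply the three stages in order."""
    return normalize_phrase(normalize_words(normalize_characters(text)))


def output_validity(values):
    """Fraction of normalized values that match the VALID pattern."""
    normalized = [normalize(value) for value in values]
    return sum(bool(VALID.match(value)) for value in normalized) / len(normalized)
```

Under these assumed rules, `normalize("6 to 8 yrs")` yields `"6-8 years"`, while a value such as `"adult"` passes through unchanged and counts as invalid in `output_validity`. Running the stages in a fixed order keeps each rule set small, because later rules can assume earlier variance has already been removed.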
Affiliation(s)
- Sebastian Duesing
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
- Jason Bennett
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
- Randi Vita
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
- Bjoern Peters
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA, 92037, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA, 92093, USA
3
Tan SZK, Baksi S, Bjerregaard TG, Elangovan P, Gopalakrishnan TK, Hric D, Joumaa J, Li B, Rabbani K, Venkatesan SK, Valdez JD, Kuriakose SV. Digital evolution: Novo Nordisk's shift to ontology-based data management. J Biomed Semantics 2025; 16:6. [PMID: 40121504 PMCID: PMC11929979 DOI: 10.1186/s13326-025-00327-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Accepted: 03/10/2025] [Indexed: 03/25/2025] Open
Abstract
The amount of biomedical data is growing, and managing it is increasingly challenging. While Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises like pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transformation in Novo Nordisk Research & Early Development. Here, we include both our technical blueprint and our approach for organizational change management. We further discuss how such an OBDM ecosystem plays a pivotal role in the organization's digital aspirations for data federation and discovery fuelled by artificial intelligence. Our aim for this paper is to share the lessons learned in order to foster dialogue with parties navigating similar waters while collectively advancing the efforts in the fields of data management, semantics and data driven drug discovery.
Affiliation(s)
- Shounak Baksi
  - Novo Nordisk A/S, Novo Nordisk Park 1, Måløv, 2760, Denmark
- Darko Hric
  - Novo Nordisk A/S, Novo Nordisk Park 1, Måløv, 2760, Denmark
- Joffrey Joumaa
  - Novo Nordisk A/S, Novo Nordisk Park 1, Måløv, 2760, Denmark
- Beidi Li
  - Novo Nordisk A/S, Novo Nordisk Park 1, Måløv, 2760, Denmark
- Kashif Rabbani
  - Novo Nordisk A/S, Novo Nordisk Park 1, Måløv, 2760, Denmark
4
Braun I, Hartley E, Olson D, Matentzoglu N, Schaper K, Walls R, Vasilevsky N. Increased discoverability of rare disease datasets through knowledge graph integration. JAMIA Open 2025; 8:ooaf001. [PMID: 39926165 PMCID: PMC11806703 DOI: 10.1093/jamiaopen/ooaf001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 12/23/2024] [Accepted: 01/30/2025] [Indexed: 02/11/2025] Open
Abstract
Objectives: Demonstrate a methodology for improving discoverability of rare disease datasets by enriching source data with biological associations. Materials and Methods: We developed an extension of the Biolink semantic model to incorporate patient data and generated a knowledge graph (KG) comprising patient data and associations between biological entities in an existing KG, leveraging existing mappings and mapping standards. Results: The enriched model of patient data can support a search application that is aware of biological associations and provides a semantic search interface to discover and summarize patient datasets within the broader biological context. Discussion and Conclusion: Our methodology enriches datasets with a wealth of additional biological knowledge, improving discoverability. Using condition concepts, we illustrate techniques that could be applied to other entities within source data such as measurements and observations. This work provides a foundational framework for how source data can be modeled to improve accuracy of upstream language models for natural language querying.
Affiliation(s)
- Ian Braun
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, United States
- Emily Hartley
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, United States
- Daniel Olson
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, United States
- Kevin Schaper
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Ramona Walls
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, United States
- Nicole Vasilevsky
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, United States
5
Lin CL, Huang PC, Gräßle S, Grathwol C, Tremouilhac P, Vanderheiden S, Hodapp P, Herres-Pawlis S, Hoffmann A, Fink F, Manolikakes G, Opatz T, Link A, Marques MMB, Daumann LJ, Tsotsalas M, Biedermann F, Mutlu H, Täuscher E, Bach F, Drees T, Neumann S, Harivyasi SS, Jung N, Bräse S. Linking Research Data with Physically Preserved Research Materials in Chemistry. Sci Data 2025; 12:130. [PMID: 39843501 PMCID: PMC11754846 DOI: 10.1038/s41597-025-04404-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 01/03/2025] [Indexed: 01/24/2025] Open
Abstract
Results of scientific work in chemistry can usually be obtained in the form of materials and data. A big step towards transparency and reproducibility of the scientific work can be gained if scientists publish their data in research data repositories in a FAIR manner. Nevertheless, in order to make chemistry a sustainable discipline, obtaining FAIR data is insufficient and a comprehensive concept that includes preservation of materials is needed. In order to offer a comprehensive infrastructure to find and access data and materials that were generated in chemistry projects, we combined the infrastructure Chemotion repository with an archive for chemical compounds. Samples play a key role in this concept: we describe how FAIR metadata of a virtual sample representation can be used to refer to a physically available sample in a materials' archive and to link it with the FAIR research data gained using the said sample. We further describe the measures to make the physically available samples not only FAIR through their metadata but also findable, accessible and reusable.
Affiliation(s)
- Chia-Lin Lin
  - Institute of Biological and Chemical Systems - Functional Molecular Systems (IBCS-FMS), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Pei-Chi Huang
  - Institute of Biological and Chemical Systems - Functional Molecular Systems (IBCS-FMS), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Simone Gräßle
  - Institute of Biological and Chemical Systems - Functional Molecular Systems (IBCS-FMS), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Christoph Grathwol
  - Institute of Biological and Chemical Systems - Functional Molecular Systems (IBCS-FMS), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Pierre Tremouilhac
  - Institute of Biological and Chemical Systems - Functional Molecular Systems (IBCS-FMS), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Sylvia Vanderheiden
  - Institute of Biological and Chemical Systems - Functional Molecular Systems (IBCS-FMS), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Patrick Hodapp
  - Institute for Biological Interfaces 3 - Soft Matter Laboratory (IBG 3 - SML), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Sonja Herres-Pawlis
  - RWTH Aachen University, Institute of Inorganic Chemistry, Landoltweg 1a, 52074, Aachen, Germany
- Alexander Hoffmann
  - RWTH Aachen University, Institute of Inorganic Chemistry, Landoltweg 1a, 52074, Aachen, Germany
- Fabian Fink
  - RWTH Aachen University, Institute of Inorganic Chemistry, Landoltweg 1a, 52074, Aachen, Germany
- Georg Manolikakes
  - RPTU Kaiserslautern-Landau, Department Chemie, Erwin-Schrödinger-Str. Geb. 54, 67663, Kaiserslautern, Germany
- Till Opatz
  - JGU Mainz, Department Chemie, Duesbergweg 10-14, 55128, Mainz, Germany
- Andreas Link
  - Universität Greifswald, Institut für Pharmazie, Friedrich-Ludwig-Jahn-Str. 17, 17489, Greifswald, Germany
- M Manuel B Marques
  - LAQV-REQUIMTE, Department of Chemistry, NOVA School of Science and Technology, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
- Lena J Daumann
  - Chair of Bioinorganic Chemistry, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 13, 40225, Düsseldorf, Germany
- Manuel Tsotsalas
  - Institute of Functional Interfaces (IFG), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Frank Biedermann
  - Institute of Nanotechnology (INT), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Hatice Mutlu
  - Institut de Science des Matériaux de Mulhouse, UMR 7361 CNRS/Université de Haute Alsace, 15 rue Jean Starcky, Mulhouse Cedex, 68057, France
- Eric Täuscher
  - Technische Universität Ilmenau, Institut für Chemie und Biotechnik, Weimarer Straße 25, 98693, Ilmenau, Germany
- Felix Bach
  - FIZ Karlsruhe - Leibniz-Institut für Informationsinfrastruktur GmbH, Hermann-von-Helmholtz-Platz 1, 76344, Eggenstein-Leopoldshafen, Germany
- Tim Drees
  - Legal Affairs, Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Steffen Neumann
  - Leibniz Institute of Plant Biochemistry, Computational Plant Biochemistry group, Halle, Germany
- Shashank S Harivyasi
  - Institute of Biological and Chemical Systems - Functional Molecular Systems (IBCS-FMS), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Nicole Jung
  - Institute of Biological and Chemical Systems - Functional Molecular Systems (IBCS-FMS), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
  - Karlsruhe Nano Micro Facility (KNMFi), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
- Stefan Bräse
  - Institute of Biological and Chemical Systems - Functional Molecular Systems (IBCS-FMS), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
  - Institute of Organic Chemistry (IOC), Karlsruhe Institute of Technology, Kaiserstraße 12, 76131, Karlsruhe, Germany
6
Kyoda K, Itoga H, Yamagata Y, Fujisawa E, Wang F, Miranda-Miranda M, Yamamoto H, Nakano Y, Tohsato Y, Onami S. SSBD: an ecosystem for enhanced sharing and reuse of bioimaging data. Nucleic Acids Res 2025; 53:D1716-D1723. [PMID: 39479781 PMCID: PMC11701685 DOI: 10.1093/nar/gkae860] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2024] [Revised: 09/07/2024] [Accepted: 09/21/2024] [Indexed: 01/18/2025] Open
Abstract
SSBD (https://ssbd.riken.jp) is a platform for the sharing and reuse of bioimaging data. As part of efforts to build a bioimaging data ecosystem, SSBD has recently been updated to a two-tiered data resource comprising SSBD:repository, a public repository for the sharing of all types of bioimaging data reported in journals, and SSBD:database, an added-value database for the sharing of curated, highly reusable, metadata-rich data. This update addresses the conflicting demands of rapid data publication and sharing of richly annotated data, thereby promoting bioimaging data sharing and reuse. With this update, SSBD is now positioned as a core repository and database within the foundingGIDE, an international consortium working to establish a global image data ecosystem. Harmonizing metadata between data resources enables cross-searching and data exchange with data resources from other countries and regions.
Affiliation(s)
- Koji Kyoda
  - Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Hiroya Itoga
  - Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Yuki Yamagata
  - Life Science Data Sharing Unit, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
  - Integrated Bioresource Information Division, RIKEN Bioresource Research Center, 3-1-1 Koyadai, Tsukuba, Ibaraki 350-0074, Japan
- Emi Fujisawa
  - Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Fangfang Wang
  - Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
  - Life Science Data Sharing Unit, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Miguel Miranda-Miranda
  - Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Haruna Yamamoto
  - Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Yasue Nakano
  - Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- Yukako Tohsato
  - Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
  - Faculty of Information Science and Engineering, Ritsumeikan University, 2-150 Iwakura-cho, Ibaraki, Osaka 567-8570, Japan
- Shuichi Onami
  - Laboratory for Developmental Dynamics, RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
  - Life Science Data Sharing Unit, RIKEN Information R&D and Strategy Headquarters, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
7
Vita R, Blazeska N, Marrama D, Duesing S, Bennett J, Greenbaum J, De Almeida Mendes M, Mahita J, Wheeler D, Cantrell J, Overton J, Natale D, Sette A, Peters B. The Immune Epitope Database (IEDB): 2024 update. Nucleic Acids Res 2025; 53:D436-D443. [PMID: 39558162 PMCID: PMC11701597 DOI: 10.1093/nar/gkae1092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 10/11/2024] [Accepted: 10/24/2024] [Indexed: 11/20/2024] Open
Abstract
Over the past 20 years, the Immune Epitope Database (IEDB, iedb.org) has established itself as the foremost resource for immune epitope data. The IEDB catalogs published epitopes and their contextual experimental data in a freely searchable public resource. The IEDB team manually curates data from the literature into a structured format and spans infectious, allergic, autoimmune, and transplant diseases. Here, we describe the enhancements made since our 2018 paper, capturing user-directed updates to the search interface, advanced data exports, increases in data quality, and improved interoperability across related resources. As we look forward to the next 20 years, we are confident in our ability to meet the needs of our users and to contribute to the broader field of data standardization.
Affiliation(s)
- Randi Vita
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA 92037, USA
- Nina Blazeska
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA 92037, USA
- Daniel Marrama
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA 92037, USA
- Sebastian Duesing
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA 92037, USA
- Jason Bennett
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA 92037, USA
- Jason Greenbaum
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA 92037, USA
- Jarjapu Mahita
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA 92037, USA
- Darren A Natale
  - Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
- Alessandro Sette
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA 92037, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Bjoern Peters
  - Center for Vaccine Innovation, La Jolla Institute for Immunology, La Jolla, CA 92037, USA
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
8
Tabatabaei Hosseini SA, Kazemzadeh R, Foster BJ, Arpali E, Süsal C. New Tools for Data Harmonization and Their Potential Applications in Organ Transplantation. Transplantation 2024; 108:2306-2317. [PMID: 38755748 PMCID: PMC11581435 DOI: 10.1097/tp.0000000000005048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 03/15/2024] [Accepted: 03/19/2024] [Indexed: 05/18/2024]
Abstract
In organ transplantation, accurate analysis of clinical outcomes requires large, high-quality data sets. Not only are outcomes influenced by a multitude of factors such as donor, recipient, and transplant characteristics and posttransplant events but they may also change over time. Although large data sets already exist and are continually expanding in transplant registries and health institutions, these data are rarely combined for analysis because of a lack of harmonization. Promoted by the digitalization of the healthcare sector, effective data harmonization tools became available, with potential applications also for organ transplantation. We discuss herein the present problems in the harmonization of organ transplant data and offer solutions to enhance its accuracy through the use of emerging new tools. To overcome the problem of inadequate representation of transplantation-specific terms, ontologies and common data models particular to this field could be created and supported by a consortium of related stakeholders to ensure their broad acceptance. Adopting clear data-sharing policies can diminish administrative barriers that impede collaboration between organizations. Secure multiparty computation frameworks and the artificial intelligence (AI) approach federated learning can facilitate decentralized and harmonized analysis of data sets, without sharing sensitive data and compromising patient privacy. A common image data model built upon a standardized format would be beneficial to AI-based analysis of pathology images. Implementation of these promising new tools and measures, ideally with the involvement and support of transplant societies, is expected to produce improved integration and harmonization of transplant data and greater accuracy in clinical decision-making, enabling improved patient outcomes.
Affiliation(s)
- Reza Kazemzadeh
  - Transplant Immunology Research Center of Excellence, Koç University Hospital, Istanbul, Turkey
- Bethany Joy Foster
  - Department of Pediatrics, McGill University, Montreal, QC, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
  - Research Institute of the McGill University Health Centre, McGill University, Montreal, QC, Canada
- Emre Arpali
  - Transplant Immunology Research Center of Excellence, Koç University Hospital, Istanbul, Turkey
- Caner Süsal
  - Transplant Immunology Research Center of Excellence, Koç University Hospital, Istanbul, Turkey
9
Duesing S, Bennett J, Overton JA, Vita R, Peters B. Standardizing Free-Text Data Exemplified by Age and Data-Location Fields in the Immune Epitope Database. RESEARCH SQUARE 2024:rs.3.rs-5363542. [PMID: 39606440 PMCID: PMC11601825 DOI: 10.21203/rs.3.rs-5363542/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Background: While unstructured data, such as free text, constitutes a large amount of publicly available biomedical data, it is underutilized in automated analyses due to the difficulty of extracting meaning from it. Normalizing free-text data, i.e., removing inessential variance, enables the use of structured vocabularies like ontologies to represent the data and allow for harmonized queries over it. This paper presents an adaptable tool for free-text normalization and an evaluation of the application of this tool to two different sets of unstructured biomedical data curated from the literature in the Immune Epitope Database (IEDB): age and data-location. Results: Free text entries for the database fields for subject age (4095 distinct values) and publication data-location (251,810 distinct values) in the IEDB were analyzed. Normalization was performed in three steps, namely character normalization, word normalization, and phrase normalization, using generalizable rules developed and applied with the tool presented in this manuscript. For the age dataset, in the character stage, the application of 21 rules resulted in 99.97% output validity; in the word stage, the application of 94 rules resulted in 98.06% output validity; and in the phrase stage, the application of 16 rules resulted in 83.81% output validity. For the data-location dataset, in the character stage, the application of 39 rules resulted in 99.99% output validity; in the word stage, the application of 187 rules resulted in 98.46% output validity; and in the phrase stage, the application of 12 rules resulted in 97.95% output validity. Conclusions: We developed a generalizable approach for normalization of free text as found in database fields with content on a specific topic. Creating and testing the rules took a one-time effort for a given field that can now be applied to data as it is being curated. The standardization achieved in two datasets tested produces significantly reduced variance in the content which enhances the findability and usability of that data, chiefly by improving search functionality and enabling linkages with formal ontologies.
Affiliation(s)
- Randi Vita
  - La Jolla Institute For Allergy & Immunology
10
Toro S, Anagnostopoulos AV, Bello SM, Blumberg K, Cameron R, Carmody L, Diehl AD, Dooley DM, Duncan WD, Fey P, Gaudet P, Harris NL, Joachimiak MP, Kiani L, Lubiana T, Munoz-Torres MC, O'Neil S, Osumi-Sutherland D, Puig-Barbe A, Reese JT, Reiser L, Robb SM, Ruemping T, Seager J, Sid E, Stefancsik R, Weber M, Wood V, Haendel MA, Mungall CJ. Dynamic Retrieval Augmented Generation of Ontologies using Artificial Intelligence (DRAGON-AI). J Biomed Semantics 2024; 15:19. [PMID: 39415214 PMCID: PMC11484368 DOI: 10.1186/s13326-024-00320-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Accepted: 09/08/2024] [Indexed: 10/18/2024] Open
Abstract
BACKGROUND: Ontologies are fundamental components of informatics infrastructure in domains such as biomedical, environmental, and food sciences, representing consensus knowledge in an accurate and computable form. However, their construction and maintenance demand substantial resources and necessitate substantial collaboration between domain experts, curators, and ontology experts. We present Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI), an ontology generation method employing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG). DRAGON-AI can generate textual and logical ontology components, drawing from existing knowledge in multiple ontologies and unstructured text sources. RESULTS: We assessed performance of DRAGON-AI on de novo term construction across ten diverse ontologies, making use of extensive manual evaluation of results. Our method has high precision for relationship generation, but has slightly lower precision than from logic-based reasoning. Our method is also able to generate definitions deemed acceptable by expert evaluators, but these scored worse than human-authored definitions. Notably, evaluators with the highest level of confidence in a domain were better able to discern flaws in AI-generated definitions. We also demonstrated the ability of DRAGON-AI to incorporate natural language instructions in the form of GitHub issues. CONCLUSIONS: These findings suggest DRAGON-AI's potential to substantially aid the manual ontology construction process. However, our results also underscore the importance of having expert curators and ontology editors drive the ontology generation process.
Affiliation(s)
- Sabrina Toro
  - University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kai Blumberg
  - Department of Agriculture, Beltsville Human Nutrition Research Center, Beltsville, MD, USA
- Leigh Carmody
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Petra Fey
  - Northwestern University, Evanston, IL, USA
- Pascale Gaudet
  - SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Nomi L Harris
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Leila Kiani
  - Independent Scientific Information Analyst, Philadelphia, USA
- Shawn O'Neil
  - University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Justin T Reese
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Sofia Mc Robb
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Eric Sid
  - National Center for Advancing Translational Sciences, Bethesda, MD, USA
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Magalie Weber
  - INRAE, French National Research Institute for Agriculture, Food and Environment, UR BIA, Nantes, France
11
|
Turki H, Dossou BFP, Emezue CC, Owodunni AT, Hadj Taieb MA, Ben Aouicha M, Ben Hassen H, Masmoudi A. MeSH2Matrix: combining MeSH keywords and machine learning for biomedical relation classification based on PubMed. J Biomed Semantics 2024; 15:18. [PMID: 39354632 PMCID: PMC11445994 DOI: 10.1186/s13326-024-00319-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Accepted: 08/31/2024] [Indexed: 10/03/2024] Open
Abstract
Biomedical relation classification has been significantly improved by the application of advanced machine learning techniques on the raw texts of scholarly publications. Despite this improvement, the reliance on large chunks of raw text makes these algorithms suffer in terms of generalization, precision, and reliability. The use of the distinctive characteristics of bibliographic metadata can prove effective in achieving better performance for this challenging task. In this research paper, we introduce an approach for biomedical relation classification using the qualifiers of co-occurring Medical Subject Headings (MeSH). First, we introduce MeSH2Matrix, our dataset consisting of 46,469 biomedical relations curated from PubMed publications using our approach. Our dataset includes a matrix that maps associations between the qualifiers of subject MeSH keywords and those of object MeSH keywords. It also specifies the corresponding Wikidata relation type and the superclass of semantic relations for each relation. Using MeSH2Matrix, we build and train three machine learning models (Support Vector Machine [SVM], a dense model [D-Model], and a convolutional neural network [C-Net]) to evaluate the efficiency of our approach for biomedical relation classification. Our best model achieves an accuracy of 70.78% for 195 classes and 83.09% for five superclasses. Finally, we provide confusion matrix and extensive feature analyses to better examine the relationship between the MeSH qualifiers and the biomedical relations being classified. Our results will hopefully shed light on developing better algorithms for biomedical ontology classification based on the MeSH keywords of PubMed publications. For reproducibility purposes, MeSH2Matrix, as well as all our source code, are made publicly accessible at https://github.com/SisonkeBiotik-Africa/MeSH2Matrix.
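The qualifier matrix described in this abstract can be pictured with a minimal Python sketch. It is not the authors' code: the function name `qualifier_matrix`, the four-entry `QUALIFIERS` list, and the input layout (one pair of qualifier lists per publication) are assumptions made only for illustration.

```python
# Hypothetical, shortened qualifier vocabulary; the real MeSH qualifier list is longer.
QUALIFIERS = ["drug therapy", "adverse effects", "therapeutic use", "genetics"]


def qualifier_matrix(records, qualifiers=QUALIFIERS):
    """Count how often a subject qualifier co-occurs with an object qualifier.

    records: iterable of (subject_qualifiers, object_qualifiers) pairs,
             assumed here to be one pair per publication mentioning both keywords.
    Returns a square list-of-lists: rows are subject qualifiers,
    columns are object qualifiers.
    """
    index = {q: i for i, q in enumerate(qualifiers)}
    size = len(qualifiers)
    matrix = [[0] * size for _ in range(size)]
    for subject_quals, object_quals in records:
        for s in subject_quals:
            for o in object_quals:
                # Qualifiers outside the chosen vocabulary are ignored.
                if s in index and o in index:
                    matrix[index[s]][index[o]] += 1
    return matrix


# Usage: two publications linking a disease (subject) to a drug (object).
example_records = [
    (["drug therapy"], ["therapeutic use"]),
    (["drug therapy", "genetics"], ["therapeutic use"]),
]
example_matrix = qualifier_matrix(example_records)
```

A matrix of this kind, flattened or kept two-dimensional, is the sort of fixed-size input that an SVM, dense network, or convolutional network can consume; how the published dataset normalizes or encodes the counts is not stated in the abstract.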
Affiliation(s)
- Houcemeddine Turki
- Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Chris Chinenye Emezue
- Mila Quebec AI Institute, Montreal, Canada
- Technical University of Munich, Munich, Germany
- Mohamed Ali Hadj Taieb
- Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Mohamed Ben Aouicha
- Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Hanen Ben Hassen
- Laboratory of Probability and Statistics, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
- Afif Masmoudi
- Laboratory of Probability and Statistics, Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
12
Amusat OO, Hegde H, Mungall CJ, Giannakou A, Byers NP, Gunter D, Fagnan K, Ramakrishnan L. Automated annotation of scientific texts for ML-based keyphrase extraction and validation. Database (Oxford) 2024; 2024:baae093. [PMID: 39331731 PMCID: PMC11959184 DOI: 10.1093/database/baae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 06/28/2024] [Accepted: 08/12/2024] [Indexed: 09/29/2024]
Abstract
Advanced omics technologies and facilities generate a wealth of valuable data daily; however, the data often lack the essential metadata required for researchers to find, curate, and search them effectively. The lack of metadata poses a significant challenge in the utilization of these data sets. Machine learning (ML)-based metadata extraction techniques have emerged as a potentially viable approach to automatically annotating scientific data sets with the metadata necessary for enabling effective search. Text labeling, usually performed manually, plays a crucial role in validating machine-extracted metadata. However, manual labeling is time-consuming and not always feasible; thus, there is a need to develop automated text labeling techniques in order to accelerate the process of scientific innovation. This need is particularly urgent in fields such as environmental genomics and microbiome science, which have historically received less attention in terms of metadata curation and creation of gold-standard text mining data sets. In this paper, we present two novel automated text labeling approaches for the validation of ML-generated metadata for unlabeled texts, with specific applications in environmental genomics. Our techniques show the potential of two new ways to leverage existing information that is only available for select documents within a corpus to validate ML models, which can then be used to describe the remaining documents in the corpus. The first technique exploits relationships between different types of data sources related to the same research study, such as publications and proposals. The second technique takes advantage of domain-specific controlled vocabularies or ontologies. In this paper, we detail applying these approaches in the context of environmental genomics research for ML-generated metadata validation. 
Our results show that the proposed label assignment approaches can generate both generic and highly specific text labels for the unlabeled texts, with up to 44% of the labels matching those suggested by an ML keyword extraction algorithm.
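The second technique (labels drawn from a controlled vocabulary) and the reported match percentage can be illustrated with a small Python sketch. This is a simplified stand-in, not the paper's method: plain case-insensitive substring matching replaces whatever matching the authors use, and the names `vocabulary_labels` and `match_rate` as well as the sample terms are hypothetical.

```python
def vocabulary_labels(text, vocabulary):
    """Return the vocabulary terms that occur in the text.

    Assumption: case-insensitive substring matching is a good enough
    proxy for ontology term lookup in this illustration.
    """
    lowered = text.lower()
    return {term for term in vocabulary if term.lower() in lowered}


def match_rate(labels, ml_keywords):
    """Fraction of assigned labels that also appear among ML-suggested keywords."""
    if not labels:
        return 0.0
    keywords = {k.lower() for k in ml_keywords}
    matched = sum(1 for label in labels if label.lower() in keywords)
    return matched / len(labels)


# Usage with made-up data: three vocabulary terms are found in the text,
# and one of them is also proposed by the (hypothetical) keyword extractor.
sample_vocabulary = ["soil", "metagenome", "rhizosphere", "marine sediment"]
sample_text = "Metagenome sequencing of rhizosphere soil samples"
sample_labels = vocabulary_labels(sample_text, sample_vocabulary)
sample_rate = match_rate(sample_labels, ["Rhizosphere", "sequencing"])
```

A figure such as "up to 44% of the labels matching" would correspond to a `match_rate` of 0.44 under this reading, though the abstract does not spell out the exact denominator used.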
Affiliation(s)
- Oluwamayowa O Amusat
- Scientific Data Division, Lawrence Berkeley National Laboratory, 1 Cyclotron road, Berkeley, CA 94720, United States
- Harshad Hegde
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, 1 Cyclotron road, Berkeley, CA 94720, United States
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, 1 Cyclotron road, Berkeley, CA 94720, United States
- Anna Giannakou
- Scientific Data Division, Lawrence Berkeley National Laboratory, 1 Cyclotron road, Berkeley, CA 94720, United States
- Neil P Byers
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron road, Berkeley, CA 94720, United States
- Dan Gunter
- Scientific Data Division, Lawrence Berkeley National Laboratory, 1 Cyclotron road, Berkeley, CA 94720, United States
- Kjiersten Fagnan
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron road, Berkeley, CA 94720, United States
- Lavanya Ramakrishnan
- Scientific Data Division, Lawrence Berkeley National Laboratory, 1 Cyclotron road, Berkeley, CA 94720, United States
13
Lim S, Johannesson P. An Ontology to Bridge the Clinical Management of Patients and Public Health Responses for Strengthening Infectious Disease Surveillance: Design Science Study. JMIR Form Res 2024; 8:e53711. [PMID: 39325530 PMCID: PMC11467600 DOI: 10.2196/53711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 03/31/2024] [Accepted: 07/01/2024] [Indexed: 09/27/2024] Open
Abstract
BACKGROUND Novel surveillance approaches using digital technologies, including the Internet of Things (IoT), have evolved, enhancing traditional infectious disease surveillance systems by enabling real-time detection of outbreaks and reaching a wider population. However, disparate, heterogeneous infectious disease surveillance systems often operate in silos due to a lack of interoperability. As a life-changing clinical use case, the COVID-19 pandemic has demonstrated that a lack of interoperability can severely inhibit public health responses to emerging infectious diseases. Interoperability is thus critical for building a robust ecosystem of infectious disease surveillance and enhancing preparedness for future outbreaks. The primary enabler for semantic interoperability is ontology. OBJECTIVE This study aims to design the IoT-based management of infectious disease ontology (IoT-MIDO) to enhance data sharing and integration of data collected from IoT-driven patient health monitoring, clinical management of individual patients, and disparate heterogeneous infectious disease surveillance. METHODS The ontology modeling approach was chosen for its semantic richness in knowledge representation, flexibility, ease of extensibility, and capability for knowledge inference and reasoning. The IoT-MIDO was developed using the basic formal ontology (BFO) as the top-level ontology. We reused the classes from existing BFO-based ontologies as much as possible to maximize the interoperability with other BFO-based ontologies and databases that rely on them. We formulated the competency questions as requirements for the ontology to achieve the intended goals. RESULTS We designed an ontology to integrate data from heterogeneous sources, including IoT-driven patient monitoring, clinical management of individual patients, and infectious disease surveillance systems. This integration aims to facilitate the collaboration between clinical care and public health domains. 
We also demonstrate five use cases using the simplified ontological models to show the potential applications of IoT-MIDO: (1) IoT-driven patient monitoring, risk assessment, early warning, and risk management; (2) clinical management of patients with infectious diseases; (3) epidemic risk analysis for timely response at the public health level; (4) infectious disease surveillance; and (5) transforming patient information into surveillance information. CONCLUSIONS The development of the IoT-MIDO was driven by competency questions. Being able to answer all the formulated competency questions, we successfully demonstrated that our ontology has the potential to facilitate data sharing and integration for orchestrating IoT-driven patient health monitoring in the context of an infectious disease epidemic, clinical patient management, infectious disease surveillance, and epidemic risk analysis. The novelty and uniqueness of the ontology lie in building a bridge to link IoT-based individual patient monitoring and early warning based on patient risk assessment to infectious disease epidemic surveillance at the public health level. The ontology can also serve as a starting point to enable potential decision support systems, providing actionable insights to support public health organizations and practitioners in making informed decisions in a timely manner.
Affiliation(s)
- Sachiko Lim
- Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
- Paul Johannesson
- Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
14
Wang Y, Ren X, Gao K, Chen M, Huang Q, Yan S, Zhu Y, Sun X, Chen Y, Ge L, Gu J, Gao F, Hu W, Hong L, Zhao C, Shang H, Jin Y. Ontology of clinical practice guidelines for Integrated Traditional Chinese and Western Medicine. J Evid Based Med 2024; 17:604-614. [PMID: 39238154 DOI: 10.1111/jebm.12639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 08/28/2024] [Indexed: 09/07/2024]
Abstract
OBJECTIVE Clinical practice guidelines (CPGs) for Integrated Traditional Chinese and Western Medicine (TCM and WM) are important medical documents used to assist medical decision-making and are of great significance for standardizing clinical pathways. However, due to the constraints of text format, it is difficult for Integrated TCM and WM CPGs to play a real role in medical practice. In addition, how to standardize the structure and semantic relationships between Integrated TCM and WM CPG knowledge, and realize the construction of computable, sharable and reliable CPGs, remains an urgent issue to be addressed. Therefore, we are proposing an ontology of CPGs for Integrated TCM and WM. METHODS We first initialized domain concepts and relationships to ensure the accuracy of the ontology knowledge structure. We then screened CPGs that meet the standards for Integrated TCM and WM, analyzed and classified the contents, and extracted the common structures. Based on the seven-step ontology construction method combined with inference-complement, referring to the representation methods and hierarchical relationships of terms and concepts in MeSH, ICD-10, SNOMED-CT, and other ontologies and terminology sets, we formed the concept structure and semantic relationship tables for the ontology. We also achieved the matching and mapping between the ontology and reference ontologies and term sets. Next, we defined the aspects and constraints of properties, selected multiple Integrated TCM and WM CPGs as instances to populate, and used ontology reasoning tools and formulated defined inference rules to reason and extend the ontology. Finally, we evaluated the performance of the ontology. RESULTS The content of the Integrated TCM and WM CPGs is divided into nine parts: basic information, background, development method, clinical question, recommendation, evidence, conclusion, result, and reason for recommendations. 
The Integrated TCM and WM CPG ontology has 152 classes and defines 90 object properties and 114 data properties, with a maximum classification depth of 4 layers. The terms of disease, drug and examination item names in the ontology have been standardized. CONCLUSIONS This study proposes an Integrated TCM and WM CPG ontology. The ontology adopts a modular design, which has both sharing and scaling ability, and can express rich guideline knowledge. It provides important support for the semantic processing and computational application of guideline documents.
Affiliation(s)
- Yongbo Wang
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China
- Xiangying Ren
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China
- Kuang Gao
- School of Computer Science, Wuhan University, Wuhan, China
- Mukun Chen
- School of Computer Science, Wuhan University, Wuhan, China
- Qiao Huang
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China
- Siyu Yan
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China
- Yan Zhu
- Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Xin Sun
- Chinese Evidence-Based Medicine and Cochrane China Center, West China Hospital, Sichuan University, Chengdu, China
- Yaolong Chen
- The Evidence-Based Medicine Center, Lanzhou University, Lanzhou, China
- Long Ge
- The Evidence-Based Medicine Center, Lanzhou University, Lanzhou, China
- Jinguang Gu
- College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
- Feng Gao
- College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
- Wenbin Hu
- School of Computer Science, Wuhan University, Wuhan, China
- Liang Hong
- School of Information Management, Wuhan University, Wuhan, China
- Chen Zhao
- Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- China Center for Evidence Based Traditional Chinese Medicine, Beijing, China
- Hongcai Shang
- Dongzhimen Hospital, Beijing University of Traditional Chinese Medicine, Beijing, China
- Yinghui Jin
- Center for Evidence-Based and Translational Medicine, Zhongnan Hospital of Wuhan University, Wuhan, China
15
Mulero-Hernández J, Mironov V, Miñarro-Giménez JA, Kuiper M, Fernández-Breis J. Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation. Nucleic Acids Res 2024; 52:e69. [PMID: 38967009 PMCID: PMC11347148 DOI: 10.1093/nar/gkae566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 06/12/2024] [Accepted: 06/19/2024] [Indexed: 07/06/2024] Open
Abstract
Knowledge about transcription factor binding and regulation, target genes, cis-regulatory modules and topologically associating domains is not only defined by functional associations like biological processes or diseases but also has a determinative genome location aspect. Here, we exploit these location and functional aspects together to develop new strategies to enable advanced data querying. Many databases have been developed to provide information about enhancers, but a schema that allows the standardized representation of data, securing interoperability between resources, has been lacking. In this work, we use knowledge graphs for the standardized representation of enhancers and topologically associating domains, together with data about their target genes, transcription factors, location on the human genome, and functional data about diseases and gene ontology annotations. We used this schema to integrate twenty-five enhancer datasets and two domain datasets, creating the most powerful integrative resource in this field to date. The knowledge graphs have been implemented using the Resource Description Framework and integrated within the open-access BioGateway knowledge network, generating a resource that contains an interoperable set of knowledge graphs (enhancers, TADs, genes, proteins, diseases, GO terms, and interactions between domains). We show how advanced queries, which combine functional and location restrictions, can be used to develop new hypotheses about functional aspects of gene expression regulation.
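As a rough illustration of what a query combining location and functional restrictions might look like, the following Python sketch filters enhancers that lie within a topologically associating domain and carry a given functional annotation. It works on in-memory records rather than on the RDF graphs or BioGateway endpoint described in the abstract; the `Region` class, the function names, the coordinates, and the annotation terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A named genomic interval with inclusive start and end coordinates."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """True when two regions share at least one position on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def enhancers_in_tad_with_term(enhancers, tad, annotations, term):
    """Location restriction (overlap with the TAD) plus functional restriction.

    annotations: dict mapping an enhancer name to the set of functional terms
                 attached to its target genes (an assumed, simplified layout).
    """
    return [
        e.name
        for e in enhancers
        if overlaps(e, tad) and term in annotations.get(e.name, set())
    ]


# Usage with made-up coordinates and annotations.
tad = Region("TAD1", "chr1", 1000, 5000)
enhancers = [
    Region("E1", "chr1", 1200, 1400),
    Region("E2", "chr1", 6000, 6200),
    Region("E3", "chr2", 1200, 1400),
]
annotations = {"E1": {"immune response"}, "E2": {"immune response"}}
hits = enhancers_in_tad_with_term(enhancers, tad, annotations, "immune response")
```

In a knowledge-graph setting the same two conditions would typically be expressed as graph patterns and coordinate filters in a single query; the sketch only shows the logic of combining them.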
Affiliation(s)
- Juan Mulero-Hernández
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Vladimir Mironov
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- José Antonio Miñarro-Giménez
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Jesualdo Tomás Fernández-Breis
- Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
16
Duncan WD, Diller M, Dooley D, Hogan WR, Beverley J. Concretizing plan specifications as realizables within the OBO foundry. J Biomed Semantics 2024; 15:15. [PMID: 39160586 PMCID: PMC11334599 DOI: 10.1186/s13326-024-00315-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 06/23/2024] [Indexed: 08/21/2024] Open
Abstract
BACKGROUND Within the Open Biological and Biomedical Ontology (OBO) Foundry, many ontologies represent the execution of a plan specification as a process in which a realizable entity that concretizes the plan specification, a "realizable concretization" (RC), is realized. This representation, which we call the "RC-account", provides a straightforward way to relate a plan specification to the entity that bears the realizable concretization and the process that realizes the realizable concretization. However, the adequacy of the RC-account has not been evaluated in the scientific literature. In this manuscript, we provide this evaluation and, thereby, give ontology developers sound reasons to use or not use the RC-account pattern. RESULTS Analysis of the RC-account reveals that it is not adequate for representing failed plans. If the realizable concretization is flawed in some way, it is unclear what (if any) relation holds between the realizable entity and the plan specification. If the execution (i.e., realization) of the realizable concretization fails to carry out the actions given in the plan specification, it is unclear under the RC-account how to directly relate the failed execution to the entity carrying out the instructions given in the plan specification. These issues are exacerbated in the presence of changing plans. CONCLUSIONS We propose two solutions for representing failed plans. The first uses the Common Core Ontologies 'prescribed by' relation to connect a plan specification to the entity or process that utilizes the plan specification as a guide. The second, more complex, solution incorporates the process of creating a plan (in the sense of an intention to execute a plan specification) into the representation of executing plan specifications. We hypothesize that the first solution (i.e., use of 'prescribed by') is adequate for most situations. 
However, more research is needed to test this hypothesis as well as explore the other solutions presented in this manuscript.
17
Romao P, Neuenschwander S, Zbinden C, Seidel K, Sariyar M. An ontology-based tool for modeling and documenting events in neurosurgery. BMC Med Inform Decis Mak 2024; 24:216. [PMID: 39085883 PMCID: PMC11293115 DOI: 10.1186/s12911-024-02615-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 07/17/2024] [Indexed: 08/02/2024] Open
Abstract
BACKGROUND Intraoperative neurophysiological monitoring (IOM) plays a pivotal role in enhancing patient safety during neurosurgical procedures. This vital technique involves the continuous measurement of evoked potentials to provide early warnings and ensure the preservation of critical neural structures. One of the primary challenges has been the effective documentation of IOM events with semantically enriched characterizations. This study aimed to address this challenge by developing an ontology-based tool. METHODS We structured the development of the IOM Documentation Ontology (IOMDO) and the associated tool into three distinct phases. The initial phase focused on the ontology's creation, drawing from the OBO (Open Biological and Biomedical Ontology) principles. The subsequent phase involved agile software development, a flexible approach to encapsulate the diverse requirements and swiftly produce a prototype. The last phase entailed practical evaluation within real-world documentation settings. This crucial stage enabled us to gather firsthand insights, assessing the tool's functionality and efficacy. The observations made during this phase formed the basis for essential adjustments to ensure the tool's productive utilization. RESULTS The core entities of the ontology revolve around central aspects of IOM, including measurements characterized by timestamp, type, values, and location. Concepts and terms of several ontologies were integrated into IOMDO, e.g., the Foundation Model of Anatomy (FMA), the Human Phenotype Ontology (HPO) and the ontology for surgical process models (OntoSPM) related to general surgical terms. The software tool developed for extending the ontology and the associated knowledge base was built with JavaFX for the user-friendly frontend and Apache Jena for the robust backend. The tool's evaluation involved test users who unanimously found the interface accessible and usable, even for those without extensive technical expertise. 
CONCLUSIONS Through the establishment of a structured and standardized framework for characterizing IOM events, our ontology-based tool holds the potential to enhance the quality of documentation, benefiting patient care by improving the foundation for informed decision-making. Furthermore, researchers can leverage the semantically enriched data to identify trends, patterns, and areas for surgical practice enhancement. To optimize documentation through ontology-based approaches, it's crucial to address potential modeling issues that are associated with the Ontology of Adverse Events.
Affiliation(s)
- Chantal Zbinden
- Department of Neurosurgery, Inselspital, University Hospital, Bern, Switzerland
- Kathleen Seidel
- Department of Neurosurgery, Inselspital, University Hospital, Bern, Switzerland
- Murat Sariyar
- Bern University of Applied Sciences, Bern, Switzerland
18
Bartnik A, Serra LM, Smith M, Duncan WD, Wishnie L, Ruttenberg A, Dwyer MG, Diehl AD. MRIO: the Magnetic Resonance Imaging Acquisition and Analysis Ontology. Neuroinformatics 2024; 22:269-283. [PMID: 38763990 DOI: 10.1007/s12021-024-09664-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/22/2024] [Indexed: 05/21/2024]
Abstract
Magnetic resonance imaging of the brain is a useful tool in both clinical and research settings, aiding in the diagnosis and treatment of neurological disease and expanding our knowledge of the brain. However, there are many challenges inherent in managing and analyzing MRI data, due in large part to the heterogeneity of data acquisition. To address this, we have developed MRIO, the Magnetic Resonance Imaging Acquisition and Analysis Ontology. MRIO provides well-reasoned classes and logical axioms for the acquisition of several MRI acquisition types and well-known, peer-reviewed analysis software, facilitating the use of MRI data. These classes provide a common language for the neuroimaging research process and help standardize the organization and analysis of MRI data for reproducible datasets. We also provide queries for automated assignment of analyses for given MRI types. MRIO aids researchers in managing neuroimaging studies by helping organize and annotate MRI data and integrating with existing standards such as Digital Imaging and Communications in Medicine and the Brain Imaging Data Structure, enhancing reproducibility and interoperability. MRIO was constructed according to Open Biomedical Ontologies Foundry principles and has contributed several classes to the Ontology for Biomedical Investigations to help bridge neuroimaging data to other domains. MRIO addresses the need for a "common language" for MRI that can help manage neuroimaging research, by enabling researchers to identify appropriate analyses for sets of scans and facilitating data organization and reporting.
Affiliation(s)
- Alexander Bartnik
- Buffalo Neuroimaging Analysis Center, Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Lucas M Serra
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Mackenzie Smith
- Buffalo Neuroimaging Analysis Center, Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- William D Duncan
- College of Dentistry, University of Florida, Gainesville, FL, USA
- Lauren Wishnie
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Alan Ruttenberg
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Michael G Dwyer
- Buffalo Neuroimaging Analysis Center, Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Alexander D Diehl
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
19
Mehta D, Whorton JM, Shahriari R, Ragan ED, Bona JP, Hogan WR, Sexton KW, Brochhausen M. Expanding the Ontology of Organizational Structures of Trauma Centers and Trauma Systems. CEUR Workshop Proceedings 2024; 3939:short3. [PMID: 40196293 PMCID: PMC11973606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2025]
Abstract
A knowledge gap exists regarding the impact of organizational parameters of trauma centers and patient outcomes. This is partially due to such organizational parameters being understudied. The Ontology of Organizational Structures of Trauma Centers and Trauma Systems (OOSTT) provides a controlled vocabulary to study that specific area. It is used in tools created by the TIPTOE project to provide trauma stakeholders with novel insights on the role of organizational parameters and patient outcomes. This paper reports the extension of OOSTT to cover relevant patient outcome measures.
Affiliation(s)
- Diya Mehta
- Harvey Mudd College, 301 Platt Blvd, Claremont, CA 91711, USA
- Justin M. Whorton
- University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, USA
- Reza Shahriari
- University of Florida, 201 Criser Hall PO Box 114000 Gainesville, FL 32611, USA
- Eric D Ragan
- University of Florida, 201 Criser Hall PO Box 114000 Gainesville, FL 32611, USA
- Jonathan P. Bona
- University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, USA
- William R. Hogan
- Medical College of Wisconsin, 8701 W Watertown Plank Rd, Milwaukee, WI 53226, USA
- Kevin W. Sexton
- University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, USA
- Mathias Brochhausen
- University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, USA
20
Zheng J, Li X, Masci AM, Kahn H, Huffman A, Asfaw E, Pan Y, Guo J, He V, Song J, Seleznev AI, Lin AY, He Y. Empowering standardization of cancer vaccines through ontology: enhanced modeling and data analysis. J Biomed Semantics 2024; 15:12. [PMID: 38890666 PMCID: PMC11186274 DOI: 10.1186/s13326-024-00312-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 05/21/2024] [Indexed: 06/20/2024] Open
Abstract
BACKGROUND The exploration of cancer vaccines has yielded a multitude of studies, resulting in a diverse collection of information. The heterogeneity of cancer vaccine data significantly impedes effective integration and analysis. While CanVaxKB serves as a pioneering database for over 670 manually annotated cancer vaccines, it is important to recognize that a database, on its own, does not offer the structured relationships and standardized definitions found in an ontology. Recognizing this, we expanded the Vaccine Ontology (VO) to include those cancer vaccines present in CanVaxKB that were not initially covered, enhancing VO's capacity to systematically define and interrelate cancer vaccines. RESULTS An ontology design pattern (ODP) was first developed and applied to semantically represent various cancer vaccines, capturing their associated entities and relations. By applying the ODP, we generated a cancer vaccine template in a tabular format and converted it into the RDF/OWL format for generation of cancer vaccine terms in the VO. '12MP vaccine' was used as an example of cancer vaccines to demonstrate the application of the ODP. VO also reuses reference ontology terms to represent entities such as cancer diseases and vaccine hosts. Description Logic (DL) and SPARQL query scripts were developed and used to query for cancer vaccines based on different vaccine features and to demonstrate the versatility of the VO representation. Additionally, ontological modeling was applied to illustrate cancer vaccine related concepts and studies for in-depth cancer vaccine analysis. A cancer vaccine-specific VO view, referred to as "CVO," was generated, and it contains 928 classes including 704 cancer vaccines. The CVO OWL file is publicly available at http://purl.obolibrary.org/obo/vo/cvo.owl for sharing and applications. 
CONCLUSION To facilitate the standardization, integration, and analysis of cancer vaccine data, we expanded the Vaccine Ontology (VO) to systematically model and represent cancer vaccines. We also developed a pipeline to automate the inclusion of cancer vaccines and associated terms in the VO. This not only enriches the data's standardization and integration, but also leverages ontological modeling to deepen the analysis of cancer vaccine information, maximizing benefits for researchers and clinicians. AVAILABILITY The VO-cancer GitHub website is https://github.com/vaccineontology/VO/tree/master/CVO.
Affiliation(s)
- Jie Zheng
  - Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
- Xingxian Li
  - College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, MI, 48109, USA
- Anna Maria Masci
  - Data Impact and Governance, Technology Data and Innovation, University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Hayleigh Kahn
  - College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, MI, 48109, USA
- Anthony Huffman
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Eliyas Asfaw
  - University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Yuanyi Pan
  - Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
- Jinjing Guo
  - Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
- Virginia He
  - The College of Brown University, Brown University, Providence, RI, 02912, USA
- Justin Song
  - College of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Andrey I Seleznev
  - Dietrich School of Arts and Sciences, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Asiyah Yu Lin
  - Axle Research and Technology, Rockville, MD, 20852, USA
- Yongqun He
  - Unit for Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Rogel Cancer Center, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
21
Faria D, Eugénio P, Contreiras Silva M, Balbi L, Bedran G, Kallor AA, Nunes S, Palkowski A, Waleron M, Alfaro JA, Pesquita C. The Immunopeptidomics Ontology (ImPO). Database (Oxford) 2024; 2024:baae014. [PMID: 38857186 PMCID: PMC11164101 DOI: 10.1093/database/baae014] [Received: 01/11/2023] [Revised: 11/30/2023] [Accepted: 02/22/2024] [Indexed: 06/12/2024]
Abstract
The adaptive immune response plays a vital role in eliminating infected and aberrant cells from the body. This process hinges on the presentation of short peptides by major histocompatibility complex Class I molecules on the cell surface. Immunopeptidomics, the study of peptides displayed on cells, delves into the wide variety of these peptides. Understanding the mechanisms behind antigen processing and presentation is crucial for effectively evaluating cancer immunotherapies. As an emerging domain, immunopeptidomics currently lacks standardization (there is neither an established terminology nor formally defined semantics), a critical concern considering the complexity, heterogeneity, and growing volume of data involved in immunopeptidomics studies. Additionally, there is a disconnection between how the proteomics community delivers the information about antigen presentation and its uptake by the clinical genomics community. Considering the significant relevance of immunopeptidomics in cancer, this shortcoming must be addressed to bridge the gap between research and clinical practice. In this work, we detail the development of the ImmunoPeptidomics Ontology, ImPO, the first effort at standardizing the terminology and semantics in the domain. ImPO aims to encapsulate and systematize data generated by immunopeptidomics experimental processes and bioinformatics analysis. ImPO establishes cross-references to 24 relevant ontologies, including the National Cancer Institute Thesaurus, Mondo Disease Ontology, Logical Observation Identifier Names and Codes and Experimental Factor Ontology. Although ImPO was developed using expert knowledge to characterize a large and representative data collection, it may be readily used to encode other datasets within the domain. Ultimately, ImPO facilitates data integration and analysis, enabling querying, inference and knowledge generation and importantly bridging the gap between the clinical proteomics and genomics communities.
As the field of immunogenomics uses protein-level immunopeptidomics data, we expect ImPO to play a key role in supporting a rich and standardized description of the large-scale data that emerging high-throughput technologies are expected to bring in the near future. Ontology URL: https://zenodo.org/record/10237571 Project GitHub: https://github.com/liseda-lab/ImPO/blob/main/ImPO.owl.
Affiliation(s)
- Daniel Faria
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Rua Alves Redol, 9, Lisboa 1000-029, Portugal
- Patrícia Eugénio
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Marta Contreiras Silva
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Laura Balbi
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Georges Bedran
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Ashwin Adrian Kallor
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Susana Nunes
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
- Aleksander Palkowski
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Michal Waleron
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
- Javier A Alfaro
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, Gdańsk 80-822, Poland
  - Department of Biochemistry and Microbiology, University of Victoria, 3800 Finnerty Rd, Victoria, British Columbia, BC V8P 5C2, Canada
  - Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Old College, South Bridge, Edinburgh, EH8 9YL, UK
  - The Canadian Association for Responsible AI in Medicine, Victoria, Canada
- Catia Pesquita
  - LASIGE, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, Lisboa 1749-016, Portugal
22
Vogt L, Kuhn T, Hoehndorf R. Semantic units: organizing knowledge graphs into semantically meaningful units of representation. J Biomed Semantics 2024; 15:7. [PMID: 38802877 PMCID: PMC11131308 DOI: 10.1186/s13326-024-00310-5] [Received: 12/21/2022] [Accepted: 05/14/2024] [Indexed: 05/29/2024]
Abstract
BACKGROUND In today's landscape of data management, the importance of knowledge graphs and ontologies is escalating as critical mechanisms aligned with the FAIR Guiding Principles, ensuring data and metadata are Findable, Accessible, Interoperable, and Reusable. We discuss three challenges that may hinder the effective exploitation of the full potential of FAIR knowledge graphs. RESULTS We introduce "semantic units" as a conceptual solution, although currently exemplified only in a limited prototype. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs by adding another layer of triples on top of the conventional data layer. Semantic units and their subgraphs are represented by their own resource that instantiates a corresponding semantic unit class. We distinguish statement and compound units as basic categories of semantic units. A statement unit is the smallest, independent proposition that is semantically meaningful for a human reader. Depending on the relation of its underlying proposition, it consists of one or more triples. Organizing a knowledge graph into statement units results in a partition of the graph, with each triple belonging to exactly one statement unit. A compound unit, on the other hand, is a semantically meaningful collection of statement and compound units that form larger subgraphs. Some semantic units organize the graph into different levels of representational granularity, others orthogonally into different types of granularity trees or different frames of reference, structuring and organizing the knowledge graph into partially overlapping, partially enclosed subgraphs, each of which can be referenced by its own resource. CONCLUSIONS Semantic units, applicable in RDF/OWL and labeled property graphs, offer support for making statements about statements and facilitate graph alignment, subgraph matching, knowledge graph profiling, and management of access restrictions to sensitive data.
Additionally, we argue that organizing the graph into semantic units promotes the differentiation of ontological and discursive information, and that it also supports the differentiation of multiple frames of reference within the graph.
Affiliation(s)
- Lars Vogt
  - TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany
- Tobias Kuhn
  - Department of Computer Science, Vrije Universiteit, Amsterdam, Netherlands
- Robert Hoehndorf
  - Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia
23
Baudrit C, Fernandez C, Couteaux J, Ndiaye A. Electronic knowledge books (eK-Books) as a medium to capitalise on and transfer scientific, engineering, operational, technological and craft knowledge. PLoS One 2024; 19:e0299150. [PMID: 38758949 PMCID: PMC11101106 DOI: 10.1371/journal.pone.0299150] [Received: 09/27/2023] [Accepted: 02/06/2024] [Indexed: 05/19/2024]
Abstract
The capitalisation on and transfer of technological, engineering and scientific knowledge associated with empirical know-how is an important issue for the sustainability and development of manufacturing. Indeed, certain sectors of industry are facing the increasing ageing of the labour force, recruitment difficulties and high staff turnover, leading to a loss of knowledge and know-how. In a context of numerical and digital transition and the migration of processes to Industry 4.0, one of the major challenges manufacturers face today is their capacity to build intelligent platforms for acquiring, storing and transferring their know-how and knowledge. It is crucial to create new media and tools for staff training and development capable of capturing knowledge and reusing it to create a project history through expertise and data collection. This paper presents the methodology and guidelines for implementing electronic knowledge books (eK-Books), along with their uses. The eK-Book is a semantic web-based hypertext medium (channel) allowing stakeholders to capitalise on, structure and transfer knowledge by using concept maps, process maps, influence graphs, downloadable documents, web pages and hypermedia knowledge sheets. They are intended for engineers, expert or novice technicians, manufacturers, sector coordinators and plant managers, as well as trainers and learners. They are usable and manageable in all types of environments and with different levels of accessibility. This paper highlights (1) the knowledge transfer capacity of eK-Books and (2) their usability in two agri-food sectors, namely (a) the cheese sector with protected designation of origin (PDO) and protected geographical indication (PGI), and (b) the butchery and cold meat sectors.
Affiliation(s)
- Cédric Baudrit
  - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement, Institut de mécanique et d’ingénierie, Talence, France
- Christophe Fernandez
  - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement, Institut de mécanique et d’ingénierie, Talence, France
- Julien Couteaux
  - Université de Bordeaux, Institut de mécanique et d’ingénierie, Talence, France
- Amadou Ndiaye
  - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement, Institut de mécanique et d’ingénierie, Talence, France
24
García S A, Costa M, García-Zarzoso A, Pastor O. CardioHotspots: a database of mutational hotspots for cardiac disorders. Database (Oxford) 2024; 2024:baae034. [PMID: 38752292 PMCID: PMC11096770 DOI: 10.1093/database/baae034] [Received: 09/11/2023] [Revised: 03/21/2024] [Accepted: 04/20/2024] [Indexed: 05/19/2024]
Abstract
Mutational hotspots are DNA regions with an abnormally high frequency of genetic variants. Identifying whether a variant is located in a mutational hotspot is critical for determining the variant's role in disorder predisposition, development, and treatment response. Despite their significance, current databases on mutational hotspots are limited to the oncology domain. However, identifying mutational hotspots is critical for any disorder in which genetics plays a role. This is true for the world's leading cause of death: cardiac disorders. In this work, we present CardioHotspots, a literature-based database of manually curated hotspots for cardiac diseases. This is the only database we know of that provides high-quality and easily accessible information about hotspots associated with cardiac disorders. CardioHotspots is publicly accessible via a web-based platform (https://genomics-hub.pros.dsic.upv.es:3099/). Database URL: https://genomics-hub.pros.dsic.upv.es:3099/.
Affiliation(s)
- Alberto García S
  - *Corresponding author: Tel: +34 96 387 70 00; Fax: +34 96 387 90 09
- Mireia Costa
  - PROS Research Center, VRAIN, Polytechnic University of Valencia, Camino de Vera S/N, Valencia 46022, Spain
- Alba García-Zarzoso
  - PROS Research Center, VRAIN, Polytechnic University of Valencia, Camino de Vera S/N, Valencia 46022, Spain
- Oscar Pastor
  - PROS Research Center, VRAIN, Polytechnic University of Valencia, Camino de Vera S/N, Valencia 46022, Spain
25
Ross KE, Bastian FB, Buys M, Cook CE, D’Eustachio P, Harrison M, Hermjakob H, Li D, Lord P, Natale DA, Peters B, Sternberg PW, Su AI, Thakur M, Thomas PD, Bateman A. Perspectives on tracking data reuse across biodata resources. Bioinformatics Advances 2024; 4:vbae057. [PMID: 38721398 PMCID: PMC11076920 DOI: 10.1093/bioadv/vbae057] [Received: 01/26/2024] [Revised: 03/13/2024] [Accepted: 04/11/2024] [Indexed: 06/14/2024]
Abstract
Motivation: Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge. Results: The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources. Availability and implementation: Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).
Affiliation(s)
- Karen E Ross
  - Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, United States
- Frederic B Bastian
  - Evolutionary Bioinformatics Group, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Peter D’Eustachio
  - Department of Biochemistry & Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10012, United States
- Melissa Harrison
  - Literature Services, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Henning Hermjakob
  - Molecular Systems, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Donghui Li
  - Chan Zuckerberg Initiative, Redwood City, CA 94063, United States
- Phillip Lord
  - School of Computing, Newcastle University, Newcastle upon Tyne NE4 5TG, United Kingdom
- Darren A Natale
  - Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, United States
- Bjoern Peters
  - Center for Vaccine Innovation, La Jolla Institute of Immunology, La Jolla, CA 92037, United States
- Paul W Sternberg
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, United States
- Andrew I Su
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
- Matthew Thakur
  - Data Services, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Paul D Thomas
  - Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90089, United States
- Alex Bateman
  - MSCB, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
26
Caballero-Oteyza A, Crisponi L, Peng XP, Yauy K, Volpi S, Giardino S, Freeman AF, Grimbacher B, Proietti M. GenIA, the Genetic Immunology Advisor database for inborn errors of immunity. J Allergy Clin Immunol 2024; 153:831-843. [PMID: 38040041 DOI: 10.1016/j.jaci.2023.11.022] [Received: 06/27/2023] [Revised: 10/23/2023] [Accepted: 11/15/2023] [Indexed: 12/03/2023]
Abstract
BACKGROUND To date, no publicly accessible platform has captured and synthesized all of the layered dimensions of genotypic, phenotypic, and mechanistic information published in the field of inborn errors of immunity (IEIs). Such a platform would represent the extensive and complex landscape of IEIs and could increase the rate of diagnosis in patients with a suspected IEI, which remains unacceptably low. OBJECTIVE Our aim was to create an expertly curated, patient-centered, multidimensional IEI database that enables aggregation and sophisticated data interrogation and promotes involvement from diverse stakeholders across the community. METHODS The database structure was designed following a subject-centered model and written in Structured Query Language (SQL). The web application is written in Hypertext Preprocessor (PHP), Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript. All data stored in the Genetic Immunology Advisor (GenIA) are extracted by manually reviewing published research articles. RESULTS We completed data collection and curation for 24 pilot genes. Using these data, we have exemplified how GenIA can provide quick access to structured, longitudinal, more thorough, comprehensive, and up-to-date IEI knowledge than do currently existing databases, such as ClinGen, Human Phenotype Ontology (HPO), ClinVar, or Online Mendelian Inheritance in Man (OMIM), with which GenIA intends to dovetail. CONCLUSIONS GenIA strives to accurately capture the extensive genetic, mechanistic, and phenotypic heterogeneity found across IEIs, as well as genetic paradigms and diagnostic pitfalls associated with individual genes and conditions. The IEI community's involvement will help promote GenIA as an enduring resource that supports and improves knowledge sharing, research, diagnosis, and care for patients with genetic immune disease.
Affiliation(s)
- Andrés Caballero-Oteyza
  - Clinic for Immunology and Rheumatology, Hanover Medical School, Hanover, Germany; RESiST-Cluster of Excellence 2155, Hanover Medical School, Hanover, Germany; Institute for Immunodeficiency, Center for Chronic Immunodeficiency, University Hospital Freiburg, Freiburg, Germany
- Laura Crisponi
  - Institute for Genetic and Biomedical Research, The National Research Council, Monserrato, Cagliari, Italy
- Xiao P Peng
  - Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Md
- Kevin Yauy
  - University of Montpellier, LIRMM, CNRS, Reference Center for Congenital Anomalies, Clinical Genetic Unit, Montpellier University Hospital Center, Montpellier, France
- Stefano Volpi
  - Center for Autoinflammatory Diseases and Immunodeficiencies, Pediatric Rheumatology Clinic, IRCCS Istituto Giannina Gaslini, Genova, and DINOGMI, Università degli Studi di Genova, Genova, Italy
- Stefano Giardino
  - Hematopoietic Stem Cell Transplantation Unit, IRCCS Istituto Giannina Gaslini, Genova, Italy
- Alexandra F Freeman
  - Laboratory of Clinical Immunology and Microbiology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Md
- Bodo Grimbacher
  - Institute for Immunodeficiency, Center for Chronic Immunodeficiency, University Hospital Freiburg, Freiburg, Germany; Clinic of Rheumatology and Clinical Immunology, Center for Chronic Immunodeficiency, Medical Center, Faculty of Medicine, Albert-Ludwigs University of Freiburg, Freiburg, Germany; RESiST-Cluster of Excellence 2155, Hanover Medical School, Satellite Center Freiburg, Freiburg, Germany; German Center for Infection Research, Satellite Center Freiburg, Freiburg, Germany; Centre for Integrative Biological Signalling Studies, Albert-Ludwigs University of Freiburg, Freiburg, Germany
- Michele Proietti
  - Clinic for Immunology and Rheumatology, Hanover Medical School, Hanover, Germany; RESiST-Cluster of Excellence 2155, Hanover Medical School, Hanover, Germany; Institute for Immunodeficiency, Center for Chronic Immunodeficiency, University Hospital Freiburg, Freiburg, Germany
27
Behr AS, Borgelt H, Kockmann N. Ontologies4Cat: investigating the landscape of ontologies for catalysis research data management. J Cheminform 2024; 16:16. [PMID: 38326906 PMCID: PMC10851519 DOI: 10.1186/s13321-024-00807-2] [Received: 10/25/2023] [Accepted: 01/22/2024] [Indexed: 02/09/2024]
Abstract
As scientific digitization advances, it is imperative to ensure that data is Findable, Accessible, Interoperable, and Reusable (FAIR) and machine-processable. Ontologies play a vital role in enhancing data FAIRness by explicitly representing knowledge in a machine-understandable format. Research data in catalysis research often exhibits complexity and diversity, necessitating a correspondingly broad collection of ontologies. While ontology portals such as EBI OLS and BioPortal aid in ontology discovery, they lack deep classification, and quality metrics for ontology reusability and domains are absent for the domain of catalysis research. Thus, this work provides an approach for systematic collection of ontology metadata with focus on the catalysis research data value chain. By classifying ontologies by subdomains of catalysis research, the approach offers efficient comparison across ontologies. Furthermore, a workflow and codebase are presented, facilitating representation of the metadata on GitHub. Finally, a method is presented to automatically map the classes contained in the ontologies of the metadata collection against each other, providing further insights on relatedness of the ontologies listed. The presented methodology is designed for reusability, enabling its adaptation to other ontology collections or domains of knowledge. The ontology metadata taken up for this work and the code developed and described in this work are available in a GitHub repository at: https://github.com/nfdi4cat/Ontology-Overview-of-NFDI4Cat
Affiliation(s)
- Alexander S Behr
  - Laboratory of Equipment Design, Faculty of Biochemical and Chemical Engineering, TU-Dortmund University, Emil-Figge-Strasse 68, 44139, Dortmund, NRW, Germany
- Hendrik Borgelt
  - Laboratory of Equipment Design, Faculty of Biochemical and Chemical Engineering, TU-Dortmund University, Emil-Figge-Strasse 68, 44139, Dortmund, NRW, Germany
- Norbert Kockmann
  - Laboratory of Equipment Design, Faculty of Biochemical and Chemical Engineering, TU-Dortmund University, Emil-Figge-Strasse 68, 44139, Dortmund, NRW, Germany
28
Beverley J, Babcock S, Carvalho G, Cowell LG, Duesing S, He Y, Hurley R, Merrell E, Scheuermann RH, Smith B. Coordinating virus research: The Virus Infectious Disease Ontology. PLoS One 2024; 19:e0285093. [PMID: 38236918 PMCID: PMC10796065 DOI: 10.1371/journal.pone.0285093] [Received: 11/28/2022] [Accepted: 04/12/2023] [Indexed: 01/22/2024]
Abstract
The COVID-19 pandemic prompted immense work on the investigation of the SARS-CoV-2 virus. Rapid, accurate, and consistent interpretation of generated data is thereby of fundamental concern. Ontologies (structured, controlled vocabularies) are designed to support consistency of interpretation, and thereby to prevent the development of data silos. This paper describes how ontologies are serving this purpose in the COVID-19 research domain, by following principles of the Open Biological and Biomedical Ontology (OBO) Foundry and by reusing existing ontologies such as the Infectious Disease Ontology (IDO) Core, which provides terminological content common to investigations of all infectious diseases. We report here on the development of an IDO extension, the Virus Infectious Disease Ontology (VIDO), a reference ontology covering viral infectious diseases. We motivate term and definition choices, showcase reuse of terms from existing OBO ontologies, illustrate how ontological decisions were motivated by relevant life science research, and connect VIDO to the Coronavirus Infectious Disease Ontology (CIDO). We next use terms from these ontologies to annotate selections from life science research on SARS-CoV-2, highlighting how ontologies employing a common upper-level vocabulary may be seamlessly interwoven. Finally, we outline future work, including bacteria and fungus infectious disease reference ontologies currently under development, then cite uses of VIDO and CIDO in host-pathogen data analytics, electronic health record annotation, and ontology conflict-resolution projects.
Affiliation(s)
- John Beverley
  - Department of Philosophy, University at Buffalo, Buffalo, NY, United States of America
  - National Center for Ontological Research, Buffalo, NY, United States of America
- Shane Babcock
  - National Center for Ontological Research, Buffalo, NY, United States of America
  - Air Force Research Laboratory, Wright Patterson Air Force Base, Riverside, OH, United States of America
- Gustavo Carvalho
  - Department of Cognitive Science, Northwestern University, Evanston, IL, United States of America
- Lindsay G. Cowell
  - Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, United States of America
- Sebastian Duesing
  - Department of Philosophy, Loyola University, Chicago, IL, United States of America
- Yongqun He
  - Computational Medicine and Bioinformatics, University of Michigan Medical School, He Group, Ann Arbor, MI, United States of America
- Regina Hurley
  - National Center for Ontological Research, Buffalo, NY, United States of America
  - Department of Philosophy, Northwestern University, Evanston, IL, United States of America
- Eric Merrell
  - Department of Philosophy, University at Buffalo, Buffalo, NY, United States of America
  - National Center for Ontological Research, Buffalo, NY, United States of America
- Richard H. Scheuermann
  - Department of Informatics, J. Craig Venter Institute, La Jolla, CA, United States of America
  - Department of Pathology, University of California, San Diego, CA, United States of America
  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, CA, United States of America
- Barry Smith
  - Department of Philosophy, University at Buffalo, Buffalo, NY, United States of America
  - National Center for Ontological Research, Buffalo, NY, United States of America
29
Niehues A, de Visser C, Hagenbeek FA, Kulkarni P, Pool R, Karu N, Kindt ASD, Singh G, Vermeiren RRJM, Boomsma DI, van Dongen J, ’t Hoen PAC, van Gool AJ. A multi-omics data analysis workflow packaged as a FAIR Digital Object. Gigascience 2024; 13:giad115. [PMID: 38217405 PMCID: PMC10787363 DOI: 10.1093/gigascience/giad115] [Received: 06/13/2023] [Revised: 11/14/2023] [Accepted: 12/10/2023] [Indexed: 01/15/2024]
Abstract
BACKGROUND Applying good data management and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles in research projects can help disentangle knowledge discovery, study result reproducibility, and data reuse in future studies. Based on the concepts of the original FAIR principles for research data, FAIR principles for research software were recently proposed. FAIR Digital Objects enable discovery and reuse of Research Objects, including computational workflows for both humans and machines. Practical examples can help promote the adoption of FAIR practices for computational workflows in the research community. We developed a multi-omics data analysis workflow implementing FAIR practices to share it as a FAIR Digital Object. FINDINGS We conducted a case study investigating shared patterns between multi-omics data and childhood externalizing behavior. The analysis workflow was implemented as a modular pipeline in the workflow manager Nextflow, including containers with software dependencies. We adhered to software development practices like version control, documentation, and licensing. Finally, the workflow was described with rich semantic metadata, packaged as a Research Object Crate, and shared via WorkflowHub. CONCLUSIONS Along with the packaged multi-omics data analysis workflow, we share our experiences adopting various FAIR practices and creating a FAIR Digital Object. We hope our experiences can help other researchers who develop omics data analysis workflows to turn FAIR principles into practice.
Affiliation(s)
- Anna Niehues
- Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, the Netherlands
| | - Casper de Visser
- Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - Fiona A Hagenbeek
- Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
| | - Purva Kulkarni
- Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, the Netherlands
- Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - René Pool
- Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
| | - Naama Karu
- Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
| | - Alida S D Kindt
- Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
| | - Gurnoor Singh
- Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - Robert R J M Vermeiren
- Department of Child and Adolescent Psychiatry, LUMC-Curium, Leiden University Medical Center, 2342 AK Oegstgeest, The Netherlands
| | - Dorret I Boomsma
- Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
- Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
| | - Jenny van Dongen
- Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
- Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
| | - Peter A C ’t Hoen
- Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
| | - Alain J van Gool
- Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, the Netherlands
|
30
|
Eloe-Fadrosh EA, Mungall CJ, Miller MA, Smith M, Patil SS, Kelliher JM, Johnson LYD, Rodriguez FE, Chain PSG, Hu B, Thornton MB, McCue LA, McHardy AC, Harris NL, Reddy TBK, Mukherjee S, Hunter CI, Walls R, Schriml LM. A Practical Approach to Using the Genomic Standards Consortium MIxS Reporting Standard for Comparative Genomics and Metagenomics. Methods Mol Biol 2024; 2802:587-609. [PMID: 38819573 DOI: 10.1007/978-1-0716-3838-5_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC's MIxS reporting standard, we provide a description of the structure and terminology, how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example.
Affiliation(s)
- Emiley A Eloe-Fadrosh
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
| | - Christopher J Mungall
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Mark Andrew Miller
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Montana Smith
- Pacific Northwest National Laboratory, Richland, WA, USA
| | - Sujay Sanjeev Patil
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Julia M Kelliher
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Leah Y D Johnson
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | | | - Patrick S G Chain
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Bin Hu
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Michael B Thornton
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Lee Ann McCue
- Pacific Northwest National Laboratory, Richland, WA, USA
| | - Alice Carolyn McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Nomi L Harris
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - T B K Reddy
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Supratim Mukherjee
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Christopher I Hunter
- GigaScience Press, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong
| | | | - Lynn M Schriml
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
|
31
|
Carmody LC, Gargano MA, Toro S, Vasilevsky NA, Adam MP, Blau H, Chan LE, Gomez-Andres D, Horvath R, Kraus ML, Ladewig MS, Lewis-Smith D, Lochmüller H, Matentzoglu NA, Munoz-Torres MC, Schuetz C, Seitz B, Similuk MN, Sparks TN, Strauss T, Swietlik EM, Thompson R, Zhang XA, Mungall CJ, Haendel MA, Robinson PN. The Medical Action Ontology: A tool for annotating and analyzing treatments and clinical management of human disease. Med 2023; 4:913-927.e3. [PMID: 37963467 PMCID: PMC10842845 DOI: 10.1016/j.medj.2023.10.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/13/2023] [Revised: 08/31/2023] [Accepted: 10/14/2023] [Indexed: 11/16/2023]
Abstract
BACKGROUND: Navigating the clinical literature to determine the optimal clinical management for rare diseases presents significant challenges. We introduce the Medical Action Ontology (MAxO), an ontology specifically designed to organize medical procedures, therapies, and interventions.
METHODS: MAxO incorporates logical structures that link MAxO terms to numerous other ontologies within the OBO Foundry. Term development involves a blend of manual and semi-automated processes. Additionally, we have generated annotations detailing diagnostic modalities for specific phenotypic abnormalities defined by the Human Phenotype Ontology (HPO). We introduce a web application, POET, that facilitates MAxO annotations for specific medical actions for diseases using the Mondo Disease Ontology.
FINDINGS: MAxO encompasses 1,757 terms spanning a wide range of biomedical domains, from human anatomy and investigations to the chemical and protein entities involved in biological processes. These terms annotate phenotypic features associated with specific diseases (using HPO and Mondo). Presently, there are over 16,000 MAxO diagnostic annotations that target HPO terms. Through POET, we have created 413 MAxO annotations specifying treatments for 189 rare diseases.
CONCLUSIONS: MAxO offers a computational representation of treatments and other actions taken for the clinical management of patients. Its development is closely coupled to Mondo and HPO, broadening the scope of our computational modeling of diseases and phenotypic features. We invite the community to contribute disease annotations using POET (https://poet.jax.org/). MAxO is available under the open-source CC-BY 4.0 license (https://github.com/monarch-initiative/MAxO).
FUNDING: NHGRI 1U24HG011449-01A1 and NHGRI 5RM1HG010860-04.
Affiliation(s)
- Leigh C Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Sabrina Toro
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | | - Margaret P Adam
- University of Washington School of Medicine, Seattle, WA, USA
| | - Hannah Blau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - David Gomez-Andres
- Pediatric Neurology, Vall d'Hebron Institut de Recerca (VHIR), Hospital Universitari Vall d'Hebron, Vall d'Hebron Barcelona Hospital Campus, Passeig Vall d'Hebron 119-129, 08035 Barcelona, Spain
| | - Rita Horvath
- Department of Clinical Neurosciences, University of Cambridge, Robinson Way, Cambridge CB2 0PY, UK
| | - Megan L Kraus
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Markus S Ladewig
- Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Germany
| | - David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
| | - Hanns Lochmüller
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada; Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada; Brain and Mind Research Institute, University of Ottawa, Ottawa, Canada; Department of Neuropediatrics and Muscle Disorders, Medical Center - University of Freiburg, Faculty of Medicine, Freiburg, Germany; Centro Nacional de Análisis Genómico, Barcelona, Spain
| | | | | | - Catharina Schuetz
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
| | - Berthold Seitz
- Department of Ophthalmology, Saarland University Medical Center UKS, Homburg, Saar, Germany
| | - Morgan N Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
| | - Teresa N Sparks
- Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Timmy Strauss
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
| | - Emilia M Swietlik
- Department of Medicine, University of Cambridge, Heart and Lung Research Institute, Cambridge CB2 0BB, UK
| | - Rachel Thompson
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada
| | | | | | | | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
|
32
|
Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Frontiers in Plant Science 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially in the face of large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse, and the potential of answering novel research questions. Ontologies are a useful tool for semantically tagging datasets as adding relevant metadata increases the understanding of how data was produced and increases its interoperability. Ontologies provide concepts for a particular domain as well as the relationships between concepts. By tagging data with ontology terms, data becomes both human- and machine- interpretable, allowing for increased reuse and interoperability. However, the task of identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments within metadata frameworks, such as Investigation-Study-Assay (ISA). We also outline repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
Affiliation(s)
- Kathryn Dumschott
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
| | - Hannah Dörpholz
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
| | - Marie-Angélique Laporte
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
| | - Dominik Brilhaus
- Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Andrea Schrader
- Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
| | - Björn Usadel
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Steffen Neumann
- Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
| | - Elizabeth Arnaud
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
| | - Angela Kranz
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
|
33
|
Drobnjakovic M, Hart R, Kulvatunyou BS, Ivezic N, Srinivasan V. Current challenges and recent advances on the path towards continuous biomanufacturing. Biotechnol Prog 2023; 39:e3378. [PMID: 37493037 DOI: 10.1002/btpr.3378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/20/2023] [Revised: 05/13/2023] [Accepted: 06/21/2023] [Indexed: 07/27/2023]
Abstract
Continuous biopharmaceutical manufacturing is currently a field of intense research due to its potential to make the entire production process more optimal for the modern, ever-evolving biopharmaceutical market. Compared to traditional batch manufacturing, continuous bioprocessing is more efficient, adjustable, and sustainable and has reduced capital costs. However, despite its clear advantages, continuous bioprocessing is yet to be widely adopted in commercial manufacturing. This article provides an overview of the technological roadblocks for extensive adoptions and points out the recent advances that could help overcome them. In total, three key areas for improvement are identified: Quality by Design (QbD) implementation, integration of upstream and downstream technologies, and data and knowledge management. First, the challenges to QbD implementation are explored. Specifically, process control, process analytical technology (PAT), critical process parameter (CPP) identification, and mathematical models for bioprocess control and design are recognized as crucial for successful QbD realizations. Next, the difficulties of end-to-end process integration are examined, with a particular emphasis on downstream processing. Finally, the problem of data and knowledge management and its potential solutions are outlined where ontologies and data standards are pointed out as key drivers of progress.
Affiliation(s)
- Milos Drobnjakovic
- Systems Integration Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Roger Hart
- National Institute for Innovation in Manufacturing Biopharmaceuticals, Newark, New Jersey, USA
| | - Boonserm Serm Kulvatunyou
- Systems Integration Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Nenad Ivezic
- Systems Integration Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
| | - Vijay Srinivasan
- Systems Integration Division, National Institute of Standards and Technology, Gaithersburg, Maryland, USA
|
34
|
Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA)-computational traits for the life sciences. Mamm Genome 2023; 34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK.
| | - James P Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
| | - Meghan A Balk
- Natural History Museum, University of Oslo, Oslo, Norway
| | - Robyn L Ball
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
| | | | - Anita R Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | - Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | - Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | - Laura W Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Nomi L Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | | | - Julie A McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | - Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | | | - Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | | | - Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
| | - Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
| | - Nicole Vasilevsky
- Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
|
35
|
Penn S, Lomax J, Karlsson A, Antonucci V, Zachmann CD, Kanza S, Schurer S, Turner J. An extension of the BioAssay Ontology to include pharmacokinetic/pharmacodynamic terminology for the enrichment of scientific workflows. J Biomed Semantics 2023; 14:10. [PMID: 37568227 PMCID: PMC10416407 DOI: 10.1186/s13326-023-00288-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 04/29/2023] [Indexed: 08/13/2023] Open
Abstract
With the capacity to produce and record data electronically, scientific research and the data associated with it have grown at an unprecedented rate. However, despite a decent amount of data now existing in an electronic form, it is still common for scientific research to be recorded in an unstructured text format with inconsistent context (vocabularies), which vastly reduces the potential for direct intelligent analysis. Research has demonstrated that the use of semantic technologies such as ontologies to structure and enrich scientific data can greatly improve this potential. However, whilst there are many ontologies that can be used for this purpose, there is still a vast quantity of scientific terminology that does not have adequate semantic representation. A key area for expansion identified by the authors was the pharmacokinetic/pharmacodynamic (PK/PD) domain due to its high usage across many areas of Pharma. As such, we have produced a set of these terms and other bioassay-related terms to be incorporated into the BioAssay Ontology (BAO), which was identified as the most relevant ontology for this work. A number of use cases developed by experts in the field were used to demonstrate how these new ontology terms can be used, and to set the scene for the continuation of this work with a view to expanding it into further relevant domains. The work done in this paper was part of Phase 1 of the SEED project (Semantically Enriching electronic laboratory notebook (eLN) Data).
Affiliation(s)
- Steve Penn
- Pfizer Inc, 1 Portland Street, Cambridge, MA 02139 USA
| | - Jane Lomax
- Scibite an Elsevier Company, Scibite Ltd, Biodata Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1DR UK
| | - Anneli Karlsson
- Scibite an Elsevier Company, Scibite Ltd, Biodata Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1DR UK
| | | | - Carl-Dieter Zachmann
- Sanofi-Aventis Deutschland GmbH, R&D / Integrated Drug Discovery, Industriepark Hoechst, Frankfurt am Main, H831 C.0156, 65926 Germany
| | - Samantha Kanza
- Department of Chemistry, University of Southampton, Highfield Campus, University Road, Southampton, SO17 1BJ UK
| | - Stephan Schurer
- Department of Cellular and Molecular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
| | - John Turner
- Department of Cellular and Molecular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136 USA
|
36
|
Bartnik A, Serra LM, Smith M, Duncan WD, Wishnie L, Ruttenberg A, Dwyer MG, Diehl AD. MRIO: The Magnetic Resonance Imaging Acquisition and Analysis Ontology. bioRxiv: the preprint server for biology 2023:2023.08.04.552020. [PMID: 37609265 PMCID: PMC10441376 DOI: 10.1101/2023.08.04.552020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/24/2023]
Abstract
Objective: Magnetic resonance imaging of the brain is a useful tool in both the clinic and research settings, aiding in the diagnosis and treatments of neurological disease and expanding our knowledge of the brain. However, there are many challenges inherent in managing and analyzing MRI data, due in large part to the heterogeneity of data acquisition.
Materials and Methods: To address this, we have developed MRIO, the Magnetic Resonance Imaging Acquisition and Analysis Ontology.
Results: MRIO provides well-reasoned classes and logical axioms for the acquisition of several MRI acquisition types and well-known, peer-reviewed analysis software, facilitating the use of MRI data. These classes provide a common language for the neuroimaging research process and help standardize the organization and analysis of MRI data for reproducible datasets. We also provide queries for automated assignment of analyses for given MRI types.
Discussion: MRIO aids researchers in managing neuroimaging studies by helping organize and annotate MRI data and integrating with existing standards such as Digital Imaging and Communications in Medicine and the Brain Imaging Data Structure, enhancing reproducibility and interoperability. MRIO was constructed according to Open Biomedical Ontologies Foundry principles and has contributed several terms to the Ontology for Biomedical Investigations to help bridge neuroimaging data to other domains.
Conclusion: MRIO addresses the need for a "common language" for MRI that can help manage neuroimaging research, by enabling researchers to identify appropriate analyses for sets of scans and facilitating data organization and reporting.
Affiliation(s)
- Alexander Bartnik
- Buffalo Neuroimaging Analysis Center, Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
| | - Lucas M. Serra
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
| | - Mackenzie Smith
- Buffalo Neuroimaging Analysis Center, Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
| | | | - Lauren Wishnie
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
| | - Alan Ruttenberg
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
| | - Michael G. Dwyer
- Buffalo Neuroimaging Analysis Center, Department of Neurology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
| | - Alexander D. Diehl
- Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
|
37
|
Reder GK, Gower AH, Kronström F, Halle R, Mahamuni V, Patel A, Hayatnagarkar H, Soldatova LN, King RD. Genesis-DB: a database for autonomous laboratory systems. Bioinformatics Advances 2023; 3:vbad102. [PMID: 37600845 PMCID: PMC10432352 DOI: 10.1093/bioadv/vbad102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/16/2023] [Revised: 07/13/2023] [Accepted: 08/01/2023] [Indexed: 08/22/2023]
Abstract
Summary: Artificial intelligence (AI)-driven laboratory automation, combining robotic labware and autonomous software agents, is a powerful trend in modern biology. We developed Genesis-DB, a database system designed to support AI-driven autonomous laboratories by providing software agents access to large quantities of structured domain information. In addition, we present a new ontology for modeling data and metadata from autonomously performed yeast microchemostat cultivations in the framework of the Genesis robot scientist system. We show an example of how Genesis-DB enables the research life cycle by modeling yeast gene regulation, guiding future hypotheses generation and design of experiments. Genesis-DB supports AI-driven discovery through automated reasoning and its design is portable, generic, and easily extensible to other AI-driven molecular biology laboratory data and beyond.
Availability and implementation: Genesis-DB code and installation instructions are available at the GitHub repository https://github.com/TW-Genesis/genesis-database-system.git. The database use case demo code and data are also available through GitHub (https://github.com/TW-Genesis/genesis-database-demo.git). The ontology can be downloaded here: https://github.com/TW-Genesis/genesis-ontology/releases/download/v0.0.23/genesis.owl. The ontology term descriptions (including mappings to existing ontologies) and maintenance standard operating procedures can be found at: https://github.com/TW-Genesis/genesis-ontology.
Affiliation(s)
- Gabriel K Reder
- The Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, 412 58, Sweden
| | - Alexander H Gower
- The Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, 412 58, Sweden
| | - Filip Kronström
- The Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, 412 58, Sweden
| | - Rushikesh Halle
- Engineering for Research (e4r™), Thoughtworks Technologies (India) Pvt Ltd, Pune, 411006, India
| | - Vinay Mahamuni
- Engineering for Research (e4r™), Thoughtworks Technologies (India) Pvt Ltd, Pune, 411006, India
| | - Amit Patel
- Engineering for Research (e4r™), Thoughtworks Technologies (India) Pvt Ltd, Pune, 411006, India
| | - Harshal Hayatnagarkar
- Engineering for Research (e4r™), Thoughtworks Technologies (India) Pvt Ltd, Pune, 411006, India
| | - Larisa N Soldatova
- Department of Computing, Goldsmiths, University of London, London, SE14 6AD, United Kingdom
| | - Ross D King
- The Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, 412 58, Sweden
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, CB3 0AS, United Kingdom
- Alan Turing Institute, London, NW1 2DB, United Kingdom
|
38
|
Carmody LC, Gargano MA, Toro S, Vasilevsky NA, Adam MP, Blau H, Chan LE, Gomez-Andres D, Horvath R, Kraus ML, Ladewig MS, Lewis-Smith D, Lochmüller H, Matentzoglu NA, Munoz-Torres MC, Schuetz C, Seitz B, Similuk MN, Sparks TN, Strauss T, Swietlik EM, Thompson R, Zhang XA, Mungall CJ, Haendel MA, Robinson PN. The Medical Action Ontology: A Tool for Annotating and Analyzing Treatments and Clinical Management of Human Disease. medRxiv: the preprint server for health sciences 2023:2023.07.13.23292612. [PMID: 37503136 PMCID: PMC10370244 DOI: 10.1101/2023.07.13.23292612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Navigating the vast landscape of clinical literature to find optimal treatments and management strategies can be a challenging task, especially for rare diseases. To address this task, we introduce the Medical Action Ontology (MAxO), the first ontology specifically designed to organize medical procedures, therapies, and interventions in a structured way. Currently, MAxO contains 1757 medical action terms added through a combination of manual and semi-automated processes. MAxO was developed with logical structures that make it compatible with several other ontologies within the Open Biological and Biomedical Ontologies (OBO) Foundry. These cover a wide range of biomedical domains, from human anatomy and investigations to the chemical and protein entities involved in biological processes. We have created a database of over 16000 annotations that describe diagnostic modalities for specific phenotypic abnormalities as defined by the Human Phenotype Ontology (HPO). Additionally, 413 annotations are provided for medical actions for 189 rare diseases. We have developed a web application called POET (https://poet.jax.org/) for the community to use to contribute MAxO annotations. MAxO provides a computational representation of treatments and other actions taken for the clinical management of patients. The development of MAxO is closely coupled to the Mondo Disease Ontology (Mondo) and the Human Phenotype Ontology (HPO) and expands the scope of our computational modeling of diseases and phenotypic features to include diagnostics and therapeutic actions. MAxO is available under the open-source CC-BY 4.0 license (https://github.com/monarch-initiative/MAxO).
Affiliation(s)
- Leigh C Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
| | - Michael A Gargano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
| | - Sabrina Toro
- University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | | | - Margaret P Adam
- University of Washington School of Medicine, Seattle, WA, United States
| | - Hannah Blau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
| | | | - David Gomez-Andres
- Pediatric Neurology, Vall d'Hebron Institut de Recerca (VHIR), Hospital Universitari Vall d'Hebron, Vall d'Hebron Barcelona Hospital Campus, Passeig Vall d'Hebron 119-129, 08035 Barcelona, Spain
| | - Rita Horvath
- Department of Clinical Neurosciences, University of Cambridge, Robinson Way CB2 0PY, Cambridge UK
| | - Megan L Kraus
- University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Markus S Ladewig
- Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Germany
| | - David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, United Kingdom
| | | | | | | | - Catharina Schuetz
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
| | - Berthold Seitz
- Department of Ophthalmology, Saarland University Hospital UKS, Homburg/Saar, Germany
| | - Morgan N Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, United States
| | - Teresa N Sparks
- Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, CA 94143
| | - Timmy Strauss
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
| | - Emilia M Swietlik
- Department of Medicine, University of Cambridge, Heart and Lung Research Institute, CB2 0BB, Cambridge, UK
| | | | | | | | - Melissa A Haendel
- University of Colorado Anschutz Medical Campus, Aurora, CO, United States
| | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
| |
|
39
|
Dooley D, Naravane T. Ontological how and why: action and objective of planned processes in the food domain. Front Artif Intell 2023; 6:1137961. [PMID: 37469931 PMCID: PMC10352767 DOI: 10.3389/frai.2023.1137961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 06/05/2023] [Indexed: 07/21/2023] Open
Abstract
The computational modeling of food processing, aimed at various applications including industrial automation, robotics, food safety, preservation, energy conservation, and recipe nutrition estimation, has been ongoing for decades within food science research labs, industry, and regulatory agencies. The datasets from this prior work have the potential to advance the field of data-driven modeling if they can be harmonized, but this requires a standardized language as a starting point. Our primary goal is to explore two interdependent aspects of this language: the granularity of process modeling sub-parts and parameter details, and the substitution of compatible inputs and processes. A delicate semantic distinction (categorizing planned processes based on the objectives they seek to fulfill vs. categorizing them by the actions or mechanisms they utilize) helps organize and facilitate this endeavor. To bring an ontological lens to process modeling, we employ the Open Biological and Biomedical Ontology Foundry ontological framework to organize two main classes of the FoodOn upper-level material processing hierarchy according to objective and mechanism, respectively. We include examples of material processing by mechanism, ranging from abstract ones such as "application of energy" down to specific classes such as "heating by microwave." Similarly, material processing by objective (often a transformation to bring about materials with certain qualities or composition) can, for example, range from "material processing by heating threshold" to "steaming rice".
Affiliation(s)
- Damion Dooley
- Centre for Infectious Disease Genomics and One Health, Simon Fraser University, Burnaby, BC, Canada
| | - Tarini Naravane
- Biological Systems Engineering, University of California, Davis, Davis, CA, United States
| |
|
40
|
Boguslav MR, Salem NM, White EK, Sullivan KJ, Bada M, Hernandez TL, Leach SM, Hunter LE. Creating an ignorance-base: Exploring known unknowns in the scientific literature. J Biomed Inform 2023; 143:104405. [PMID: 37270143 PMCID: PMC10528083 DOI: 10.1016/j.jbi.2023.104405] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/08/2022] [Revised: 05/18/2023] [Accepted: 05/21/2023] [Indexed: 06/05/2023]
Abstract
BACKGROUND: Scientific discovery progresses by exploring new and uncharted territory. More specifically, it advances by a process of transforming unknown unknowns first into known unknowns, and then into knowns. Over the last few decades, researchers have developed many knowledge bases to capture and connect the knowns, which has enabled topic exploration and contextualization of experimental results. But recognizing the unknowns is also critical for finding the most pertinent questions and their answers. Prior work on known unknowns has sought to understand them, annotate them, and automate their identification. However, no knowledge-bases yet exist to capture these unknowns, and little work has focused on how scientists might use them to trace a given topic or experimental result in search of open questions and new avenues for exploration. We show here that a knowledge base of unknowns can be connected to ontologically grounded biomedical knowledge to accelerate research in the field of prenatal nutrition. RESULTS: We present the first ignorance-base, a knowledge-base created by combining classifiers to recognize ignorance statements (statements of missing or incomplete knowledge that imply a goal for knowledge) and biomedical concepts over the prenatal nutrition literature. This knowledge-base places biomedical concepts mentioned in the literature in context with the ignorance statements authors have made about them. Using our system, researchers interested in the topic of vitamin D and prenatal health were able to uncover three new avenues for exploration (immune system, respiratory system, and brain development) by searching for concepts enriched in ignorance statements. These were buried among the many standard enriched concepts. Additionally, we used the ignorance-base to enrich concepts connected to a gene list associated with vitamin D and spontaneous preterm birth and found an emerging topic of study (brain development) in an implied field (neuroscience). The researchers could look to the field of neuroscience for potential answers to the ignorance statements. CONCLUSION: Our goal is to help students, researchers, funders, and publishers better understand the state of our collective scientific ignorance (known unknowns) in order to help accelerate research through the continued illumination of and focus on the known unknowns and their respective goals for scientific knowledge.
Affiliation(s)
- Mayla R Boguslav
- Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA.
| | - Nourah M Salem
- Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
| | - Elizabeth K White
- Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA; Center for Genes, Environment and Health, National Jewish Health, Jackson Street, Denver, 80206, CO, USA
| | - Katherine J Sullivan
- Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
| | - Michael Bada
- Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
| | - Teri L Hernandez
- College of Nursing, Department of Medicine/Division of Endocrinology, Metabolism, & Diabetes, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
| | - Sonia M Leach
- Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA; Center for Genes, Environment and Health, National Jewish Health, Jackson Street, Denver, 80206, CO, USA
| | - Lawrence E Hunter
- Computational Bioscience Program, University of Colorado, Anschutz Medical Campus, E 17th Avenue, Aurora, 80045, CO, USA
| |
|
41
|
Dooley D, Nguyen MH, Hsiao WWL. OntoTrek: 3D visualization of application ontology class hierarchies. PLoS One 2023; 18:e0286728. [PMID: 37267413 DOI: 10.1371/journal.pone.0286728] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/06/2022] [Accepted: 05/22/2023] [Indexed: 06/04/2023] Open
Abstract
An application ontology often reuses terms from other related, compatible ontologies. The extent of this interconnectedness is not readily apparent when browsing through larger textual presentations of term class hierarchies, whether in Manchester text format OWL files or within an ontology editor like Protégé. Users must either note ontology sources in term identifiers, or look at ontology import file term origins. Diagrammatically, this same information may be easier to perceive in 2-dimensional network or hierarchical graphs that visually code ontology term origins. However, humans, having stereoscopic vision and navigational acuity around colored and textured shapes, should benefit even more from a coherent 3-dimensional interactive visualization of an ontology that takes advantage of perspective to offer both foreground focus on content and a stable background context. We present OntoTrek, a 3D ontology visualizer that enables ontology stakeholders (students, software developers, curation teams, and funders) to recognize the presence of imported terms and their domains, ultimately illustrating how projects can capture knowledge through a vocabulary of interwoven community-supported ontology resources.
Affiliation(s)
- Damion Dooley
- Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Matthew H Nguyen
- Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| | - William W L Hsiao
- Faculty of Health Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, British Columbia, Canada
| |
|
42
|
Sheffield NC, LeRoy NJ, Khoroshevskyi O. Challenges to sharing sample metadata in computational genomics. Front Genet 2023; 14:1154198. [PMID: 37287537 PMCID: PMC10243526 DOI: 10.3389/fgene.2023.1154198] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/02/2023] [Accepted: 05/09/2023] [Indexed: 06/09/2023] Open
Affiliation(s)
- Nathan C. Sheffield
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, United States
- School of Data Science, University of Virginia, Charlottesville, VA, United States
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA, United States
| | - Nathan J. LeRoy
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, United States
| | - Oleksandr Khoroshevskyi
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, United States
| |
|
43
|
Kang H, Hou L, Gu Y, Lu X, Li J, Li Q. Drug-disease association prediction with literature based multi-feature fusion. Front Pharmacol 2023; 14:1205144. [PMID: 37284317 PMCID: PMC10239876 DOI: 10.3389/fphar.2023.1205144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/09/2023] [Indexed: 06/08/2023] Open
Abstract
Introduction: Exploring the potential efficacy of a drug is a valid approach for drug development with shorter development times and lower costs. Recently, several computational drug repositioning methods have been introduced to learn multi-features for potential association prediction. However, fully leveraging the vast amount of information in the scientific literature to enhance drug-disease association prediction is a great challenge. Methods: We constructed a drug-disease association prediction method called Literature Based Multi-Feature Fusion (LBMFF), which effectively integrated known drugs, diseases, side effects and target associations from public databases as well as literature semantic features. Specifically, a pre-trained and fine-tuned BERT model was introduced to extract literature semantic information for similarity assessment. Then, we derived drug and disease embeddings from the constructed fusion similarity matrix by a graph convolutional network with an attention mechanism. Results: LBMFF achieved superior performance in drug-disease association prediction with an AUC value of 0.8818 and an AUPR value of 0.5916. Discussion: LBMFF achieved relative improvements of 31.67% and 16.09%, respectively, over the second-best results, compared to single feature methods and seven existing state-of-the-art prediction methods on the same test datasets. Meanwhile, case studies have verified that LBMFF can discover new associations to accelerate drug development. The proposed benchmark dataset and source code are available at: https://github.com/kang-hongyu/LBMFF.
Affiliation(s)
- Hongyu Kang
- Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, China
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Li Hou
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Yaowen Gu
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Xiao Lu
- Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, China
| | - Jiao Li
- Institute of Medical Information, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Qin Li
- Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, China
| |
|
44
|
Maleki A, Crispino E, Italia SA, Di Salvatore V, Chiacchio MA, Sips F, Bursi R, Russo G, Maimone D, Pappalardo F. Moving forward through the in silico modeling of multiple sclerosis: Treatment layer implementation and validation. Comput Struct Biotechnol J 2023; 21:3081-3090. [PMID: 37266405 PMCID: PMC10230825 DOI: 10.1016/j.csbj.2023.05.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 05/17/2023] [Accepted: 05/18/2023] [Indexed: 06/03/2023] Open
Abstract
Multiple sclerosis is an autoimmune inflammatory disease that affects the central nervous system through chronic demyelination and loss of oligodendrocytes. Since the relapsing-remitting form is the most prevalent, relapse-reducing therapies are a primary choice for specialists. The Universal Immune System Simulator (UISS) is an agent-based model that simulates the human immune system dynamics under physiological conditions and during several diseases, including multiple sclerosis. In this work, we extended the UISS-MS disease layer by adding two new treatments, i.e., cladribine and ocrelizumab, to show that UISS-MS can potentially be used to predict the effects of any existing or newly designed treatment against multiple sclerosis. To retrospectively validate UISS-MS with ocrelizumab and cladribine, we extracted the clinical and MRI data from patients included in two clinical trials, thus creating specific cohorts of digital patients for predicting and validating the effects of the considered drugs. The obtained results mirror those of the clinical trials, demonstrating that UISS-MS can correctly simulate the mechanisms of action and outcomes of the treatments. The successful retrospective validation helps confirm that UISS-MS can be considered a digital twin solution to be used as a support system to inform clinical decisions and predict disease course and therapeutic response at a single patient level.
Affiliation(s)
- Avisa Maleki
- Department of Mathematics and Computer Science, University of Catania, Viale Andrea Doria 6, Catania 95125, Italy
| | - Elena Crispino
- Department of Biomedical and Biotechnological Sciences, University of Catania, Via Santa Sofia 97, Catania 95125, Italy
| | - Serena Anna Italia
- Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, Catania 95125, Italy
| | - Valentina Di Salvatore
- Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, Catania 95125, Italy
| | - Maria Assunta Chiacchio
- Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, Catania 95125, Italy
| | - Fianne Sips
- InSilicoTrials Technologies BV, 's Hertogenbosch, the Netherlands
| | - Roberta Bursi
- InSilicoTrials Technologies BV, 's Hertogenbosch, the Netherlands
| | - Giulia Russo
- Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, Catania 95125, Italy
- Mimesis SRL, Catania, Italy
| | - Davide Maimone
- Centro Sclerosi Multipla, UOC Neurologia, ARNAS Garibaldi, P.zza S. Maria di Gesù, Catania 95124, Italy
| | - Francesco Pappalardo
- Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, Catania 95125, Italy
| |
|
45
|
Mahita J, Ha B, Gambiez A, Schendel SL, Li H, Hastie KM, Dennison SM, Li K, Kuzmina N, Periasamy S, Bukreyev A, Munt JE, Osei-Twum M, Atyeo C, Overton JA, Vita R, Guzman-Orozco H, Mendes M, Kojima M, Halfmann PJ, Kawaoka Y, Alter G, Gagnon L, Baric RS, Tomaras GD, Germann T, Bedinger D, Greenbaum JA, Saphire EO, Peters B. Coronavirus Immunotherapeutic Consortium Database. Database (Oxford) 2023; 2023:7034146. [PMID: 36763096 PMCID: PMC9913043 DOI: 10.1093/database/baac112] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/06/2022] [Revised: 11/30/2022] [Accepted: 12/22/2022] [Indexed: 02/11/2023]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has seen multiple anti-SARS-CoV-2 antibodies being generated globally. It is difficult, however, to assemble a useful compendium of their biological properties if they are derived from experimental measurements performed at different sites under different experimental conditions. The Coronavirus Immunotherapeutic Consortium (COVIC) circumvents these issues by experimentally testing blinded antibodies side by side for several functional activities. To collect these data in a consistent fashion and make them publicly available, we established the COVIC database (COVIC-DB, https://covicdb.lji.org/). This database enables systematic analysis and interpretation of this large-scale dataset by providing a comprehensive view of various features such as affinity, neutralization, in vivo protection and effector functions for each antibody. Interactive graphs enable direct comparisons of antibodies based on select functional properties. We demonstrate how the COVIC-DB can be utilized to examine relationships among antibody features, thereby guiding the design of therapeutic antibody cocktails. Database URL: https://covicdb.lji.org/.
Affiliation(s)
| | | | - Anais Gambiez
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Sharon L Schendel
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Haoyang Li
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Kathryn M Hastie
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - S Moses Dennison
- Center for Human Systems Immunology, Departments of Surgery, Immunology, and Molecular Genetics and Microbiology and Duke Human Vaccine Institute, Duke University, Durham, NC 27701, USA
| | - Kan Li
- Center for Human Systems Immunology, Departments of Surgery, Immunology, and Molecular Genetics and Microbiology and Duke Human Vaccine Institute, Duke University, Durham, NC 27701, USA
| | - Natalia Kuzmina
- Department of Pathology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-0609, USA; Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-1019, USA
| | - Sivakumar Periasamy
- Department of Pathology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-0609, USA; Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-1019, USA
| | - Alexander Bukreyev
- Department of Pathology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-0609, USA; Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77555-1019, USA; Galveston National Laboratory, University of Texas Medical Branch at Galveston, 301 University Blvd, Galveston, TX 77550, USA
| | - Jennifer E Munt
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, 135 Dauer Drive, 2101 McGavran-Greenberg Hall, CB #7435, Chapel Hill, NC 27599-7435, USA
| | - Mary Osei-Twum
- Nexelis, a Q2 Solutions Company, 525 Boulevard Cartier Ouest, Laval, Quebec H7V 3S8, Canada
| | - Caroline Atyeo
- Ragon Institute of MGH, MIT and Harvard, 400 Technology Square, Cambridge, MA 02139-3583, USA
| | - James A Overton
- Knocean Inc., 107 Quebec Ave. Toronto, Ontario, M6P 2T3, Canada
| | - Randi Vita
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Hector Guzman-Orozco
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Marcus Mendes
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Mari Kojima
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | - Peter J Halfmann
- Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, WI 53711, USA
| | - Yoshihiro Kawaoka
- Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, WI 53711, USA; Division of Virology, Department of Microbiology and Immunology, Institute of Medical Science, University of Tokyo, Tokyo 108-8639, Japan; The Research Center for Global Viral Diseases, National Center for Global Health and Medicine Research Institute, Tokyo 162-8655, Japan
| | - Galit Alter
- Ragon Institute of MGH, MIT and Harvard, 400 Technology Square, Cambridge, MA 02139-3583, USA
| | - Luc Gagnon
- Nexelis, a Q2 Solutions Company, 525 Boulevard Cartier Ouest, Laval, Quebec H7V 3S8, Canada
| | - Ralph S Baric
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, 135 Dauer Drive, 2101 McGavran-Greenberg Hall, CB #7435, Chapel Hill, NC 27599-7435, USA; Department of Microbiology and Immunology, School of Medicine, 125 Marson Farm Road, Chapel Hill, NC 27599-7290, USA
| | - Georgia D Tomaras
- Center for Human Systems Immunology, Departments of Surgery, Immunology, and Molecular Genetics and Microbiology and Duke Human Vaccine Institute, Duke University, Durham, NC 27701, USA
| | - Tim Germann
- Carterra Inc., 825 N. 300 W., Ste C309, Salt Lake City, UT 84103, USA
| | - Daniel Bedinger
- Carterra Inc., 825 N. 300 W., Ste C309, Salt Lake City, UT 84103, USA
| | - Jason A Greenbaum
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
| | | | - Bjoern Peters
- Correspondence may also be addressed to Bjoern Peters. Tel: +1858 752 6914; Fax: +858-752-6987;
| |
|
46
|
Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. bioRxiv 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
| | - Meghan A. Balk
- National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
| | - Robyn Ball
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | | | - Anita R. Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | - Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Laura W. Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Nomi L. Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | | | - Julie A. McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | | | - Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | | | - Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
| | - Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Nicole Vasilevsky
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | | | | |
|
47
|
Bona JP, Utecht J, Bost S, Brochhausen M, Prior F. The PRISM semantic cohort builder: a novel tool to search and access clinical data in TCIA imaging collections. Phys Med Biol 2023; 68:014003. [PMID: 36279873 PMCID: PMC9855624 DOI: 10.1088/1361-6560/ac9d1d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 10/24/2022] [Indexed: 12/24/2022]
Abstract
The Cancer Imaging Archive (TCIA) receives and manages an ever-increasing quantity of clinical (non-image) data containing valuable information about subjects in imaging collections. To harmonize and integrate these data, we first cataloged the types of information occurring across public TCIA collections. We then produced mappings for these diverse instance data using ontology-based representation patterns and transformed the data into a knowledge graph in a semantic database. This repository combines the transformed instance data with relevant background knowledge from domain ontologies. The resulting repository of semantically integrated data is a rich source of information about subjects that can be queried across imaging collections. Building on this work, we have implemented and deployed a REST API and a user-facing semantic cohort builder tool. This tool allows researchers and other users to search and identify groups of subject-level records based on non-image data that were not queryable prior to this work. The search results produced by this interface link to images, allowing users to quickly identify and view images matching the selection criteria, and to export the harmonized clinical data.
Affiliation(s)
- Jonathan P Bona
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
| | - Joseph Utecht
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
| | - Sarah Bost
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, United States of America
| | - Mathias Brochhausen
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Department of Medical Humanities and Bioethics, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas, United States of America
| | - Fred Prior
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Department of Radiology, University of Arkansas for Medical Sciences (UAMS), Little Rock, Arkansas, United States of America
| |
|
48
|
Ament SA, Adkins RS, Carter R, Chrysostomou E, Colantuoni C, Crabtree J, Creasy HH, Degatano K, Felix V, Gandt P, Garden G, Giglio M, Herb BR, Khajouei F, Kiernan E, McCracken C, McDaniel K, Nadendla S, Nickel L, Olley D, Orvis J, Receveur J, Schor M, Sonthalia S, Tickle T, Way J, Hertzano R, Mahurkar A, White O. The Neuroscience Multi-Omic Archive: a BRAIN Initiative resource for single-cell transcriptomic and epigenomic data from the mammalian brain. Nucleic Acids Res 2023; 51:D1075-D1085. [PMID: 36318260 PMCID: PMC9825473 DOI: 10.1093/nar/gkac962] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 08/22/2022] [Revised: 09/30/2022] [Accepted: 10/27/2022] [Indexed: 11/06/2022] Open
Abstract
Scalable technologies to sequence the transcriptomes and epigenomes of single cells are transforming our understanding of cell types and cell states. The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative Cell Census Network (BICCN) is applying these technologies at unprecedented scale to map the cell types in the mammalian brain. In an effort to increase data FAIRness (Findable, Accessible, Interoperable, Reusable), the NIH has established repositories to make data generated by the BICCN and related BRAIN Initiative projects accessible to the broader research community. Here, we describe the Neuroscience Multi-Omic Archive (NeMO Archive; nemoarchive.org), which serves as the primary repository for genomics data from the BRAIN Initiative. Working closely with other BRAIN Initiative researchers, we have organized these data into a continually expanding, curated repository, which contains transcriptomic and epigenomic data from over 50 million brain cells, including single-cell genomic data from all of the major regions of the adult and prenatal human and mouse brains, as well as substantial single-cell genomic data from non-human primates. We make available several tools for accessing these data, including a searchable web portal, a cloud-computing interface for large-scale data processing (implemented on Terra, terra.bio), and a visualization and analysis platform, NeMO Analytics (nemoanalytics.org).
Affiliation(s)
- Seth A Ament
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Ricky S Adkins
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Robert Carter
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Elena Chrysostomou
- Department of Otorhinolaryngology Head and Neck Surgery, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Carlo Colantuoni
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Departments of Neurology and Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Jonathan Crabtree
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Heather H Creasy
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Kylee Degatano
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Victor Felix
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Peter Gandt
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Gwenn A Garden
- Department of Neurology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Michelle Giglio
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Brian R Herb
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Farzaneh Khajouei
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Elizabeth Kiernan
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Carrie McCracken
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Kennedy McDaniel
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Suvarna Nadendla
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Lance Nickel
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Dustin Olley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Joshua Orvis
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Joseph P Receveur
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Mike Schor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Shreyash Sonthalia
- Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Timothy L Tickle
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jessica Way
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ronna Hertzano
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Otorhinolaryngology Head and Neck Surgery, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Anup A Mahurkar
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Owen R White
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD, USA
| |
|
49
|
Blazeska N, Kosaloglu-Yalcin Z, Vita R, Peters B, Sette A. IEDB and CEDAR: Two Sibling Databases to Serve the Global Scientific Community. Methods Mol Biol 2023; 2673:133-149. [PMID: 37258911 PMCID: PMC11008223 DOI: 10.1007/978-1-0716-3239-0_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Various methodologies have been utilized to analyze epitope-specific responses in the context of non-self-antigens, such as those associated with infectious diseases and allergies, and in the context of self-antigens, such as those associated with transplantation, autoimmunity, and cancer. Furthermore, epitope-specific data and their associated immunological context are crucial to training and developing predictive algorithms and pipelines for the development of specific vaccines and diagnostics. In this chapter, we describe the methodology utilized to derive two sibling resources, the Immune Epitope Database (IEDB) and the Cancer Epitope Database and Analysis Resource (CEDAR), which specifically host these data and make them freely available to the scientific community.
Affiliation(s)
- Nina Blazeska
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
| | - Zeynep Kosaloglu-Yalcin
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
| | - Randi Vita
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
| | - Bjoern Peters
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Alessandro Sette
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
| |
|
50
|
Porubsky VL, Sauro HM. A Practical Guide to Reproducible Modeling for Biochemical Networks. Methods Mol Biol 2023; 2634:107-138. [PMID: 37074576 DOI: 10.1007/978-1-0716-3008-2_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2023]
Abstract
While scientific disciplines revere reproducibility, many studies, experimental and computational alike, fall short of this ideal and cannot be reproduced or even repeated when the model is shared. For computational modeling of biochemical networks, there is a dearth of formal training and resources describing how to implement reproducible methods in practice, despite a wealth of existing tools and formats that could be used to support reproducibility. This chapter points the reader to useful software tools and standardized formats that support reproducible modeling of biochemical networks and provides suggestions on how to implement reproducible methods in practice. Many of the suggestions encourage readers to use best practices from the software development community in order to automate, test, and version control their model components. A Jupyter Notebook demonstrating several of the key steps in building a reproducible biochemical network model is included to supplement the recommendations in the text.
Affiliation(s)
| | - Herbert M Sauro
- University of Washington, Department of Bioengineering, Seattle, WA, USA
| |
|