51
T'Joen V, Vaneeckhaute L, Priem S, Van Woensel S, Bekaert S, Berneel E, Van Der Straeten C. Rationalized Development of a Campus-Wide Cell Line Dataset for Implementation in the Biobank LIMS System at Bioresource Center Ghent. Front Med (Lausanne) 2019; 6:137. [PMID: 31294023 PMCID: PMC6603147 DOI: 10.3389/fmed.2019.00137] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/28/2019] [Accepted: 06/04/2019] [Indexed: 11/13/2022]
Abstract
The Bioresource Center Ghent is the central hospital-integrated biobank of Ghent University Hospital. Our mission is to facilitate translational biomedical research by collecting, storing, and providing high-quality biospecimens to researchers. Several of our biobank partners store large numbers of cell lines. As cell lines are highly important in both basic research and preclinical screening, good annotation, authentication, and quality of these cell lines are pivotal in translational biomedical science. A Biobank Information Management System (BIMS) was implemented as the sample and data management system for human bodily material. The samples are annotated using defined datasets based on the BRISQ (Biospecimen Reporting for Improved Study Quality) and MIABIS (Minimum Information About Biobank data Sharing) guidelines, supplemented with SPREC (Standard PREanalytical Coding) information. However, the defined dataset for human bodily material is not well suited to capturing cell line-specific data. Therefore, we set out to develop a rationalized cell line dataset. By comparing the datasets of different online cell banks (human, animal, and stem cell), we established an extended cell line dataset of 156 data fields, which was further analyzed and reduced to a smaller survey dataset of 54 data fields. The survey dataset was distributed across our campus to all cell line users to rationalize the fields of the dataset and their potential use. Analysis of the survey data revealed only small differences in data field preferences between human, animal, and stem cell lines. Hence, one essential dataset of 33 data fields was compiled for human, animal, and stem cell lines. The essential dataset was prepared for implementation in our BIMS. Good Clinical Data Management Practices formed the basis of our decisions in the implementation phase. Known standards, reference lists, and ontologies (such as ICD-10-CM, animal taxonomy, and the Cell Line Ontology) were considered. The semantics of the data fields were clearly defined, enhancing the data quality of the stored cell lines. The result is an essential cell line dataset with defined data fields, usable by multiple cell line users.
Affiliation(s)
- Veronique T'Joen
  - Bioresource Center Ghent, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
- Lieven Vaneeckhaute
  - Data Management Unit, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
- Sara Priem
  - Bioresource Center Ghent, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
- Steven Van Woensel
  - Bioresource Center Ghent, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
- Sofie Bekaert
  - Department of Public Health and Primary Care, Faculty for Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Elke Berneel
  - Bioresource Center Ghent, Health, Innovation and Research Center, Ghent University Hospital, Ghent, Belgium
52
Yu H, Nysak S, Garg N, Ong E, Ye X, Zhang X, He Y. ODAE: Ontology-based systematic representation and analysis of drug adverse events and its usage in study of adverse events given different patient age and disease conditions. BMC Bioinformatics 2019; 20:199. [PMID: 31074377 PMCID: PMC6509876 DOI: 10.1186/s12859-019-2729-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/04/2022]
Abstract
Background: Drug adverse events (AEs), also called adverse drug events (ADEs), rank among the leading causes of mortality. The Ontology of Adverse Events (OAE) has been widely used for AE representation, standardization, and analysis. OAE-based ADE-specific ontologies, including ODNAE for drug-associated neuropathy-inducing AEs and OCVDAE for cardiovascular drug AEs, have also been developed and used. However, these ADE-specific ontologies do not consider the effects of other factors (e.g., age and drug-treated disease) on the outcomes of ADEs. With more ontological studies of ADEs, it is also critical to develop a general-purpose ontology for representing ADEs for various types of drugs. Results: Our survey of FDA drug package insert documents and other resources for 224 neuropathy-inducing drugs found that many drugs (e.g., sirolimus and linezolid) cause different AEs depending on the patient's age or the disease being treated. To logically represent the complex relations among drug, drug ingredient and mechanism of action, AE, age, disease, and other related factors, an ontology design pattern was developed and applied to generate a community-driven open-source Ontology of Drug Adverse Events (ODAE). The ODAE development follows the OBO Foundry ontology development principles (e.g., openness and collaboration). Built on a generalizable ODAE design pattern and extending the OAE and NDF-RT ontologies, ODAE represents various AEs associated with more than 200 neuropathy-inducing drugs under different age and disease conditions. ODAE is deposited in Ontobee for browsing and querying. As a demonstration of usage, a SPARQL query of the ODAE knowledge base was developed to identify all drugs whose mechanisms involve ion channel interactions, the diseases treated with those drugs, and the AEs observed after treatment in adult patients. AE-specific drug class effects were also explored using ODAE and SPARQL. Conclusion: ODAE provides a general representation of ADEs under different conditions and can be used to query scientific questions. ODAE is also a robust knowledge base and platform for the semantic and logical representation and study of the ADEs of more drugs in the future.
Affiliation(s)
- Hong Yu
  - Department of Pulmonary and Critical Care Medicine, Guizhou Provincial People's Hospital, Guiyang, 550002, Guizhou, China
  - Guizhou University Medical College, Guiyang, 550025, Guizhou, China
- Solomiya Nysak
  - College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, MI, 48109, USA
- Noemi Garg
  - College of Pharmacy, University of Michigan, Ann Arbor, MI, 48109, USA
- Edison Ong
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Xianwei Ye
  - Department of Pulmonary and Critical Care Medicine, Guizhou Provincial People's Hospital, Guiyang, 550002, Guizhou, China
  - Guizhou University Medical College, Guiyang, 550025, Guizhou, China
- Xiangyan Zhang
  - Department of Pulmonary and Critical Care Medicine, Guizhou Provincial People's Hospital, Guiyang, 550002, Guizhou, China
  - Guizhou University Medical College, Guiyang, 550025, Guizhou, China
- Yongqun He
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
53
Pan H, Bian X, Yang S, He Y, Yang X, Liu Y. The cell line ontology-based representation, integration and analysis of cell lines used in China. BMC Bioinformatics 2019; 20:179. [PMID: 31272367 PMCID: PMC6509802 DOI: 10.1186/s12859-019-2724-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The Chinese National Infrastructure of Cell Line Resource (NICR) stores and distributes cell lines for biomedical research in China. This study aims to represent and integrate the information of NICR cell lines into the community-based Cell Line Ontology (CLO). RESULTS: We aligned, represented, and added all 2704 identified cell line cells in NICR to CLO. We also proposed new ontology design patterns to represent the use of cell line cells as disease models by inducing tumor formation in model organisms, and the relations between cell line cells and their expressed or overexpressed genes or proteins. The resulting CLO-NICR ontology also includes the Chinese representation of the NICR cell line information. CLO-NICR was merged into the general CLO. To serve the cell research community in China, the Chinese version of CLO-NICR was also generated and deposited in the OntoChina ontology repository. The usage of CLO-NICR was demonstrated by DL query and knowledge extraction. CONCLUSIONS: All identified cell lines from NICR are represented within the semantic framework of CLO and incorporated into CLO as its most recent update. We also generated CLO-NICR and its Chinese view (CLO-NICR-Cv). The development of CLO-NICR and CLO-NICR-Cv allows the integration of the cell lines from NICR into the community-based CLO ontology and provides an integrative platform to support different applications of CLO in China.
Affiliation(s)
- Hongjie Pan
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Xiaocui Bian
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Sheng Yang
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI 48109 USA
- Xiaolin Yang
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
- Yuqin Liu
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing, China
54
Serra LM, Duncan WD, Diehl AD. An ontology for representing hematologic malignancies: the cancer cell ontology. BMC Bioinformatics 2019; 20:181. [PMID: 31272372 PMCID: PMC6509834 DOI: 10.1186/s12859-019-2722-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Within the cancer domain, ontologies play an important role in the integration and annotation of data in order to support numerous biomedical tools and applications. This work seeks to leverage existing standards in immunophenotyping cell types found in hematologic malignancies to provide an ontological representation of them to aid in data annotation and analysis for patient data. RESULTS: We have developed the Cancer Cell Ontology according to OBO Foundry principles as an extension of the Cell Ontology. We define classes in Cancer Cell Ontology by using a genus-differentia approach using logical axioms capturing the expression of cellular surface markers in order to represent types of hematologic malignancies. By adopting conventions used in the Cell Ontology, we have created human and computer-readable definitions for 300 classes of blood cancers, based on the EGIL classification system for leukemias, and relying upon additional classification approaches for multiple myelomas and other hematologic malignancies. CONCLUSION: We have demonstrated a proof of concept for leveraging the built-in logical axioms of the ontology in order to classify patient surface marker data into appropriate diagnostic categories. We plan to integrate our ontology into existing tools for flow cytometry data analysis to facilitate the automated diagnosis of hematologic malignancies.
Affiliation(s)
- Lucas M Serra
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- William D Duncan
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Alexander D Diehl
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
55
He Y, Duncan WD, Cooper DJ, Hansen J, Iyengar R, Ong E, Walker K, Tibi O, Smith S, Serra LM, Zheng J, Sarntivijai S, Schürer S, O'Shea KS, Diehl AD. OSCI: standardized stem cell ontology representation and use cases for stem cell investigation. BMC Bioinformatics 2019; 20:180. [PMID: 31272389 PMCID: PMC6509805 DOI: 10.1186/s12859-019-2723-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 04/04/2023]
Abstract
Background: Stem cells and stem cell lines are widely used in biomedical research. The Cell Ontology (CL) and Cell Line Ontology (CLO) are two community-based OBO Foundry ontologies in the domains of in vivo cells and in vitro cell line cells, respectively. Results: To support standardized stem cell investigations, we have developed an Ontology for Stem Cell Investigations (OSCI). OSCI imports stem cell and cell line terms from CL and CLO, and investigation-related terms from existing ontologies. A novel focus of OSCI is its application in representing metadata types associated with various stem cell investigations. We also applied OSCI to systematically categorize experimental variables in an induced pluripotent stem cell line cell study related to bipolar disorder. In addition, we used a semi-automated literature mining approach to identify over 200 stem cell gene markers. The relations between these genes and stem cells are modeled and represented in OSCI. Conclusions: OSCI standardizes stem cells found in vivo and in vitro and in various stem cell investigation processes and entities. The presented use cases demonstrate the utility of OSCI in iPSC studies and literature mining related to bipolar disorder.
Affiliation(s)
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Jens Hansen
  - Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - SBCNY, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ravi Iyengar
  - Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - SBCNY, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Edison Ong
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Kendal Walker
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Omar Tibi
  - Johns Hopkins University, Baltimore, MD, USA
- Lucas M Serra
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Jie Zheng
  - University of Pennsylvania, Philadelphia, PA, USA
- K Sue O'Shea
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Alexander D Diehl
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
56
Sarntivijai S, He Y, Diehl AD. Cells in ExperimentaL Life Sciences (CELLS-2018): capturing the knowledge of normal and diseased cells with ontologies. BMC Bioinformatics 2019; 20:183. [PMID: 31272374 PMCID: PMC6509796 DOI: 10.1186/s12859-019-2721-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Cell cultures and cell lines are widely used in life science experiments. In conjunction with the 2018 International Conference on Biomedical Ontology (ICBO-2018), the 2nd International Workshop on Cells in ExperimentaL Life Science (CELLS-2018) focused on two themes of knowledge representation, for newly-discovered cell types and for cells in disease states. This workshop included five oral presentations and a general discussion session. Two new ontologies, including the Cancer Cell Ontology (CCL) and the Ontology for Stem Cell Investigations (OSCI), were reported in the workshop. In another representation, the Cell Line Ontology (CLO) framework was applied and extended to represent cell line cells used in China and their Chinese representation. Other presentations included a report on the application of ontologies to cross-compare cell types and marker patterns used in flow cytometry studies, and a presentation on new experimental findings about novel cell types based on single cell RNA sequencing assay and their corresponding ontological representation. The general discussion session focused on the ontology design patterns in representing newly-discovered cell types and cells in disease states.
Affiliation(s)
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI USA
- Alexander D. Diehl
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY USA
57
Improving the Utility of the Tox21 Dataset by Deep Metadata Annotations and Constructing Reusable Benchmarked Chemical Reference Signatures. Molecules 2019; 24:1604. [PMID: 31018579 PMCID: PMC6515292 DOI: 10.3390/molecules24081604] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/20/2019] [Revised: 04/16/2019] [Accepted: 04/19/2019] [Indexed: 02/03/2023]
Abstract
The Toxicology in the 21st Century (Tox21) project seeks to develop and test methods for high-throughput examination of the effect certain chemical compounds have on biological systems. Although primary and toxicity assay data were readily available for multiple reporter gene modified cell lines, extensive annotation and curation was required to improve these datasets with respect to how FAIR (Findable, Accessible, Interoperable, and Reusable) they are. In this study, we fully annotated the Tox21 published data with relevant and accepted controlled vocabularies. After removing unreliable data points, we aggregated the results and created three sets of signatures reflecting activity in the reporter gene assays, cytotoxicity, and selective reporter gene activity, respectively. We benchmarked these signatures using the chemical structures of the tested compounds and obtained generally high receiver operating characteristic (ROC) scores, suggesting good quality and utility of these signatures and the underlying data. We analyzed the results to identify promiscuous individual compounds and chemotypes for the three signature categories and interpreted the results to illustrate the utility and re-usability of the datasets. With this study, we aimed to demonstrate the importance of data standards in reporting screening results and high-quality annotations to enable re-use and interpretation of these data. To improve the data with respect to all FAIR criteria, all assay annotations, cleaned and aggregate datasets, and signatures were made available as standardized dataset packages (Aggregated Tox21 bioactivity data, 2019).
58
Yao F, Madani Tonekaboni SA, Safikhani Z, Smirnov P, El-Hachem N, Freeman M, Manem VSK, Haibe-Kains B. Tissue specificity of in vitro drug sensitivity. J Am Med Inform Assoc 2019; 25:158-166. [PMID: 29016819 DOI: 10.1093/jamia/ocx062] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 11/21/2016] [Accepted: 05/22/2017] [Indexed: 12/11/2022]
Abstract
Objectives: We sought to investigate the tissue specificity of drug sensitivities in large-scale pharmacological studies and compare these associations to those found in drug clinical indications. Materials and Methods: We leveraged the curated cell line response data from PharmacoGx and applied an enrichment algorithm to drug sensitivity values, measured as the area under the drug dose-response curve (AUC), with and without adjustment for the general level of drug sensitivity. Results: We observed tissue specificity in 63% of tested drugs, with 8% of total interactions deemed significant (false discovery rate <0.05). By restricting the drug-tissue interactions to those with AUC > 0.2, we found that in 52% of interactions, the tissue was predictive of drug sensitivity (concordance index > 0.65). When compared with clinical indications, the observed overlap was weak (Matthews correlation coefficient, MCC = 0.0003, P > .10). Discussion: While drugs exhibit significant tissue specificity in vitro, there is little overlap with clinical indications. This can be attributed to factors such as underlying biological differences between in vitro models and patient tumors, or the inability of tissue-specific drugs to bring additional benefits beyond gold standard treatments during clinical trials. Conclusion: Our meta-analysis of pan-cancer drug screening datasets indicates that most tested drugs exhibit tissue-specific sensitivities in a large panel of cancer cell lines. However, the observed preclinical results do not translate to the clinical setting. Our results suggest that additional research into parallels between preclinical and clinical data is required to increase the translational potential of in vitro drug screening.
Affiliation(s)
- Fupan Yao
  - Princess Margaret Cancer Centre, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Seyed Ali Madani Tonekaboni
  - Princess Margaret Cancer Centre, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Zhaleh Safikhani
  - Princess Margaret Cancer Centre, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Petr Smirnov
  - Princess Margaret Cancer Centre, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Nehme El-Hachem
  - Integrative Systems Biology, Institut de Recherches Cliniques de Montréal, Montreal, Quebec, Canada
  - Department of Medicine, University of Montreal, Montréal, Quebec, Canada
- Mark Freeman
  - Princess Margaret Cancer Centre, Toronto, Ontario, Canada
- Venkata Satya Kumar Manem
  - Princess Margaret Cancer Centre, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Benjamin Haibe-Kains
  - Princess Margaret Cancer Centre, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Ontario Institute of Cancer Research, Toronto, Ontario, Canada
59
Oughtred R, Stark C, Breitkreutz BJ, Rust J, Boucher L, Chang C, Kolas N, O’Donnell L, Leung G, McAdam R, Zhang F, Dolma S, Willems A, Coulombe-Huntington J, Chatr-aryamontri A, Dolinski K, Tyers M. The BioGRID interaction database: 2019 update. Nucleic Acids Res 2019; 47:D529-D541. [PMID: 30476227 PMCID: PMC6324058 DOI: 10.1093/nar/gky1079] [Citation(s) in RCA: 935] [Impact Index Per Article: 155.8] [Received: 09/23/2018] [Revised: 10/15/2018] [Accepted: 11/22/2018] [Indexed: 12/17/2022]
Abstract
The Biological General Repository for Interaction Datasets (BioGRID: https://thebiogrid.org) is an open access database dedicated to the curation and archival storage of protein, genetic and chemical interactions for all major model organism species and humans. As of September 2018 (build 3.4.164), BioGRID contains records for 1 598 688 biological interactions manually annotated from 55 809 publications for 71 species, as classified by an updated set of controlled vocabularies for experimental detection methods. BioGRID also houses records for >700 000 post-translational modification sites. BioGRID now captures chemical interaction data, including chemical-protein interactions for human drug targets drawn from the DrugBank database and manually curated bioactive compounds reported in the literature. A new dedicated aspect of BioGRID annotates genome-wide CRISPR/Cas9-based screens that report gene-phenotype and gene-gene relationships. An extension of the BioGRID resource called the Open Repository for CRISPR Screens (ORCS) database (https://orcs.thebiogrid.org) currently contains over 500 genome-wide screens carried out in human or mouse cell lines. All data in BioGRID is made freely available without restriction, is directly downloadable in standard formats and can be readily incorporated into existing applications via our web service platforms. BioGRID data are also freely distributed through partner model organism databases and meta-databases.
Affiliation(s)
- Rose Oughtred
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Chris Stark
  - The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Bobby-Joe Breitkreutz
  - The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Jennifer Rust
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Lorrie Boucher
  - The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Christie Chang
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Nadine Kolas
  - The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Lara O’Donnell
  - The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Genie Leung
  - The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Rochelle McAdam
  - Arthur and Sonia Labatt Brain Tumor Research Center and Developmental and Stem Cell Biology, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Frederick Zhang
  - Arthur and Sonia Labatt Brain Tumor Research Center and Developmental and Stem Cell Biology, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Sonam Dolma
  - Arthur and Sonia Labatt Brain Tumor Research Center and Developmental and Stem Cell Biology, The Hospital for Sick Children, Toronto, Ontario M5G 0A4, Canada
- Andrew Willems
  - The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
- Jasmin Coulombe-Huntington
  - Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Quebec H3C 3J7, Canada
- Andrew Chatr-aryamontri
  - Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Quebec H3C 3J7, Canada
- Kara Dolinski
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Mike Tyers
  - The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada
  - Institute for Research in Immunology and Cancer, Université de Montréal, Montréal, Quebec H3C 3J7, Canada
60
Sayers S, Li L, Ong E, Deng S, Fu G, Lin Y, Yang B, Zhang S, Fa Z, Zhao B, Xiang Z, Li Y, Zhao XM, Olszewski MA, Chen L, He Y. Victors: a web-based knowledge base of virulence factors in human and animal pathogens. Nucleic Acids Res 2019; 47:D693-D700. [PMID: 30365026 PMCID: PMC6324020 DOI: 10.1093/nar/gky999] [Citation(s) in RCA: 113] [Impact Index Per Article: 18.8] [Received: 08/15/2018] [Revised: 10/07/2018] [Accepted: 10/09/2018] [Indexed: 12/21/2022]
Abstract
Virulence factors (VFs) are molecules that allow microbial pathogens to overcome host defense mechanisms and cause disease in a host. Studying VFs is critical to better understanding microbial pathogenesis and host defense mechanisms. Victors (http://www.phidias.us/victors) is a novel, manually curated, web-based integrative knowledge base and analysis resource for VFs of pathogens that cause infectious diseases in humans and animals. Currently, Victors contains 5296 VFs obtained via manual annotation from peer-reviewed publications, with 4648, 179, 105 and 364 VFs originating from 51 bacterial, 54 viral, 13 parasitic and 8 fungal species, respectively. Our data analysis identified many VF-specific patterns. Within the global VF pool, cytoplasmic proteins were more common, while adhesins were less common, compared to findings on protective vaccine antigens. Many VFs showed homology with host proteins, and the human proteins interacting with VFs represented the hubs of human-pathogen interactions. All Victors data can be queried through a user-friendly web interface. The VFs can also be searched with a customized BLAST sequence similarity search program. These VFs and their interactions with the host are represented in a machine-readable Ontology of Host-Pathogen Interactions. Victors supports 'One Health' research as a vital source of VFs in human and animal pathogens.
Affiliation(s)
- Samantha Sayers
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Li Li
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Edison Ong
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Shunzhou Deng
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Department of Veterinary Medicine, Jiangxi Agricultural University, Nanchang, Jiangxi 330045, China
- Guanghua Fu
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
  - Institute of Animal Husbandry and Veterinary Medicine, Fujian Academy of Agricultural Sciences, Fuzhou, Fujian 350013, China
- Yu Lin
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Brian Yang
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Shelley Zhang
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Zhenzong Fa
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Health System and Research Service, VA Ann Arbor Health Systems, Ann Arbor 48109, MI, USA
- Bin Zhao
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Zuoshuang Xiang
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Yongqing Li
  - Institute of Animal Husbandry and Veterinary Medicine, Beijing Municipal Academy of Agriculture and Forestry Sciences, Beijing 100097, China
- Xing-Ming Zhao
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Michal A Olszewski
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Health System and Research Service, VA Ann Arbor Health Systems, Ann Arbor 48109, MI, USA
- Luonan Chen
  - Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  - CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
  - School of Life Science and Technology, Shanghai Tech University, Shanghai 201210, China
- Yongqun He
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
| |
Collapse
|
61
|
Hoyt CT, Domingo-Fernández D, Aldisi R, Xu L, Kolpeja K, Spalek S, Wollert E, Bachman J, Gyori BM, Greene P, Hofmann-Apitius M. Re-curation and rational enrichment of knowledge graphs in Biological Expression Language. Database (Oxford) 2019; 2019:baz068. [PMID: 31225582 PMCID: PMC6587072 DOI: 10.1093/database/baz068] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/31/2019] [Revised: 04/03/2019] [Accepted: 04/29/2019] [Indexed: 12/23/2022]
Abstract
The rapid accumulation of new biomedical literature not only causes curated knowledge graphs (KGs) to become outdated and incomplete, but also makes manual curation an impractical and unsustainable solution. Automated or semi-automated workflows are necessary to assist in prioritizing and curating the literature to update and enrich KGs. We have developed two workflows: one for re-curating a given KG to assure its syntactic and semantic quality and another for rationally enriching it by manually revising automatically extracted relations for nodes with low information density. We applied these workflows to the KGs encoded in Biological Expression Language from the NeuroMMSig database using content that was pre-extracted from MEDLINE abstracts and PubMed Central full-text articles using text mining output integrated by INDRA. We have made this workflow freely available at https://github.com/bel-enrichment/bel-enrichment.
Affiliation(s)
- Charles Tapley Hoyt
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Rana Aldisi
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Lingling Xu
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Kristian Kolpeja
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Sandra Spalek
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Esther Wollert
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- John Bachman
- Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA, USA
- Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA, USA
- Patrick Greene
- Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA, USA
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
|
62
|
Bukhari SAC, Martínez-Romero M, O'Connor MJ, Egyedi AL, Willrett D, Graybeal J, Musen MA, Cheung KH, Kleinstein SH. CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata. BMC Bioinformatics 2018; 19:268. [PMID: 30012108 PMCID: PMC6048706 DOI: 10.1186/s12859-018-2247-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/05/2017] [Accepted: 06/14/2018] [Indexed: 12/17/2022] Open
Abstract
Background: Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. Results: This work presents “CEDAR OnDemand”, a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies, which are recommended automatically based upon input fields’ labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry. Conclusion: CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store: https://chrome.google.com/webstore/search/CEDAROnDemand
Affiliation(s)
- Marcos Martínez-Romero
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Martin J O'Connor
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Attila L Egyedi
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Debra Willrett
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- John Graybeal
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Mark A Musen
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
- Kei-Hoi Cheung
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Department of Emergency Medicine and Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, USA
- Steven H Kleinstein
- Department of Pathology, Yale School of Medicine, New Haven, CT, USA
- Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
|
63
|
Ławrynowicz A, Potoniec J, Robaczyk M, Tudorache T. Discovery of Emerging Design Patterns in Ontologies Using Tree Mining. Semantic Web 2018; 9:517-544. [PMID: 30505251 PMCID: PMC6261490 DOI: 10.3233/sw-170280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The research goal of this work is to investigate modeling patterns that recur in ontologies. Such patterns may originate from certain design solutions, and they may possibly indicate emerging ontology design patterns. We describe our tree-mining method for identifying the emerging design patterns. The method works in two steps: (1) we transform the ontology axioms into a tree shape in order to find axiom patterns; and then, (2) we use association analysis to mine co-occurring axiom patterns in order to extract emerging design patterns. We conduct an experimental study on a set of 331 ontologies from the BioPortal repository. We show that recurring axiom patterns appear across all individual ontologies, as well as across the whole set. In individual ontologies, we find frequent and non-trivial patterns with and without variables. Some of the former patterns have more than 300,000 occurrences. The longest pattern without a variable discovered from the whole ontology set has size 12, and it appears in 14 ontologies. To the best of our knowledge, this is the first method for automatic discovery of emerging design patterns in ontologies. Finally, we demonstrate that we are able to automatically detect patterns that we have manually confirmed to be fragments of ontology design patterns described in the literature. Since our method is not specific to particular ontologies, we conclude that we should be able to discover new, emerging design patterns for arbitrary ontology sets.
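As a rough illustration of step (2) only, the sketch below counts how often pairs of axiom patterns occur together across a set of ontologies and keeps the pairs that meet a support threshold. It is not the authors' implementation: the function name `mine_cooccurring_patterns`, the `min_support` parameter, the pattern labels `P1` to `P4`, and the toy input are all assumptions made for this example, and simple pair counting stands in for a full association analysis.

```python
from collections import Counter
from itertools import combinations


def mine_cooccurring_patterns(patterns_by_ontology, min_support=2):
    """Return pattern pairs that co-occur in at least `min_support` ontologies.

    `patterns_by_ontology` maps a (hypothetical) ontology name to the set of
    axiom-pattern labels found in it by an earlier tree-mining step.
    """
    pair_counts = Counter()
    for patterns in patterns_by_ontology.values():
        # Sort so that each unordered pair is always counted under one key.
        for pair in combinations(sorted(set(patterns)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Toy input: labels stand in for axiom patterns (assumed, not from the paper).
toy_patterns = {
    "ontology_a": {"P1", "P2", "P3"},
    "ontology_b": {"P1", "P2"},
    "ontology_c": {"P2", "P3"},
    "ontology_d": {"P4"},
}

frequent_pairs = mine_cooccurring_patterns(toy_patterns, min_support=2)
for pair, support in sorted(frequent_pairs.items()):
    print(pair, support)
```

Pairs that survive the threshold are candidates for closer inspection; a fuller analysis would also compute measures such as confidence or lift before treating a pair as an emerging design pattern.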
Affiliation(s)
- Agnieszka Ławrynowicz
- Faculty of Computing, Poznan University of Technology, ul. Piotrowo 3, 60-965 Poznan, Poland
- Jedrzej Potoniec
- Faculty of Computing, Poznan University of Technology, ul. Piotrowo 3, 60-965 Poznan, Poland
- Michał Robaczyk
- Faculty of Computing, Poznan University of Technology, ul. Piotrowo 3, 60-965 Poznan, Poland
- Tania Tudorache
- Stanford Center for Biomedical Informatics Research, Stanford University, 1265 Welch Road, Stanford, CA 94305, USA
|
64
|
Sustainable data and metadata management at the BD2K-LINCS Data Coordination and Integration Center. Sci Data 2018; 5:180117. [PMID: 29917015 PMCID: PMC6007090 DOI: 10.1038/sdata.2018.117] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 10/02/2017] [Accepted: 05/11/2018] [Indexed: 12/18/2022] Open
Abstract
The NIH-funded LINCS Consortium is creating an extensive reference library of cell-based perturbation response signatures and sophisticated informatics tools incorporating a large number of perturbagens, model systems, and assays. To date, more than 350 datasets have been generated including transcriptomics, proteomics, epigenomics, cell phenotype and competitive binding profiling assays. The large volume and variety of data necessitate rigorous data standards and effective data management including modular data processing pipelines and end-user interfaces to facilitate accurate and reliable data exchange, curation, validation, standardization, aggregation, integration, and end user access. Deep metadata annotations and the use of qualified data standards enable integration with many external resources. Here we describe the end-to-end data processing and management at the DCIC to generate a high-quality and persistent product. Our data management and stewardship solutions enable a functioning Consortium and make LINCS a valuable scientific resource that aligns with big data initiatives such as the BD2K NIH Program and concords with emerging data science best practices including the findable, accessible, interoperable, and reusable (FAIR) principles.
|
65
|
Zaritsky A. Sharing and reusing cell image data. Mol Biol Cell 2018; 29:1274-1280. [PMID: 29851565 PMCID: PMC5994892 DOI: 10.1091/mbc.e17-10-0606] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/01/2018] [Revised: 04/02/2018] [Accepted: 04/06/2018] [Indexed: 01/19/2023] Open
Abstract
The rapid growth in content and complexity of cell image data creates an opportunity for synergy between experimental and computational scientists. Sharing microscopy data enables computational scientists to develop algorithms and tools for data analysis, integration, and mining. These tools can be applied by experimentalists to promote hypothesis-generation and discovery. We are now at the dawn of this revolution: infrastructure is being developed for data standardization, deposition, sharing, and analysis; some journals and funding agencies mandate data deposition; data journals publish high-content microscopy data sets; quantification becomes standard in scientific publications; new analytic tools are being developed and dispatched to the community; and huge data sets are being generated by individual labs and philanthropic initiatives. In this Perspective, I reflect on sharing and reusing cell image data and the opportunities that will come along with it.
|
66
|
Giraldo O, Garcia A, Corcho O. A guideline for reporting experimental protocols in life sciences. PeerJ 2018; 6:e4795. [PMID: 29868256 PMCID: PMC5978404 DOI: 10.7717/peerj.4795] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/03/2017] [Accepted: 04/29/2018] [Indexed: 01/01/2023] Open
Abstract
Experimental protocols are key when planning, performing and publishing research in many disciplines, especially in relation to the reporting of materials and methods. However, they vary in their content, structure and associated data elements. This article presents a guideline for describing key content for reporting experimental protocols in the domain of life sciences, together with the methodology followed in order to develop such a guideline. As part of our work, we propose a checklist that contains 17 data elements that we consider fundamental to facilitate the execution of the protocol. These data elements are formally described in the SMART Protocols ontology. By providing guidance for the key content to be reported, we aim (1) to make it easier for authors to report experimental protocols with necessary and sufficient information that allows others to reproduce an experiment, (2) to promote consistency across laboratories by delivering an adaptable set of data elements, and (3) to make it easier for reviewers and editors to measure the quality of submitted manuscripts against established criteria. Our checklist focuses on the content, that is, what should be included. Rather than advocating a specific format for protocols in life sciences, the checklist includes a full description of the key data elements that facilitate the execution of the protocol.
Affiliation(s)
- Olga Giraldo
- Ontology Engineering Group, Campus de Montegancedo, Boadilla del Monte, Universidad Politécnica de Madrid, Madrid, Spain
- Alexander Garcia
- Ontology Engineering Group, Campus de Montegancedo, Boadilla del Monte, Universidad Politécnica de Madrid, Madrid, Spain
- Technische Universität Graz, Graz, Austria
- Oscar Corcho
- Ontology Engineering Group, Campus de Montegancedo, Boadilla del Monte, Universidad Politécnica de Madrid, Madrid, Spain
|
67
|
Abstract
The Cellosaurus is a knowledge resource on cell lines. It aims to describe all cell lines used in biomedical research. Its scope encompasses both vertebrates and invertebrates. Currently, information for >100,000 cell lines is provided. For each cell line, it provides a wealth of information, cross-references, and literature citations. The Cellosaurus is available on the ExPASy server (https://web.expasy.org/cellosaurus/) and can be downloaded in a variety of formats. Among its many uses, the Cellosaurus is a key resource to help researchers identify potentially contaminated/misidentified cell lines, thus contributing to improving the quality of research in the life sciences.
Affiliation(s)
- Amos Bairoch
- Computer and Laboratory Investigation of Proteins of Human Origin Group, Faculty of Medicine, Swiss Institute of Bioinformatics, University of Geneva, Geneva 4, Switzerland
|
68
|
Vita R, Overton JA, Peters B. Identification of errors in the IEDB using ontologies. Database (Oxford) 2018; 2018:4904119. [PMID: 29688357 PMCID: PMC5824775 DOI: 10.1093/database/bay005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/31/2017] [Revised: 12/11/2017] [Accepted: 01/04/2018] [Indexed: 12/02/2022]
Abstract
The Immune Epitope Database (IEDB) is a free online resource that has manually curated over 18 500 references from the scientific literature. Our database presents experimental data relating to the recognition of immune epitopes by the adaptive immune system in a structured, searchable manner. In order to be consistent and accurate in our data representation across many different journals, authors and curators, we have implemented several quality control measures, such as curation rules, controlled vocabularies and links to external ontologies and other resources. Ontologies and other resources have greatly benefited the IEDB through improved search interfaces, easier curation practices, interoperability between the IEDB and other databases and the identification of errors within our dataset. Here, we will elaborate on how ontology mapping and usage can help find and correct errors in a manually curated database. Database URL: www.iedb.org.
Affiliation(s)
- Randi Vita
- Center for Infectious Disease, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- James A Overton
- Center for Infectious Disease, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Bjoern Peters
- Center for Infectious Disease, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
|
69
|
Ong E, Sarntivijai S, Jupp S, Parkinson H, He Y. Comparison, alignment, and synchronization of cell line information between CLO and EFO. BMC Bioinformatics 2017; 18:557. [PMID: 29322915 PMCID: PMC5763470 DOI: 10.1186/s12859-017-1979-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The Experimental Factor Ontology (EFO) is an application ontology driven by experimental variables, including cell lines, to organize and describe the diverse experimental variables and data residing in the EMBL-EBI resources. The Cell Line Ontology (CLO) is an OBO community-based ontology that contains information on immortalized cell lines and relevant experimental components. EFO integrates and extends ontologies from the bio-ontology community to drive a number of practical applications. It is desirable that the community shares design patterns and therefore that EFO reuses the cell line representation from CLO. There are, however, challenges to be addressed when developing a common ontology design pattern for representing cell lines in both EFO and CLO. RESULTS: In this study, we developed a strategy to compare and map cell line terms between EFO and CLO. We examined Cellosaurus resources for EFO-CLO cross-references. Text labels of cell lines from both ontologies were verified by biological information axiomatized in each source. The study resulted in the identification of 873 EFO-CLO aligned and 344 EFO unique immortalized permanent cell lines. All of these cell lines were updated to CLO and the cell line related information was merged. A design pattern that integrates EFO and CLO was also developed. CONCLUSION: Our study compared, aligned, and synchronized the cell line information between CLO and EFO. The final updated CLO will be examined as the candidate ontology to import and replace eligible EFO cell line classes, thereby supporting interoperability in the bio-ontology domain. Our mapping pipeline illustrates the use of ontology in aiding biological data standardization and integration through the biological and semantic content of cell lines.
Affiliation(s)
- Edison Ong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Samples, Phenotypes, and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
- Sirarat Sarntivijai
- Samples, Phenotypes, and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
- Simon Jupp
- Samples, Phenotypes, and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
- Helen Parkinson
- Samples, Phenotypes, and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
- Yongqun He
- Center of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Unit of Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI USA
|
70
|
Ong E, Xie J, Ni Z, Liu Q, Sarntivijai S, Lin Y, Cooper D, Terryn R, Stathias V, Chung C, Schürer S, He Y. Ontological representation, integration, and analysis of LINCS cell line cells and their cellular responses. BMC Bioinformatics 2017; 18:556. [PMID: 29322930 PMCID: PMC5763302 DOI: 10.1186/s12859-017-1981-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022] Open
Abstract
Background: Aiming to understand cellular responses to different perturbations, the NIH Common Fund Library of Integrated Network-based Cellular Signatures (LINCS) program involves many institutes and laboratories working on over a thousand cell lines. The community-based Cell Line Ontology (CLO) was selected as the default ontology for LINCS cell line representation and integration. Results: CLO has consistently represented all 1097 LINCS cell lines and included information extracted from the LINCS Data Portal and ChEMBL. Using MCF 10A cell line cells as an example, we demonstrated how to ontologically model LINCS cellular signatures such as their non-tumorigenic epithelial cell type, three-dimensional growth, latrunculin-A-induced actin depolymerization and apoptosis, and cell line transfection. A CLO subset view of LINCS cell lines, named LINCS-CLOview, was generated to support systematic LINCS cell line analysis and queries. In summary, LINCS cell lines are currently associated with 43 cell types, 131 tissues and organs, and 121 cancer types. The LINCS-CLOview information can be queried using SPARQL scripts. Conclusions: CLO was used to support ontological representation, integration, and analysis of over a thousand LINCS cell line cells and their cellular responses.
Affiliation(s)
- Edison Ong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jiangan Xie
- Unit of Laboratory Animal Medicine and Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Zhaohui Ni
- Unit of Laboratory Animal Medicine and Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Qingping Liu
- Unit of Laboratory Animal Medicine and Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Sirarat Sarntivijai
- Samples, Phenotypes and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
- Yu Lin
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA
- Daniel Cooper
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA
- BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA
- Raymond Terryn
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA
- BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA
- Vasileios Stathias
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA
- BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA
- Caty Chung
- BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA
- Center for Computational Science, University of Miami, Miami, FL, USA
- Stephan Schürer
- Department of Molecular and Cellular Pharmacology, University of Miami, Miami, FL, USA
- BD2K LINCS Data Coordination and Integration Center, University of Miami, Miami, FL, USA
- Center for Computational Science, University of Miami, Miami, FL, USA
- Yongqun He
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Unit of Laboratory Animal Medicine and Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
|
71
|
Abstract
BACKGROUND: Cell lines and cell types are extensively studied in biomedical research, yielding a significant number of publications each year. Identifying cell lines and cell types precisely in publications is crucial for science reproducibility and knowledge integration. There are efforts to standardise cell nomenclature based on ontology development to support the FAIR principles for cell knowledge. However, it is important to analyse the usage of cell nomenclature in publications at a large scale to understand the level of uptake of cell nomenclature in the literature by scientists. In this study, we analyse the usage of cell nomenclature, both in vivo and in vitro, in biomedical literature by using text mining methods and present our results. RESULTS: We identified 59% of the cell type classes in the Cell Ontology and 13% of the cell line classes in the Cell Line Ontology in the literature. Our analysis showed that cell line nomenclature is much more ambiguous than cell type nomenclature. However, trends indicate that standardised nomenclature for cell lines and cell types is being increasingly used in publications by scientists. CONCLUSIONS: Our findings provide insight into how experimental cells are described in publications, may allow for an improved standardisation of cell type and cell line nomenclature, and can be utilised to develop efficient text mining applications on cell types and cell lines. All data generated in this study are available at https://github.com/shenay/CellNomenclatureStudy.
Affiliation(s)
- Şenay Kafkas
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia
- Sirarat Sarntivijai
- The European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Robert Hoehndorf
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia
|
72
|
Vita R, Overton JA, Sette A, Peters B. Better living through ontologies at the Immune Epitope Database. Database (Oxford) 2017; 2017:3074785. [PMID: 28365732 PMCID: PMC5467561 DOI: 10.1093/database/bax014] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/29/2016] [Accepted: 02/06/2017] [Indexed: 12/27/2022]
Abstract
The Immune Epitope Database (IEDB) project incorporates independently developed ontologies and controlled vocabularies into its curation and search interface. This simplifies curation practices, improves the user query experience and facilitates interoperability between the IEDB and other resources. While the use of independently developed ontologies has long been recommended as a best practice, a significant number of projects continue to develop their own vocabularies instead, or do not fully utilize the power of the ontologies that they are using. We describe how we use ontologies in the IEDB, providing a concrete example of the benefits of ontologies in practice. Database URL: www.iedb.org
Affiliation(s)
- Randi Vita
- La Jolla Institute for Allergy & Immunology, Center for Infectious Disease, La Jolla, CA 92037, USA
- James A Overton
- La Jolla Institute for Allergy & Immunology, Center for Infectious Disease, La Jolla, CA 92037, USA
- Alessandro Sette
- La Jolla Institute for Allergy & Immunology, Center for Infectious Disease, La Jolla, CA 92037, USA
- Bjoern Peters
- La Jolla Institute for Allergy & Immunology, Center for Infectious Disease, La Jolla, CA 92037, USA
|
73
|
Giraldo O, García A, López F, Corcho O. Using semantics for representing experimental protocols. J Biomed Semantics 2017; 8:52. [PMID: 29132408 PMCID: PMC5683383 DOI: 10.1186/s13326-017-0160-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/02/2016] [Accepted: 10/15/2017] [Indexed: 02/19/2024] Open
Abstract
Background An experimental protocol is a sequence of tasks and operations executed to perform experimental research in biological and biomedical areas, e.g. biology, genetics, immunology, neurosciences, virology. Protocols often include references to equipment, reagents, descriptions of critical steps, troubleshooting and tips, as well as any other information that researchers deem important for facilitating the reusability of the protocol. Although experimental protocols are central to reproducibility, the descriptions are often cursory. There is a need for a unified framework with respect to the syntactic structure and the semantics for representing experimental protocols. Results In this paper we present the “SMART Protocols ontology”, an ontology for representing experimental protocols. Our ontology represents the protocol as a workflow with domain specific knowledge embedded within a document. We also present the Sample Instrument Reagent Objective (SIRO) model, which represents the minimal common information shared across experimental protocols. SIRO was conceived in the same realm as the Patient Intervention Comparison Outcome (PICO) model that supports search, retrieval and classification purposes in evidence based medicine. We evaluate our approach against a set of competency questions modeled as SPARQL queries and processed against a set of published and unpublished protocols modeled with the SP Ontology and the SIRO model. Our approach makes it possible to answer queries such as “Which protocols use tumor tissue as a sample?” Conclusion Improving reporting structures for experimental protocols requires collective efforts from authors, peer reviewers, editors and funding bodies. The SP Ontology is a contribution towards this goal. We build upon previous experiences, bringing together the views of researchers managing protocols in their laboratory work. Website: https://smartprotocols.github.io/.
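As a rough illustration of the SIRO idea described in this abstract, the sketch below stores Sample, Instrument, Reagent and Objective fields for each protocol and answers the competency question about tumor tissue with a plain filter. It is a simplified stand-in, not the authors' method: the paper evaluates SPARQL queries over protocols modeled with the SP Ontology, whereas the class name `SiroRecord`, the function `protocols_using_sample` and the toy records here are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SiroRecord:
    """Hypothetical container for the four SIRO elements of one protocol."""
    title: str
    samples: List[str] = field(default_factory=list)
    instruments: List[str] = field(default_factory=list)
    reagents: List[str] = field(default_factory=list)
    objective: str = ""


def protocols_using_sample(records: List[SiroRecord], sample: str) -> List[str]:
    """Return titles of protocols whose sample list contains `sample` (case-insensitive)."""
    wanted = sample.casefold()
    return [r.title for r in records
            if any(s.casefold() == wanted for s in r.samples)]


# Toy records, invented purely for illustration.
records = [
    SiroRecord("Protocol A", samples=["tumor tissue"], reagents=["buffer"],
               objective="extract nucleic acid"),
    SiroRecord("Protocol B", samples=["leaf tissue"], instruments=["centrifuge"],
               objective="extract nucleic acid"),
]

matches = protocols_using_sample(records, "Tumor Tissue")
print(matches)
```

In an ontology-backed setting the string match would be replaced by a query over term identifiers, so that subclasses of a sample type are also retrieved.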
Affiliation(s)
- Olga Giraldo
- Ontology Engineering Group, Madrid, Universidad Politécnica de Madrid, Madrid, 28660, Spain.
- Alexander García
- Ontology Engineering Group, Madrid, Universidad Politécnica de Madrid, Madrid, 28660, Spain
- Oscar Corcho
- Ontology Engineering Group, Madrid, Universidad Politécnica de Madrid, Madrid, 28660, Spain
|
74
|
Ontology-based systematical representation and drug class effect analysis of package insert-reported adverse events associated with cardiovascular drugs used in China. Sci Rep 2017; 7:13819. [PMID: 29061976 PMCID: PMC5653862 DOI: 10.1038/s41598-017-12580-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/26/2017] [Accepted: 09/07/2017] [Indexed: 01/31/2023] Open
Abstract
With increased usage of cardiovascular drugs (CVDs) for treating cardiovascular diseases, it is important to analyze CVD-associated adverse events (AEs). In this study, we systematically collected package insert-reported AEs associated with CVDs used in China, and developed and analyzed an Ontology of Cardiovascular Drug AEs (OCVDAE). Extending the Ontology of AEs (OAE) and NDF-RT, OCVDAE includes 194 CVDs, CVD ingredients, mechanisms of action (MoAs), and 736 CVD-associated AEs. An AE-specific drug class effect is defined to exist when all the drugs (drug chemical ingredients or drug products) in a drug class are associated with an AE, which is formulated as a new proportional class level ratio (“PCR”) = 1. Our PCR-based heatmap analysis identified many class level drug effects on different AE classes such as behavioral and neurological AE and digestive system AE. Additional drug-AE correlation tests (i.e., class-level PRR, Chi-squared, and minimal case reports) were also modified and applied to further detect statistically significant drug class effects. Two drug ingredient classes and three CVD MoA classes were found to have statistically significant class effects on 13 AEs. For example, the CVD Active Transporter Interactions class (including reserpine, indapamide, digoxin, and deslanoside) has a statistically significant class effect on anorexia and diarrhea AEs.
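The class-effect criterion in this abstract can be sketched in a few lines. The abstract only states that a class effect exists when every drug in a class is associated with an AE (PCR = 1); the ratio used below, drugs in the class associated with the AE divided by all drugs in the class, is an assumed reading of that definition rather than the paper's published formula. The function names and the placeholder drugs and AEs are hypothetical.

```python
from typing import Dict, Set


def proportional_class_ratio(class_drugs: Set[str],
                             drug_aes: Dict[str, Set[str]],
                             ae: str) -> float:
    """Assumed PCR: fraction of drugs in the class that are associated with `ae`."""
    if not class_drugs:
        raise ValueError("drug class must contain at least one drug")
    hits = sum(1 for drug in class_drugs if ae in drug_aes.get(drug, set()))
    return hits / len(class_drugs)


def class_effect_aes(class_drugs: Set[str],
                     drug_aes: Dict[str, Set[str]]) -> Set[str]:
    """AEs for which every drug in the class is associated (PCR == 1)."""
    candidate_aes = set().union(*(drug_aes.get(d, set()) for d in class_drugs))
    return {ae for ae in candidate_aes
            if proportional_class_ratio(class_drugs, drug_aes, ae) == 1.0}


# Placeholder data, not taken from any package insert.
drug_class = {"drug_a", "drug_b", "drug_c", "drug_d"}
reported = {
    "drug_a": {"ae_1", "ae_2"},
    "drug_b": {"ae_1"},
    "drug_c": {"ae_1", "ae_2"},
    "drug_d": {"ae_1"},
}

print(proportional_class_ratio(drug_class, reported, "ae_2"))
print(class_effect_aes(drug_class, reported))
```

The additional tests mentioned in the abstract (class-level PRR, Chi-squared, minimal case reports) are not covered by this sketch.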
|
75
|
Becnel LB, Ochsner SA, Darlington YF, McOwiti A, Kankanamge WH, Dehart M, Naumov A, McKenna NJ. Discovering relationships between nuclear receptor signaling pathways, genes, and tissues in Transcriptomine. Sci Signal 2017; 10:eaah6275. [DOI: 10.1126/scisignal.aah6275] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 12/24/2022]
|
76
|
Conceptual Modeling for Genomics: Building an Integrated Repository of Open Data. CONCEPTUAL MODELING 2017. [DOI: 10.1007/978-3-319-69904-2_26] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 01/03/2023]
|
77
|
Abstract
The overarching goal of the Gene Ontology (GO) Consortium is to provide researchers in biology and biomedicine with all current functional information concerning genes and the cellular context under which these occur. When the GO was started in the 1990s, surprisingly little attention had been given to how functional information about genes was to be uniformly captured, structured in a computable form, and made accessible to biologists. Because knowledge of gene, protein, ncRNA, and molecular complex roles is continuously accumulating and changing, the GO needed to be a dynamic resource, accurately tracking ongoing research results over time. Here I describe the progress that has been made over the years towards this goal, and the work that still remains to be done for the GO Consortium to realize its goal of offering the most comprehensive and up-to-date resource for information on gene function.
Affiliation(s)
- Suzanna E Lewis
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA.
|
78
|
Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrián-Uhalte E, Davies M, Dedman N, Karlsson A, Magariños MP, Overington JP, Papadatos G, Smit I, Leach AR. The ChEMBL database in 2017. Nucleic Acids Res 2016; 45:D945-D954. [PMID: 27899562 PMCID: PMC5210557 DOI: 10.1093/nar/gkw1074] [Citation(s) in RCA: 1489] [Impact Index Per Article: 165.4] [Received: 09/23/2016] [Revised: 10/21/2016] [Accepted: 10/30/2016] [Indexed: 11/14/2022] Open
Abstract
ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services.
Affiliation(s)
- Anna Gaulton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anne Hersey
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Michał Nowotka
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- A Patrícia Bento
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Jon Chambers
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- David Mendez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Prudence Mutowo
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Francis Atkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Louisa J Bellis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Elena Cibrián-Uhalte
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Mark Davies
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Nathan Dedman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Anneli Karlsson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- María Paula Magariños
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- John P Overington
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- George Papadatos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Ines Smit
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Andrew R Leach
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
|
79
|
Fernández JM, de la Torre V, Richardson D, Royo R, Puiggròs M, Moncunill V, Fragkogianni S, Clarke L, Flicek P, Rico D, Torrents D, Carrillo de Santa Pau E, Valencia A. The BLUEPRINT Data Analysis Portal. Cell Syst 2016; 3:491-495.e5. [PMID: 27863955 PMCID: PMC5919098 DOI: 10.1016/j.cels.2016.10.021] [Citation(s) in RCA: 92] [Impact Index Per Article: 10.2] [Received: 07/15/2016] [Revised: 10/10/2016] [Accepted: 10/24/2016] [Indexed: 10/20/2022]
Abstract
The impact of large and complex epigenomic datasets on biological insights or clinical applications is limited by the lack of easy, intuitive, and fast tools for accessing them. Here, we describe an epigenomics comparative cyber-infrastructure (EPICO), an open-access reference set of libraries to develop comparative epigenomic data portals. Using EPICO, large epigenome projects can make their rich datasets available to the community without requiring specific technical skills. As a first instance of EPICO, we implemented the BLUEPRINT Data Analysis Portal (BDAP). BDAP provides a desktop for the comparative analysis of epigenomes of hematopoietic cell types based on results, such as the position of epigenetic features, from basic analysis pipelines. The BDAP interface facilitates interactive exploration of genomic regions, genes, and pathways in the context of differentiation of hematopoietic lineages. This work represents initial steps toward broadly accessible integrative analysis of epigenomic data across international consortia. EPICO can be accessed at https://github.com/inab, and BDAP can be accessed at http://blueprint-data.bsc.es.
Affiliation(s)
- José María Fernández
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain; Spanish Bioinformatics Institute INB-ISCIII ES-ELIXIR, Madrid 28029, Spain
- Victor de la Torre
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain; Spanish Bioinformatics Institute INB-ISCIII ES-ELIXIR, Madrid 28029, Spain
- David Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Romina Royo
- Spanish Bioinformatics Institute INB-ISCIII ES-ELIXIR, Madrid 28029, Spain; Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB, Research Program in Computational Biology, BSC - CRG - IRB, Barcelona 08028, Spain
- Montserrat Puiggròs
- Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB, Research Program in Computational Biology, BSC - CRG - IRB, Barcelona 08028, Spain
- Valentí Moncunill
- Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB, Research Program in Computational Biology, BSC - CRG - IRB, Barcelona 08028, Spain
- Stamatina Fragkogianni
- Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB, Research Program in Computational Biology, BSC - CRG - IRB, Barcelona 08028, Spain
- Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Daniel Rico
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
- David Torrents
- Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB, Research Program in Computational Biology, BSC - CRG - IRB, Barcelona 08028, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona 08010, Spain
- Alfonso Valencia
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain; Spanish Bioinformatics Institute INB-ISCIII ES-ELIXIR, Madrid 28029, Spain.
|
80
|
Ong E, He Y. Community-based Ontology Development, Annotation and Discussion with MediaWiki extension Ontokiwi and Ontokiwi-based Ontobedia. AMIA Joint Summits on Translational Science Proceedings 2016; 2016:65-74. [PMID: 27570653 PMCID: PMC5001762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
Hundreds of biological and biomedical ontologies have been developed to support data standardization, integration and analysis. Although ontologies are typically developed for community usage, community efforts in ontology development are limited. To support ontology visualization, distribution, and community-based annotation and development, we have developed Ontokiwi, an ontology extension to the MediaWiki software. Ontokiwi displays hierarchical classes and ontological axioms. Ontology classes and axioms can be edited and added using the Ontokiwi form or the MediaWiki source editor. Ontokiwi also inherits MediaWiki features such as Wikitext editing and version control. Based on the Ontokiwi/MediaWiki software package, we have developed Ontobedia, which aims to support community-based development and annotation of biological and biomedical ontologies. As demonstrations, we have loaded the Ontology of Adverse Events (OAE) and the Cell Line Ontology (CLO) into Ontobedia. Our studies showed that Ontobedia was able to achieve the expected Ontokiwi features.
Affiliation(s)
- Edison Ong
- University of Michigan Medical School, Ann Arbor, MI
- Yongqun He
- University of Michigan Medical School, Ann Arbor, MI
|
81
|
Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, He Y, Osumi-Sutherland D, Ruttenberg A, Sarntivijai S, Van Slyke CE, Vasilevsky NA, Haendel MA, Blake JA, Mungall CJ. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability. J Biomed Semantics 2016; 7:44. [PMID: 27377652 PMCID: PMC4932724 DOI: 10.1186/s13326-016-0088-7] [Citation(s) in RCA: 172] [Impact Index Per Article: 19.1] [Received: 04/15/2016] [Accepted: 06/23/2016] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies. CONSTRUCTION AND CONTENT Recent work on the CL has focused on extending the representation of various cell types, and developing new modules in the CL itself, and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class 'cell in vitro' have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning. UTILITY AND DISCUSSION The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell type specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies; for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs.
CONCLUSIONS The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the CL both among developers and within the user community.
Affiliation(s)
- Alexander D. Diehl
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203 USA
- Terrence F. Meehan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Yvonne M. Bradford
- ZFIN, the Zebrafish Model Organism Database, 5291 University of Oregon, Eugene, OR 97403 USA
- Matthew H. Brush
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
- Wasila M. Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD 57069 USA
- National Evolutionary Synthesis Center, Durham, NC 27705 USA
- David S. Dougall
- Southwestern Medical Center, University of Texas, Dallas, TX 75235 USA
- Yongqun He
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109 USA
- David Osumi-Sutherland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Alan Ruttenberg
- Oral Diagnostics Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14210 USA
- Sirarat Sarntivijai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Ceri E. Van Slyke
- ZFIN, the Zebrafish Model Organism Database, 5291 University of Oregon, Eugene, OR 97403 USA
- Nicole A. Vasilevsky
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
- Melissa A. Haendel
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
|
82
|
Sarntivijai S, Vasant D, Jupp S, Saunders G, Bento AP, Gonzalez D, Betts J, Hasan S, Koscielny G, Dunham I, Parkinson H, Malone J. Linking rare and common disease: mapping clinical disease-phenotypes to ontologies in therapeutic target validation. J Biomed Semantics 2016; 7:8. [PMID: 27011785 PMCID: PMC4804633 DOI: 10.1186/s13326-016-0051-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/25/2015] [Accepted: 02/02/2016] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND The Centre for Therapeutic Target Validation (CTTV - https://www.targetvalidation.org/) was established to generate therapeutic target evidence from genome-scale experiments and analyses. CTTV aims to support the validity of therapeutic targets by integrating existing and newly-generated data. Data integration has been achieved in some resources by mapping metadata such as disease and phenotypes to the Experimental Factor Ontology (EFO). Additionally, the relationship between ontology descriptions of rare and common diseases and their phenotypes can offer insights into shared biological mechanisms and potential drug targets. Ontologies are not ideal for representing the 'sometimes associated' type of relationship required. This work addresses two challenges: annotation of diverse big data, and representation of complex, 'sometimes associated' relationships between concepts. METHODS Semantic mapping uses a combination of custom scripting, our annotation tool 'Zooma', and expert curation. Disease-phenotype associations were generated using literature mining on Europe PubMed Central abstracts, which were manually verified by experts for validity. Representation of the disease-phenotype association was achieved by the Ontology of Biomedical AssociatioN (OBAN), a generic association representation model. OBAN represents associations between a subject and object, i.e., a disease and its associated phenotypes, and the source of evidence for that association. The indirect disease-to-disease associations are exposed through shared phenotypes. This was applied to the use case of linking rare to common diseases at the CTTV. RESULTS EFO yields an average of over 80% of mapping coverage in all data sources. A 42% precision is obtained from the manual verification of the text-mined disease-phenotype associations.
This results in 1452 and 2810 disease-phenotype pairs for IBD and autoimmune disease and contributes towards 11,338 rare diseases associations (merged with existing published work [Am J Hum Genet 97:111-24, 2015]). An OBAN result file is downloadable at http://sourceforge.net/p/efo/code/HEAD/tree/trunk/src/efoassociations/. Twenty common diseases are linked to 85 rare diseases by shared phenotypes. A generalizable OBAN model for association representation is presented in this study. CONCLUSIONS Here we present solutions to large-scale annotation-ontology mapping in the CTTV knowledge base, a process for disease-phenotype mining, and propose a generic association model, 'OBAN', as a means to integrate disease using shared phenotypes. AVAILABILITY EFO is released monthly and available for download at http://www.ebi.ac.uk/efo/.
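The association model outlined in this abstract (a subject, an object, and the evidence source for the link) and the idea of exposing indirect disease-to-disease links through shared phenotypes can be sketched as follows. This is an illustrative data structure only, not the OBAN OWL model itself; the class `Association`, the function `shared_phenotype_links`, and all disease, phenotype and source labels are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Association:
    """One disease-phenotype association together with its evidence source."""
    subject: str   # disease identifier (placeholder labels below)
    obj: str       # phenotype identifier
    evidence: str  # where the association came from


def shared_phenotype_links(
    associations: Iterable[Association],
) -> Dict[Tuple[str, str], Set[str]]:
    """Map each pair of diseases to the set of phenotypes they share."""
    diseases_by_phenotype: Dict[str, Set[str]] = defaultdict(set)
    for assoc in associations:
        diseases_by_phenotype[assoc.obj].add(assoc.subject)

    links: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for phenotype, diseases in diseases_by_phenotype.items():
        # Sort so each unordered pair gets one canonical key.
        for first, second in combinations(sorted(diseases), 2):
            links[(first, second)].add(phenotype)
    return dict(links)


# Placeholder associations, invented for illustration.
associations = [
    Association("common_disease_x", "phenotype_1", "text mining, curated"),
    Association("rare_disease_y", "phenotype_1", "published resource"),
    Association("rare_disease_y", "phenotype_2", "published resource"),
]

links = shared_phenotype_links(associations)
print(links)
```

Keeping the evidence field on each association means a derived disease-to-disease link can always be traced back to the sources that support it.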
Affiliation(s)
- Sirarat Sarntivijai
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Drashtti Vasant
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Gary Saunders
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- A Patrícia Bento
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Daniel Gonzalez
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Joanna Betts
- Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; GSK, Medicine Research Centre, Stevenage, SG1 2NY UK
- Samiul Hasan
- Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; GSK, Medicine Research Centre, Stevenage, SG1 2NY UK
- Gautier Koscielny
- Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; GSK, Medicine Research Centre, Stevenage, SG1 2NY UK
- Ian Dunham
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- James Malone
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
|
83
|
He Y. Ontology-based Vaccine and Drug Adverse Event Representation and Theory-guided Systematic Causal Network Analysis toward Integrative Pharmacovigilance Research. Curr Pharmacol Rep 2016; 2:113-128. [PMID: 27458549 DOI: 10.1007/s40495-016-0055-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 10/22/2022]
Abstract
Compared with controlled terminologies (e.g., MedDRA, CTCAE, and WHO-ART), the community-based Ontology of AEs (OAE) has many advantages in adverse event (AE) classifications. The OAE-derived Ontology of Vaccine AEs (OVAE) and Ontology of Drug Neuropathy AEs (ODNAE) serve as AE knowledge bases and support data integration and analysis. The Immune Response Gene Network Theory explains molecular mechanisms of vaccine-related AEs. The OneNet Theory of Life treats the whole process of a life of an organism as a single complex and dynamic network (i.e., OneNet). A new "OneNet effectiveness" tenet is proposed here to expand the OneNet theory. Derived from the OneNet theory, the author hypothesizes that one human uses one single genotype-rooted mechanism to respond to different vaccinations and drug treatments, and experimentally identified mechanisms are manifestations of the OneNet blueprint mechanism under specific conditions. The theories and ontologies interact together as semantic frameworks to support integrative pharmacovigilance research.
Affiliation(s)
- Yongqun He
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA. Department of Microbiology and Immunology, University of Michigan Medical School, Ann Arbor, MI 48109, USA. Center for Computational Medicine and Biology, University of Michigan Medical School, Ann Arbor, MI 48109, USA. Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
84
|
Lin Y, Xiang Z, He Y. Ontology-based representation and analysis of host-Brucella interactions. J Biomed Semantics 2015; 6:37. [PMID: 26445639 PMCID: PMC4594885 DOI: 10.1186/s13326-015-0036-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/31/2012] [Accepted: 09/23/2015] [Indexed: 11/26/2022] Open
Abstract
Background Biomedical ontologies are representations of classes of entities in the biomedical domain and how these classes are related in computer- and human-interpretable formats. Ontologies support data standardization and exchange and provide a basis for computer-assisted automated reasoning. IDOBRU is an ontology in the domain of Brucella and brucellosis. Brucella is a Gram-negative intracellular bacterium that causes brucellosis, the most common zoonotic disease in the world. In this study, IDOBRU is used as a platform to model and analyze how the hosts, especially host macrophages, interact with virulent Brucella strains or live attenuated Brucella vaccine strains. Such a study allows us to better integrate and understand intricate Brucella pathogenesis and host immunity mechanisms. Results Different levels of host-Brucella interactions based on different host cell types and Brucella strains were first defined ontologically. Three important processes of virulent Brucella interacting with host macrophages were represented: Brucella entry into macrophage, intracellular trafficking, and intracellular replication. Two Brucella pathogenesis mechanisms were ontologically represented: Brucella Type IV secretion system that supports intracellular trafficking and replication, and Brucella erythritol metabolism that participates in Brucella intracellular survival and pathogenesis. The host cell death pathway is critical to the outcome of host-Brucella interactions. For better survival and replication, virulent Brucella prevents macrophage cell death. However, live attenuated B. abortus vaccine strain RB51 induces caspase-2-mediated proinflammatory cell death. Brucella-associated cell death processes are represented in IDOBRU. The gene and protein information of 432 manually annotated Brucella virulence factors were represented using the Ontology of Genes and Genomes (OGG) and Protein Ontology (PRO), respectively. 
Seven inference rules were defined to capture the knowledge of host-Brucella interactions and implemented in IDOBRU. Current IDOBRU includes 3611 ontology terms. SPARQL queries identified many results that are critical to the host-Brucella interactions. For example, out of 269 protein virulence factors related to macrophage-Brucella interactions, 81 are critical to Brucella intracellular replication inside macrophages. A SPARQL query also identified 11 biological processes important for Brucella virulence. Conclusions To systematically represent and analyze fundamental host-pathogen interaction mechanisms, we provided for the first time comprehensive ontological modeling of host-pathogen interactions using Brucella as the pathogen model. The methods and ontology representations used in our study are generic and can be broadened to study the interactions between hosts and other pathogens. Electronic supplementary material The online version of this article (doi:10.1186/s13326-015-0036-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yu Lin
- Unit of Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, 1150 W. Medical Center Dr, Ann Arbor, MI 48109 USA
| | - Zuoshuang Xiang
- Unit of Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, 1150 W. Medical Center Dr, Ann Arbor, MI 48109 USA
| | - Yongqun He
- Unit of Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, 1150 W. Medical Center Dr, Ann Arbor, MI 48109 USA
| |
|
85
|
Horvatovich P, Lundberg EK, Chen YJ, Sung TY, He F, Nice EC, Goode RJ, Yu S, Ranganathan S, Baker MS, Domont GB, Velasquez E, Li D, Liu S, Wang Q, He QY, Menon R, Guan Y, Corrales FJ, Segura V, Casal JI, Pascual-Montano A, Albar JP, Fuentes M, Gonzalez-Gonzalez M, Diez P, Ibarrola N, Degano RM, Mohammed Y, Borchers CH, Urbani A, Soggiu A, Yamamoto T, Salekdeh GH, Archakov A, Ponomarenko E, Lisitsa A, Lichti CF, Mostovenko E, Kroes RA, Rezeli M, Végvári Á, Fehniger TE, Bischoff R, Vizcaíno JA, Deutsch EW, Lane L, Nilsson CL, Marko-Varga G, Omenn GS, Jeong SK, Lim JS, Paik YK, Hancock WS. Quest for Missing Proteins: Update 2015 on Chromosome-Centric Human Proteome Project. J Proteome Res 2015; 14:3415-3431. [PMID: 26076068 DOI: 10.1021/pr5013009] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
This paper summarizes the recent activities of the Chromosome-Centric Human Proteome Project (C-HPP) consortium, which develops new technologies to identify yet-to-be annotated proteins (termed "missing proteins") in biological samples that lack sufficient experimental evidence at the protein level for confident protein identification. The C-HPP also aims to identify new protein forms that may be caused by genetic variability, post-translational modifications, and alternative splicing. Proteogenomic data integration forms the basis of the C-HPP's activities; therefore, we have summarized some of the key approaches and their roles in the project. We present new analytical technologies that improve the chemical space and lower detection limits coupled to bioinformatics tools and some publicly available resources that can be used to improve data analysis or support the development of analytical assays. Most of this paper's content has been compiled from posters, slides, and discussions presented in the series of C-HPP workshops held during 2014. All data (posters, presentations) used are available at the C-HPP Wiki (http://c-hpp.webhosting.rug.nl/) and in the Supporting Information.
Affiliation(s)
- Péter Horvatovich
- Analytical Biochemistry, Department of Pharmacy, University of Groningen , A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
| | - Emma K Lundberg
- Science for Life Laboratory, KTH - Royal Institute of Technology , SE-171 21 Stockholm, Sweden
| | - Yu-Ju Chen
- Institute of Chemistry, Academia Sinica , 128 Academia Road Sec. 2, Taipei 115, Taiwan
| | - Ting-Yi Sung
- Institute of Information Science, Academia Sinica , 128 Academia Road Sec. 2, Taipei 115, Taiwan
| | - Fuchu He
- The State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine , No. 27 Taiping Road, Haidian District, Beijing 100850, China
| | - Edouard C Nice
- Department of Biochemistry and Molecular Biology, Monash University , Clayton, Victoria 3800, Australia
| | - Robert J Goode
- Department of Biochemistry and Molecular Biology, Monash University , Clayton, Victoria 3800, Australia
| | - Simon Yu
- Department of Biochemistry and Molecular Biology, Monash University , Clayton, Victoria 3800, Australia
| | - Shoba Ranganathan
- Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University , Sydney, New South Wales 2109, Australia
| | - Mark S Baker
- Australian School of Advanced Medicine, Macquarie University , Sydney, NSW 2109, Australia
| | - Gilberto B Domont
- Proteomics Unit, Institute of Chemistry, Federal University of Rio de Janeiro, Cidade Universitária, Av Athos da Silveira Ramos 149, CT-A542, 21941-909 Rio de Janeiro, RJ, Brazil
| | - Erika Velasquez
- Proteomics Unit, Institute of Chemistry, Federal University of Rio de Janeiro, Cidade Universitária, Av Athos da Silveira Ramos 149, CT-A542, 21941-909 Rio de Janeiro, RJ, Brazil
| | - Dong Li
- The State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine , No. 27 Taiping Road, Haidian District, Beijing 100850, China
| | - Siqi Liu
- Beijing Institute of Genomics and BGI Shenzhen , No. 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- BGI Shenzhen , Beishan Road, Yantian District, Shenzhen, 518083, China
| | - Quanhui Wang
- Beijing Institute of Genomics and BGI Shenzhen , No. 1 Beichen West Road, Chaoyang District, Beijing 100101, China
| | - Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, College of Life Science and Technology, Jinan University, Guangzhou 510632, China
| | - Rajasree Menon
- Department of Computational Medicine & Bioinformatics, University of Michigan , 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
| | - Yuanfang Guan
- Departments of Computational Medicine & Bioinformatics and Computer Sciences, University of Michigan , 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
| | - Fernando J Corrales
- ProteoRed-ISCIII, Biomolecular and Bioinformatics Resources Platform (PRB2), Spanish Consortium of C-HPP (Chr-16), CIMA, University of Navarra, 31008 Pamplona, Spain
- Chr16 SpHPP Consortium , CIMA, University of Navarra, 31008 Pamplona, Spain
| | - Victor Segura
- ProteoRed-ISCIII, Biomolecular and Bioinformatics Resources Platform (PRB2), Spanish Consortium of C-HPP (Chr-16), CIMA, University of Navarra, 31008 Pamplona, Spain
- Chr16 SpHPP Consortium , CIMA, University of Navarra, 31008 Pamplona, Spain
| | - J Ignacio Casal
- Department of Cellular and Molecular Medicine, Centro de Investigaciones Biológicas (CIB-CSIC) , 28040 Madrid, Spain
| | | | - Juan P Albar
- Centro Nacional de Biotecnologia (CNB-CSIC) , Cantoblanco, 28049 Madrid, Spain
| | - Manuel Fuentes
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC, IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Maria Gonzalez-Gonzalez
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC, IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Paula Diez
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC, IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Nieves Ibarrola
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC, IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Rosa M Degano
- Cancer Research Center. Proteomics Unit and General Service of Cytometry, Department of Medicine, University of Salamanca-CSIC, IBSAL, Campus Miguel de Unamuno s/n, 37007 Salamanca, Spain
| | - Yassene Mohammed
- University of Victoria-Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, #3101-4464 Markham Street, Victoria, British Columbia V8Z 7X8, Canada
- Center for Proteomics and Metabolomics, Leiden University Medical Center , 2333 ZA Leiden, The Netherlands
| | - Christoph H Borchers
- University of Victoria-Genome British Columbia Proteomics Centre, Vancouver Island Technology Park, #3101-4464 Markham Street, Victoria, British Columbia V8Z 7X8, Canada
| | - Andrea Urbani
- Proteomics and Metabonomics Laboratory, Fondazione Santa Lucia, Rome, Italy
- Department of Experimental Medicine and Surgery, University of Rome "Tor Vergata" , Rome, Italy
| | - Alessio Soggiu
- Department of Veterinary Science and Public Health (DIVET), University of Milano , via Celoria 10, 20133 Milano, Italy
| | - Tadashi Yamamoto
- Institute of Nephrology, Graduate School of Medical and Dental Sciences, Niigata University , Niigata, Japan
| | - Ghasem Hosseini Salekdeh
- Department of Molecular Systems Biology at Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran
- Department of Systems Biology, Agricultural Biotechnology Research Institute of Iran, Karaj, Iran
| | | | | | - Andrey Lisitsa
- Orechovich Institute of Biomedical Chemistry , Moscow, Russia
| | - Cheryl F Lichti
- Department of Pharmacology and Toxicology, The University of Texas Medical Branch , Galveston, Texas 77555-0617, United States
| | - Ekaterina Mostovenko
- Department of Pharmacology and Toxicology, The University of Texas Medical Branch , Galveston, Texas 77555-0617, United States
| | - Roger A Kroes
- Falk Center for Molecular Therapeutics, Department of Biomedical Engineering, Northwestern University , 1801 Maple Ave., Suite 4300, Evanston, Illinois 60201, United States
| | - Melinda Rezeli
- Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University , BMC D13, 221 84 Lund, Sweden
| | - Ákos Végvári
- Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University , BMC D13, 221 84 Lund, Sweden
| | - Thomas E Fehniger
- Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University , BMC D13, 221 84 Lund, Sweden
| | - Rainer Bischoff
- Analytical Biochemistry, Department of Pharmacy, University of Groningen , A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, CB10 1SD, Hinxton, Cambridge, United Kingdom
| | - Eric W Deutsch
- Institute for Systems Biology , 401 Terry Avenue North, Seattle, Washington 98109, United States
| | - Lydie Lane
- SIB Swiss Institute of Bioinformatics , Geneva, Switzerland
- Department of Human Protein Science, Faculty of Medicine, University of Geneva , Geneva, Switzerland
| | - Carol L Nilsson
- Department of Pharmacology and Toxicology, The University of Texas Medical Branch , Galveston, Texas 77555-0617, United States
| | - György Marko-Varga
- Clinical Protein Science & Imaging, Department of Biomedical Engineering, Lund University , BMC D13, 221 84 Lund, Sweden
| | - Gilbert S Omenn
- Departments of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics and School of Public Health, University of Michigan , 100 Washtenaw Avenue, Ann Arbor, Michigan 48109-2218, United States
| | - Seul-Ki Jeong
- Departments of Integrated Omics for Biomedical Science & Biochemistry, College of Life Science and Technology, Yonsei Proteome Research Center, Yonsei University , Seoul, 120-749, Korea
| | - Jong-Sun Lim
- Departments of Integrated Omics for Biomedical Science & Biochemistry, College of Life Science and Technology, Yonsei Proteome Research Center, Yonsei University , Seoul, 120-749, Korea
| | - Young-Ki Paik
- Departments of Integrated Omics for Biomedical Science & Biochemistry, College of Life Science and Technology, Yonsei Proteome Research Center, Yonsei University , Seoul, 120-749, Korea
| | - William S Hancock
- The Barnett Institute of Chemical and Biological Analysis, Northeastern University , 140 The Fenway, Boston, Massachusetts 02115, United States
| |
|
86
|
Xiang Z, Zheng J, Lin Y, He Y. Ontorat: automatic generation of new ontology terms, annotations, and axioms based on ontology design patterns. J Biomed Semantics 2015; 6:4. [PMID: 25785185 PMCID: PMC4362828 DOI: 10.1186/2041-1480-6-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2014] [Accepted: 12/25/2014] [Indexed: 11/10/2022] Open
Abstract
Background It is time-consuming to build an ontology with many terms and axioms. Thus it is desired to automate the process of ontology development. Ontology Design Patterns (ODPs) provide a reusable solution to solve a recurrent modeling problem in the context of ontology engineering. Because ontology terms often follow specific ODPs, the Ontology for Biomedical Investigations (OBI) developers proposed a Quick Term Templates (QTTs) process targeted at generating new ontology classes following the same pattern, using term templates in a spreadsheet format. Results Inspired by the ODPs and QTTs, the Ontorat web application is developed to automatically generate new ontology terms, annotations of terms, and logical axioms based on a specific ODP(s). The inputs of an Ontorat execution include axiom expression settings, an input data file, ID generation settings, and a target ontology (optional). The axiom expression settings can be saved as a predesigned Ontorat setting format text file for reuse. The input data file is generated based on a template file created by a specific ODP (text or Excel format). Ontorat is an efficient tool for ontology expansion. Different use cases are described. For example, Ontorat was applied to automatically generate over 1,000 Japan RIKEN cell line cell terms with both logical axioms and rich annotation axioms in the Cell Line Ontology (CLO). Approximately 800 licensed animal vaccines were represented and annotated in the Vaccine Ontology (VO) by Ontorat. The OBI team used Ontorat to add assay and device terms required by ENCODE project. Ontorat was also used to add missing annotations to all existing Biobank specific terms in the Biobank Ontology. A collection of ODPs and templates with examples are provided on the Ontorat website and can be reused to facilitate ontology development. 
Conclusions With ever increasing ontology development and applications, Ontorat provides a timely platform for generating and annotating a large number of ontology terms by following design patterns. Availability: http://ontorat.hegroup.org/
Affiliation(s)
| | - Jie Zheng
- University of Pennsylvania, Philadelphia, PA USA
| | - Yu Lin
- University of Michigan, Ann Arbor, MI USA
| | - Yongqun He
- University of Michigan, Ann Arbor, MI USA
| |
|