51
Gallagher RV, Falster DS, Maitner BS, Salguero-Gómez R, Vandvik V, Pearse WD, Schneider FD, Kattge J, Poelen JH, Madin JS, Ankenbrand MJ, Penone C, Feng X, Adams VM, Alroy J, Andrew SC, Balk MA, Bland LM, Boyle BL, Bravo-Avila CH, Brennan I, Carthey AJR, Catullo R, Cavazos BR, Conde DA, Chown SL, Fadrique B, Gibb H, Halbritter AH, Hammock J, Hogan JA, Holewa H, Hope M, Iversen CM, Jochum M, Kearney M, Keller A, Mabee P, Manning P, McCormack L, Michaletz ST, Park DS, Perez TM, Pineda-Munoz S, Ray CA, Rossetto M, Sauquet H, Sparrow B, Spasojevic MJ, Telford RJ, Tobias JA, Violle C, Walls R, Weiss KCB, Westoby M, Wright IJ, Enquist BJ. Open Science principles for accelerating trait-based science across the Tree of Life. Nat Ecol Evol 2020; 4:294-303. [PMID: 32066887 DOI: 10.1038/s41559-020-1109-6] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 03/31/2019] [Accepted: 01/10/2020] [Indexed: 01/22/2023]
Abstract
Synthesizing trait observations and knowledge across the Tree of Life remains a grand challenge for biodiversity science. Species traits are widely used in ecological and evolutionary science, and new data and methods have proliferated rapidly. Yet accessing and integrating disparate data sources remains a considerable challenge, slowing progress toward a global synthesis to integrate trait data across organisms. Trait science needs a vision for achieving global integration across all organisms. Here, we outline how the adoption of key Open Science principles (open data, open source and open methods) is transforming trait science, increasing transparency, democratizing access and accelerating global synthesis. To enhance widespread adoption of these principles, we introduce the Open Traits Network (OTN), a global, decentralized community welcoming all researchers and institutions pursuing the collaborative goal of standardizing and integrating trait data across organisms. We demonstrate how adherence to Open Science principles is key to the OTN community and outline five activities that can accelerate the synthesis of trait data across the Tree of Life, thereby facilitating rapid advances to address scientific inquiries and environmental issues. Lessons learned along the path to a global synthesis of trait data will provide a framework for addressing similarly complex data science and informatics challenges.
Affiliation(s)
- Rachael V Gallagher
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia.
| | - Daniel S Falster
- Evolution and Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia
| | - Brian S Maitner
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
| | - Roberto Salguero-Gómez
- Department of Zoology, Oxford University, Oxford, UK.,Centre for Biodiversity and Conservation Science, University of Queensland, Brisbane, Queensland, Australia.,Evolutionary Demography Laboratory, Max Planck Institute for Demographic Research, Rostock, Germany
| | - Vigdis Vandvik
- Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
| | - William D Pearse
- Ecology Center and Department of Biology, Utah State University, Logan, UT, USA
| | | | - Jens Kattge
- Max Planck Institute for Biogeochemistry, Jena, Germany.,German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
| | | | - Joshua S Madin
- Hawai'i Institute of Marine Biology, University of Hawai'i at Manoa, Manoa, HI, USA
| | - Markus J Ankenbrand
- Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Center for Computational and Theoretical Biology, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Comprehensive Heart Failure Center, University Hospital Wuerzburg, Wuerzburg, Germany
| | - Caterina Penone
- Institute of Plant Sciences, University of Bern, Bern, Switzerland
| | - Xiao Feng
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
| | - Vanessa M Adams
- Discipline of Geography and Spatial Sciences, University of Tasmania, Hobart, Tasmania, Australia
| | - John Alroy
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Samuel C Andrew
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
| | - Meghan A Balk
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
| | - Lucie M Bland
- School of Life and Environmental Sciences, Centre for Integrative Ecology, Deakin University, Geelong, Victoria, Australia
| | - Brad L Boyle
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
| | - Catherine H Bravo-Avila
- Department of Biology, University of Miami, Miami, FL, USA.,Fairchild Tropical Botanic Garden, Coral Gables, FL, USA
| | - Ian Brennan
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Alexandra J R Carthey
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Renee Catullo
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Brittany R Cavazos
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
| | - Dalia A Conde
- Species360 Conservation Science Alliance, Bloomington, MN, USA.,Interdisciplinary Center on Population Dynamics, University of Southern Denmark, Odense, Denmark.,Department of Biology, University of Southern Denmark, Odense, Denmark
| | - Steven L Chown
- School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
| | - Belen Fadrique
- Department of Biology, University of Miami, Miami, FL, USA
| | - Heloise Gibb
- Department of Ecology, Environment and Evolution and Centre for Future Landscapes, La Trobe University, Melbourne, Victoria, Australia
| | - Aud H Halbritter
- Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
| | - Jennifer Hammock
- National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
| | - J Aaron Hogan
- International Center for Tropical Botany, Department of Biological Sciences, Florida International University, Miami, FL, USA
| | - Hamish Holewa
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
| | - Michael Hope
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
| | - Colleen M Iversen
- Climate Change Science Institute and Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Malte Jochum
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.,Institute of Plant Sciences, University of Bern, Bern, Switzerland.,Institute of Biology, Leipzig University, Leipzig, Germany
| | - Michael Kearney
- School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Alexander Keller
- Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Center for Computational and Theoretical Biology, Biocenter, University of Wuerzburg, Wuerzburg, Germany
| | - Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA
| | - Peter Manning
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
| | - Luke McCormack
- Center for Tree Science, The Morton Arboretum, Lisle, IL, USA
| | - Sean T Michaletz
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
| | - Daniel S Park
- Department of Organismic and Evolutionary Biology and Harvard University Herbaria, Harvard University, Cambridge, MA, USA
| | - Timothy M Perez
- Department of Biology, University of Miami, Miami, FL, USA.,Fairchild Tropical Botanic Garden, Coral Gables, FL, USA
| | - Silvia Pineda-Munoz
- School of Biological Sciences and School of Earth & Atmospheric Sciences, Georgia Institute of Technology, Atlanta, GA, USA
| | - Courtenay A Ray
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Maurizio Rossetto
- National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, Australia.,Queensland Alliance of Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
| | - Hervé Sauquet
- Evolution and Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia.,National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, Australia.,Ecologie Systématique Evolution, Univ. Paris-Sud, CNRS, AgroParisTech, Universite Paris-Saclay, Orsay, France
| | - Benjamin Sparrow
- TERN / School of Biological Sciences, Faculty of Science, The University of Adelaide, Adelaide, South Australia, Australia
| | - Marko J Spasojevic
- Department of Evolution, Ecology, and Organismal Biology, University of California Riverside, Riverside, CA, USA
| | - Richard J Telford
- Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
| | - Joseph A Tobias
- Department of Life Sciences, Imperial College London, London, UK
| | - Cyrille Violle
- CEFE, CNRS, Univ Montpellier, Université Paul Valéry Montpellier, Montpellier, France
| | | | | | - Mark Westoby
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Ian J Wright
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Brian J Enquist
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA.,Santa Fe Institute, Santa Fe, NM, USA
52
Wang Y, Fan X, Chen L, Chang EIC, Ananiadou S, Tsujii J, Xu Y. Mapping anatomical related entities to human body parts based on wikipedia in discharge summaries. BMC Bioinformatics 2019; 20:430. [PMID: 31419946 PMCID: PMC6697955 DOI: 10.1186/s12859-019-3005-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2019] [Accepted: 07/23/2019] [Indexed: 11/16/2022]
Abstract
Background: Consisting of dictated free-text documents such as discharge summaries, medical narratives are widely used in medical natural language processing. Relationships between anatomical entities and human body parts are crucial for building medical text mining applications. To achieve this, we establish a mapping system consisting of a Wikipedia-based scoring algorithm and a named entity normalization method (NEN). The mapping system makes full use of information available on Wikipedia, which is a comprehensive Internet medical knowledge base. We also built a new ontology, Tree of Human Body Parts (THBP), from core anatomical parts by referring to anatomical experts and the Unified Medical Language System (UMLS) to make the mapping system efficacious for clinical treatments. Results: The gold standard is derived from 50 discharge summaries from our previous work, in which 2,224 anatomical entities are included. The F1-measure of the baseline system is 70.20%, while our algorithm based on Wikipedia achieves 86.67% with the assistance of NEN. Conclusions: We construct a framework to map anatomical entities to THBP ontology using normalization and a scoring algorithm based on Wikipedia. The proposed framework is proven to be much more effective and efficient than the main baseline system.
Affiliation(s)
- Yipei Wang
- State Key Laboratory of Software Development Environment and Key Laboratory of Biomechanics and Mechanobiology of Ministry of Education and Research Institute of Beihang University in Shenzhen, Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Xueyuan Road No.37, Beijing, 100191 China
| | - Xingyu Fan
- Bioengineering College of Chongqing University, Shazheng Street No. 174, Chongqing, 400044 China
| | - Luoxin Chen
- State Key Laboratory of Software Development Environment and Key Laboratory of Biomechanics and Mechanobiology of Ministry of Education and Research Institute of Beihang University in Shenzhen, Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Xueyuan Road No.37, Beijing, 100191 China
| | | | - Sophia Ananiadou
- The National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
| | - Junichi Tsujii
- The National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Artificial Intelligence Research Center (AIRC), Tokyo, Japan
| | - Yan Xu
- State Key Laboratory of Software Development Environment and Key Laboratory of Biomechanics and Mechanobiology of Ministry of Education and Research Institute of Beihang University in Shenzhen, Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Xueyuan Road No.37, Beijing, 100191 China
- Microsoft Research, Danling Street No. 5, Beijing, 100080 China
53
Prieto-González D, Castilla-Rodríguez I, González E, Couce ML. Towards the automated economic assessment of newborn screening for rare diseases. J Biomed Inform 2019; 95:103216. [PMID: 31128259 DOI: 10.1016/j.jbi.2019.103216] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/11/2019] [Revised: 05/17/2019] [Accepted: 05/18/2019] [Indexed: 12/14/2022]
Abstract
OBJECTIVE Economic assessments of newborn screening programs for rare diseases involve the use of models and require huge efforts to synthesize information from different sources. Sharing and automatically or semi-automatically reusing this information for new assessments would be desirable, but it is not currently possible due to the lack of suitable tools. MATERIAL AND METHODS We designed and implemented the Rare Diseases Ontology for Simulation (RaDiOS) after performing two reviews and critically appraising the existing data repositories on rare diseases. The first review covered previously published economic assessments and served to identify the main parameters required to model newborn screening. The second review aimed at locating existing data repositories potentially available to inform these parameters. RESULTS We found key model parameters on epidemiology, screening methods, diagnostic methods, pathogenesis, treatment and follow-up tests. We also identified seven data repositories directly related to rare diseases. None of these repositories was well-suited for the automated generation of simulation models. We incorporated the identified parameters as structured classes and properties of the new ontology (RaDiOS). We carefully set the relationships among the parameters so as to allow automated inference from the ontology. CONCLUSIONS RaDiOS is an ontology that serves as a data repository to automatically build simulation models for the economic assessment of newborn screening for rare diseases.
Affiliation(s)
- David Prieto-González
- Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Avda. Astrofísico Fco. Sánchez s/n, 38200, AP 456., La Laguna, Canary Islands, Spain
| | - Iván Castilla-Rodríguez
- Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Avda. Astrofísico Fco. Sánchez s/n, 38200, AP 456., La Laguna, Canary Islands, Spain; Spanish Network of Health Services Research for Chronic Diseases (REDISSEC), Tenerife, Spain.
| | - Evelio González
- Departamento de Ingeniería Informática y de Sistemas, Universidad de La Laguna, Avda. Astrofísico Fco. Sánchez s/n, 38200, AP 456., La Laguna, Canary Islands, Spain
| | - María L Couce
- Unidad de Diagnóstico y Tratamiento de Enfermedades Metabólicas Congénitas, Servicio de Neonatología, Hospital Clínico Universitario de Santiago, Departamento de Pediatría, IDIS, CIBERER, Santiago de Compostela, La Coruña, Spain
54
Couto FM, Lamurias A. MER: a shell script and annotation server for minimal named entity recognition and linking. J Cheminform 2018; 10:58. [PMID: 30519990 PMCID: PMC6755715 DOI: 10.1186/s13321-018-0312-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 07/24/2018] [Accepted: 11/30/2018] [Indexed: 01/17/2023]
Abstract
Named-entity recognition aims at identifying the fragments of text that mention entities of interest, that afterwards could be linked to a knowledge base where those entities are described. This manuscript presents our minimal named-entity recognition and linking tool (MER), designed with flexibility, autonomy and efficiency in mind. To annotate a given text, MER only requires: (1) a lexicon (text file) with the list of terms representing the entities of interest; (2) optionally a tab-separated values file with a link for each term; (3) and a Unix shell. Alternatively, the user can provide an ontology from where MER will automatically generate the lexicon and links files. The efficiency of MER derives from exploring the high performance and reliability of the text processing command-line tools grep and awk, and a novel inverted recognition technique. MER was deployed in a cloud infrastructure using multiple Virtual Machines to work as an annotation server and participate in the Technical Interoperability and Performance of annotation Servers task of BioCreative V.5. The results show that our solution processed each document (text retrieval and annotation) in less than 3 s on average without using any type of cache. MER was also compared to a state-of-the-art dictionary lookup solution obtaining competitive results not only in computational performance but also in precision and recall. MER is publicly available in a GitHub repository ( https://github.com/lasigeBioTM/MER ) and through a RESTful Web service ( http://labs.fc.ul.pt/mer/ ).
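The lexicon-plus-links inputs described in this abstract can be illustrated with a small dictionary-lookup sketch. This is not MER's implementation (MER is a shell script built on grep and awk, and its inverted recognition technique is not reproduced here); the function names `parse_links` and `annotate`, the example terms and the example URLs are all assumptions made for illustration.

```python
import re


def parse_links(tsv_lines):
    """Parse 'term<TAB>link' lines into a dict keyed by lower-cased term."""
    links = {}
    for line in tsv_lines:
        if "\t" in line:
            term, link = line.rstrip("\n").split("\t", 1)
            links[term.strip().lower()] = link.strip()
    return links


def annotate(text, lexicon_lines, links=None):
    """Return sorted (start, end, term, link) tuples for whole-word matches."""
    links = links or {}
    terms = {line.strip().lower() for line in lexicon_lines if line.strip()}
    found = []
    for term in terms:
        # Whole-word, case-insensitive match of the literal lexicon term.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.end(), term, links.get(term)))
    return sorted(found)


# Hypothetical lexicon and links, for illustration only.
lexicon = ["aspirin", "acetylsalicylic acid"]
links = parse_links(["aspirin\thttp://example.org/aspirin"])
text = "Aspirin, also called acetylsalicylic acid, reduces fever."
annotations = annotate(text, lexicon, links)
```

Each annotation carries character offsets, the matched lexicon term and the optional link, which mirrors the recognition-then-linking split the abstract describes.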
Affiliation(s)
- Francisco M. Couto
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749 016 Lisbon, Portugal
| | - Andre Lamurias
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749 016 Lisbon, Portugal
- Faculty of Sciences, BioISI - Biosystems and Integrative Sciences Institute, University of Lisboa, Campo Grande, C8 bdg, 1749 016 Lisbon, Portugal
55
Affiliation(s)
- Melissa A Haendel
- From the Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, and the Linus Pauling Institute and the Center for Genome Research and Biocomputing, Oregon State University, Corvallis (M.A.H.); Johns Hopkins University Schools of Medicine, Public Health, and Nursing, Baltimore (C.G.C.); and the Jackson Laboratory for Genomic Medicine and the Institute for Systems Genomics, University of Connecticut - both in Farmington (P.N.R.)
| | - Christopher G Chute
- From the Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, and the Linus Pauling Institute and the Center for Genome Research and Biocomputing, Oregon State University, Corvallis (M.A.H.); Johns Hopkins University Schools of Medicine, Public Health, and Nursing, Baltimore (C.G.C.); and the Jackson Laboratory for Genomic Medicine and the Institute for Systems Genomics, University of Connecticut - both in Farmington (P.N.R.)
| | - Peter N Robinson
- From the Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, and the Linus Pauling Institute and the Center for Genome Research and Biocomputing, Oregon State University, Corvallis (M.A.H.); Johns Hopkins University Schools of Medicine, Public Health, and Nursing, Baltimore (C.G.C.); and the Jackson Laboratory for Genomic Medicine and the Institute for Systems Genomics, University of Connecticut - both in Farmington (P.N.R.)
56
Jackson LM, Fernando PC, Hanscom JS, Balhoff JP, Mabee PM. Automated Integration of Trees and Traits: A Case Study Using Paired Fin Loss Across Teleost Fishes. Syst Biol 2018; 67:559-575. [PMID: 29325126 PMCID: PMC6005059 DOI: 10.1093/sysbio/syx098] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/19/2017] [Revised: 12/15/2017] [Accepted: 12/21/2017] [Indexed: 11/24/2022]
Abstract
Data synthesis required for large-scale macroevolutionary studies is challenging with the current tools available for integration. Using a classic question regarding the frequency of paired fin loss in teleost fishes as a case study, we sought to create automated methods to facilitate the integration of broad-scale trait data with a sizable species-level phylogeny. Similar to the evolutionary pattern previously described for limbs, pelvic and pectoral fin reduction and loss are thought to have occurred independently multiple times in the evolution of fishes. We developed a bioinformatics pipeline to identify the presence and absence of pectoral and pelvic fins of 12,582 species. To do this, we integrated a synthetic morphological supermatrix of phenotypic data for the pectoral and pelvic fins for teleost fishes from the Phenoscape Knowledgebase (two presence/absence characters for 3047 taxa) with a species-level tree for teleost fishes from the Open Tree of Life project (38,419 species). The integration method detailed herein harnessed a new combined approach by utilizing data based on ontological inference, as well as phylogenetic propagation, to reduce overall data loss. Using inference enabled by ontology-based annotations, missing data were reduced from 98.0% to 85.9%, and further reduced to 34.8% by phylogenetic data propagation. These methods allowed us to extend the data to an additional 11,293 species for a total of 12,582 species with trait data. The pectoral fin appears to have been independently lost in a minimum of 19 lineages and the pelvic fin in 48. Though interpretation is limited by lack of phylogenetic resolution at the species level, it appears that following loss, both pectoral and pelvic fins were regained several (3) to many (14) times respectively. Focused investigation into putative regains of the pectoral fin, all within one clade (Anguilliformes), showed that the pectoral fin was regained at least twice following loss. Overall, this study points to specific teleost clades where strategic phylogenetic resolution and genetic investigation will be necessary to understand the pattern and frequency of pectoral fin reversals.
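The abstract does not spell out how phylogenetic propagation works, so the following is only one plausible reading, sketched under stated assumptions: a state asserted for a higher taxon is passed down to descendant tips that lack their own observation, and the missing-data fraction is recomputed afterwards. The tree, taxon names and the functions `propagate` and `missing_fraction` are hypothetical.

```python
def propagate(children, root, known):
    """Give each tip its own state, or else that of its nearest annotated ancestor."""
    filled = {}
    stack = [(root, None)]
    while stack:
        node, inherited = stack.pop()
        # A node's own assertion overrides whatever it inherits from above.
        state = known.get(node, inherited)
        kids = children.get(node, [])
        if not kids and state is not None:
            filled[node] = state
        for kid in kids:
            stack.append((kid, state))
    return filled


def missing_fraction(tips, states):
    """Fraction of tips with no trait state."""
    return sum(1 for tip in tips if tip not in states) / len(tips)


# Hypothetical four-species tree: pectoral fin state known for one clade and one species.
children = {"root": ["cladeA", "cladeB"],
            "cladeA": ["sp1", "sp2"],
            "cladeB": ["sp3", "sp4"]}
tips = ["sp1", "sp2", "sp3", "sp4"]
known = {"cladeA": "absent", "sp3": "present"}

before = missing_fraction(tips, known)
filled = propagate(children, "root", known)
after = missing_fraction(tips, filled)
```

In this toy case the missing fraction falls from 0.75 to 0.25, the same kind of reduction the abstract reports at much larger scale (85.9% to 34.8%).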
Affiliation(s)
- Laura M Jackson
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
| | - Pasan C Fernando
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
| | - Josh S Hanscom
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
| | - James P Balhoff
- Renaissance Computing Institute, University of North Carolina, 100 Europa Drive Suite 540, Chapel Hill, NC 27517, USA
| | - Paula M Mabee
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
57
Abstract
The Cellosaurus is a knowledge resource on cell lines. It aims to describe all cell lines used in biomedical research. Its scope encompasses both vertebrates and invertebrates. Currently, information for >100,000 cell lines is provided. For each cell line, it provides a wealth of information, cross-references, and literature citations. The Cellosaurus is available on the ExPASy server (https://web.expasy.org/cellosaurus/) and can be downloaded in a variety of formats. Among its many uses, the Cellosaurus is a key resource to help researchers identify potentially contaminated/misidentified cell lines, thus contributing to improving the quality of research in the life sciences.
Affiliation(s)
- Amos Bairoch
- Computer and Laboratory Investigation of Proteins of Human Origin Group, Faculty of Medicine, Swiss Institute of Bioinformatics, University of Geneva, Geneva 4, Switzerland
58
Roux J, Liu J, Robinson-Rechavi M. Selective Constraints on Coding Sequences of Nervous System Genes Are a Major Determinant of Duplicate Gene Retention in Vertebrates. Mol Biol Evol 2018; 34:2773-2791. [PMID: 28981708 PMCID: PMC5850798 DOI: 10.1093/molbev/msx199] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 12/12/2022]
Abstract
The evolutionary history of vertebrates is marked by three ancient whole-genome duplications: two successive rounds in the ancestor of vertebrates, and a third one specific to teleost fishes. Biased loss of most duplicates enriched the genome for specific genes, such as slow evolving genes, but this selective retention process is not well understood. To understand what drives the long-term preservation of duplicate genes, we characterized duplicated genes in terms of their expression patterns. We used a new method of expression enrichment analysis, TopAnat, applied to in situ hybridization data from thousands of genes from zebrafish and mouse. We showed that the presence of expression in the nervous system is a good predictor of a higher rate of retention of duplicate genes after whole-genome duplication. Further analyses suggest that purifying selection against the toxic effects of misfolded or misinteracting proteins, which is particularly strong in nonrenewing neural tissues, likely constrains the evolution of coding sequences of nervous system genes, leading indirectly to the preservation of duplicate genes after whole-genome duplication. Whole-genome duplications thus greatly contributed to the expansion of the toolkit of genes available for the evolution of profound novelties of the nervous system at the base of the vertebrate radiation.
Affiliation(s)
- Julien Roux
- Département d'Ecologie et d'Evolution, Université de Lausanne, Lausanne, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Jialin Liu
- Département d'Ecologie et d'Evolution, Université de Lausanne, Lausanne, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marc Robinson-Rechavi
- Département d'Ecologie et d'Evolution, Université de Lausanne, Lausanne, Switzerland.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
59
Osumi-Sutherland DJ, Ponta E, Courtot M, Parkinson H, Badi L. Using OWL reasoning to support the generation of novel gene sets for enrichment analysis. J Biomed Semantics 2018; 9:10. [PMID: 29444698 PMCID: PMC5813370 DOI: 10.1186/s13326-018-0175-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/16/2016] [Accepted: 01/03/2018] [Indexed: 12/30/2022]
Abstract
BACKGROUND The Gene Ontology (GO) consists of over 40,000 terms for biological processes, cell components and gene product activities linked into a graph structure by over 90,000 relationships. It has been used to annotate the functions and cellular locations of several million gene products. The graph structure is used by a variety of tools to group annotated genes into sets whose products share function or location. These gene sets are widely used to interpret the results of genomics experiments by assessing which sets are significantly over- or under-represented in results lists. F Hoffmann-La Roche Ltd. has developed a bespoke, manually maintained controlled vocabulary (RCV) for use in over-representation analysis. Many terms in this vocabulary group GO terms in novel ways that cannot easily be derived using the graph structure of the GO. For example, some RCV terms group GO terms by the cell, chemical or tissue type they refer to. Recent improvements in the content and formal structure of the GO make it possible to use logical queries in Web Ontology Language (OWL) to automatically map these cross-cutting classifications to sets of GO terms. We used this approach to automate mapping between RCV and GO, largely replacing the increasingly unsustainable manual mapping process. We then tested the utility of the resulting groupings for over-representation analysis. RESULTS We successfully mapped 85% of RCV terms to logical OWL definitions and showed that these could be used to recapitulate and extend manual mappings between RCV terms and the sets of GO terms subsumed by them. We also show that gene sets derived from the resulting GO term sets can be used to detect the signatures of cell and tissue types in whole genome expression data. CONCLUSIONS The rich formal structure of the GO makes it possible to use reasoning to dynamically generate novel, biologically relevant groupings of GO terms. GO term groupings generated with this approach can be used in over-representation analysis to detect cell and tissue type signatures in whole genome expression data.
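Over-representation of a gene set in a results list is commonly assessed with a one-sided hypergeometric test. The abstract does not state which test the authors used, so the sketch below is a generic illustration; the function name `enrichment_p_value` and the example counts are assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes from `population`,
    of which `annotated` belong to the gene set (hypergeometric upper tail)."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes overall, 3 in the gene set,
# 3 genes in the results list, all 3 of them in the gene set.
p_all_hits = enrichment_p_value(population=10, annotated=3, sample=3, hits=3)
p_any = enrichment_p_value(population=10, annotated=3, sample=3, hits=0)
```

With these counts `p_all_hits` is 1/120, since only one of the 120 possible three-gene result lists consists entirely of set members. A real analysis would also correct for testing many gene sets at once.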
Affiliation(s)
- David J Osumi-Sutherland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK.
| | - Enrico Ponta
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, CH-4070 Basel, Switzerland
| | - Melanie Courtot
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
| | - Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
| | - Laura Badi
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, CH-4070 Basel, Switzerland
60
Chiu B, Pyysalo S, Vulić I, Korhonen A. Bio-SimVerb and Bio-SimLex: wide-coverage evaluation sets of word similarity in biomedicine. BMC Bioinformatics 2018; 19:33. [PMID: 29402212 PMCID: PMC5800055 DOI: 10.1186/s12859-018-2039-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/04/2017] [Accepted: 01/24/2018] [Indexed: 01/10/2023]
Abstract
Background Word representations support a variety of Natural Language Processing (NLP) tasks. The quality of these representations is typically assessed by comparing the distances in the induced vector spaces against human similarity judgements. Whereas comprehensive evaluation resources have recently been developed for the general domain, similar resources for biomedicine currently suffer from the lack of coverage, both in terms of word types included and with respect to the semantic distinctions. Notably, verbs have been excluded, although they are essential for the interpretation of biomedical language. Further, current resources do not discern between semantic similarity and semantic relatedness, although this has been proven as an important predictor of the usefulness of word representations and their performance in downstream applications. Results We present two novel comprehensive resources targeting the evaluation of word representations in biomedicine. These resources, Bio-SimVerb and Bio-SimLex, address the previously mentioned problems, and can be used for evaluations of verb and noun representations respectively. In our experiments, we have computed the Pearson’s correlation between performances on intrinsic and extrinsic tasks using twelve popular state-of-the-art representation models (e.g. word2vec models). The intrinsic–extrinsic correlations using our datasets are notably higher than with previous intrinsic evaluation benchmarks such as UMNSRS and MayoSRS. In addition, when evaluating representation models for their abilities to capture verb and noun semantics individually, we show a considerable variation between performances across all models. Conclusion Bio-SimVerb and Bio-SimLex enable intrinsic evaluation of word representations. This evaluation can serve as a predictor of performance on various downstream tasks in the biomedical domain. 
The results on Bio-SimVerb and Bio-SimLex using standard word representation models highlight the importance of developing dedicated evaluation resources for NLP in biomedicine for particular word classes (e.g. verbs). These are needed to identify the most accurate methods for learning class-specific representations. Bio-SimVerb and Bio-SimLex are publicly available.
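The intrinsic evaluation described in this abstract, comparing vector-space similarity against human similarity judgements, can be sketched as follows. This is a minimal illustration and not the authors' code: the toy vectors, the word pairs, the ratings, and the choice of cosine similarity with a rank correlation are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def intrinsic_score(embeddings, rated_pairs):
    """Rank correlation between model similarities and human ratings."""
    model = [cosine(embeddings[a], embeddings[b]) for a, b, _ in rated_pairs]
    human = [rating for _, _, rating in rated_pairs]
    return pearson(ranks(model), ranks(human))

# Hypothetical two-dimensional verb vectors and made-up ratings (0-10 scale).
embeddings = {
    "inhibit": [1.0, 0.0],
    "suppress": [0.9, 0.1],
    "bind": [0.5, 0.5],
    "activate": [0.0, 1.0],
}
rated_pairs = [
    ("inhibit", "suppress", 9.0),
    ("inhibit", "bind", 4.0),
    ("inhibit", "activate", 1.0),
]
score = intrinsic_score(embeddings, rated_pairs)
print(round(score, 3))
```

A score near 1 means the model orders word pairs the same way the human raters do; a real evaluation would load trained embeddings and the published rating files in place of the toy data.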
Affiliation(s)
- Billy Chiu
  - Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK
- Sampo Pyysalo
  - Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK
- Ivan Vulić
  - Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK
- Anna Korhonen
  - Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK
|
61
|
Oliveira D, Pesquita C. Improving the interoperability of biomedical ontologies with compound alignments. J Biomed Semantics 2018; 9:1. [PMID: 29316968 PMCID: PMC5761129 DOI: 10.1186/s13326-017-0171-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/06/2017] [Accepted: 12/21/2017] [Indexed: 12/29/2022] Open
Abstract
Background Ontologies are commonly used to annotate and help process life sciences data. Although their original goal is to facilitate integration and interoperability among heterogeneous data sources, when these sources are annotated with distinct ontologies, bridging this gap can be challenging. In the last decade, ontology matching systems have been evolving and are now capable of producing high-quality mappings for life sciences ontologies, usually limited to the equivalence between two ontologies. However, life sciences research is becoming increasingly transdisciplinary and integrative, fostering the need to develop matching strategies that are able to handle multiple ontologies and more complex relations between their concepts. Results We have developed ontology matching algorithms that are able to find compound mappings between multiple biomedical ontologies, in the form of ternary mappings, finding for instance that “aortic valve stenosis” (HP:0001650) is equivalent to the intersection between “aortic valve” (FMA:7236) and “constricted” (PATO:0001847). The algorithms take advantage of search space filtering based on partial mappings between ontology pairs, to be able to handle the increased computational demands. The evaluation of the algorithms has shown that they are able to produce meaningful results, with precision in the range of 60-92% for new mappings. The algorithms were also applied to the potential extension of logical definitions of the OBO and the matching of several plant-related ontologies. Conclusions This work is a first step towards finding more complex relations between multiple ontologies. The evaluation shows that the results produced are significant and that the algorithms could satisfy specific integration needs.
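The ternary mapping idea in this abstract can be illustrated with a small lexical sketch. It is not the authors' algorithm: the word-overlap matching, the function name `compound_matches`, the synonym table, and the placeholder class `FMA:EXAMPLE` are hypothetical. Only the three identifiers quoted in the abstract come from the source.

```python
def tokens(label):
    """Lower-cased set of words in a class label."""
    return set(label.lower().split())

def compound_matches(source, ont_a, ont_b, synonyms):
    """Find (source_id, a_id, b_id) triples whose labels combine lexically.

    Step 1 filters the search space: only ont_a classes whose label is
    fully contained in the source label are kept (a partial mapping).
    Step 2 looks in ont_b for a class whose label, or an assumed synonym,
    accounts for exactly the remaining words.
    """
    results = []
    for s_id, s_label in source.items():
        s_words = tokens(s_label)
        partial = [
            (a_id, tokens(a_label))
            for a_id, a_label in ont_a.items()
            if tokens(a_label) <= s_words
        ]
        for a_id, a_words in partial:
            remainder = s_words - a_words
            if not remainder:
                continue  # the source label is already fully covered
            for b_id, b_label in ont_b.items():
                names = {b_label.lower()} | synonyms.get(b_id, set())
                if any(tokens(name) == remainder for name in names):
                    results.append((s_id, a_id, b_id))
    return results

# Labels and identifiers for the first three classes are quoted in the abstract.
source = {"HP:0001650": "aortic valve stenosis"}
ont_a = {
    "FMA:7236": "aortic valve",
    "FMA:EXAMPLE": "mitral valve",  # hypothetical placeholder class
}
ont_b = {"PATO:0001847": "constricted"}
# Hypothetical synonym table; a real system would draw on ontology synonyms.
synonyms = {"PATO:0001847": {"stenosis"}}

matches = compound_matches(source, ont_a, ont_b, synonyms)
print(matches)
```

Filtering on partial matches first keeps the second loop from scanning every pair of classes across the two target ontologies, which is the cost the abstract refers to as increased computational demands.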
Affiliation(s)
- Daniela Oliveira
  - Insight Centre for Data Analytics, NUI Galway, Galway Business Park, Dangan, Galway, H91 AEX4, Ireland; LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
- Catia Pesquita
  - LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
|
62
|
Khan AM, Grant AH, Martinez A, Burns GAPC, Thatcher BS, Anekonda VT, Thompson BW, Roberts ZS, Moralejo DH, Blevins JE. Mapping Molecular Datasets Back to the Brain Regions They are Extracted from: Remembering the Native Countries of Hypothalamic Expatriates and Refugees. Advances in Neurobiology 2018; 21:101-193. [PMID: 30334222 PMCID: PMC6310046 DOI: 10.1007/978-3-319-94593-4_6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
This article focuses on approaches to link transcriptomic, proteomic, and peptidomic datasets mined from brain tissue to the original locations within the brain that they are derived from using digital atlas mapping techniques. We use, as an example, the transcriptomic, proteomic and peptidomic analyses conducted in the mammalian hypothalamus. Following a brief historical overview, we highlight studies that have mined biochemical and molecular information from the hypothalamus and then lay out a strategy for how these data can be linked spatially to the mapped locations in a canonical brain atlas where the data come from, thereby allowing researchers to integrate these data with other datasets across multiple scales. A key methodology that enables atlas-based mapping of extracted datasets, laser-capture microdissection, is discussed in detail, with a view of how this technology is a bridge between systems biology and systems neuroscience.
Affiliation(s)
- Arshad M Khan
  - UTEP Systems Neuroscience Laboratory, University of Texas at El Paso, El Paso, TX, USA
  - Department of Biological Sciences, University of Texas at El Paso, El Paso, TX, USA
  - Border Biomedical Research Center, University of Texas at El Paso, El Paso, TX, USA
- Alice H Grant
  - UTEP Systems Neuroscience Laboratory, University of Texas at El Paso, El Paso, TX, USA
  - Department of Biological Sciences, University of Texas at El Paso, El Paso, TX, USA
  - Graduate Program in Pathobiology, University of Texas at El Paso, El Paso, TX, USA
- Anais Martinez
  - UTEP Systems Neuroscience Laboratory, University of Texas at El Paso, El Paso, TX, USA
  - Department of Biological Sciences, University of Texas at El Paso, El Paso, TX, USA
  - Graduate Program in Pathobiology, University of Texas at El Paso, El Paso, TX, USA
- Gully A P C Burns
  - Information Sciences Institute, Viterbi School of Engineering, University of Southern California, Marina del Rey, CA, USA
- Brendan S Thatcher
  - VA Puget Sound Health Care System, Office of Research and Development Medical Research Service, Department of Veterans Affairs Medical Center, Seattle, WA, USA
- Vishwanath T Anekonda
  - VA Puget Sound Health Care System, Office of Research and Development Medical Research Service, Department of Veterans Affairs Medical Center, Seattle, WA, USA
- Benjamin W Thompson
  - VA Puget Sound Health Care System, Office of Research and Development Medical Research Service, Department of Veterans Affairs Medical Center, Seattle, WA, USA
- Zachary S Roberts
  - VA Puget Sound Health Care System, Office of Research and Development Medical Research Service, Department of Veterans Affairs Medical Center, Seattle, WA, USA
- Daniel H Moralejo
  - Division of Neonatology, Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- James E Blevins
  - VA Puget Sound Health Care System, Office of Research and Development Medical Research Service, Department of Veterans Affairs Medical Center, Seattle, WA, USA
  - Division of Metabolism, Endocrinology, and Nutrition, Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
|
63
|
Dahdul W, Manda P, Cui H, Balhoff JP, Dececchi TA, Ibrahim N, Lapp H, Vision T, Mabee PM. Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems. Database (Oxford) 2018; 2018:5255130. [PMID: 30576485 PMCID: PMC6301375 DOI: 10.1093/database/bay110] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/14/2018] [Revised: 08/22/2018] [Accepted: 09/24/2018] [Indexed: 11/12/2022]
Abstract
Natural language descriptions of organismal phenotypes, a principal object of study in biology, are abundant in the biological literature. Expressing these phenotypes as logical statements using ontologies would enable large-scale analysis on phenotypic information from diverse systems. However, considerable human effort is required to make these phenotype descriptions amenable to machine reasoning. Natural language processing tools have been developed to facilitate this task, and the training and evaluation of these tools depend on the availability of high quality, manually annotated gold standard data sets. We describe the development of an expert-curated gold standard data set of annotated phenotypes for evolutionary biology. The gold standard was developed for the curation of complex comparative phenotypes for the Phenoscape project. It was created by consensus among three curators and consists of entity-quality expressions of varying complexity. We use the gold standard to evaluate annotations created by human curators and those generated by the Semantic CharaParser tool. Using four annotation accuracy metrics that can account for any level of relationship between terms from two phenotype annotations, we found that machine-human consistency, or similarity, was significantly lower than inter-curator (human-human) consistency. Surprisingly, allowing curators access to external information did not significantly increase the similarity of their annotations to the gold standard or have a significant effect on inter-curator consistency. We found that the similarity of machine annotations to the gold standard increased after new relevant ontology terms had been added. Evaluation by the original authors of the character descriptions indicated that the gold standard annotations came closer to representing their intended meaning than did either the curator or machine annotations.
These findings point toward ways to better design software to augment human curators and the use of the gold standard corpus will allow training and assessment of new tools to improve phenotype annotation accuracy at scale.
Affiliation(s)
- Prashanti Manda
  - University of North Carolina at Greensboro, Greensboro, NC, USA
- Hong Cui
  - University of Arizona, Tucson, AZ, USA
- James P Balhoff
  - University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- T Alexander Dececchi
  - University of South Dakota, Vermillion, SD, USA
  - Current affiliation: University of Pittsburgh at Johnstown, Johnstown, PA, USA
- Nizar Ibrahim
  - University of Chicago, Chicago, IL, USA
  - Current affiliation: University of Detroit Mercy, Detroit, MI, USA & University of Portsmouth, Portsmouth, UK
- Todd Vision
  - University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
64
|
Lin Y, Mehta S, Küçük-McGinty H, Turner JP, Vidovic D, Forlin M, Koleti A, Nguyen DT, Jensen LJ, Guha R, Mathias SL, Ursu O, Stathias V, Duan J, Nabizadeh N, Chung C, Mader C, Visser U, Yang JJ, Bologa CG, Oprea TI, Schürer SC. Drug target ontology to classify and integrate drug discovery data. J Biomed Semantics 2017; 8:50. [PMID: 29122012 PMCID: PMC5679337 DOI: 10.1186/s13326-017-0161-x] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 03/23/2017] [Accepted: 10/17/2017] [Indexed: 11/12/2022] Open
Abstract
Background One of the most successful approaches to develop new small molecule therapeutics has been to start from a validated druggable protein target. However, only a small subset of potentially druggable targets has attracted significant research and development resources. The Illuminating the Druggable Genome (IDG) project develops resources to catalyze the development of likely targetable, yet currently understudied prospective drug targets. A central component of the IDG program is a comprehensive knowledge resource of the druggable genome. Results As part of that effort, we have developed a framework to integrate, navigate, and analyze drug discovery data based on formalized and standardized classifications and annotations of druggable protein targets, the Drug Target Ontology (DTO). DTO was constructed by extensive curation and consolidation of various resources. DTO classifies the four major drug target protein families, GPCRs, kinases, ion channels and nuclear receptors, based on phylogenecity, function, target development level, disease association, tissue expression, chemical ligand and substrate characteristics, and target-family specific characteristics. The formal ontology was built using a new software tool to auto-generate most axioms from a database while supporting manual knowledge acquisition. A modular, hierarchical implementation facilitates ontology development and maintenance and makes use of various external ontologies, thus integrating the DTO into the ecosystem of biomedical ontologies. As a formal OWL-DL ontology, DTO contains asserted and inferred axioms. Modeling data from the Library of Integrated Network-based Cellular Signatures (LINCS) program illustrates the potential of DTO for contextual data integration and nuanced definition of important drug target characteristics. DTO has been implemented in the IDG user interface Portal, Pharos and the TIN-X explorer of protein target disease relationships.
Conclusions DTO was built based on the need for a formal semantic model for druggable targets including various related information such as protein, gene, protein domain, protein structure, binding site, small molecule drug, mechanism of action, protein tissue localization, disease association, and many other types of information. DTO will further facilitate the otherwise challenging integration and formal linking to biological assays, phenotypes, disease models, drug poly-pharmacology, binding kinetics and many other processes, functions and qualities that are at the core of drug discovery. The first version of DTO is publicly available via the website http://drugtargetontology.org/, GitHub (http://github.com/DrugTargetOntology/DTO), and the NCBO Bioportal (http://bioportal.bioontology.org/ontologies/DTO). The long-term goal of DTO is to provide such an integrative framework and to populate the ontology with this information as a community resource.
Affiliation(s)
- Yu Lin
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Saurabh Mehta
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Applied Chemistry, Delhi Technological University, Delhi, India
- Hande Küçük-McGinty
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Computer Science, University of Miami, Coral Gables, FL, USA
- John Paul Turner
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Dusica Vidovic
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Michele Forlin
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Amar Koleti
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Dac-Trung Nguyen
  - National Center for Advancing Translational Science, Rockville, MD, USA
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Rajarshi Guha
  - National Center for Advancing Translational Science, Rockville, MD, USA
- Stephen L Mathias
  - Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Oleg Ursu
  - Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Vasileios Stathias
  - Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Jianbin Duan
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Nooshin Nabizadeh
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Caty Chung
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Christopher Mader
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA
- Ubbo Visser
  - Department of Computer Science, University of Miami, Coral Gables, FL, USA
- Jeremy J Yang
  - Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Cristian G Bologa
  - Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Tudor I Oprea
  - Department of Internal Medicine, Translational Informatics Division, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Stephan C Schürer
  - Center for Computational Science, University of Miami, Coral Gables, FL, USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL, USA
|
65
|
Becnel LB, Hastak S, Ver Hoef W, Milius RP, Slack M, Wold D, Glickman ML, Brodsky B, Jaffe C, Kush R, Helton E. BRIDG: a domain information model for translational and clinical protocol-driven research. J Am Med Inform Assoc 2017; 24:882-890. [PMID: 28339791 PMCID: PMC6259662 DOI: 10.1093/jamia/ocx004] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 08/30/2016] [Revised: 12/21/2016] [Accepted: 01/05/2017] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND It is critical to integrate and analyze data from biological, translational, and clinical studies with data from health systems; however, electronic artifacts are stored in thousands of disparate systems that are often unable to readily exchange data. OBJECTIVE To facilitate meaningful data exchange, a model that presents a common understanding of biomedical research concepts and their relationships with health care semantics is required. The Biomedical Research Integrated Domain Group (BRIDG) domain information model fulfills this need. Software systems created from BRIDG have shared meaning "baked in," enabling interoperability among disparate systems. For nearly 10 years, the Clinical Data Interchange Standards Consortium, the National Cancer Institute, the US Food and Drug Administration, and Health Level 7 International have been key stakeholders in developing BRIDG. METHODS BRIDG is an open-source Unified Modeling Language-class model developed through use cases and harmonization with other models. RESULTS With its 4+ releases, BRIDG includes clinical and now translational research concepts in its Common, Protocol Representation, Study Conduct, Adverse Events, Regulatory, Statistical Analysis, Experiment, Biospecimen, and Molecular Biology subdomains. INTERPRETATION The model is a Clinical Data Interchange Standards Consortium, Health Level 7 International, and International Standards Organization standard that has been utilized in national and international standards-based software development projects. It will continue to mature and evolve in the areas of clinical imaging, pathology, ontology, and vocabulary support. BRIDG 4.1.1 and prior releases are freely available at https://bridgmodel.nci.nih.gov.
Affiliation(s)
- Lauren B Becnel
  - Clinical Data Interchange Standards Consortium, Austin, TX, USA
  - Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
- MaryAnn Slack
  - Food and Drug Administration Office of Strategic Programs, Silver Spring, MD, USA
- Diane Wold
  - Clinical Data Interchange Standards Consortium, Austin, TX, USA
- Michael L Glickman
  - Computer Network Architects Inc. and ISO/TC 215 Health Informatics, Rockville, MD, USA
- Boris Brodsky
  - Food and Drug Administration Office of Strategic Programs, Silver Spring, MD, USA
- Charles Jaffe
  - HL7 (Health Level 7 International), Ann Arbor, MI, USA
- Rebecca Kush
  - Clinical Data Interchange Standards Consortium, Austin, TX, USA
|
66
|
Osumi-Sutherland D, Courtot M, Balhoff JP, Mungall C. Dead simple OWL design patterns. J Biomed Semantics 2017; 8:18. [PMID: 28583177 PMCID: PMC5460348 DOI: 10.1186/s13326-017-0126-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 11/23/2016] [Accepted: 03/29/2017] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Bio-ontologies typically require multiple axes of classification to support the needs of their users. Development of such ontologies can only be made scalable and sustainable by the use of inference to automate classification via consistent patterns of axiomatization. Many bio-ontologies originating in OBO or OWL follow this approach. These patterns need to be documented in a form that requires minimal expertise to understand and edit and that can be validated and applied using any of the various programmatic approaches to working with OWL ontologies. RESULTS Here we describe a system, Dead Simple OWL Design Patterns (DOS-DPs), which fulfills these requirements, illustrating the system with examples from the Gene Ontology. CONCLUSIONS The rapid adoption of DOS-DPs by multiple ontology development projects illustrates both the ease of use and the pressing need for the simple design pattern system we have developed.
Affiliation(s)
- David Osumi-Sutherland
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Melanie Courtot
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- Christopher Mungall
  - Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
|
67
|
Abstract
The principles of genetics apply across the entire tree of life. At the cellular level we share biological mechanisms with species from which we diverged millions, even billions of years ago. We can exploit this common ancestry to learn about health and disease, by analyzing DNA and protein sequences, but also through the observable outcomes of genetic differences, i.e. phenotypes. To solve challenging disease problems we need to unify the heterogeneous data that relates genomics to disease traits. Without a big-picture view of phenotypic data, many questions in genetics are difficult or impossible to answer. The Monarch Initiative (https://monarchinitiative.org) provides tools for genotype-phenotype analysis, genomic diagnostics, and precision medicine across broad areas of disease.
|
68
|
Becnel LB, Ochsner SA, Darlington YF, McOwiti A, Kankanamge WH, Dehart M, Naumov A, McKenna NJ. Discovering relationships between nuclear receptor signaling pathways, genes, and tissues in Transcriptomine. Sci Signal 2017; 10:10/476/eaah6275. [DOI: 10.1126/scisignal.aah6275] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 12/24/2022]
|
69
|
Abstract
The overarching goal of the Gene Ontology (GO) Consortium is to provide researchers in biology and biomedicine with all current functional information concerning genes and the cellular context under which these occur. When the GO was started in the 1990s surprisingly little attention had been given to how functional information about genes was to be uniformly captured, structured in a computable form, and made accessible to biologists. Because knowledge of gene, protein, ncRNA, and molecular complex roles is continuously accumulating and changing, the GO needed to be a dynamic resource, accurately tracking ongoing research results over time. Here I describe the progress that has been made over the years towards this goal, and the work that still remains to be done for the Gene Ontology (GO) Consortium to realize its goal of offering the most comprehensive and up-to-date resource for information on gene function.
Affiliation(s)
- Suzanna E Lewis
  - Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
|
70
|
Kamaraj US, Gough J, Polo JM, Petretto E, Rackham OJL. Computational methods for direct cell conversion. Cell Cycle 2016; 15:3343-3354. [PMID: 27736295 PMCID: PMC5224461 DOI: 10.1080/15384101.2016.1238119] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/02/2016] [Accepted: 09/08/2016] [Indexed: 12/19/2022] Open
Abstract
Directed cell conversion (or transdifferentiation) of one somatic cell-type to another can be achieved by ectopic expression of a set of transcription factors. Since the experimental identification of transcription factors for transdifferentiation is extremely time-consuming and expensive, there are still relatively few transdifferentiations achieved in comparison to the number of human cell-types. However, the growing volume of transcriptional data available and the recent introduction of data-driven algorithmic approaches that predict factors for transdifferentiation hold great promise for accelerating this field. Here we review those computational methods whose in-silico predictions have been experimentally validated, highlighting differences and similarities. Our analysis reveals that the factors predicted by each method tend to be different due to varying source cells used, gene expression quantification and algorithmic steps. We show these differences have an impact on the regulatory influences downstream, with some methods favoring transcription factors regulating developmental progression and others favoring factors regulating mature cell processes. These computational approaches offer a starting point to predict and test novel factors for transdifferentiation. We argue that collecting high-quality gene expression data from single-cells or pure cell-populations across a broader set of cell-types would be necessary to improve the quality and consistency of the in-silico predictions.
Affiliation(s)
- Uma S. Kamaraj
  - Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore
- Julian Gough
  - Department of Computer Science, University of Bristol, Bristol, UK
- Jose M. Polo
  - Anatomy and Developmental Biology, Monash University, Clayton, Victoria, Australia
- Enrico Petretto
  - Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore
- Owen J. L. Rackham
  - Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore
|
71
|
Mungall CJ, McMurry JA, Köhler S, Balhoff JP, Borromeo C, Brush M, Carbon S, Conlin T, Dunn N, Engelstad M, Foster E, Gourdine JP, Jacobsen JOB, Keith D, Laraway B, Lewis SE, NguyenXuan J, Shefchek K, Vasilevsky N, Yuan Z, Washington N, Hochheiser H, Groza T, Smedley D, Robinson PN, Haendel MA. The Monarch Initiative: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res 2016; 45:D712-D722. [PMID: 27899636 PMCID: PMC5210586 DOI: 10.1093/nar/gkw1128] [Citation(s) in RCA: 207] [Impact Index Per Article: 23.0] [Received: 08/22/2016] [Revised: 10/26/2016] [Accepted: 11/02/2016] [Indexed: 02/04/2023] Open
Abstract
The correlation of phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede our progress: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype–phenotype associations. Non-human organisms have proven instrumental in revealing biological mechanisms. Advanced informatics tools can identify phenotypically relevant disease models in research and diagnostic contexts. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype–phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species.
Collapse
Affiliation(s)
- Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Julie A McMurry
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
| | | | - Charles Borromeo
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Matthew Brush
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Tom Conlin
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - Nathan Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Mark Engelstad
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - Erin Foster
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - J P Gourdine
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - Julius O B Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Dan Keith
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - Bryan Laraway
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - Suzanna E Lewis
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Jeremy NguyenXuan
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Kent Shefchek
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - Nicole Vasilevsky
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| | - Zhou Yuan
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Nicole Washington
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Tudor Groza
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
| | - Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Peter N Robinson
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
| | - Melissa A Haendel
- Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, OR, 97239, USA
| |
|
72
|
Komljenovic A, Roux J, Wollbrett J, Robinson-Rechavi M, Bastian FB. BgeeDB, an R package for retrieval of curated expression datasets and for gene list expression localization enrichment tests. F1000Res 2016; 5:2748. [PMID: 30467516 PMCID: PMC6113886 DOI: 10.12688/f1000research.9973.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Accepted: 11/11/2016] [Indexed: 09/29/2023] Open
Abstract
BgeeDB is a collection of functions to import into R re-annotated, quality-controlled and reprocessed expression data available in the Bgee database. This includes data from thousands of wild-type healthy samples of multiple animal species, generated with different gene expression technologies (RNA-seq, Affymetrix microarrays, expressed sequence tags, and in situ hybridizations). BgeeDB facilitates downstream analyses, such as gene expression analyses with other Bioconductor packages. Moreover, BgeeDB includes a new gene set enrichment test for preferred localization of expression of genes in anatomical structures ("TopAnat"). Along with the classical Gene Ontology enrichment test, this test provides a complementary way to interpret gene lists. Availability: http://www.bioconductor.org/packages/BgeeDB/.
Affiliation(s)
- Andrea Komljenovic
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Julien Roux
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Biomedicine, University of Basel, Basel, Switzerland
| | - Julien Wollbrett
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Frederic B. Bastian
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
73
|
Komljenovic A, Roux J, Wollbrett J, Robinson-Rechavi M, Bastian FB. BgeeDB, an R package for retrieval of curated expression datasets and for gene list expression localization enrichment tests. F1000Res 2016; 5:2748. [PMID: 30467516 PMCID: PMC6113886 DOI: 10.12688/f1000research.9973.2] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Accepted: 07/30/2018] [Indexed: 12/12/2022] Open
Abstract
BgeeDB is a collection of functions to import into R re-annotated, quality-controlled and re-processed expression data available in the Bgee database. This includes data from thousands of wild-type healthy samples of multiple animal species, generated with different gene expression technologies (RNA-seq, Affymetrix microarrays, expressed sequence tags, and in situ hybridizations). BgeeDB facilitates downstream analyses, such as gene expression analyses with other Bioconductor packages. Moreover, BgeeDB includes a new gene set enrichment test for preferred localization of expression of genes in anatomical structures ("TopAnat"). Along with the classical Gene Ontology enrichment test, this test provides a complementary way to interpret gene lists. Availability: https://www.bioconductor.org/packages/BgeeDB/.
Affiliation(s)
- Andrea Komljenovic
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Julien Roux
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Department of Biomedicine, University of Basel, Basel, Switzerland
| | - Julien Wollbrett
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Frederic B. Bastian
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
74
|
Mattingly CJ, Boyles R, Lawler CP, Haugen AC, Dearry A, Haendel M. Laying a Community-Based Foundation for Data-Driven Semantic Standards in Environmental Health Sciences. Environ Health Perspect 2016; 124:1136-40. [PMID: 26871594 PMCID: PMC4977056 DOI: 10.1289/ehp.1510438] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 07/07/2015] [Revised: 12/17/2015] [Accepted: 02/03/2016] [Indexed: 05/19/2023]
Abstract
BACKGROUND Despite increasing availability of environmental health science (EHS) data, development and implementation of relevant semantic standards, such as ontologies or hierarchical vocabularies, has lagged. Consequently, integration and analysis of information needed to better model environmental influences on human health remains a significant challenge. OBJECTIVES We aimed to identify a committed community and mechanisms needed to develop EHS semantic standards that will advance understanding about the impacts of environmental exposures on human disease. METHODS The National Institute of Environmental Health Sciences sponsored the "Workshop for the Development of a Framework for Environmental Health Science Language" hosted at North Carolina State University on 15-16 September 2014. Through the assembly of data generators, users, publishers, and funders, we aimed to develop a foundation for enabling the development of community-based and data-driven standards that will ultimately improve standardization, sharing, and interoperability of EHS information. DISCUSSION Creating and maintaining an EHS common language is a continuous and iterative process, requiring community building around research interests and needs, enabling integration and reuse of existing data, and providing a low barrier of access for researchers needing to use or extend such a resource. CONCLUSIONS Recommendations included developing a community-supported web-based toolkit that would enable a) collaborative development of EHS research questions and use cases, b) construction of user-friendly tools for searching and extending existing semantic resources, c) education and guidance about standards and their implementation, and d) creation of a plan for governance and sustainability. CITATION Mattingly CJ, Boyles R, Lawler CP, Haugen AC, Dearry A, Haendel M. 2016. Laying a community-based foundation for data-driven semantic standards in environmental health sciences. Environ Health Perspect 124:1136-1140; http://dx.doi.org/10.1289/ehp.1510438.
Affiliation(s)
- Carolyn J. Mattingly
- Department of Biological Sciences, and
- Center for Human Health and the Environment, North Carolina State University, Raleigh, North Carolina, USA
- Address correspondence to C.J. Mattingly, Department of Biological Sciences, North Carolina State University, Campus Box 7633, Raleigh, NC 27695-7617 USA. Telephone: (919) 515-1509.
| | - Rebecca Boyles
- National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA
| | - Cindy P. Lawler
- National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA
| | - Astrid C. Haugen
- National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA
| | - Allen Dearry
- National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA
| | - Melissa Haendel
- Library, and
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| |
|
75
|
Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, He Y, Osumi-Sutherland D, Ruttenberg A, Sarntivijai S, Van Slyke CE, Vasilevsky NA, Haendel MA, Blake JA, Mungall CJ. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability. J Biomed Semantics 2016; 7:44. [PMID: 27377652 PMCID: PMC4932724 DOI: 10.1186/s13326-016-0088-7] [Citation(s) in RCA: 170] [Impact Index Per Article: 18.9] [Received: 04/15/2016] [Accepted: 06/23/2016] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND The Cell Ontology (CL) is an OBO Foundry candidate ontology covering the domain of canonical, natural biological cell types. Since its inception in 2005, the CL has undergone multiple rounds of revision and expansion, most notably in its representation of hematopoietic cells. For in vivo cells, the CL focuses on vertebrates but provides general classes that can be used for other metazoans, which can be subtyped in species-specific ontologies. CONSTRUCTION AND CONTENT Recent work on the CL has focused on extending the representation of various cell types, and developing new modules in the CL itself, and in related ontologies in coordination with the CL. For example, the Kidney and Urinary Pathway Ontology was used as a template to populate the CL with additional cell types. In addition, subtypes of the class 'cell in vitro' have received improved definitions and labels to provide for modularity with the representation of cells in the Cell Line Ontology and Reagent Ontology. Recent changes in the ontology development methodology for CL include a switch from OBO to OWL for the primary encoding of the ontology, and an increasing reliance on logical definitions for improved reasoning. UTILITY AND DISCUSSION The CL is now mandated as a metadata standard for large functional genomics and transcriptomics projects, and is used extensively for annotation, querying, and analyses of cell type specific data in sequencing consortia such as FANTOM5 and ENCODE, as well as for the NIAID ImmPort database and the Cell Image Library. The CL is also a vital component used in the modular construction of other biomedical ontologies-for example, the Gene Ontology and the cross-species anatomy ontology, Uberon, use CL to support the consistent representation of cell types across different levels of anatomical granularity, such as tissues and organs. 
CONCLUSIONS The ongoing improvements to the CL make it a valuable resource to both the OBO Foundry community and the wider scientific community, and we continue to experience increased interest in the CL both among developers and within the user community.
Affiliation(s)
- Alexander D. Diehl
- Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203 USA
| | - Terrence F. Meehan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
| | - Yvonne M. Bradford
- ZFIN, the Zebrafish Model Organism Database, 5291 University of Oregon, Eugene, OR 97403 USA
| | - Matthew H. Brush
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
| | - Wasila M. Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD 57069 USA
- National Evolutionary Synthesis Center, Durham, NC 27705 USA
| | - David S. Dougall
- Southwestern Medical Center, University of Texas, Dallas, TX 75235 USA
| | - Yongqun He
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109 USA
| | - David Osumi-Sutherland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
| | - Alan Ruttenberg
- Oral Diagnostics Sciences, University at Buffalo School of Dental Medicine, Buffalo, NY 14210 USA
| | - Sirarat Sarntivijai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
| | - Ceri E. Van Slyke
- ZFIN, the Zebrafish Model Organism Database, 5291 University of Oregon, Eugene, OR 97403 USA
| | - Nicole A. Vasilevsky
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
| | - Melissa A. Haendel
- Ontology Development Group, Library, Oregon Health and Science University, Portland, Oregon 97239 USA
| |
|
76
|
Dececchi TA, Mabee PM, Blackburn DC. Data Sources for Trait Databases: Comparing the Phenomic Content of Monographs and Evolutionary Matrices. PLoS One 2016; 11:e0155680. [PMID: 27191170 PMCID: PMC4871461 DOI: 10.1371/journal.pone.0155680] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/27/2016] [Accepted: 05/03/2016] [Indexed: 01/17/2023] Open
Abstract
Databases of organismal traits that aggregate information from one or multiple sources can be leveraged for large-scale analyses in biology. Yet the differences among these data streams and how well they capture trait diversity have never been explored. We present the first analysis of the differences between phenotypes captured in free text of descriptive publications ('monographs') and those used in phylogenetic analyses ('matrices'). We focus our analysis on osteological phenotypes of the limbs of four extinct vertebrate taxa critical to our understanding of the fin-to-limb transition. We find that there is low overlap between the anatomical entities used in these two sources of phenotype data, indicating that phenotypes represented in matrices are not simply a subset of those found in monographic descriptions. Perhaps as expected, compared to characters found in matrices, phenotypes in monographs tend to emphasize descriptive and positional morphology, be somewhat more complex, and relate to fewer additional taxa. While based on a small set of focal taxa, these qualitative and quantitative data suggest that either source of phenotypes alone will result in incomplete knowledge of variation for a given taxon. As a broader community develops to use and expand databases characterizing organismal trait diversity, it is important to recognize the limitations of the data sources and develop strategies to more fully characterize variation both within species and across the tree of life.
Affiliation(s)
- T. Alex Dececchi
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Paula M. Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - David C. Blackburn
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
| |
|
77
|
Sarntivijai S, Vasant D, Jupp S, Saunders G, Bento AP, Gonzalez D, Betts J, Hasan S, Koscielny G, Dunham I, Parkinson H, Malone J. Linking rare and common disease: mapping clinical disease-phenotypes to ontologies in therapeutic target validation. J Biomed Semantics 2016; 7:8. [PMID: 27011785 PMCID: PMC4804633 DOI: 10.1186/s13326-016-0051-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/25/2015] [Accepted: 02/02/2016] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND The Centre for Therapeutic Target Validation (CTTV - https://www.targetvalidation.org/) was established to generate therapeutic target evidence from genome-scale experiments and analyses. CTTV aims to support the validity of therapeutic targets by integrating existing and newly-generated data. Data integration has been achieved in some resources by mapping metadata such as disease and phenotypes to the Experimental Factor Ontology (EFO). Additionally, the relationship between ontology descriptions of rare and common diseases and their phenotypes can offer insights into shared biological mechanisms and potential drug targets. Ontologies are not ideal for representing the 'sometimes associated' type relationship required. This work addresses two challenges: annotation of diverse big data, and representation of complex 'sometimes associated' relationships between concepts. METHODS Semantic mapping uses a combination of custom scripting, our annotation tool 'Zooma', and expert curation. Disease-phenotype associations were generated using literature mining on Europe PubMed Central abstracts, which were manually verified by experts for validity. Representation of the disease-phenotype association was achieved by the Ontology of Biomedical AssociatioN (OBAN), a generic association representation model. OBAN represents associations between a subject and object, i.e., a disease and its associated phenotypes, and the source of evidence for that association. The indirect disease-to-disease associations are exposed through shared phenotypes. This was applied to the use case of linking rare to common diseases at the CTTV. RESULTS EFO yields an average of over 80% mapping coverage in all data sources. A precision of 42% is obtained from the manual verification of the text-mined disease-phenotype associations. 
This results in 1452 and 2810 disease-phenotype pairs for IBD and autoimmune disease and contributes towards 11,338 rare diseases associations (merged with existing published work [Am J Hum Genet 97:111-24, 2015]). An OBAN result file is downloadable at http://sourceforge.net/p/efo/code/HEAD/tree/trunk/src/efoassociations/. Twenty common diseases are linked to 85 rare diseases by shared phenotypes. A generalizable OBAN model for association representation is presented in this study. CONCLUSIONS Here we present solutions to large-scale annotation-ontology mapping in the CTTV knowledge base, a process for disease-phenotype mining, and propose a generic association model, 'OBAN', as a means to integrate disease using shared phenotypes. AVAILABILITY EFO is released monthly and available for download at http://www.ebi.ac.uk/efo/.
Affiliation(s)
- Sirarat Sarntivijai
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Drashtti Vasant
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Gary Saunders
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - A Patrícia Bento
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Daniel Gonzalez
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Joanna Betts
- Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; GSK, Medicine Research Centre, Stevenage, SG1 2NY UK
| | - Samiul Hasan
- Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; GSK, Medicine Research Centre, Stevenage, SG1 2NY UK
| | - Gautier Koscielny
- Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; GSK, Medicine Research Centre, Stevenage, SG1 2NY UK
| | - Ian Dunham
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - James Malone
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| |
|
78
|
Hayman GT, Laulederkind SJF, Smith JR, Wang SJ, Petri V, Nigam R, Tutaj M, De Pons J, Dwinell MR, Shimoyama M. The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database (Oxford) 2016; 2016:baw034. [PMID: 27009807 PMCID: PMC4805243 DOI: 10.1093/database/baw034] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 09/24/2015] [Accepted: 02/29/2016] [Indexed: 12/23/2022]
Abstract
The Rat Genome Database (RGD; http://rgd.mcw.edu/) provides critical datasets and software tools to a diverse community of rat and non-rat researchers worldwide. To meet the needs of the many users whose research is disease oriented, RGD has created a series of Disease Portals and has prioritized its curation efforts on the datasets important to understanding the mechanisms of various diseases. Gene-disease relationships for three species, rat, human and mouse, are annotated to capture biomarkers, genetic associations, molecular mechanisms and therapeutic targets. To generate gene-disease annotations more effectively and in greater detail, RGD initially adopted the MEDIC disease vocabulary from the Comparative Toxicogenomics Database and adapted it for use by expanding this framework with the addition of over 1000 terms to create the RGD Disease Ontology (RDO). The RDO provides the foundation for, at present, 10 comprehensive disease area-related dataset and analysis platforms at RGD, the Disease Portals. Two major disease areas are the focus of data acquisition and curation efforts each year, leading to the release of the related Disease Portals. Collaborative efforts to realize a more robust disease ontology are underway. Database URL: http://rgd.mcw.edu.
Affiliation(s)
- G Thomas Hayman
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Stanley J F Laulederkind
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Jennifer R Smith
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Shur-Jen Wang
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Victoria Petri
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Rajni Nigam
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Marek Tutaj
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Jeff De Pons
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Melinda R Dwinell
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Physiology, Medical College of Wisconsin
| | - Mary Shimoyama
- Medical College of Wisconsin, Human and Molecular Genetics Center Department of Surgery, Medical College of Wisconsin, Milwaukee, WI, USA
| |
|
79
|
Druzinsky RE, Balhoff JP, Crompton AW, Done J, German RZ, Haendel MA, Herrel A, Herring SW, Lapp H, Mabee PM, Muller HM, Mungall CJ, Sternberg PW, Van Auken K, Vinyard CJ, Williams SH, Wall CE. Muscle Logic: New Knowledge Resource for Anatomy Enables Comprehensive Searches of the Literature on the Feeding Muscles of Mammals. PLoS One 2016; 11:e0149102. [PMID: 26870952 PMCID: PMC4752357 DOI: 10.1371/journal.pone.0149102] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/14/2015] [Accepted: 01/27/2016] [Indexed: 01/27/2023] Open
Abstract
Background In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. Development and Testing of the Ontology Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable, definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. Results and Significance Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. 
Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.
Affiliation(s)
- Robert E. Druzinsky
- Department of Oral Biology, University of Illinois at Chicago, Chicago, Illinois, United States of America
| | - James P. Balhoff
- RTI International, Research Triangle Park, North Carolina, United States of America
| | - Alfred W. Crompton
- Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
| | - James Done
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
| | - Rebecca Z. German
- Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, Ohio, United States of America
| | - Melissa A. Haendel
- Oregon Health and Science University, Portland, Oregon, United States of America
| | - Anthony Herrel
- Département d’Ecologie et de Gestion de la Biodiversité, Museum National d’Histoire Naturelle, Paris, France
| | - Susan W. Herring
- University of Washington, Department of Orthodontics, Seattle, Washington, United States of America
| | - Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, United States of America
| | - Paula M. Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Hans-Michael Muller
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
| | - Christopher J. Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Paul W. Sternberg
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
- Howard Hughes Medical Institute, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
| | - Kimberly Van Auken
- Division of Biology and Biological Engineering, M/C 156–29, California Institute of Technology, Pasadena, California, United States of America
| | - Christopher J. Vinyard
- Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, Ohio, United States of America
| | - Susan H. Williams
- Department of Biomedical Sciences, Ohio University Heritage College of Osteopathic Medicine, Athens, Ohio, United States of America
| | - Christine E. Wall
- Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America
|
80
|
Oellrich A, Meehan TF, Parkinson H, Sarntivijai S, White JK, Karp NA. Reporting phenotypes in mouse models when considering body size as a potential confounder. J Biomed Semantics 2016; 7:2. [PMID: 26865945 PMCID: PMC4748495 DOI: 10.1186/s13326-016-0050-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/21/2015] [Accepted: 02/02/2016] [Indexed: 01/09/2023] Open
Abstract
Genotype-phenotype studies aim to identify causative relationships between genes and phenotypes. The International Mouse Phenotyping Consortium is a high throughput phenotyping program whose goal is to collect phenotype data for a knockout mouse strain of every protein coding gene. The scale of the project requires an automatic analysis pipeline to detect abnormal phenotypes, and disseminate the resulting gene-phenotype annotation data into public resources. A body weight phenotype is a common result of knockout studies. As body weight correlates with many other biological traits, this challenges the interpretation of related gene-phenotype associations. Co-correlation can lead to gene-phenotype associations that are potentially misleading. Here we use statistical modelling to account for body weight as a potential confounder to assess the impact. We find that there is a considerable impact on previously established gene-phenotype associations due to an increase in sensitivity as well as the confounding effect. We investigated the existing ontologies to represent this phenotypic information and we explored ways to ontologically represent the results of the influence of confounders on gene-phenotype associations. With the scale of data being disseminated within the high throughput programs and the range of downstream studies that utilise these data, it is critical to consider how we improve the quality of the disseminated data and provide a robust ontological representation.
Affiliation(s)
- Anika Oellrich
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire UK
- Social Genetic & Developmental Psychiatry, King’s College London, London, UK
| | - Terrence F. Meehan
- Samples, Phenotypes and Ontologies, European Molecular Biology Laboratory—European Bioinformatics Institute, Hinxton, Cambridge UK
| | - Helen Parkinson
- Samples, Phenotypes and Ontologies, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
| | - Sirarat Sarntivijai
- Samples, Phenotypes and Ontologies, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
- The Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
| | - Jacqueline K. White
- Mouse Genetics Project, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire UK
| | - Natasha A. Karp
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire UK
|
81
|
Thessen AE, Bunker DE, Buttigieg PL, Cooper LD, Dahdul WM, Domisch S, Franz NM, Jaiswal P, Lawrence-Dill CJ, Midford PE, Mungall CJ, Ramírez MJ, Specht CD, Vogt L, Vos RA, Walls RL, White JW, Zhang G, Deans AR, Huala E, Lewis SE, Mabee PM. Emerging semantics to link phenotype and environment. PeerJ 2015; 3:e1470. [PMID: 26713234 PMCID: PMC4690371 DOI: 10.7717/peerj.1470] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/15/2015] [Accepted: 11/12/2015] [Indexed: 11/20/2022] Open
Abstract
Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardize and link data about phenotypes and environments. Specifying and linking data through ontologies will allow researchers to increase the scope and flexibility of large-scale analyses aided by modern computing methods. Investments in this area would advance diverse fields such as ecology, phylogenetics, and conservation biology. While several biological ontologies are well-developed, using them to link phenotypes and environments is rare because of gaps in ontological coverage and limits to interoperability among ontologies and disciplines. In this manuscript, we present (1) use cases from diverse disciplines to illustrate questions that could be answered more efficiently using a robust linkage between phenotypes and environments, (2) two proof-of-concept analyses that show the value of linking phenotypes to environments in fishes and amphibians, and (3) two proposed example data models for linking phenotypes and environments using the Extensible Observation Ontology (OBOE) and the Biological Collections Ontology (BCO); these provide a starting point for the development of a data model linking phenotypes and environments.
Affiliation(s)
- Anne E. Thessen
- Ronin Institute for Independent Scholarship, Montclair, NJ, United States
- The Data Detektiv, Waltham, MA, United States
| | - Daniel E. Bunker
- Department of Biological Sciences, New Jersey Institute of Technology, Newark, NJ, United States
| | - Pier Luigi Buttigieg
- HGF-MPG Group for Deep Sea Ecology and Technology, Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar-und Meeresforschung, Bremerhaven, Germany
| | - Laurel D. Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
| | - Wasila M. Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD, United States
| | - Sami Domisch
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
| | - Nico M. Franz
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
| | - Carolyn J. Lawrence-Dill
- Departments of Genetics, Development and Cell Biology and Agronomy, Iowa State University, Ames, IA, United States
| | - Martín J. Ramírez
- Division of Arachnology, Museo Argentino de Ciencias Naturales–CONICET, Buenos Aires, Argentina
| | - Chelsea D. Specht
- Departments of Plant and Microbial Biology & Integrative Biology, University of California, Berkeley, CA, United States
| | - Lars Vogt
- Institut für Evolutionsbiologie und Ökologie, Universität Bonn, Bonn, Germany
| | - Ramona L. Walls
- iPlant Collaborative, University of Arizona, Tucson, AZ, United States
| | - Jeffrey W. White
- US Arid Land Agricultural Research Center, United States Department of Agriculture—ARS, Maricopa, AZ, United States
| | - Guanyang Zhang
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
| | - Andrew R. Deans
- Department of Entomology, Pennsylvania State University, University Park, PA, United States
| | - Eva Huala
- Phoenix Bioinformatics, Redwood City, CA, United States
| | - Suzanna E. Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
| | - Paula M. Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, United States
|
82
|
The SIB Swiss Institute of Bioinformatics' resources: focus on curated databases. Nucleic Acids Res 2015; 44:D27-37. [PMID: 26615188 PMCID: PMC4702916 DOI: 10.1093/nar/gkv1310] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 11/03/2015] [Accepted: 11/09/2015] [Indexed: 12/15/2022] Open
Abstract
The SIB Swiss Institute of Bioinformatics (www.isb-sib.ch) provides world-class bioinformatics databases, software tools, services and training to the international life science community in academia and industry. These solutions allow life scientists to turn the exponentially growing amount of data into knowledge. Here, we provide an overview of SIB's resources and competence areas, with a strong focus on curated databases and SIB's most popular and widely used resources. In particular, SIB's Bioinformatics resource portal ExPASy features over 150 resources, including UniProtKB/Swiss-Prot, ENZYME, PROSITE, neXtProt, STRING, UniCarbKB, SugarBindDB, SwissRegulon, EPD, arrayMap, Bgee, SWISS-MODEL Repository, OMA, OrthoDB and other databases, which are briefly described in this article.
|
83
|
Dececchi TA, Balhoff JP, Lapp H, Mabee PM. Toward Synthesizing Our Knowledge of Morphology: Using Ontologies and Machine Reasoning to Extract Presence/Absence Evolutionary Phenotypes across Studies. Syst Biol 2015; 64:936-52. [PMID: 26018570 PMCID: PMC4604830 DOI: 10.1093/sysbio/syv031] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 11/24/2014] [Accepted: 05/20/2015] [Indexed: 02/02/2023] Open
Abstract
The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in noncomputer readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character-subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, that is, cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, providing the data are semantically annotated. 
Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships.
Affiliation(s)
| | - James P Balhoff
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; University of North Carolina, Chapel Hill, NC 27599, USA
| | - Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Center for Genomics and Computational Biology, Duke University, Durham, NC 27708, USA
| | - Paula M Mabee
- Department of Biology, University of South Dakota, Vermillion, SD 57069, USA;
|
84
|
Edmunds RC, Su B, Balhoff JP, Eames BF, Dahdul WM, Lapp H, Lundberg JG, Vision TJ, Dunham RA, Mabee PM, Westerfield M. Phenoscape: Identifying Candidate Genes for Evolutionary Phenotypes. Mol Biol Evol 2015; 33:13-24. [PMID: 26500251 PMCID: PMC4693980 DOI: 10.1093/molbev/msv223] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023] Open
Abstract
Phenotypes resulting from mutations in genetic model organisms can help reveal candidate genes for evolutionarily important phenotypic changes in related taxa. Although testing candidate gene hypotheses experimentally in nonmodel organisms is typically difficult, ontology-driven information systems can help generate testable hypotheses about developmental processes in experimentally tractable organisms. Here, we tested candidate gene hypotheses suggested by expert use of the Phenoscape Knowledgebase, specifically looking for genes that are candidates responsible for evolutionarily interesting phenotypes in the ostariophysan fishes that bear resemblance to mutant phenotypes in zebrafish. For this, we searched ZFIN for genetic perturbations that result in either loss of basihyal element or loss of scales phenotypes, because these are the ancestral phenotypes observed in catfishes (Siluriformes). We tested the identified candidate genes by examining their endogenous expression patterns in the channel catfish, Ictalurus punctatus. The experimental results were consistent with the hypotheses that these features evolved through disruption in developmental pathways at, or upstream of, brpf1 and eda/edar for the ancestral losses of basihyal element and scales, respectively. These results demonstrate that ontological annotations of the phenotypic effects of genetic alterations in model organisms, when aggregated within a knowledgebase, can be used effectively to generate testable, and useful, hypotheses about evolutionary changes in morphology.
Affiliation(s)
| | - Baofeng Su
- School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University
| | - B Frank Eames
- Department of Anatomy and Cell Biology, University of Saskatchewan, Saskatoon, SK, Canada
| | - Wasila M Dahdul
- National Evolutionary Synthesis Center, Durham, NC; Department of Biology, University of South Dakota
| | - Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, NC
| | - John G Lundberg
- Department of Ichthyology, The Academy of Natural Sciences, Philadelphia, Philadelphia, PA
| | - Todd J Vision
- National Evolutionary Synthesis Center, Durham, NC; Department of Biology, University of North Carolina, Chapel Hill
| | - Rex A Dunham
- School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University
|
85
|
Mungall CJ, Washington NL, Nguyen-Xuan J, Condit C, Smedley D, Köhler S, Groza T, Shefchek K, Hochheiser H, Robinson PN, Lewis SE, Haendel MA. Use of model organism and disease databases to support matchmaking for human disease gene discovery. Hum Mutat 2015; 36:979-84. [PMID: 26269093 PMCID: PMC5473253 DOI: 10.1002/humu.22857] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 05/04/2015] [Accepted: 07/22/2015] [Indexed: 11/10/2022]
Abstract
The Matchmaker Exchange application programming interface (API) allows searching a patient's genotypic or phenotypic profiles across clinical sites, for the purposes of cohort discovery and variant disease causal validation. This API can be used not only to search for matching patients, but also to match against public disease and model organism data. These public disease data enable matching known diseases and variant-phenotype associations using phenotype semantic similarity algorithms developed by the Monarch Initiative. The model data can provide additional evidence to aid diagnosis, suggest relevant models for disease mechanism and treatment exploration, and identify collaborators across the translational divide. The Monarch Initiative provides an implementation of this API for searching multiple integrated sources of data that contextualize the knowledge about any given patient or patient family into the greater biomedical knowledge landscape. While this corpus of data can aid diagnosis, it is also the beginning of research to improve understanding of rare human diseases.
Affiliation(s)
| | - Nicole L. Washington
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Jeremy Nguyen-Xuan
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Christopher Condit
- San Diego Supercomputing Center, UC San Diego, La Jolla, California, USA
| | - Damian Smedley
- Wellcome Trust Sanger Institute, Mouse Informatics group, Hinxton, UK
| | - Sebastian Köhler
- Charité - Universitätsmedizin Berlin, Institute for Medical and Human Genetics, Berlin, Germany
| | - Tudor Groza
- Garvan Institute, Kinghorn Centre for Clinical Genomics, Sydney, Australia
| | - Kent Shefchek
- Department of Biomedical Informatics and Clinical Epidemiology, Oregon Health and Science University
| | - Harry Hochheiser
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Peter N. Robinson
- Charité - Universitätsmedizin Berlin, Institute for Medical and Human Genetics, Berlin, Germany
| | - Suzanna E. Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Melissa A. Haendel
- Department of Biomedical Informatics and Clinical Epidemiology, Oregon Health and Science University
|
86
|
Manda P, Balhoff JP, Lapp H, Mabee P, Vision TJ. Using the phenoscape knowledgebase to relate genetic perturbations to phenotypic evolution. Genesis 2015. [PMID: 26220875 DOI: 10.1002/dvg.22878] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/03/2023]
Abstract
The abundance of phenotypic diversity among species can enrich our knowledge of development and genetics beyond the limits of variation that can be observed in model organisms. The Phenoscape Knowledgebase (KB) is designed to enable exploration and discovery of phenotypic variation among species. Because phenotypes in the KB are annotated using standard ontologies, evolutionary phenotypes can be compared with phenotypes from genetic perturbations in model organisms. To illustrate the power of this approach, we review the use of the KB to find taxa showing evolutionary variation similar to that of a query gene. Matches are made between the full set of phenotypes described for a gene and an evolutionary profile, the latter of which is defined as the set of phenotypes that are variable among the daughters of any node on the taxonomic tree. Phenoscape's semantic similarity interface allows the user to assess the statistical significance of each match and flags matches that may only result from differences in annotation coverage between genetic and evolutionary studies. Tools such as this will help meet the challenge of relating the growing volume of genetic knowledge in model organisms to the diversity of phenotypes in nature. The Phenoscape KB is available at http://kb.phenoscape.org.
Affiliation(s)
- Prashanti Manda
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina; US National Evolutionary Synthesis Center, Durham, North Carolina
| | - James P Balhoff
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina; US National Evolutionary Synthesis Center, Durham, North Carolina
| | - Hilmar Lapp
- US National Evolutionary Synthesis Center, Durham, North Carolina; Center for Genomic and Computational Biology, Duke University, Durham, North Carolina
| | - Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota
| | - Todd J Vision
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina; US National Evolutionary Synthesis Center, Durham, North Carolina
|
87
|
Groza T, Köhler S, Moldenhauer D, Vasilevsky N, Baynam G, Zemojtel T, Schriml LM, Kibbe WA, Schofield PN, Beck T, Vasant D, Brookes AJ, Zankl A, Washington NL, Mungall CJ, Lewis SE, Haendel MA, Parkinson H, Robinson PN. The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease. Am J Hum Genet 2015; 97:111-24. [PMID: 26119816 PMCID: PMC4572507 DOI: 10.1016/j.ajhg.2015.05.020] [Citation(s) in RCA: 152] [Impact Index Per Article: 15.2] [Received: 03/20/2015] [Accepted: 05/22/2015] [Indexed: 12/24/2022] Open
Abstract
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.
Affiliation(s)
- Tudor Groza
- School of Information Technology and Electrical Engineering, University of Queensland, St. Lucia, QLD 4072, Australia; Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia
| | - Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
| | - Dawid Moldenhauer
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; University of Applied Sciences, Wiesenstrasse 14, 35390 Giessen, Germany
| | - Nicole Vasilevsky
- Library, Oregon Health & Science University, Portland, OR 97239, USA
| | - Gareth Baynam
- School of Paediatrics and Child Health, University of Western Australia, Perth, WA 6840, Australia; Institute for Immunology and Infectious Diseases, Murdoch University, Perth, WA 6150, Australia; Office of Population Health Genomics, Public Health and Clinical Services Division, Department of Health, Perth, WA 6004, Australia; Genetic Services of Western Australia, King Edward Memorial Hospital, Perth, WA 6008, Australia; Telethon Kids Institute, Perth, WA 6008, Australia
| | - Tomasz Zemojtel
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznań, Poland
| | - Lynn Marie Schriml
- Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore, MD 21201, USA; Institute for Genome Sciences, School of Medicine, University of Maryland, Baltimore, MD 21201, USA
| | - Warren Alden Kibbe
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
| | - Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK; The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Tim Beck
- Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
| | - Drashtti Vasant
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
| | - Anthony J Brookes
- Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
| | - Andreas Zankl
- Garvan Institute of Medical Research, Darlinghurst, Sydney, NSW 2010, Australia; Academic Department of Medical Genetics, The Children's Hospital at Westmead, Sydney, NSW 2145, Australia; Discipline of Genetic Medicine, Sydney Medical School, University of Sydney, Sydney, NSW 2145, Australia
| | - Nicole L Washington
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
| | - Christopher J Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
| | - Suzanna E Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
| | - Melissa A Haendel
- Library, Oregon Health & Science University, Portland, OR 97239, USA
| | - Helen Parkinson
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK
| | - Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany; Berlin Brandenburg Center for Regenerative Therapies, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; Institute of Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Takustrasse 9, 14195 Berlin, Germany.
|
88
|
Haendel MA, Vasilevsky N, Brush M, Hochheiser HS, Jacobsen J, Oellrich A, Mungall CJ, Washington N, Köhler S, Lewis SE, Robinson PN, Smedley D. Disease insights through cross-species phenotype comparisons. Mamm Genome 2015; 26:548-55. [PMID: 26092691 PMCID: PMC4602072 DOI: 10.1007/s00335-015-9577-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 03/27/2015] [Accepted: 05/20/2015] [Indexed: 11/30/2022]
Abstract
New sequencing technologies have ushered in a new era for diagnosis and discovery of new causative mutations for rare diseases. However, the sheer numbers of candidate variants that require interpretation in an exome or genomic analysis are still a challenging prospect. A powerful approach is the comparison of the patient’s set of phenotypes (phenotypic profile) to known phenotypic profiles caused by mutations in orthologous genes associated with these variants. The most abundant source of relevant data for this task is available through the efforts of the Mouse Genome Informatics group and the International Mouse Phenotyping Consortium. In this review, we highlight the challenges in comparing human clinical phenotypes with mouse phenotypes and some of the solutions that have been developed by members of the Monarch Initiative. These tools allow the identification of mouse models for known disease-gene associations that may otherwise have been overlooked, as well as the prioritization of candidate genes for novel associations. The culmination of these efforts is the Exomiser software package that allows clinical researchers to analyse patient exomes in the context of variant frequency and predicted pathogenicity as well as the phenotypic similarity of the patient to any given candidate orthologous gene.
Affiliation(s)
- Melissa A Haendel
- University Library and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, OR, USA
| | - Nicole Vasilevsky
- University Library and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, OR, USA
| | - Matthew Brush
- University Library and Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, OR, USA
| | - Harry S Hochheiser
- Department of Biomedical Informatics and Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15206, USA
| | - Julius Jacobsen
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Anika Oellrich
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Christopher J Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Nicole Washington
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Sebastian Köhler
- Computational Biology Group, Institute for Medical Genetics and Human Genetics, Universitätsklinikum Charité, Augustenburger Platz 1, 13353, Berlin, Germany
| | - Suzanna E Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Peter N Robinson
- Computational Biology Group, Institute for Medical Genetics and Human Genetics, Universitätsklinikum Charité, Augustenburger Platz 1, 13353, Berlin, Germany
| | - Damian Smedley
- Skarnes Faculty Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
|
89
|
Finger JH, Smith CM, Hayamizu TF, McCright IJ, Xu J, Eppig JT, Kadin JA, Richardson JE, Ringwald M. The mouse gene expression database: New features and how to use them effectively. Genesis 2015; 53:510-22. [PMID: 26045019 DOI: 10.1002/dvg.22864] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/08/2015] [Revised: 05/29/2015] [Accepted: 06/01/2015] [Indexed: 12/15/2022]
Abstract
The Gene Expression Database (GXD) is an extensive and freely available community resource of mouse developmental expression data. GXD curates and integrates expression data from the literature, via electronic data submissions, and by collaborations with large-scale projects. As an integral component of the Mouse Genome Informatics Resource, GXD combines expression data with genetic, functional, phenotypic, and disease-related data, and provides tools for the research community to search for and analyze expression data in this larger context. Recent enhancements include: an interactive browser to navigate the mouse developmental anatomy and find expression data for specific anatomical structures; the capability to search for expression data of genes located in specific genomic regions, supporting the identification of disease candidate genes; a summary displaying all the expression images that meet specified search criteria; interactive matrix views that provide overviews of spatio-temporal expression patterns (Tissue × Stage Matrix) and enable the comparison of expression patterns between genes (Tissue × Gene Matrix); data zoom and filter utilities to iteratively refine summary displays and data sets; and gene-based links to expression data from other model organisms, such as chicken, Xenopus, and zebrafish, fostering comparative expression analysis for species that are highly relevant for developmental research.
Affiliation(s)
- Jingxia Xu
- The Jackson Laboratory, Bar Harbor, Maine
| |
|
90
|
Dahdul W, Dececchi TA, Ibrahim N, Lapp H, Mabee P. Moving the mountain: analysis of the effort required to transform comparative anatomy into computable anatomy. Database (Oxford) 2015; 2015:bav040. [PMID: 25972520 PMCID: PMC4429748 DOI: 10.1093/database/bav040] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 01/05/2015] [Accepted: 04/05/2015] [Indexed: 11/28/2022]
Abstract
The diverse phenotypes of living organisms have been described for centuries, and though they may be digitized, they are not readily available in a computable form. Using over 100 morphological studies, the Phenoscape project has demonstrated that by annotating characters with community ontology terms, links between novel species anatomy and the genes that may underlie them can be made. But given the enormity of the legacy literature, how can this largely unexploited wealth of descriptive data be rendered amenable to large-scale computation? To identify the bottlenecks, we quantified the time involved in the major aspects of phenotype curation as we annotated characters from the vertebrate phylogenetic systematics literature. This involves attaching fully computable logical expressions consisting of ontology terms to the descriptions in character-by-taxon matrices. The workflow consists of: (i) data preparation, (ii) phenotype annotation, (iii) ontology development and (iv) curation team discussions and software development feedback. Our results showed that the completion of this work required two person-years by a team of two post-docs, a lead data curator, and students. Manual data preparation required close to 13% of the effort. This part in particular could be reduced substantially with better community data practices, such as depositing fully populated matrices in public repositories. Phenotype annotation required ∼40% of the effort. We are working to make this more efficient with Natural Language Processing tools. Ontology development (40%), however, remains a highly manual task requiring domain (anatomical) expertise and use of specialized software. The large overhead required for data preparation and ontology development contributed to a low annotation rate of approximately two characters per hour, compared with 14 characters per hour when activity was restricted to character annotation. 
Unlocking the potential of the vast stores of morphological descriptions requires better tools for efficiently processing natural language, and better community practices towards a born-digital morphology. Database URL: http://kb.phenoscape.org
Affiliation(s)
- Wasila Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD, USA, Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL, USA and National Evolutionary Synthesis Center, Durham, NC, USA
| | - T Alexander Dececchi
- Department of Biology, University of South Dakota, Vermillion, SD, USA, Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL, USA and National Evolutionary Synthesis Center, Durham, NC, USA
| | - Nizar Ibrahim
- Department of Biology, University of South Dakota, Vermillion, SD, USA, Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL, USA and National Evolutionary Synthesis Center, Durham, NC, USA
| | - Hilmar Lapp
- Department of Biology, University of South Dakota, Vermillion, SD, USA, Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL, USA and National Evolutionary Synthesis Center, Durham, NC, USA
| | - Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA, Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL, USA and National Evolutionary Synthesis Center, Durham, NC, USA
| |
|
91
|
Papatheodorou I, Oellrich A, Smedley D. Linking gene expression to phenotypes via pathway information. J Biomed Semantics 2015; 6:17. [PMID: 25901272 PMCID: PMC4404592 DOI: 10.1186/s13326-015-0013-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 10/29/2014] [Accepted: 03/19/2015] [Indexed: 11/10/2022] Open
Abstract
Establishing robust links among gene expression, pathways and phenotypes is critical for understanding diseases and developing treatments. In recent years there have been many efforts to develop the computational means to traverse from genes to gene expression, model pathways and classify phenotypes. Numerous ontologies and other controlled vocabularies have been developed, as well as computational methods to combine and mine these data sets and establish connections. Here we discuss these efforts and identify areas of future work that could lead to a better integration of genes, pathways and phenotypes to provide insights into the mechanisms under which gene mutations affect expression and pathways and how these effects are manifested onto the phenotype.
Affiliation(s)
- Irene Papatheodorou
- Mouse Developmental Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
| | - Anika Oellrich
- Mouse Developmental Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
| | - Damian Smedley
- Mouse Developmental Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK
| |
|
92
|
Roux J, Rosikiewicz M, Robinson-Rechavi M. What to compare and how: Comparative transcriptomics for Evo-Devo. J Exp Zool B Mol Dev Evol 2015; 324:372-82. [PMID: 25864439 PMCID: PMC4949521 DOI: 10.1002/jez.b.22618] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 11/07/2014] [Accepted: 02/19/2015] [Indexed: 12/30/2022]
Abstract
Evolutionary developmental biology has grown historically from the capacity to relate patterns of evolution in anatomy to patterns of evolution of expression of specific genes, whether between very distantly related species, or very closely related species or populations. Scaling up such studies by taking advantage of modern transcriptomics brings promising improvements, allowing us to estimate the overall impact and molecular mechanisms of convergence, constraint or innovation in anatomy and development. But it also presents major challenges, including the computational definitions of anatomical homology and of organ function, the criteria for the comparison of developmental stages, the annotation of transcriptomics data to proper anatomical and developmental terms, and the statistical methods to compare transcriptomic data between species to highlight significant conservation or changes. In this article, we review these challenges, and the ongoing efforts to address them, which are emerging from bioinformatics work on ontologies, evolutionary statistics, and data curation, with a focus on their implementation in the context of the development of our database Bgee (http://bgee.org). J. Exp. Zool. (Mol. Dev. Evol.) 324B: 372–382, 2015. © 2015 The Authors. J. Exp. Zool. (Mol. Dev. Evol.) published by Wiley Periodicals, Inc.
Affiliation(s)
- Julien Roux
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland; Department of Human Genetics, University of Chicago, Chicago, Illinois
| | - Marta Rosikiewicz
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
| |
|
93
|
Deans AR, Lewis SE, Huala E, Anzaldo SS, Ashburner M, Balhoff JP, Blackburn DC, Blake JA, Burleigh JG, Chanet B, Cooper LD, Courtot M, Csösz S, Cui H, Dahdul W, Das S, Dececchi TA, Dettai A, Diogo R, Druzinsky RE, Dumontier M, Franz NM, Friedrich F, Gkoutos GV, Haendel M, Harmon LJ, Hayamizu TF, He Y, Hines HM, Ibrahim N, Jackson LM, Jaiswal P, James-Zorn C, Köhler S, Lecointre G, Lapp H, Lawrence CJ, Le Novère N, Lundberg JG, Macklin J, Mast AR, Midford PE, Mikó I, Mungall CJ, Oellrich A, Osumi-Sutherland D, Parkinson H, Ramírez MJ, Richter S, Robinson PN, Ruttenberg A, Schulz KS, Segerdell E, Seltmann KC, Sharkey MJ, Smith AD, Smith B, Specht CD, Squires RB, Thacker RW, Thessen A, Fernandez-Triana J, Vihinen M, Vize PD, Vogt L, Wall CE, Walls RL, Westerfeld M, Wharton RA, Wirkner CS, Woolley JB, Yoder MJ, Zorn AM, Mabee P. Finding our way through phenotypes. PLoS Biol 2015; 13:e1002033. [PMID: 25562316 PMCID: PMC4285398 DOI: 10.1371/journal.pbio.1002033] [Citation(s) in RCA: 124] [Impact Index Per Article: 12.4] [Indexed: 01/04/2023] Open
Abstract
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
Affiliation(s)
- Andrew R. Deans
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Suzanna E. Lewis
- Genome Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
| | - Eva Huala
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, United States of America
- Phoenix Bioinformatics, Palo Alto, California, United States of America
| | - Salvatore S. Anzaldo
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Michael Ashburner
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - James P. Balhoff
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
| | - David C. Blackburn
- Department of Vertebrate Zoology and Anthropology, California Academy of Sciences, San Francisco, California, United States of America
| | - Judith A. Blake
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
| | - J. Gordon Burleigh
- Department of Biology, University of Florida, Gainesville, Florida, United States of America
| | - Bruno Chanet
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
| | - Laurel D. Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, United States of America
| | - Mélanie Courtot
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Sándor Csösz
- MTA-ELTE-MTM, Ecology Research Group, Pázmány Péter sétány 1C, Budapest, Hungary
| | - Hong Cui
- School of Information Resources and Library Science, University of Arizona, Tucson, Arizona, United States of America
| | - Wasila Dahdul
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Sandip Das
- Department of Botany, University of Delhi, Delhi, India
| | - T. Alexander Dececchi
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Agnes Dettai
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
| | - Rui Diogo
- Department of Anatomy, Howard University College of Medicine, Washington D.C., United States of America
| | - Robert E. Druzinsky
- Department of Oral Biology, College of Dentistry, University of Illinois, Chicago, Illinois, United States of America
| | - Michel Dumontier
- Stanford Center for Biomedical Informatics Research, Stanford, California, United States of America
| | - Nico M. Franz
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Frank Friedrich
- Biocenter Grindel and Zoological Museum, Hamburg University, Hamburg, Germany
| | - George V. Gkoutos
- Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, United Kingdom
| | - Melissa Haendel
- Department of Medical Informatics & Epidemiology, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Luke J. Harmon
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, United States of America
| | - Terry F. Hayamizu
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine, United States of America
| | - Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Heather M. Hines
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Nizar Ibrahim
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
| | - Laura M. Jackson
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, United States of America
| | - Christina James-Zorn
- Cincinnati Children's Hospital, Division of Developmental Biology, Cincinnati, Ohio, United States of America
| | - Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Guillaume Lecointre
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
| | - Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
| | - Carolyn J. Lawrence
- Department of Genetics, Development and Cell Biology and Department of Agronomy, Iowa State University, Ames, Iowa, United States of America
| | - John G. Lundberg
- Department of Ichthyology, The Academy of Natural Sciences, Philadelphia, Pennsylvania, United States of America
| | - James Macklin
- Eastern Cereal and Oilseed Research Centre, Ottawa, Ontario, Canada
| | - Austin R. Mast
- Department of Biological Science, Florida State University, Tallahassee, Florida, United States of America
| | - István Mikó
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Christopher J. Mungall
- Genome Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
| | - Anika Oellrich
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - David Osumi-Sutherland
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Helen Parkinson
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Martín J. Ramírez
- Division of Arachnology, Museo Argentino de Ciencias Naturales - CONICET, Buenos Aires, Argentina
| | - Stefan Richter
- Allgemeine & Spezielle Zoologie, Institut für Biowissenschaften, Universität Rostock, Universitätsplatz 2, Rostock, Germany
| | - Peter N. Robinson
- Institut für Medizinische Genetik und Humangenetik Charité – Universitätsmedizin Berlin, Berlin, Germany
| | - Alan Ruttenberg
- School of Dental Medicine, University at Buffalo, Buffalo, New York, United States of America
| | - Katja S. Schulz
- Smithsonian Institution, National Museum of Natural History, Washington, D.C., United States of America
| | - Erik Segerdell
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Katja C. Seltmann
- Division of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States of America
| | - Michael J. Sharkey
- Department of Entomology, University of Kentucky, Lexington, Kentucky, United States of America
| | - Aaron D. Smith
- Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona, United States of America
| | - Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, New York, United States of America
| | - Chelsea D. Specht
- Department of Plant and Microbial Biology, Integrative Biology, and the University and Jepson Herbaria, University of California, Berkeley, California, United States of America
| | - R. Burke Squires
- Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Robert W. Thacker
- Department of Biology, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Anne Thessen
- The Data Detektiv, 1412 Stearns Hill Road, Waltham, Massachusetts, United States of America
| | - Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
| | - Peter D. Vize
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
| | - Lars Vogt
- Universität Bonn, Institut für Evolutionsbiologie und Ökologie, Bonn, Germany
| | - Christine E. Wall
- Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America
| | - Ramona L. Walls
- iPlant Collaborative University of Arizona, Thomas J. Keating Bioresearch Building, Tucson, Arizona, United States of America
| | - Monte Westerfeld
- Institute of Neuroscience, University of Oregon, Eugene, Oregon, United States of America
| | - Robert A. Wharton
- Department of Entomology, Texas A&M University, College Station, Texas, United States of America
| | - Christian S. Wirkner
- Allgemeine & Spezielle Zoologie, Institut für Biowissenschaften, Universität Rostock, Universitätsplatz 2, Rostock, Germany
| | - James B. Woolley
- Department of Entomology, Texas A&M University, College Station, Texas, United States of America
| | - Matthew J. Yoder
- Illinois Natural History Survey, University of Illinois, Champaign, Illinois, United States of America
| | - Aaron M. Zorn
- Cincinnati Children's Hospital, Division of Developmental Biology, Cincinnati, Ohio, United States of America
| | - Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| |
|
94
|
Dahdul WM, Cui H, Mabee PM, Mungall CJ, Osumi-Sutherland D, Walls RL, Haendel MA. Nose to tail, roots to shoots: spatial descriptors for phenotypic diversity in the Biological Spatial Ontology. J Biomed Semantics 2014; 5:34. [PMID: 25140222 PMCID: PMC4137724 DOI: 10.1186/2041-1480-5-34] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 07/02/2013] [Accepted: 06/16/2014] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND: Spatial terminology is used in anatomy to indicate precise, relative positions of structures in an organism. While these terms are often standardized within specific fields of biology, they can differ dramatically across taxa. Such differences in usage can impair our ability to unambiguously refer to anatomical position when comparing anatomy or phenotypes across species. We developed the Biological Spatial Ontology (BSPO) to standardize the description of spatial and topological relationships across taxa to enable the discovery of comparable phenotypes. RESULTS: BSPO currently contains 146 classes and 58 relations representing anatomical axes, gradients, regions, planes, sides, and surfaces. These concepts can be used at multiple biological scales and in a diversity of taxa, including plants, animals and fungi. The BSPO is used to provide a source of anatomical location descriptors for logically defining anatomical entity classes in anatomy ontologies. Spatial reasoning is further enhanced in anatomy ontologies by integrating spatial relations such as dorsal_to into class descriptions (e.g., 'dorsolateral placode' dorsal_to some 'epibranchial placode'). CONCLUSIONS: The BSPO is currently used by projects that require standardized anatomical descriptors for phenotype annotation and ontology integration across a diversity of taxa. Anatomical location classes are also useful for describing phenotypic differences, such as morphological variation in position of structures resulting from evolution within and across species.
Affiliation(s)
- Wasila M Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD, USA
- National Evolutionary Synthesis Center, Durham, NC, USA
| | - Hong Cui
- School of Information Resource and Library Science, University of Arizona, Tucson, AZ, USA
| | - Paula M Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA
| | - Ramona L Walls
- The iPlant Collaborative, Bio5 Institute, University of Arizona, Tucson, AZ, USA
| | - Melissa A Haendel
- Library and Department of Medical Informatics & Epidemiology, Oregon Health & Science University, Portland, OR, USA
| |
|
95
|
Van Slyke CE, Bradford YM, Westerfield M, Haendel MA. The zebrafish anatomy and stage ontologies: representing the anatomy and development of Danio rerio. J Biomed Semantics 2014; 5:12. [PMID: 24568621 PMCID: PMC3944782 DOI: 10.1186/2041-1480-5-12] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 06/19/2013] [Accepted: 02/07/2014] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND: The Zebrafish Anatomy Ontology (ZFA) is an OBO Foundry ontology that is used in conjunction with the Zebrafish Stage Ontology (ZFS) to describe the gross and cellular anatomy and development of the zebrafish, Danio rerio, from single cell zygote to adult. The zebrafish model organism database (ZFIN) uses the ZFA and ZFS to annotate phenotype and gene expression data from the primary literature and from contributed data sets. RESULTS: The ZFA models anatomy and development with a subclass hierarchy, a partonomy, and a developmental hierarchy and with relationships to the ZFS that define the stages during which each anatomical entity exists. The ZFA and ZFS are developed utilizing OBO Foundry principles to ensure orthogonality, accessibility, and interoperability. The ZFA has 2860 classes representing a diversity of anatomical structures from different anatomical systems and from different stages of development. CONCLUSIONS: The ZFA describes zebrafish anatomy and development semantically for the purposes of annotating gene expression and anatomical phenotypes. The ontology and the data have been used by other resources to perform cross-species queries of gene expression and phenotype data, providing insights into genetic relationships, morphological evolution, and models of human disease.
|