1
Xu F, Juty N, Goble C, Jupp S, Parkinson H, Courtot M. Features of a FAIR vocabulary. J Biomed Semantics 2023; 14:6. [PMID: 37264430 DOI: 10.1186/s13326-023-00286-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 04/27/2023] [Indexed: 06/03/2023]
Abstract
BACKGROUND The Findable, Accessible, Interoperable and Reusable (FAIR) Principles explicitly require the use of FAIR vocabularies, but what precisely constitutes a FAIR vocabulary remains unclear. Being able to define FAIR vocabularies, identify features of FAIR vocabularies, and provide assessment approaches against the features can guide the development of vocabularies. RESULTS We differentiate data, data resources and vocabularies used for FAIR, examine the application of the FAIR Principles to vocabularies, align their requirements with the Open Biomedical Ontologies principles, and propose FAIR Vocabulary Features (FVFs). We also design assessment approaches for FAIR vocabularies by mapping the FVFs to existing FAIR assessment indicators. Finally, we demonstrate how they can be used for evaluating and improving vocabularies using exemplary biomedical vocabularies. CONCLUSIONS Our work proposes features of FAIR vocabularies and corresponding indicators for assessing the FAIR levels of different types of vocabularies, identifies use cases for vocabulary engineers, and guides the evolution of vocabularies.
Affiliation(s)
- Fuqi Xu
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK
- Nick Juty
- The University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Carole Goble
- The University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Simon Jupp
- SciBite BioData Innovation Centre, Wellcome Genome Campus, Hinxton, CB10 1DR, UK
- Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK
- Mélanie Courtot
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK.
- Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, M5G 0A3, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada.
2
Matentzoglu N, Balhoff JP, Bello SM, Bizon C, Brush M, Callahan TJ, Chute CG, Duncan WD, Evelo CT, Gabriel D, Graybeal J, Gray A, Gyori BM, Haendel M, Harmse H, Harris NL, Harrow I, Hegde HB, Hoyt AL, Hoyt CT, Jiao D, Jiménez-Ruiz E, Jupp S, Kim H, Koehler S, Liener T, Long Q, Malone J, McLaughlin JA, McMurry JA, Moxon S, Munoz-Torres MC, Osumi-Sutherland D, Overton JA, Peters B, Putman T, Queralt-Rosinach N, Shefchek K, Solbrig H, Thessen A, Tudorache T, Vasilevsky N, Wagner AH, Mungall CJ. A Simple Standard for Sharing Ontological Mappings (SSSOM). Database (Oxford) 2022; 2022:6591806. [PMID: 35616100 PMCID: PMC9216545 DOI: 10.1093/database/baac035] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/15/2021] [Revised: 03/08/2022] [Accepted: 05/11/2022] [Indexed: 02/03/2023]
Abstract
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec.
Affiliation(s)
- James P Balhoff
- RENCI, University of North Carolina, Chapel Hill, NC 27517, USA
- Chris Bizon
- RENCI, University of North Carolina, Chapel Hill, NC 27517, USA
- Matthew Brush
- University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Chris T Evelo
- Maastricht University, Maastricht 6211 LK, The Netherlands
- Alasdair Gray
- Department of Computer Science, Heriot-Watt University, Edinburgh, Currie EH14 4AS, UK
- Melissa Haendel
- University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Henriette Harmse
- European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Nomi L Harris
- Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Harshad B Hegde
- Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Amelia L Hoyt
- Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Dazhi Jiao
- Johns Hopkins University, Baltimore, MD 21210, USA
- Ernesto Jiménez-Ruiz
- City University of London, London EC1V 0HB, UK; University of Oslo, Oslo 0315, Norway
- Simon Jupp
- SciBite Limited, Bio Data Innovation Centre, Wellcome Genome Campus, Hinxton, Saffron Walden CB10 1DR, UK
- Qinqin Long
- Leiden University Medical Center, Leiden 2333 ZA, The Netherlands
- James Malone
- BenchSci, 25 York St Suite 1100, Toronto, ON M5J 2V5, Canada
- Julie A McMurry
- University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Sierra Moxon
- Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Bjoern Peters
- La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Tim Putman
- University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Kent Shefchek
- University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Anne Thessen
- University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Nicole Vasilevsky
- University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Alex H Wagner
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH 43205, USA; The Ohio State University College of Medicine, Columbus, OH 43210, USA
3
Liyanage I, Burdett T, Droesbeke B, Erdos K, Fernandez R, Gray A, Haseeb M, Jupp S, Penim F, Pommier C, Rocca-Serra P, Courtot M, Coppens F. ELIXIR biovalidator for semantic validation of life science metadata. Bioinformatics 2022; 38:3141-3142. [PMID: 35380605 PMCID: PMC9154242 DOI: 10.1093/bioinformatics/btac195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Revised: 02/25/2022] [Accepted: 04/01/2022] [Indexed: 01/14/2023]
Abstract
SUMMARY To advance biomedical research, increasingly large amounts of complex data need to be discovered and integrated. This requires syntactic and semantic validation to ensure shared understanding of relevant entities. This article describes the ELIXIR biovalidator, which extends the syntactic validation of the widely used AJV library with ontology-based validation of JSON documents. AVAILABILITY AND IMPLEMENTATION Source code: https://github.com/elixir-europe/biovalidator, Release: v1.9.1, License: Apache License 2.0, Deployed at: https://www.ebi.ac.uk/biosamples/schema/validator/validate. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Isuru Liyanage
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Bert Droesbeke
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
- Karoly Erdos
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Rolando Fernandez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Alasdair Gray
- Department of Computer Science, Heriot-Watt University, Edinburgh EH14 4AS, UK
- Muhammad Haseeb
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Simon Jupp
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Flavia Penim
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Cyril Pommier
- INRAE, BioinfOmics, Plant Bioinformatics Facility, Université Paris-Saclay, 78026 Versailles, France; INRAE, URGI, Université Paris-Saclay, 78026 Versailles, France
- Philippe Rocca-Serra
- Department of Engineering Science, University of Oxford e-Research Centre, University of Oxford, Oxford OX1 3QG, UK
- Mélanie Courtot
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK; Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada. To whom correspondence should be addressed.
- Frederik Coppens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
4
Kerimov N, Hayhurst JD, Peikova K, Manning JR, Walter P, Kolberg L, Samoviča M, Sakthivel MP, Kuzmin I, Trevanion SJ, Burdett T, Jupp S, Parkinson H, Papatheodorou I, Yates AD, Zerbino DR, Alasoo K. A compendium of uniformly processed human gene expression and splicing quantitative trait loci. Nat Genet 2021; 53:1290-1299. [PMID: 34493866 PMCID: PMC8423625 DOI: 10.1038/s41588-021-00924-w] [Citation(s) in RCA: 128] [Impact Index Per Article: 42.7] [Received: 01/08/2021] [Accepted: 07/26/2021] [Indexed: 12/15/2022]
Abstract
Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. In the present study, we present the eQTL Catalogue ( https://www.ebi.ac.uk/eqtl ), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.
Affiliation(s)
- Nurlan Kerimov
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- James D Hayhurst
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Kateryna Peikova
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Jonathan R Manning
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Peter Walter
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Liis Kolberg
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Marija Samoviča
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Manoj Pandian Sakthivel
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Stephen J Trevanion
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Tony Burdett
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Simon Jupp
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Helen Parkinson
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Irene Papatheodorou
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Andrew D Yates
- Open Targets, Wellcome Genome Campus, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Daniel R Zerbino
- Open Targets, Wellcome Genome Campus, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
- Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia.
- Open Targets, Wellcome Genome Campus, Cambridge, UK.
5
Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, McMahon A, Abraham G, Chapman M, Parkinson H, Danesh J, MacArthur JAL, Inouye M. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet 2021; 53:420-425. [PMID: 33692568 DOI: 10.1101/2020.05.20.20108217v1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/25/2023]
Affiliation(s)
- Samuel A Lambert
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
- Laurent Gil
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
- Simon Jupp
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Scott C Ritchie
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- National Institute for Health Research Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- British Heart Foundation Cambridge Centre of Research Excellence, Department of Clinical Medicine, University of Cambridge, Cambridge, UK
- Yu Xu
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Annalisa Buniello
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Aoife McMahon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Gad Abraham
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Michael Chapman
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
- Helen Parkinson
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- John Danesh
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
- National Institute for Health Research Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- British Heart Foundation Cambridge Centre of Research Excellence, Department of Clinical Medicine, University of Cambridge, Cambridge, UK
- NIHR Blood and Transplant Research Unit in Donor Health and Genomics, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.
- National Institute for Health Research Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK.
- British Heart Foundation Cambridge Centre of Research Excellence, Department of Clinical Medicine, University of Cambridge, Cambridge, UK.
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
- The Alan Turing Institute, London, UK.
6
Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, McMahon A, Abraham G, Chapman M, Parkinson H, Danesh J, MacArthur JAL, Inouye M. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat Genet 2021; 53:420-425. [PMID: 33692568 DOI: 10.1038/s41588-021-00783-5] [Citation(s) in RCA: 218] [Impact Index Per Article: 72.7] [Indexed: 12/28/2022]
Affiliation(s)
- Samuel A Lambert
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK.
- Laurent Gil
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
- Simon Jupp
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Scott C Ritchie
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- National Institute for Health Research Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- British Heart Foundation Cambridge Centre of Research Excellence, Department of Clinical Medicine, University of Cambridge, Cambridge, UK
- Yu Xu
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Annalisa Buniello
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Aoife McMahon
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Gad Abraham
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Michael Chapman
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
- Helen Parkinson
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- John Danesh
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
- National Institute for Health Research Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- British Heart Foundation Cambridge Centre of Research Excellence, Department of Clinical Medicine, University of Cambridge, Cambridge, UK
- NIHR Blood and Transplant Research Unit in Donor Health and Genomics, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.
- National Institute for Health Research Cambridge Biomedical Research Centre at the Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK.
- British Heart Foundation Cambridge Centre of Research Excellence, Department of Clinical Medicine, University of Cambridge, Cambridge, UK.
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
- The Alan Turing Institute, London, UK.
7
Ghoussaini M, Mountjoy E, Carmona M, Peat G, Schmidt E, Hercules A, Fumis L, Miranda A, Carvalho-Silva D, Buniello A, Burdett T, Hayhurst J, Baker J, Ferrer J, Gonzalez-Uriarte A, Jupp S, Karim M, Koscielny G, Machlitt-Northen S, Malangone C, Pendlington ZM, Roncaglia P, Suveges D, Wright D, Vrousgou O, Papa E, Parkinson H, MacArthur JAL, Todd J, Barrett JC, Schwartzentruber J, Hulcoop D, Ochoa D, McDonagh EM, Dunham I. Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Res 2021; 49:D1311-D1320. [PMID: 33045747 PMCID: PMC7778936 DOI: 10.1093/nar/gkaa840] [Citation(s) in RCA: 208] [Impact Index Per Article: 69.3] [Received: 08/14/2020] [Revised: 09/16/2020] [Accepted: 09/17/2020] [Indexed: 01/22/2023]
Abstract
Open Targets Genetics (https://genetics.opentargets.org) is an open-access integrative resource that aggregates human GWAS and functional genomics data including gene expression, protein abundance, chromatin interaction and conformation data from a wide range of cell types and tissues to make robust connections between GWAS-associated loci, variants and likely causal genes. This enables systematic identification and prioritisation of likely causal variants and genes across all published trait-associated loci. In this paper, we describe the public resources we aggregate, the technology and analyses we use, and the functionality that the portal offers. Open Targets Genetics can be searched by variant, gene or study/phenotype. It offers tools that enable users to prioritise causal variants and genes at disease-associated loci and access systematic cross-disease and disease-molecular trait colocalization analysis across 92 cell types and tissues including the eQTL Catalogue. Data visualizations such as Manhattan-like plots, regional plots, credible sets overlap between studies and PheWAS plots enable users to explore GWAS signals in depth. The integrated data is made available through the web portal, for bulk download and via a GraphQL API, and the software is open source. Applications of this integrated data include identification of novel targets for drug discovery and drug repurposing.
Affiliation(s)
- Maya Ghoussaini
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Edward Mountjoy
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Miguel Carmona
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Gareth Peat
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Ellen M Schmidt
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Andrew Hercules
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Luca Fumis
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Alfredo Miranda
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Denise Carvalho-Silva
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Annalisa Buniello
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Tony Burdett
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- James Hayhurst
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Jarrod Baker
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Javier Ferrer
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Asier Gonzalez-Uriarte
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Simon Jupp
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Mohd Anisul Karim
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Gautier Koscielny
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- GlaxoSmithKline plc, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, UK
| | - Sandra Machlitt-Northen
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- GlaxoSmithKline plc, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, UK
| | - Cinzia Malangone
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Zoe May Pendlington
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Paola Roncaglia
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Daniel Suveges
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Daniel Wright
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Olga Vrousgou
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Eliseo Papa
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Systems Biology, Biogen, Cambridge, MA 02142, USA
| | - Helen Parkinson
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Jacqueline A L MacArthur
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - John A Todd
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, NIHR Oxford Biomedical Research Centre, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
| | - Jeffrey C Barrett
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Jeremy Schwartzentruber
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - David G Hulcoop
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- GlaxoSmithKline plc, GSK Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, UK
| | - David Ochoa
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Ellen M McDonagh
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Ian Dunham
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
|
8
|
Shefchek KA, Harris NL, Gargano M, Matentzoglu N, Unni D, Brush M, Keith D, Conlin T, Vasilevsky N, Zhang XA, Balhoff JP, Babb L, Bello SM, Blau H, Bradford Y, Carbon S, Carmody L, Chan LE, Cipriani V, Cuzick A, Della Rocca M, Dunn N, Essaid S, Fey P, Grove C, Gourdine JP, Hamosh A, Harris M, Helbig I, Hoatlin M, Joachimiak M, Jupp S, Lett KB, Lewis SE, McNamara C, Pendlington ZM, Pilgrim C, Putman T, Ravanmehr V, Reese J, Riggs E, Robb S, Roncaglia P, Seager J, Segerdell E, Similuk M, Storm AL, Thaxon C, Thessen A, Jacobsen JOB, McMurry JA, Groza T, Köhler S, Smedley D, Robinson PN, Mungall CJ, Haendel MA, Munoz-Torres MC, Osumi-Sutherland D. The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species. Nucleic Acids Res 2020; 48:D704-D715. [PMID: 31701156 PMCID: PMC7056945 DOI: 10.1093/nar/gkz997] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 09/20/2019] [Revised: 10/09/2019] [Accepted: 10/14/2019] [Indexed: 12/14/2022] Open
Abstract
In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven’t been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regards to data (new organisms, more sources, better modeling); new API and standards; ontologies (new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.
Affiliation(s)
- Kent A Shefchek
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Nomi L Harris
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Michael Gargano
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Nicolas Matentzoglu
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Deepak Unni
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Matthew Brush
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Daniel Keith
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Tom Conlin
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Nicole Vasilevsky
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | | | - James P Balhoff
- Renaissance Computing Institute at UNC, Chapel Hill, NC 27517, USA
| | - Larry Babb
- Broad Institute, Cambridge, MA 02142, USA
| | | | - Hannah Blau
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Yvonne Bradford
- Institute of Neuroscience, University of Oregon, Eugene, OR 97401, USA
| | - Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Leigh Carmody
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
| | - Valentina Cipriani
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | | | - Maria Della Rocca
- Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Nathan Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Shahim Essaid
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Petra Fey
- dictyBase, Center for Genetic Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Chris Grove
- California Institute of Technology, Pasadena, CA 91125, USA
| | - Jean-Phillipe Gourdine
- Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Ada Hamosh
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA
| | | | - Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Neuropediatrics, Christian-Albrechts-University of Kiel, 24105 Kiel, Germany; Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Maureen Hoatlin
- Department of Biochemistry and Molecular Biology, Oregon Health & Science University, Portland, OR 97239, USA
| | - Marcin Joachimiak
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kenneth B Lett
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Suzanna E Lewis
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | | | - Zoë M Pendlington
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Tim Putman
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Vida Ravanmehr
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Justin Reese
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Erin Riggs
- Autism & Developmental Medicine Institute, Geisinger, Danville, PA 17837, USA
| | - Sofia Robb
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Paola Roncaglia
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Erik Segerdell
- Xenbase, Cincinnati Children's Hospital, Cincinnati, OH 45229, USA
| | - Morgan Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Andrea L Storm
- Office of Rare Diseases Research (ORDR), National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Courtney Thaxon
- University of North Carolina Medical School, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
| | - Anne Thessen
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Julius O B Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Julie A McMurry
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
| | | | - Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
| | - Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
| | - Peter N Robinson
- The Jackson Laboratory For Genomic Medicine, Farmington, CT 06032, USA
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94710, USA
| | - Melissa A Haendel
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA; Oregon Clinical and Translational Research Institute, Oregon Health & Science University, Portland, OR 97239, USA
| | - Monica C Munoz-Torres
- Center for Genome Research and Biocomputing, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - David Osumi-Sutherland
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
9
|
Papatheodorou I, Moreno P, Manning J, Fuentes AMP, George N, Fexova S, Fonseca NA, Füllgrabe A, Green M, Huang N, Huerta L, Iqbal H, Jianu M, Mohammed S, Zhao L, Jarnuczak AF, Jupp S, Marioni J, Meyer K, Petryszak R, Prada Medina CA, Talavera-López C, Teichmann S, Vizcaino JA, Brazma A. Expression Atlas update: from tissues to single cells. Nucleic Acids Res 2020; 48:D77-D83. [PMID: 31665515 PMCID: PMC7145605 DOI: 10.1093/nar/gkz947] [Citation(s) in RCA: 191] [Impact Index Per Article: 47.8] [Received: 09/15/2019] [Revised: 10/07/2019] [Accepted: 10/16/2019] [Indexed: 12/16/2022] Open
Abstract
Expression Atlas is EMBL-EBI's resource for gene and protein expression. It sources and compiles data on the abundance and localisation of RNA and proteins in various biological systems and contexts and provides open access to this data for the research community. With the increased availability of single cell RNA-Seq datasets in the public archives, we have now extended Expression Atlas with a new added-value service to display gene expression in single cells. Single Cell Expression Atlas was launched in 2018 and currently includes 123 single cell RNA-Seq studies from 12 species. The website can be searched by genes within or across species to reveal experiments, tissues and cell types where this gene is expressed or under which conditions it is a marker gene. Within each study, cells can be visualized using a pre-calculated t-SNE plot and can be coloured by different features or by cell clusters based on gene expression. Within each experiment, there are links to downloadable files, such as RNA quantification matrices, clustering results, reports on protocols and associated metadata, such as assigned cell types.
Affiliation(s)
- Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Jonathan Manning
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | | | - Nancy George
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Silvie Fexova
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Nuno A Fonseca
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Anja Füllgrabe
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Matthew Green
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Ni Huang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Laura Huerta
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Haider Iqbal
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Monica Jianu
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Suhaib Mohammed
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Lingyun Zhao
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Andrew F Jarnuczak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Simon Jupp
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - John Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK; Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Kerstin Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Robert Petryszak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | | | | | - Sarah Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Juan Antonio Vizcaino
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
|
10
|
Maiella S, Olry A, Hanauer M, Lanneau V, Lourghi H, Donadille B, Rodwell C, Köhler S, Seelow D, Jupp S, Parkinson H, Groza T, Brudno M, Robinson PN, Rath A. Harmonising phenomics information for a better interoperability in the rare disease field. Eur J Med Genet 2018; 61:706-714. [PMID: 29425702 DOI: 10.1016/j.ejmg.2018.01.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/01/2017] [Revised: 11/30/2017] [Accepted: 01/27/2018] [Indexed: 01/30/2023]
Abstract
HIPBI-RD (Harmonising phenomics information for a better interoperability in the rare disease field) is a three-year project which started in 2016 funded via the E-Rare 3 ERA-NET program. This project builds on three resources largely adopted by the rare disease (RD) community: Orphanet, its ontology ORDO (the Orphanet Rare Disease Ontology), HPO (the Human Phenotype Ontology) as well as PhenoTips software for the capture and sharing of structured phenotypic data for RD patients. Our project is further supported by resources developed by the European Bioinformatics Institute and the Garvan Institute. HIPBI-RD aims to provide the community with an integrated, RD-specific bioinformatics ecosystem that will harmonise the way phenomics information is stored in databases and patient files worldwide, and thereby contribute to interoperability. This ecosystem will consist of a suite of tools and ontologies, optimized to work together, and made available through commonly used software repositories. The project workplan follows three main objectives: The HIPBI-RD ecosystem will contribute to the interpretation of variants identified through exome and full genome sequencing by harmonising the way phenotypic information is collected, thus improving diagnostics and delineation of RD. The ultimate goal of HIPBI-RD is to provide a resource that will contribute to bridging genome-scale biology and a disease-centered view on human pathobiology. Achievements in Year 1.
Affiliation(s)
- Sylvie Maiella
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Annie Olry
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Marc Hanauer
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Valérie Lanneau
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Halima Lourghi
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Bruno Donadille
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Charlotte Rodwell
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
| | - Sebastian Köhler
- NeuroCure Cluster of Excellence, Charité Universitätsklinikum, Charitéplatz 1, 10117 Berlin, Germany
| | - Dominik Seelow
- NeuroCure Cluster of Excellence, Charité Universitätsklinikum, Charitéplatz 1, 10117 Berlin, Germany
| | - Simon Jupp
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Helen Parkinson
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Tudor Groza
- Kinghorn Centre for Clinical Genomics, Garvan Institute for Medical Research, Darlinghurst, NSW, Australia
| | - Michael Brudno
- Department of Computer Science, University of Toronto, Toronto M5S 1A1, Canada
| | - Peter N Robinson
- Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Ana Rath
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
|
11
|
Abstract
BACKGROUND The Experimental Factor Ontology (EFO) is an application ontology driven by experimental variables, including cell lines, that organizes and describes the diverse experimental variables and data residing in EMBL-EBI resources. The Cell Line Ontology (CLO) is an OBO community-based ontology that contains information on immortalized cell lines and relevant experimental components. EFO integrates and extends ontologies from the bio-ontology community to drive a number of practical applications. It is desirable that the community shares design patterns and therefore that EFO reuses the cell line representation from CLO. There are, however, challenges to be addressed when developing a common ontology design pattern for representing cell lines in both EFO and CLO. RESULTS In this study, we developed a strategy to compare and map cell line terms between EFO and CLO. We examined Cellosaurus resources for EFO-CLO cross-references. Text labels of cell lines from both ontologies were verified by biological information axiomatized in each source. The study resulted in the identification of 873 EFO-CLO aligned and 344 EFO-unique immortalized permanent cell lines. All of these cell lines were updated to CLO and the cell line related information was merged. A design pattern that integrates EFO and CLO was also developed. CONCLUSION Our study compared, aligned, and synchronized the cell line information between CLO and EFO. The final updated CLO will be examined as the candidate ontology to import and replace eligible EFO cell line classes, thereby supporting interoperability in the bio-ontology domain. Our mapping pipeline illustrates the use of ontology in aiding biological data standardization and integration through the biological and semantic content of cell lines.
Affiliation(s)
- Edison Ong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Samples, Phenotypes, and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
| | - Sirarat Sarntivijai
- Samples, Phenotypes, and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
| | - Simon Jupp
- Samples, Phenotypes, and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
| | - Helen Parkinson
- Samples, Phenotypes, and Ontologies Team, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Hinxton, Cambridge, UK
| | - Yongqun He
- Center of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Unit of Laboratory Animal Medicine, University of Michigan, Ann Arbor, MI USA
|
12
|
Perez‐Riverol Y, Ternent T, Koch M, Barsnes H, Vrousgou O, Jupp S, Vizcaíno JA. OLS Client and OLS Dialog: Open Source Tools to Annotate Public Omics Datasets. Proteomics 2017; 17:1700244. [PMID: 28792687 PMCID: PMC5707441 DOI: 10.1002/pmic.201700244] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 06/25/2017] [Revised: 07/12/2017] [Indexed: 01/12/2023]
Abstract
The availability of user-friendly software to annotate biological datasets and experimental details is becoming essential in data management practices, both in local storage systems and in public databases. The Ontology Lookup Service (OLS, http://www.ebi.ac.uk/ols) is a popular centralized service to query, browse and navigate biomedical ontologies and controlled vocabularies. Recently, the OLS framework has been completely redeveloped (version 3.0), including enhancements in the data model, like the added support for Web Ontology Language based ontologies, among many other improvements. However, the new OLS is not backwards compatible and new software tools are needed to enable access to this widely used framework now that the previous version is no longer available. We here present the OLS Client as a free, open-source Java library to retrieve information from the new version of the OLS. It enables rapid tool creation by providing a robust, pluggable programming interface and common data model to programmatically access the OLS. The library has already been integrated and is routinely used by several bioinformatics resources and related data annotation tools. Secondly, we also introduce an updated version of the OLS Dialog (version 2.0), a Java graphical user interface that can be easily plugged into Java desktop applications to access the OLS. The software and related documentation are freely available at https://github.com/PRIDE-Utilities/ols-client and https://github.com/PRIDE-Toolsuite/ols-dialog.
Affiliation(s)
- Yasset Perez‐Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Maximilian Koch
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Harald Barsnes
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Olga Vrousgou
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Simon Jupp
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
13
|
McMurry JA, Juty N, Blomberg N, Burdett T, Conlin T, Conte N, Courtot M, Deck J, Dumontier M, Fellows DK, Gonzalez-Beltran A, Gormanns P, Grethe J, Hastings J, Hériché JK, Hermjakob H, Ison JC, Jimenez RC, Jupp S, Kunze J, Laibe C, Le Novère N, Malone J, Martin MJ, McEntyre JR, Morris C, Muilu J, Müller W, Rocca-Serra P, Sansone SA, Sariyar M, Snoep JL, Soiland-Reyes S, Stanford NJ, Swainston N, Washington N, Williams AR, Wimalaratne SM, Winfree LM, Wolstencroft K, Goble C, Mungall CJ, Haendel MA, Parkinson H. Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data. PLoS Biol 2017; 15:e2001414. [PMID: 28662064 PMCID: PMC5490878 DOI: 10.1371/journal.pbio.2001414] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Indexed: 11/18/2022] Open
Abstract
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
Affiliation(s)
- Julie A. McMurry
- Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Nick Juty
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Niklas Blomberg
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Tony Burdett
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Tom Conlin
- Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Nathalie Conte
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Mélanie Courtot
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - John Deck
- Berkeley Natural History Museums, University of California at Berkeley, Berkeley, California, United States of America
| | - Michel Dumontier
- Institute of Data Science, Maastricht University, Maastricht, the Netherlands
| | - Donal K. Fellows
- School of Computer Science, The University of Manchester, Manchester, United Kingdom
| | | | - Philipp Gormanns
- Institute of Experimental Genetics, Helmholtz Centre Munich, German Research Center for Environmental Health, Neuherberg, Germany
| | - Jeffrey Grethe
- Center for Research in Biological Systems, University of California San Diego, La Jolla, California, United States of America
| | | | | | - Henning Hermjakob
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Jon C. Ison
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
| | - Rafael C. Jimenez
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Simon Jupp
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - John Kunze
- California Digital Library, Oakland, California, United States of America
| | - Camille Laibe
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | | | - James Malone
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Maria Jesus Martin
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Johanna R. McEntyre
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Chris Morris
- Science and Technology Facilities Council, Daresbury Laboratory, Warrington, United Kingdom
| | - Juha Muilu
- Genomics Coordination Center, Department of Genetics, University Medical Center Groningen and Groningen Bioinformatics Center, University of Groningen, Groningen, the Netherlands
| | - Wolfgang Müller
- Scientific Databases and Visualization at Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
| | | | | | - Murat Sariyar
- Institute for Medical Informatics, Bern University of Applied Sciences, Engineering and Information Technology, Bern, Switzerland
| | - Jacky L. Snoep
- Manchester Institute of Biology, University of Manchester, Manchester, United Kingdom
- Department of Biochemistry, Stellenbosch University, Stellenbosch, South Africa
| | - Stian Soiland-Reyes
- School of Computer Science, The University of Manchester, Manchester, United Kingdom
| | - Natalie J. Stanford
- School of Computer Science, The University of Manchester, Manchester, United Kingdom
| | - Neil Swainston
- Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals, University of Manchester, Manchester, United Kingdom
| | - Nicole Washington
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Alan R. Williams
- School of Computer Science, The University of Manchester, Manchester, United Kingdom
| | - Sarala M. Wimalaratne
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Lilly M. Winfree
- Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Katherine Wolstencroft
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, the Netherlands
| | - Carole Goble
- School of Computer Science, The University of Manchester, Manchester, United Kingdom
| | - Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
| | - Melissa A. Haendel
- Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Helen Parkinson
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
|
14
|
Dumontier M, Gray AJG, Marshall MS, Alexiev V, Ansell P, Bader G, Baran J, Bolleman JT, Callahan A, Cruz-Toledo J, Gaudet P, Gombocz EA, Gonzalez-Beltran AN, Groth P, Haendel M, Ito M, Jupp S, Juty N, Katayama T, Kobayashi N, Krishnaswami K, Laibe C, Le Novère N, Lin S, Malone J, Miller M, Mungall CJ, Rietveld L, Wimalaratne SM, Yamaguchi A. The health care and life sciences community profile for dataset descriptions. PeerJ 2016; 4:e2331. [PMID: 27602295 PMCID: PMC4991880 DOI: 10.7717/peerj.2331] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 04/15/2016] [Accepted: 07/14/2016] [Indexed: 11/20/2022] Open
Abstract
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.
Affiliation(s)
- Michel Dumontier
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
| | - Alasdair J G Gray
- Department of Computer Science, Heriot-Watt University, Edinburgh, United Kingdom
| | - M Scott Marshall
- Department of Radiation Oncology (MAASTRO), GROW- School for Oncology and Developmental Biology, MAASTRO Clinic, Maastricht, Netherlands
| | | | | | - Gary Bader
- The Donnelly Centre, University of Toronto, Toronto, Canada
| | - Joachim Baran
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
| | - Jerven T Bolleman
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland
| | - Alison Callahan
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
| | | | - Pascale Gaudet
- CALIPHO group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland
| | | | | | | | - Melissa Haendel
- Department of Medical Informatics and Epidemiology, Oregon Health Sciences University, Portland, OR, United States of America
| | - Maori Ito
- Office of Medical Informatics and Epidemiology, Pharmaceuticals and Medical Devices Agency, Chiyoda-ku, Japan
| | - Simon Jupp
- EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
| | - Nick Juty
- EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
| | | | - Norio Kobayashi
- Advanced Center for Computing and Communication, RIKEN, Wako-shi, Saitama, Japan
| | | | - Camille Laibe
- EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
| | | | - Simon Lin
- Nationwide Children's Hospital, Columbus, OH, United States of America
| | - James Malone
- EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
| | - Michael Miller
- Institute for Systems Biology, Seattle, WA, United States of America
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
| | - Laurens Rietveld
- Department of Exact Sciences, VU University Amsterdam, Amsterdam, Netherlands
|
15
|
Jupp S, Malone J, Burdett T, Heriche JK, Williams E, Ellenberg J, Parkinson H, Rustici G. The cellular microscopy phenotype ontology. J Biomed Semantics 2016; 7:28. [PMID: 27195102 PMCID: PMC4870745 DOI: 10.1186/s13326-016-0074-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/16/2015] [Accepted: 05/10/2016] [Indexed: 11/17/2022] Open
Abstract
Background Phenotypic data derived from high-content screening is currently annotated using free text, thus preventing the integration of independent datasets, including those generated in different biological domains, such as cell lines, mouse and human tissues. Description We present the Cellular Microscopy Phenotype Ontology (CMPO), a species-neutral ontology for describing phenotypic observations relating to the whole cell, cellular components, cellular processes and cell populations. CMPO is compatible with related ontology efforts, allowing for future cross-species integration of phenotypic data. CMPO was developed following a curator-driven approach where phenotype data were annotated by expert biologists following the Entity-Quality (EQ) pattern. These EQs were subsequently transformed into new CMPO terms following an established post-composition process. Conclusion CMPO is currently being utilized to annotate phenotypes associated with high-content screening datasets stored in several image repositories including the Image Data Repository (IDR), MitoSys project database and the Cellular Phenotype Database to facilitate data browsing and discoverability.
Affiliation(s)
- Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
| | - James Malone
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
| | - Tony Burdett
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
| | - Jean-Karim Heriche
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
| | - Eleanor Williams
- Centre for Gene Regulation and Expression, University of Dundee, Dundee, DD1 5EH UK
| | - Jan Ellenberg
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
| | - Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
| | - Gabriella Rustici
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SD UK
|
16
|
Conway M, Khojoyan A, Fana F, Scuba W, Castine M, Mowery D, Chapman W, Jupp S. Developing a web-based SKOS editor. J Biomed Semantics 2016; 7:5. [PMID: 27047653 PMCID: PMC4819276 DOI: 10.1186/s13326-015-0043-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2015] [Accepted: 12/21/2015] [Indexed: 12/03/2022] Open
Abstract
Background The Simple Knowledge Organization System (SKOS) was introduced to the wider research community by a 2005 World Wide Web Consortium (W3C) working draft, and further developed and refined in a 2009 W3C recommendation. Since then, SKOS has become the de facto standard for representing and sharing thesauri, lexicons, vocabularies, taxonomies, and classification schemes. In this paper, we describe the development of a web-based, free, open-source SKOS editor built for the development, curation, and management of small to medium-sized lexicons for health-related Natural Language Processing (NLP). Results The web-based SKOS editor allows users to create, curate, version, manage, and visualise SKOS resources. We tested the system against five widely-used, publicly-available SKOS vocabularies of various sizes and found that the editor is suitable for the development and management of small to medium-size lexicons. Qualitative testing has focussed on using the editor to develop lexical resources to drive NLP applications in two domains. First, developing a lexicon to support an Electronic Health Record-based NLP system for the automatic identification of pneumonia symptoms. Second, creating a taxonomy of lexical cues associated with Diagnostic and Statistical Manual of Mental Disorders (DSM-5) diagnoses with the goal of facilitating the automatic identification of symptoms associated with depression from short, informal texts. Conclusions The SKOS editor we have developed is, to the best of our knowledge, the first free, open-source, web-based SKOS editor capable of creating, curating, versioning, managing, and visualising SKOS lexicons.
Affiliation(s)
- Mike Conway
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, 84108 UT United States
| | | | - Fariba Fana
- CALIT2, University of California San Diego, 9500 Gilman Drive, La Jolla, 92093 CA United States
| | - William Scuba
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, 84108 UT United States
| | - Melissa Castine
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, 84108 UT United States
| | - Danielle Mowery
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, 84108 UT United States
| | - Wendy Chapman
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, 84108 UT United States
| | - Simon Jupp
- European Bioinformatics Institute, Hinxton, CB10 1SD Cambridgeshire United Kingdom
|
17
|
Jupp S, Burdett T, Welter D, Sarntivijai S, Parkinson H, Malone J. Webulous and the Webulous Google Add-On--a web service and application for ontology building from templates. J Biomed Semantics 2016; 7:17. [PMID: 27042287 PMCID: PMC4818523 DOI: 10.1186/s13326-016-0055-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 10/20/2015] [Accepted: 03/11/2016] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND Authoring bio-ontologies is a task that has traditionally been undertaken by skilled experts trained in understanding complex languages such as the Web Ontology Language (OWL), in tools designed for such experts. As requests for new terms are made, the need for expert ontologists represents a bottleneck in the development process. Furthermore, the ability to rigorously enforce ontology design patterns in large, collaboratively developed ontologies is difficult with existing ontology authoring software. DESCRIPTION We present Webulous, an application suite for supporting ontology creation by design patterns. Webulous provides infrastructure to specify templates for populating ontology design patterns that get transformed into OWL assertions in a target ontology. Webulous provides programmatic access to the template server and a client application has been developed for Google Sheets that allows templates to be loaded, populated and resubmitted to the Webulous server for processing. CONCLUSIONS The development and delivery of ontologies to the community requires software support that goes beyond the ontology editor. Building ontologies by design patterns and providing simple mechanisms for the addition of new content helps reduce the overall cost and effort required to develop an ontology. The Webulous system provides support for this process and is used as part of the development of several ontologies at the European Bioinformatics Institute.
Affiliation(s)
- Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Tony Burdett
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Danielle Welter
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Sirarat Sarntivijai
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - James Malone
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
18
|
Sarntivijai S, Vasant D, Jupp S, Saunders G, Bento AP, Gonzalez D, Betts J, Hasan S, Koscielny G, Dunham I, Parkinson H, Malone J. Linking rare and common disease: mapping clinical disease-phenotypes to ontologies in therapeutic target validation. J Biomed Semantics 2016; 7:8. [PMID: 27011785 PMCID: PMC4804633 DOI: 10.1186/s13326-016-0051-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 10/25/2015] [Accepted: 02/02/2016] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND The Centre for Therapeutic Target Validation (CTTV - https://www.targetvalidation.org/) was established to generate therapeutic target evidence from genome-scale experiments and analyses. CTTV aims to support the validity of therapeutic targets by integrating existing and newly-generated data. Data integration has been achieved in some resources by mapping metadata such as disease and phenotypes to the Experimental Factor Ontology (EFO). Additionally, the relationship between ontology descriptions of rare and common diseases and their phenotypes can offer insights into shared biological mechanisms and potential drug targets. Ontologies are not ideal for representing the 'sometimes associated' type of relationship required. This work addresses two challenges: annotation of diverse big data, and representation of complex, 'sometimes associated' relationships between concepts. METHODS Semantic mapping uses a combination of custom scripting, our annotation tool 'Zooma', and expert curation. Disease-phenotype associations were generated using literature mining on Europe PubMed Central abstracts, which were manually verified by experts for validity. Representation of the disease-phenotype association was achieved by the Ontology of Biomedical AssociatioN (OBAN), a generic association representation model. OBAN represents associations between a subject and an object (i.e., a disease and its associated phenotypes) and the source of evidence for that association. The indirect disease-to-disease associations are exposed through shared phenotypes. This was applied to the use case of linking rare to common diseases at the CTTV. RESULTS EFO yields an average of over 80% mapping coverage in all data sources. A precision of 42% is obtained from the manual verification of the text-mined disease-phenotype associations.
This results in 1452 and 2810 disease-phenotype pairs for IBD and autoimmune disease and contributes towards 11,338 rare diseases associations (merged with existing published work [Am J Hum Genet 97:111-24, 2015]). An OBAN result file is downloadable at http://sourceforge.net/p/efo/code/HEAD/tree/trunk/src/efoassociations/. Twenty common diseases are linked to 85 rare diseases by shared phenotypes. A generalizable OBAN model for association representation is presented in this study. CONCLUSIONS Here we present solutions to large-scale annotation-ontology mapping in the CTTV knowledge base, a process for disease-phenotype mining, and propose a generic association model, 'OBAN', as a means to integrate disease using shared phenotypes. AVAILABILITY EFO is released monthly and available for download at http://www.ebi.ac.uk/efo/.
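The association model described in this abstract can be illustrated with a minimal Python sketch. It is not the OBAN implementation: the `Association` class, the `diseases_linked_by_phenotype` function and the toy records are hypothetical names and invented data, chosen only to show how subject-object associations that carry an evidence source can expose indirect disease-to-disease links through shared phenotypes.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Association:
    """One subject-object association plus the source of evidence for it."""
    subject: str   # e.g. a disease label (hypothetical)
    obj: str       # e.g. a phenotype label (hypothetical)
    evidence: str  # where the association came from


def diseases_linked_by_phenotype(associations):
    """Map each pair of diseases to the set of phenotypes they share."""
    # Group diseases under each phenotype they are associated with.
    by_phenotype = defaultdict(set)
    for a in associations:
        by_phenotype[a.obj].add(a.subject)

    # Every pair of diseases under the same phenotype is indirectly linked.
    shared = defaultdict(set)
    for phenotype, diseases in by_phenotype.items():
        for pair in combinations(sorted(diseases), 2):
            shared[pair].add(phenotype)
    return dict(shared)


# Invented toy data, for illustration only.
toy = [
    Association("disease A", "phenotype X", "text mining"),
    Association("disease B", "phenotype X", "manual curation"),
    Association("disease B", "phenotype Y", "text mining"),
    Association("disease C", "phenotype Y", "text mining"),
]
links = diseases_linked_by_phenotype(toy)
```

In this toy input, diseases A and B are linked through phenotype X and diseases B and C through phenotype Y, while A and C share no phenotype and so get no direct link. Keeping the evidence string on each record is what would allow a link to be traced back to its source.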
Affiliation(s)
- Sirarat Sarntivijai
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Drashtti Vasant
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Simon Jupp
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Gary Saunders
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - A Patrícia Bento
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Daniel Gonzalez
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Joanna Betts
- Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; GSK, Medicine Research Centre, Stevenage, SG1 2NY UK
| | - Samiul Hasan
- Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; GSK, Medicine Research Centre, Stevenage, SG1 2NY UK
| | - Gautier Koscielny
- Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; GSK, Medicine Research Centre, Stevenage, SG1 2NY UK
| | - Ian Dunham
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Helen Parkinson
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - James Malone
- European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK ; Centre for Therapeutic Target Validation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
|
19
|
Mulder N, Nembaware V, Adekile A, Anie KA, Inusa B, Brown B, Campbell A, Chinenere F, Chunda-Liyoka C, Derebail VK, Geard A, Ghedira K, Hamilton CM, Hanchard NA, Haendel M, Huggins W, Ibrahim M, Jupp S, Kamga KK, Knight-Madden J, Lopez-Sall P, Mbiyavanga M, Munube D, Nirenberg D, Nnodu O, Ofori-Acquah SF, Ohene-Frempong K, Opap KB, Panji S, Park M, Pule G, Royal C, Sangeda R, Tayo B, Treadwell M, Tshilolo L, Wonkam A. Proceedings of a Sickle Cell Disease Ontology workshop - Towards the first comprehensive ontology for Sickle Cell Disease. Appl Transl Genom 2016; 9:23-9. [PMID: 27354937 PMCID: PMC4911424 DOI: 10.1016/j.atg.2016.03.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/01/2016] [Revised: 03/11/2016] [Accepted: 03/11/2016] [Indexed: 11/20/2022]
Abstract
Sickle cell disease (SCD) is a debilitating single gene disorder caused by a single point mutation that results in physical deformation (i.e. sickling) of erythrocytes at reduced oxygen tensions. Up to 75% of SCD in newborns world-wide occurs in sub-Saharan Africa, where neonatal and childhood mortality from sickle cell related complications is high. While SCD research across the globe is tackling the disease on multiple fronts, advances have yet to significantly impact on the health and quality of life of SCD patients, due to lack of coordination of these disparate efforts. Ensuring data across studies is directly comparable through standardization is a necessary step towards realizing this goal. Such standardization requires the development and implementation of a disease-specific ontology for SCD that is applicable globally. Ontology development is best achieved by bringing together experts in the domain to contribute their knowledge. The SCD community and H3ABioNet members joined forces at a recent SCD Ontology workshop to develop an ontology covering aspects of SCD under the classes: phenotype, diagnostics, therapeutics, quality of life, disease modifiers and disease stage. The aim of the workshop was for participants to contribute their expertise to development of the structure and contents of the SCD ontology. Here we describe the proceedings of the Sickle Cell Disease Ontology Workshop held in Cape Town, South Africa, in February 2016 and its outcomes. The objective of the workshop was to bring together experts in SCD from around the world to contribute their expertise to the development of various aspects of the SCD ontology.
Affiliation(s)
- Nicola Mulder
- H3ABioNet Consortium, Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, South Africa
| | - Victoria Nembaware
- H3ABioNet Consortium, Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, South Africa
| | - Adekunle Adekile
- Department of Pediatrics, Faculty of Medicine, Kuwait University, Kuwait City, Kuwait
| | - Kofi A Anie
- London North West Healthcare NHS Trust & Imperial College London, London, United Kingdom
| | - Baba Inusa
- Evelina Children's Hospital, Guy's and St Thomas NHS Trust, London, United Kingdom
| | - Biobele Brown
- Department of Paediatrics, College of Medicine, University of Ibadan, Ibadan, Nigeria
| | - Andrew Campbell
- Pediatric Hematology/Oncology and Center for Human Growth and Development, University of Michigan, Ann Arbor, MI, United States
| | | | - Catherine Chunda-Liyoka
- University Teaching Hospital (UTH), Lusaka, Zambia; University of Zambia (UNZA) School of Medicine, Lusaka, Zambia
| | - Vimal K Derebail
- Division of Nephrology and Hypertension, Department of Medicine, UNC Kidney Center, University of North Carolina at Chapel Hill, NC, United States
| | - Amy Geard
- Division of Human Genetics, Department of Clinical Laboratory Sciences, National Health Laboratory Service and University of Cape Town, 7925, South Africa
| | - Kais Ghedira
- Université de Tunis El Manar, Institut Pasteur de Tunis, LR11IPT06 Laboratory of medical parasitology, biotechnologies and biomolecules, Group of Bioinformatics and mathematical modeling, Tunis, Tunisia
| | - Carol M Hamilton
- Research Computing Division, RTI International, Research Triangle Park, NC, United States
| | - Neil A Hanchard
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States
| | - Melissa Haendel
- Oregon Health and Science University, Portland, OR, United States
| | - Wayne Huggins
- Research Computing Division, RTI International, Research Triangle Park, NC, United States
| | | | - Simon Jupp
- European Bioinformatics Institute, London, United Kingdom
| | | | | | - Philomène Lopez-Sall
- Department of Pharmacy, Biochemistry Unit, Cheikh Anta Diop University, Dakar, Senegal
| | - Mamana Mbiyavanga
- H3ABioNet Consortium, Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, South Africa
| | - Deogratias Munube
- Department of Paediatrics and Child Health, College of Health Sciences, Makerere University/Mulago Hospital, Kampala, Uganda
| | - Damian Nirenberg
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, United States
| | - Obiageli Nnodu
- Centre of Excellence for Sickle Cell Disease Research and Training, University of Abuja, Abuja, Nigeria
| | - Solomon Fiifi Ofori-Acquah
- Center for Translational and International Hematology, Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, United States
| | | | - Kenneth Babu Opap
- H3ABioNet Consortium, Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, South Africa
| | - Sumir Panji
- H3ABioNet Consortium, Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, South Africa
| | - Miriam Park
- Instituto da Criança, Hospital das Clínicas, São Paulo Medical School, University of São Paulo, Brazil
| | - Gift Pule
- Division of Human Genetics, Department of Clinical Laboratory Sciences, National Health Laboratory Service and University of Cape Town, 7925, South Africa
| | | | | | - Bamidele Tayo
- Loyola University Chicago, Chicago, IL, United States
| | - Marsha Treadwell
- UCSF Benioff Children's Hospital Oakland, Oakland, CA, United States
| | | | - Ambroise Wonkam
- Division of Human Genetics, Department of Clinical Laboratory Sciences, National Health Laboratory Service and University of Cape Town, 7925, South Africa
|
20
|
Petryszak R, Keays M, Tang YA, Fonseca NA, Barrera E, Burdett T, Füllgrabe A, Fuentes AMP, Jupp S, Koskinen S, Mannion O, Huerta L, Megy K, Snow C, Williams E, Barzine M, Hastings E, Weisser H, Wright J, Jaiswal P, Huber W, Choudhary J, Parkinson HE, Brazma A. Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res 2016; 44:D746-52. [PMID: 26481351 PMCID: PMC4702781 DOI: 10.1093/nar/gkv1045] [Citation(s) in RCA: 396] [Impact Index Per Article: 49.5] [Received: 09/04/2015] [Revised: 09/25/2015] [Accepted: 09/29/2015] [Indexed: 11/12/2022] Open
Abstract
Expression Atlas (http://www.ebi.ac.uk/gxa) provides information about gene and protein expression in animal and plant samples of different cell types, organism parts, developmental stages, diseases and other conditions. It consists of selected microarray and RNA-sequencing studies from ArrayExpress, which have been manually curated, annotated with ontology terms, checked for high quality and processed using standardised analysis methods. Since the last update, Atlas has grown seven-fold (1572 studies as of August 2015), and incorporates baseline expression profiles of tissues from Human Protein Atlas, GTEx and FANTOM5, and of cancer cell lines from ENCODE, CCLE and Genentech projects. Plant studies constitute a quarter of Atlas data. For genes of interest, the user can view baseline expression in tissues, and differential expression for biologically meaningful pairwise comparisons, estimated using consistent methodology across all of Atlas. Our first proteomics study in human tissues is now displayed alongside transcriptomics data in the same tissues. Novel analyses and visualisations include: 'enrichment' in each differential comparison of GO terms, Reactome, Plant Reactome pathways and InterPro domains; hierarchical clustering (by baseline expression) of most variable genes and experimental conditions; and, for a given gene-condition, distribution of baseline expression across biological replicates.
Affiliation(s)
- Robert Petryszak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Maria Keays
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Y Amy Tang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Nuno A Fonseca
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Elisabet Barrera
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Anja Füllgrabe
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | | | - Simon Jupp
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Satu Koskinen
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Oliver Mannion
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Laura Huerta
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Karine Megy
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Catherine Snow
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Eleanor Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Mitra Barzine
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Emma Hastings
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | | | | | | | - Wolfgang Huber
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | | | - Helen E Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
| | - Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, UK
|
21
|
Jupp S, Malone J, Bolleman J, Brandizi M, Davies M, Garcia L, Gaulton A, Gehant S, Laibe C, Redaschi N, Wimalaratne SM, Martin M, Le Novère N, Parkinson H, Birney E, Jenkinson AM. The EBI RDF platform: linked open data for the life sciences. Bioinformatics 2014; 30:1338-9. [PMID: 24413672 PMCID: PMC3998127 DOI: 10.1093/bioinformatics/btt765] [Citation(s) in RCA: 117] [Impact Index Per Article: 11.7] [Indexed: 01/18/2023] Open
Abstract
Motivation: Resource description framework (RDF) is an emerging technology for describing, publishing and linking life science data. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. The EBI RDF platform has been developed to meet an increasing demand to coordinate RDF activities across the institute and provides a new entry point to querying and exploring integrated resources available at the EBI. Availability: http://www.ebi.ac.uk/rdf Contact: jupp@ebi.ac.uk
Affiliation(s)
- Simon Jupp
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1211 Geneve, Switzerland
|
22
|
Petryszak R, Burdett T, Fiorelli B, Fonseca NA, Gonzalez-Porta M, Hastings E, Huber W, Jupp S, Keays M, Kryvych N, McMurry J, Marioni JC, Malone J, Megy K, Rustici G, Tang AY, Taubert J, Williams E, Mannion O, Parkinson HE, Brazma A. Expression Atlas update--a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments. Nucleic Acids Res 2013; 42:D926-32. [PMID: 24304889 PMCID: PMC3964963 DOI: 10.1093/nar/gkt1270] [Citation(s) in RCA: 251] [Impact Index Per Article: 22.8] [Indexed: 01/16/2023] Open
Abstract
Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as genes, and a more powerful search interface designed to maximize the biological value provided to the user.
Affiliation(s)
- Robert Petryszak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Hinxton, CB10 1SD, UK
|
23
|
Stevens R, Jupp S, Klein J, Schanstra J. Using semantic web technologies to manage complexity and change in biomedical data. Annu Int Conf IEEE Eng Med Biol Soc 2012; 2011:3708-11. [PMID: 22255145 DOI: 10.1109/iembs.2011.6090629] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Data in biomedicine are characterised by their complexity, volatility and heterogeneity. It is these characteristics, rather than size of the data, that make managing these data an issue for their analysis. Any significant data analysis task requires gathering data from many places, organising the relationships between the data's entities and overcoming the issues of recognising the nature of each entity such that this organisation can take place. It is the inter-relationship of these data and the semantic confusion inherent in the data that make the data complex. On top of this we have volatility in the domain's data, knowledge and experimental techniques that make the processing of data from the domain a distinct challenge, even before those data are organised. In this article we describe these challenges with respect to a project that is using data mining techniques to analyse data from the kidney and urinary pathway (KUP) domain. We are using Semantic Web technologies to manage the complexity and change in our data and we report on our experiences in this project.
Affiliation(s)
- Robert Stevens
- School of Computer Science, University of Manchester, Oxford Road, Manchester, United Kingdom
|
24
|
Abstract
MOTIVATION Ontologies such as the Gene Ontology (GO) and their use in annotations make cross-species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is a possibility to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product's function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontologies' representations should, through automated reasoning, afford richer query opportunities of the gene product annotations than is currently possible. RESULTS To enable this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed.
CONCLUSION This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of the gene products in the GOAL ontology. OWL in combination with automated reasoning can be effectively used to query across ontologies to ask biologically rich questions. We have demonstrated that automated reasoning can be used to deliver practical on-line querying support for the ontology annotations available for the mouse. AVAILABILITY The GOAL Web page is to be found at http://owl.cs.manchester.ac.uk/goal.
Affiliation(s)
- Simon Jupp
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
| | - Robert Stevens
- School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
| | - Robert Hoehndorf
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
|
25
|
Klein J, Jupp S, Moulos P, Fernandez M, Buffin‐Meyer B, Casemayou A, Chaaya R, Charonis A, Bascands J, Stevens R, Schanstra JP. The KUPKB: a novel Web application to access multiomics data on kidney disease. FASEB J 2012; 26:2145-53. [DOI: 10.1096/fj.11-194381] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/11/2022]
Affiliation(s)
- Julie Klein
- Institut National de la Santé et de la Recherche Médicale, U1048, Institute of Cardiovascular and Metabolic Disease, Toulouse, France
- Université Toulouse III Paul‐Sabatier, Toulouse, France
| | - Simon Jupp
- School of Computer Science, University of Manchester, Manchester, UK
| | - Panagiotis Moulos
- Institut National de la Santé et de la Recherche Médicale, U1048, Institute of Cardiovascular and Metabolic Disease, Toulouse, France
- Université Toulouse III Paul‐Sabatier, Toulouse, France
| | - Myriem Fernandez
- Institut National de la Santé et de la Recherche Médicale, U1048, Institute of Cardiovascular and Metabolic Disease, Toulouse, France
- Université Toulouse III Paul‐Sabatier, Toulouse, France
| | - Bénédicte Buffin‐Meyer
- Institut National de la Santé et de la Recherche Médicale, U1048, Institute of Cardiovascular and Metabolic Disease, Toulouse, France
- Université Toulouse III Paul‐Sabatier, Toulouse, France
| | - Audrey Casemayou
- Institut National de la Santé et de la Recherche Médicale, U1048, Institute of Cardiovascular and Metabolic Disease, Toulouse, France
- Université Toulouse III Paul‐Sabatier, Toulouse, France
| | - Rana Chaaya
- Institut National de la Santé et de la Recherche Médicale, U1048, Institute of Cardiovascular and Metabolic Disease, Toulouse, France
- Université Toulouse III Paul‐Sabatier, Toulouse, France
| | - Aristidis Charonis
- Section of Histology, Center for Basic Research I, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
| | - Jean‐Loup Bascands
- Institut National de la Santé et de la Recherche Médicale, U1048, Institute of Cardiovascular and Metabolic Disease, Toulouse, France
- Université Toulouse III Paul‐Sabatier, Toulouse, France
| | - Robert Stevens
- School of Computer Science, University of Manchester, Manchester, UK
| | - Joost P. Schanstra
- Institut National de la Santé et de la Recherche Médicale, U1048, Institute of Cardiovascular and Metabolic Disease, Toulouse, France
- Université Toulouse III Paul‐Sabatier, Toulouse, France
|
26
|
Abstract
BACKGROUND Ontologies are being developed for the life sciences to standardise the way we describe and interpret the wealth of data currently being generated. As more ontology-based applications begin to emerge, tools are required that enable domain experts to contribute their knowledge to the growing pool of ontologies. There are many barriers that prevent domain experts engaging in the ontology development process, and novel tools are needed to break down these barriers to engage a wider community of scientists. RESULTS We present Populous, a tool for gathering content with which to construct an ontology. Domain experts need to add content that is often repetitive in form, without having to tackle the underlying ontological representation. Populous presents users with a table-based form in which columns are constrained to take values from particular ontologies. Populated tables are mapped to patterns that can then be used to automatically generate the ontology's content. These forms can be exported as spreadsheets, providing an interface that is much more familiar to many biologists. CONCLUSIONS Populous's contribution is in the knowledge gathering stage of ontology development; it separates knowledge gathering from the conceptualisation and axiomatisation, as well as separating the user from the standard ontology authoring environments. Populous is by no means a replacement for standard ontology editing tools, but instead provides a useful platform for engaging a wider community of scientists in the mass production of ontology content.
Affiliation(s)
- Simon Jupp
- Bio-Health Informatics Group, School of Computer Science, Kilburn Building, Oxford Road, Manchester, UK, M13 9PL
| | - Matthew Horridge
- Bio-Health Informatics Group, School of Computer Science, Kilburn Building, Oxford Road, Manchester, UK, M13 9PL
| | - Luigi Iannone
- Bio-Health Informatics Group, School of Computer Science, Kilburn Building, Oxford Road, Manchester, UK, M13 9PL
| | - Julie Klein
- Inserm U1048, Institute of Metabolic and Cardiovascular Diseases - I2MC, 1 avenue Jean Poulhés, B.P. 84225, 31432 Toulouse Cedex 4, France
| | - Stuart Owen
- Bio-Health Informatics Group, School of Computer Science, Kilburn Building, Oxford Road, Manchester, UK, M13 9PL
| | - Joost Schanstra
- Inserm U1048, Institute of Metabolic and Cardiovascular Diseases - I2MC, 1 avenue Jean Poulhés, B.P. 84225, 31432 Toulouse Cedex 4, France
| | - Katy Wolstencroft
- Bio-Health Informatics Group, School of Computer Science, Kilburn Building, Oxford Road, Manchester, UK, M13 9PL
| | - Robert Stevens
- Bio-Health Informatics Group, School of Computer Science, Kilburn Building, Oxford Road, Manchester, UK, M13 9PL
|
27
|
Abstract
BACKGROUND Chronic renal disease is a global health problem. The identification of suitable biomarkers could facilitate early detection and diagnosis and allow better understanding of the underlying pathology. One of the challenges in meeting this goal is the necessary integration of experimental results from multiple biological levels for further analysis by data mining. Data integration in the life sciences is still a struggle, and many groups are looking to the benefits promised by the Semantic Web for data integration. RESULTS We present a Semantic Web approach to developing a knowledge base that integrates data from high-throughput experiments on kidney and urine. A specialised KUP ontology is used to tie the various layers together, whilst background knowledge from external databases is incorporated by conversion into RDF. Using SPARQL as a query mechanism, we are able to query for proteins expressed in urine and place these back into the context of genes expressed in regions of the kidney. CONCLUSIONS The KUPKB gives KUP biologists the means to ask queries across many resources in order to aggregate knowledge that is necessary for answering biological questions. The Semantic Web technologies we use, together with the background knowledge from the domain's ontologies, allow both rapid conversion and integration of this knowledge base. The KUPKB is still relatively small, but questions remain about scalability, maintenance and availability of the knowledge itself. AVAILABILITY The KUPKB may be accessed via http://www.e-lico.eu/kupkb.
Affiliation(s)
- Simon Jupp
- School of Computer Science, University of Manchester, UK
| | - Julie Klein
- Institut National de la Santé et de la Recherche Médicale (INSERM), U858, Toulouse, France
- Université Toulouse III Paul-Sabatier, I2MR, IFR150, Toulouse, France
| | - Joost Schanstra
- Institut National de la Santé et de la Recherche Médicale (INSERM), U858, Toulouse, France
- Université Toulouse III Paul-Sabatier, I2MR, IFR150, Toulouse, France
| | - Robert Stevens
- School of Computer Science, University of Manchester, UK
|
28
|
Abstract
Ontology construction for any domain is a labour-intensive and complex process. Any methodology that can reduce the cost and increase efficiency has the potential to make a major impact in the life sciences. This paper describes an experiment in ontology construction from text for the animal behaviour domain. Our objective was to see how much could be done in a simple and relatively rapid manner using a corpus of journal papers. We used a sequence of pre-existing text processing steps, and here describe the different choices made to clean the input, to derive a set of terms and to structure those terms in a number of hierarchies. We describe some of the challenges, especially that of focusing the ontology appropriately given a starting point of a heterogeneous corpus. Using mainly automated techniques, we were able to construct an 18055-term ontology-like structure with 73% recall of animal behaviour terms, but a precision of only 26%. We were able to clean unwanted terms from the nascent ontology using lexico-syntactic patterns that tested the validity of term inclusion within the ontology. We used the same technique to test for subsumption relationships between the remaining terms to add structure to the initially broad and shallow structure we generated. All outputs are available at . We present a systematic method for the initial steps of ontology or structured vocabulary construction for scientific domains that requires limited human effort and can make a contribution both to ontology learning and maintenance. The method is useful both for the exploration of a scientific domain and as a stepping stone towards formally rigorous ontologies. The filtering of recognised terms from a heterogeneous corpus to focus upon those that are the topic of the ontology is identified to be one of the main challenges for research in ontology learning.
Affiliation(s)
- Christopher Brewster
- Aston Business School, Aston University, Aston Triangle, Birmingham, B4 7ET, UK.
|
29
|
Bertani G, Ljungquist E, Jagusztyn-Krynicka K, Jupp S. Defective particle assembly in wild type P2 bacteriophage and its correction by the lg mutation. J Gen Virol 1978; 38:251-61. [PMID: 627873 DOI: 10.1099/0022-1317-38-2-251] [Citation(s) in RCA: 30] [Impact Index Per Article: 0.7] [Indexed: 12/23/2022] Open
Abstract
The mutation lg of phage P2 has been located on the genetic map of P2 to the right of, and closely linked to, the del2 deletion, probably within tail gene F. The lg mutation causes larger burst sizes, compared with the wild type, especially at high incubation temperatures. The frequency of defective particles is lower in preparations of P2 lg than in those of wild type P2. It seems that the mutation lg improves the efficiency of particle assembly.
|