1
Gill EE, Jia B, Murall CL, Poujol R, Anwar MZ, John NS, Richardsson J, Hobb A, Olabode AS, Lepsa A, Duggan AT, Tyler AD, N'Guessan A, Kachru A, Chan B, Yoshida C, Yung CK, Bujold D, Andric D, Su E, Griffiths EJ, Domselaar GV, Jolly GW, Ward HKE, Feher H, Baker J, Simpson JT, Uddin J, Ragoussis J, Eubank J, Fritz JH, Gálvez JH, Fang K, Cullion K, Rivera L, Xiang L, Croxen MA, Shiell M, Prystajecky N, Quirion PO, Bajari R, Rich S, Mubareka S, Moreira S, Cain S, Sutcliffe SG, Kraemer SA, Joly Y, Alturmessov Y, Consortium C, Consortium C, Academic VDP, Network H, Fiume M, Snutch TP, Bell C, Lopez-Correa C, Hussin JG, Joy JB, Colijn C, Gordon PMK, Hsiao WWL, Poon AFY, Knox NC, Courtot M, Stein L, Otto SP, Bourque G, Shapiro BJ, Brinkman FSL. The Canadian VirusSeq Data Portal & Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology. ArXiv 2024:arXiv:2405.04734v1. [PMID: 38764594 PMCID: PMC11100916] [Indexed: 05/21/2024]
Abstract
The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). The Portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. Here we also highlight Duotang, a web platform that presents genomic epidemiology and modeling analyses on circulating and emerging SARS-CoV-2 variants in Canada. Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the Portal (COVID-MVP, CoVizu), are all open-source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.
2
Wittner R, Holub P, Mascia C, Frexia F, Müller H, Plass M, Allocca C, Betsou F, Burdett T, Cancio I, Chapman A, Chapman M, Courtot M, Curcin V, Eder J, Elliot M, Exter K, Goble C, Golebiewski M, Kisler B, Kremer A, Leo S, Lin‐Gibson S, Marsano A, Mattavelli M, Moore J, Nakae H, Perseil I, Salman A, Sluka J, Soiland‐Reyes S, Strambio‐De‐Castillia C, Sussman M, Swedlow JR, Zatloukal K, Geiger J. Toward a common standard for data and specimen provenance in life sciences. Learn Health Syst 2024; 8:e10365. [PMID: 38249839 PMCID: PMC10797572 DOI: 10.1002/lrh2.10365] [Received: 12/29/2022] [Revised: 03/17/2023] [Accepted: 03/24/2023] [Indexed: 01/23/2024]
Abstract
Open and practical exchange, dissemination, and reuse of specimens and data have become a fundamental requirement for life sciences research. The quality of the data obtained, and thus of the findings and knowledge derived, is significantly influenced by the quality of the samples, the experimental methods, and the data analysis. Therefore, comprehensive and precise documentation of the pre-analytical conditions, the analytical procedures, and the data processing is essential to be able to assess the validity of the research results. With the increasing importance of the exchange, reuse, and sharing of data and samples, procedures are required that enable cross-organizational documentation, traceability, and non-repudiation. At present, this information on the provenance of samples and data is mostly sparse, incomplete, or incoherent. Since there is no uniform framework, this information is usually only provided within the organization and not interoperably. At the same time, the collection and sharing of biological and environmental specimens increasingly require definition and documentation of benefit sharing and compliance with regulatory requirements rather than consideration of pure scientific needs. In this publication, we present an ongoing standardization effort to provide trustworthy machine-actionable documentation of the data lineage and specimens. We would like to invite experts from the biotechnology and biomedical fields to further contribute to the standard.
Affiliation(s)
- Rudolf Wittner
  - BBMRI‐ERIC, Graz, Austria
  - Institute of Computer Science & Faculty of Informatics, Masaryk University, Brno, Czechia
- Petr Holub
  - BBMRI‐ERIC, Graz, Austria
  - Institute of Computer Science & Faculty of Informatics, Masaryk University, Brno, Czechia
- Cecilia Mascia
  - CRS4—Center for Advanced Studies, Research and Development in Sardinia, Pula, Italy
- Francesca Frexia
  - CRS4—Center for Advanced Studies, Research and Development in Sardinia, Pula, Italy
- Clare Allocca
  - National Institute of Standards and Technology, Gaithersburg, Maryland, USA
- Fay Betsou
  - Biological Resource Center of Institut Pasteur (CRBIP), Paris, France
- Tony Burdett
  - EMBL's European Bioinformatics Institute (EMBL‐EBI), Cambridge, UK
- Ibon Cancio
  - Plentzia Marine Station (PiE‐UPV/EHU), University of the Basque Country, EMBRC‐Spain, Bilbao, Spain
- Mark Elliot
  - Department of Social Statistics, School of Social Sciences, University of Manchester, Manchester, UK
- Katrina Exter
  - Flanders Marine Institute (VLIZ), EMBRC‐Belgium, Ostend, Belgium
- Carole Goble
  - Department of Computer Science, University of Manchester, Manchester, UK
- Martin Golebiewski
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), Heidelberg, Germany
- Simone Leo
  - CRS4—Center for Advanced Studies, Research and Development in Sardinia, Pula, Italy
- Anna Marsano
  - Department of Biomedicine, University of Basel, Basel, Switzerland
- Marco Mattavelli
  - SCI‐STI‐MM, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Josh Moore
  - Centre for Gene Regulation and Expression and Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
  - German BioImaging–Gesellschaft für Mikroskopie und Bildanalyse e.V., Konstanz, Germany
- Hiroki Nakae
  - Japan bio‐Measurement and Analysis Consortium, Tokyo, Japan
- Isabelle Perseil
  - INSERM–Institut National de la Santé et de la Recherche Médicale, Paris, France
- Ayat Salman
  - Standards Council of Canada, Ottawa, Ontario, Canada
  - Canadian Primary Care Sentinel Surveillance Network (CPCSSN), Department of Family Medicine, Queen's University, Kingston, Ontario, Canada
- James Sluka
  - Biocomplexity Institute, Indiana University, Bloomington, Indiana, USA
- Stian Soiland‐Reyes
  - Department of Computer Science, University of Manchester, Manchester, UK
  - Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
- Michael Sussman
  - US Department of Agriculture, Washington, District of Columbia, USA
- Jason R. Swedlow
  - Centre for Gene Regulation and Expression and Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
- Jörg Geiger
  - Interdisciplinary Bank of Biomaterials and Data Würzburg (ibdw), Würzburg, Germany
3
Xu F, Juty N, Goble C, Jupp S, Parkinson H, Courtot M. Features of a FAIR vocabulary. J Biomed Semantics 2023; 14:6. [PMID: 37264430 DOI: 10.1186/s13326-023-00286-8] [Received: 03/18/2022] [Accepted: 04/27/2023] [Indexed: 06/03/2023]
Abstract
BACKGROUND The Findable, Accessible, Interoperable and Reusable (FAIR) Principles explicitly require the use of FAIR vocabularies, but what precisely constitutes a FAIR vocabulary remains unclear. Being able to define FAIR vocabularies, identify features of FAIR vocabularies, and provide assessment approaches against the features can guide the development of vocabularies. RESULTS We differentiate data, data resources and vocabularies used for FAIR, examine the application of the FAIR Principles to vocabularies, align their requirements with the Open Biomedical Ontologies principles, and propose FAIR Vocabulary Features (FVFs). We also design assessment approaches for FAIR vocabularies by mapping the FVFs to existing FAIR assessment indicators. Finally, we demonstrate how they can be used for evaluating and improving vocabularies using exemplary biomedical vocabularies. CONCLUSIONS Our work proposes features of FAIR vocabularies and corresponding indicators for assessing the FAIR levels of different types of vocabularies, identifies use cases for vocabulary engineers, and guides the evolution of vocabularies.
Affiliation(s)
- Fuqi Xu
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK
- Nick Juty
  - The University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Carole Goble
  - The University of Manchester, Oxford Rd, Manchester, M13 9PL, UK
- Simon Jupp
  - SciBite BioData Innovation Centre, Wellcome Genome Campus, Hinxton, CB10 1DR, UK
- Helen Parkinson
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK
- Mélanie Courtot
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, Hinxton, CB10 1SD, UK
  - Ontario Institute for Cancer Research, 661 University Ave Suite 510, Toronto, M5G 0A3, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
4
Beier S, Fiebig A, Pommier C, Liyanage I, Lange M, Kersey PJ, Weise S, Finkers R, Koylass B, Cezard T, Courtot M, Contreras-Moreira B, Naamati G, Dyer S, Scholz U. Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR. F1000Res 2022; 11. [PMID: 35811804 PMCID: PMC9218589 DOI: 10.12688/f1000research.109080.2] [Accepted: 05/17/2022] [Indexed: 11/20/2022]
Abstract
In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. They form a basis for the proposed VCF extensions here. We have learned from the existing application of VCF that the definition of relevant metadata using controlled standards, vocabulary and the consistent use of cross-references via resolvable identifiers (machine-readable) are particularly necessary and propose their encoding. VCF is an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant data (for example, the HapMap and the gVCF formats), but none currently have the reach of VCF. For the sake of simplicity, we will only discuss VCF and our recommendations for its use, but these recommendations could also be applied to gVCF. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields are frequently added that have not been agreed upon within the community which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.
Affiliation(s)
- Sebastian Beier
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
  - Institute of Bio- and Geosciences, Bioinformatics (IBG-4), Forschungszentrum Jülich GmbH, Jülich, 52425, Germany
- Anne Fiebig
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Cyril Pommier
  - BioinfOmics, Plant bioinformatics facility, Université Paris-Saclay, INRAE, Versailles, France
- Isuru Liyanage
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Matthias Lange
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Stephan Weise
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Richard Finkers
  - Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
  - Gennovation B.V., Wageningen, The Netherlands
- Baron Koylass
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Timothee Cezard
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Mélanie Courtot
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
  - Ontario Institute for Cancer Research, Toronto, Canada
- Bruno Contreras-Moreira
  - Laboratorio de Biología Computacional y Estructural, Estación Experimental Aula Dei-CSIC, Zaragoza, 50059, Spain
- Guy Naamati
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Sarah Dyer
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Uwe Scholz
  - Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
5
Liyanage I, Burdett T, Droesbeke B, Erdos K, Fernandez R, Gray A, Haseeb M, Jupp S, Penim F, Pommier C, Rocca-Serra P, Courtot M, Coppens F. ELIXIR biovalidator for semantic validation of life science metadata. Bioinformatics 2022; 38:3141-3142. [PMID: 35380605 PMCID: PMC9154242 DOI: 10.1093/bioinformatics/btac195] [Received: 10/28/2021] [Revised: 02/25/2022] [Accepted: 04/01/2022] [Indexed: 01/14/2023]
Abstract
SUMMARY To advance biomedical research, increasingly large amounts of complex data need to be discovered and integrated. This requires syntactic and semantic validation to ensure shared understanding of relevant entities. This article describes the ELIXIR biovalidator, which extends the syntactic validation of the widely used AJV library with ontology-based validation of JSON documents. AVAILABILITY AND IMPLEMENTATION Source code: https://github.com/elixir-europe/biovalidator, Release: v1.9.1, License: Apache License 2.0, Deployed at: https://www.ebi.ac.uk/biosamples/schema/validator/validate. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
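The two validation layers named in this summary can be illustrated with a minimal Python sketch. It is not the biovalidator implementation and does not use AJV: the `validate` function, the rule format, and the `TOY_ONTOLOGY` term identifiers are all hypothetical, chosen only to show a syntactic check followed by an ontology-based check on a JSON-like document.

```python
# Hypothetical toy ontology (child term -> parent term); not real ontology content.
TOY_ONTOLOGY = {
    "TOY:0003": "TOY:0002",
    "TOY:0002": "TOY:0001",
    "TOY:0010": "TOY:0009",
}


def is_descendant(term, ancestor, parents=TOY_ONTOLOGY):
    """Walk parent links upward; a term counts as its own descendant."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)
        term = parents.get(term)
    return False


def validate(doc, required, ontology_rules):
    """Return a list of error strings; an empty list means the document passed.

    required:       field name -> expected Python type (syntactic layer)
    ontology_rules: field name -> ancestor term the value must fall under
                    (semantic layer)
    """
    errors = []
    # Syntactic layer: required fields exist and have the expected type.
    for field, expected_type in required.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    # Semantic layer: annotated values must sit under the required ancestor.
    for field, ancestor in ontology_rules.items():
        value = doc.get(field)
        if isinstance(value, str) and not is_descendant(value, ancestor):
            errors.append(f"{field}: {value} is not under {ancestor}")
    return errors
```

In this sketch a document can be well-formed and still fail: `{"name": "s1", "organism": "TOY:0010"}` has both required string fields, yet its `organism` term lies outside the `TOY:0001` branch, which is the kind of error a purely syntactic schema check would miss.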
Affiliation(s)
- Isuru Liyanage
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Tony Burdett
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Bert Droesbeke
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
- Karoly Erdos
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Rolando Fernandez
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Alasdair Gray
  - Department of Computer Science, Heriot-Watt University, Edinburgh EH14 4AS, UK
- Muhammad Haseeb
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Simon Jupp
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Flavia Penim
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Cyril Pommier
  - INRAE, BioinfOmics, Plant Bioinformatics Facility, Université Paris-Saclay, 78026 Versailles, France
  - INRAE, URGI, Université Paris-Saclay, 78026 Versailles, France
- Philippe Rocca-Serra
  - Department of Engineering Science, University of Oxford e-Research Centre, University of Oxford, Oxford OX1 3QG, UK
- Mélanie Courtot
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
  - Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada
  - To whom correspondence should be addressed.
- Frederik Coppens
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
6
Cabili MN, Lawson J, Saltzman A, Rushton G, O’Rourke P, Wilbanks J, Rodriguez LL, Nyronen T, Courtot M, Donnelly S, Philippakis AA. Empirical validation of an automated approach to data use oversight. Cell Genom 2021; 1:100031. [PMID: 36778584 PMCID: PMC9903839 DOI: 10.1016/j.xgen.2021.100031] [Received: 02/28/2021] [Revised: 06/30/2021] [Accepted: 08/07/2021] [Indexed: 10/19/2022]
Abstract
The current paradigm for data use oversight of biomedical datasets is onerous, extending the timescale and resources needed to obtain access for secondary analyses, thus hindering scientific discovery. For a researcher to utilize a controlled-access dataset, a data access committee must review her research plans to determine whether they are consistent with the data use limitations (DULs) specified by the informed consent form. The newly created GA4GH data use ontology (DUO) holds the potential to streamline this process by making data use oversight computable. Here, we describe an open-source software platform, the Data Use Oversight System (DUOS), that connects with DUO terminology to enable automated data use oversight. We analyze dbGaP data acquired since 2006, finding an exponential increase in data access requests, which will not be sustainable with current manual oversight review. We perform an empirical evaluation of DUOS and DUO on selected datasets from the Broad Institute's data repository. We were able to structure 118/123 of the evaluated DULs (96%) and 52/52 (100%) of research proposals using DUO terminology, and we find that DUOS' automated data access adjudication in all cases agreed with the DAC manual review. This first empirical evaluation of the feasibility of automated data use oversight demonstrates comparable accuracy to human-based data access oversight in real-world data governance.
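What "computable" data use oversight means here, and how agreement with manual review could be scored, can be sketched in Python as follows. This is not the DUOS algorithm: the term names, the `BROADER` hierarchy, and the rule that a purpose is allowed when it equals or is narrower than the dataset's permitted use are simplifying assumptions made for illustration, and the sample cases are invented.

```python
# Hypothetical, simplified hierarchy of data use terms (narrower -> broader).
# These labels are illustrative and are not real DUO identifiers.
BROADER = {
    "disease-specific:cancer": "biomedical",
    "disease-specific:diabetes": "biomedical",
    "biomedical": "general-research",
}


def permits(dataset_term, purpose_term):
    """True if the purpose equals the dataset's permitted use or is narrower."""
    term = purpose_term
    while term is not None:
        if term == dataset_term:
            return True
        term = BROADER.get(term)
    return False


def agreement_rate(cases):
    """Fraction of cases where the automated decision matches manual review.

    Each case is (dataset_term, purpose_term, manual_decision).
    """
    matches = sum(permits(d, p) == manual for d, p, manual in cases)
    return matches / len(cases)


# Invented example cases, for illustration only.
SAMPLE_CASES = [
    ("general-research", "disease-specific:cancer", True),
    ("biomedical", "disease-specific:diabetes", True),
    ("disease-specific:cancer", "disease-specific:diabetes", False),
    ("disease-specific:cancer", "biomedical", False),
]
```

With the invented cases above, `agreement_rate(SAMPLE_CASES)` compares each automated decision to the recorded manual one, which mirrors in miniature the comparison against DAC review that the abstract reports.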
Affiliation(s)
- Moran N. Cabili
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Jonathan Lawson
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Andrea Saltzman
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Greg Rushton
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Tommi Nyronen
  - ELIXIR Finland, CSC - IT Center for Science, Espoo, Finland
- Mélanie Courtot
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Stacey Donnelly
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
  - Corresponding author
- Anthony A. Philippakis
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
  - Corresponding author
7
Lawson J, Cabili MN, Kerry G, Boughtwood T, Thorogood A, Alper P, Bowers SR, Boyles RR, Brookes AJ, Brush M, Burdett T, Clissold H, Donnelly S, Dyke SO, Freeberg MA, Haendel MA, Hata C, Holub P, Jeanson F, Jene A, Kawashima M, Kawashima S, Konopko M, Kyomugisha I, Li H, Linden M, Rodriguez LL, Morita M, Mulder N, Muller J, Nagaie S, Nasir J, Ogishima S, Ota Wang V, Paglione LD, Pandya RN, Parkinson H, Philippakis AA, Prasser F, Rambla J, Reinold K, Rushton GA, Saltzman A, Saunders G, Sofia HJ, Spalding JD, Swertz MA, Tulchinsky I, van Enckevort EJ, Varma S, Voisin C, Yamamoto N, Yamasaki C, Zass L, Guidry Auvil JM, Nyrönen TH, Courtot M. The Data Use Ontology to streamline responsible access to human biomedical datasets. Cell Genom 2021; 1:100028. [PMID: 34820659 PMCID: PMC8591903 DOI: 10.1016/j.xgen.2021.100028] [Received: 02/28/2021] [Revised: 07/02/2021] [Accepted: 08/09/2021] [Indexed: 11/25/2022]
Abstract
Human biomedical datasets that are critical for research and clinical studies to benefit human health also often contain sensitive or potentially identifying information of individual participants. Thus, care must be taken when they are processed and made available to comply with ethical and regulatory frameworks and informed consent data conditions. To enable and streamline data access for these biomedical datasets, the Global Alliance for Genomics and Health (GA4GH) Data Use and Researcher Identities (DURI) work stream developed and approved the Data Use Ontology (DUO) standard. DUO is a hierarchical vocabulary of human and machine-readable data use terms that consistently and unambiguously represents a dataset's allowable data uses. DUO has been implemented by major international stakeholders such as the Broad and Sanger Institutes and is currently used in annotation of over 200,000 datasets worldwide. Using DUO in data management and access facilitates researchers' discovery and access of relevant datasets. DUO annotations increase the FAIRness of datasets and support data linkages using common data use profiles when integrating the data for secondary analyses. DUO is implemented in the Web Ontology Language (OWL) and, to increase community awareness and engagement, hosted in an open, centralized GitHub repository. DUO, together with the GA4GH Passport standard, offers a new, efficient, and streamlined data authorization and access framework that has enabled increased sharing of biomedical datasets worldwide.
Affiliation(s)
- Jonathan Lawson
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Moran N. Cabili
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Giselle Kerry
  - European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Tiffany Boughtwood
  - Australian Genomics, Murdoch Children’s Research Institute, Parkville, VIC, Australia
- Adrian Thorogood
  - Centre of Genomics and Policy, Department of Human Genetics, McGill University, Montreal, QC, Canada
  - ELIXIR-Luxembourg, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Pinar Alper
  - ELIXIR-Luxembourg, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Matthew Brush
  - University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Tony Burdett
  - European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Hayley Clissold
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Stacey Donnelly
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Stephanie O.M. Dyke
  - McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, Department of Neurology & Neurosurgery, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Mallory A. Freeberg
  - European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Chihiro Hata
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Japan
- Petr Holub
  - BBMRI-ERIC, AT and Masaryk University, Brno, Czech Republic
- Aina Jene
  - Centre de Regulació Genòmica (CRG), Barcelona, Spain
- Minae Kawashima
  - National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
- Shuichi Kawashima
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa, Japan
- Irene Kyomugisha
  - Division of Human Genetics, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Haoyuan Li
  - Canada’s Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- Mikael Linden
  - ELIXIR-Finland, CSC - IT Center for Science Ltd, Espoo, Finland
- Nicola Mulder
  - Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jean Muller
  - Laboratoire de Génétique Médicale, Institut de Génétique Médicale d’Alsace, INSERM U1112, Université de Strasbourg, Strasbourg, France
  - Laboratoire de Diagnostic Génétique, Institut de Génétique Médicale d’Alsace, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Satoshi Nagaie
  - Tohoku Medical Megabank Organization (ToMMo), Tohoku University, Sendai, Japan
- Jamal Nasir
  - Department of Life Sciences, University of Northampton, Northampton, UK
- Soichi Ogishima
  - Tohoku Medical Megabank Organization (ToMMo), Tohoku University, Sendai, Japan
- Vivian Ota Wang
  - Office of Data Sharing, National Cancer Institute, NIH, Rockville, MD, USA
- Helen Parkinson
  - European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Anthony A. Philippakis
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Fabian Prasser
  - Berlin Institute of Health at Charité—Universitätsmedizin Berlin, Berlin, Germany
- Jordi Rambla
  - Centre de Regulació Genòmica (CRG), Barcelona, Spain
- Kathy Reinold
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Gregory A. Rushton
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Andrea Saltzman
  - Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA
- Heidi J. Sofia
  - National Human Genome Research Institute, NIH, Bethesda, MD, USA
- John D. Spalding
  - European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Morris A. Swertz
  - Genomics Coordination Center, Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Esther J. van Enckevort
  - Genomics Coordination Center, Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Susheel Varma
  - Health Data Research UK, Gibbs Building, 215 Euston Road, London NW1 2BE, UK
- Lyndon Zass
  - Computational Biology Division, IDM, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Mélanie Courtot
  - European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
  - Corresponding author
8
Rehm HL, Page AJ, Smith L, Adams JB, Alterovitz G, Babb LJ, Barkley MP, Baudis M, Beauvais MJ, Beck T, Beckmann JS, Beltran S, Bernick D, Bernier A, Bonfield JK, Boughtwood TF, Bourque G, Bowers SR, Brookes AJ, Brudno M, Brush MH, Bujold D, Burdett T, Buske OJ, Cabili MN, Cameron DL, Carroll RJ, Casas-Silva E, Chakravarty D, Chaudhari BP, Chen SH, Cherry JM, Chung J, Cline M, Clissold HL, Cook-Deegan RM, Courtot M, Cunningham F, Cupak M, Davies RM, Denisko D, Doerr MJ, Dolman LI, Dove ES, Dursi LJ, Dyke SO, Eddy JA, Eilbeck K, Ellrott KP, Fairley S, Fakhro KA, Firth HV, Fitzsimons MS, Fiume M, Flicek P, Fore IM, Freeberg MA, Freimuth RR, Fromont LA, Fuerth J, Gaff CL, Gan W, Ghanaim EM, Glazer D, Green RC, Griffith M, Griffith OL, Grossman RL, Groza T, Guidry Auvil JM, Guigó R, Gupta D, Haendel MA, Hamosh A, Hansen DP, Hart RK, Hartley DM, Haussler D, Hendricks-Sturrup RM, Ho CW, Hobb AE, Hoffman MM, Hofmann OM, Holub P, Hsu JS, Hubaux JP, Hunt SE, Husami A, Jacobsen JO, Jamuar SS, Janes EL, Jeanson F, Jené A, Johns AL, Joly Y, Jones SJ, Kanitz A, Kato K, Keane TM, Kekesi-Lafrance K, Kelleher J, Kerry G, Khor SS, Knoppers BM, Konopko MA, Kosaki K, Kuba M, Lawson J, Leinonen R, Li S, Lin MF, Linden M, Liu X, Liyanage IU, Lopez J, Lucassen AM, Lukowski M, Mann AL, Marshall J, Mattioni M, Metke-Jimenez A, Middleton A, Milne RJ, Molnár-Gábor F, Mulder N, Munoz-Torres MC, Nag R, Nakagawa H, Nasir J, Navarro A, Nelson TH, Niewielska A, Nisselle A, Niu J, Nyrönen TH, O’Connor BD, Oesterle S, Ogishima S, Ota Wang V, Paglione LA, Palumbo E, Parkinson HE, Philippakis AA, Pizarro AD, Prlic A, Rambla J, Rendon A, Rider RA, Robinson PN, Rodarmer KW, Rodriguez LL, Rubin AF, Rueda M, Rushton GA, Ryan RS, Saunders GI, Schuilenburg H, Schwede T, Scollen S, Senf A, Sheffield NC, Skantharajah N, Smith AV, Sofia HJ, Spalding D, Spurdle AB, Stark Z, Stein LD, Suematsu M, Tan P, Tedds JA, Thomson AA, Thorogood A, Tickle TL, Tokunaga K, Törnroos J, Torrents D, Upchurch S, Valencia A, 
Guimera RV, Vamathevan J, Varma S, Vears DF, Viner C, Voisin C, Wagner AH, Wallace SE, Walsh BP, Williams MS, Winkler EC, Wold BJ, Wood GM, Woolley JP, Yamasaki C, Yates AD, Yung CK, Zass LJ, Zaytseva K, Zhang J, Goodhand P, North K, Birney E. GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genom 2021; 1:100029. [PMID: 35072136 PMCID: PMC8774288 DOI: 10.1016/j.xgen.2021.100029] [Indexed: 01/08/2023]
Abstract
The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.
Affiliation(s)
- Heidi L. Rehm: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Angela J.H. Page: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Global Alliance for Genomics and Health, Toronto, ON, Canada
- Lindsay Smith: Global Alliance for Genomics and Health, Toronto, ON, Canada; Ontario Institute for Cancer Research, Toronto, ON, Canada
- Jeremy B. Adams: Global Alliance for Genomics and Health, Toronto, ON, Canada; Ontario Institute for Cancer Research, Toronto, ON, Canada
- Gil Alterovitz: Brigham and Women’s Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Michael Baudis: University of Zurich, Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Michael J.S. Beauvais: Global Alliance for Genomics and Health, Toronto, ON, Canada; McGill University, Montreal, QC, Canada
- Tim Beck: University of Leicester, Leicester, UK
- Sergi Beltran: CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Universitat de Barcelona, Barcelona, Spain
- David Bernick: Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Tiffany F. Boughtwood: Australian Genomics, Parkville, VIC, Australia; Murdoch Children’s Research Institute, Parkville, VIC, Australia
- Guillaume Bourque: McGill University, Montreal, QC, Canada; Canadian Center for Computational Genomics, Montreal, QC, Canada
- Michael Brudno: Canadian Center for Computational Genomics, Montreal, QC, Canada; University of Toronto, Toronto, ON, Canada; University Health Network, Toronto, ON, Canada; Vector Institute, Toronto, ON, Canada; Canadian Distributed Infrastructure for Genomics (CanDIG), Toronto, ON, Canada
- David Bujold: McGill University, Montreal, QC, Canada; Canadian Center for Computational Genomics, Montreal, QC, Canada; Canadian Distributed Infrastructure for Genomics (CanDIG), Toronto, ON, Canada
- Tony Burdett: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Daniel L. Cameron: Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia; University of Melbourne, Melbourne, VIC, Australia
- Bimal P. Chaudhari: Nationwide Children’s Hospital, Columbus, OH, USA; The Ohio State University, Columbus, OH, USA
- Shu Hui Chen: National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Justina Chung: Global Alliance for Genomics and Health, Toronto, ON, Canada; Ontario Institute for Cancer Research, Toronto, ON, Canada
- Melissa Cline: UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Mélanie Courtot: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Fiona Cunningham: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- L. Jonathan Dursi: University Health Network, Toronto, ON, Canada; Canadian Distributed Infrastructure for Genomics (CanDIG), Toronto, ON, Canada
- Susan Fairley: Global Alliance for Genomics and Health, Toronto, ON, Canada; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Khalid A. Fakhro: Sidra Medicine, Doha, Qatar; Weill Cornell Medicine - Qatar, Doha, Qatar
- Helen V. Firth: Wellcome Sanger Institute, Hinxton, UK; Addenbrooke’s Hospital, Cambridge, UK
- Paul Flicek: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ian M. Fore: National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Mallory A. Freeberg: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Lauren A. Fromont: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Clara L. Gaff: Australian Genomics, Parkville, VIC, Australia; Murdoch Children’s Research Institute, Parkville, VIC, Australia; Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia; University of Melbourne, Melbourne, VIC, Australia
- Weiniu Gan: National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Elena M. Ghanaim: National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- David Glazer: Verily Life Sciences, South San Francisco, CA, USA
- Robert C. Green: Brigham and Women’s Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Malachi Griffith: Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Obi L. Griffith: Washington University School of Medicine in St. Louis, St. Louis, MO, USA
- Roderic Guigó: Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Dipayan Gupta: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ada Hamosh: Johns Hopkins University, Baltimore, MD, USA
- David P. Hansen: Australian Genomics, Parkville, VIC, Australia; The Australian e-Health Research Centre, CSIRO, Herston, QLD, Australia
- Reece K. Hart: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Invitae, San Francisco, CA, USA; MyOme, Inc, San Bruno, CA, USA
- David Haussler: UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA; Howard Hughes Medical Institute, University of California, Santa Cruz, CA, USA
- Michael M. Hoffman: University of Toronto, Toronto, ON, Canada; University Health Network, Toronto, ON, Canada; Vector Institute, Toronto, ON, Canada
- Oliver M. Hofmann: University of Toronto, Toronto, ON, Canada; University of Melbourne, Melbourne, VIC, Australia
- Petr Holub: BBMRI-ERIC, Graz, Austria; Masaryk University, Brno, Czech Republic
- Sarah E. Hunt: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Ammar Husami: Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Saumya S. Jamuar: SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Republic of Singapore; SingHealth Duke-NUS Institute of Precision Medicine, Singapore, Republic of Singapore
- Elizabeth L. Janes: Global Alliance for Genomics and Health, Toronto, ON, Canada; University of Waterloo, Waterloo, ON, Canada
- Aina Jené: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Amber L. Johns: Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Yann Joly: McGill University, Montreal, QC, Canada
- Steven J.M. Jones: Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Alexander Kanitz: SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; University of Basel, Basel, Switzerland
- Thomas M. Keane: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK; University of Nottingham, Nottingham, UK
- Kristina Kekesi-Lafrance: Global Alliance for Genomics and Health, Toronto, ON, Canada; McGill University, Montreal, QC, Canada
- Giselle Kerry: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Seik-Soon Khor: National Center for Global Health and Medicine Hospital, Tokyo, Japan; University of Tokyo, Tokyo, Japan
- Rasko Leinonen: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Stephanie Li: Broad Institute of MIT and Harvard, Cambridge, MA, USA; Global Alliance for Genomics and Health, Toronto, ON, Canada
- Mikael Linden: CSC–IT Center for Science, Espoo, Finland; ELIXIR Finland, Espoo, Finland
- Isuru Udara Liyanage: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Alice L. Mann: Global Alliance for Genomics and Health, Toronto, ON, Canada; Wellcome Sanger Institute, Hinxton, UK
- Anna Middleton: Wellcome Connecting Science, Hinxton, UK; University of Cambridge, Cambridge, UK
- Richard J. Milne: Wellcome Connecting Science, Hinxton, UK; University of Cambridge, Cambridge, UK
- Nicola Mulder: H3ABioNet, Computational Biology Division, IDM, Faculty of Health Sciences, Cape Town, South Africa
- Rishi Nag: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Hidewaki Nakagawa: Japan Agency for Medical Research & Development (AMED), Tokyo, Japan; RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Arcadi Navarro: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain; Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Ania Niewielska: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Amy Nisselle: Murdoch Children’s Research Institute, Parkville, VIC, Australia; University of Melbourne, Melbourne, VIC, Australia; Human Genetics Society of Australasia Education, Ethics & Social Issues Committee, Alexandria, NSW, Australia
- Jeffrey Niu: University Health Network, Toronto, ON, Canada
- Tommi H. Nyrönen: CSC–IT Center for Science, Espoo, Finland; ELIXIR Finland, Espoo, Finland
- Sabine Oesterle: SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Vivian Ota Wang: National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Emilio Palumbo: Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Helen E. Parkinson: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Jordi Rambla: Universitat Pompeu Fabra (UPF), Barcelona, Spain; Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Renee A. Rider: National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Peter N. Robinson: The Jackson Laboratory, Farmington, CT, USA; University of Connecticut, Farmington, CT, USA
- Kurt W. Rodarmer: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Alan F. Rubin: Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia; University of Melbourne, Melbourne, VIC, Australia
- Manuel Rueda: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Helen Schuilenburg: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Torsten Schwede: SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland; University of Basel, Basel, Switzerland
- Neerjah Skantharajah: Global Alliance for Genomics and Health, Toronto, ON, Canada; Ontario Institute for Cancer Research, Toronto, ON, Canada
- Heidi J. Sofia: National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Dylan Spalding: CSC–IT Center for Science, Espoo, Finland; ELIXIR Finland, Espoo, Finland
- Zornitza Stark: Australian Genomics, Parkville, VIC, Australia; Murdoch Children’s Research Institute, Parkville, VIC, Australia; University of Melbourne, Melbourne, VIC, Australia
- Lincoln D. Stein: Ontario Institute for Cancer Research, Toronto, ON, Canada; University of Toronto, Toronto, ON, Canada
- Patrick Tan: SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Republic of Singapore; Precision Health Research Singapore, Singapore, Republic of Singapore; Genome Institute of Singapore, Singapore, Republic of Singapore
- Alastair A. Thomson: National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Adrian Thorogood: McGill University, Montreal, QC, Canada; University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Katsushi Tokunaga: University of Tokyo, Tokyo, Japan; National Center for Global Health and Medicine, Tokyo, Japan
- Juha Törnroos: CSC–IT Center for Science, Espoo, Finland; ELIXIR Finland, Espoo, Finland
- David Torrents: Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain; Barcelona Supercomputing Center, Barcelona, Spain
- Sean Upchurch: California Institute of Technology, Pasadena, CA, USA
- Alfonso Valencia: Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain; Barcelona Supercomputing Center, Barcelona, Spain
- Jessica Vamathevan: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Susheel Varma: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK; Health Data Research UK, London, UK
- Danya F. Vears: Murdoch Children’s Research Institute, Parkville, VIC, Australia; University of Melbourne, Melbourne, VIC, Australia; Human Genetics Society of Australasia Education, Ethics & Social Issues Committee, Alexandria, NSW, Australia; Melbourne Law School, University of Melbourne, Parkville, VIC, Australia
- Coby Viner: University of Toronto, Toronto, ON, Canada; University Health Network, Toronto, ON, Canada
- Alex H. Wagner: Nationwide Children’s Hospital, Columbus, OH, USA; The Ohio State University, Columbus, OH, USA
- Eva C. Winkler: Section of Translational Medical Ethics, University Hospital Heidelberg, Heidelberg, Germany
- Andrew D. Yates: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Christina K. Yung: Ontario Institute for Cancer Research, Toronto, ON, Canada; Indoc Research, Toronto, ON, Canada
- Lyndon J. Zass: H3ABioNet, Computational Biology Division, IDM, Faculty of Health Sciences, Cape Town, South Africa
- Ksenia Zaytseva: McGill University, Montreal, QC, Canada; Canadian Centre for Computational Genomics, Montreal, QC, Canada
- Junjun Zhang: Ontario Institute for Cancer Research, Toronto, ON, Canada
- Peter Goodhand: Global Alliance for Genomics and Health, Toronto, ON, Canada; Ontario Institute for Cancer Research, Toronto, ON, Canada
- Kathryn North: Murdoch Children’s Research Institute, Parkville, VIC, Australia; University of Toronto, Toronto, ON, Canada; University of Melbourne, Melbourne, VIC, Australia
- Ewan Birney: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK; European Molecular Biology Laboratory, Heidelberg, Germany
9
Voisin C, Linden M, Dyke SO, Bowers SR, Alper P, Barkley MP, Bernick D, Chao J, Courtot M, Jeanson F, Konopko MA, Kuba M, Lawson J, Leinonen J, Li S, Ota Wang V, Philippakis AA, Reinold K, Rushton GA, Spalding JD, Törnroos J, Tulchinsky I, Guidry Auvil JM, Nyrönen TH. GA4GH Passport standard for digital identity and access permissions. Cell Genom 2021; 1:100030. [PMID: 34820660 PMCID: PMC8591913 DOI: 10.1016/j.xgen.2021.100030] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/18/2021] [Revised: 07/08/2021] [Accepted: 09/02/2021] [Indexed: 12/21/2022]
Abstract
The Global Alliance for Genomics and Health (GA4GH) supports international standards that enable a federated data sharing model for the research community while respecting data security, ethical and regulatory frameworks, and data authorization and access processes for sensitive data. The GA4GH Passport standard (Passport) defines a machine-readable digital identity that conveys roles and data access permissions (called "visas") for individual users. Visas are issued by data stewards, including data access committees (DACs) working with public databases, the entities responsible for the quality, integrity, and access arrangements for the datasets in the management of human biomedical data. Passports streamline management of data access rights across data systems by using visas that present a data user's digital identity and permissions across organizations, tools, environments, and services. We describe real-world implementations of the GA4GH Passport standard in use cases from ELIXIR Europe, National Institutes of Health, and the Autism Sharing Initiative. These implementations demonstrate that the Passport standard has provided transparent mechanisms for establishing permissions and authorizing data access across platforms.
Affiliation(s)
- Craig Voisin (corresponding author): Google LLC, Kitchener, ON N2H 5G5, Canada
- Mikael Linden: CSC–IT Center for Science, Espoo 02101, Finland; ELIXIR Finland, Espoo 02101, Finland
- Stephanie O.M. Dyke: McGill Centre for Integrative Neuroscience, McGill University, Montreal, QC H3A 2B4, Canada
- Pinar Alper: ELIXIR Luxembourg, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 4367 Belvaux, Luxembourg
- David Bernick: Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Mélanie Courtot: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Melissa A. Konopko: Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK; Global Alliance for Genomics and Health, Toronto, ON M5G 0A3, Canada
- Martin Kuba: Masaryk University, Brno 602 00, Czech Republic
- Jonathan Lawson: Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stephanie Li: Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Global Alliance for Genomics and Health, Toronto, ON M5G 0A3, Canada
- Vivian Ota Wang: National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Kathy Reinold: Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- J. Dylan Spalding: CSC–IT Center for Science, Espoo 02101, Finland; ELIXIR Finland, Espoo 02101, Finland
- Juha Törnroos: CSC–IT Center for Science, Espoo 02101, Finland; ELIXIR Finland, Espoo 02101, Finland
- Tommi H. Nyrönen (corresponding author): CSC–IT Center for Science, Espoo 02101, Finland; ELIXIR Finland, Espoo 02101, Finland
10
Courtot M, Gupta D, Liyanage I, Xu F, Burdett T. BioSamples database: FAIRer samples metadata to accelerate research data management. Nucleic Acids Res 2021; 50:D1500-D1507. [PMID: 34747489 PMCID: PMC8728232 DOI: 10.1093/nar/gkab1046] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/15/2021] [Revised: 10/13/2021] [Accepted: 10/14/2021] [Indexed: 12/04/2022]
Abstract
The BioSamples database at EMBL-EBI is the central institutional repository for sample metadata storage and connection to EMBL-EBI archives and other resources. The technical improvements to our infrastructure described in our last update have enabled us to scale and accommodate an increasing number of communities, resulting in a higher number of submissions and more heterogeneous data. The BioSamples database now has a valuable set of features and processes to improve data quality in BioSamples, and in particular enriching metadata content and following FAIR principles. In this manuscript, we describe how BioSamples in 2021 handles requirements from our community of users through exemplar use cases: increased findability of samples and improved data management practices support the goals of the ReSOLUTE project, how the plant community benefits from being able to link genotypic to phenotypic information, and we highlight how cumulatively those improvements contribute to more complex multi-omics data integration supporting COVID-19 research. Finally, we present underlying technical features used as pillars throughout those use cases and how they are reused for expanded engagement with communities such as FAIRplus and the Global Alliance for Genomics and Health. Availability: The BioSamples database is freely available at http://www.ebi.ac.uk/biosamples. Content is distributed under the EMBL-EBI Terms of Use available at https://www.ebi.ac.uk/about/terms-of-use. The BioSamples code is available at https://github.com/EBIBioSamples/biosamples-v4 and distributed under the Apache 2.0 license.
Affiliation(s)
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK: Mélanie Courtot, Dipayan Gupta, Isuru Liyanage, Fuqi Xu, Tony Burdett
11
Harrison PW, Ahamed A, Aslam R, Alako BTF, Burgin J, Buso N, Courtot M, Fan J, Gupta D, Haseeb M, Holt S, Ibrahim T, Ivanov E, Jayathilaka S, Balavenkataraman Kadhirvelu V, Kumar M, Lopez R, Kay S, Leinonen R, Liu X, O'Cathail C, Pakseresht A, Park Y, Pesant S, Rahman N, Rajan J, Sokolov A, Vijayaraja S, Waheed Z, Zyoud A, Burdett T, Cochrane G. The European Nucleotide Archive in 2020. Nucleic Acids Res 2021; 49:D82-D85. [PMID: 33175160 PMCID: PMC7778925 DOI: 10.1093/nar/gkaa1028] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 09/18/2020] [Accepted: 10/20/2020] [Indexed: 11/12/2022]
Abstract
The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), provided by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), has for almost forty years continued in its mission to freely archive and present the world's public sequencing data for the benefit of the entire scientific community and for the acceleration of the global research effort. Here we highlight the major developments to ENA services and content in 2020, focussing in particular on the recently released updated ENA browser, modernisation of our release process and our data coordination collaborations with specific research communities.
Affiliation(s)
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK: Peter W Harrison, Alisha Ahamed, Raheela Aslam, Blaise T F Alako, Josephine Burgin, Nicola Buso, Mélanie Courtot, Jun Fan, Dipayan Gupta, Muhammad Haseeb, Sam Holt, Talal Ibrahim, Eugene Ivanov, Suran Jayathilaka, Manish Kumar, Rodrigo Lopez, Simon Kay, Rasko Leinonen, Xin Liu, Colman O'Cathail, Amir Pakseresht, Youngmi Park, Stephane Pesant, Nadim Rahman, Jeena Rajan, Alexey Sokolov, Senthilnathan Vijayaraja, Zahra Waheed, Ahmad Zyoud, Tony Burdett, Guy Cochrane
12
Courtot M, Cherubin L, Faulconbridge A, Vaughan D, Green M, Richardson D, Harrison P, Whetzel PL, Parkinson H, Burdett T. BioSamples database: an updated sample metadata hub. Nucleic Acids Res 2019; 47:D1172-D1178. [PMID: 30407529 PMCID: PMC6323949 DOI: 10.1093/nar/gky1061] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 09/21/2018] [Accepted: 10/18/2018] [Indexed: 12/23/2022]
Abstract
The BioSamples database at EMBL-EBI provides a central hub for sample metadata storage and linkage to other EMBL-EBI resources. BioSamples has recently undergone major changes, both in terms of data content and supporting infrastructure. The data content has more than doubled from around 2 million samples in 2014 to just over 5 million samples in 2018. Fast, reciprocal data exchange was fully established between sister Biosample databases and other INSDC partners, enabling a worldwide common representation and centralization of sample metadata. The BioSamples platform has been upgraded to accommodate anticipated increases in the number of submissions via GA4GH driver projects such as the Human Cell Atlas and the EGA, as well as from mirroring of NCBI dbGaP data. The BioSamples database is now the authoritative repository for all INSDC sample metadata, an ELIXIR Deposition Database for Biomolecular Data and the EMBL-EBI sample metadata hub. To support faster turnaround for sample submission, and to increase scalability and resilience, we have upgraded the BioSamples database backend storage, APIs and user interface. Finally, the website has been redesigned to allow search and retrieval of records based on specific filters, such as ‘disease’ or ‘organism’. These changes are targeted at answering current use cases as well as providing functionalities for future emerging and anticipated developments. Availability: The BioSamples database is freely available at http://www.ebi.ac.uk/biosamples. Content is distributed under the EMBL-EBI Terms of Use available at https://www.ebi.ac.uk/about/terms-of-use.
Affiliation(s)
- Luca Cherubin
  - EMBL-EBI, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Matthew Green
  - EMBL-EBI, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Tony Burdett
  - EMBL-EBI, Wellcome Genome Campus, Hinxton CB10 1SD, UK
|
13
|
Jacobsen A, de Miranda Azevedo R, Juty N, Batista D, Coles S, Cornet R, Courtot M, Crosas M, Dumontier M, Evelo CT, Goble C, Guizzardi G, Hansen KK, Hasnain A, Hettne K, Heringa J, Hooft RW, Imming M, Jeffery KG, Kaliyaperumal R, Kersloot MG, Kirkpatrick CR, Kuhn T, Labastida I, Magagna B, McQuilton P, Meyers N, Montesanti A, van Reisen M, Rocca-Serra P, Pergl R, Sansone SA, da Silva Santos LOB, Schneider J, Strawn G, Thompson M, Waagmeester A, Weigel T, Wilkinson MD, Willighagen EL, Wittenburg P, Roos M, Mons B, Schultes E. FAIR Principles: Interpretations and Implementation Considerations. Data Intelligence 2020. [DOI: 10.1162/dint_r_00024] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Indexed: 11/04/2022]
Abstract
The FAIR principles have been widely cited, endorsed and adopted by a broad range of stakeholders since their publication in 2016. By intention, the 15 FAIR guiding principles do not dictate specific technological implementations, but provide guidance for improving Findability, Accessibility, Interoperability and Reusability of digital resources. This has likely contributed to the broad adoption of the FAIR principles, because individual stakeholder communities can implement their own FAIR solutions. However, it has also resulted in inconsistent interpretations that carry the risk of leading to incompatible implementations. Thus, while the FAIR principles are formulated on a high level and may be interpreted and implemented in different ways, for true interoperability we need to support convergence in implementation choices that are widely accessible and (re)-usable. We introduce the concept of FAIR implementation considerations to assist accelerated global participation and convergence towards accessible, robust, widespread and consistent FAIR implementations. Any self-identified stakeholder community may either choose to reuse solutions from existing implementations, or when they spot a gap, accept the challenge to create the needed solution, which, ideally, can be used again by other communities in the future. Here, we provide interpretations and implementation considerations (choices and challenges) for each FAIR principle.
Affiliation(s)
- Annika Jacobsen
  - Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
- Ricardo de Miranda Azevedo
  - Institute of Data Science, Maastricht University, Universiteitssingel 60, Maastricht 6229 ER, The Netherlands
- Nick Juty
  - Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Dominique Batista
  - Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX1 3PJ, UK
- Simon Coles
  - School of Chemistry, Faculty of Engineering and Physical Sciences, University of Southampton, SO17 1BJ, UK
- Ronald Cornet
  - Amsterdam UMC, University of Amsterdam, Amsterdam 1000 GG, The Netherlands
- Mélanie Courtot
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
- Mercè Crosas
  - Harvard University, Cambridge, Massachusetts 02138, USA
- Michel Dumontier
  - Institute of Data Science, Maastricht University, Universiteitssingel 60, Maastricht 6229 ER, The Netherlands
- Chris T. Evelo
  - Department of Bioinformatics – BiGCaT, NUTRIM, Maastricht University, Maastricht 6229 ER, The Netherlands
- Carole Goble
  - Department of Computer Science, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Giancarlo Guizzardi
  - Conceptual and Cognitive Modeling Research Group (CORE), Free University of Bozen-Bolzano, Bolzano 39100, Italy
- Ali Hasnain
  - Insight Centre for Data Analytics, National University of Ireland Galway, H91 TK33, Ireland
- Kristina Hettne
  - Centre for Digital Scholarship, Leiden University Libraries, Leiden, 2333 ZA, The Netherlands
- Jaap Heringa
  - Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
- Rob W.W. Hooft
  - Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
  - Dutch Techcentre for Life Sciences (DTL), Utrecht, The Netherlands
- Martijn G. Kersloot
  - Amsterdam UMC, University of Amsterdam, Amsterdam 1000 GG, The Netherlands
  - Castor EDC, Paasheuvelweg 25, Wing 5D, 1105 BP, Amsterdam, The Netherlands
- Christine R. Kirkpatrick
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, USA
- Tobias Kuhn
  - Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
- Ignasi Labastida
  - Learning and Research Resources Centre (CRAI), Universitat de Barcelona, 08007 Barcelona, Spain
- Peter McQuilton
  - Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX1 3PJ, UK
- Mirjam van Reisen
  - Liacs Institute of Advanced Computer Science, Leiden University, 2311 GJ Leiden, The Netherlands
- Philippe Rocca-Serra
  - Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX1 3PJ, UK
- Robert Pergl
  - Czech Technical University in Prague, Faculty of Information Technology (FIT CTU), 160 00 Prague 6, Czech Republic
- Susanna-Assunta Sansone
  - Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX1 3PJ, UK
- Juliane Schneider
  - Harvard Catalyst
  - Clinical and Translational Science Center, Boston, MA 02115, USA
- George Strawn
  - US National Academy of Sciences, Washington DC 20418, USA
- Mark Thompson
  - Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
- Tobias Weigel
  - Deutsches Klimarechenzentrum, Bundesstrasse 45a, 20146 Hamburg, Germany
- Mark D. Wilkinson
  - Center for Plant Biotechnology and Genomics UPM-INIA, Madrid 28040, Spain
- Egon L. Willighagen
  - Department of Bioinformatics – BiGCaT, NUTRIM, Maastricht University, Maastricht 6229 ER, The Netherlands
- Peter Wittenburg
  - Max Planck Computing and Data Facility, Gießenbachstraße 2, 85748 Garching, Germany
- Marco Roos
  - Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
- Barend Mons
  - Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
  - GO FAIR International Support & Coordination Office (GFISCO), Leiden, The Netherlands
- Erik Schultes
  - GO FAIR International Support & Coordination Office (GFISCO), Leiden, The Netherlands
  - Leiden Center for Data Science, 2311 EZ Leiden, The Netherlands
|
14
|
McMurry JA, Juty N, Blomberg N, Burdett T, Conlin T, Conte N, Courtot M, Deck J, Dumontier M, Fellows DK, Gonzalez-Beltran A, Gormanns P, Grethe J, Hastings J, Hériché JK, Hermjakob H, Ison JC, Jimenez RC, Jupp S, Kunze J, Laibe C, Le Novère N, Malone J, Martin MJ, McEntyre JR, Morris C, Muilu J, Müller W, Rocca-Serra P, Sansone SA, Sariyar M, Snoep JL, Soiland-Reyes S, Stanford NJ, Swainston N, Washington N, Williams AR, Wimalaratne SM, Winfree LM, Wolstencroft K, Goble C, Mungall CJ, Haendel MA, Parkinson H. Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data. PLoS Biol 2017; 15:e2001414. [PMID: 28662064 PMCID: PMC5490878 DOI: 10.1371/journal.pbio.2001414] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Indexed: 11/18/2022]
Abstract
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
Affiliation(s)
- Julie A. McMurry
  - Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America
- Nick Juty
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Niklas Blomberg
  - ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Tony Burdett
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Tom Conlin
  - Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America
- Nathalie Conte
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Mélanie Courtot
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- John Deck
  - Berkeley Natural History Museums, University of California at Berkeley, Berkeley, California, United States of America
- Michel Dumontier
  - Institute of Data Science, Maastricht University, Maastricht, the Netherlands
- Donal K. Fellows
  - School of Computer Science, The University of Manchester, Manchester, United Kingdom
- Philipp Gormanns
  - Institute of Experimental Genetics, Helmholtz Centre Munich, German Research Center for Environmental Health, Neuherberg, Germany
- Jeffrey Grethe
  - Center for Research in Biological Systems, University of California San Diego, La Jolla, California, United States of America
- Henning Hermjakob
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Jon C. Ison
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Rafael C. Jimenez
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Simon Jupp
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- John Kunze
  - California Digital Library, Oakland, California, United States of America
- Camille Laibe
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- James Malone
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Maria Jesus Martin
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Johanna R. McEntyre
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Chris Morris
  - Science and Technology Facilities Council, Daresbury Laboratory, Warrington, United Kingdom
- Juha Muilu
  - Genomics Coordination Center, Department of Genetics, University Medical Center Groningen and Groningen Bioinformatics Center, University of Groningen, Groningen, the Netherlands
- Wolfgang Müller
  - Scientific Databases and Visualization at Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Murat Sariyar
  - Institute for Medical Informatics, Bern University of Applied Sciences, Engineering and Information Technology, Bern, Switzerland
- Jacky L. Snoep
  - Manchester Institute of Biology, University of Manchester, Manchester, United Kingdom
  - Department of Biochemistry, Stellenbosch University, Stellenbosch, South Africa
- Stian Soiland-Reyes
  - School of Computer Science, The University of Manchester, Manchester, United Kingdom
- Natalie J. Stanford
  - School of Computer Science, The University of Manchester, Manchester, United Kingdom
- Neil Swainston
  - Manchester Centre for Synthetic Biology of Fine and Speciality Chemicals, University of Manchester, Manchester, United Kingdom
- Nicole Washington
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Alan R. Williams
  - School of Computer Science, The University of Manchester, Manchester, United Kingdom
- Sarala M. Wimalaratne
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Lilly M. Winfree
  - Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America
- Katherine Wolstencroft
  - Leiden Institute of Advanced Computer Science, Leiden University, Leiden, the Netherlands
- Carole Goble
  - School of Computer Science, The University of Manchester, Manchester, United Kingdom
- Christopher J. Mungall
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Melissa A. Haendel
  - Department of Medical Informatics and Epidemiology and OHSU Library, Oregon Health & Science University, Portland, Oregon, United States of America
- Helen Parkinson
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
|
15
|
Jia B, Raphenya AR, Alcock B, Waglechner N, Guo P, Tsang KK, Lago BA, Dave BM, Pereira S, Sharma AN, Doshi S, Courtot M, Lo R, Williams LE, Frye JG, Elsayegh T, Sardar D, Westman EL, Pawlowski AC, Johnson TA, Brinkman FSL, Wright GD, McArthur AG. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res 2016; 45:D566-D573. [PMID: 27789705 PMCID: PMC5210516 DOI: 10.1093/nar/gkw1004] [Citation(s) in RCA: 1501] [Impact Index Per Article: 187.6] [Received: 08/15/2016] [Revised: 09/30/2016] [Accepted: 10/17/2016] [Indexed: 11/30/2022]
Abstract
The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis.
Affiliation(s)
- Baofeng Jia
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Amogelang R Raphenya
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Brian Alcock
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Nicholas Waglechner
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Peiyao Guo
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Kara K Tsang
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Briony A Lago
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Biren M Dave
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Sheldon Pereira
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Arjun N Sharma
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Sachin Doshi
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Mélanie Courtot
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Raymond Lo
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Laura E Williams
  - Bacterial Epidemiology and Antimicrobial Resistance Research Unit, USDA-ARS U.S. National Poultry Research Center, U.S. Department of Agriculture, Athens, GA 30605, USA
- Jonathan G Frye
  - Bacterial Epidemiology and Antimicrobial Resistance Research Unit, USDA-ARS U.S. National Poultry Research Center, U.S. Department of Agriculture, Athens, GA 30605, USA
- Tariq Elsayegh
  - School of Medicine, Royal College of Surgeons in Ireland, Dublin 2, Republic of Ireland
- Daim Sardar
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Erin L Westman
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Andrew C Pawlowski
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Timothy A Johnson
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Fiona S L Brinkman
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Gerard D Wright
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
- Andrew G McArthur
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, Ontario L8S 4K1, Canada
|
16
|
Ong E, Xiang Z, Zhao B, Liu Y, Lin Y, Zheng J, Mungall C, Courtot M, Ruttenberg A, He Y. Ontobee: A linked ontology data server to support ontology term dereferencing, linkage, query and integration. Nucleic Acids Res 2016; 45:D347-D352. [PMID: 27733503 PMCID: PMC5210626 DOI: 10.1093/nar/gkw918] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 08/15/2016] [Revised: 09/28/2016] [Accepted: 10/05/2016] [Indexed: 11/13/2022]
Abstract
Linked Data (LD) aims to achieve interconnected data by representing entities using Uniform Resource Identifiers (URIs), and sharing information using Resource Description Frameworks (RDFs) and HTTP. Ontologies, which logically represent entities and relations in specific domains, are the basis of LD. Ontobee (http://www.ontobee.org/) is a linked ontology data server that stores ontology information using RDF triple store technology and supports query, visualization and linkage of ontology terms. Ontobee is also the default linked data server for publishing and browsing biomedical ontologies in the Open Biological Ontology (OBO) Foundry (http://obofoundry.org) library. Ontobee currently hosts more than 180 ontologies (including 131 OBO Foundry Library ontologies) with over four million terms. Ontobee provides a user-friendly web interface for querying and visualizing the details and hierarchy of a specific ontology term. Using the eXtensible Stylesheet Language Transformation (XSLT) technology, Ontobee is able to dereference a single ontology term URI, and then output RDF/eXtensible Markup Language (XML) for computer processing or display the HTML information on a web browser for human users. Statistics and detailed information are generated and displayed for each ontology listed in Ontobee. In addition, a SPARQL web interface is provided for custom advanced SPARQL queries of one or multiple ontologies.
Affiliation(s)
- Edison Ong
  - University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Zuoshuang Xiang
  - University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Bin Zhao
  - University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Yue Liu
  - University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Yu Lin
  - University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Jie Zheng
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Chris Mungall
  - Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Mélanie Courtot
  - European Molecular Biology Laboratory-European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
17
|
Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, Chibucos MC, Clancy K, Courtot M, Derom D, Dumontier M, Fan L, Fostel J, Fragoso G, Gibson F, Gonzalez-Beltran A, Haendel MA, He Y, Heiskanen M, Hernandez-Boussard T, Jensen M, Lin Y, Lister AL, Lord P, Malone J, Manduchi E, McGee M, Morrison N, Overton JA, Parkinson H, Peters B, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Schober D, Smith B, Soldatova LN, Stoeckert CJ, Taylor CF, Torniai C, Turner JA, Vita R, Whetzel PL, Zheng J. The Ontology for Biomedical Investigations. PLoS One 2016; 11:e0154556. [PMID: 27128319 PMCID: PMC4851331 DOI: 10.1371/journal.pone.0154556] [Citation(s) in RCA: 133] [Impact Index Per Article: 16.6] [Received: 01/05/2016] [Accepted: 04/17/2016] [Indexed: 12/18/2022]
Abstract
The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.
Affiliation(s)
- Anita Bandrowski
  - University of California San Diego, La Jolla, California, United States of America
- Ryan Brinkman
  - British Columbia Cancer Research Centre, Vancouver, British Columbia, Canada
- Mathias Brochhausen
  - University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
- Matthew H. Brush
  - Oregon Health and Science University, Portland, Oregon, United States of America
- Bill Bug
  - Drexel University College of Medicine, Philadelphia, Pennsylvania, United States of America
- Marcus C. Chibucos
  - University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Kevin Clancy
  - Thermo Fisher Scientific, Carlsbad, California, United States of America
- Dirk Derom
  - The Vrije Universiteit Brussel, Ixelles, Brussels, Belgium
- Michel Dumontier
  - Stanford University, Stanford, California, United States of America
- Liju Fan
  - Ontology Workshop, LLC, Columbia, Maryland, United States of America
- Jennifer Fostel
  - National Toxicology Program, NIEHS, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
- Gilberto Fragoso
  - Center for Biomedical Informatics and Information Technology, National Institutes of Health, Rockville, Maryland, United States of America
- Frank Gibson
  - Royal Society of Chemistry, Cambridge, Cambridgeshire, United Kingdom
- Melissa A. Haendel
  - Oregon Health and Science University, Portland, Oregon, United States of America
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Mervi Heiskanen
  - National Cancer Institute, Rockville, Maryland, United States of America
- Mark Jensen
  - University at Buffalo, Buffalo, New York, United States of America
- Yu Lin
  - University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Phillip Lord
  - Newcastle University, Newcastle-upon-Tyne, Tyne and Wear, United Kingdom
- James Malone
  - European Molecular Biology Laboratory- European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
- Elisabetta Manduchi
  - University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Monnie McGee
  - Southern Methodist University, Dallas, Texas, United States of America
- Norman Morrison
  - The University of Manchester, Manchester, Greater Manchester, United Kingdom
- James A. Overton
  - La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America
- Helen Parkinson
  - European Molecular Biology Laboratory- European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
- Bjoern Peters
  - La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America
- Alan Ruttenberg
  - University at Buffalo, Buffalo, New York, United States of America
- Daniel Schober
  - Leibniz Institute of Plant Biochemistry, Halle, Saxony-Anhalt, Germany
- Barry Smith
  - University at Buffalo, Buffalo, New York, United States of America
- Chris F. Taylor
  - European Molecular Biology Laboratory- European Bioinformatics Institute, Hinxton, Cambridgeshire, United Kingdom
- Carlo Torniai
  - Oregon Health and Science University, Portland, Oregon, United States of America
- Jessica A. Turner
  - Georgia State University, Atlanta, Georgia, United States of America
- Randi Vita
  - La Jolla Institute for Allergy and Immunology, La Jolla, California, United States of America
- Patricia L. Whetzel
  - University of California San Diego, La Jolla, California, United States of America
- Jie Zheng
  - University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
18
|
Kılıç S, Sagitova DM, Wolfish S, Bely B, Courtot M, Ciufo S, Tatusova T, O'Donovan C, Chibucos MC, Martin MJ, Erill I. From data repositories to submission portals: rethinking the role of domain-specific databases in CollecTF. Database (Oxford) 2016; 2016:baw055. [PMID: 27114493 PMCID: PMC4843526 DOI: 10.1093/database/baw055] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/06/2015] [Accepted: 03/20/2016] [Indexed: 11/12/2022]
Abstract
Domain-specific databases are essential resources for the biomedical community, leveraging expert knowledge to curate published literature and provide access to referenced data and knowledge. The limited scope of these databases, however, poses important challenges on their infrastructure, visibility, funding and usefulness to the broader scientific community. CollecTF is a community-oriented database documenting experimentally validated transcription factor (TF)-binding sites in the Bacteria domain. In its quest to become a community resource for the annotation of transcriptional regulatory elements in bacterial genomes, CollecTF aims to move away from the conventional data-repository paradigm of domain-specific databases. Through the adoption of well-established ontologies, identifiers and collaborations, CollecTF has progressively become also a portal for the annotation and submission of information on transcriptional regulatory elements to major biological sequence resources (RefSeq, UniProtKB and the Gene Ontology Consortium). This fundamental change in database conception capitalizes on the domain-specific knowledge of contributing communities to provide high-quality annotations, while leveraging the availability of stable information hubs to promote long-term access and provide high-visibility to the data. As a submission portal, CollecTF generates TF-binding site information through direct annotation of RefSeq genome records, definition of TF-based regulatory networks in UniProtKB entries and submission of functional annotations to the Gene Ontology. As a database, CollecTF provides enhanced search and browsing, targeted data exports, binding motif analysis tools and integration with motif discovery and search platforms. This innovative approach will allow CollecTF to focus its limited resources on the generation of high-quality information and the provision of specialized access to the data. Database URL: http://www.collectf.org/.
Affiliation(s)
- Sefa Kılıç
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD, 21250, USA
| | - Dinara M Sagitova
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD, 21250, USA
| | - Shoshannah Wolfish
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Benoit Bely
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Mélanie Courtot
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Stacy Ciufo
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, Rockville Pike, Bethesda, MD, 20894, USA
| | - Tatiana Tatusova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, Rockville Pike, Bethesda, MD, 20894, USA
| | - Claire O'Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Marcus C Chibucos
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA; Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Ivan Erill
- Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD, 21250, USA
|
19
|
Deans AR, Lewis SE, Huala E, Anzaldo SS, Ashburner M, Balhoff JP, Blackburn DC, Blake JA, Burleigh JG, Chanet B, Cooper LD, Courtot M, Csösz S, Cui H, Dahdul W, Das S, Dececchi TA, Dettai A, Diogo R, Druzinsky RE, Dumontier M, Franz NM, Friedrich F, Gkoutos GV, Haendel M, Harmon LJ, Hayamizu TF, He Y, Hines HM, Ibrahim N, Jackson LM, Jaiswal P, James-Zorn C, Köhler S, Lecointre G, Lapp H, Lawrence CJ, Le Novère N, Lundberg JG, Macklin J, Mast AR, Midford PE, Mikó I, Mungall CJ, Oellrich A, Osumi-Sutherland D, Parkinson H, Ramírez MJ, Richter S, Robinson PN, Ruttenberg A, Schulz KS, Segerdell E, Seltmann KC, Sharkey MJ, Smith AD, Smith B, Specht CD, Squires RB, Thacker RW, Thessen A, Fernandez-Triana J, Vihinen M, Vize PD, Vogt L, Wall CE, Walls RL, Westerfeld M, Wharton RA, Wirkner CS, Woolley JB, Yoder MJ, Zorn AM, Mabee P. Finding our way through phenotypes. PLoS Biol 2015; 13:e1002033. [PMID: 25562316 PMCID: PMC4285398 DOI: 10.1371/journal.pbio.1002033] [Citation(s) in RCA: 156] [Impact Index Per Article: 17.3] [Indexed: 01/04/2023]
Abstract
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
Affiliation(s)
- Andrew R. Deans
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Suzanna E. Lewis
- Genome Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
| | - Eva Huala
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, United States of America
- Phoenix Bioinformatics, Palo Alto, California, United States of America
| | - Salvatore S. Anzaldo
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Michael Ashburner
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - James P. Balhoff
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
| | - David C. Blackburn
- Department of Vertebrate Zoology and Anthropology, California Academy of Sciences, San Francisco, California, United States of America
| | - Judith A. Blake
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
| | - J. Gordon Burleigh
- Department of Biology, University of Florida, Gainesville, Florida, United States of America
| | - Bruno Chanet
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
| | - Laurel D. Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, United States of America
| | - Mélanie Courtot
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Sándor Csösz
- MTA-ELTE-MTM, Ecology Research Group, Pázmány Péter sétány 1C, Budapest, Hungary
| | - Hong Cui
- School of Information Resources and Library Science, University of Arizona, Tucson, Arizona, United States of America
| | - Wasila Dahdul
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Sandip Das
- Department of Botany, University of Delhi, Delhi, India
| | - T. Alexander Dececchi
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Agnes Dettai
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
| | - Rui Diogo
- Department of Anatomy, Howard University College of Medicine, Washington D.C., United States of America
| | - Robert E. Druzinsky
- Department of Oral Biology, College of Dentistry, University of Illinois, Chicago, Illinois, United States of America
| | - Michel Dumontier
- Stanford Center for Biomedical Informatics Research, Stanford, California, United States of America
| | - Nico M. Franz
- School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
| | - Frank Friedrich
- Biocenter Grindel and Zoological Museum, Hamburg University, Hamburg, Germany
| | - George V. Gkoutos
- Department of Computer Science, Aberystwyth University, Aberystwyth, Ceredigion, United Kingdom
| | - Melissa Haendel
- Department of Medical Informatics & Epidemiology, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Luke J. Harmon
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, United States of America
| | - Terry F. Hayamizu
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine, United States of America
| | - Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, and Comprehensive Cancer Center, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Heather M. Hines
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Nizar Ibrahim
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
| | - Laura M. Jackson
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, United States of America
| | - Christina James-Zorn
- Cincinnati Children's Hospital, Division of Developmental Biology, Cincinnati, Ohio, United States of America
| | - Sebastian Köhler
- Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Guillaume Lecointre
- Muséum national d'Histoire naturelle, Département Systématique et Evolution, Paris, France
| | - Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, North Carolina, United States of America
| | - Carolyn J. Lawrence
- Department of Genetics, Development and Cell Biology and Department of Agronomy, Iowa State University, Ames, Iowa, United States of America
| | | | - John G. Lundberg
- Department of Ichthyology, The Academy of Natural Sciences, Philadelphia, Pennsylvania, United States of America
| | - James Macklin
- Eastern Cereal and Oilseed Research Centre, Ottawa, Ontario, Canada
| | - Austin R. Mast
- Department of Biological Science, Florida State University, Tallahassee, Florida, United States of America
| | | | - István Mikó
- Department of Entomology, Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Christopher J. Mungall
- Genome Division, Lawrence Berkeley National Lab, Berkeley, California, United States of America
| | - Anika Oellrich
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - David Osumi-Sutherland
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Helen Parkinson
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Martín J. Ramírez
- Division of Arachnology, Museo Argentino de Ciencias Naturales - CONICET, Buenos Aires, Argentina
| | - Stefan Richter
- Allgemeine & Spezielle Zoologie, Institut für Biowissenschaften, Universität Rostock, Universitätsplatz 2, Rostock, Germany
| | - Peter N. Robinson
- Institut für Medizinische Genetik und Humangenetik Charité – Universitätsmedizin Berlin, Berlin, Germany
| | - Alan Ruttenberg
- School of Dental Medicine, University at Buffalo, Buffalo, New York, United States of America
| | - Katja S. Schulz
- Smithsonian Institution, National Museum of Natural History, Washington, D.C., United States of America
| | - Erik Segerdell
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, United States of America
| | - Katja C. Seltmann
- Division of Invertebrate Zoology, American Museum of Natural History, New York, New York, United States of America
| | - Michael J. Sharkey
- Department of Entomology, University of Kentucky, Lexington, Kentucky, United States of America
| | - Aaron D. Smith
- Department of Biological Sciences, Northern Arizona University, Flagstaff, Arizona, United States of America
| | - Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, New York, United States of America
| | - Chelsea D. Specht
- Department of Plant and Microbial Biology, Integrative Biology, and the University and Jepson Herbaria, University of California, Berkeley, California, United States of America
| | - R. Burke Squires
- Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Robert W. Thacker
- Department of Biology, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| | - Anne Thessen
- The Data Detektiv, 1412 Stearns Hill Road, Waltham, Massachusetts, United States of America
| | | | - Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
| | - Peter D. Vize
- Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
| | - Lars Vogt
- Universität Bonn, Institut für Evolutionsbiologie und Ökologie, Bonn, Germany
| | - Christine E. Wall
- Department of Evolutionary Anthropology, Duke University, Durham, North Carolina, United States of America
| | - Ramona L. Walls
- iPlant Collaborative, University of Arizona, Thomas J. Keating Bioresearch Building, Tucson, Arizona, United States of America
| | - Monte Westerfeld
- Institute of Neuroscience, University of Oregon, Eugene, Oregon, United States of America
| | - Robert A. Wharton
- Department of Entomology, Texas A & M University, College Station, Texas, United States of America
| | - Christian S. Wirkner
- Allgemeine & Spezielle Zoologie, Institut für Biowissenschaften, Universität Rostock, Universitätsplatz 2, Rostock, Germany
| | - James B. Woolley
- Department of Entomology, Texas A & M University, College Station, Texas, United States of America
| | - Matthew J. Yoder
- Illinois Natural History Survey, University of Illinois, Champaign, Illinois, United States of America
| | - Aaron M. Zorn
- Cincinnati Children's Hospital, Division of Developmental Biology, Cincinnati, Ohio, United States of America
| | - Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, South Dakota, United States of America
|
20
|
Courtot M, Meskas J, Diehl AD, Droumeva R, Gottardo R, Jalali A, Taghiyar MJ, Maecker HT, McCoy JP, Ruttenberg A, Scheuermann RH, Brinkman RR. flowCL: ontology-based cell population labelling in flow cytometry. Bioinformatics 2014; 31:1337-9. [PMID: 25481008 DOI: 10.1093/bioinformatics/btu807] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/29/2014] [Accepted: 12/02/2014] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Finding one or more cell populations of interest, such as those correlating to a specific disease, is critical when analysing flow cytometry data. However, labelling of cell populations is not well defined, making it difficult to integrate the output of algorithms to external knowledge sources. RESULTS: We developed flowCL, a software package that performs semantic labelling of cell populations based on their surface markers and applied it to labelling of the Federation of Clinical Immunology Societies Human Immunology Project Consortium lyoplate populations as a use case. CONCLUSION: By providing automated labelling of cell populations based on their immunophenotype, flowCL allows for unambiguous and reproducible identification of standardized cell types. AVAILABILITY AND IMPLEMENTATION: Code, R script and documentation are available under the Artistic 2.0 license through Bioconductor (http://www.bioconductor.org/packages/devel/bioc/html/flowCL.html). CONTACT: rbrinkman@bccrc.ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mélanie Courtot
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Justin Meskas
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Alexander D Diehl
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Radina Droumeva
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Raphael Gottardo
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Adrin Jalali
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Mohammad Jafar Taghiyar
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Holden T Maecker
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - J Philip McCoy
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Alan Ruttenberg
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Richard H Scheuermann
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
| | - Ryan R Brinkman
- Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC V5Z 1L3, Canada, Department of Neurology, University at Buffalo School of Medicine and Biomedical Sciences, Buffalo, NY 14203, USA, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA, Center for Human Immunology, Autoimmunity and Inflammation, National Institutes of Health, Bethesda, MD 20892, USA, School of Dental Medicine, University at Buffalo, NY 14214-8006, USA, J. Craig Venter Institute, La Jolla, CA 92037, USA, Department of Pathology, University of California, San Diego, CA 92093, USA
|
21
|
Courtot M, Brinkman RR, Ruttenberg A. The logic of surveillance guidelines: an analysis of vaccine adverse event reports from an ontological perspective. PLoS One 2014; 9:e92632. [PMID: 24667848 PMCID: PMC3965435 DOI: 10.1371/journal.pone.0092632] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/12/2013] [Accepted: 02/11/2014] [Indexed: 01/06/2023]
Abstract
BACKGROUND When increased rates of adverse events following immunization are detected, regulatory action can be taken by public health agencies. However to be interpreted reports of adverse events must be encoded in a consistent way. Regulatory agencies rely on guidelines to help determine the diagnosis of the adverse events. Manual application of these guidelines is expensive, time consuming, and open to logical errors. Representing these guidelines in a format amenable to automated processing can make this process more efficient. METHODS AND FINDINGS Using the Brighton anaphylaxis case definition, we show that existing clinical guidelines used as standards in pharmacovigilance can be logically encoded using a formal representation such as the Adverse Event Reporting Ontology we developed. We validated the classification of vaccine adverse event reports using the ontology against existing rule-based systems and a manually curated subset of the Vaccine Adverse Event Reporting System. However, we encountered a number of critical issues in the formulation and application of the clinical guidelines. We report these issues and the steps being taken to address them in current surveillance systems, and in the terminological standards in use. CONCLUSIONS By standardizing and improving the reporting process, we were able to automate diagnosis confirmation. By allowing medical experts to prioritize reports such a system can accelerate the identification of adverse reactions to vaccines and the response of regulatory agencies. This approach of combining ontology and semantic technologies can be used to improve other areas of vaccine adverse event reports analysis and should inform both the design of clinical guidelines and how they are used in the future. AVAILABILITY Sufficient material to reproduce our results is available, including documentation, ontology, code and datasets, at http://purl.obolibrary.org/obo/aero.
Affiliation(s)
- Ryan R. Brinkman: BC Cancer Agency, Vancouver, British Columbia, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Alan Ruttenberg: School of Dental Medicine, University at Buffalo, New York, United States of America
|
22
|
Courtot M, Juty N, Knüpfer C, Waltemath D, Zhukova A, Dräger A, Dumontier M, Finney A, Golebiewski M, Hastings J, Hoops S, Keating S, Kell DB, Kerrien S, Lawson J, Lister A, Lu J, Machne R, Mendes P, Pocock M, Rodriguez N, Villeger A, Wilkinson DJ, Wimalaratne S, Laibe C, Hucka M, Le Novère N. Controlled vocabularies and semantics in systems biology. Mol Syst Biol 2011; 7:543. [PMID: 22027554 PMCID: PMC3261705 DOI: 10.1038/msb.2011.77] [Citation(s) in RCA: 214] [Impact Index Per Article: 16.5] [Received: 03/28/2011] [Accepted: 09/07/2011] [Indexed: 01/09/2023]
Abstract
The use of computational modeling to describe and analyze biological systems is at the heart of systems biology. This Perspective discusses the development and use of ontologies that are designed to add semantic information to computational models and simulations. Model structures, simulation descriptions and numerical results can be encoded in structured formats, but there is an increasing need to provide an additional semantic layer. Semantic information adds meaning to components of structured descriptions to help identify and interpret them unambiguously. Ontologies are one of the tools frequently used for this purpose. We describe here three ontologies created specifically to address the needs of the systems biology community. The Systems Biology Ontology (SBO) provides semantic information about the model components. The Kinetic Simulation Algorithm Ontology (KiSAO) supplies information about existing algorithms available for the simulation of systems biology models, their characterization and interrelationships. The Terminology for the Description of Dynamics (TEDDY) categorizes dynamical features of the simulation results and general systems behavior. The provision of semantic information extends a model's longevity and facilitates its reuse. It provides useful insight into the biology of modeled processes, and may be used to make informed decisions on subsequent simulation experiments.
|
23
|
Abstract
Background: Ontology development is a rapidly growing area of research, especially in the life sciences domain. To promote collaboration and interoperability between different projects, the OBO Foundry principles require that these ontologies be open and non-redundant, avoiding duplication of terms through the re-use of existing resources. As current options to do so present various difficulties, a new approach, MIREOT, allows specifying import of single terms. Initial implementations allow for controlled import of selected annotations and certain classes of related terms. Findings: OntoFox (http://ontofox.hegroup.org/) is a web-based system that allows users to input terms, fetch selected properties, annotations, and certain classes of related terms from the source ontologies and save the results using the RDF/XML serialization of the Web Ontology Language (OWL). Compared to an initial implementation of MIREOT, OntoFox allows additional and more easily configurable options for selecting and rewriting annotation properties, and for inclusion of all or a computed subset of terms between low and top level terms. Additional methods for including related classes include a SPARQL-based ontology term retrieval algorithm that extracts terms related to a given set of signature terms and an option to extract the hierarchy rooted at a specified ontology term. OntoFox's output can be directly imported into a developer's ontology. OntoFox currently supports term retrieval from a selection of 15 ontologies accessible via SPARQL endpoints and allows users to extend this by specifying additional endpoints. An OntoFox application in the development of the Vaccine Ontology (VO) is demonstrated. Conclusions: OntoFox provides a timely, publicly available service that offers users different options for collecting terms from external ontologies and making them available for reuse by import into client OWL ontologies.
Affiliation(s)
- Zuoshuang Xiang: Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA
|
24
|
Brinkman RR, Courtot M, Derom D, Fostel JM, He Y, Lord P, Malone J, Parkinson H, Peters B, Rocca-Serra P, Ruttenberg A, Sansone SA, Soldatova LN, Stoeckert CJ, Turner JA, Zheng J. Modeling biomedical experimental processes with OBI. J Biomed Semantics 2010; 1 Suppl 1:S7. [PMID: 20626927 PMCID: PMC2903726 DOI: 10.1186/2041-1480-1-s1-s7] [Citation(s) in RCA: 165] [Impact Index Per Article: 11.8] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Experimental descriptions are typically stored as free text without using standardized terminology, creating challenges in comparison, reproduction and analysis. These difficulties impose limitations on data exchange and information retrieval. RESULTS: The Ontology for Biomedical Investigations (OBI), developed as a global, cross-community effort, provides a resource that represents biomedical investigations in an explicit and integrative framework. Here we detail three real-world applications of OBI, provide detailed modeling information and explain how to use OBI. CONCLUSION: We demonstrate how OBI can be applied to different biomedical investigations to both facilitate interpretation of the experimental process and increase the computational processing and integration within the Semantic Web. The logical definitions of the entities involved allow computers to unambiguously understand and integrate different biological experimental processes and their relevant components. AVAILABILITY: OBI is available at http://purl.obolibrary.org/obo/obi/2009-11-02/obi.owl.
Affiliation(s)
- Dirk Derom: Victoria University of Wellington, New Zealand
- Yongqun He: University of Michigan Medical School, Ann Arbor, USA
- Phillip Lord: School of Computing Science, Newcastle University, UK
- James Malone: The European Bioinformatics Institute, Cambridge, UK
- Bjoern Peters: La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA
- Christian J Stoeckert: Center for Bioinformatics, Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
- Jessica A Turner: Department of Psychiatry and Human Behavior, University of California, Irvine, CA, USA
- Jie Zheng: Center for Bioinformatics, Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
|
25
|
Li C, Courtot M, Le Novère N, Laibe C. BioModels.net Web Services, a free and integrated toolkit for computational modelling software. Brief Bioinform 2009; 11:270-7. [PMID: 19939940 DOI: 10.1093/bib/bbp056] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022]
Abstract
Exchanging and sharing scientific results are essential for researchers in the field of computational modelling. BioModels.net defines agreed-upon standards for model curation. A fundamental one, MIRIAM (Minimum Information Requested in the Annotation of Models), standardises the annotation and curation process of quantitative models in biology. To support this standard, MIRIAM Resources maintains a set of standard data types for annotating models, and provides services for manipulating these annotations. Furthermore, BioModels.net creates controlled vocabularies, such as SBO (Systems Biology Ontology), which strictly indexes, defines and links terms used in Systems Biology. Finally, BioModels Database provides a free, centralised, publicly accessible database for storing, searching and retrieving curated and annotated computational models. Each resource provides a web interface to submit, search, retrieve and display its data. In addition, the BioModels.net team provides a set of Web Services which allows the community to programmatically access the resources. A user is then able to perform remote queries, such as retrieving a model and resolving all its MIRIAM Annotations, as well as getting the details about the associated SBO terms. These web services use established standards. Communications rely on SOAP (Simple Object Access Protocol) messages and the available queries are described in a WSDL (Web Services Description Language) file. Several libraries are provided in order to simplify the development of client software. BioModels.net Web Services bring researchers one step closer to simulating and understanding the entirety of a biological system, by allowing them to retrieve biological models in their own tools, combine queries in workflows and efficiently analyse models.
Affiliation(s)
- Chen Li: Computational Neurobiology Group, at the European Bioinformatics Institute, Hinxton, UK
|
26
|
Le Novère N, Bornstein B, Broicher A, Courtot M, Donizelli M, Dharuri H, Li L, Sauro H, Schilstra M, Shapiro B, Snoep JL, Hucka M. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res 2006; 34:D689-91. [PMID: 16381960 PMCID: PMC1347454 DOI: 10.1093/nar/gkj092] [Citation(s) in RCA: 448] [Impact Index Per Article: 24.9] [Indexed: 12/02/2022]
Abstract
BioModels Database, part of the international initiative BioModels.net, provides access to published, peer-reviewed, quantitative models of biochemical and cellular systems. Each model is carefully curated to verify that it corresponds to the reference publication and gives the proper numerical results. Curators also annotate the components of the models with terms from controlled vocabularies and links to other relevant data resources. This allows the users to search accurately for the models they need. The models can currently be retrieved in the SBML format, and import/export facilities are being developed to extend the spectrum of formats supported by the resource.
Affiliation(s)
- Nicolas Le Novère: European Bioinformatics Institute EMBL, Wellcome-Trust Genome Campus, Hinxton, CB10 1SD, UK
|
27
|
Auteau N, Courtot M, Gervaise S. [Nursing strategy. Thermal treatment, a global management of the atopic child]. Soins Pediatr Pueric 2001:28-30. [PMID: 11949384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
|