1
Bernal-Llinares M, Ferrer-Gómez J, Juty N, Goble C, Wimalaratne SM, Hermjakob H. Identifiers.org: Compact Identifier services in the cloud. Bioinformatics 2021; 37:1781-1782. [PMID: 33031499 PMCID: PMC8289372 DOI: 10.1093/bioinformatics/btaa864] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/05/2020] [Revised: 08/26/2020] [Accepted: 09/23/2020] [Indexed: 12/04/2022] Open
Abstract
Motivation: Since its launch in 2010, Identifiers.org has become an important tool for the annotation and cross-referencing of Life Science data. In 2016, we established the Compact Identifier (CID) scheme (prefix: accession) to generate globally unique identifiers for data resources using their locally assigned accession identifiers. Since then, we have developed and improved services to support the growing need to create, reference and resolve CIDs, in systems ranging from human readable text to cloud-based e-infrastructures, by providing high availability and low-latency cloud-based services, backed by a high-quality, manually curated resource. Results: We describe a set of services that can be used to construct and resolve CIDs in Life Sciences and beyond. We have developed a new front end for accessing the Identifiers.org registry data and APIs to simplify integration of Identifiers.org CID services with third-party applications. We have also deployed the new Identifiers.org infrastructure in a commercial cloud environment, bringing our services closer to the data. Availability and implementation: https://identifiers.org.
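The prefix:accession scheme described in this abstract can be illustrated with a minimal sketch. The function names are hypothetical, the resolver URL form `https://identifiers.org/<prefix>:<accession>` is an assumption not stated in the abstract, and `exampledb` is a made-up prefix used only for illustration.

```python
from typing import Tuple

# Base URL taken from the abstract's availability statement.
RESOLVER_BASE = "https://identifiers.org"


def make_cid(prefix: str, accession: str) -> str:
    """Join a namespace prefix and a local accession into a compact identifier."""
    prefix = prefix.strip()
    accession = accession.strip()
    if not prefix or not accession:
        raise ValueError("both prefix and accession are required")
    if ":" in prefix:
        raise ValueError("prefix must not contain ':'")
    return f"{prefix}:{accession}"


def split_cid(cid: str) -> Tuple[str, str]:
    """Split at the first colon only, since an accession may itself contain colons."""
    prefix, sep, accession = cid.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {cid!r}")
    return prefix, accession


def resolver_url(cid: str) -> str:
    """Build a resolver URL for a CID (URL form is an assumption, see lead-in)."""
    prefix, accession = split_cid(cid)
    return f"{RESOLVER_BASE}/{make_cid(prefix, accession)}"


if __name__ == "__main__":
    print(resolver_url(make_cid("exampledb", "ABC123")))
```

Splitting at the first colon keeps the local accession intact, so the data resource's own identifier is never rewritten by the client.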
Affiliation(s)
- Manuel Bernal-Llinares
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Javier Ferrer-Gómez
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Nick Juty
  - Department of Computer Science, University of Manchester, M139PL Manchester, UK
- Carole Goble
  - Department of Computer Science, University of Manchester, M139PL Manchester, UK
- Sarala M Wimalaratne
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Henning Hermjakob
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, CB10 1SD Cambridge, UK
2
Langenstein M, Hermjakob H, Llinares MB. A decoupled, modular and scriptable architecture for tools to curate data platforms. Bioinformatics 2021; 37:3693-3694. [PMID: 33830216 PMCID: PMC8545344 DOI: 10.1093/bioinformatics/btab233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2020] [Revised: 03/12/2021] [Accepted: 04/07/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Curation is essential for any data platform to maintain the quality of the data it provides. Today, more effective curation tools are often vital to keep up with the rapid growth of existing, maintenance-requiring databases and the amount of newly published information that needs to be surveyed. However, curation interfaces are often complex and challenging to develop further. Therefore, opportunities for experimentation with curation workflows may be lost due to a lack of development resources or a reluctance to change sensitive production systems. RESULTS: We propose a decoupled, modular and scriptable architecture to build new curation tools on top of existing platforms. Our architecture treats the existing platform as a black box. It therefore only relies on its public application programming interfaces (APIs) and web application instead of requiring any changes to the existing infrastructure. As a case study, we have implemented this architecture in cmd-iaso, a curation tool for the identifiers.org registry. With cmd-iaso, we also show that the proposed design's flexibility can be utilised to streamline and enhance the curator's workflow with the platform's existing web interface. AVAILABILITY: The cmd-iaso curation tool is implemented in Python 3.7+ and supports Linux, macOS and Windows. Its source code and documentation are freely available from https://github.com/identifiers-org/cmd-iaso. It is also published as a Docker container at https://hub.docker.com/r/identifiersorg/cmd-iaso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Momo Langenstein
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, UK CB10 1SD
- Henning Hermjakob
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, UK CB10 1SD
- Manuel Bernal Llinares
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge, UK CB10 1SD
3
Ison J, Ménager H, Brancotte B, Jaaniso E, Salumets A, Raček T, Lamprecht AL, Palmblad M, Kalaš M, Chmura P, Hancock JM, Schwämmle V, Ienasescu HI. Community curation of bioinformatics software and data resources. Brief Bioinform 2019; 21:1697-1705. [PMID: 31624831 PMCID: PMC7947956 DOI: 10.1093/bib/bbz075] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/01/2019] [Revised: 05/13/2019] [Accepted: 05/30/2019] [Indexed: 11/13/2022] Open
Abstract
The corpus of bioinformatics resources is huge and expanding rapidly, presenting life scientists with a growing challenge in selecting tools that fit the desired purpose. To address this, the European Infrastructure for Biological Information is supporting a systematic approach towards a comprehensive registry of tools and databases for all domains of bioinformatics, provided under a single portal (https://bio.tools). We describe here the practical means by which scientific communities, including individual developers and projects, through major service providers and research infrastructures, can describe their own bioinformatics resources and share these via bio.tools.
Affiliation(s)
- Jon Ison
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
- Hervé Ménager
  - Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
- Bryan Brancotte
  - Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
- Erik Jaaniso
  - ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
- Ahto Salumets
  - ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
- Tomáš Raček
  - CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic
  - Faculty of Informatics, Masaryk University, Botanická 68a, 602 00 Brno, Czech Republic
- Anna-Lena Lamprecht
  - Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, N-5020 Bergen, Norway
- Piotr Chmura
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen
- John M Hancock
  - ELIXIR-Hub, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Hans-Ioan Ienasescu
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
4
Cousijn H, Kenall A, Ganley E, Harrison M, Kernohan D, Lemberger T, Murphy F, Polischuk P, Taylor S, Martone M, Clark T. A data citation roadmap for scientific publishers. Sci Data 2018; 5:180259. [PMID: 30457573 PMCID: PMC6244190 DOI: 10.1038/sdata.2018.259] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 09/18/2018] [Accepted: 10/04/2018] [Indexed: 12/04/2022] Open
Abstract
This article presents a practical roadmap for scholarly publishers to implement data citation in accordance with the Joint Declaration of Data Citation Principles (JDDCP), a synopsis and harmonization of the recommendations of major science policy bodies. It was developed by the Publishers Early Adopters Expert Group as part of the Data Citation Implementation Pilot (DCIP) project, an initiative of FORCE11.org and the NIH BioCADDIE program. The structure of the roadmap presented here follows the "life of a paper" workflow and includes the categories Pre-submission, Submission, Production, and Publication. The roadmap is intended to be publisher-agnostic so that all publishers can use this as a starting point when implementing JDDCP-compliant data citation. Authors reading this roadmap will also better know what to expect from publishers and how to enable their own data citations to gain maximum impact, as well as complying with what will become increasingly common funder mandates on data transparency.
Affiliation(s)
- Emma Ganley
  - Public Library of Science, San Francisco CA 94111, USA
- Tim Clark
  - University of Virginia, School of Medicine, Charlottesville VA 22908, USA
  - University of Virginia, Data Science Institute, Charlottesville VA 22904, USA
5
Miller RA, Woollard P, Willighagen EL, Digles D, Kutmon M, Loizou A, Waagmeester A, Senger S, Evelo CT. Explicit interaction information from WikiPathways in RDF facilitates drug discovery in the Open PHACTS Discovery Platform. F1000Res 2018; 7:75. [PMID: 30416713 PMCID: PMC6206606 DOI: 10.12688/f1000research.13197.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Accepted: 09/24/2018] [Indexed: 12/11/2022] Open
Abstract
Open PHACTS is a pre-competitive project to answer scientific questions developed recently by the pharmaceutical industry. Having high quality biological interaction information in the Open PHACTS Discovery Platform is needed to answer multiple pathway related questions. To address this, updated WikiPathways data has been added to the platform. This data includes information about biological interactions, such as stimulation and inhibition. The platform's Application Programming Interface (API) was extended with appropriate calls to reference these interactions. These new methods of the Open PHACTS API are available now.
Affiliation(s)
- Ryan A Miller
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
- Daniela Digles
  - Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Martina Kutmon
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
  - Maastricht Center for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Andra Waagmeester
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
  - Micelio, Antwerp, Belgium
- Chris T Evelo
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
  - Maastricht Center for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
  - Open PHACTS Foundation, Science Park, Cambridge, UK
6
Wimalaratne SM, Juty N, Kunze J, Janée G, McMurry JA, Beard N, Jimenez R, Grethe JS, Hermjakob H, Martone ME, Clark T. Uniform resolution of compact identifiers for biomedical data. Sci Data 2018; 5:180029. [PMID: 29737976 PMCID: PMC5944906 DOI: 10.1038/sdata.2018.29] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 09/11/2017] [Accepted: 01/26/2018] [Indexed: 11/09/2022] Open
Abstract
Most biomedical data repositories issue locally unique accession numbers, but do not provide globally unique, machine-resolvable, persistent identifiers for their datasets, as required by publishers wishing to implement data citation in accordance with widely accepted principles. Local accessions may, however, be prefixed with a namespace identifier, providing global uniqueness. Such "compact identifiers" have been widely used in biomedical informatics to support global resource identification with local identifier assignment. We report here on our project to provide robust support for machine-resolvable, persistent compact identifiers in biomedical data citation, by harmonizing the Identifiers.org and N2T.net (Name-To-Thing) meta-resolvers and extending their capabilities. Identifiers.org services hosted at the European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), and N2T.net services hosted at the California Digital Library (CDL), can now resolve any given identifier from over 600 source databases to its original source on the Web, using a common registry of prefix-based redirection rules. We believe these services will be of significant help to publishers and others implementing persistent, machine-resolvable citation of research data.
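The idea of a registry of prefix-based redirection rules can be sketched as a lookup table from namespace prefix to URL template. Everything concrete below is an assumption for illustration: the `{id}` placeholder syntax, the two example prefixes, their `example.org` targets, and the case-insensitive prefix match are hypothetical and do not reproduce the actual Identifiers.org or N2T.net registry.

```python
from typing import Dict

# Hypothetical rules: prefix -> URL template. A real registry would hold
# hundreds of curated entries; these two are invented for illustration.
REDIRECT_RULES: Dict[str, str] = {
    "exampledb": "https://exampledb.example.org/entry/{id}",
    "samplebank": "https://samplebank.example.org/records?acc={id}",
}


def resolve(cid: str, rules: Dict[str, str] = REDIRECT_RULES) -> str:
    """Map a compact identifier to a target URL using prefix-based rules."""
    prefix, sep, accession = cid.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {cid!r}")
    try:
        # Assumption: prefixes are matched case-insensitively.
        template = rules[prefix.lower()]
    except KeyError:
        raise LookupError(f"unknown prefix: {prefix!r}") from None
    # Substitute the local accession unchanged into the provider's template.
    return template.replace("{id}", accession)


if __name__ == "__main__":
    print(resolve("exampledb:X123"))
```

Because two resolvers sharing one rule table apply the same substitution, a given compact identifier leads to the same target regardless of which resolver handles it, which is the point of a common registry.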
Affiliation(s)
- Sarala M Wimalaratne
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Nick Juty
  - University of Manchester, Oxford Road, Manchester M13 9PL, UK
- John Kunze
  - California Digital Library, University of California, Oakland, CA 94612, USA
- Greg Janée
  - California Digital Library, University of California, Oakland, CA 94612, USA
- Julie A McMurry
  - Oregon Health and Science University, Portland, OR 97239, USA
- Niall Beard
  - University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Rafael Jimenez
  - ELIXIR, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Tim Clark
  - Massachusetts General Hospital, Boston, MA 02114, USA
  - Harvard Medical School, Boston, MA 02115, USA
7
Callahan A, Abeyruwan SW, Al-Ali H, Sakurai K, Ferguson AR, Popovich PG, Shah NH, Visser U, Bixby JL, Lemmon VP. RegenBase: a knowledge base of spinal cord injury biology for translational research. Database: The Journal of Biological Databases and Curation 2016; 2016:baw040. [PMID: 27055827 PMCID: PMC4823819 DOI: 10.1093/database/baw040] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 09/03/2015] [Accepted: 03/03/2016] [Indexed: 12/20/2022]
Abstract
Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large scale integration with existing bioinformatics resources and subsequent analysis infeasible. The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and also to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL: http://regenbase.org
Affiliation(s)
- Alison Callahan
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305
- Hassan Al-Ali
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Kunie Sakurai
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
- Adam R Ferguson
  - Brain and Spinal Injury Center (BASIC), Department of Neurological Surgery, University of California, San Francisco; San Francisco Veterans Affairs Medical Center, San Francisco, CA 94143
- Phillip G Popovich
  - Center for Brain and Spinal Cord Repair and the Department of Neuroscience, The Ohio State University, Columbus, OH 43210
- Nigam H Shah
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305
- Ubbo Visser
  - Department of Computer Science, University of Miami, Coral Gables, FL 33146
- John L Bixby
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
  - Center for Computational Science, University of Miami, Coral Gables, FL 33146
  - Department of Cellular and Molecular Pharmacology, University of Miami School of Medicine, Miami, FL 33136, USA
- Vance P Lemmon
  - Miami Project to Cure Paralysis, University of Miami School of Medicine, Miami, FL 33136
  - Center for Computational Science, University of Miami, Coral Gables, FL 33146
8
Katayama T, Wilkinson MD, Aoki-Kinoshita KF, Kawashima S, Yamamoto Y, Yamaguchi A, Okamoto S, Kawano S, Kim JD, Wang Y, Wu H, Kano Y, Ono H, Bono H, Kocbek S, Aerts J, Akune Y, Antezana E, Arakawa K, Aranda B, Baran J, Bolleman J, Bonnal RJ, Buttigieg PL, Campbell MP, Chen YA, Chiba H, Cock PJ, Cohen KB, Constantin A, Duck G, Dumontier M, Fujisawa T, Fujiwara T, Goto N, Hoehndorf R, Igarashi Y, Itaya H, Ito M, Iwasaki W, Kalaš M, Katoda T, Kim T, Kokubu A, Komiyama Y, Kotera M, Laibe C, Lapp H, Lütteke T, Marshall MS, Mori T, Mori H, Morita M, Murakami K, Nakao M, Narimatsu H, Nishide H, Nishimura Y, Nystrom-Persson J, Ogishima S, Okamura Y, Okuda S, Oshita K, Packer NH, Prins P, Ranzinger R, Rocca-Serra P, Sansone S, Sawaki H, Shin SH, Splendiani A, Strozzi F, Tadaka S, Toukach P, Uchiyama I, Umezaki M, Vos R, Whetzel PL, Yamada I, Yamasaki C, Yamashita R, York WS, Zmasek CM, Kawamoto S, Takagi T. BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains. J Biomed Semantics 2014; 5:5. [PMID: 24495517 PMCID: PMC3978116 DOI: 10.1186/2041-1480-5-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 05/16/2013] [Accepted: 11/26/2013] [Indexed: 01/24/2023] Open
Abstract
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
Affiliation(s)
- Toshiaki Katayama
  - Database Center for Life Science, Research Organization of Information and Systems, 2-11-16, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.