1
Alper P, Dĕd V, Herzinger S, Grouès V, Peter S, Lebioda J, Ebermann L, Popleteeva M, Barry ND, Welter D, Ghosh S, Becker R, Schneider R, Gu W, Trefois C, Satagopam V. DS-PACK: Tool assembly for the end-to-end support of controlled access human data sharing. Sci Data 2024; 11:501. [PMID: 38750048 PMCID: PMC11096168 DOI: 10.1038/s41597-024-03326-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Accepted: 04/29/2024] [Indexed: 05/18/2024] Open
Abstract
The EU General Data Protection Regulation (GDPR) requirements have prompted a shift from centralised controlled access genome-phenome archives to federated models for sharing sensitive human data. In a data-sharing federation, a central node facilitates data discovery; meanwhile, distributed nodes are responsible for handling data access requests, concluding agreements with data users and providing secure access to the data. Research institutions that want to become part of such federations often lack the resources to set up the required controlled access processes. The DS-PACK tool assembly is a reusable, open-source middleware solution that semi-automates controlled access processes end-to-end, from data submission to access. Data protection principles are engraved into all components of the DS-PACK assembly. DS-PACK centralises access control management and distributes access control enforcement with support for data access via cloud-based applications. DS-PACK is in production use at the ELIXIR Luxembourg data hosting platform, combined with an operational model including legal facilitation and data stewardship.
Affiliation(s)
- Pinar Alper
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Vilém Dĕd
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Sascha Herzinger
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Valentin Grouès
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Sarah Peter
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Jacek Lebioda
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Linda Ebermann
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Marina Popleteeva
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Nene Djenaba Barry
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Danielle Welter
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Soumyabrata Ghosh
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Regina Becker
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Reinhard Schneider
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Wei Gu
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Christophe Trefois
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Venkata Satagopam
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
2
Brady A, Charbonneau A, Grossman RL, Creasy HH, Renner R, Pihl T, Otridge J, Kim E, Barnholtz-Sloan JS, Kerlavage AR. NCI Cancer Research Data Commons: Core Standards and Services. Cancer Res 2024; 84:1384-1387. [PMID: 38488505 PMCID: PMC11067691 DOI: 10.1158/0008-5472.can-23-2655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 01/05/2024] [Accepted: 02/28/2024] [Indexed: 05/03/2024]
Abstract
The NCI Cancer Research Data Commons (CRDC) is a collection of data commons, analysis platforms, and tools that make existing cancer data more findable and accessible by the cancer research community. In practice, the two biggest hurdles to finding and using data for discovery are the wide variety of models and ontologies used to describe data, and the dispersed storage of that data. Here, we outline core CRDC services to aggregate descriptive information from multiple studies for findability via a single interface and to provide a single access method that spans multiple data commons. See related articles by Wang et al., p. 1388, Pot et al., p. 1396, and Kim et al., p. 1404.
Affiliation(s)
- Arthur Brady
  - General Dynamics Information Technology, Falls Church, Virginia
- Robert L. Grossman
  - University of Chicago, Center for Translational Data Science, Chicago, Illinois
- Heather H. Creasy
  - Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
- Robinette Renner
  - Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
- Todd Pihl
  - Frederick National Laboratory for Cancer Research, Frederick, Maryland
- John Otridge
  - Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Erika Kim
  - Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
- Jill S. Barnholtz-Sloan
  - Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
  - Trans Divisional Research Program, Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland
- Anthony R. Kerlavage
  - Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
3
Jentsch M, Schneider-Lunitz V, Taron U, Braun M, Ishaque N, Wagener H, Conrad C, Twardziok S. Creating cloud platforms for supporting FAIR data management in biomedical research projects. F1000Res 2024; 13:8. [PMID: 38779317 PMCID: PMC11109697 DOI: 10.12688/f1000research.140624.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2024] [Indexed: 05/25/2024] Open
Abstract
Biomedical research projects are becoming increasingly complex and require technological solutions that support all phases of the data lifecycle and application of the FAIR principles. At the Berlin Institute of Health (BIH), we have developed and established a flexible and cost-effective approach to building customized cloud platforms for supporting research projects. The approach is based on a microservice architecture and on the management of a portfolio of supported services. On this basis, we created and maintained cloud platforms for several international research projects. In this article, we present our approach and argue that building customized cloud platforms can offer multiple advantages over using multi-project platforms. Our approach is transferable to other research environments and can be easily adapted by other projects and other service providers.
Affiliation(s)
- Marcel Jentsch
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Valentin Schneider-Lunitz
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Ulrike Taron
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Martin Braun
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Naveed Ishaque
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Harald Wagener
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Christian Conrad
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Sven Twardziok
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
4
Almeida JR, Zúquete A, Pazos A, Oliveira JL. A federated authentication schema among multiple identity providers. Heliyon 2024; 10:e28560. [PMID: 38590890 PMCID: PMC10999912 DOI: 10.1016/j.heliyon.2024.e28560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 03/08/2024] [Accepted: 03/20/2024] [Indexed: 04/10/2024] Open
Abstract
Single Sign-On (SSO) methods are the primary solution to authenticate users across multiple web systems. These mechanisms streamline the authentication procedure by avoiding duplicate developments of authentication modules for each application. Besides, these mechanisms also provide convenience to the end-user by keeping the user authenticated when switching between different contexts. To ensure this cross-application authentication, SSO relies on an Identity Provider (IdP), which is commonly set up and managed by each institution that needs to enforce SSO internally. However, the solution is not so straightforward when several institutions need to cooperate in a unique ecosystem. This could be tackled by centralizing the authentication mechanisms in one of the involved entities, a solution raising responsibilities that may be difficult for peers to accept. Moreover, this solution is not appropriate for dynamic groups, where peers may join or leave frequently. In this paper, we propose an architecture that uses a trusted third-party service to authenticate multiple entities, ensuring the isolation of the user's attributes between this service and the institutional SSO systems. This architecture was validated in the EHDEN Portal, which includes web tools and services of this European health project, to establish a Federated Authentication schema.
Affiliation(s)
- João Rafael Almeida
  - DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal
  - Department of Computation, University of A Coruña, A Coruña, Spain
- André Zúquete
  - DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal
- Alejandro Pazos
  - Department of Computation, University of A Coruña, A Coruña, Spain
5
Insana G, Ignatchenko A, Martin M, Bateman A. MBDBMetrics: an online metrics tool to measure the impact of biological data resources. BIOINFORMATICS ADVANCES 2023; 3:vbad180. [PMID: 38130879 PMCID: PMC10733715 DOI: 10.1093/bioadv/vbad180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 11/13/2023] [Indexed: 12/23/2023]
Abstract
Motivation: There now exist thousands of molecular biology databases covering every aspect of biological data. This database infrastructure takes significant effort and funding to develop and maintain. The creators of these databases need to make strong justifications to funders to prove their impact or importance. There are many publication metrics and tools available such as Google Scholar to measure citation impact or AltMetrics covering multiple measures including social media coverage. Results: In this article, we describe a series of novel impact metrics that have been applied initially to the UniProt database, and now made available via a Google Colab to enable any molecular biology resource to gain several additional metrics. These metrics, powered by freely available APIs from Europe PubMedCentral and SureCHEMBL, cover mentions of the resource in full text articles, including which section of the paper the mention occurs in, grant acknowledgements and mentions in patent applications. This tool, which we call MBDBMetrics, is a useful adjunct to existing tools. Availability and implementation: The MBDBMetrics tool is available at the following locations: https://colab.research.google.com/drive/1aEmSQR9DGQIZmHAIuQV9mLv7Mw9Ppkin and https://github.com/g-insana/MBDBMetrics.
Affiliation(s)
- Giuseppe Insana
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Alex Ignatchenko
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Maria Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
6
de Visser C, Johansson LF, Kulkarni P, Mei H, Neerincx P, Joeri van der Velde K, Horvatovich P, van Gool AJ, Swertz MA, 't Hoen PAC, Niehues A. Ten quick tips for building FAIR workflows. PLoS Comput Biol 2023; 19:e1011369. [PMID: 37768885 PMCID: PMC10538699 DOI: 10.1371/journal.pcbi.1011369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/30/2023] Open
Abstract
Research data is accumulating rapidly and with it the challenge of fully reproducible science. As a consequence, implementation of high-quality management of scientific data has become a global priority. The FAIR (Findable, Accessible, Interoperable and Reusable) principles provide practical guidelines for maximizing the value of research data; however, processing data using workflows (systematic executions of a series of computational tools) is equally important for good data management. The FAIR principles have recently been adapted to Research Software (FAIR4RS Principles) to promote the reproducibility and reusability of any type of research software. Here, we propose a set of 10 quick tips, drafted by experienced workflow developers, that will help researchers to apply FAIR4RS principles to workflows. The tips have been arranged according to the FAIR acronym, clarifying the purpose of each tip with respect to the FAIR4RS principles. Altogether, these tips can be seen as practical guidelines for workflow developers who aim to contribute to more reproducible and sustainable computational science, aiming to positively impact the open science and FAIR community.
Affiliation(s)
- Casper de Visser
  - Medical BioSciences Department, Radboud University Medical Center, Nijmegen, the Netherlands
- Lennart F. Johansson
  - Genomics Coordination Center and Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Purva Kulkarni
  - Medical BioSciences Department, Radboud University Medical Center, Nijmegen, the Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
- Hailiang Mei
  - Sequencing Analysis Support Core, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
- Pieter Neerincx
  - Genomics Coordination Center and Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- K. Joeri van der Velde
  - Genomics Coordination Center and Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Péter Horvatovich
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, Groningen, the Netherlands
- Alain J. van Gool
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
- Morris A. Swertz
  - Genomics Coordination Center and Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Peter A. C. 't Hoen
  - Medical BioSciences Department, Radboud University Medical Center, Nijmegen, the Netherlands
- Anna Niehues
  - Medical BioSciences Department, Radboud University Medical Center, Nijmegen, the Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
7
Justesen TF, Gögenur I, Tarpgaard LS, Pfeiffer P, Qvortrup C. Evaluating the efficacy and safety of neoadjuvant pembrolizumab in patients with stage I-III MMR-deficient colon cancer: a national, multicentre, prospective, single-arm, phase II study protocol. BMJ Open 2023; 13:e073372. [PMID: 37349100 PMCID: PMC10314641 DOI: 10.1136/bmjopen-2023-073372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 06/08/2023] [Indexed: 06/24/2023] Open
Abstract
INTRODUCTION: Within the last two decades, major advances have been made in the surgical approach for patients with colorectal cancer. However, to this day we face considerable challenges in reducing surgery-related complications and improving long-term oncological outcomes. Unprecedented response rates have been achieved in studies investigating immunotherapy in patients with mismatch repair deficient (dMMR) colorectal cancer. This has raised the question of whether neoadjuvant immunotherapy may change the standard of care for localised dMMR colon cancer and pave the way for organ-sparing treatment. METHODS AND ANALYSIS: This is an investigator-initiated, multicentre, prospective, single-arm, phase II study in patients with stage I-III dMMR colon cancer scheduled for intended curative surgery. Eighty-five patients will be treated with one dose of pembrolizumab (4 mg/kg) and within 5 weeks will undergo a re-evaluation with an endoscopy and a CT scan (to assess tumour response) before standard resection of the tumour. The primary endpoint is the number of patients with pathological complete response, and secondary endpoints include safety (number and severity of adverse events) and postoperative surgical complications. In addition, we aspire to identify predictive biomarkers that can point out patients who achieve pathological complete response. ETHICS AND DISSEMINATION: The Regional Committee for Health Research and Ethics and the Danish Medicines Agency have approved this study. The study will be performed according to the Helsinki II declaration. Written informed consent will be obtained from all participants. The results of the study will be submitted to peer-reviewed journals for publication and presented at international congresses. TRIAL REGISTRATION NUMBER: NCT05662527.
Affiliation(s)
- Ismail Gögenur
  - Center for Surgical Science, Zealand University Hospital, Koge, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Line Schmidt Tarpgaard
  - Department of Oncology, Odense University Hospital, Odense, Denmark
  - Department of Clinical Research, University of Southern Denmark, Odense, Denmark
- Per Pfeiffer
  - Department of Oncology, Odense University Hospital, Odense, Denmark
  - Department of Clinical Research, University of Southern Denmark, Odense, Denmark
8
Devignes MD, Smaïl-Tabbone M, Dhondge H, Dolcemascolo R, Gavaldá-García J, Higuera-Rodriguez RA, Kravchenko A, Roca Martínez J, Messini N, Pérez-Ràfols A, Pérez Ropero G, Sperotto L, Chauvot de Beauchêne I, Vranken W. Experiences with a training DSW knowledge model for early-stage researchers. OPEN RESEARCH EUROPE 2023; 3:97. [PMID: 37645489 PMCID: PMC10445825 DOI: 10.12688/openreseurope.15609.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/30/2023] [Indexed: 08/31/2023]
Abstract
Background: Data management is fast becoming an essential part of scientific practice, driven by open science and FAIR (findable, accessible, interoperable, and reusable) data sharing requirements. Whilst data management plans (DMPs) are clear to data management experts and data stewards, understandings of their purpose and creation are often obscure to the producers of the data, which in academic environments are often PhD students. Methods: Within the RNAct EU Horizon 2020 ITN project, we engaged the 10 RNAct early-stage researchers (ESRs) in a training project aimed at formulating a DMP. To do so, we used the Data Stewardship Wizard (DSW) framework and modified the existing Life Sciences Knowledge Model into a simplified version aimed at training young scientists, with computational or experimental backgrounds, in core data management principles. We collected feedback from the ESRs during this exercise. Results: Here, we introduce our new life-sciences training DMP template for young scientists. We report and discuss our experiences as principal investigators (PIs) and ESRs during this project and address the typical difficulties that are encountered in developing and understanding a DMP. Conclusions: We found that the DS-wizard can also be an appropriate tool for DMP training, to get terminology and concepts across to researchers. A full training in addition requires an upstream step to present basic DMP concepts and a downstream step to publish a dataset in a (public) repository. Overall, the DS-Wizard tool was essential for our DMP training and we hope our efforts can be used in other projects.
Affiliation(s)
- Roswitha Dolcemascolo
  - Institute for Integrative Systems Biology (I2SysBio), CSIC - University of Valencia, Paterna, 46980, Spain
  - Department of Biotechnology, Polytechnic University of Valencia, Valencia, 46022, Spain
- Jose Gavaldá-García
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, 1050, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- R. Anahí Higuera-Rodriguez
  - Dynamic Biosensors GmbH, Munich, 81379, Germany
  - Department of Physics, School of Natural Sciences, Technical University of Munich, Garching, 85748, Germany
- Anna Kravchenko
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, F-5400, France
- Joel Roca Martínez
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, 1050, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
- Niki Messini
  - Department of Bioscience, School of Natural Sciences, Technical University of Munich, Garching, 85748, Germany
- Anna Pérez-Ràfols
  - Giotto Biotech s.r.l., Florence, 50019, Italy
  - Magnetic Resonance Center (CERM), Department of Chemistry “Ugo Schiff”, University of Florence, Florence, 50019, Italy
- Guillermo Pérez Ropero
  - Department of Chemistry-BMC, Uppsala University, Uppsala, 75123, Sweden
  - Ridgeview Instruments AB, Uppsala, 75237, Sweden
- Luca Sperotto
  - Department of Bioscience, School of Natural Sciences, Technical University of Munich, Garching, 85748, Germany
- Wim Vranken
  - Interuniversity Institute of Bioinformatics in Brussels, VUB/ULB, Brussels, 1050, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, 1050, Belgium
9
Grossman RL. Ten lessons for data sharing with a data commons. Sci Data 2023; 10:120. [PMID: 36878917 PMCID: PMC9988927 DOI: 10.1038/s41597-023-02029-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/10/2022] [Accepted: 02/17/2023] [Indexed: 03/08/2023] Open
Affiliation(s)
- Robert L Grossman
  - University of Chicago, Center for Translational Data Science, Chicago, IL, 60615, USA
10
Hooft RW, Harrison E, Martin CS. The road to success: drawing parallels between 'road' and 'research data' infrastructures to foster understanding between service providers, funders and policymakers. F1000Res 2023; 12:ELIXIR-88. [PMID: 37065508 PMCID: PMC10102711 DOI: 10.12688/f1000research.128167.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/13/2022] [Indexed: 01/24/2023] Open
Abstract
Background: The work of data research infrastructure operators is poorly understood, yet the services they provide are used by millions of scientists across the planet. Policy and implications: As the data services and the underlying infrastructure are typically funded through the public purse, it is essential that policymakers, research funders, experts reviewing funding proposals, and possibly even end-users are equipped with a good understanding of the daily tasks of service providers. Recommendations: We suggest drawing parallels between research data infrastructure and road infrastructure. To trigger the imagination and foster understanding, this policy brief contains a table of corresponding aspects of the two classes of infrastructure. Conclusions: Just as economists and specialist evaluators are typically brought in to inform policies and funding decisions for road infrastructure, we encourage this to also be done for research infrastructures.
Affiliation(s)
- Rob W.W. Hooft
  - Dutch Techcentre for Life Sciences, Utrecht, 3521 AL, The Netherlands
11
Hooft RW, Harrison E, Martin CS. The road to success: drawing parallels between 'road' and 'research data' infrastructures to foster understanding between service providers, funders and policymakers. F1000Res 2023; 12:ELIXIR-88. [PMID: 37065508 PMCID: PMC10102711 DOI: 10.12688/f1000research.128167.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/18/2023] [Indexed: 08/25/2023] Open
Abstract
Background: The work of data research infrastructure operators is poorly understood, yet the services they provide are used by millions of scientists across the planet. Policy and implications: As the data services and the underlying infrastructure are typically funded through the public purse, it is essential that policymakers, research funders, experts reviewing funding proposals, and possibly even end-users are equipped with a good understanding of the daily tasks of service providers. Recommendations: We suggest drawing parallels between research data infrastructure and road infrastructure. To trigger the imagination and foster understanding, this policy brief contains a table of corresponding aspects of the two classes of infrastructure, and a table of policy implications. Conclusions: Just as economists and specialist evaluators are typically brought in to inform policies and funding decisions for road infrastructure, we encourage this to also be done for research infrastructures.
Affiliation(s)
- Rob W.W. Hooft
  - Dutch Techcentre for Life Sciences, Utrecht, 3521 AL, The Netherlands
12
Arend D, Scholz U, Lange M. The Plant Phenomics and Genomics Research Data Repository: An On-Premise Approach for FAIR-Compliant Data Acquisition. Methods Mol Biol 2023; 2703:3-22. [PMID: 37646933 DOI: 10.1007/978-1-0716-3389-2_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
The FAIR data principle as a commitment to support long-term research data management is widely accepted in the scientific community. However, although many established infrastructures provide comprehensive and long-term stable services and platforms, a large quantity of research data is still hidden. Currently, high-throughput plant genomics and phenomics technologies are producing research data in abundance, the storage of which is not covered by established core databases. This concerns the data volume, for example, time series of images or high-resolution hyperspectral data; the quality of data formatting and annotation, e.g., with regard to structure and annotation specifications of core databases; uncovered data domains; or organizational constraints prohibiting primary data storage outside institutional boundaries. To share these potentially dark data in a FAIR way and master these challenges the ELIXIR Germany/de.NBI service Plant Genomic and Phenomics Research Data Repository (PGP) implements an on-premise approach, which allows research data to be kept in place and wrapped in FAIR-aware software infrastructure. In this chapter, the e!DAL infrastructure software and the PGP repository are presented as best practice on how to easily setup FAIR-compliant and intuitive research data services.
Affiliation(s)
- Daniel Arend
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, OT Gatersleben, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, OT Gatersleben, Germany
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, OT Gatersleben, Germany
13
Ross-Hellauer T, Klebel T, Bannach-Brown A, Horbach SP, Jabeen H, Manola N, Metodiev T, Papageorgiou H, Reczko M, Sansone SA, Schneider J, Tijdink J, Vergoulis T. TIER2: enhancing Trust, Integrity and Efficiency in Research through next-level Reproducibility. RESEARCH IDEAS AND OUTCOMES 2022. [DOI: 10.3897/rio.8.e98457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022] Open
Abstract
Lack of reproducibility of research results has become a major theme in recent years. As we emerge from the COVID-19 pandemic, economic pressures and exposed consequences of lack of societal trust in science make addressing reproducibility of urgent importance. TIER2 is a new international project funded by the European Commission under their Horizon Europe programme. Covering three broad research areas (social, life and computer sciences) and two cross-disciplinary stakeholder groups (research publishers and funders) to systematically investigate reproducibility across contexts, TIER2 will significantly boost knowledge on reproducibility, create tools, engage communities, implement interventions and policy across different contexts to increase re-use and overall quality of research results in the European Research Area and global R&I, and consequently increase trust, integrity and efficiency in research.
|
14
|
Data platforms for open life sciences-A systematic analysis of management instruments. PLoS One 2022; 17:e0276204. [PMID: 36282849 PMCID: PMC9595524 DOI: 10.1371/journal.pone.0276204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 10/02/2022] [Indexed: 11/05/2022] Open
Abstract
Open data platforms are interfaces between data demand of and supply from their users. Yet, data platform providers frequently struggle to aggregate data to suit their users’ needs and to establish a high intensity of data exchange in a collaborative environment. Here, using open life science data platforms as an example for a diverse data structure, we systematically categorize these platforms based on their technology intermediation and the range of domains they cover to derive general and specific success factors for their management instruments. Our qualitative content analysis is based on 39 in-depth interviews with experts employed by data platforms and external stakeholders. We thus complement peer initiatives which focus solely on data quality, by additionally highlighting the data platforms’ role to enable data utilization for innovative output. Based on our analysis, we propose a clearly structured and detailed guideline for seven management instruments. This guideline helps to establish and operationalize data platforms and to best exploit the data provided. Our findings support further exploitation of the open innovation potential in the life sciences and beyond.
|
15
|
Vescovi R, Chard R, Saint ND, Blaiszik B, Pruyne J, Bicer T, Lavens A, Liu Z, Papka ME, Narayanan S, Schwarz N, Chard K, Foster IT. Linking scientific instruments and computation: Patterns, technologies, and experiences. PATTERNS (NEW YORK, N.Y.) 2022; 3:100606. [PMID: 36277824 PMCID: PMC9583115 DOI: 10.1016/j.patter.2022.100606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 08/07/2022] [Accepted: 09/14/2022] [Indexed: 11/07/2022]
Abstract
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Thus, methods are required for configuring and running distributed computing pipelines (what we call flows) that link instruments, computers (e.g., for analysis, simulation, artificial intelligence [AI] model training), edge computing (e.g., for analysis), data stores, metadata catalogs, and high-speed networks. We review common patterns associated with such flows and describe methods for instantiating these patterns. We present experiences with the application of these methods to the processing of data from five different scientific instruments, each of which engages powerful computers for data inversion, model training, or other purposes. We also discuss implications of such methods for operators and users of scientific facilities.
Patterns for linking instruments and computers for online analysis are reviewed.
Methods are presented for capturing such “flows” in reusable forms.
The use of Globus automation services to run flows is described.
Implications of these methods for scientists and facilities are discussed.
The industrial revolution transformed society via large-scale automation of manufacturing. Today, AI- and robotics-driven automation of scientific research seems set to usher in a new era of accelerated discovery. But just as the industrial revolution depended on new replicable and scalable manufacturing processes and methods for delivering the copious mechanical power required by those processes, so the automated discovery revolution demands new methods for implementing research automation processes and for connecting those processes to computing and data power. We present here new methods that address these essential needs by allowing scientists to capture common automation patterns in reusable flows and to embed such flows in a global trust, data, and computing fabric that enables instant access to powerful AI, simulation, and other computational capabilities. We use examples from synchrotron light sources to show how these methods can be realized in software and applied at scale.
Affiliation(s)
- Rafael Vescovi
- Data Science and Learning Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA
| | - Ryan Chard
- Data Science and Learning Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA
| | - Nickolaus D Saint
- Globus, University of Chicago, 5730 S. Ellis Ave., Chicago, IL 60615, USA
| | - Ben Blaiszik
- Data Science and Learning Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA.,Globus, University of Chicago, 5730 S. Ellis Ave., Chicago, IL 60615, USA
| | - Jim Pruyne
- Data Science and Learning Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA.,Globus, University of Chicago, 5730 S. Ellis Ave., Chicago, IL 60615, USA
| | - Tekin Bicer
- Data Science and Learning Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA.,X-ray Science Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA
| | - Alex Lavens
- Structural Biology Center, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA
| | - Zhengchun Liu
- Data Science and Learning Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA
| | - Michael E Papka
- Argonne Leadership Computing Facility, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA.,Department of Computer Science, University of Illinois Chicago, 1200 W. Harrison St., Chicago, IL 60607, USA
| | - Suresh Narayanan
- X-ray Science Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA
| | - Nicholas Schwarz
- X-ray Science Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA
| | - Kyle Chard
- Data Science and Learning Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA.,Department of Computer Science, University of Chicago, 5730 S. Ellis Ave., Chicago, IL 60615, USA
| | - Ian T Foster
- Data Science and Learning Division, Argonne National Laboratory, 9700 S. Cass Ave., Lemont, IL 60439, USA.,Department of Computer Science, University of Chicago, 5730 S. Ellis Ave., Chicago, IL 60615, USA
|
16
|
De Geest P, Coppens F, Soiland-Reyes S, Eguinoa I, Leo S. Enhancing RDM in Galaxy by integrating RO-Crate. RESEARCH IDEAS AND OUTCOMES 2022. [DOI: 10.3897/rio.8.e95164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
We introduce how the Galaxy research environment (Jalili et al. 2020) integrates with RO-Crate as an implementation of Findable Accessible Interoperable Reusable Digital Objects (FAIR Digital Objects / FDO) (Wilkinson et al. 2016, Schultes and Wittenburg 2018) and how using RO-Crate as an exchange mechanism of workflows and their execution history helps integrate Galaxy with the wider ecosystem of ELIXIR (Harrow et al. 2021) and the European Open Science Cloud (EOSC-Life) to enable FAIR and reproducible data analysis.
RO-Crate (Soiland-Reyes et al. 2022) is a generic packaging format containing datasets and their description using standards for FAIR Linked Data. The format is based on schema.org (Guha et al. 2016) annotations in JSON-LD, which allows for rich metadata representation. The RO-Crate effort aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments.
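As a rough illustration of the packaging format described above, the following sketch builds a minimal metadata document in the shape of an `ro-crate-metadata.json` file using only the Python standard library. It assumes the RO-Crate 1.1 layout (a metadata descriptor entity plus a root dataset entity in a JSON-LD `@graph`); the helper name `build_minimal_crate`, the dataset name, and the file path are hypothetical and are not taken from the cited work, which uses the ro-crate-py library rather than hand-written JSON.

```python
import json


def build_minimal_crate(name, description, files):
    """Return an RO-Crate 1.1 style metadata document as a plain dict.

    Hypothetical helper for illustration only; real projects would
    normally use a library such as ro-crate-py instead.
    """
    # Metadata descriptor: identifies the file itself and points at the root dataset.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # Root dataset: schema.org Dataset describing the crate as a whole.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "hasPart": [{"@id": path} for path in files],
    }
    # One File entity per payload file referenced from the root dataset.
    file_entities = [{"@id": path, "@type": "File"} for path in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


# Example values are invented for the sketch.
crate = build_minimal_crate(
    "Example analysis",
    "Hypothetical dataset used to illustrate the metadata layout",
    ["results/table.csv"],
)
print(json.dumps(crate, indent=2))
```

Writing the resulting JSON next to the listed files under the name `ro-crate-metadata.json` is what turns an ordinary folder into a crate in this layout; richer entities (authors, licenses, workflows) would be added to the same `@graph`.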
The RO-Crate community brings together practitioners from very different backgrounds, and with different motivations and use cases. Among the core target users are:
researchers engaged with computation and data-intensive, workflow-driven analysis;
digital repository managers and infrastructure providers;
individual researchers looking for a straightforward tool or how-to guide to “FAIRify” their data;
data stewards supporting research projects in creating and curating datasets.
Given the wide applicability of RO-Crate and the lack of practical implementations of FDOs, ELIXIR (Harrow et al. 2021) co-opted this initiative as the project to define a common format for research data exchange and repository entries. Thus, during the last year it’s been implemented in a wide range of services, such as: WorkflowHub (Goble et al. 2021) (a registry for describing, sharing and publishing scientific computational workflows) uses RO-Crates as an exchange format to improve reproducibility of computational workflows that follow the Workflow RO-Crate profile (Bacall et al. 2022); LifeMonitor (Leo et al. 2022) (a service to support the sustainability of computational workflows being developed as part of the EOSC-Life project) uses RO-Crate as an exchange format for describing test suites associated with workflows.
Tools have been developed to support the previously mentioned use cases and to increase the general usability of RO-Crates by providing user-friendly programmatic libraries for consuming and producing RO-Crates (ro-crate-py De Geest et al. 2022, ro-crate-ruby Bacall and Whitwell 2022, ro-crate-js Lynch et al. 2021).
The Galaxy project provides a research environment with data analysis and data management functionalities as a multi-user platform, aiming to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. As such, it stores not just analysis-related data but also the complete analytical workflow, including its metadata. The internal data model involves the history entity, including all steps performed in a specific analysis, and the workflow entity, defining the structure of an analytical pipeline. From the start, Galaxy has aimed to enable reproducible analyses by providing capabilities to export (and import) all the analysis history details and workflow data and metadata in a FAIR way. As such, it helps its users with daily research data management. The Galaxy community is continuously improving and adding features, and the integration of the FAIR Digital Object principles is a natural next step in this.
To be able to support these FDOs, Galaxy leverages the RO-Crate Python client library (De Geest et al. 2022) and provides multiple entry points to import and export different research data objects representing its internal entities and associated metadata. These objects include:
a workflow definition, which is used to share/publish the details of an analysis pipeline, including the graph of tools that need to be executed, and metadata about the data types required
individual data files or a collection of datasets related to an analysis history
a compressed archive of the entire analysis history including the metadata associated with it such as the tools used, their versions, the parameters chosen, workflow invocation related metadata, inputs, outputs, license, author, CWLProv description (Khan et al. 2019) of the workflow, contextual references in the form of Digital Object Identifiers (DOIs), ‘EMBRACE Data And Methods’ ontology (EDAM) terms (Ison et al. 2013), etc.
The adoption of RO-Crate by Galaxy allows a standardised exchange of FDOs with other platforms in the ELIXIR Tools ecosystem, such as WorkflowHub and LifeMonitor. Integrating RO-Crate deeply into Galaxy and offering import and export options of various Galaxy objects such as Research Objects allows for increased standardisation, improved Research Data Management (RDM) functionalities, smoother user experience (UX) as well as improved interoperability with other systems. The integration in a platform used by biologists to do data-intensive analysis facilitates the publication of workflows and workflow invocations for all skill levels and democratises the ability to perform Open Science.
|
17
|
Arend D, Psaroudakis D, Memon JA, Rey-Mazón E, Schüler D, Szymanski JJ, Scholz U, Junker A, Lange M. From data to knowledge - big data needs stewardship, a plant phenomics perspective. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 111:335-347. [PMID: 35535481 DOI: 10.1111/tpj.15804] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/24/2022] [Revised: 05/02/2022] [Accepted: 05/06/2022] [Indexed: 06/14/2023]
Abstract
The research data life cycle from project planning to data publishing is an integral part of current research. Until the last decade, researchers were responsible for all associated phases in addition to the actual research and were assisted only at certain points by IT or bioinformaticians. Starting with advances in sequencing, the automation of analytical methods in all life science fields, including in plant phenotyping, has led to ever-increasing amounts of ever more complex data. The tasks associated with these challenges now often exceed the expertise of and infrastructure available to scientists, leading to an increased risk of data loss over time. The IPK Gatersleben has one of the world's largest germplasm collections and two decades of experience in crop plant research data management. In this article we show how challenges in modern, data-driven research can be addressed by data stewards. Based on concrete use cases, data management processes and best practices from plant phenotyping, we describe which expertise and skills are required and how data stewards as an integral actor can enhance the quality of a necessary digital transformation in progressive research.
Affiliation(s)
- Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
| | - Dennis Psaroudakis
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
| | - Junaid Altaf Memon
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
| | - Elena Rey-Mazón
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
| | - Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
| | - Jedrzej Jakub Szymanski
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
| | - Astrid Junker
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
| | - Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstraße 3, D-06466 Seeland, OT Gatersleben, Germany
|
18
|
Beier S, Fiebig A, Pommier C, Liyanage I, Lange M, Kersey PJ, Weise S, Finkers R, Koylass B, Cezard T, Courtot M, Contreras-Moreira B, Naamati G, Dyer S, Scholz U. Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR. F1000Res 2022; 11. [PMID: 35811804 PMCID: PMC9218589 DOI: 10.12688/f1000research.109080.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 05/17/2022] [Indexed: 11/20/2022] Open
Abstract
In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. They form a basis for the proposed VCF extensions here. We have learned from the existing application of VCF that the definition of relevant metadata using controlled standards, vocabulary and the consistent use of cross-references via resolvable identifiers (machine-readable) are particularly necessary and propose their encoding. VCF is an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant data (for example, the HapMap and the gVCF formats), but none currently have the reach of VCF. For the sake of simplicity, we will only discuss VCF and our recommendations for its use, but these recommendations could also be applied to gVCF. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields are frequently added that have not been agreed upon within the community which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.
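To make the distinction between the syntactic format of VCF metadata and its content more concrete, the sketch below parses VCF meta-information lines (those starting with `##`) into Python dictionaries. It relies only on the general `##key=value` and `##key=<ID=...,...>` syntax of the VCF specification; the function names and the sample values are hypothetical, and the sketch does not implement the specific fields recommended in the article.

```python
import re

# ##KEY=<field=value,...> (structured) and ##KEY=value (unstructured) lines.
STRUCTURED = re.compile(r"^##(?P<key>[^=]+)=<(?P<body>.*)>$")
SIMPLE = re.compile(r"^##(?P<key>[^=]+)=(?P<value>.*)$")
# A field value is either a double-quoted string (may contain commas) or a bare token.
FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_meta_line(line):
    """Parse one '##' meta-information line into (key, value)."""
    line = line.rstrip("\n")
    match = STRUCTURED.match(line)
    if match:
        fields = {k: v.strip('"') for k, v in FIELD.findall(match.group("body"))}
        return match.group("key"), fields
    match = SIMPLE.match(line)
    if match:
        return match.group("key"), match.group("value")
    raise ValueError(f"not a VCF meta-information line: {line!r}")


def parse_header(lines):
    """Collect all leading '##' lines, grouped by key."""
    meta = {}
    for line in lines:
        if not line.startswith("##"):
            break  # '#CHROM' column header or data lines follow
        key, value = parse_meta_line(line)
        meta.setdefault(key, []).append(value)
    return meta


# Hypothetical header; the sample ID and descriptions are invented for illustration.
example = [
    "##fileformat=VCFv4.3",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth, all reads">',
    '##SAMPLE=<ID=S1,Description="Hypothetical accession">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
header = parse_header(example)
print(header)
```

The parser accepts any key and any field name, which mirrors the point made in the abstract: the syntax is fixed, but nothing in it constrains vocabulary or identifiers, so agreement on content has to come from community recommendations.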
Affiliation(s)
- Sebastian Beier
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Institute of Bio- and Geosciences, Bioinformatics (IBG-4), Forschungszentrum Jülich GmbH, Jülich, 52425, Germany
| | - Anne Fiebig
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
| | - Cyril Pommier
- BioinfOmics, Plant bioinformatics facility, Université Paris-Saclay, INRAE, Versailles, France
| | - Isuru Liyanage
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Matthias Lange
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
| | | | - Stephan Weise
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
| | - Richard Finkers
- Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
- Gennovation B.V., Wageningen, The Netherlands
| | - Baron Koylass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Timothee Cezard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Mélanie Courtot
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Ontario Institute for Cancer Research, Toronto, Canada
| | - Bruno Contreras-Moreira
- Laboratorio de Biología Computacional y Estructural, Estación Experimental Aula Dei-CSIC, Zaragoza, 50059, Spain
| | - Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Uwe Scholz
- Breeding Research, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
|
19
|
Melo AM, Oliveira S, Oliveira JS, Martin CS, Leite RB. Making European performance and impact assessment frameworks for research infrastructures glocal. F1000Res 2022; 11:ELIXIR-278. [PMID: 36016992 PMCID: PMC9372636 DOI: 10.12688/f1000research.108804.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/29/2022] [Indexed: 11/23/2022] Open
Abstract
Sustainability of research infrastructures (RIs) is a big challenge for funders, stakeholders and operators, and the development and adoption of adequate management tools is a major concern, namely tools for monitoring and evaluating their performance and impact. BioData.pt is the Portuguese Infrastructure of Biological Data and the Portuguese node of the European Strategy Forum on Research Infrastructures "Landmark" ELIXIR. The foundations of this national research infrastructure were laid under the "Building BioData.pt" project, for four years. During this period, performance and impact indicators were collected and analysed under the light of international guidelines for assessing the performance and impact of European research infrastructures produced by the European Strategy Forum on Research Infrastructures, the Organisation for Economic Co-operation and Development and the EU-funded RI-PATHS project. The exercise shared herein showed that these frameworks can be adopted by national RIs, with the necessary adaptations, namely to reflect the national landscape and specificity of activities, and can be powerful tools in supporting the management of RIs. "Not everything that counts can be counted, and not everything that can be counted, counts". (Attributed to William Bruce Cameron).
Affiliation(s)
- Ana M.P. Melo
- BioData.pt - Portuguese Infrastructure of Biological Data, Oeiras, Portugal
- INESC ID - Instituto Nacional de Engenharias de Sistemas e Computadores - Investigação e Desenvolvimento, Lisboa, Portugal
| | - Sofia Oliveira
- BioData.pt - Portuguese Infrastructure of Biological Data, Oeiras, Portugal
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
| | - Jorge S. Oliveira
- BioData.pt - Portuguese Infrastructure of Biological Data, Oeiras, Portugal
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
| | - Corinne S. Martin
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Ricardo B. Leite
- BioData.pt - Portuguese Infrastructure of Biological Data, Oeiras, Portugal
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
|
20
|
Melo AM, Oliveira S, Oliveira JS, Martin CS, Leite RB. Making European performance and impact assessment frameworks glocal. F1000Res 2022; 11:ELIXIR-278. [PMID: 36016992 PMCID: PMC9372636 DOI: 10.12688/f1000research.108804.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/16/2022] [Indexed: 01/30/2024] Open
Abstract
Sustainability of research infrastructures (RIs) is a big challenge for funders, stakeholders and operators, and the development and adoption of adequate management tools is a major concern, namely tools for monitoring and evaluating their performance and impact. BioData.pt is the Portuguese Infrastructure of Biological Data and the Portuguese node of the European Strategy Forum on Research Infrastructures "Landmark" ELIXIR. The foundations of this national research infrastructure were laid under the "Building BioData.pt" project, for four years. During this period, performance and impact indicators were collected and analysed under the light of international guidelines for assessing the performance and impact of European research infrastructures produced by the European Strategy Forum on Research Infrastructures, the Organisation for Economic Co-operation and Development and the EU-funded RI-PATHS project. The exercise shared herein showed that these frameworks can be adopted by national RIs, with the necessary adaptations, namely to reflect the national landscape and specificity of activities, and can be powerful tools in supporting the management of RIs. "Not everything that counts can be counted, and not everything that can be counted, counts". Albert Einstein, Theoretical physicist and Nobel Prize winner.
Affiliation(s)
- Ana M.P. Melo
- BioData.pt - Portuguese Infrastructure of Biological Data, Oeiras, Portugal
- INESC ID - Instituto Nacional de Engenharias de Sistemas e Computadores - Investigação e Desenvolvimento, Lisboa, Portugal
| | - Sofia Oliveira
- BioData.pt - Portuguese Infrastructure of Biological Data, Oeiras, Portugal
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
| | - Jorge S. Oliveira
- BioData.pt - Portuguese Infrastructure of Biological Data, Oeiras, Portugal
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
| | - Corinne S. Martin
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Ricardo B. Leite
- BioData.pt - Portuguese Infrastructure of Biological Data, Oeiras, Portugal
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
|
21
|
Towards efficient use of data, models and tools in food microbiology. Curr Opin Food Sci 2022. [DOI: 10.1016/j.cofs.2022.100834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
22
|
Abstract
Research and development are facilitated by sharing knowledge bases, and the innovation process benefits from collaborative efforts that involve the collective utilization of data. Until now, most companies and organizations have produced and collected various types of data, and stored them in data silos that still have to be integrated with one another in order to enable knowledge creation. For this to happen, both public and private actors must adopt a flexible approach to achieve the necessary transition to break data silos and create collaborative data sharing between data producers and users. In this paper, we investigate several factors influencing cooperative data usage and explore the challenges posed by the participation in cross-organizational data ecosystems by performing an interview study among stakeholders from private and public organizations in the context of the project IDE@S, which aims at fostering the cooperation in data science in the Austrian federal state of Styria. We highlight technological and organizational requirements of data infrastructure, expertise, and practises towards collaborative data usage.
|
23
|
Freeberg MA, Fromont LA, D’Altri T, Romero AF, Ciges J, Jene A, Kerry G, Moldes M, Ariosa R, Bahena S, Barrowdale D, Barbero M, Fernandez-Orth D, Garcia-Linares C, Garcia-Rios E, Haziza F, Juhasz B, Llobet O, Milla G, Mohan A, Rueda M, Sankar A, Shaju D, Shimpi A, Singh B, Thomas C, de la Torre S, Uyan U, Vasallo C, Flicek P, Guigo R, Navarro A, Parkinson H, Keane T, Rambla J. The European Genome-phenome Archive in 2021. Nucleic Acids Res 2022; 50:D980-D987. [PMID: 34791407 PMCID: PMC8728218 DOI: 10.1093/nar/gkab1059] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 09/03/2021] [Revised: 10/08/2021] [Accepted: 10/22/2021] [Indexed: 12/27/2022] Open
Abstract
The European Genome-phenome Archive (EGA - https://ega-archive.org/) is a resource for long term secure archiving of all types of potentially identifiable genetic, phenotypic, and clinical data resulting from biomedical research projects. Its mission is to foster hosted data reuse, enable reproducibility, and accelerate biomedical and translational research in line with the FAIR principles. Launched in 2008, the EGA has grown quickly, currently archiving over 4,500 studies from nearly one thousand institutions. The EGA operates a distributed data access model in which requests are made to the data controller, not to the EGA; therefore, the submitter keeps control over who has access to the data and under which conditions. Given the size and value of data hosted, the EGA is constantly improving its value chain, that is, how the EGA can contribute to enhancing the value of human health data by facilitating its submission, discovery, access, and distribution, as well as leading the design and implementation of standards and methods necessary to deliver the value chain. The EGA has become a key GA4GH Driver Project, leading multiple development efforts and implementing new standards and tools, and has been appointed as an ELIXIR Core Data Resource.
Affiliation(s)
- Mallory Ann Freeberg
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Lauren A Fromont
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Teresa D’Altri
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Anna Foix Romero
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Jorge Izquierdo Ciges
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Aina Jene
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Giselle Kerry
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Mauricio Moldes
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Roberto Ariosa
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Silvia Bahena
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Daniel Barrowdale
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Marcos Casado Barbero
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Dietmar Fernandez-Orth
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Carles Garcia-Linares
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Emilio Garcia-Rios
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Frédéric Haziza
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Bela Juhasz
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Oscar Martinez Llobet
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Gemma Milla
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Anand Mohan
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Manuel Rueda
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Aravind Sankar
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Dona Shaju
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Ashutosh Shimpi
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Babita Singh
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Coline Thomas
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Sabela de la Torre
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Umuthan Uyan
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Claudia Vasallo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Paul Flicek
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Roderic Guigo
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Arcadi Navarro
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| | - Helen Parkinson
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Thomas Keane
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
| | - Jordi Rambla
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
| |
|
24
|
Bansal P, Morgat A, Axelsen KB, Muthukrishnan V, Coudert E, Aimo L, Hyka-Nouspikel N, Gasteiger E, Kerhornou A, Neto TB, Pozzato M, Blatter MC, Ignatchenko A, Redaschi N, Bridge A. Rhea, the reaction knowledgebase in 2022. Nucleic Acids Res 2022; 50:D693-D700. [PMID: 34755880 PMCID: PMC8728268 DOI: 10.1093/nar/gkab1016] [Citation(s) in RCA: 56] [Impact Index Per Article: 28.0] [Received: 09/14/2021] [Revised: 10/08/2021] [Accepted: 11/09/2021] [Indexed: 12/15/2022] Open
Abstract
Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of Biological Interest) (https://www.ebi.ac.uk/chebi). In this paper, we describe a number of key developments in Rhea since our last report in the database issue of Nucleic Acids Research in 2019. These include improved reaction coverage in Rhea, the adoption of Rhea as the reference vocabulary for enzyme annotation in the UniProt knowledgebase UniProtKB (https://www.uniprot.org), the development of a new Rhea website, and the designation of Rhea as an ELIXIR Core Data Resource. We hope that these and other developments will enhance the utility of Rhea as a reference resource to study and engineer enzymes and the metabolic systems in which they function.
Affiliation(s)
- Parit Bansal
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Anne Morgat
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Kristian B Axelsen
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Venkatesh Muthukrishnan
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Elisabeth Coudert
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Lucila Aimo
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Nevila Hyka-Nouspikel
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Elisabeth Gasteiger
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Arnaud Kerhornou
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Teresa Batista Neto
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Monica Pozzato
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Marie-Claude Blatter
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Alex Ignatchenko
- EMBL-EBI European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nicole Redaschi
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| | - Alan Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, CH-1211 Geneva 4, Switzerland
| |
|
25
|
Grapevine and Wine Metabolomics-Based Guidelines for FAIR Data and Metadata Management. Metabolites 2021; 11:metabo11110757. [PMID: 34822415 PMCID: PMC8618349 DOI: 10.3390/metabo11110757] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/29/2021] [Revised: 10/29/2021] [Accepted: 10/30/2021] [Indexed: 01/12/2023] Open
Abstract
In the era of big and omics data, good organization, management, and description of experimental data are crucial for achieving high-quality datasets. This, in turn, is essential for producing robust results, publishing reliable papers, making data more easily available, and unlocking the huge potential of data reuse. More and more journals now require authors to share data and metadata according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles. This work aims to provide a step-by-step guideline for FAIR data and metadata management specific to grapevine and wine science. In detail, the guidelines include recommendations for the organization of data and metadata regarding (i) meaningful information on experimental design and phenotyping, (ii) sample collection, (iii) sample preparation, (iv) chemotype analysis, (v) data analysis, (vi) metabolite annotation, and (vii) basic ontologies. We hope that these guidelines will be helpful for the grapevine and wine metabolomics community and that the community will benefit from the true potential of data usage in creating new knowledge.
|