1
Seep L, Grein S, Splichalova I, Ran D, Mikhael M, Hildebrand S, Lauterbach M, Hiller K, Ribeiro DJS, Sieckmann K, Kardinal R, Huang H, Yu J, Kallabis S, Behrens J, Till A, Peeva V, Strohmeyer A, Bruder J, Blum T, Soriano-Arroquia A, Tischer D, Kuellmer K, Li Y, Beyer M, Gellner AK, Fromme T, Wackerhage H, Klingenspor M, Fenske WK, Scheja L, Meissner F, Schlitzer A, Mass E, Wachten D, Latz E, Pfeifer A, Hasenauer J. From Planning Stage Towards FAIR Data: A Practical Metadatasheet For Biomedical Scientists. Sci Data 2024; 11:524. [PMID: 38778016 PMCID: PMC11111677 DOI: 10.1038/s41597-024-03349-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 05/08/2024] [Indexed: 05/25/2024] Open
Abstract
Datasets consist of measurement data and metadata. Metadata provides context, essential for understanding and (re-)using data. Various metadata standards exist for different methods, systems and contexts. However, relevant information resides at differing stages across the data-lifecycle. Often, this information is defined and standardized only at publication stage, which can lead to data loss and workload increase. In this study, we developed Metadatasheet, a metadata standard based on interviews with members of two biomedical consortia and systematic screening of data repositories. It aligns with the data-lifecycle allowing synchronous metadata recording within Microsoft Excel, a widespread data recording software. Additionally, we provide an implementation, the Metadata Workbook, that offers user-friendly features like automation, dynamic adaption, metadata integrity checks, and export options for various metadata standards. By design and due to its extensive documentation, the proposed metadata standard simplifies recording and structuring of metadata for biomedical scientists, promoting practicality and convenience in data management. This framework can accelerate scientific progress by enhancing collaboration and knowledge transfer throughout the intermediate steps of data creation.
Affiliation(s)
- Lea Seep
  - Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Stephan Grein
  - Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Iva Splichalova
  - Developmental Biology of the Immune System, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Danli Ran
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Mickel Mikhael
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Staffan Hildebrand
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Mario Lauterbach
  - Department of Bioinformatics and Biochemistry, Technical University Braunschweig, Braunschweig, Germany
- Karsten Hiller
  - Department of Bioinformatics and Biochemistry, Technical University Braunschweig, Braunschweig, Germany
- Katharina Sieckmann
  - Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Ronja Kardinal
  - Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Hao Huang
  - Developmental Biology of the Immune System, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Jiangyan Yu
  - Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
  - Quantitative Systems Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Sebastian Kallabis
  - Systems Immunology and Proteomics, Institute of Innate Immunity, Medical Faculty, University of Bonn, Bonn, Germany
- Janina Behrens
  - Department of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Andreas Till
  - Department of Internal Medicine I, Division of Endocrinology, Diabetes and Metabolism, University Medical Center Bonn, Bonn, Germany
- Viktoriya Peeva
  - Department of Internal Medicine I, Division of Endocrinology, Diabetes and Metabolism, University Medical Center Bonn, Bonn, Germany
- Akim Strohmeyer
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Johanna Bruder
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Tobias Blum
  - Immunology and Environment, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Ana Soriano-Arroquia
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Dominik Tischer
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
- Katharina Kuellmer
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Yuanfang Li
  - Immunogenomics & Neurodegeneration, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Marc Beyer
  - Immunogenomics & Neurodegeneration, German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
  - PRECISE, Platform for Single Cell Genomics and Epigenomics at the German Center for Neurodegenerative Diseases and the University of Bonn, Bonn, Germany
- Anne-Kathrin Gellner
  - Department of Psychiatry and Psychotherapy, University Hospital Bonn, Bonn, Germany
  - Institute of Physiology II, Medical Faculty, University of Bonn, Bonn, Germany
- Tobias Fromme
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Henning Wackerhage
  - School for Medicine and Health, Faculty of Sport and Health Sciences, Technical University of Munich, Munich, Germany
- Martin Klingenspor
  - Chair of Molecular Nutritional Medicine, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - EKFZ-Else Kröner-Fresenius Center for Nutritional Medicine, Technical University of Munich, Freising, Germany
  - ZIEL Institute for Food & Health, Technical University of Munich, Freising, Germany
- Wiebke K Fenske
  - Department of Internal Medicine I, Division of Endocrinology, Diabetes and Metabolism, University Medical Center Bonn, Bonn, Germany
  - Department of Internal Medicine I - Endocrinology, Diabetology and Metabolism, Gastroenterology and Hepatology, University Hospital Bergmannsheil, Bochum, Germany
- Ludger Scheja
  - Department of Biochemistry and Molecular Cell Biology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Felix Meissner
  - Systems Immunology and Proteomics, Institute of Innate Immunity, Medical Faculty, University of Bonn, Bonn, Germany
  - Experimental Systems Immunology, Max Planck Institute of Biochemistry, Martinsried, Germany
- Andreas Schlitzer
  - Quantitative Systems Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Elvira Mass
  - Developmental Biology of the Immune System, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Dagmar Wachten
  - Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Eicke Latz
  - Institute of Innate Immunity, University Hospital Bonn, University of Bonn, Bonn, Germany
- Alexander Pfeifer
  - Institute of Pharmacology and Toxicology, University Hospital, University of Bonn, Bonn, Germany
  - PharmaCenter Bonn, University of Bonn, Bonn, Germany
- Jan Hasenauer
  - Computational Biology, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
  - Helmholtz Center Munich, German Research Center for Environmental Health, Computational Health Center, Munich, Germany
2
LeRoy NJ, Khoroshevskyi O, O’Brien A, Stepień R, Arslan A, Sheffield NC. PEPhub: a database, web interface, and API for editing, sharing, and validating biological sample metadata. bioRxiv 2024:2023.08.15.551388. [PMID: 37645717 PMCID: PMC10462087 DOI: 10.1101/2023.08.15.551388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Background: As biological data increases, we need additional infrastructure to share it and promote interoperability. While major effort has been put into sharing data, relatively less emphasis is placed on sharing metadata. Yet, sharing metadata is also important, and in some ways has a wider scope than sharing data itself. Results: Here, we present PEPhub, an approach to improve sharing and interoperability of biological metadata. PEPhub provides an API, natural language search, and user-friendly web-based sharing and editing of sample metadata tables. We used PEPhub to process more than 100,000 published biological research projects and index them with fast semantic natural language search. PEPhub thus provides a fast and user-friendly way to find existing biological research data, or to share new data. Availability: https://pephub.databio.org.
Affiliation(s)
- Nathan J. LeRoy
  - Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
  - Department of Biomedical Engineering, School of Medicine, University of Virginia, 22904, Charlottesville VA
- Oleksandr Khoroshevskyi
  - Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
- Aaron O’Brien
  - Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
- Rafał Stepień
  - Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
- Alip Arslan
  - Department of Computer Science, School of Engineering, University of Virginia, 22908, Charlottesville VA
- Nathan C. Sheffield
  - Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville VA
  - School of Data Science, University of Virginia, 22904, Charlottesville VA
  - Department of Biomedical Engineering, School of Medicine, University of Virginia, 22904, Charlottesville VA
  - Department of Public Health Sciences, School of Medicine, University of Virginia, 22908, Charlottesville VA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, 22908, Charlottesville VA
  - Child Health Research Center, School of Medicine, University of Virginia, 22908, Charlottesville VA
3
Vahdati S, Khosravi B, Mahmoudi E, Zhang K, Rouzrokh P, Faghani S, Moassefi M, Tahmasebi A, Andriole KP, Chang P, Farahani K, Flores MG, Folio L, Houshmand S, Giger ML, Gichoya JW, Erickson BJ. A Guideline for Open-Source Tools to Make Medical Imaging Data Ready for Artificial Intelligence Applications: A Society of Imaging Informatics in Medicine (SIIM) Survey. Journal of Imaging Informatics in Medicine 2024:10.1007/s10278-024-01083-0. [PMID: 38558368 DOI: 10.1007/s10278-024-01083-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/29/2024] [Accepted: 03/08/2024] [Indexed: 04/04/2024]
Abstract
In recent years, the role of Artificial Intelligence (AI) in medical imaging has become increasingly prominent, with the majority of AI applications approved by the FDA being in imaging and radiology in 2023. The surge in AI model development to tackle clinical challenges underscores the necessity for preparing high-quality medical imaging data. Proper data preparation is crucial as it fosters the creation of standardized and reproducible AI models while minimizing biases. Data curation transforms raw data into a valuable, organized, and dependable resource and is fundamental to the success of machine learning and analytical projects. Considering the plethora of tools available for the different stages of data curation, it is crucial to stay informed about the most relevant tools within specific research areas. In the current work, we propose a descriptive outline of the steps of data curation and, for each of these stages, compile tools collected from a survey conducted among members of the Society of Imaging Informatics in Medicine (SIIM). This collection has the potential to enhance the decision-making process for researchers as they select the most appropriate tool for their specific tasks.
Affiliation(s)
- Sanaz Vahdati
  - Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, 200 1st Street, SW, Rochester, MN, 55905, USA
- Bardia Khosravi
  - Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, 200 1st Street, SW, Rochester, MN, 55905, USA
- Elham Mahmoudi
  - Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, 200 1st Street, SW, Rochester, MN, 55905, USA
- Kuan Zhang
  - Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, 200 1st Street, SW, Rochester, MN, 55905, USA
- Pouria Rouzrokh
  - Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, 200 1st Street, SW, Rochester, MN, 55905, USA
- Shahriar Faghani
  - Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, 200 1st Street, SW, Rochester, MN, 55905, USA
- Mana Moassefi
  - Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, 200 1st Street, SW, Rochester, MN, 55905, USA
- Aylin Tahmasebi
  - Department of Radiology, Thomas Jefferson University, Philadelphia, PA, USA
- Katherine P Andriole
  - Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Peter Chang
  - Department of Radiological Sciences, Irvine Medical Center, University of California, Orange, CA, USA
- Keyvan Farahani
  - Center for Biomedical Informatics and Information Technology, National Cancer Institute, Bethesda, MD, USA
- Les Folio
  - Diagnostic Imaging & Interventional Radiology, Moffitt Cancer Center, Tampa, FL, USA
- Sina Houshmand
  - Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA
- Maryellen L Giger
  - Department of Radiology, The University of Chicago, Chicago, IL, USA
- Judy W Gichoya
  - Department of Radiology, Emory University School of Medicine, Atlanta, GA, USA
- Bradley J Erickson
  - Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, 200 1st Street, SW, Rochester, MN, 55905, USA
4
Kumar B, Lorusso E, Fosso B, Pesole G. A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions. Front Microbiol 2024; 15:1343572. [PMID: 38419630 PMCID: PMC10900530 DOI: 10.3389/fmicb.2024.1343572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 01/29/2024] [Indexed: 03/02/2024] Open
Abstract
Metagenomics, metabolomics, and metaproteomics have significantly advanced our knowledge of microbial communities by providing culture-independent insights into their composition and functional potential. However, a critical challenge in this field is the lack of standard and comprehensive metadata associated with raw data, which hinders robust data stratification and the consideration of confounding factors. In this comprehensive review, we categorize publicly available microbiome data into five types: shotgun sequencing, amplicon sequencing, metatranscriptomic, metabolomic, and metaproteomic data. We explore the importance of metadata for data reuse and address the challenges in collecting standardized metadata. We also assess the limitations in metadata collection of existing public repositories that collect metagenomic data. This review emphasizes the vital role of metadata in interpreting and comparing datasets and highlights the need for standardized metadata protocols to fully leverage the potential of metagenomic data. Furthermore, we explore future directions for implementing Machine Learning (ML) in metadata retrieval, offering promising avenues for a deeper understanding of microbial communities and their ecological roles. Leveraging these tools will enhance our insights into microbial functional capabilities and ecological dynamics in diverse ecosystems. Finally, we emphasize the crucial role of metadata in ML model development.
Affiliation(s)
- Bablu Kumar
  - Università degli Studi di Milano, Milan, Italy
  - Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- Erika Lorusso
  - Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
  - National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
- Bruno Fosso
  - Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
- Graziano Pesole
  - Department of Biosciences, Biotechnology and Environment, University of Bari A. Moro, Bari, Italy
  - National Research Council, Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Bari, Italy
5
Samuel S, Mietchen D. Computational reproducibility of Jupyter notebooks from biomedical publications. Gigascience 2024; 13:giad113. [PMID: 38206590 PMCID: PMC10783158 DOI: 10.1093/gigascience/giad113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Revised: 08/09/2023] [Accepted: 12/08/2023] [Indexed: 01/12/2024] Open
Abstract
BACKGROUND: Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows, including for research publications. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications. APPROACH: We address computational reproducibility at 2 levels: (i) using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks associated with publications indexed in the biomedical literature repository PubMed Central. We identified such notebooks by mining the article's full text, trying to locate them on GitHub, and attempting to rerun them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. (ii) This study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over the course of 2 years, during which the corpus of Jupyter notebooks from articles indexed in PubMed Central has grown in a highly dynamic fashion. RESULTS: Out of 27,271 Jupyter notebooks from 2,660 GitHub repositories associated with 3,467 publications, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirement files and that we attempted to rerun automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we reran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions. CONCLUSIONS: We zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
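The counts reported in this abstract form a funnel, and the share surviving each step can be derived directly from them. The sketch below is only an arithmetic illustration: the dictionary keys and the `share` helper are names introduced here, not part of the study's workflow, and the percentages are computed rather than quoted from the paper.

```python
# Counts copied from the abstract; the key names are illustrative assumptions.
counts = {
    "all_notebooks": 27_271,
    "python": 22_578,
    "declared_dependencies": 15_817,
    "dependencies_installed": 10_388,
    "ran_without_errors": 1_203,
    "identical_results": 879,
    "different_results": 324,
}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Each funnel step is expressed relative to the step before it.
steps = [
    ("python", "all_notebooks"),
    ("declared_dependencies", "python"),
    ("dependencies_installed", "declared_dependencies"),
    ("ran_without_errors", "dependencies_installed"),
    ("identical_results", "dependencies_installed"),
]

for part, whole in steps:
    print(f"{part} / {whole}: {share(counts[part], counts[whole])}%")
```

Under these numbers, notebooks that ran without errors split exactly into the identical and differing groups (879 + 324 = 1,203), which is a quick consistency check on the reported figures.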
Affiliation(s)
- Sheeba Samuel
  - Heinz-Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena 07743, Germany
  - Michael Stifel Center Jena, Jena 07743, Germany
- Daniel Mietchen
  - Ronin Institute, Montclair 07043-2314, NJ, United States
  - Institute for Globally Distributed Open Research and Education (IGDORE)
  - FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Berlin 76344, Germany
6
Cerk K, Ugalde-Salas P, Nedjad CG, Lecomte M, Muller C, Sherman DJ, Hildebrand F, Labarthe S, Frioux C. Community-scale models of microbiomes: Articulating metabolic modelling and metagenome sequencing. Microb Biotechnol 2024; 17:e14396. [PMID: 38243750 PMCID: PMC10832553 DOI: 10.1111/1751-7915.14396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 11/27/2023] [Accepted: 12/20/2023] [Indexed: 01/21/2024] Open
Abstract
Building models is essential for understanding the functions and dynamics of microbial communities. Metabolic models built on genome-scale metabolic network reconstructions (GENREs) are especially relevant as a means to decipher the complex interactions occurring among species. Model reconstruction increasingly relies on metagenomics, which permits direct characterisation of naturally occurring communities that may contain organisms that cannot be isolated or cultured. In this review, we provide an overview of the field of metabolic modelling and its increasing reliance on and synergy with metagenomics and bioinformatics. We survey the means of assigning functions and reconstructing metabolic networks from (meta-)genomes, and present the variety and mathematical fundamentals of metabolic models that foster the understanding of microbial dynamics. We emphasise the characterisation of interactions and the scaling of model construction to large communities, two important bottlenecks in the applicability of these models. We give an overview of the current state of the art in metagenome sequencing and bioinformatics analysis, focusing on the reconstruction of genomes in microbial communities. Metagenomics benefits tremendously from third-generation sequencing, and we discuss the opportunities of long-read sequencing, strain-level characterisation and eukaryotic metagenomics. We aim at providing algorithmic and mathematical support, together with tool and application resources, that permit bridging the gap between metagenomics and metabolic modelling.
Affiliation(s)
- Klara Cerk
  - Quadram Institute Bioscience, Norwich, UK
  - Earlham Institute, Norwich, UK
- Chabname Ghassemi Nedjad
  - Inria, University of Bordeaux, INRAE, Talence, France
  - University of Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Talence, France
- Maxime Lecomte
  - Inria, University of Bordeaux, INRAE, Talence, France
  - INRAE STLO, University of Rennes, Rennes, France
- Falk Hildebrand
  - Quadram Institute Bioscience, Norwich, UK
  - Earlham Institute, Norwich, UK
- Simon Labarthe
  - Inria, University of Bordeaux, INRAE, Talence, France
  - INRAE, University of Bordeaux, BIOGECO, UMR 1202, Cestas, France
7
Mackenzie A, Lewis E, Loveland J. Successes and challenges in extracting information from DICOM image databases for audit and research. Br J Radiol 2023; 96:20230104. [PMID: 37698251 PMCID: PMC10607388 DOI: 10.1259/bjr.20230104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 05/05/2023] [Accepted: 05/11/2023] [Indexed: 09/13/2023] Open
Abstract
In radiography, much valuable associated data (metadata) is generated during image acquisition. The current setup of picture archive and communication systems (PACS) can make extraction of this metadata difficult, especially as it is typically stored with the image. The aim of this work is to examine the current challenges in extracting image metadata and to discuss the potential benefits of using this rich information. This work focuses on breast screening, though the conclusions are applicable to other modalities.
The data stored in PACS contain information that is currently underutilised and is of great benefit for auditing and improving imaging and radiographic practice. From the literature, we present examples of the potential clinical benefit, such as audits of dose and radiographic practice, as well as more advanced research highlighting the effects of radiographic practice, e.g. cancer detection rates affected by imaging technology.
This review considers the challenges in extracting data, namely:
- The search tools for data on most PACS are inadequate, being both time-consuming and limited in the elements that can be searched.
- Security and information governance considerations
- Anonymisation of data if required
- Data curation
The review describes some solutions that have been successfully implemented:
- Retrospective extraction: direct query on PACS
- Extracting data prospectively
- Use of structured reports
- Use of trusted research environments
Ultimately, the data access process will be made easier by inclusion during PACS procurement. Auditing data from PACS can be used to improve quality of imaging and workflow, all of which will be a clinical benefit to patients.
Affiliation(s)
- John Loveland
  - NCCPM, Royal Surrey NHS Foundation Trust, Guildford, United Kingdom
8
Yang J, Liu Y, Shang J, Chen Q, Chen Q, Ren L, Zhang N, Yu Y, Li Z, Song Y, Yang S, Scherer A, Tong W, Hong H, Xiao W, Shi L, Zheng Y. The Quartet Data Portal: integration of community-wide resources for multiomics quality control. Genome Biol 2023; 24:245. [PMID: 37884999 PMCID: PMC10601216 DOI: 10.1186/s13059-023-03091-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/08/2022] [Accepted: 10/17/2023] [Indexed: 10/28/2023] Open
Abstract
The Quartet Data Portal facilitates community access to well-characterized reference materials, reference datasets, and related resources established based on a family of four individuals with identical twins from the Quartet Project. Users can request DNA, RNA, protein, and metabolite reference materials, as well as datasets generated across omics, platforms, labs, protocols, and batches. Reproducible analysis tools allow for objective performance assessment of user-submitted data, while interactive visualization tools support rapid exploration of reference datasets. A closed-loop "distribution-collection-evaluation-integration" workflow enables updates and integration of community-contributed multiomics data. Ultimately, this portal helps promote the advancement of reference datasets and multiomics quality control.
Affiliation(s)
- Jingcheng Yang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
  - Greater Bay Area Institute of Precision Medicine, Guangzhou, Guangdong, China
- Yaqing Liu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Jun Shang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qiaochu Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Qingwang Chen
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Luyao Ren
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Naixin Zhang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ying Yu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Zhihui Li
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Yueqiang Song
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
- Shengpeng Yang
  - Intelligent Storage, Alibaba Cloud, Alibaba Group, Hangzhou, Zhejiang, China
- Andreas Scherer
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  - EATRIS ERIC-European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Huixiao Hong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, USA
- Wenming Xiao
  - Office of Oncological Diseases, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Leming Shi
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
  - International Human Phenome Institutes (Shanghai), Shanghai, China
- Yuanting Zheng
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Human Phenome Institute and Shanghai Cancer Center, Fudan University, Shanghai, China
9
Xue B, Khoroshevskyi O, Gomez RA, Sheffield NC. Opportunities and challenges in sharing and reusing genomic interval data. Front Genet 2023; 14:1155809. [PMID: 37020996 PMCID: PMC10067617 DOI: 10.3389/fgene.2023.1155809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 03/07/2023] [Indexed: 03/22/2023] Open
Affiliation(s)
- Bingjie Xue
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Oleksandr Khoroshevskyi
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, United States
- R. Ariel Gomez
- Child Health Research Center, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Nathan C. Sheffield
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Child Health Research Center, School of Medicine, University of Virginia, Charlottesville, VA, United States
- School of Data Science, University of Virginia, Charlottesville, VA, United States
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA, United States
- *Correspondence: Nathan C. Sheffield
10
Pokutnaya D, Childers B, Arcury-Quandt AE, Hochheiser H, Van Panhuis WG. An implementation framework to improve the transparency and reproducibility of computational models of infectious diseases. PLoS Comput Biol 2023; 19:e1010856. [PMID: 36928042 PMCID: PMC10019712 DOI: 10.1371/journal.pcbi.1010856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023] Open
Abstract
Computational models of infectious diseases have become valuable tools for research and the public health response against epidemic threats. The reproducibility of computational models has been limited, undermining the scientific process and possibly trust in modeling results and related response strategies, such as vaccination. We translated published reproducibility guidelines from a wide range of scientific disciplines into an implementation framework for improving reproducibility of infectious disease computational models. The framework comprises 22 elements that should be described, grouped into 6 categories: computational environment, analytical software, model description, model implementation, data, and experimental protocol. The framework can be used by scientific communities to develop actionable tools for sharing computational models in a reproducible way.
Affiliation(s)
- Darya Pokutnaya
- University of Pittsburgh, Department of Epidemiology, Pittsburgh, Pennsylvania, United States of America
- Bruce Childers
- University of Pittsburgh, Department of Computer Science, Pittsburgh, Pennsylvania, United States of America
- Alice E. Arcury-Quandt
- University of Pittsburgh, Department of Epidemiology, Pittsburgh, Pennsylvania, United States of America
- Harry Hochheiser
- University of Pittsburgh, Department of Biomedical Informatics and Intelligent Systems Program, Pittsburgh, Pennsylvania, United States of America
- Willem G. Van Panhuis
- University of Pittsburgh, Department of Epidemiology, Pittsburgh, Pennsylvania, United States of America
11
Tsueng G, Cano MAA, Bento J, Czech C, Kang M, Pache L, Rasmussen LV, Savidge TC, Starren J, Wu Q, Xin J, Yeaman MR, Zhou X, Su AI, Wu C, Brown L, Shabman RS, Hughes LD. Developing a standardized but extendable framework to increase the findability of infectious disease datasets. Sci Data 2023; 10:99. [PMID: 36823157 PMCID: PMC9950378 DOI: 10.1038/s41597-023-01968-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/12/2022] [Accepted: 01/13/2023] [Indexed: 02/25/2023] Open
Abstract
Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely-adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We alleviated this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
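As a rough illustration of what a Schema.org-based dataset description can look like, the sketch below builds a JSON-LD record of type `Dataset` and checks it for empty required properties. The choice of required properties, the helper names, and the example values are assumptions made for illustration only; they are not the schema the consortium published.

```python
import json

# Hypothetical minimal set of required properties; the consortium's actual
# schema is not reproduced here.
REQUIRED = ("name", "description", "url", "identifier")


def make_dataset_record(name, description, url, identifier, keywords=()):
    """Build a Schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": list(keywords),
    }


def missing_fields(record, required=REQUIRED):
    """Return the required properties that are absent or empty."""
    return [key for key in required if not record.get(key)]


# Placeholder values, not a real dataset.
record = make_dataset_record(
    name="Example infectious disease dataset",
    description="Placeholder description used only for illustration.",
    url="https://example.org/datasets/1",
    identifier="example-0001",
    keywords=["infectious disease", "metadata"],
)
jsonld = json.dumps(record, indent=2)
```

Embedding a record like `jsonld` in a dataset landing page is the usual way such markup is exposed to crawlers; a context-specific schema would extend the property list while keeping the Schema.org core.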
Affiliation(s)
- Ginger Tsueng
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA.
- Marco A Alvarado Cano
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- José Bento
- Department of Computer Science, Boston College, 245 Beacon St, Chestnut Hill, MA, 02467, USA
- Candice Czech
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Mengjia Kang
- Division of Pulmonary and Critical Care, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Lars Pache
- Infectious and Inflammatory Disease Center, Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
- Luke V Rasmussen
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Tor C Savidge
- Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Justin Starren
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Qinglong Wu
- Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Jiwen Xin
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Michael R Yeaman
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Divisions of Molecular Medicine and Infectious Diseases, Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Lundquist Institute for Infection & Immunity at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Xinghua Zhou
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Andrew I Su
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Scripps Research Translational Institute, La Jolla, CA, 92037, USA
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Chunlei Wu
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Scripps Research Translational Institute, La Jolla, CA, 92037, USA
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Liliana Brown
- Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
- Reed S Shabman
- Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
- Laura D Hughes
- Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA.
12
Duong-Trung N, Born S, Kim JW, Schermeyer MT, Paulick K, Borisyak M, Cruz-Bournazou MN, Werner T, Scholz R, Schmidt-Thieme L, Neubauer P, Martinez E. When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development. Biochem Eng J 2022. [DOI: 10.1016/j.bej.2022.108764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
13
Lubbock ALR, Lopez CF. Microbench: automated metadata management for systems biology benchmarking and reproducibility in Python. Bioinformatics 2022; 38:4823-4825. [PMID: 36000837 PMCID: PMC9563693 DOI: 10.1093/bioinformatics/btac580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2021] [Revised: 02/21/2022] [Accepted: 08/23/2022] [Indexed: 11/21/2022] Open
Abstract
Motivation: Computational systems biology analyses typically make use of multiple software and their dependencies, which are often run across heterogeneous compute environments. This can introduce differences in performance and reproducibility. Capturing metadata (e.g. package versions, GPU model) currently requires repetitious code and is difficult to store centrally for analysis. Even where virtual environments and containers are used, updates over time mean that versioning metadata should still be captured within analysis pipelines to guarantee reproducibility.
Results: Microbench is a simple and extensible Python package to automate metadata capture to a file or Redis database. Captured metadata can include execution time, software package versions, environment variables, hardware information, Python version and more, with plugins. We present three case studies demonstrating Microbench usage to benchmark code execution and examine environment metadata for reproducibility purposes.
Availability and implementation: Install from the Python Package Index using pip install microbench. Source code is available from https://github.com/alubbock/microbench.
Supplementary information: Supplementary data are available at Bioinformatics online.
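The general idea of capturing run metadata automatically can be sketched with the Python standard library alone. The decorator below is not Microbench's API; `capture_metadata` and the field names are hypothetical, chosen only to show how execution time, Python version and host name might be appended as one JSON line per function call.

```python
import functools
import io
import json
import platform
import socket
import time


def capture_metadata(outfile):
    """Decorator factory: append one JSON line of run metadata per call.

    Hypothetical helper for illustration; this is not the Microbench API.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            row = {
                "function_name": func.__name__,
                "run_time_seconds": elapsed,
                "python_version": platform.python_version(),
                "hostname": socket.gethostname(),
            }
            outfile.write(json.dumps(row) + "\n")
            return result
        return wrapper
    return decorator


log = io.StringIO()  # stands in for a file on disk


@capture_metadata(log)
def total(values):
    return sum(values)


answer = total([1, 2, 3])
# Each call to total() adds one line; parse them back for analysis.
rows = [json.loads(line) for line in log.getvalue().splitlines()]
```

Writing one JSON object per line keeps the log append-only, so results from many runs and machines can be concatenated and loaded into a table later.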
Affiliation(s)
- Alexander L R Lubbock
- Department of Biochemistry, Vanderbilt University, Nashville, TN 37232, USA
- Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN 37232, USA
- Carlos F Lopez
- Department of Biochemistry, Vanderbilt University, Nashville, TN 37232, USA
- Vanderbilt-Ingram Cancer Center, Vanderbilt University, Nashville, TN 37232, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
14
Lortie CJ, Vargas Poulsen C, Brun J, Kui L. Tabular strategies for metadata in ecology, evolution, and the environmental sciences. Ecol Evol 2022; 12:e9245. [PMID: 36035265 PMCID: PMC9405493 DOI: 10.1002/ece3.9245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 08/04/2022] [Indexed: 11/07/2022] Open
Abstract
Data support knowledge development and theory advances in ecology and evolution. We are increasingly reusing data within our teams and projects and through the global, openly archived datasets of others. Metadata can be challenging to write and interpret, but it is always crucial for reuse. The value of metadata cannot be overstated, even as a relatively independent research object, because it describes the work that has been done in a structured format. We advance a new perspective and classify methods for metadata curation and development with tables. Tables with templates can be effectively used to capture all components of an experiment or project in a single, easy-to-read file familiar to most scientists. If coupled with the R programming language, metadata from tables can then be rapidly and reproducibly converted to publication formats including extensible markup language files suitable for data repositories. Tables can also be used to summarize existing metadata and store metadata across many datasets. A case study is provided and the added benefits of tables for metadata, a priori, are developed to ensure a more streamlined publishing process for many data repositories used in ecology, evolution, and the environmental sciences. In ecology and evolution, researchers are often highly tabular thinkers from experimental data collection in the lab and/or field, and representations of metadata as a table will provide novel research and reuse insights.
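The table-to-XML step can be sketched as follows. The abstract describes an R workflow; Python is used here instead, and the column names and XML element names (`attributeList`, `attribute`) are hypothetical placeholders rather than any repository's actual schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A small metadata table: one row per measured variable (illustrative values).
TABLE = """name,description,unit
site,Sampling site identifier,none
biomass,Dry plant biomass,gram
"""


def table_to_xml(csv_text):
    """Convert each table row into an <attribute> element with one child per column."""
    root = ET.Element("attributeList")
    for row in csv.DictReader(io.StringIO(csv_text)):
        attribute = ET.SubElement(root, "attribute")
        for column, value in row.items():
            ET.SubElement(attribute, column).text = value
    return root


root = table_to_xml(TABLE)
xml_text = ET.tostring(root, encoding="unicode")
```

Because the table stays the single source of truth, regenerating the XML after an edit is one function call, which is the reproducibility benefit the abstract points to.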
Affiliation(s)
- C. J. Lortie
- National Center for Ecological Analysis and Synthesis, UCSB, Santa Barbara, California, USA
- Department of Biology, York University, Toronto, Ontario, Canada
- Julien Brun
- National Center for Ecological Analysis and Synthesis, UCSB, Santa Barbara, California, USA
- Li Kui
- Marine Science Institute, UCSB, Santa Barbara, California, USA
15
Cui X, Lu J, Han Y. A Novel Unified Data Modeling Method for Equipment Lifecycle Integrated Logistics Support. Sensors 2022; 22:s22114265. [PMID: 35684887 PMCID: PMC9185433 DOI: 10.3390/s22114265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 05/31/2022] [Accepted: 05/31/2022] [Indexed: 11/24/2022]
Abstract
Integrated logistics support (ILS) is of great significance for maintaining equipment operational capability in the whole lifecycle. Numerous segments and complex product objects exist in the process of equipment ILS, which gives ILS data multi-source, heterogeneous, and multidimensional characteristics. The present ILS data cannot satisfy the demand for efficient utilization. Therefore, the unified modeling of ILS data is extremely urgent and significant. In this paper, a unified data modeling method is proposed to solve the consistent and comprehensive expression problem of ILS data. Firstly, a four-tier unified data modeling framework is constructed based on the analysis of ILS data characteristics. Secondly, the Core unified data model, Domain unified data model, and Instantiated unified data model are built successively. Then, the expressions of ILS data in the three dimensions of time, product, and activity are analyzed. Thirdly, the Lifecycle ILS unified data model is constructed, and the multidimensional information retrieval methods are discussed. Based on these, different systems in the equipment ILS process can share a set of data models and provide ILS designers with relevant data through different views. Finally, the practical ILS data models are constructed based on the developed unified data modeling software prototype, which verifies the feasibility of the proposed method.
16
Alharbi E, Gadiya Y, Henderson D, Zaliani A, Delfin-Rossaro A, Cambon-Thomsen A, Kohler M, Witt G, Welter D, Juty N, Jay C, Engkvist O, Goble C, Reilly DS, Satagopam V, Ioannidis V, Gu W, Gribbon P. Selection of data sets for FAIRification in drug discovery and development: Which, why, and how? Drug Discov Today 2022; 27:2080-2085. [PMID: 35595012 PMCID: PMC9236643 DOI: 10.1016/j.drudis.2022.05.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2022] [Revised: 04/28/2022] [Accepted: 05/10/2022] [Indexed: 11/30/2022]
Abstract
Research organisations are focussed on quantifying the costs and benefits of implementing FAIR. Criteria used for the selection of data for FAIRification can be opaque and inconsistent. FAIRification effort depends on individual skills, competencies, resources, and time available. FAIRification should satisfy reuse scenarios, and lead to scientific and economic impacts. Organisational challenges include providing training to individuals and developing a FAIR organisation culture.
Despite the intuitive value of adopting the Findable, Accessible, Interoperable, and Reusable (FAIR) principles in both academic and industrial sectors, challenges exist in resourcing, balancing long- versus short-term priorities, and achieving technical implementation. This situation is exacerbated by the unclear mechanisms by which costs and benefits can be assessed when decisions on FAIR are made. Scientific and research and development (R&D) leadership need reliable evidence of the potential benefits and information on effective implementation mechanisms and remediating strategies. In this article, we describe procedures for cost–benefit evaluation, and identify best-practice approaches to support the decision-making process involved in FAIR implementation.
Affiliation(s)
- Ebtisam Alharbi
- Department of Computer Science, The University of Manchester, Oxford Road, Manchester, UK
- Yojana Gadiya
- Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525 Hamburg, and Theodor Stern Kai 7, 60590 Frankfurt, Germany; Fraunhofer Cluster of Excellence for Immune Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590 Frankfurt, Germany
- David Henderson
- Bayer AG, Research & Development, Pharmaceuticals, Müllerstrasse 178, 13353 Berlin, Germany
- Andrea Zaliani
- Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525 Hamburg, and Theodor Stern Kai 7, 60590 Frankfurt, Germany; Fraunhofer Cluster of Excellence for Immune Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590 Frankfurt, Germany
- Manfred Kohler
- Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525 Hamburg, and Theodor Stern Kai 7, 60590 Frankfurt, Germany; Fraunhofer Cluster of Excellence for Immune Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590 Frankfurt, Germany
- Gesa Witt
- Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525 Hamburg, and Theodor Stern Kai 7, 60590 Frankfurt, Germany; Fraunhofer Cluster of Excellence for Immune Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590 Frankfurt, Germany
- Danielle Welter
- Luxembourg Centre for Systems Biomedicine, ELIXIR Luxembourg, University of Luxembourg, L-4367 Belval, Luxembourg
- Nick Juty
- Department of Computer Science, The University of Manchester, Oxford Road, Manchester, UK
- Caroline Jay
- Department of Computer Science, The University of Manchester, Oxford Road, Manchester, UK
- Ola Engkvist
- Discovery Sciences, R&D, AstraZeneca, SE-43183 Mölndal, Sweden
- Carole Goble
- Department of Computer Science, The University of Manchester, Oxford Road, Manchester, UK
- Dorothy S Reilly
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland
- Venkata Satagopam
- Luxembourg Centre for Systems Biomedicine, ELIXIR Luxembourg, University of Luxembourg, L-4367 Belval, Luxembourg
- Vassilios Ioannidis
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Amphipole, 1015 Lausanne, Switzerland.
- Wei Gu
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland.
- Philip Gribbon
- Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Schnackenburgallee 114, 22525 Hamburg, and Theodor Stern Kai 7, 60590 Frankfurt, Germany; Fraunhofer Cluster of Excellence for Immune Mediated Diseases (CIMD), Theodor Stern Kai 7, 60590 Frankfurt, Germany.
17
Soiland-Reyes S, Bayarri G, Andrio P, Long R, Lowe D, Niewielska A, Hospital A, Groth P. Making Canonical Workflow Building Blocks Interoperable across Workflow Languages. Data Intelligence 2022. [DOI: 10.1162/dint_a_00135] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022] Open
Abstract
We introduce the concept of Canonical Workflow Building Blocks (CWBB), a methodology of describing and wrapping computational tools, in order for them to be utilised in a reproducible manner from multiple workflow languages and execution platforms. The concept is implemented and demonstrated with the BioExcel Building Blocks library (BioBB), a collection of tool wrappers in the field of computational biomolecular simulation. Interoperability across different workflow languages is showcased through a protein Molecular Dynamics setup transversal workflow, built using this library and run with 5 different Workflow Manager Systems (WfMS). We argue such practice is a necessary requirement for FAIR Computational Workflows and an element of Canonical Workflow Frameworks for Research (CWFR) in order to improve widespread adoption and reuse of computational methods across workflow language barriers.
Affiliation(s)
- Stian Soiland-Reyes
- Department of Computer Science, The University of Manchester, Manchester, Manchester M13 9PL, UK
- Informatics Institute, University of Amsterdam, Amsterdam 1000 GG, The Netherlands
- Genís Bayarri
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Pau Andrio
- The Spanish National Bioinformatics Institute (INB), Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Robin Long
- Data Science Institute, Lancaster University, Lancaster, Lancashire LA1 4YW, UK
- Research IT, IT Services, The University of Manchester, Manchester, Manchester M13 9PL, UK
- Douglas Lowe
- Research IT, IT Services, The University of Manchester, Manchester, Manchester M13 9PL, UK
- Ania Niewielska
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire CB10 1SD, UK
- Adam Hospital
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Paul Groth
- Informatics Institute, University of Amsterdam, Amsterdam 1000 GG, The Netherlands
18
The ATCC Genome Portal: Microbial Genome Reference Standards with Data Provenance. Microbiol Resour Announc 2021; 10:e0081821. [PMID: 34817215 PMCID: PMC8612085 DOI: 10.1128/mra.00818-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022] Open
Abstract
Lack of data provenance negatively impacts scientific reproducibility and the reliability of genomic data. The ATCC Genome Portal (https://genomes.atcc.org) addresses this by providing data provenance information for microbial whole-genome assemblies originating from authenticated biological materials. To date, we have sequenced 1,579 complete genomes, including 466 type strains and 1,156 novel genomes.