1
Duo H, Li Y, Lan Y, Tao J, Yang Q, Xiao Y, Sun J, Li L, Nie X, Zhang X, Liang G, Liu M, Hao Y, Li B. Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios. Genome Biol 2024; 25:145. [PMID: 38831386 PMCID: PMC11149245 DOI: 10.1186/s13059-024-03290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024] Open
Abstract
BACKGROUND: Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines.
RESULTS: We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool Simsite (https://www.ciblab.net/software/simshiny/) for data simulation.
CONCLUSIONS: No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.
Affiliation(s)
- Hongrui Duo
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Yinghong Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People's Republic of China
- Yang Lan
  - Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Army Medical University, Chongqing, 400038, People's Republic of China
- Jingxin Tao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Qingxia Yang
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, People's Republic of China
- Yingxue Xiao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Jing Sun
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Lei Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Xiner Nie
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Xiaoxi Zhang
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Mingwei Liu
  - Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, People's Republic of China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China.
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China.
2
Waterhouse RM, Adam-Blondon AF, Balech B, Barta E, Ying Shi Chua P, Di Cola V, Heil KF, Hughes GM, Jermiin LS, Kalaš M, Lanfear J, Pafilis E, Palagi PM, Papageorgiou AC, Paupério J, Psomopoulos F, Raes N, Burgin J, Gabaldón T. The ELIXIR Biodiversity Community: Understanding short- and long-term changes in biodiversity. F1000Res 2024; 12:ELIXIR-499. [PMID: 38882711 PMCID: PMC11179050 DOI: 10.12688/f1000research.133724.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/09/2024] [Indexed: 06/18/2024] Open
Abstract
Biodiversity loss is now recognised as one of the major challenges for humankind to address over the next few decades. Unless major actions are taken, the sixth mass extinction will lead to catastrophic effects on the Earth's biosphere and human health and well-being. ELIXIR can help address the technical challenges of biodiversity science, through leveraging its suite of services and expertise to enable data management and analysis activities that enhance our understanding of life on Earth and facilitate biodiversity preservation and restoration. This white paper, prepared by the ELIXIR Biodiversity Community, summarises the current status and responses, and presents a set of plans, both technical and community-oriented, that should both enhance how ELIXIR Services are applied in the biodiversity field and how ELIXIR builds connections across the many other infrastructures active in this area. We discuss the areas of highest priority, how they can be implemented in cooperation with the ELIXIR Platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for a Biodiversity Community in ELIXIR and is an appeal to identify and involve new stakeholders.
Affiliation(s)
- Robert M Waterhouse
  - Department of Ecology and Evolution, SIB Swiss Institute of Bioinformatics, Universite de Lausanne, Lausanne, Vaud, 1015, Switzerland
- Anne-Françoise Adam-Blondon
  - INRAE, BioinfOmics, Plant Bioinformatics Facility, Universite Paris-Saclay, Gif-sur-Yvette, Île-de-France, 78026, France
- Bachir Balech
  - Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Bari, 70126, Italy
- Endre Barta
  - Institute of Genetics and Biotechnology, Magyar Agrar- es Elettudomanyi Egyetem, Gödöllő, Pest County, Hungary
- Valeria Di Cola
  - SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
- Graham M Hughes
  - School of Biology and Environmental Science, University College Dublin, Dublin, Leinster, Ireland
- Lars S Jermiin
  - Systems Biology Ireland, School of Medicine, University College Dublin, Dublin, Leinster, Ireland
  - School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Matúš Kalaš
  - Department of Informatics, Universitetet i Bergen, Bergen, Hordaland, Norway
- Jerry Lanfear
  - ELIXIR, Wellcome Genome Campus, Hinxton, England, CB10 1SD, UK
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, 71003, Greece
- Patricia M Palagi
  - SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
- Joana Paupério
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, England, CB10 1SD, UK
- Fotis Psomopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Niels Raes
  - Naturalis Biodiversity Center, Leiden, South Holland, The Netherlands
- Josephine Burgin
  - EMBL-EBI, Wellcome Genome Campus, Hinxton, England, CB10 1SD, UK
- Toni Gabaldón
  - Institut de Recerca Biomedica, Barcelona, Catalonia, Spain
  - Centro Nacional de Supercomputacion, Barcelona, Catalonia, Spain
3
Niehues A, de Visser C, Hagenbeek FA, Kulkarni P, Pool R, Karu N, Kindt ASD, Singh G, Vermeiren RRJM, Boomsma DI, van Dongen J, 't Hoen PAC, van Gool AJ. A multi-omics data analysis workflow packaged as a FAIR Digital Object. Gigascience 2024; 13:giad115. [PMID: 38217405 PMCID: PMC10787363 DOI: 10.1093/gigascience/giad115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 11/14/2023] [Accepted: 12/10/2023] [Indexed: 01/15/2024] Open
Abstract
BACKGROUND: Applying good data management and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles in research projects can help disentangle knowledge discovery, study result reproducibility, and data reuse in future studies. Based on the concepts of the original FAIR principles for research data, FAIR principles for research software were recently proposed. FAIR Digital Objects enable discovery and reuse of Research Objects, including computational workflows for both humans and machines. Practical examples can help promote the adoption of FAIR practices for computational workflows in the research community. We developed a multi-omics data analysis workflow implementing FAIR practices to share it as a FAIR Digital Object.
FINDINGS: We conducted a case study investigating shared patterns between multi-omics data and childhood externalizing behavior. The analysis workflow was implemented as a modular pipeline in the workflow manager Nextflow, including containers with software dependencies. We adhered to software development practices like version control, documentation, and licensing. Finally, the workflow was described with rich semantic metadata, packaged as a Research Object Crate, and shared via WorkflowHub.
CONCLUSIONS: Along with the packaged multi-omics data analysis workflow, we share our experiences adopting various FAIR practices and creating a FAIR Digital Object. We hope our experiences can help other researchers who develop omics data analysis workflows to turn FAIR principles into practice.
Affiliation(s)
- Anna Niehues
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Casper de Visser
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Fiona A Hagenbeek
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
- Purva Kulkarni
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
  - Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- René Pool
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
- Naama Karu
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
- Alida S D Kindt
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
- Gurnoor Singh
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Robert R J M Vermeiren
  - Department of Child and Adolescent Psychiatry, LUMC-Curium, Leiden University Medical Center, 2342 AK Oegstgeest, The Netherlands
- Dorret I Boomsma
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
- Jenny van Dongen
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
- Peter A C 't Hoen
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Alain J van Gool
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
4
Downing T, Angelopoulos N. A primer on correlation-based dimension reduction methods for multi-omics analysis. J R Soc Interface 2023; 20:20230344. [PMID: 37817584 PMCID: PMC10565429 DOI: 10.1098/rsif.2023.0344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 09/19/2023] [Indexed: 10/12/2023] Open
Abstract
The continuing advances of omic technologies mean that it is now more tangible to measure the numerous features collectively reflecting the molecular properties of a sample. When multiple omic methods are used, statistical and computational approaches can exploit these large, connected profiles. Multi-omics is the integration of different omic data sources from the same biological sample. In this review, we focus on correlation-based dimension reduction approaches for single omic datasets, followed by methods for pairs of omics datasets, before detailing further techniques for three or more omic datasets. We also briefly detail network methods when three or more omic datasets are available and which complement correlation-oriented tools. To aid readers new to this area, these are all linked to relevant R packages that can implement these procedures. Finally, we discuss scenarios of experimental design and present road maps that simplify the selection of appropriate analysis methods. This review will help researchers navigate emerging methods for multi-omics and integrating diverse omic datasets appropriately. This raises the opportunity of implementing population multi-omics with large sample sizes as omics technologies and our understanding improve.
Affiliation(s)
- Tim Downing
  - Pirbright Institute, Pirbright, Surrey, UK
  - Department of Biotechnology, Dublin City University, Dublin, Ireland
5
Patel B, Soundarajan S, Ménager H, Hu Z. Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool. Sci Data 2023; 10:557. [PMID: 37612312 PMCID: PMC10447492 DOI: 10.1038/s41597-023-02463-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 08/10/2023] [Indexed: 08/25/2023] Open
Abstract
Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles tailored for research software have been proposed by the FAIR for Research Software (FAIR4RS) Working Group. They provide a foundation for optimizing the reuse of research software. The FAIR4RS principles are, however, aspirational and do not provide practical instructions to the researchers. To fill this gap, we propose in this work the first actionable step-by-step guidelines for biomedical researchers to make their research software compliant with the FAIR4RS principles. We designate them as the FAIR Biomedical Research Software (FAIR-BioRS) guidelines. Our process for developing these guidelines, presented here, is based on an in-depth study of the FAIR4RS principles and a thorough review of current practices in the field. To support researchers, we have also developed a workflow that streamlines the process of implementing these guidelines. This workflow is incorporated in FAIRshare, a free and open-source software application aimed at simplifying the curation and sharing of FAIR biomedical data and software through user-friendly interfaces and automation. Details about this tool are also presented.
Affiliation(s)
- Bhavesh Patel
  - FAIR Data Innovations Hub, California Medical Innovations Institute, San Diego, CA, 92121, USA.
- Sanjay Soundarajan
  - FAIR Data Innovations Hub, California Medical Innovations Institute, San Diego, CA, 92121, USA
- Hervé Ménager
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 75015, Paris, France
- Zicheng Hu
  - Computational Health Science, University of California San Francisco, San Francisco, CA, 94158, USA
6
Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023] Open
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
  - Department of Biology, University of Rome Tor Vergata, Roma, Italy
- Allegra Via
  - Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Paola Turina
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Claudio Carta
  - National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
- Rita Casadio
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Andrea Cicconardi
  - Department of Physics, University of Genova, Genova, Italy
  - Italiano di Tecnologia—IIT, Genova, Italy
- Angelo Facchiano
  - National Research Council, Institute of Food Science, Avellino, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Torino, Italy
- Deborah Giordano
  - National Research Council, Institute of Food Science, Avellino, Italy
- Federica Isidori
  - Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Anna Marabotti
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Pier Luigi Martelli
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Stefano Pascarella
  - Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Michele Pinelli
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- Tommaso Pippucci
  - Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Roberta Russo
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
  - CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
- Castrense Savojardo
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Bernardina Scafuri
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
7
Du X, Dastmalchi F, Ye H, Garrett TJ, Diller MA, Liu M, Hogan WR, Brochhausen M, Lemas DJ. Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software. Metabolomics 2023; 19:11. [PMID: 36745241 DOI: 10.1007/s11306-023-01974-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/08/2022] [Accepted: 01/20/2023] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles (Findability, Accessibility, Interoperability, and Reusability) were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS).
AIM OF REVIEW: This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software.
KEY SCIENTIFIC CONCEPTS OF REVIEW: We evaluated 124 LC-HRMS metabolomics data processing software tools obtained from a systematic review and selected 61 for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature along with internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment were 21.6%, 47.7%, and 71.8%, respectively. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that cover multiple FAIR4RS categories but had low fulfillment: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software had official software containerization or a virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. Based on these results, we discussed improvement strategies and future directions.
Affiliation(s)
- Xinsong Du
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Farhad Dastmalchi
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Hao Ye
  - Health Science Center Libraries, University of Florida, Florida, USA
- Timothy J Garrett
  - Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Florida, USA
- Matthew A Diller
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Mei Liu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Mathias Brochhausen
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, USA
- Dominick J Lemas
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA.
  - Department of Obstetrics and Gynecology, University of Florida College of Medicine, Florida, Gainesville, United States.
  - Center for Perinatal Outcomes Research, University of Florida College of Medicine, Gainesville, United States.
8
Roach MJ, Pierce-Ward NT, Suchecki R, Mallawaarachchi V, Papudeshi B, Handley SA, Brown CT, Watson-Haigh NS, Edwards RA. Ten simple rules and a template for creating workflows-as-applications. PLoS Comput Biol 2022; 18:e1010705. [PMID: 36520686 PMCID: PMC9754251 DOI: 10.1371/journal.pcbi.1010705] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022] Open
Affiliation(s)
- Michael J. Roach
  - Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, South Australia, Australia
- N. Tessa Pierce-Ward
  - Department of Population Health and Reproduction, University of California, Davis, California, United States of America
- Vijini Mallawaarachchi
  - Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, South Australia, Australia
- Bhavya Papudeshi
  - Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, South Australia, Australia
- Scott A. Handley
  - Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- C. Titus Brown
  - Department of Population Health and Reproduction, University of California, Davis, California, United States of America
- Robert A. Edwards
  - Flinders Accelerator for Microbiome Exploration, Flinders University, Adelaide, South Australia, Australia
9
Barker M, Chue Hong NP, Katz DS, Lamprecht AL, Martinez-Ortiz C, Psomopoulos F, Harrow J, Castro LJ, Gruenpeter M, Martinez PA, Honeyman T. Introducing the FAIR Principles for research software. Sci Data 2022; 9:622. [PMID: 36241754 PMCID: PMC9562067 DOI: 10.1038/s41597-022-01710-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 03/21/2022] [Accepted: 09/21/2022] [Indexed: 11/09/2022] Open
Abstract
Research software is a fundamental and vital part of research, yet significant challenges to discoverability, productivity, quality, reproducibility, and sustainability exist. Improving the practice of scholarship is a common goal of the open science, open source, and FAIR (Findable, Accessible, Interoperable and Reusable) communities and research software is now being understood as a type of digital object to which FAIR should be applied. This emergence reflects a maturation of the research community to better understand the crucial role of FAIR research software in maximising research value. The FAIR for Research Software (FAIR4RS) Working Group has adapted the FAIR Guiding Principles to create the FAIR Principles for Research Software (FAIR4RS Principles). The contents and context of the FAIR4RS Principles are summarised here to provide the basis for discussion of their adoption. Examples of implementation by organisations are provided to share information on how to maximise the value of research outputs, and to encourage others to amplify the importance and impact of this work.
Affiliation(s)
- Neil P Chue Hong
  - Software Sustainability Institute & EPCC, University of Edinburgh, 47 Potterrow, Edinburgh, EH8 9BT, UK
- Daniel S Katz
  - NCSA & CS & ECE & iSchool, University of Illinois at Urbana-Champaign, 1205 W Clark St., Urbana, IL, 61801, USA
- Anna-Lena Lamprecht
  - Institute of Computer Science, University of Potsdam, An der Bahn 2, 14476, Potsdam, Germany
- Fotis Psomopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, 57001, Greece
- Jennifer Harrow
  - ELIXIR Hub, South Building, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Leyla Jael Castro
  - Semantic Technologies team, ZB MED Information Centre for Life Sciences, Gleueler Strasse 60, 50931, Cologne, Germany
- Paula Andrea Martinez
  - Research Software Alliance/Australian Research Data Commons, Level 6, Duhig Tower, The University of Queensland, Brisbane, QLD 4072, Australia
- Tom Honeyman
  - Australian Research Data Commons, University of Technology Sydney Library, Ultimo, NSW, 2007, Australia
10
MacLeod BP, Parlane FGL, Brown AK, Hein JE, Berlinguette CP. Flexible automation accelerates materials discovery. Nat Mater 2022; 21:722-726. [PMID: 34907322 DOI: 10.1038/s41563-021-01156-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/14/2023]
Affiliation(s)
- Benjamin P MacLeod
  - Department of Chemistry, The University of British Columbia, Vancouver, British Columbia, Canada
  - Stewart Blusson Quantum Matter Institute, The University of British Columbia, Vancouver, British Columbia, Canada
- Fraser G L Parlane
  - Department of Chemistry, The University of British Columbia, Vancouver, British Columbia, Canada
  - Stewart Blusson Quantum Matter Institute, The University of British Columbia, Vancouver, British Columbia, Canada
- Amanda K Brown
  - Department of Chemistry, The University of British Columbia, Vancouver, British Columbia, Canada
- Jason E Hein
  - Department of Chemistry, The University of British Columbia, Vancouver, British Columbia, Canada
- Curtis P Berlinguette
  - Department of Chemistry, The University of British Columbia, Vancouver, British Columbia, Canada.
  - Stewart Blusson Quantum Matter Institute, The University of British Columbia, Vancouver, British Columbia, Canada.
  - Department of Chemical & Biological Engineering, The University of British Columbia, Vancouver, British Columbia, Canada.
  - Canadian Institute for Advanced Research (CIFAR), MaRS Innovation Centre, Toronto, Ontario, Canada.
11
Kazlovich K, Mishra SR, Behdinan K, Gladman A, May J, Mashari A. Open ventilator evaluation framework: A synthesized database of regulatory requirements and technical standards for emergency use ventilators from Australia, Canada, UK, and US. HardwareX 2022; 11:e00260. [PMID: 35036663 PMCID: PMC8752315 DOI: 10.1016/j.ohx.2022.e00260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2021] [Revised: 11/30/2021] [Accepted: 01/03/2022] [Indexed: 06/14/2023]
Abstract
Development of emergency use ventilators has attracted significant attention and resources during the COVID-19 pandemic. To facilitate mass collaboration and accelerate progress, many groups have adopted open-source development models, inspired by the long history of open-source development in software. According to the Open-source Hardware Association (OSHWA), Open-source Hardware (OSH) is a term for tangible artifacts - machines, devices, or other physical things - whose design has been released to the public in such a way that anyone can make, modify, and use them. One major obstacle to translating the growing body of work on open-source ventilators into clinical practice is compliance with regulations and conformance with mandated technical standards for effective performance and device safety. This is exacerbated by the inherent complexity of the regulatory process, which is tailored to traditional centralized development models, as well as the rapid changes and alternative pathways that have emerged during the pandemic. As a step in addressing this challenge, this paper provides developers, evaluators, and potential users of emergency ventilators with the first iteration of a pragmatic, open-source assessment framework that incorporates existing regulatory guidelines from Australia, Canada, UK and USA. We also provide an example evaluation for one open-source emergency ventilator design. The evaluation process has been divided into three levels: (1) adequacy of open-source project documentation; (2) clinical performance requirements; and (3) conformance with technical standards.
Affiliation(s)
- Kate Kazlovich
- Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON, Canada
- Department of Anesthesiology and Pain Management, Toronto General Hospital, University Health Network, Toronto, ON, Canada
| | - Soumya Ranjan Mishra
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
| | - Kamran Behdinan
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
| | - Aviv Gladman
- Mackenzie Richmond Hill Hospital, Toronto, ON, Canada
| | - Jesse May
- Department of Anesthesiology and Pain Medicine, University of Toronto, ON, Canada
| | - Azad Mashari
- Department of Anesthesiology and Pain Medicine, University of Toronto, ON, Canada
- Department of Anesthesiology and Pain Management, Toronto General Hospital, University Health Network, Toronto, ON, Canada
|
12
|
Wuttke J, Cottrell S, Gonzalez MA, Kaestner A, Markvardsen A, Rod TH, Rozyczko P, Vardanyan G. Guidelines for collaborative development of sustainable data treatment software. JOURNAL OF NEUTRON RESEARCH 2022. [DOI: 10.3233/jnr-220002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Software development for data reduction and analysis at large research facilities is increasingly professionalized and internationally coordinated. To foster software quality and sustainability, and to facilitate collaboration, representatives from software groups of European neutron and muon facilities have agreed on a set of guidelines for development practices, infrastructure, and functional and non-functional product properties. These guidelines have been derived from actual practices in software projects from the EU-funded consortium ‘Science and Innovation with Neutrons in Europe in 2020’ (SINE2020), and have been enriched through an extensive literature review. Besides guiding the work of the professional software engineers in our computing groups, we hope to influence scientists who are willing to contribute their own data treatment software to our community. Moreover, this work may also provide inspiration to scientific software development beyond the neutron and muon field.
Affiliation(s)
- Joachim Wuttke
- Forschungszentrum Jülich GmbH, Jülich Centre for Neutron Science at Heinz Maier Leibnitz-Zentrum, Lichtenbergstraße 1, 85748 Garching, Germany
| | - Stephen Cottrell
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Didcot OX11 0QX, United Kingdom
| | - Miguel A. Gonzalez
- Institut Laue-Langevin, 71 avenue des Martyrs, CS 20156, 38042 Grenoble Cedex 9, France
| | - Anders Kaestner
- Paul Scherrer Institute, Forschungsstrasse 111, CH-5232 Villigen PSI, Switzerland
| | - Anders Markvardsen
- ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Didcot OX11 0QX, United Kingdom
| | - Thomas H. Rod
- European Spallation Source ERIC, PO BOX 176, SE-221 00 Lund, Sweden
| | - Piotr Rozyczko
- European Spallation Source ERIC, PO BOX 176, SE-221 00 Lund, Sweden
| | - Gagik Vardanyan
- Institut Laue-Langevin, 71 avenue des Martyrs, CS 20156, 38042 Grenoble Cedex 9, France
|
13
|
Petrillo M, Fabbri M, Kagkli DM, Querci M, Van den Eede G, Alm E, Aytan-Aktug D, Capella-Gutierrez S, Carrillo C, Cestaro A, Chan KG, Coque T, Endrullat C, Gut I, Hammer P, Kay GL, Madec JY, Mather AE, McHardy AC, Naas T, Paracchini V, Peter S, Pightling A, Raffael B, Rossen J, Ruppé E, Schlaberg R, Vanneste K, Weber LM, Westh H, Angers-Loustau A. A roadmap for the generation of benchmarking resources for antimicrobial resistance detection using next generation sequencing. F1000Res 2022; 10:80. [PMID: 35847383 PMCID: PMC9243550 DOI: 10.12688/f1000research.39214.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/10/2022] [Indexed: 11/20/2022] Open
Abstract
Next Generation Sequencing technologies significantly impact the field of Antimicrobial Resistance (AMR) detection and monitoring, with immediate uses in diagnosis and risk assessment. For this application and in general, considerable challenges remain in demonstrating sufficient trust to act upon the meaningful information produced from raw data, partly because of the reliance on bioinformatics pipelines, which can produce different results and therefore lead to different interpretations. With the constant evolution of the field, it is difficult to identify, harmonise and recommend specific methods for large-scale implementations over time. In this article, we propose to address this challenge through establishing a transparent, performance-based, evaluation approach to provide flexibility in the bioinformatics tools of choice, while demonstrating proficiency in meeting common performance standards. The approach is two-fold: first, a community-driven effort to establish and maintain “live” (dynamic) benchmarking platforms to provide relevant performance metrics, based on different use-cases, that would evolve together with the AMR field; second, agreed and defined datasets to allow the pipelines’ implementation, validation, and quality-control over time. Following previous discussions on the main challenges linked to this approach, we provide concrete recommendations and future steps, related to different aspects of the design of benchmarks, such as the selection and the characteristics of the datasets (quality, choice of pathogens and resistances, etc.), the evaluation criteria of the pipelines, and the way these resources should be deployed in the community.
Affiliation(s)
| | - Marco Fabbri
- European Commission Joint Research Centre, Ispra, Italy
| | | | | | - Guy Van den Eede
- European Commission Joint Research Centre, Ispra, Italy
- European Commission Joint Research Centre, Geel, Belgium
| | - Erik Alm
- The European Centre for Disease Prevention and Control, Stockholm, Sweden
| | - Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
| | | | - Catherine Carrillo
- Ottawa Laboratory – Carling, Canadian Food Inspection Agency, Ottawa, Ontario, Canada
| | | | - Kok-Gan Chan
- International Genome Centre, Jiangsu University, Zhenjiang, China
- Division of Genetics and Molecular Biology, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
| | - Teresa Coque
- Servicio de Microbiología, Hospital Universitario Ramón y Cajal, Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain
- Spanish Consortium for Research on Epidemiology and Public Health (CIBERESP), Carlos III Health Institute, Madrid, Spain
| | | | - Ivo Gut
- Centro Nacional de Análisis Genómico, Centre for Genomic Regulation (CNAG-CRG), Barcelona Institute of Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Paul Hammer
- BIOMES. NGS GmbH c/o Technische Hochschule Wildau, Wildau, Germany
| | - Gemma L. Kay
- Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
| | - Jean-Yves Madec
- Unité Antibiorésistance et Virulence Bactériennes, ANSES Site de Lyon, Lyon, France
| | - Alison E. Mather
- Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- University of East Anglia, Norwich, UK
| | | | - Thierry Naas
- French-NRC for CPEs, Service de Bactériologie-Hygiène, Hôpital de Bicêtre, Le Kremlin-Bicêtre, France
| | | | - Silke Peter
- Institute of Medical Microbiology and Hygiene, University of Tübingen, Tübingen, Germany
| | - Arthur Pightling
- Center for Food Safety and Applied Nutrition, US Food and Drug Administration, College Park, MD, USA
| | | | - John Rossen
- Department of Medical Microbiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | | | - Robert Schlaberg
- Department of Pathology, University of Utah, Salt Lake City, UT, USA
| | - Kevin Vanneste
- Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium
| | - Lukas M. Weber
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Present address: Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
|
14
|
Abstract
This article presents a study on the quality and execution of research code from publicly-available replication datasets at the Harvard Dataverse repository. Research code is typically created by a group of scientists and published together with academic papers to facilitate research transparency and reproducibility. For this study, we define ten questions to address aspects impacting research reproducibility and reuse. First, we retrieve and analyze more than 2000 replication datasets with over 9000 unique R files published from 2010 to 2020. Second, we execute the code in a clean runtime environment to assess its ease of reuse. Common coding errors were identified, and some of them were solved with automatic code cleaning to aid code execution. We find that 74% of R files failed to complete without error in the initial execution, while 56% failed when code cleaning was applied, showing that many errors can be prevented with good coding practices. We also analyze the replication datasets from journals’ collections and discuss the impact of the journal policy strictness on the code re-execution rate. Finally, based on our results, we propose a set of recommendations for code dissemination aimed at researchers, journals, and repositories.
|
15
|
Ribeiro CVR, Oliveira LP, Batista R, De Sousa M. UCEasy: A software package for automating and simplifying the analysis of ultraconserved elements (UCEs). Biodivers Data J 2021; 9:e78132. [PMID: 34934383 PMCID: PMC8683391 DOI: 10.3897/bdj.9.e78132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Accepted: 12/09/2021] [Indexed: 11/25/2022] Open
Abstract
Background The use of Ultraconserved Elements (UCEs) as genetic markers in phylogenomics has become popular and has provided promising results. Although UCE data can be easily obtained from targeted enriched sequencing, the protocol for in silico analysis of UCEs consists of the execution of heterogeneous and complex tools, a challenge for scientists without training in bioinformatics. Developing tools with the adoption of best practices in research software can lessen this problem by improving the execution of computational experiments, thus promoting better reproducibility. New information We present UCEasy, an easy-to-install and easy-to-use software package with a simple command line interface that facilitates the computational analysis of UCEs from sequencing samples, following the best practices of research software. UCEasy is a wrapper that standardises, automates and simplifies the quality control of raw reads, assembly and extraction and alignment of UCEs, generating at the end a data matrix with different levels of completeness that can be used to infer phylogenetic trees. We demonstrate the functionalities of UCEasy by reproducing the published results of phylogenomic studies of the bird genus Turdus (Aves) and of Adephaga families (Coleoptera) containing genomic datasets to efficiently extract UCEs.
Affiliation(s)
- Caio V R Ribeiro
- Coordenação de Ciência da Computação, Centro Universitário do Estado do Pará (CESUPA), Belém, Brazil
| | - Lucas P Oliveira
- Instituto de Computação, Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil
| | - Romina Batista
- Instituto Nacional de Pesquisas da Amazônia (INPA), Manaus, Brazil
- Gothenburg Global Biodiversity Centre, Gothenburg, Sweden
| | - Marcos De Sousa
- Museu Paraense Emílio Goeldi (MPEG), Belém, Brazil
- Coordenação de Ciência da Computação, Centro Universitário do Estado do Pará (CESUPA), Belém, Brazil
|
16
|
Ye Y, Barapatre S, Davis MK, Elliston KO, Davatzikos C, Fedorov A, Fillion-Robin JC, Foster I, Gilbertson JR, Lasso A, Miller JV, Morgan M, Pieper S, Raumann BE, Sarachan BD, Savova G, Silverstein JC, Taylor DP, Zelnis JB, Zhang GQ, Cuticchia J, Becich MJ. Open-source Software Sustainability Models: Initial White Paper From the Informatics Technology for Cancer Research Sustainability and Industry Partnership Working Group. J Med Internet Res 2021; 23:e20028. [PMID: 34860667 PMCID: PMC8686402 DOI: 10.2196/20028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2020] [Revised: 12/14/2020] [Accepted: 09/23/2021] [Indexed: 11/13/2022] Open
Abstract
Background The National Cancer Institute Informatics Technology for Cancer Research (ITCR) program provides a series of funding mechanisms to create an ecosystem of open-source software (OSS) that serves the needs of cancer research. As the ITCR ecosystem substantially grows, it faces the challenge of the long-term sustainability of the software being developed by ITCR grantees. To address this challenge, the ITCR sustainability and industry partnership working group (SIP-WG) was convened in 2019. Objective The charter of the SIP-WG is to investigate options to enhance the long-term sustainability of the OSS being developed by ITCR, in part by developing a collection of business model archetypes that can serve as sustainability plans for ITCR OSS development initiatives. The working group assembled models from the ITCR program, from other studies, and from the engagement of its extensive network of relationships with other organizations (eg, Chan Zuckerberg Initiative, Open Source Initiative, and Software Sustainability Institute) in support of this objective. Methods This paper reviews the existing sustainability models and describes 10 OSS use cases disseminated by the SIP-WG and others, including 3D Slicer, Bioconductor, Cytoscape, Globus, i2b2 (Informatics for Integrating Biology and the Bedside) and tranSMART, Insight Toolkit, Linux, Observational Health Data Sciences and Informatics tools, R, and REDCap (Research Electronic Data Capture), in 10 sustainability aspects: governance, documentation, code quality, support, ecosystem collaboration, security, legal, finance, marketing, and dependency hygiene. Results Information available to the public reveals that all 10 OSS have effective governance, comprehensive documentation, high code quality, reliable dependency hygiene, strong user and developer support, and active marketing. 
These OSS include a variety of licensing models (eg, general public license version 2, general public license version 3, Berkeley Software Distribution, and Apache 3) and financial models (eg, federal research funding, industry and membership support, and commercial support). However, detailed information on ecosystem collaboration and security is not publicly provided by most OSS. Conclusions We recommend 6 essential attributes for research software: alignment with unmet scientific needs, a dedicated development team, a vibrant user community, a feasible licensing model, a sustainable financial model, and effective product management. We also stress important actions to be considered in future ITCR activities that involve the discussion of the sustainability and licensing models for ITCR OSS, the establishment of a central library, the allocation of consulting resources to code quality control, ecosystem collaboration, security, and dependency hygiene.
Affiliation(s)
- Ye Ye
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
| | - Seemran Barapatre
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
| | - Michael K Davis
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
| | - Keith O Elliston
- Axiomedix, Inc., Bedford, MA, United States
- PHEMI Systems Corp., Vancouver, BC, Canada
- tranSMART foundation, Wakefield, MA, United States
| | - Christos Davatzikos
- Department of Radiology, School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Andrey Fedorov
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
| | | | - Ian Foster
- Department of Computer Science, University of Chicago, Chicago, IL, United States
| | - John R Gilbertson
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
| | - Andras Lasso
- The Perk Lab for Percutaneous Surgery, School of Computing, Queen's University, Kingston, ON, Canada
| | | | - Martin Morgan
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
| | | | | | | | - Guergana Savova
- Boston Children's Hospital, Harvard Medical School, Boston, MA, United States
| | - Jonathan C Silverstein
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
| | - Donald P Taylor
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
| | - Joyce B Zelnis
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
| | - Guo-Qiang Zhang
- The University of Texas Health Science Center at Houston, Houston, TX, United States
| | | | - Michael J Becich
- Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
|
17
|
Lamprecht AL, Palmblad M, Ison J, Schwämmle V, Al Manir MS, Altintas I, Baker CJO, Ben Hadj Amor A, Capella-Gutierrez S, Charonyktakis P, Crusoe MR, Gil Y, Goble C, Griffin TJ, Groth P, Ienasescu H, Jagtap P, Kalaš M, Kasalica V, Khanteymoori A, Kuhn T, Mei H, Ménager H, Möller S, Richardson RA, Robert V, Soiland-Reyes S, Stevens R, Szaniszlo S, Verberne S, Verhoeven A, Wolstencroft K. Perspectives on automated composition of workflows in the life sciences. F1000Res 2021; 10:897. [PMID: 34804501 PMCID: PMC8573700 DOI: 10.12688/f1000research.54159.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/27/2021] [Indexed: 12/29/2022] Open
Abstract
Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
Affiliation(s)
| | - Magnus Palmblad
- Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
| | - Jon Ison
- French Institute of Bioinformatics, 91057 Évry, France
| | | | | | - Ilkay Altintas
- University of California San Diego, La Jolla, CA, 92093, USA
| | - Christopher J. O. Baker
- University of New Brunswick, Saint John, E2L 4L5, Canada
- IPSNP Computing Inc., Saint John, E2L 4S6, Canada
| | | | | | | | | | - Yolanda Gil
- University of Southern California, Marina Del Rey, CA, 90292, USA
| | - Carole Goble
- Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
| | - Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Paul Groth
- University of Amsterdam, 1090 GH Amsterdam, The Netherlands
| | - Hans Ienasescu
- Technical University of Denmark, 2800 Kongens Lyngby, Denmark
| | - Pratik Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
| | | | | | | | - Tobias Kuhn
- VU Amsterdam, 1081 HV Amsterdam, The Netherlands
| | - Hailiang Mei
- Sequencing Analysis Support Core, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
| | | | - Steffen Möller
- IBIMA, Rostock University Medical Center, 18057 Rostock, Germany
| | | | | | - Stian Soiland-Reyes
- Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Informatics Institute, University of Amsterdam, 1090 GH Amsterdam, The Netherlands
| | - Robert Stevens
- Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
| | | | - Suzan Verberne
- Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
| | - Aswin Verhoeven
- Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
| | - Katherine Wolstencroft
- Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
|
18
|
Austin CC, Bernier A, Bezuidenhout L, Bicarregui J, Biro T, Cambon-Thomsen A, Carroll SR, Cournia Z, Dabrowski PW, Diallo G, Duflot T, Garcia L, Gesing S, Gonzalez-Beltran A, Gururaj A, Harrower N, Lin D, Medeiros C, Méndez E, Meyers N, Mietchen D, Nagrani R, Nilsonne G, Parker S, Pickering B, Pienta A, Polydoratou P, Psomopoulos F, Rennes S, Rowe R, Sansone SA, Shanahan H, Sitz L, Stocks J, Tovani-Palone MR, Uhlmansiek M. Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group. Wellcome Open Res 2021; 5:267. [PMID: 33501381 PMCID: PMC7808050 DOI: 10.12688/wellcomeopenres.16378.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 04/08/2021] [Indexed: 11/20/2022] Open
Abstract
The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research and research reproducibility and timely collaboration beyond borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research. These guidelines include recommendations for clinicians, researchers, policy- and decision-makers, funders, publishers, public health experts, disaster preparedness and response experts, and infrastructure providers from the perspective of different domains (Clinical Medicine, Omics, Epidemiology, Social Sciences, Community Participation, Indigenous Peoples, Research Software, Legal and Ethical Considerations), and other potential users. Several overarching themes have emerged from this document such as the need to balance the creation of data adherent to FAIR principles (findable, accessible, interoperable and reusable), with the need for quick data release; the use of trustworthy research data repositories; the use of well-annotated data with meaningful metadata; and practices of documenting methods and software. The resulting document marks an unprecedented cross-disciplinary, cross-sectoral, and cross-jurisdictional effort authored by over 160 experts from around the globe. 
This letter summarises key points of the Recommendations and Guidelines, highlights the relevant findings, shines a spotlight on the process, and suggests how these developments can be leveraged by the wider scientific community.
Affiliation(s)
- Claire C. Austin
- Environment and Climate Change Canada, 351 boul. St-Joseph, Gatineau, Quebec, K1A 0H3, Canada
| | - Alexander Bernier
- Centre of Genomics and Policy, McGill University, 740, avenue Dr. Penfield, suite 5200, Montreal, Quebec, Canada
| | - Louise Bezuidenhout
- Institute for Science, Innovation and Society, University of Oxford, 64 Banbury Road, Oxford, OX2 6PN, UK
| | - Juan Bicarregui
- UKRI-STFC Rutherford Appleton Laboratory, Harwell Campus, Didcot, OX11 0QX, UK
| | - Timea Biro
- Digital Repository of Ireland, Royal Irish Academy, 19 Dawson St, Dublin 2, D02 HH58, Ireland
| | | | - Stephanie Russo Carroll
- Native Nations Institute at the Udall Center for Studies in Public Policy and the College of Public Health, University of Arizona, 803 E First ST, Tucson, AZ, 85719, USA
| | - Zoe Cournia
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, Athens, 11527, Greece
| | | | - Gayo Diallo
- BPH INSERM1219 & LaBRI, Univ. Bordeaux, 146 rue Léo Saignat, F-33000, Bordeaux, France
| | - Thomas Duflot
- Normandie Univ, UNIROUEN, CHU Rouen, Department of Clinical Research, Rouen University Hospital, 1 Rue de Germont, Rouen Cedex, 76031, France
| | - Leyla Garcia
- ZB MED Information Centre for Life Sciences, Gleueler Str 60, Cologne, 50931, Germany
| | - Sandra Gesing
- University of Notre Dame Center for Research Computing, 814 Flanner Hall, Notre Dame, IN, 46556, USA
| | | | - Anupama Gururaj
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, 5601 Fishers Lane, Rockville, MD, 20852, USA
| | - Natalie Harrower
- Digital Repository of Ireland, Royal Irish Academy, 19 Dawson St, Dublin 2, D02 HH58, Ireland
| | - Dawei Lin
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, 5601 Fishers Lane, Rockville, MD, 20852, USA
| | - Claudia Medeiros
- Institute of Computing, University of Campinas, Av Albert Einstein 1251, Campinas, São Paulo, 13082-853, Brazil
| | - Eva Méndez
- Universidad Carlos III de Madrid, C/ Madrid, 128, Getafe (Madrid), 28903, Spain
| | - Natalie Meyers
- 250D Navari Center for Digital Scholarship, Hesburgh Library, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Daniel Mietchen
- School of Data Science, University of Virginia, P.O. Box 400249, Charlottesville, VA, 22904, USA
| | - Rajini Nagrani
- Leibniz Institute for Prevention Research and Epidemiology, Achterstrasse 30, Bremen, 28359, Germany
| | - Gustav Nilsonne
- Karolinska Institutet & Swedish National Data Service, Nobels väg 9, Stockholm, 17177, Sweden
| | - Simon Parker
- Cancer Research UK, 2 Redman Place, London, E20 1JQ, UK
| | - Brian Pickering
- University of Southampton, University Road, Southampton, SO17 1BJ, UK
| | - Amy Pienta
- ICPSR, University of Michigan, P.O. Box 1248, Ann Arbor, MI, 48106-1248, USA
| | - Panayiota Polydoratou
- OpenEdition/Department of Library Science, Archives and Information Systems, International Hellenic University, P.O. Box 141, Thessaloniki, 57400, Greece
| | - Fotis Psomopoulos
- Institute of Applied Biosciences (INAB), Centre for Research and Technology Hellas (CERTH), Thessaloniki, 57001, Greece
| | - Stephanie Rennes
- INRAE National Research Institute for Agriculture, Food and Environment, 147 Rue de l'Université, Paris, 75007, France
| | - Robyn Rowe
- Laurentian University, Ontario, P3E 2C6, Canada
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Hugh Shanahan
- Department of Computer Science, Royal Holloway, University of London, Bedford Building, Egham, TW20 0EX, UK
| | - Lina Sitz
- Independent Researcher, Strada Costiera, Trieste, 34151, Italy
| | - Joanne Stocks
- Division of Rheumatology, Orthopedics and Dermatology, School of Medicine, University of Nottingham, Queens Medical Centre, Nottingham, NG7 2UH, UK
| | | | - Mary Uhlmansiek
- Research Data Alliance - US Region (RDA-US), c/o Ronin Institute, 127 Haddon Place, Montclair, NJ, 07043, USA
|
19
|
Serrano-Solano B, Föll MC, Gallardo-Alba C, Erxleben A, Rasche H, Hiltemann S, Fahrner M, Dunning MJ, Schulz MH, Scholtz B, Clements D, Nekrutenko A, Batut B, Grüning BA. Fostering accessible online education using Galaxy as an e-learning platform. PLoS Comput Biol 2021; 17:e1008923. [PMID: 33983944 PMCID: PMC8118283 DOI: 10.1371/journal.pcbi.1008923] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/19/2022] Open
Abstract
The COVID-19 pandemic is shifting teaching to an online setting all over the world. The Galaxy framework facilitates the online learning process and makes it accessible by providing a library of high-quality community-curated training materials, enabling easy access to data and tools, and facilitating the sharing of achievements and progress between students and instructors. By combining Galaxy with robust communication channels, effective instruction can be designed inclusively, regardless of the students' environments.
Affiliation(s)
- Beatriz Serrano-Solano
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Melanie C. Föll
- Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, United States of America
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Cristóbal Gallardo-Alba
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Anika Erxleben
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Helena Rasche
- Avans Hogeschool, Breda, the Netherlands
- Erasmus Medical Center, Clinical Bioinformatics Group, Department of Pathology, Rotterdam, the Netherlands
| | - Saskia Hiltemann
- Erasmus Medical Center, Clinical Bioinformatics Group, Department of Pathology, Rotterdam, the Netherlands
| | - Matthias Fahrner
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Faculty of Biology, University of Freiburg, Freiburg, Germany
- Spemann Graduate School of Biology and Medicine, University of Freiburg, Freiburg, Germany
| | - Mark J. Dunning
- Faculty of Medicine, Dentistry and Health, University of Sheffield, Sheffield, United Kingdom
| | - Marcel H. Schulz
- Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main, Germany
| | - Beáta Scholtz
- University of Debrecen, Faculty of Medicine, Dept. of Biochemistry and Molecular Biology, Debrecen, Hungary
| | - Dave Clements
- Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Anton Nekrutenko
- Center for Comparative Genomics and Bioinformatics, Penn State University, State College, Pennsylvania, United States of America
| | - Bérénice Batut
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Björn A. Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
|
20
|
Bai J, Bandla C, Guo J, Alvarez RV, Bai M, Vizcaíno JA, Moreno P, Grüning B, Sallou O, Perez-Riverol Y. BioContainers Registry: Searching Bioinformatics and Proteomics Tools, Packages, and Containers. J Proteome Res 2021; 20:2056-2061. [PMID: 33625229 PMCID: PMC7611561 DOI: 10.1021/acs.jproteome.0c00904] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 02/06/2023]
Abstract
BioContainers is an open-source project that aims to create, store, and distribute bioinformatics software containers and packages. The BioContainers community has developed a set of guidelines to standardize software containers including the metadata, versions, licenses, and software dependencies. BioContainers supports multiple packaging and container technologies such as Conda, Docker, and Singularity. The BioContainers provide over 9000 bioinformatics tools, including more than 200 proteomics and mass spectrometry tools. Here we introduce the BioContainers Registry and Restful API to make containerized bioinformatics tools more findable, accessible, interoperable, and reusable (FAIR). The BioContainers Registry provides a fast and convenient way to find and retrieve bioinformatics tool packages and containers. By doing so, it will increase the use of bioinformatics packages and containers while promoting replicability and reproducibility in research.
Affiliation(s)
- Jingwen Bai
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Chakradhar Bandla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Jiaxin Guo
- College of Bioinformation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
| | - Roberto Vera Alvarez
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Mingze Bai
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing, 400065, China
| | - Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Björn Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, 79110, Germany
| | - Olivier Sallou
- Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA/INRIA) - GenOuest Platform, Université de Rennes, Rennes, France
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
21
|
Olivella R, Chiva C, Serret M, Mancera D, Cozzuto L, Hermoso A, Borràs E, Espadas G, Morales J, Pastor O, Solé A, Ponomarenko J, Sabidó E. QCloud2: An Improved Cloud-based Quality-Control System for Mass-Spectrometry-based Proteomics Laboratories. J Proteome Res 2021; 20:2010-2013. [PMID: 33724836 DOI: 10.1021/acs.jproteome.0c00853] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
QCloud is a cloud-based system to support proteomics laboratories in daily quality assessment using a user-friendly interface, easy setup, and automated data processing. Since its release, QCloud has facilitated automated quality control for proteomics experiments in many laboratories. QCloud provides a quick and effortless evaluation of instrument performance that helps to overcome many analytical challenges derived from clinical and translational research. Here we present an improved version of the system, QCloud2. This new version includes enhancements in the scalability and reproducibility of the quality-control pipelines, and it features an improved front end for data visualization, user management, and chart annotation. The QCloud2 system also includes programmatic access and a standalone local version.
Affiliation(s)
- Roger Olivella
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Cristina Chiva
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Marc Serret
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Daniel Mancera
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Luca Cozzuto
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Antoni Hermoso
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Eva Borràs
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Guadalupe Espadas
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Julia Morales
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Olga Pastor
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Amanda Solé
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Julia Ponomarenko
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
| | - Eduard Sabidó
- Centre de Regulació Genòmica (CRG), Barcelona Institute of Science and Technology (BIST), Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, Barcelona 08003, Spain
|
22
|
Petrillo M, Fabbri M, Kagkli DM, Querci M, Van den Eede G, Alm E, Aytan-Aktug D, Capella-Gutierrez S, Carrillo C, Cestaro A, Chan KG, Coque T, Endrullat C, Gut I, Hammer P, Kay GL, Madec JY, Mather AE, McHardy AC, Naas T, Paracchini V, Peter S, Pightling A, Raffael B, Rossen J, Ruppé E, Schlaberg R, Vanneste K, Weber LM, Westh H, Angers-Loustau A. A roadmap for the generation of benchmarking resources for antimicrobial resistance detection using next generation sequencing. F1000Res 2021; 10:80. [DOI: 10.12688/f1000research.39214.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 02/02/2021] [Indexed: 01/12/2023] Open
Abstract
Next Generation Sequencing technologies significantly impact the field of Antimicrobial Resistance (AMR) detection and monitoring, with immediate uses in diagnosis and risk assessment. For this application and in general, considerable challenges remain in demonstrating sufficient trust to act upon the meaningful information produced from raw data, partly because of the reliance on bioinformatics pipelines, which can produce different results and therefore lead to different interpretations. With the constant evolution of the field, it is difficult to identify, harmonise and recommend specific methods for large-scale implementations over time. In this article, we propose to address this challenge through establishing a transparent, performance-based, evaluation approach to provide flexibility in the bioinformatics tools of choice, while demonstrating proficiency in meeting common performance standards. The approach is two-fold: first, a community-driven effort to establish and maintain “live” (dynamic) benchmarking platforms to provide relevant performance metrics, based on different use-cases, that would evolve together with the AMR field; second, agreed and defined datasets to allow the pipelines’ implementation, validation, and quality-control over time. Following previous discussions on the main challenges linked to this approach, we provide concrete recommendations and future steps, related to different aspects of the design of benchmarks, such as the selection and the characteristics of the datasets (quality, choice of pathogens and resistances, etc.), the evaluation criteria of the pipelines, and the way these resources should be deployed in the community.
|
23
|
Ison J, Ienasescu H, Rydza E, Chmura P, Rapacki K, Gaignard A, Schwämmle V, van Helden J, Kalaš M, Ménager H. biotoolsSchema: a formalized schema for bioinformatics software description. Gigascience 2021; 10:giaa157. [PMID: 33506265 PMCID: PMC7842104 DOI: 10.1093/gigascience/giaa157] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/08/2020] [Revised: 11/10/2020] [Accepted: 12/07/2020] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description (and cataloguing) of bioinformatics resources. FINDINGS Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability. CONCLUSIONS biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.
Affiliation(s)
- Jon Ison
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
| | - Hans Ienasescu
- National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
| | - Emil Rydza
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
| | - Piotr Chmura
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
| | - Kristoffer Rapacki
- Department of Health Technology, Ørsteds Plads, Building 345C, DK-2800 Kongens, Lyngby, Denmark
| | - Alban Gaignard
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- L'institut du Thorax, INSERM, CNRS, University of Nantes, 44007 Nantes, France
| | - Veit Schwämmle
- Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
| | - Jacques van Helden
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- Département de Biologie, Aix-Marseille Université (AMU), 3 place Victor Hugo, 13003 Marseille, France
| | - Matúš Kalaš
- Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
| | - Hervé Ménager
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- Hub de Bioinformatique et Biostatistique–Département Biologie Computationnelle, Institut Pasteur, USR 3756, CNRS, Paris 75015, France
|
24
|
Suhr M, Lehmann C, Bauer CR, Bender T, Knopp C, Freckmann L, Öst Hansen B, Henke C, Aschenbrandt G, Kühlborn LK, Rheinländer S, Weber L, Marzec B, Hellkamp M, Wieder P, Sax U, Kusch H, Nussbeck SY. Menoci: lightweight extensible web portal enhancing data management for biomedical research projects. BMC Bioinformatics 2020; 21:582. [PMID: 33334310 PMCID: PMC7745495 DOI: 10.1186/s12859-020-03928-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/07/2020] [Accepted: 12/09/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biomedical research projects deal with data management requirements from multiple sources like funding agencies' guidelines, publisher policies, discipline best practices, and their own users' needs. We describe functional and quality requirements based on many years of experience implementing data management for the CRC 1002 and CRC 1190. A fully equipped data management software should improve documentation of experiments and materials, enable data storage and sharing according to the FAIR Guiding Principles while maximizing usability, information security, as well as software sustainability and reusability. RESULTS We introduce the modular web portal software menoci for data collection, experiment documentation, data publication, sharing, and preservation in biomedical research projects. Menoci modules are based on the Drupal content management system which enables lightweight deployment and setup, and creates the possibility to combine research data management with a customisable project home page or collaboration platform. CONCLUSIONS Management of research data and digital research artefacts is transforming from individual researcher or groups best practices towards project- or organisation-wide service infrastructures. To enable and support this structural transformation process, a vital ecosystem of open source software tools is needed. Menoci is a contribution to this ecosystem of research data management tools that is specifically designed to support biomedical research projects.
Affiliation(s)
- M Suhr
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - C Lehmann
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - C R Bauer
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - T Bender
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - C Knopp
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - L Freckmann
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - B Öst Hansen
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - C Henke
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - G Aschenbrandt
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - L K Kühlborn
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - S Rheinländer
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - L Weber
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - B Marzec
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - M Hellkamp
- GWDG, Gesellschaft für Wissenschaftliche Datenverarbeitung mbH Göttingen, Am Faßberg 11, 37077, Göttingen, Germany
| | - P Wieder
- GWDG, Gesellschaft für Wissenschaftliche Datenverarbeitung mbH Göttingen, Am Faßberg 11, 37077, Göttingen, Germany
| | - U Sax
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
| | - H Kusch
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
- Department of Molecular Biology, University Medical Center Göttingen, Humboldtallee 23, 37075, Göttingen, Germany
| | - S Y Nussbeck
- Department of Medical Informatics, University Medical Center Göttingen, von-Siebold-Str. 3, 37075, Göttingen, Germany
- University Medical Center Göttingen, UMG Biobank, Robert-Koch-Str. 40, 37075, Göttingen, Germany
|
25
|
Austin CC, Bernier A, Bezuidenhout L, Bicarregui J, Biro T, Cambon-Thomsen A, Carroll SR, Cournia Z, Dabrowski PW, Diallo G, Duflot T, Garcia L, Gesing S, Gonzalez-Beltran A, Gururaj A, Harrower N, Lin D, Medeiros C, Méndez E, Meyers N, Mietchen D, Nagrani R, Nilsonne G, Parker S, Pickering B, Pienta A, Polydoratou P, Psomopoulos F, Rennes S, Rowe R, Sansone SA, Shanahan H, Sitz L, Stocks J, Tovani-Palone MR, Uhlmansiek M. Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group. Wellcome Open Res 2020; 5:267. [PMID: 33501381 PMCID: PMC7808050 DOI: 10.12688/wellcomeopenres.16378.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 10/21/2020] [Indexed: 08/31/2023] Open
Abstract
The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research and research reproducibility and timely collaboration beyond borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research. These guidelines include recommendations for researchers, policymakers, funders, publishers and infrastructure providers from the perspective of different domains (Clinical Medicine, Omics, Epidemiology, Social Sciences, Community Participation, Indigenous Peoples, Research Software, Legal and Ethical Considerations). Several overarching themes have emerged from this document such as the need to balance the creation of data adherent to FAIR principles (findable, accessible, interoperable and reusable), with the need for quick data release; the use of trustworthy research data repositories; the use of well-annotated data with meaningful metadata; and practices of documenting methods and software. The resulting document marks an unprecedented cross-disciplinary, cross-sectoral, and cross-jurisdictional effort authored by over 160 experts from around the globe. This letter summarises key points of the Recommendations and Guidelines, highlights the relevant findings, shines a spotlight on the process, and suggests how these developments can be leveraged by the wider scientific community.
Affiliation(s)
- Claire C. Austin
- Environment and Climate Change Canada, 351 boul. St-Joseph, Gatineau, Quebec, K1A 0H3, Canada
| | - Alexander Bernier
- Centre of Genomics and Policy, McGill University, 740, avenue Dr. Penfield, suite 5200, Montreal, Quebec, Canada
| | - Louise Bezuidenhout
- Institute for Science, Innovation and Society, University of Oxford, 64 Banbury Road, Oxford, OX2 6PN, UK
| | - Juan Bicarregui
- UKRI-STFC Rutherford Appleton Laboratory, Harwell Campus, Didcot, OX11 0QX, UK
| | - Timea Biro
- Digital Repository of Ireland, Royal Irish Academy, 19 Dawson St, Dublin 2, D02 HH58, Ireland
| | | | - Stephanie Russo Carroll
- Native Nations Institute at the Udall Center for Studies in Public Policy and the College of Public Health, University of Arizona, 803 E First ST, Tucson, AZ, 85719, USA
| | - Zoe Cournia
- Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou, Athens, 11527, Greece
| | | | - Gayo Diallo
- BPH INSERM1219 & LaBRI, Univ. Bordeaux, 146 rue Léo Saignat, F-33000, Bordeaux, France
| | - Thomas Duflot
- Normandie Univ, UNIROUEN, CHU Rouen, Department of Clinical Research, Rouen University Hospital, 1 Rue de Germont, Rouen Cedex, 76031, France
| | - Leyla Garcia
- ZB MED Information Centre for Life Sciences, Gleueler Str 60, Cologne, 50931, Germany
| | - Sandra Gesing
- University of Notre Dame Center for Research Computing, 814 Flanner Hall, Notre Dame, IN, 46556, USA
| | | | - Anupama Gururaj
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, 5601 Fishers Lane, Rockville, MD, 20852, USA
| | - Natalie Harrower
- Digital Repository of Ireland, Royal Irish Academy, 19 Dawson St, Dublin 2, D02 HH58, Ireland
| | - Dawei Lin
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, 5601 Fishers Lane, Rockville, MD, 20852, USA
| | - Claudia Medeiros
- Institute of Computing, University of Campinas, Av Albert Einstein 1251, Campinas, São Paulo, 13082-853, Brazil
| | - Eva Méndez
- Universidad Carlos III de Madrid, C/ Madrid, 128, Getafe (Madrid), 28903, Spain
| | - Natalie Meyers
- 250D Navari Center for Digital Scholarship, Hesburgh Library, University of Notre Dame, Notre Dame, IN, 46556, USA
| | - Daniel Mietchen
- School of Data Science, University of Virginia, P.O. Box 400249, Charlottesville, VA, 22904, USA
| | - Rajini Nagrani
- Leibniz Institute for Prevention Research and Epidemiology, Achterstrasse 30, Bremen, 28359, Germany
| | - Gustav Nilsonne
- Karolinska Institutet & Swedish National Data Service, Nobels väg 9, Stockholm, 17177, Sweden
| | - Simon Parker
- Cancer Research UK, 2 Redman Place, London, E20 1JQ, UK
| | - Brian Pickering
- University of Southampton, University Road, Southampton, SO17 1BJ, UK
| | - Amy Pienta
- ICPSR, University of Michigan, P.O. Box 1248, Ann Arbor, MI, 48106-1248, USA
| | - Panayiota Polydoratou
- OpenEdition/Department of Library Science, Archives and Information Systems, International Hellenic University, P.O. Box 141, Thessaloniki, 57400, Greece
| | - Fotis Psomopoulos
- Institute of Applied Biosciences (INAB), Centre for Research and Technology Hellas (CERTH), Thessaloniki, 57001, Greece
| | - Stephanie Rennes
- INRAE National Research Institute for Agriculture, Food and Environment, 147 Rue de l'Université, Paris, 75007, France
| | - Robyn Rowe
- Laurentian University, Ontario, P3E 2C6, Canada
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
| | - Hugh Shanahan
- Department of Computer Science, Royal Holloway, University of London, Bedford Building, Egham, TW20 0EX, UK
| | - Lina Sitz
- Independent Researcher, Strada Costiera, Trieste, 34151, Italy
| | - Joanne Stocks
- Division of Rheumatology, Orthopedics and Dermatology, School of Medicine, University of Nottingham, Queens Medical Centre, Nottingham, NG7 2UH, UK
| | | | - Mary Uhlmansiek
- Research Data Alliance - US Region (RDA-US), c/o Ronin Institute, 127 Haddon Place, Montclair, NJ, 07043, USA
|
26
|
Bossu C, Heck T. Special Issue: Engaging with Open Science in Learning and Teaching. Education for Information 2020. [DOI: 10.3233/efi-200386] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Affiliation(s)
- Carina Bossu
- Institute of Educational Technology, The Open University, Milton Keynes, UK
| | - Tamara Heck
- Information Center for Education, DIPF, Leibniz Institute for Research and Information in Education, Frankfurt a. M., Germany
|
27
|
Lamprecht AL, Garcia L, Kuzak M, Martinez C, Arcila R, Martin Del Pico E, Dominguez Del Angel V, van de Sandt S, Ison J, Martinez PA, McQuilton P, Valencia A, Harrow J, Psomopoulos F, Gelpi JL, Chue Hong N, Goble C, Capella-Gutierrez S. Towards FAIR principles for research software. Data Science 2020. [DOI: 10.3233/ds-190026] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Indexed: 11/15/2022]
Affiliation(s)
| | - Leyla Garcia
- ZBMED Information Centre for Life Sciences, Germany
| | - Mateusz Kuzak
- Netherlands eScience Center, The Netherlands
- Dutch Techcentre for Life Sciences, The Netherlands
| | | | | | | | | | | | - Jon Ison
- National Life Science Supercomputing Center, Technical University of Denmark, Denmark
| | | | | | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Spain
| | | | | | - Josep Ll. Gelpi
- Barcelona Supercomputing Center (BSC), Spain
- University of Barcelona, Spain
| | - Neil Chue Hong
- Software Sustainability Institute, UK
- EPCC, University of Edinburgh, UK
|
28
|
Chen T, Tyagi S. Integrative computational epigenomics to build data-driven gene regulation hypotheses. Gigascience 2020; 9:giaa064. [PMID: 32543653 PMCID: PMC7297091 DOI: 10.1093/gigascience/giaa064] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/25/2020] [Revised: 05/25/2020] [Accepted: 05/26/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Diseases are complex phenotypes often arising as an emergent property of a non-linear network of genetic and epigenetic interactions. To translate this resulting state into a causal relationship with a subset of regulatory features, many experiments deploy an array of laboratory assays from multiple modalities. Often, each of these resulting datasets is large, heterogeneous, and noisy. Thus, it is non-trivial to unify these complex datasets into an interpretable phenotype. Although recent methods address this problem with varying degrees of success, they are constrained by their scopes or limitations. Therefore, an important gap in the field is the lack of a universal data harmonizer with the capability to arbitrarily integrate multi-modal datasets. RESULTS In this review, we perform a critical analysis of methods with the explicit aim of harmonizing data, as opposed to case-specific integration. This revealed that matrix factorization, latent variable analysis, and deep learning are potent strategies. Finally, we describe the properties of an ideal universal data harmonization framework. CONCLUSIONS A sufficiently advanced universal harmonizer has major medical implications: (1) identifying dysregulated biological pathways responsible for a disease is a powerful diagnostic tool; (2) investigating these pathways further allows the biological community to better understand a disease's mechanisms; and (3) precision medicine also benefits from developments in this area, particularly in the context of the growing field of selective epigenome editing, which can suppress or induce a desired phenotype.
Affiliation(s)
- Tyrone Chen
- 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
| | - Sonika Tyagi
- 25 Rainforest Walk, School of Biological Sciences, Monash University, Clayton, VIC 3800, Australia
| |
|
29
|
Eicher T, Kinnebrew G, Patt A, Spencer K, Ying K, Ma Q, Machiraju R, Mathé EA. Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources. Metabolites 2020; 10:E202. [PMID: 32429287 PMCID: PMC7281435 DOI: 10.3390/metabo10050202] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 04/15/2020] [Revised: 05/07/2020] [Accepted: 05/13/2020] [Indexed: 02/06/2023] Open
Abstract
As researchers are increasingly able to collect data on a large scale from multiple clinical and omics modalities, multi-omics integration is becoming a critical component of metabolomics research. This introduces a need for increased understanding by the metabolomics researcher of computational and statistical analysis methods relevant to multi-omics studies. In this review, we discuss common types of analyses performed in multi-omics studies and the computational and statistical methods that can be used for each type of analysis. We pinpoint the caveats and considerations for analysis methods, including required parameters, sample size and data distribution requirements, sources of a priori knowledge, and techniques for the evaluation of model accuracy. Finally, for the types of analyses discussed, we provide examples of the applications of corresponding methods to clinical and basic research. We intend that our review may be used as a guide for metabolomics researchers to choose effective techniques for multi-omics analyses relevant to their field of study.
Affiliation(s)
- Tara Eicher
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
| | - Garrett Kinnebrew
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
- Bioinformatics Shared Resource Group, The Ohio State University, Columbus, OH 43210, USA
| | - Andrew Patt
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
| | - Kyle Spencer
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
- Nationwide Children’s Research Hospital, Columbus, OH 43210, USA
| | - Kevin Ying
- Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
- Molecular, Cellular and Developmental Biology Program, The Ohio State University, Columbus, OH 43210, USA
| | - Qin Ma
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
| | - Raghu Machiraju
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
- Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH 43210, USA
| | - Ewy A. Mathé
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
| |
|
30
|
Garcia L, Antezana E, Garcia A, Bolton E, Jimenez R, Prins P, Banda JM, Katayama T. Ten simple rules to run a successful BioHackathon. PLoS Comput Biol 2020; 16:e1007808. [PMID: 32379758 PMCID: PMC7205200 DOI: 10.1371/journal.pcbi.1007808] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022] Open
Affiliation(s)
- Leyla Garcia
- ZB MED Information Centre for Life Sciences, Cologne, Germany
| | - Erick Antezana
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Bayer CropScience SA-NV, Diegem, Belgium
| | | | - Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
| | | | - Pjotr Prins
- University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Juan M. Banda
- Georgia State University, Atlanta, Georgia, United States of America
| | | |
|
31
|
Mamo N, Martin GM, Desira M, Ellul B, Ebejer JP. Dwarna: a blockchain solution for dynamic consent in biobanking. Eur J Hum Genet 2020; 28:609-626. [PMID: 31844175 PMCID: PMC7170942 DOI: 10.1038/s41431-019-0560-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/31/2019] [Revised: 11/13/2019] [Accepted: 11/26/2019] [Indexed: 11/08/2022] Open
Abstract
Dynamic consent aims to empower research partners and facilitate active participation in the research process. Used within the context of biobanking, it gives individuals access to information and control to determine how and where their biospecimens and data should be used. We present Dwarna, a web portal for 'dynamic consent' that acts as a hub connecting the different stakeholders of the Malta Biobank: biobank managers, researchers, research partners, and the general public. The portal stores research partners' consent in a blockchain to create an immutable audit trail of research partners' consent changes. Dwarna's structure also presents a solution to the European Union's General Data Protection Regulation's right to erasure, a right that is seemingly incompatible with the blockchain model. Dwarna's transparent structure increases trustworthiness in the biobanking process by giving research partners more control over which research studies they participate in, by facilitating the withdrawal of consent and by making it possible to request that the biospecimen and associated data are destroyed.
Affiliation(s)
- Nicholas Mamo
- Centre for Molecular Medicine and Biobanking, Biomedical Sciences Building, University of Malta, Msida, MSD 2080, Malta
| | - Gillian M Martin
- Centre for Molecular Medicine and Biobanking, Biomedical Sciences Building, University of Malta, Msida, MSD 2080, Malta
- Department of Sociology, Faculty of Arts, University of Malta, Msida, MSD 2080, Malta
- BBMRI-ERIC, Neue Stiftingtalstraße 2/B/6, 8010, Graz, Austria
| | - Maria Desira
- Centre for Molecular Medicine and Biobanking, Biomedical Sciences Building, University of Malta, Msida, MSD 2080, Malta
| | - Bridget Ellul
- Department of Pathology, Faculty of Medicine and Surgery, University of Malta, Msida, MSD 2080, Malta
| | - Jean-Paul Ebejer
- Centre for Molecular Medicine and Biobanking, Biomedical Sciences Building, University of Malta, Msida, MSD 2080, Malta
| |
|
32
|
Anzt H, Bach F, Druskat S, Löffler F, Loewe A, Renard BY, Seemann G, Struck A, Achhammer E, Aggarwal P, Appel F, Bader M, Brusch L, Busse C, Chourdakis G, Dabrowski PW, Ebert P, Flemisch B, Friedl S, Fritzsch B, Funk MD, Gast V, Goth F, Grad JN, Hegewald J, Hermann S, Hohmann F, Janosch S, Kutra D, Linxweiler J, Muth T, Peters-Kottig W, Rack F, Raters FH, Rave S, Reina G, Reißig M, Ropinski T, Schaarschmidt J, Seibold H, Thiele JP, Uekermann B, Unger S, Weeber R. An environment for sustainable research software in Germany and beyond: current state, open challenges, and call for action. F1000Res 2020; 9:295. [PMID: 33552475 PMCID: PMC7845155 DOI: 10.12688/f1000research.23224.1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Accepted: 04/09/2020] [Indexed: 08/22/2023] Open
Abstract
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are currently motivated, incentivized, funded, structurally and infrastructurally supported, and legally treated. Failing to do so will threaten the quality and validity of research. In this paper, we identify challenges for research software sustainability in Germany and beyond, in terms of motivation, selection, research software engineering personnel, funding, infrastructure, and legal aspects. Besides researchers, we specifically address political and academic decision-makers to increase awareness of the importance and needs of sustainable research software practices. In particular, we recommend strategies and measures to create an environment for sustainable research software, with the ultimate goal to ensure that software-driven research is valid, reproducible and sustainable, and that software is recognized as a first class citizen in research. This paper is the outcome of two workshops run in Germany in 2019, at deRSE19 - the first International Conference of Research Software Engineers in Germany - and a dedicated DFG-supported follow-up workshop in Berlin.
Affiliation(s)
- Hartwig Anzt
- Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
- University of Tennessee, Knoxville, TN, USA
| | - Felix Bach
- Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
| | - Stephan Druskat
- Friedrich Schiller University, Jena, Germany
- German Aerospace Center (DLR), Berlin, Germany
- Humboldt-Universität zu Berlin, Berlin, Germany
| | - Frank Löffler
- Friedrich Schiller University, Jena, Germany
- Louisiana State University, Baton Rouge, LA, USA
| | - Axel Loewe
- Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
| | - Bernhard Y. Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | - Gunnar Seemann
- University Heart Centre Freiburg Bad Krozingen, Freiburg, Germany
| | | | | | | | - Franziska Appel
- Leibniz Institute of Agricultural Development in Transition Economies (IAMO), Halle (Saale), Germany
| | | | - Lutz Brusch
- Technische Universität Dresden, Dresden, Germany
| | | | | | | | - Peter Ebert
- Saarland Informatics Campus, Saarbrücken, Germany
| | | | | | | | | | - Volker Gast
- Friedrich Schiller University, Jena, Germany
| | | | | | | | | | | | - Stephan Janosch
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
| | - Dominik Kutra
- European Molecular Biology Laboratory, Heidelberg, Germany
| | - Jan Linxweiler
- Technische Universität Braunschweig, Braunschweig, Germany
| | - Thilo Muth
- Federal Institute for Materials Research and Testing, Berlin, Germany
| | | | - Fabian Rack
- FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Karlsruhe, Germany
| | | | | | | | - Malte Reißig
- Institute for Advanced Sustainability Studies, Potsdam, Germany
| | - Timo Ropinski
- Ulm University, Ulm, Germany
- Linköping University, Linköping, Sweden
| | | | - Heidi Seibold
- Ludwig Maximilian University of Munich, München, Germany
| | | | | | - Stefan Unger
- Julius Kühn-Institut (JKI), Quedlinburg, Germany
| | | |
|
33
|
Anzt H, Bach F, Druskat S, Löffler F, Loewe A, Renard BY, Seemann G, Struck A, Achhammer E, Aggarwal P, Appel F, Bader M, Brusch L, Busse C, Chourdakis G, Dabrowski PW, Ebert P, Flemisch B, Friedl S, Fritzsch B, Funk MD, Gast V, Goth F, Grad JN, Hegewald J, Hermann S, Hohmann F, Janosch S, Kutra D, Linxweiler J, Muth T, Peters-Kottig W, Rack F, Raters FH, Rave S, Reina G, Reißig M, Ropinski T, Schaarschmidt J, Seibold H, Thiele JP, Uekermann B, Unger S, Weeber R. An environment for sustainable research software in Germany and beyond: current state, open challenges, and call for action. F1000Res 2020; 9:295. [PMID: 33552475 PMCID: PMC7845155 DOI: 10.12688/f1000research.23224.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 01/11/2021] [Indexed: 11/20/2022] Open
|
34
|
Pospelov G, Van Herck W, Burle J, Carmona Loaiza JM, Durniak C, Fisher JM, Ganeva M, Yurov D, Wuttke J. BornAgain: software for simulating and fitting grazing-incidence small-angle scattering. J Appl Crystallogr 2020; 53:262-276. [PMID: 32047414 PMCID: PMC6998781 DOI: 10.1107/s1600576719016789] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 08/08/2019] [Accepted: 12/15/2019] [Indexed: 01/24/2023] Open
Abstract
BornAgain is a free and open-source multi-platform software framework for simulating and fitting X-ray and neutron reflectometry, off-specular scattering, and grazing-incidence small-angle scattering (GISAS). This paper concentrates on GISAS. Support for reflectometry and off-specular scattering has been added more recently, is still under intense development and will be described in a later publication. BornAgain supports neutron polarization and magnetic scattering. Users can define sample and instrument models through Python scripting. A large subset of the functionality is also available through a graphical user interface. This paper describes the software in terms of the realized non-functional and functional requirements. The web site https://www.bornagainproject.org/ provides further documentation.
Affiliation(s)
- Gennady Pospelov
- Jülich Centre for Neutron Science (JCNS) at Heinz Maier-Leibnitz Zentrum (MLZ), Forschungszentrum Jülich GmbH, Lichtenbergstrasse 1, Garching, 85748, Germany
| | - Walter Van Herck
- Jülich Centre for Neutron Science (JCNS) at Heinz Maier-Leibnitz Zentrum (MLZ), Forschungszentrum Jülich GmbH, Lichtenbergstrasse 1, Garching, 85748, Germany
| | - Jan Burle
- Jülich Centre for Neutron Science (JCNS) at Heinz Maier-Leibnitz Zentrum (MLZ), Forschungszentrum Jülich GmbH, Lichtenbergstrasse 1, Garching, 85748, Germany
| | - Juan M. Carmona Loaiza
- Jülich Centre for Neutron Science (JCNS) at Heinz Maier-Leibnitz Zentrum (MLZ), Forschungszentrum Jülich GmbH, Lichtenbergstrasse 1, Garching, 85748, Germany
| | - Céline Durniak
- Jülich Centre for Neutron Science (JCNS) at Heinz Maier-Leibnitz Zentrum (MLZ), Forschungszentrum Jülich GmbH, Lichtenbergstrasse 1, Garching, 85748, Germany
| | - Jonathan M. Fisher
- Jülich Centre for Neutron Science (JCNS) at Heinz Maier-Leibnitz Zentrum (MLZ), Forschungszentrum Jülich GmbH, Lichtenbergstrasse 1, Garching, 85748, Germany
| | - Marina Ganeva
- Jülich Centre for Neutron Science (JCNS) at Heinz Maier-Leibnitz Zentrum (MLZ), Forschungszentrum Jülich GmbH, Lichtenbergstrasse 1, Garching, 85748, Germany
| | - Dmitry Yurov
- Jülich Centre for Neutron Science (JCNS) at Heinz Maier-Leibnitz Zentrum (MLZ), Forschungszentrum Jülich GmbH, Lichtenbergstrasse 1, Garching, 85748, Germany
| | - Joachim Wuttke
- Jülich Centre for Neutron Science (JCNS) at Heinz Maier-Leibnitz Zentrum (MLZ), Forschungszentrum Jülich GmbH, Lichtenbergstrasse 1, Garching, 85748, Germany
| |
|
35
|
Bonaretti S, Gold GE, Beaupre GS. pyKNEEr: An image analysis workflow for open and reproducible research on femoral knee cartilage. PLoS One 2020; 15:e0226501. [PMID: 31978052 PMCID: PMC6980400 DOI: 10.1371/journal.pone.0226501] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/08/2019] [Accepted: 11/27/2019] [Indexed: 02/04/2023] Open
Abstract
Transparent research in musculoskeletal imaging is fundamental to reliably investigate diseases such as knee osteoarthritis (OA), a chronic disease impairing femoral knee cartilage. To study cartilage degeneration, researchers have developed algorithms to segment femoral knee cartilage from magnetic resonance (MR) images and to measure cartilage morphology and relaxometry. The majority of these algorithms are not publicly available or require advanced programming skills to be compiled and run. However, to accelerate discoveries and findings, it is crucial to have open and reproducible workflows. We present pyKNEEr, a framework for open and reproducible research on femoral knee cartilage from MR images. pyKNEEr is written in Python, uses Jupyter notebook as a user interface, and is available on GitHub with a GNU GPLv3 license. It is composed of three modules: 1) image preprocessing to standardize spatial and intensity characteristics; 2) femoral knee cartilage segmentation for intersubject, multimodal, and longitudinal acquisitions; and 3) analysis of cartilage morphology and relaxometry. Each module contains one or more Jupyter notebooks with narrative, code, visualizations, and dependencies to reproduce computational environments. pyKNEEr facilitates transparent image-based research of femoral knee cartilage because of its ease of installation and use, and its versatility for publication and sharing among researchers. Finally, due to its modular structure, pyKNEEr favors code extension and algorithm comparison. We tested our reproducible workflows with experiments that also constitute an example of transparent research with pyKNEEr, and we compared pyKNEEr performances to existing algorithms in literature review visualizations. We provide links to executed notebooks and executable environments for immediate reproducibility of our findings.
Affiliation(s)
- Serena Bonaretti
- Department of Radiology, Stanford University, Stanford, CA, United States of America
- Musculoskeletal Research Laboratory, VA Palo Alto Health Care System, Palo Alto, CA, United States of America
| | - Garry E. Gold
- Department of Radiology, Stanford University, Stanford, CA, United States of America
| | - Gary S. Beaupre
- Musculoskeletal Research Laboratory, VA Palo Alto Health Care System, Palo Alto, CA, United States of America
- Department of Bioengineering, Stanford University, Stanford, CA, United States of America
| |
|
36
|
Kuzniar A, Maassen J, Verhoeven S, Santuari L, Shneider C, Kloosterman WP, de Ridder J. sv-callers: a highly portable parallel workflow for structural variant detection in whole-genome sequence data. PeerJ 2020; 8:e8214. [PMID: 31934500 PMCID: PMC6951283 DOI: 10.7717/peerj.8214] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/15/2019] [Accepted: 11/14/2019] [Indexed: 12/19/2022] Open
Abstract
Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases including cancer. Despite the advances in whole genome sequencing, comprehensive and accurate detection of SVs in short-read data still poses some practical and computational challenges. We present sv-callers, a highly portable workflow that enables parallel execution of multiple SV detection tools and provides users with example analyses of detected SV callsets in a Jupyter Notebook. This workflow supports easy deployment of software dependencies, configuration and addition of new analysis tools. Moreover, porting it to different computing systems requires minimal effort. Finally, we demonstrate the utility of the workflow by performing both somatic and germline SV analyses on different high-performance computing systems.
Affiliation(s)
| | | | | | - Luca Santuari
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
| | - Carl Shneider
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
| | - Wigard P Kloosterman
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
| | - Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, Netherlands
| |
|
37
|
MacGowan SA, Madeira F, Britto‐Borges T, Warowny M, Drozdetskiy A, Procter JB, Barton GJ. The Dundee Resource for Sequence Analysis and Structure Prediction. Protein Sci 2020; 29:277-297. [PMID: 31710725 PMCID: PMC6933851 DOI: 10.1002/pro.3783] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/31/2019] [Revised: 11/07/2019] [Accepted: 11/07/2019] [Indexed: 11/06/2022]
Abstract
The Dundee Resource for Sequence Analysis and Structure Prediction (DRSASP; http://www.compbio.dundee.ac.uk/drsasp.html) is a collection of web services provided by the Barton Group at the University of Dundee. DRSASP's flagship services are the JPred4 webserver for secondary structure and solvent accessibility prediction and the JABAWS 2.2 webserver for multiple sequence alignment, disorder prediction, amino acid conservation calculations, and specificity-determining site prediction. DRSASP resources are available through conventional web interfaces and APIs but are also integrated into the Jalview sequence analysis workbench, which enables the composition of multitool interactive workflows. Other existing Barton Group tools are being brought under the banner of DRSASP, including NoD (Nucleolar localization sequence detector) and 14-3-3-Pred. New resources are being developed that enable the analysis of population genetic data in evolutionary and 3D structural contexts. Existing resources are actively developed to exploit new technologies and maintain parity with evolving web standards. DRSASP provides substantial computational resources for public use, and since 2016 DRSASP services have completed over 1.5 million jobs.
Affiliation(s)
- Stuart A. MacGowan
- Division of Computational Biology, College of Life Sciences, University of Dundee, UK
| | - Fábio Madeira
- Division of Computational Biology, College of Life Sciences, University of Dundee, UK
| | - Thiago Britto‐Borges
- Division of Computational Biology, College of Life Sciences, University of Dundee, UK
| | - Mateusz Warowny
- Division of Computational Biology, College of Life Sciences, University of Dundee, UK
| | - Alexey Drozdetskiy
- Division of Computational Biology, College of Life Sciences, University of Dundee, UK
| | - James B. Procter
- Division of Computational Biology, College of Life Sciences, University of Dundee, UK
| | - Geoffrey J. Barton
- Division of Computational Biology, College of Life Sciences, University of Dundee, UK
| |
|
38
|
Cereceda O, Quinn DE. A graduate student perspective on overcoming barriers to interacting with open-source software. Facets (Ott) 2020. [DOI: 10.1139/facets-2019-0020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022] Open
Abstract
Computational methods, coding, and software are important tools for conducting research. In both academic and industry data analytics, open-source software (OSS) has gained massive popularity. Collaborative source code allows students to interact with researchers, code developers, and users from a variety of disciplines. Based on the authors’ experiences as graduate students and coding instructors, this paper provides a unique overview of the obstacles that graduate students face in obtaining the knowledge and skills required to complete their research and in transitioning from an OSS user to a contributor: psychological, practical, and cultural barriers and challenges specific to graduate students including cognitive load in graduate school, the importance of a knowledgeable mentor, seeking help from both the online and local communities, and the ongoing campaign to recognize software as research output in career and degree progression. Specific and practical steps are recommended to provide a foundation for graduate students, supervisors, administrators, and members of the OSS community to help overcome these obstacles. In conclusion, the objective of these recommendations is to describe a possible framework that individuals from across the scientific community can adapt to their needs and facilitate a sustainable feedback loop between graduate students and OSS.
Affiliation(s)
- Oihane Cereceda
- Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| | - Danielle E.A. Quinn
- Faculty of Science, Memorial University of Newfoundland, St. John’s, NL A1C 5S7, Canada
| |
|
39
|
Wilde H, Knight V, Gillard J. Evolutionary dataset optimisation: learning algorithm quality through evolution. Appl Intell 2019. [DOI: 10.1007/s10489-019-01592-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
In this paper we propose a novel method for learning how algorithms perform. Classically, algorithms are compared on a finite number of existing (or newly simulated) benchmark datasets based on some fixed metrics. The algorithm(s) with the smallest value of this metric are chosen to be the ‘best performing’. We offer a new approach to flip this paradigm. We instead aim to gain a richer picture of the performance of an algorithm by generating artificial data through genetic evolution, the purpose of which is to create populations of datasets for which a particular algorithm performs well on a given metric. These datasets can be studied so as to learn what attributes lead to a particular progression of a given algorithm. Following a detailed description of the algorithm as well as a brief description of an open source implementation, a case study in clustering is presented. This case study demonstrates the performance and nuances of the method which we call Evolutionary Dataset Optimisation. In this study, a number of known properties about preferable datasets for the clustering algorithms known as k-means and DBSCAN are realised in the generated datasets.
|
40
|
Gilbert J, Pearcy N, Norman R, Millat T, Winzer K, King J, Hodgman C, Minton N, Twycross J. Gsmodutils: a python based framework for test-driven genome scale metabolic model development. Bioinformatics 2019; 35:3397-3403. [PMID: 30759197 PMCID: PMC6748746 DOI: 10.1093/bioinformatics/btz088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/04/2018] [Revised: 01/29/2019] [Accepted: 02/12/2019] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION Genome scale metabolic models (GSMMs) are increasingly important for systems biology and metabolic engineering research as they are capable of simulating complex steady-state behaviour. Constraints based models of this form can include thousands of reactions and metabolites, with many crucial pathways that only become activated in specific simulation settings. However, despite their widespread use, power and the availability of tools to aid with the construction and analysis of large scale models, little methodology is suggested for their continued management. For example, when genome annotations are updated or new understanding regarding behaviour is discovered, models often need to be altered to reflect this. This is quickly becoming an issue for industrial systems and synthetic biotechnology applications, which require good quality reusable models integral to the design, build, test and learn cycle. RESULTS As part of an ongoing effort to improve genome scale metabolic analysis, we have developed a test-driven development methodology for the continuous integration of validation data from different sources. Contributing to the open source technology based around COBRApy, we have developed the gsmodutils modelling framework placing an emphasis on test-driven design of models through defined test cases. Crucially, different conditions are configurable allowing users to examine how different designs or curation impact a wide range of system behaviours, minimizing error between model versions. AVAILABILITY AND IMPLEMENTATION The software framework described within this paper is open source and freely available from http://github.com/SBRCNottingham/gsmodutils. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- James Gilbert
  - Synthetic Biology Research Centre, University of Nottingham, Nottingham, UK
- Nicole Pearcy
  - Synthetic Biology Research Centre, University of Nottingham, Nottingham, UK
- Rupert Norman
  - Synthetic Biology Research Centre, University of Nottingham, Nottingham, UK
  - School of Biosciences, University of Nottingham, Sutton Bonington, Loughborough, UK
- Thomas Millat
  - Synthetic Biology Research Centre, University of Nottingham, Nottingham, UK
- Klaus Winzer
  - Synthetic Biology Research Centre, University of Nottingham, Nottingham, UK
- John King
  - School of Mathematical Sciences, University of Nottingham, Nottingham, UK
- Charlie Hodgman
  - Synthetic Biology Research Centre, University of Nottingham, Nottingham, UK
  - School of Biosciences, University of Nottingham, Sutton Bonington, Loughborough, UK
- Nigel Minton
  - Synthetic Biology Research Centre, University of Nottingham, Nottingham, UK
- Jamie Twycross
  - School of Computer Science, University of Nottingham, Nottingham, UK
|
41
|
Georgeson P, Syme A, Sloggett C, Chung J, Dashnow H, Milton M, Lonsdale A, Powell D, Seemann T, Pope B. Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software. Gigascience 2019; 8:giz109. [PMID: 31544213 PMCID: PMC6755254 DOI: 10.1093/gigascience/giz109] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/29/2019] [Revised: 07/16/2019] [Accepted: 08/13/2019] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Bioinformatics software tools are often created ad hoc, frequently by people without extensive training in software development. In particular, for beginners, the barrier to entry in bioinformatics software development is high, especially if they want to adopt good programming practices. Even experienced developers do not always follow best practices. This results in the proliferation of poorer-quality bioinformatics software, leading to limited scalability and inefficient use of resources; lack of reproducibility, usability, adaptability, and interoperability; and erroneous or inaccurate results. FINDINGS We have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in 1 of 12 programming languages. The resulting software is functional, carrying out a prototypical bioinformatics task, and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, progress logging, defined exit status values, a test suite, a version number, standardized building and packaging, user documentation, code documentation, a standard open source software license, software revision control, and containerization. CONCLUSIONS Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. This also benefits users because tools are more easily installed and consistent in their usage. Bionitio is released as open source software under the MIT License and is available at https://github.com/bionitio-team/bionitio.
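Several of the key features listed above (command-line argument parsing, error handling, progress logging, defined exit status values, a version number) can be sketched with the Python standard library alone. This is an illustrative skeleton, not code generated by Bionitio: the program behaviour (counting FASTA records), the version string, the option names, and the exit status values are all assumptions.

```python
import argparse
import logging
import sys

PROGRAM_VERSION = "0.1.0"  # hypothetical version number

# Defined exit status values (illustrative choices; argparse itself exits with 2 on bad usage).
EXIT_OK = 0
EXIT_FILE_ERROR = 1


def parse_args(argv):
    """Parse command-line arguments for the hypothetical tool."""
    parser = argparse.ArgumentParser(description="Count records in FASTA files (illustrative).")
    parser.add_argument("--version", action="version", version=PROGRAM_VERSION)
    parser.add_argument("--log", metavar="LOG_FILE", help="write a progress log to LOG_FILE")
    parser.add_argument("fasta_files", nargs="*", metavar="FASTA_FILE")
    return parser.parse_args(argv)


def count_sequences(lines):
    """Count FASTA header lines, i.e. lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def main(argv=None):
    """Run the tool and return an exit status instead of raising to the user."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    if args.log:
        logging.basicConfig(filename=args.log, level=logging.INFO)
    for path in args.fasta_files:
        try:
            with open(path) as handle:
                total = count_sequences(handle)
        except OSError as err:
            print(f"error: cannot read {path}: {err}", file=sys.stderr)
            return EXIT_FILE_ERROR
        logging.info("processed %s", path)
        print(f"{path}\t{total}")
    return EXIT_OK
```

A script built this way would typically end with `sys.exit(main())` so that the shell sees the defined exit status. Keeping `count_sequences` separate from `main` makes it easy to cover with a test suite.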
Affiliation(s)
- Peter Georgeson
  - Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
  - Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, Victoria, Australia 3000
- Anna Syme
  - Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
  - Royal Botanic Gardens Victoria, Birdwood Avenue, Melbourne, Victoria, Australia 3004
- Clare Sloggett
  - Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
- Jessica Chung
  - Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
- Harriet Dashnow
  - Bioinformatics, Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, Australia 3052
  - School of BioSciences, The University of Melbourne, Royal Parade, Parkville, Victoria, Australia 3052
- Michael Milton
  - Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
  - Melbourne Genomics Health Alliance, Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, Victoria, Australia 3052
- Andrew Lonsdale
  - Bioinformatics, Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, Australia 3052
  - ARC Centre of Excellence in Plant Cell Walls, School of BioSciences, The University of Melbourne, Royal Parade, Parkville, Victoria, Australia 3052
- David Powell
  - Monash Bioinformatics Platform, Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, 15 Innovation Walk, Monash University, Clayton, Victoria, Australia 3800
- Torsten Seemann
  - Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
  - Department of Microbiology and Immunology, Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street Melbourne, Victoria, Australia 3000
- Bernard Pope
  - Melbourne Bioinformatics, The University of Melbourne, 187 Grattan Street, Carlton, Victoria, Australia 3053
  - Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Victorian Comprehensive Cancer Centre, 305 Grattan Street, Melbourne, Victoria, Australia 3000
  - Department of Medicine, Central Clinical School, Monash University, Clayton, Victoria, Australia 3800
|
42
|
Wolff J, Bhardwaj V, Nothjunge S, Richard G, Renschler G, Gilsbach R, Manke T, Backofen R, Ramírez F, Grüning BA. Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization. Nucleic Acids Res 2018; 46:W11-W16. [PMID: 29901812 PMCID: PMC6031062 DOI: 10.1093/nar/gky504] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Received: 02/23/2018] [Accepted: 05/22/2018] [Indexed: 11/13/2022] Open
Abstract
Galaxy HiCExplorer is a web server that facilitates the study of the 3D conformation of chromatin by allowing Hi-C data processing, analysis and visualization. With the Galaxy HiCExplorer web server, users with little bioinformatic background can perform every step of the analysis in one workflow: mapping of the raw sequence data, creation of Hi-C contact matrices, quality assessment, correction of contact matrices and identification of topological associated domains (TADs) and A/B compartments. Users can create publication ready plots of the contact matrix, A/B compartments, and TADs on a selected genomic locus, along with additional information like gene tracks or ChIP-seq signals. Galaxy HiCExplorer is freely usable at: https://hicexplorer.usegalaxy.eu and is available as a Docker container: https://github.com/deeptools/docker-galaxy-hicexplorer.
Affiliation(s)
- Joachim Wolff
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Vivek Bhardwaj
  - Max Planck Institute of Immunobiology and Epigenetics, Stübeweg 51, 79108 Freiburg im Breisgau
  - Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
- Stephan Nothjunge
  - Institute of Experimental and Clinical Pharmacology and Toxicology, Faculty of Medicine, University of Freiburg, Albertstr. 25, 79104 Freiburg, Germany
  - Hermann Staudinger Graduate School, University of Freiburg, Hebelstrasse 27, 79104 Freiburg, Germany
- Gautier Richard
  - Max Planck Institute of Immunobiology and Epigenetics, Stübeweg 51, 79108 Freiburg im Breisgau
  - IGEPP, INRA, Agrocampus Ouest, Univ Rennes, 35600 Le Rheu, France
- Gina Renschler
  - Max Planck Institute of Immunobiology and Epigenetics, Stübeweg 51, 79108 Freiburg im Breisgau
  - Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
- Ralf Gilsbach
  - Institute of Experimental and Clinical Pharmacology and Toxicology, Faculty of Medicine, University of Freiburg, Albertstr. 25, 79104 Freiburg, Germany
- Thomas Manke
  - Max Planck Institute of Immunobiology and Epigenetics, Stübeweg 51, 79108 Freiburg im Breisgau
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
  - Center for Biological Systems Analysis (ZBSA), University of Freiburg, Habsburgerstr. 49, 79104 Freiburg, Germany
  - BIOSS Centre for Biological Signaling Studies, University of Freiburg, Schänzlestr. 18, 79104 Freiburg, Germany
- Fidel Ramírez
  - Max Planck Institute of Immunobiology and Epigenetics, Stübeweg 51, 79108 Freiburg im Breisgau
- Björn A Grüning
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
  - Center for Biological Systems Analysis (ZBSA), University of Freiburg, Habsburgerstr. 49, 79104 Freiburg, Germany
|
43
|
Abstract
Background: Evaluation of the quality of research software is a challenging and relevant issue, still not sufficiently addressed by the scientific community. Methods: Our contribution begins by defining, precisely but widely enough, the notions of research software and of its authors followed by a study of the evaluation issues, as the basis for the proposition of a sound assessment protocol: the CDUR procedure. Results: CDUR comprises four steps introduced as follows: Citation, to deal with correct RS identification, Dissemination, to measure good dissemination practices, Use, devoted to the evaluation of usability aspects, and Research, to assess the impact of the scientific work. Conclusions: Some conclusions and recommendations are finally included. The evaluation of research is the keystone to boost the evolution of the Open Science policies and practices. It is as well our belief that research software evaluation is a fundamental step to induce better research software practices and, thus, a step towards more efficient science.
Affiliation(s)
- Teresa Gomez-Diaz
  - Laboratoire d'Informatique Gaspard-Monge, Centre National de la Recherche Scientifique, University of Paris-Est Marne-la-Vallée, Marne-la-Vallée, France
|
45
|
Mangul S, Mosqueiro T, Abdill RJ, Duong D, Mitchell K, Sarwal V, Hill B, Brito J, Littman RJ, Statz B, Lam AKM, Dayama G, Grieneisen L, Martin LS, Flint J, Eskin E, Blekhman R. Challenges and recommendations to improve the installability and archival stability of omics computational tools. PLoS Biol 2019; 17:e3000333. [PMID: 31220077 PMCID: PMC6605654 DOI: 10.1371/journal.pbio.3000333] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Revised: 07/02/2019] [Indexed: 01/07/2023] Open
Abstract
Developing new software tools for analysis of large-scale biological data is a key component of advancing modern biomedical research. Scientific reproduction of published findings requires running computational tools on data generated by such studies, yet little attention is presently allocated to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We have estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence for 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through uniform resource locators (URLs) published in the paper they first appeared in. Among the 98 software tools selected for our installability test, 51% were deemed "easy to install," and 28% of the tools failed to be installed at all because of problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations significantly increased when authors provided an easy installation process. We propose for incorporation into journal policy several practical solutions for increasing the widespread installability and archival stability of published bioinformatics software.
Affiliation(s)
- Serghei Mangul
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
  - Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California, United States of America
- Thiago Mosqueiro
  - Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California, United States of America
- Richard J. Abdill
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Dat Duong
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Keith Mitchell
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Varuni Sarwal
  - Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
- Brian Hill
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Jaqueline Brito
  - Institute of Mathematics and Computer Science, University of São Paulo, São Paulo, Brazil
- Russell Jared Littman
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Benjamin Statz
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Angela Ka-Mei Lam
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
- Gargi Dayama
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Laura Grieneisen
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
- Lana S. Martin
  - Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, California, United States of America
- Jonathan Flint
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, United States of America
- Eleazar Eskin
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, California, United States of America
- Ran Blekhman
  - Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, Minnesota, United States of America
  - Department of Ecology, Evolution, and Behavior, University of Minnesota, Minnesota, United States of America
|
46
|
Katz DS, McInnes LC, Bernholdt DE, Mayes AC, Hong NPC, Duckles J, Gesing S, Heroux MA, Hettrick S, Jimenez RC, Pierce M, Weaver B, Wilkins-Diehr N. Community Organizations: Changing the Culture in Which Research Software Is Developed and Sustained. Comput Sci Eng 2019. [DOI: 10.1109/mcse.2018.2883051] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/08/2022]
|
47
|
AlNoamany Y, Borghi JA. Towards computational reproducibility: researcher perspectives on the use and sharing of software. PeerJ Comput Sci 2018; 4:e163. [PMID: 33816816 PMCID: PMC7924683 DOI: 10.7717/peerj-cs.163] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/16/2018] [Accepted: 08/24/2018] [Indexed: 05/05/2023]
Abstract
Research software, which includes both source code and executables used as part of the research process, presents a significant challenge for efforts aimed at ensuring reproducibility. In order to inform such efforts, we conducted a survey to better understand the characteristics of research software as well as how it is created, used, and shared by researchers. Based on the responses of 215 participants, representing a range of research disciplines, we found that researchers create, use, and share software in a wide variety of forms for a wide variety of purposes, including data collection, data analysis, data visualization, data cleaning and organization, and automation. More participants indicated that they use open source software than commercial software. While a relatively small number of programming languages (e.g., Python, R, JavaScript, C++, MATLAB) are used by a large number, there is a long tail of languages used by relatively few. Between-group comparisons revealed that significantly more participants from computer science write source code and create executables than participants from other disciplines. Differences between researchers from computer science and other disciplines related to the knowledge of best practices of software creation and sharing were not statistically significant. While many participants indicated that they draw a distinction between the sharing and preservation of software, related practices and perceptions were often not aligned with those of the broader scholarly communications community.
Affiliation(s)
- Yasmin AlNoamany
  - University of California, Berkeley, CA, United States of America
- John A. Borghi
  - California Digital Library, Oakland, CA, United States of America
|
48
|
Griffin PC, Khadake J, LeMay KS, Lewis SE, Orchard S, Pask A, Pope B, Roessner U, Russell K, Seemann T, Treloar A, Tyagi S, Christiansen JH, Dayalan S, Gladman S, Hangartner SB, Hayden HL, Ho WWH, Keeble-Gagnère G, Korhonen PK, Neish P, Prestes PR, Richardson MF, Watson-Haigh NS, Wyres KL, Young ND, Schneider MV. Best practice data life cycle approaches for the life sciences. F1000Res 2018; 6:1618. [PMID: 30109017 DOI: 10.12688/f1000research.12344.1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Accepted: 08/17/2017] [Indexed: 11/20/2022] Open
Abstract
Throughout history, the life sciences have been revolutionised by technological advances; in our era this is manifested by advances in instrumentation for data generation, and consequently researchers now routinely handle large amounts of heterogeneous data in digital formats. The simultaneous transitions towards biology as a data science and towards a 'life cycle' view of research data pose new challenges. Researchers face a bewildering landscape of data management requirements, recommendations and regulations, without necessarily being able to access data management training or possessing a clear understanding of practical approaches that can assist in data management in their particular research domain. Here we provide an overview of best practice data life cycle approaches for researchers in the life sciences/bioinformatics space with a particular focus on 'omics' datasets and computer-based data processing and analysis. We discuss the different stages of the data life cycle and provide practical suggestions for useful tools and resources to improve data management practices.
Affiliation(s)
- Philippa C Griffin
  - EMBL Australia Bioinformatics Resource, The University of Melbourne, Parkville, VIC, 3010, Australia
  - Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
- Jyoti Khadake
  - NIHR BioResource, University of Cambridge and Cambridge University Hospitals NHS Foundation Trust Hills Road, Cambridge, CB2 0QQ, UK
- Kate S LeMay
  - Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
- Suzanna E Lewis
  - Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology Division, Berkeley, CA, 94720, USA
- Sandra Orchard
  - European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Cambridge, CB10 1SD, UK
- Andrew Pask
  - School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
- Bernard Pope
  - Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
- Ute Roessner
  - Metabolomics Australia, School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
- Keith Russell
  - Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
- Torsten Seemann
  - Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
- Andrew Treloar
  - Australian National Data Service, Monash University, Malvern East, VIC, 3145, Australia
- Sonika Tyagi
  - Australian Genome Research Facility Ltd, Parkville, VIC, 3052, Australia
  - Monash Bioinformatics Platform, Monash University, Clayton, VIC, 3800, Australia
- Jeffrey H Christiansen
  - Queensland Cyber Infrastructure Foundation and the University of Queensland Research Computing Centre, St Lucia, QLD, 4072, Australia
- Saravanan Dayalan
  - Metabolomics Australia, School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
- Simon Gladman
  - EMBL Australia Bioinformatics Resource, The University of Melbourne, Parkville, VIC, 3010, Australia
- Sandra B Hangartner
  - School of Biological Sciences, Monash University, Clayton, VIC, 3800, Australia
- Helen L Hayden
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport and Resources (DEDJTR), Bundoora, VIC, 3083, Australia
- William W H Ho
  - School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
- Gabriel Keeble-Gagnère
  - School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport and Resources (DEDJTR), Bundoora, VIC, 3083, Australia
- Pasi K Korhonen
  - Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
- Peter Neish
  - The University of Melbourne, Parkville, VIC, 3010, Australia
- Priscilla R Prestes
  - Faculty of Science and Engineering, Federation University Australia, Mt Helen, VIC, 3350, Australia
- Mark F Richardson
  - Bioinformatics Core Research Group & Centre for Integrative Ecology, Deakin University, Geelong, VIC, 3220, Australia
- Nathan S Watson-Haigh
  - School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, 5064, Australia
- Kelly L Wyres
  - Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia
- Neil D Young
  - Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
- Maria Victoria Schneider
  - Melbourne Bioinformatics, The University of Melbourne, Parkville, VIC, 3010, Australia
  - The University of Melbourne, Parkville, VIC, 3010, Australia
|
49
|
Gruening B, Sallou O, Moreno P, da Veiga Leprevost F, Ménager H, Søndergaard D, Röst H, Sachsenberg T, O'Connor B, Madeira F, Dominguez Del Angel V, Crusoe MR, Varma S, Blankenberg D, Jimenez RC, Perez-Riverol Y. Recommendations for the packaging and containerizing of bioinformatics software. F1000Res 2018; 7. [PMID: 31543945 DOI: 10.12688/f1000research.15140.1] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Accepted: 06/01/2018] [Indexed: 11/20/2022] Open
Abstract
Software Containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced under certain rules and standards in order to be reusable, compatible and easy to integrate into pipelines and analysis workflows. Here, we presented a set of recommendations developed by the BioContainers Community to produce standardized bioinformatics packages and containers. These recommendations provide practical guidelines to make bioinformatics software more discoverable, reusable and transparent. They are aimed to guide developers, organisations, journals and funders to increase the quality and sustainability of research software.
Affiliation(s)
- Bjorn Gruening
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, 79110, Germany
- Olivier Sallou
  - Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA/INRIA) - GenOuest Platform, Université de Rennes, Rennes, France
- Pablo Moreno
  - EMBL European Bioinformatics Institute, Cambridge, UK
- Hervé Ménager
  - Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
- Dan Søndergaard
  - Bioinformatics Research Centre, Aarhus University, Aarhus, DK-8000, Denmark
- Hannes Röst
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
- Timo Sachsenberg
  - Applied Bioinformatics Group, Wilhelm Schickard Institut für Informatik, Universität Tübingen, Tübingen, D-72076, Germany
- Brian O'Connor
  - Computational Genomics Lab, UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, USA
- Fábio Madeira
  - EMBL European Bioinformatics Institute, Cambridge, UK
- Michael R Crusoe
  - Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
- Susheel Varma
  - EMBL European Bioinformatics Institute, Cambridge, UK
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
|
50
|
Gruening B, Sallou O, Moreno P, da Veiga Leprevost F, Ménager H, Søndergaard D, Röst H, Sachsenberg T, O'Connor B, Madeira F, Dominguez Del Angel V, Crusoe MR, Varma S, Blankenberg D, Jimenez RC, Perez-Riverol Y. Recommendations for the packaging and containerizing of bioinformatics software. F1000Res 2018; 7. [PMID: 31543945 PMCID: PMC6738188 DOI: 10.12688/f1000research.15140.2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Accepted: 03/18/2019] [Indexed: 11/22/2022] Open
Abstract
Software Containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced under certain rules and standards in order to be reusable, compatible and easy to integrate into pipelines and analysis workflows. Here, we presented a set of recommendations developed by the BioContainers Community to produce standardized bioinformatics packages and containers. These recommendations provide practical guidelines to make bioinformatics software more discoverable, reusable and transparent. They are aimed to guide developers, organisations, journals and funders to increase the quality and sustainability of research software.
Affiliation(s)
- Bjorn Gruening
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, 79110, Germany
- Olivier Sallou
  - Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA/INRIA) - GenOuest Platform, Université de Rennes, Rennes, France
- Pablo Moreno
  - EMBL European Bioinformatics Institute, Cambridge, UK
- Hervé Ménager
  - Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
- Dan Søndergaard
  - Bioinformatics Research Centre, Aarhus University, Aarhus, DK-8000, Denmark
- Hannes Röst
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
- Timo Sachsenberg
  - Applied Bioinformatics Group, Wilhelm Schickard Institut für Informatik, Universität Tübingen, Tübingen, D-72076, Germany
- Brian O'Connor
  - Computational Genomics Lab, UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, USA
- Fábio Madeira
  - EMBL European Bioinformatics Institute, Cambridge, UK
- Michael R Crusoe
  - Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
- Susheel Varma
  - EMBL European Bioinformatics Institute, Cambridge, UK
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
|