1
Williams JJ, Tractenberg RE, Batut B, Becker EA, Brown AM, Burke ML, Busby B, Cooch NK, Dillman AA, Donovan SS, Doyle MA, van Gelder CWG, Hall CR, Hertweck KL, Jordan KL, Jungck JR, Latour AR, Lindvall JM, Lloret-Llinares M, McDowell GS, Morris R, Mourad T, Nisselle A, Ordóñez P, Paladin L, Palagi PM, Sukhai MA, Teal TK, Woodley L. An international consensus on effective, inclusive, and career-spanning short-format training in the life sciences and beyond. PLoS One 2023; 18:e0293879. [PMID: 37943810 PMCID: PMC10635508 DOI: 10.1371/journal.pone.0293879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 10/23/2023] [Indexed: 11/12/2023] Open
Abstract
Science, technology, engineering, mathematics, and medicine (STEMM) fields change rapidly and are increasingly interdisciplinary. Commonly, STEMM practitioners use short-format training (SFT) such as workshops and short courses for upskilling and reskilling, but unaddressed challenges limit SFT's effectiveness and inclusiveness. Education researchers, students in SFT courses, and organizations have called for research and strategies that can strengthen SFT in terms of effectiveness, inclusiveness, and accessibility across multiple dimensions. This paper describes the project that resulted in a consensus set of 14 actionable recommendations to systematically strengthen SFT. A diverse international group of 30 experts in education, accessibility, and life sciences came together from 10 countries to develop recommendations that can help strengthen SFT globally. Participants, including representation from some of the largest life science training programs globally, assembled findings in the educational sciences and encompassed the experiences of several of the largest life science SFT programs. The 14 recommendations were derived through a Delphi method, where consensus was achieved in real time as the group completed a series of meetings and tasks designed to elicit specific recommendations. Recommendations cover the breadth of SFT contexts and stakeholder groups and include actions for instructors (e.g., make equity and inclusion an ethical obligation), programs (e.g., centralize infrastructure for assessment and evaluation), as well as organizations and funders (e.g., professionalize training SFT instructors; deploy SFT to counter inequity). Recommendations are aligned with a purpose-built framework ("The Bicycle Principles") that prioritizes evidence-based teaching, inclusiveness, and equity, as well as the ability to scale, share, and sustain SFT. We also describe how the Bicycle Principles and recommendations are consistent with educational change theories and can overcome systemic barriers to delivering consistently effective, inclusive, and career-spanning SFT.
Affiliation(s)
- Jason J. Williams
  - DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Rochelle E. Tractenberg
  - Collaborative for Research on Outcomes and Metrics, Georgetown University, Washington, DC, United States of America
- Bérénice Batut
  - Albert-Ludwigs-University Freiburg, Freiburg, Germany
  - Open Life Science, Freiburg, Germany
- Anne M. Brown
  - Virginia Tech, Blacksburg, Virginia, United States of America
- Melissa L. Burke
  - Australian BioCommons, North Melbourne, Australia
  - Queensland Cyber Infrastructure Foundation, Research Computing Centre
  - The University of Queensland
- Ben Busby
  - DNAnexus, Mountain View, California, United States of America
- Christina R. Hall
  - Australian BioCommons, North Melbourne, Australia
  - University of Melbourne, Melbourne, Australia
- Kate L. Hertweck
  - Chan Zuckerberg Initiative, Redwood City, California, United States of America
- John R. Jungck
  - University of Delaware, Newark, DE, United States of America
- Marta Lloret-Llinares
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, United Kingdom
- Gary S. McDowell
  - Lightoller LLC
  - The Ronin Institute, Montclair, NJ, United States of America
  - Institute for Globally Distributed Open Research and Education
- Rana Morris
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
- Teresa Mourad
  - Ecological Society of America, Washington, DC, United States of America
- Amy Nisselle
  - Murdoch Children’s Research Institute, Melbourne, Australia
  - Melbourne Genomics, The University of Melbourne, Melbourne, Australia
- Patricia Ordóñez
  - University of Maryland Baltimore County, Catonsville, Maryland, United States of America
- Lisanna Paladin
  - European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
- Mahadeo A. Sukhai
  - Canadian National Institute for the Blind, Toronto, Canada
  - Queen’s University School of Medicine, Kingston, Canada
- Tracy K. Teal
  - Posit, PBC, Boston, Massachusetts, United States of America
- Louise Woodley
  - Center for Scientific Collaboration and Community Engagement, Oakland, California, United States of America
2
Schiml VC, Delogu F, Kumar P, Kunath B, Batut B, Mehta S, Johnson JE, Grüning B, Pope PB, Jagtap PD, Griffin TJ, Arntzen MØ. Integrative meta-omics in Galaxy and beyond. Environ Microbiome 2023; 18:56. [PMID: 37420292 DOI: 10.1186/s40793-023-00514-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 07/05/2023] [Indexed: 07/09/2023]
Abstract
BACKGROUND: 'Omics methods have empowered scientists to tackle the complexity of microbial communities on a scale not attainable before. Individually, omics analyses can provide great insight; combined as "meta-omics", they enhance the understanding of which organisms occupy specific metabolic niches, how they interact, and how they utilize environmental nutrients. Here we present three integrative meta-omics workflows, developed in Galaxy, for enhanced analysis and integration of metagenomics, metatranscriptomics, and metaproteomics, combined with our newly developed web-application, ViMO (Visualizer for Meta-Omics) to analyse metabolisms in complex microbial communities. RESULTS: In this study, we applied the workflows to a highly efficient cellulose-degrading minimal consortium enriched from a biogas reactor to analyse the key roles of uncultured microorganisms in complex biomass degradation processes. Metagenomic analysis recovered metagenome-assembled genomes (MAGs) for several constituent populations including Hungateiclostridium thermocellum, Thermoclostridium stercorarium and multiple heterogenic strains affiliated to Coprothermobacter proteolyticus. The metagenomics workflow was developed as two modules, one standard, and one optimized for improving the MAG quality in complex samples by implementing a combination of single- and co-assembly, and dereplication after binning. The exploration of the active pathways within the recovered MAGs can be visualized in ViMO, which also provides an overview of the MAG taxonomy and quality (contamination and completeness), and information about carbohydrate-active enzymes (CAZymes), as well as KEGG annotations and pathways, with counts and abundances at both mRNA and protein level. To achieve this, the metatranscriptomic reads and metaproteomic mass-spectrometry spectra are mapped onto predicted genes from the metagenome to analyse the functional potential of MAGs, as well as the actual expressed proteins and functions of the microbiome, all visualized in ViMO. CONCLUSION: Our three workflows for integrative meta-omics in combination with ViMO present a progression in the analysis of 'omics data, particularly within Galaxy, but also beyond. The optimized metagenomics workflow allows for detailed reconstruction of a microbial community consisting of MAGs with high quality, and thus improves analyses of the metabolism of the microbiome, using the metatranscriptomics and metaproteomics workflows.
Affiliation(s)
- Valerie C Schiml
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
- Francesco Delogu
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
- Praveen Kumar
  - Department of Biochemistry, Biophysics and Molecular Biology, University of Minnesota, Minneapolis, MN, 55455, USA
- Benoit Kunath
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
- Bérénice Batut
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Subina Mehta
  - Department of Biochemistry, Biophysics and Molecular Biology, University of Minnesota, Minneapolis, MN, 55455, USA
- James E Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, 55455, USA
- Björn Grüning
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Phillip B Pope
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
  - Faculty of Biosciences, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
- Pratik D Jagtap
  - Department of Biochemistry, Biophysics and Molecular Biology, University of Minnesota, Minneapolis, MN, 55455, USA
- Timothy J Griffin
  - Department of Biochemistry, Biophysics and Molecular Biology, University of Minnesota, Minneapolis, MN, 55455, USA
- Magnus Ø Arntzen
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), P.O. Box 5003, 1432, Ås, Norway
3
Bray S, Chilton J, Bernt M, Soranzo N, van den Beek M, Batut B, Rasche H, Čech M, Cock PJA, Grüning B, Nekrutenko A. The Planemo toolkit for developing, deploying, and executing scientific data analyses in Galaxy and beyond. Genome Res 2023; 33:261-268. [PMID: 36828587 PMCID: PMC10069471 DOI: 10.1101/gr.276963.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 01/11/2023] [Indexed: 02/26/2023]
Abstract
There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing, and executing Galaxy tools, workflows, and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers.
Affiliation(s)
- Simon Bray
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany
- John Chilton
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Matthias Bernt
  - Department of Computational Biology, Helmholtz Centre for Environmental Research GmbH-UFZ, 04318 Leipzig, Germany
- Nicola Soranzo
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, United Kingdom
- Marius van den Beek
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Bérénice Batut
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany
- Helena Rasche
  - Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, 3015 CN, Rotterdam, The Netherlands
  - Academie voor de Technologie van Gezondheid en Milieu, Avans Hogeschool, 4818 AJ Breda, The Netherlands
- Martin Čech
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Peter J A Cock
  - James Hutton Institute, Invergowrie, Dundee DD2 5DA, United Kingdom
- Björn Grüning
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, 79110 Freiburg, Germany
- Anton Nekrutenko
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
4
Afgan E, Nekrutenko A, Grüning BA, Blankenberg D, Goecks J, Schatz MC, Ostrovsky AE, Mahmoud A, Lonie AJ, Syme A, Fouilloux A, Bretaudeau A, Nekrutenko A, Kumar A, Eschenlauer AC, DeSanto AD, Guerler A, Serrano-Solano B, Batut B, Grüning BA, Langhorst BW, Carr B, Raubenolt BA, Hyde CJ, Bromhead CJ, Barnett CB, Royaux C, Gallardo C, Blankenberg D, Fornika DJ, Baker D, Bouvier D, Clements D, de Lima Morais DA, Tabernero DL, Lariviere D, Nasr E, Afgan E, Zambelli F, Heyl F, Psomopoulos F, Coppens F, Price GR, Cuccuru G, Corguillé GL, Von Kuster G, Akbulut GG, Rasche H, Hotz HR, Eguinoa I, Makunin I, Ranawaka IJ, Taylor JP, Joshi J, Hillman-Jackson J, Goecks J, Chilton JM, Kamali K, Suderman K, Poterlowicz K, Yvan LB, Lopez-Delisle L, Sargent L, Bassetti ME, Tangaro MA, van den Beek M, Čech M, Bernt M, Fahrner M, Tekman M, Föll MC, Schatz MC, Crusoe MR, Roncoroni M, Kucher N, Coraor N, Stoler N, Rhodes N, Soranzo N, Pinter N, Goonasekera NA, Moreno PA, Videm P, Melanie P, Mandreoli P, Jagtap PD, Gu Q, Weber RJM, Lazarus R, Vorderman RHP, Hiltemann S, Golitsynskiy S, Garg S, Bray SA, Gladman SL, Leo S, Mehta SP, Griffin TJ, Jalili V, Yves V, Wen V, Nagampalli VK, Bacon WA, de Koning W, Maier W, Briggs PJ. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res 2022; 50:W345-W351. [PMID: 35446428 PMCID: PMC9252830 DOI: 10.1093/nar/gkac247] [Citation(s) in RCA: 223] [Impact Index Per Article: 111.5] [Received: 02/03/2022] [Revised: 03/17/2022] [Accepted: 03/30/2022] [Indexed: 01/19/2023] Open
Abstract
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.
5
Abstract
Despite intense research on genome architecture since the 2000s, genome-size evolution in prokaryotes has remained puzzling. Using a phylogenetic approach, a new study found that increased mutation rate is associated with gene loss and reduced genome size in prokaryotes.
Affiliation(s)
- Gabriel A B Marais
  - Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
- Bérénice Batut
  - Albert-Ludwigs-University Freiburg, Department of Computer Science, 79110 Freiburg, Germany
- Vincent Daubin
  - Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
6
Serrano-Solano B, Föll MC, Gallardo-Alba C, Erxleben A, Rasche H, Hiltemann S, Fahrner M, Dunning MJ, Schulz MH, Scholtz B, Clements D, Nekrutenko A, Batut B, Grüning BA. Fostering accessible online education using Galaxy as an e-learning platform. PLoS Comput Biol 2021; 17:e1008923. [PMID: 33983944 PMCID: PMC8118283 DOI: 10.1371/journal.pcbi.1008923] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/19/2022] Open
Abstract
The COVID-19 pandemic is shifting teaching to an online setting all over the world. The Galaxy framework facilitates the online learning process and makes it accessible by providing a library of high-quality community-curated training materials, enabling easy access to data and tools, and facilitating the sharing of achievements and progress between students and instructors. By combining Galaxy with robust communication channels, effective instruction can be designed inclusively, regardless of the students' environments.
Affiliation(s)
- Beatriz Serrano-Solano
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Melanie C. Föll
  - Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, United States of America
  - Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
  - Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Cristóbal Gallardo-Alba
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Anika Erxleben
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Helena Rasche
  - Avans Hogeschool, Breda, the Netherlands
  - Erasmus Medical Center, Clinical Bioinformatics Group, Department of Pathology, Rotterdam, the Netherlands
- Saskia Hiltemann
  - Erasmus Medical Center, Clinical Bioinformatics Group, Department of Pathology, Rotterdam, the Netherlands
- Matthias Fahrner
  - Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
  - Faculty of Biology, University of Freiburg, Freiburg, Germany
  - Spemann Graduate School of Biology and Medicine, University of Freiburg, Freiburg, Germany
- Mark J. Dunning
  - Faculty of Medicine, Dentistry and Health, University of Sheffield, Sheffield, United Kingdom
- Marcel H. Schulz
  - Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main, Germany
- Beáta Scholtz
  - University of Debrecen, Faculty of Medicine, Dept. of Biochemistry and Molecular Biology, Debrecen, Hungary
- Dave Clements
  - Johns Hopkins University, Baltimore, Maryland, United States of America
- Anton Nekrutenko
  - Center for Comparative Genomics and Bioinformatics, Penn State University, State College, Pennsylvania, United States of America
- Bérénice Batut
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Björn A. Grüning
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
7
Mehta S, Crane M, Leith E, Batut B, Hiltemann S, Arntzen MØ, Kunath BJ, Pope PB, Delogu F, Sajulga R, Kumar P, Johnson JE, Griffin TJ, Jagtap PD. ASaiM-MT: a validated and optimized ASaiM workflow for metatranscriptomics analysis within Galaxy framework. F1000Res 2021; 10:103. [PMID: 34484688 PMCID: PMC8383124 DOI: 10.12688/f1000research.28608.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 04/12/2021] [Indexed: 12/13/2022] Open
Abstract
The Earth Microbiome Project (EMP) aided in understanding the role of microbial communities and the influence of collective genetic material (the 'microbiome') and microbial diversity patterns across the habitats of our planet. With the evolution of new sequencing technologies, researchers can now investigate the microbiome and map its influence on the environment and human health. Advances in bioinformatics methods for next-generation sequencing (NGS) data analysis have helped researchers to gain an in-depth knowledge about the taxonomic and genetic composition of microbial communities. Metagenomic-based methods have been the most commonly used approaches for microbiome analysis; however, they primarily extract information about the taxonomic composition and genetic potential of the microbiome under study, lacking quantification of the gene products (RNA and proteins). On the other hand, metatranscriptomics, the study of a microbial community's RNA expression, can reveal the dynamic gene expression of individual microbial populations and the community as a whole, ultimately providing information about the active pathways in the microbiome. In order to address the analysis of NGS data, the ASaiM analysis framework was previously developed and made available via the Galaxy platform. Although developed for both metagenomics and metatranscriptomics, the original publication demonstrated the use of ASaiM only for metagenomics, while thorough testing for metatranscriptomics data was lacking. In the current study, we have focused on validating and optimizing the tools within ASaiM for metatranscriptomics data. As a result, we deliver a robust workflow that will enable researchers to understand the dynamic functional response of the microbiome in a wide variety of metatranscriptomics studies. This improved and optimized ASaiM-metatranscriptomics (ASaiM-MT) workflow is publicly available via the ASaiM framework, documented and supported with training material so that users can interrogate and characterize metatranscriptomic data, as part of larger meta-omic studies of microbiomes.
Affiliation(s)
- Subina Mehta
  - University of Minnesota, Twin Cities, MN, 55455, USA
- Marie Crane
  - University of Minnesota, Twin Cities, MN, 55455, USA
- Emma Leith
  - University of Minnesota, Twin Cities, MN, 55455, USA
- Bérénice Batut
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany
- Saskia Hiltemann
  - Department of Pathology, Erasmus Medical Center, Rotterdam, The Netherlands
- Ray Sajulga
  - University of Minnesota, Twin Cities, MN, 55455, USA
- Praveen Kumar
  - University of Minnesota, Twin Cities, MN, 55455, USA
8
Mehta S, Crane M, Leith E, Batut B, Hiltemann S, Arntzen MØ, Kunath BJ, Pope PB, Delogu F, Sajulga R, Kumar P, Johnson JE, Griffin TJ, Jagtap PD. ASaiM-MT: a validated and optimized ASaiM workflow for metatranscriptomics analysis within Galaxy framework. F1000Res 2021; 10:103. [PMID: 34484688 PMCID: PMC8383124 DOI: 10.12688/f1000research.28608.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 02/03/2021] [Indexed: 12/13/2022] Open
Abstract
The Human Microbiome Project (HMP) aided in understanding the role of microbial communities and the influence of collective genetic material (the 'microbiome') in human health and disease. With the evolution of new sequencing technologies, researchers can now investigate the microbiome and map its influence on human health. Advances in bioinformatics methods for next-generation sequencing (NGS) data analysis have helped researchers to gain an in-depth knowledge about the taxonomic and genetic composition of microbial communities. Metagenomic-based methods have been the most commonly used approaches for microbiome analysis; however, they primarily extract information about the taxonomic composition and genetic potential of the microbiome under study, lacking quantification of the gene products (RNA and proteins). Conversely, metatranscriptomics, the study of a microbial community's RNA expression, can reveal the dynamic gene expression of individual microbial populations and the community as a whole, ultimately providing information about the active pathways in the microbiome. In order to address the analysis of NGS data, the ASaiM analysis framework was previously developed and made available via the Galaxy platform. Although developed for both metagenomics and metatranscriptomics, the original publication demonstrated the use of ASaiM only for metagenomics, while thorough testing for metatranscriptomics data was lacking. In the current study, we have focused on validating and optimizing the tools within ASaiM for metatranscriptomics data. As a result, we deliver a robust workflow that will enable researchers to understand the dynamic functional response of the microbiome in a wide variety of metatranscriptomics studies. This improved and optimized ASaiM-metatranscriptomics (ASaiM-MT) workflow is publicly available via the ASaiM framework, documented and supported with training material so that users can interrogate and characterize metatranscriptomic data, as part of larger meta-omic studies of microbiomes.
Affiliation(s)
- Subina Mehta
  - University of Minnesota, Twin Cities, MN, 55455, USA
- Marie Crane
  - University of Minnesota, Twin Cities, MN, 55455, USA
- Emma Leith
  - University of Minnesota, Twin Cities, MN, 55455, USA
- Bérénice Batut
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany
- Saskia Hiltemann
  - Department of Pathology, Erasmus Medical Center, Rotterdam, The Netherlands
- Ray Sajulga
  - University of Minnesota, Twin Cities, MN, 55455, USA
- Praveen Kumar
  - University of Minnesota, Twin Cities, MN, 55455, USA
9
Abstract
A complete RNA-Seq analysis involves the use of several different tools, with substantial software and computational requirements. The Galaxy platform simplifies the execution of such bioinformatics analyses by embedding the needed tools in its web interface, while also providing reproducibility. Here, we describe how to perform a reference-based RNA-Seq analysis using Galaxy, from data upload to visualization and functional enrichment analysis of differentially expressed genes.
Affiliation(s)
- Bérénice Batut
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Marius van den Beek
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
- Maria A Doyle
  - Research Computing Facility, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
10
Tekman M, Batut B, Ostrovsky A, Antoniewski C, Clements D, Ramirez F, Etherington GJ, Hotz HR, Scholtalbers J, Manning JR, Bellenger L, Doyle MA, Heydarian M, Huang N, Soranzo N, Moreno P, Mautner S, Papatheodorou I, Nekrutenko A, Taylor J, Blankenberg D, Backofen R, Grüning B. A single-cell RNA-sequencing training and analysis suite using the Galaxy framework. Gigascience 2020; 9:5931798. [PMID: 33079170 PMCID: PMC7574357 DOI: 10.1093/gigascience/giaa102] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/22/2020] [Revised: 08/30/2020] [Indexed: 11/25/2022] Open
Abstract
Background: The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically driven methods needed to process and understand these ever-growing datasets. Results: Here we outline several Galaxy workflows and learning resources for single-cell RNA-sequencing, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology. The Galaxy reproducible bioinformatics framework provides tools, workflows, and trainings that not only enable users to perform 1-click 10x preprocessing but also empower them to demultiplex raw sequencing from custom tagged and full-length sequencing protocols. The downstream analysis supports a range of high-quality interoperable suites separated into common stages of analysis: inspection, filtering, normalization, confounder removal, and clustering. The teaching resources cover concepts from computer science to cell biology. Access to all resources is provided at the singlecell.usegalaxy.eu portal. Conclusions: The reproducible and training-oriented Galaxy framework provides a sustainable high-performance computing environment for users to run flexible analyses on both 10x and alternative platforms. The tutorials from the Galaxy Training Network along with the frequent training workshops hosted by the Galaxy community provide a means for users to learn, publish, and teach single-cell RNA-sequencing analysis.
Affiliation(s)
- Mehmet Tekman
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Bérénice Batut
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Alexander Ostrovsky
  - Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Christophe Antoniewski
  - ARTbio, Sorbonne Université, CNRS FR 3631, Inserm US 037, Paris, France
  - Institut de Biologie Paris Seine, 9 Quai Saint-Bernard Université Pierre et Marie Curie, Campus Jussieu, Bâtiments A-B-C, 75005 Paris, France
- Dave Clements
  - Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Fidel Ramirez
  - Boehringer Ingelheim International GmbH, Binger Strasse 173, 55216 Ingelheim am Rhein, Biberach, Germany
- Hans-Rudolf Hotz
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Jelle Scholtalbers
  - European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Jonathan R Manning
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Lea Bellenger
  - ARTbio, Sorbonne Université, CNRS FR 3631, Inserm US 037, Paris, France
- Maria A Doyle
  - Research Computing Facility, Peter MacCallum Cancer Centre, Melbourne, 305 Grattan Street, Victoria 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria 3010, Australia
- Mohammad Heydarian
  - Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Ni Huang
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
- Nicola Soranzo
  - Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Pablo Moreno
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Stefan Mautner
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Irene Papatheodorou
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Anton Nekrutenko
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- James Taylor
  - Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, NB21 Cleveland, OH 44195, USA
- Rolf Backofen
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Björn Grüning
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
|
11
|
Garcia L, Batut B, Burke ML, Kuzak M, Psomopoulos F, Arcila R, Attwood TK, Beard N, Carvalho-Silva D, Dimopoulos AC, del Angel VD, Dumontier M, Gurwitz KT, Krause R, McQuilton P, Le Pera L, Morgan SL, Rauste P, Via A, Kahlem P, Rustici G, van Gelder CWG, Palagi PM. Ten simple rules for making training materials FAIR. PLoS Comput Biol 2020; 16:e1007854. [PMID: 32437350 PMCID: PMC7241697 DOI: 10.1371/journal.pcbi.1007854] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 02/06/2023] Open
Abstract
Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it's sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They're often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all.
Affiliation(s)
- Leyla Garcia
  - ZB MED Information Centre for Life Sciences, Cologne, Germany
- Bérénice Batut
  - Bioinformatics group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Melissa L. Burke
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Mateusz Kuzak
  - Netherlands eScience Center, Amsterdam, the Netherlands
  - Dutch Techcentre for Life Sciences, Utrecht, the Netherlands
- Fotis Psomopoulos
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Ricardo Arcila
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Teresa K. Attwood
  - Department of Computer Science, The University of Manchester, Manchester, United Kingdom
- Niall Beard
  - Department of Computer Science, The University of Manchester, Manchester, United Kingdom
- Denise Carvalho-Silva
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  - Open Targets, Wellcome Genome Campus, Hinxton, United Kingdom
- Michel Dumontier
  - Institute of Data Science, Maastricht University, Maastricht, the Netherlands
- Roland Krause
  - University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Peter McQuilton
  - Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, United Kingdom
- Loredana Le Pera
  - IBIOM-CNR, Bari, Italy
  - IBPM-CNR, Sapienza Università di Roma, Roma, Italy
- Sarah L. Morgan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Päivi Rauste
  - CSC – IT Center for Science, Keilaranta, Espoo, Finland
- Allegra Via
  - IBPM-CNR, Sapienza Università di Roma, Roma, Italy
- Pascal Kahlem
  - Scientific Network Management S.L., Barcelona, Spain
- Patricia M. Palagi
  - SIB Training group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
12
|
Wibberg D, Batut B, Belmann P, Blom J, Glöckner FO, Grüning B, Hoffmann N, Kleinbölting N, Rahn R, Rey M, Scholz U, Sharan M, Tauch A, Trojahn U, Usadel B, Kohlbacher O. The de.NBI / ELIXIR-DE training platform - Bioinformatics training in Germany and across Europe within ELIXIR. F1000Res 2019; 8. [PMID: 33163154 PMCID: PMC7607484 DOI: 10.12688/f1000research.20244.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 09/04/2020] [Indexed: 12/25/2022] Open
Abstract
The German Network for Bioinformatics Infrastructure (de.NBI) is a national and academic infrastructure funded by the German Federal Ministry of Education and Research (BMBF). The de.NBI provides (i) service, (ii) training, and (iii) cloud computing to users in life sciences research and biomedicine in Germany and Europe and (iv) fosters the cooperation of the German bioinformatics community with international network structures. The de.NBI members also run the German node (ELIXIR-DE) within the European ELIXIR infrastructure. The de.NBI / ELIXIR-DE training platform, also known as special interest group 3 (SIG 3) ‘Training & Education’, coordinates the bioinformatics training of de.NBI and the German ELIXIR node. The network provides a high-quality, coherent, timely, and impactful training program across its eight service centers. Life scientists learn how to handle and analyze biological big data more effectively by applying tools, standards and compute services provided by de.NBI. Since 2015, more than 300 training courses were carried out with about 6,000 participants and these courses received recommendation rates of almost 90% (status as of July 2020). In addition to face-to-face training courses, online training was introduced on the de.NBI website in 2016 and guidelines for the preparation of e-learning material were established in 2018. In 2016, ELIXIR-DE joined the ELIXIR training platform. Here, the de.NBI / ELIXIR-DE training platform collaborates with ELIXIR in training activities, advertising training courses via TeSS and discussions on the exchange of data for training events essential for quality assessment on both the technical and administrative levels. The de.NBI training program trained thousands of scientists from Germany and beyond in many different areas of bioinformatics.
Affiliation(s)
- Daniel Wibberg
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, 33501, Germany
- Bérénice Batut
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, 79110, Germany
- Peter Belmann
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, 33501, Germany
- Jochen Blom
  - Bioinformatics and Systems Biology, Justus-Liebig-University Giessen, Giessen, 35392, Germany
- Frank Oliver Glöckner
  - Alfred-Wegener-Institut - Helmholtz Zentrum für Polar- und Meeresforschung and Jacobs University Bremen, Campus Ring 1, Bremen, 28759, Germany
- Björn Grüning
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, 79110, Germany
- Nils Hoffmann
  - Leibniz-Institut für Analytische Wissenschaften - ISAS - e.V., Dortmund, 44227, Germany
- Nils Kleinbölting
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, 33501, Germany
- René Rahn
  - Algorithmic Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Takustraße 9, Berlin, 14195, Germany
- Maja Rey
  - Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS) gGmbH, Schloss-Wolfsbrunnenweg 35, Heidelberg, 69118, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
- Malvika Sharan
  - The Heidelberg Center for Human Bioinformatics (HD-HuB), European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg, 69117, Germany
- Andreas Tauch
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, 33501, Germany
- Ulrike Trojahn
  - The Heidelberg Center for Human Bioinformatics (HD-HuB), European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg, 69117, Germany
- Björn Usadel
  - IBG-2 Plant Sciences, Forschungszentrum Jülich, Jülich, 52428, Germany
- Oliver Kohlbacher
  - Applied Bioinformatics, Department of Computer Science, University of Tübingen, Tübingen, 72076, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, 72076, Germany
  - Translational Bioinformatics, University Hospital Tübingen, Tübingen, 72076, Germany
  - Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, 72076, Germany
|
13
|
Batut B, Hiltemann S, Bagnacani A, Baker D, Bhardwaj V, Blank C, Bretaudeau A, Brillet-Guéguen L, Čech M, Chilton J, Clements D, Doppelt-Azeroual O, Erxleben A, Freeberg MA, Gladman S, Hoogstrate Y, Hotz HR, Houwaart T, Jagtap P, Larivière D, Le Corguillé G, Manke T, Mareuil F, Ramírez F, Ryan D, Sigloch FC, Soranzo N, Wolff J, Videm P, Wolfien M, Wubuli A, Yusuf D, Taylor J, Backofen R, Nekrutenko A, Grüning B. Community-Driven Data Analysis Training for Biology. Cell Syst 2018; 6:752-758.e1. [PMID: 29953864 DOI: 10.1016/j.cels.2018.05.012] [Citation(s) in RCA: 92] [Impact Index Per Article: 18.4] [Received: 11/28/2017] [Revised: 03/10/2018] [Accepted: 05/18/2018] [Indexed: 01/12/2023]
Abstract
The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a community-driven framework that enables modern, interactive teaching of data analytics in life sciences and facilitates the development of training materials. The key feature of our system is that it is not a static but a continuously improved collection of tutorials. By coupling tutorials with a web-based analysis framework, biomedical researchers can learn by performing computation themselves through a web browser without the need to install software or search for example datasets. Our ultimate goal is to expand the breadth of training materials to include fundamental statistical and data science topics and to precipitate a complete re-engineering of undergraduate and graduate curricula in life sciences. This project is accessible at https://training.galaxyproject.org.
Affiliation(s)
- Bérénice Batut
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Saskia Hiltemann
  - Erasmus Medical Centre, Wytemaweg 80, Rotterdam 3015 CN, the Netherlands
- Andrea Bagnacani
  - Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstraße 69, Rostock 18051, Germany
- Dannon Baker
  - Johns Hopkins University, 3400 N Charles Street, Mudd Hall 144, Baltimore 21218, MD, USA
- Vivek Bhardwaj
  - Department of Biology, Albert-Ludwigs-University, Schänzlestraße 1, Freiburg 79104, Germany
- Clemens Blank
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Anthony Bretaudeau
  - INRA, UMR IGEPP, BIPAA/GenOuest, INRIA/Irisa - Campus de Beaulieu, 35042 RENNES Cedex, France
- Martin Čech
  - The Pennsylvania State University, 505 Wartik Lab, University Park, PA 16802, USA
- John Chilton
  - The Pennsylvania State University, 505 Wartik Lab, University Park, PA 16802, USA
- Dave Clements
  - Johns Hopkins University, 3400 N Charles Street, Mudd Hall 144, Baltimore 21218, MD, USA
- Olivia Doppelt-Azeroual
  - Bioinformatics and Biostatistics HUB, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), Institut Pasteur, 25-28 Rue du Docteur Roux, 75015 Paris, France
- Anika Erxleben
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Simon Gladman
  - Melbourne Bioinformatics, The University of Melbourne, Melbourne, VIC 3010, Australia
- Youri Hoogstrate
  - Erasmus Medical Centre, Wytemaweg 80, Rotterdam 3015 CN, the Netherlands
- Hans-Rudolf Hotz
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, Basel 4058, Switzerland
- Torsten Houwaart
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Pratik Jagtap
  - Biochemistry, Molecular Biology and Biophysics, University of Minnesota Medical School, 420 Delaware Street SE, Minneapolis, MN 55455, USA
- Delphine Larivière
  - The Pennsylvania State University, 505 Wartik Lab, University Park, PA 16802, USA
- Gildas Le Corguillé
  - PMC, CNRS, FR2424, ABiMS, Station Biologique, Place Georges Teissier, Roscoff 29680, France
- Thomas Manke
  - Max Planck Institute of Immunobiology and Epigenetics, Stübeweg 51, Freiburg 79108, Germany
- Fabien Mareuil
  - Bioinformatics and Biostatistics HUB, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), Institut Pasteur, 25-28 Rue du Docteur Roux, 75015 Paris, France
- Fidel Ramírez
  - Max Planck Institute of Immunobiology and Epigenetics, Stübeweg 51, Freiburg 79108, Germany
- Devon Ryan
  - Max Planck Institute of Immunobiology and Epigenetics, Stübeweg 51, Freiburg 79108, Germany
- Florian Christoph Sigloch
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Nicola Soranzo
  - Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Joachim Wolff
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Pavankumar Videm
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Markus Wolfien
  - Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstraße 69, Rostock 18051, Germany
- Aisanjiang Wubuli
  - Leibniz Institute for Farm Animal Biology (FBN), Wilhelm-Stahl-Allee 2, Dummerstorf 18196, Germany
- Dilmurat Yusuf
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- James Taylor
  - Johns Hopkins University, 3400 N Charles Street, Mudd Hall 144, Baltimore 21218, MD, USA
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Anton Nekrutenko
  - The Pennsylvania State University, 505 Wartik Lab, University Park, PA 16802, USA
- Björn Grüning
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
|
14
|
Fallmann J, Videm P, Bagnacani A, Batut B, Doyle MA, Klingstrom T, Eggenhofer F, Stadler PF, Backofen R, Grüning B. The RNA workbench 2.0: next generation RNA data analysis. Nucleic Acids Res 2019; 47:W511-W515. [PMID: 31073612 PMCID: PMC6602469 DOI: 10.1093/nar/gkz353] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/08/2019] [Revised: 04/11/2019] [Accepted: 04/29/2019] [Indexed: 12/30/2022] Open
Abstract
RNA has become one of the major research topics in molecular biology. As a central player in key processes regulating gene expression, RNA is the focus of many efforts to decipher the pathways that govern the transition of genetic information to a fully functional cell. As more and more researchers join this endeavour, there is a rapidly growing demand for comprehensive collections of tools that cover the diverse layers of RNA-related research. However, increasing amounts of data, from diverse types of experiments, addressing different aspects of biological questions need to be consolidated and integrated into a single framework. Only then is it possible to connect findings from, e.g., RNA-Seq experiments with methods for, e.g., target prediction. To address these needs, we present the RNA Workbench 2.0, an updated online resource for RNA-related analysis. With the RNA Workbench we created a comprehensive set of analysis tools and workflows that enables researchers to analyze their data without the need for sophisticated command-line skills. This update takes the established framework to the next level, providing not only a containerized infrastructure for analysis, but also a ready-to-use platform for hands-on training, analysis, data exploration, and visualization. The new framework is available at https://rna.usegalaxy.eu, and login is free and open to all users. The containerized version can be found at https://github.com/bgruening/galaxy-rna-workbench.
Affiliation(s)
- Jörg Fallmann
  - Bioinformatics Group, Department of Computer Science; Leipzig University, Härtelstraße 16-18, D-04107 Leipzig
- Pavankumar Videm
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Andrea Bagnacani
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Ulmenstr. 69, 18057 Rostock, Germany
- Bérénice Batut
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Maria A Doyle
  - Research Computing Facility, Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria 3010, Australia
- Tomas Klingstrom
  - SLU-Global Bioinformatics Centre, Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences
- Florian Eggenhofer
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science; Leipzig University, Härtelstraße 16-18, D-04107 Leipzig
  - Interdisciplinary Center of Bioinformatics; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig; Competence Center for Scalable Data Services and Solutions; and Leipzig Research Center for Civilization Diseases, Leipzig University, Härtelstraße 16-18, D-04107 Leipzig
  - Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig
  - Inst. f. Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Colombia
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
  - Signalling Research Centres BIOSS and CIBSS, Albert-Ludwigs-University Freiburg, Schänzlestr. 18, Freiburg 79104, Germany
- Björn Grüning
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, Freiburg 79110, Germany
  - Center for Biological Systems Analysis (ZBSA), University of Freiburg, Habsburgerstr. 49, 79104 Freiburg, Germany
|
15
|
Grüning BA, Fallmann J, Yusuf D, Will S, Erxleben A, Eggenhofer F, Houwaart T, Batut B, Videm P, Bagnacani A, Wolfien M, Lott SC, Hoogstrate Y, Hess WR, Wolkenhauer O, Hoffmann S, Akalin A, Ohler U, Stadler PF, Backofen R. The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy. Nucleic Acids Res 2017; 45:W560-W566. [PMID: 28582575 PMCID: PMC5570170 DOI: 10.1093/nar/gkx409] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 03/02/2017] [Accepted: 05/31/2017] [Indexed: 01/23/2023] Open
Abstract
RNA-based regulation has become a major research topic in molecular biology. The analysis of epigenetic and expression data is therefore incomplete if RNA-based regulation is not taken into account. Thus, it is increasingly important but not yet standard to combine RNA-centric data and analysis tools with other types of experimental data such as RNA-seq or ChIP-seq. Here, we present the RNA workbench, a comprehensive set of analysis tools and consolidated workflows that enable the researcher to combine these two worlds. Based on the Galaxy framework, the workbench guarantees simple access, easy extension, flexible adaptation to personal and security needs, and sophisticated analyses that are independent of command-line knowledge. Currently, it includes more than 50 bioinformatics tools that are dedicated to different research areas of RNA biology including RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. The workbench is developed and maintained by experts in RNA bioinformatics and the Galaxy framework. Together with the growing community evolving around this workbench, we are committed to keeping the workbench up-to-date for future standards and needs, providing researchers with a reliable and robust framework for RNA data analysis. Availability: The RNA workbench is available at https://github.com/bgruening/galaxy-rna-workbench.
Affiliation(s)
- Björn A Grüning
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
  - Center for Biological Systems Analysis (ZBSA), University of Freiburg, Habsburgerstr. 49, D-79104 Freiburg, Germany
- Jörg Fallmann
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
- Dilmurat Yusuf
  - Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, D-13125, Berlin, Germany
- Sebastian Will
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria
- Anika Erxleben
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
- Florian Eggenhofer
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
- Torsten Houwaart
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
- Bérénice Batut
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
- Pavankumar Videm
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
- Andrea Bagnacani
  - Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, D-18051 Rostock, Germany
- Markus Wolfien
  - Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, D-18051 Rostock, Germany
- Steffen C Lott
  - Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
- Youri Hoogstrate
  - Department of Urology, Erasmus University Medical Center, Wytemaweg 80, 3015 CN Rotterdam, Netherlands
- Wolfgang R Hess
  - Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
- Olaf Wolkenhauer
  - Department of Systems Biology and Bioinformatics, University of Rostock, Ulmenstr. 69, D-18051 Rostock, Germany
- Steve Hoffmann
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
- Altuna Akalin
  - Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, D-13125, Berlin, Germany
- Uwe Ohler
  - Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, D-13125, Berlin, Germany
  - Departments of Biology and Computer Science, Humboldt University, Unter den Linden 6, D-10099 Berlin
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, D-04107 Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria
  - Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
  - Center for Biological Systems Analysis (ZBSA), University of Freiburg, Habsburgerstr. 49, D-79104 Freiburg, Germany
  - BIOSS Centre for Biological Signaling Studies, University of Freiburg, Schänzlestr. 18, D-79104 Freiburg, Germany
|
16
|
Afgan E, Baker D, Batut B, van den Beek M, Bouvier D, Čech M, Chilton J, Clements D, Coraor N, Grüning BA, Guerler A, Hillman-Jackson J, Hiltemann S, Jalili V, Rasche H, Soranzo N, Goecks J, Taylor J, Nekrutenko A, Blankenberg D. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res 2018; 46:W537-W544. [PMID: 29790989 PMCID: PMC6030816 DOI: 10.1093/nar/gky379] [Citation(s) in RCA: 2148] [Impact Index Per Article: 358.0] [Received: 02/01/2018] [Revised: 04/25/2018] [Accepted: 05/02/2018] [Indexed: 02/06/2023] Open
Abstract
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.
Affiliation(s)
- Enis Afgan
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Dannon Baker
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Bérénice Batut
  - Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany
- Dave Bouvier
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
- Martin Čech
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
- John Chilton
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
- Dave Clements
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Nate Coraor
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
- Björn A Grüning
  - Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany
  - Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
- Aysam Guerler
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Jennifer Hillman-Jackson
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
- Saskia Hiltemann
  - Department of Pathology, Erasmus Medical Center, Rotterdam, The Netherlands
- Vahid Jalili
  - Department of Biomedical Engineering, Oregon Health and Science University, OR, USA
- Helena Rasche
  - Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany
- Jeremy Goecks
  - Department of Biomedical Engineering, Oregon Health and Science University, OR, USA
- James Taylor
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Anton Nekrutenko
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
Collapse
|
17
|
Batut B, Gravouil K, Defois C, Hiltemann S, Brugère JF, Peyretaillade E, Peyret P. ASaiM: a Galaxy-based framework to analyze microbiota data. Gigascience 2018; 7:5001424. [PMID: 29790941 PMCID: PMC6007547 DOI: 10.1093/gigascience/giy057] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 09/08/2017] [Accepted: 05/10/2018] [Indexed: 12/24/2022] Open
Abstract
Background: New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions out of microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings: We therefore developed ASaiM, an Open-Source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore, and visualize microbiota information from raw metataxonomic, metagenomic, or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets but can also be used on a normal PC. The associated source code is available under Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io). Conclusions: Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation, and training to scientists working on complex microorganism communities. It makes analysis and exploration of microbiota data easy, quick, transparent, reproducible, and shareable.
Affiliation(s)
- Bérénice Batut: Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address); Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
- Kévin Gravouil: Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address); Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France; Université Clermont Auvergne, CNRS, LMGE, 63000 Clermont-Ferrand, France; Université Clermont Auvergne, CNRS, LIMOS, 63000 Clermont-Ferrand, France
- Clémence Defois: Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address); Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
- Saskia Hiltemann: Department of Bioinformatics, Erasmus University Medical Center, Rotterdam, 3015 CE, Netherlands
- Jean-François Brugère: Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address)
- Eric Peyretaillade: Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address); Université Clermont Auvergne, CNRS, LMGE, 63000 Clermont-Ferrand, France
- Pierre Peyret: Université Clermont Auvergne, EA 4678 CIDAM, 63000 Clermont-Ferrand, France (previous address); Université Clermont Auvergne, INRA, MEDIS, 63000 Clermont-Ferrand, France
|
18
|
Defois C, Ratel J, Denis S, Batut B, Beugnot R, Peyretaillade E, Engel E, Peyret P. Environmental Pollutant Benzo[a]Pyrene Impacts the Volatile Metabolome and Transcriptome of the Human Gut Microbiota. Front Microbiol 2017; 8:1562. [PMID: 28861070 PMCID: PMC5559432 DOI: 10.3389/fmicb.2017.01562] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 04/21/2017] [Accepted: 08/02/2017] [Indexed: 01/23/2023] Open
Abstract
Benzo[a]pyrene (B[a]P) is a ubiquitous, persistent, and carcinogenic pollutant that belongs to the large family of polycyclic aromatic hydrocarbons. Population exposure primarily occurs via contaminated food products, which introduces the pollutant to the digestive tract. Although the metabolism of B[a]P by host cells is well known, its impacts on the human gut microbiota, which plays a key role in health and disease, remain unexplored. We performed an in vitro assay using 16S barcoding, metatranscriptomics and volatile metabolomics to study the impact of B[a]P on two distinct human fecal microbiota. B[a]P exposure did not induce a significant change in the microbial structure; however, it altered the microbial volatolome in a dose-dependent manner. The transcript levels related to several metabolic pathways, such as vitamin and cofactor metabolism, cell wall compound metabolism, DNA repair and replication systems, and aromatic compound metabolism, were upregulated, whereas the transcript levels related to the glycolysis-gluconeogenesis pathway and bacterial chemotaxis toward simple carbohydrates were downregulated. These primary findings show that food pollutants, such as B[a]P, alter human gut microbiota activity. The observed shift in the volatolome demonstrates that B[a]P induces a specific deviation in the microbial metabolism.
Affiliation(s)
- Clémence Defois: MEDIS, Institut National de la Recherche Agronomique, Université Clermont Auvergne, Clermont-Ferrand, France
- Jérémy Ratel: UR370 QuaPA, MASS Team, Institut National de la Recherche Agronomique, Saint-Genes-Champanelle, France
- Sylvain Denis: MEDIS, Institut National de la Recherche Agronomique, Université Clermont Auvergne, Clermont-Ferrand, France
- Bérénice Batut: MEDIS, Institut National de la Recherche Agronomique, Université Clermont Auvergne, Clermont-Ferrand, France
- Réjane Beugnot: MEDIS, Institut National de la Recherche Agronomique, Université Clermont Auvergne, Clermont-Ferrand, France
- Eric Peyretaillade: MEDIS, Institut National de la Recherche Agronomique, Université Clermont Auvergne, Clermont-Ferrand, France
- Erwan Engel: UR370 QuaPA, MASS Team, Institut National de la Recherche Agronomique, Saint-Genes-Champanelle, France
- Pierre Peyret: MEDIS, Institut National de la Recherche Agronomique, Université Clermont Auvergne, Clermont-Ferrand, France
|
19
|
Jiménez RC, Kuzak M, Alhamdoosh M, Barker M, Batut B, Borg M, Capella-Gutierrez S, Chue Hong N, Cook M, Corpas M, Flannery M, Garcia L, Gelpí JL, Gladman S, Goble C, González Ferreiro M, Gonzalez-Beltran A, Griffin PC, Grüning B, Hagberg J, Holub P, Hooft R, Ison J, Katz DS, Leskošek B, López Gómez F, Oliveira LJ, Mellor D, Mosbergen R, Mulder N, Perez-Riverol Y, Pergl R, Pichler H, Pope B, Sanz F, Schneider MV, Stodden V, Suchecki R, Svobodová Vařeková R, Talvik HA, Todorov I, Treloar A, Tyagi S, van Gompel M, Vaughan D, Via A, Wang X, Watson-Haigh NS, Crouch S. Four simple recommendations to encourage best practices in research software. F1000Res 2017; 6. [PMID: 28751965 PMCID: PMC5490478 DOI: 10.12688/f1000research.11407.1] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Accepted: 05/03/2017] [Indexed: 12/24/2022] Open
Abstract
Scientific research relies on computer software, yet software is not always developed following practices that ensure its quality and sustainability. This manuscript does not aim to propose new software development best practices, but rather to provide simple recommendations that encourage the adoption of existing best practices. Software development best practices promote better quality software, and better quality software improves the reproducibility and reusability of research. These recommendations are designed around Open Source values, and provide practical suggestions that contribute to making research software and its source code more discoverable, reusable and transparent. This manuscript is aimed at developers, but also at organisations, projects, journals and funders that can increase the quality and sustainability of research software by encouraging the adoption of these recommendations.
Affiliation(s)
- Mateusz Kuzak: Netherlands eScience Center, Science Park 140, Amsterdam, 1098 XG, Netherlands
- Monther Alhamdoosh: CSL Limited, Bio21 Institute, 30 Flemington Road, Parkville, Victoria, 3010, Australia
- Michelle Barker: National eResearch Collaboration Tools and Resources, Victoria, 3010, Australia
- Bérénice Batut: ELIXIR-DE and de.NBI, Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Mikael Borg: ELIXIR-SE, National Bioinformatics Infrastructure Sweden (NBIS), Scilifelab, Department of Biochemistry and Biophysics (DBB), Stockholm University, Stockholm, Sweden
- Salvador Capella-Gutierrez: ELIXIR-ES, Spanish National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Calle de Melchor Fernández Almagro 3, Madrid, 28029, Spain
- Neil Chue Hong: Software Sustainability Institute, JCMB, University of Edinburgh, Edinburgh, EH9 3FD, UK
- Martin Cook: ELIXIR Hub, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Manuel Corpas: Repositive Ltd, Future Business Centre, Cambridge, UK
- Madison Flannery: EMBL Australia Bioinformatics Resource, Lab-14, The University of Melbourne, 700 Swanston St, Parkville, Victoria, 3053, Australia
- Leyla Garcia: EMBL-EBI, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Josep Ll Gelpí: Barcelona Supercomputing Center, Barcelona, 08034, Spain; Department of Biochemistry and Molecular Biomedicine, Universitat de Barcelona, Barcelona, 08028, Spain
- Simon Gladman: EMBL Australia Bioinformatics Resource, Lab-14, The University of Melbourne, 700 Swanston St, Parkville, Victoria, 3053, Australia
- Carole Goble: ELIXIR-UK, Software Sustainability Institute, School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Philippa C Griffin: EMBL Australia Bioinformatics Resource, Lab-14, The University of Melbourne, 700 Swanston St, Parkville, Victoria, 3053, Australia
- Björn Grüning: ELIXIR-DE and de.NBI, Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Jonas Hagberg: ELIXIR-SE, National Bioinformatics Infrastructure Sweden (NBIS), Scilifelab, Department of Biochemistry and Biophysics (DBB), Stockholm University, Stockholm, Sweden
- Petr Holub: BBMRI-ERIC, Neue Stiftingtalstraße 2/B/6, Graz, 8010, Austria
- Rob Hooft: Dutch TechCenter for Life Sciences and ELIXIR-NL, Utrecht, Netherlands
- Jon Ison: ELIXIR-DK, Technical University of Denmark, Denmark, Denmark
- Daniel S Katz: National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, USA; School of Information Sciences, University of Illinois Urbana Champaign, Urbana, IL, USA; Department of Electrical and Computer Engineering, University of Illinois Urbana Champaign, Urbana, IL, USA; Department of Computer Science, University of Illinois Urbana Champaign, Urbana, IL, USA
- Brane Leskošek: ELIXIR-SI, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
- David Mellor: Center for Open Science, Charlottesville, VA, USA
- Nicola Mulder: Computational Biology Division, Department of Integrative Biomedical Sciences, Institute for Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Robert Pergl: ELIXIR-CZ, Faculty of Information Technology, Czech Technical University in Prague, Prague, Czech Republic
- Horst Pichler: BBMRI.at, Alpen-Adria-University Klagenfurt, Klagenfurt, Austria
- Bernard Pope: EMBL Australia Bioinformatics Resource, Lab-14, The University of Melbourne, 700 Swanston St, Parkville, Victoria, 3053, Australia
- Ferran Sanz: GRIB, Institut Hospital del Mar d'Investigacions Mèdiques (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- Maria V Schneider: EMBL Australia Bioinformatics Resource, Lab-14, The University of Melbourne, 700 Swanston St, Parkville, Victoria, 3053, Australia
- Victoria Stodden: School of Information Sciences, University of Illinois Urbana Champaign, Urbana, IL, USA
- Radosław Suchecki: School of Agriculture, Food & Wine, University of Adelaide, Adelaide, Australia
- Radka Svobodová Vařeková: Central European Institute of Technology (CEITEC), Brno, Czech Republic; National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic
- Harry-Anton Talvik: ELIXIR-EE, Institute of Computer Science, University of Tartu, Tartu, Estonia
- Ilian Todorov: Science & Technologies Facilities Council, Swindon, UK
- Sonika Tyagi: EMBL Australia Bioinformatics Resource, Lab-14, The University of Melbourne, 700 Swanston St, Parkville, Victoria, 3053, Australia; Australian Genome Research Facility Ltd., Melbourne, Australia
- Maarten van Gompel: Centre for Language and Speech Technology, Radboud University Nijmegen, Nijmegen, Netherlands
- Allegra Via: IBPM-CNR, Department of Biochemical Sciences, Sapienza University of Rome, Rome, Italy
- Xiaochuan Wang: Faculty of Information Technology, Monash University, Victoria, Australia
- Steve Crouch: Software Sustainability Institute, Web and Internet Science, University of Southampton, Southampton, UK
|
20
|
Abstract
Comparative genomics has revealed that some species have exceptional genomes compared to their closest relatives. For instance, some species have undergone a strong reduction of their genome with a drastic reduction of their genic repertoire. Deciphering the causes of these atypical trajectories can be very difficult because of the many phenomena that are intertwined during their evolution (e.g. changes of population size, environment structure and dynamics, selection strength, mutation rates). Here we propose a methodology based on synthetic experiments to test the individual effect of these phenomena on a population of simulated organisms. We developed an evolutionary model, aevol, in which evolutionary conditions can be changed one at a time to test their effects on genome size and organization (e.g. coding ratio). To illustrate the proposed approach, we used aevol to test the effects of a strong reduction in the selection strength on a population of (simulated) bacteria. Our results show that this reduction of selection strength leads to a genome reduction of ~35% with a slight loss of coding sequences (~15% of the genes are lost, mainly those for which the contribution to fitness is the lowest). More surprisingly, under a low selection strength, genomes undergo a strong reduction of the noncoding compartment (~55% of the noncoding sequences being lost). These results are consistent with what is observed in reduced Prochlorococcus strains (marine cyanobacteria) when compared to close relatives.
|