1
Hanssen F, Garcia MU, Folkersen L, Pedersen A, Lescai F, Jodoin S, Miller E, Seybold M, Wacker O, Smith N, Gabernet G, Nahnsen S. Scalable and efficient DNA sequencing analysis on different compute infrastructures aiding variant discovery. NAR Genom Bioinform 2024; 6:lqae031. [PMID: 38666213 PMCID: PMC11044436 DOI: 10.1093/nargab/lqae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 03/23/2024] [Indexed: 04/28/2024]
Abstract
DNA variation analysis has become indispensable in many aspects of modern biomedicine, most prominently in the comparison of normal and tumor samples. Thousands of samples are collected in local sequencing efforts and public databases requiring highly scalable, portable, and automated workflows for streamlined processing. Here, we present nf-core/sarek 3, a well-established, comprehensive variant calling and annotation pipeline for germline and somatic samples. It is suitable for any genome with a known reference. We present a full rewrite of the original pipeline showing a significant reduction of storage requirements by using the CRAM format and runtime by increasing intra-sample parallelization. Both are leading to a 70% cost reduction in commercial clouds enabling users to do large-scale and cross-platform data analysis while keeping costs and CO2 emissions low. The code is available at https://nf-co.re/sarek.
Affiliation(s)
- Friederike Hanssen
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Department of Computer Science, Eberhard-Karls University of Tübingen, 72076 Baden-Württemberg, Germany
- M3 Research Center, University Hospital, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Cluster of Excellence iFIT (EXC 2180) ‘Image-Guided and Functionally Instructed Tumor Therapies’, Eberhard-Karls University of Tübingen, Tübingen 72076, Baden-Württemberg, Germany
- Maxime U Garcia
- Seqera Labs, Carrer de Marià Aguilò, 28, Barcelona 08005, Spain
- Barntumörbanken, Department of Oncology-Pathology, Karolinska Institutet, BioClinicum, Visionsgatan 4, Solna 17164, Sweden
- National Genomics Infrastructure, SciLifeLab, Tomtebodavägen 23, Solna 17165, Sweden
- Francesco Lescai
- Department of Biology and Biotechnology “L. Spallanzani”, University of Pavia, via Ferrata, 9, Pavia, 27100 PV, Italy
- Susanne Jodoin
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- M3 Research Center, University Hospital, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Edmund Miller
- Department of Biological Sciences and Center for Systems Biology, University of Texas at Dallas, 800 W Campbell Rd, Richardson, TX 75080, USA
- Matthias Seybold
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Oskar Wacker
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- M3 Research Center, University Hospital, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Nicholas Smith
- Department of Informatics, Technical University of Munich, Boltzmannstr. 3, Garching, 85748 Bavaria, Germany
- Gisela Gabernet
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Department of Pathology, Yale School of Medicine, 300 George, New Haven, CT 06510, USA
- Sven Nahnsen
- Quantitative Biology Center, Eberhard-Karls University of Tübingen, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Department of Computer Science, Eberhard-Karls University of Tübingen, 72076 Baden-Württemberg, Germany
- M3 Research Center, University Hospital, Otfried-Müller Str. 37, Tübingen 72076, Baden-Württemberg, Germany
- Cluster of Excellence iFIT (EXC 2180) ‘Image-Guided and Functionally Instructed Tumor Therapies’, Eberhard-Karls University of Tübingen, Tübingen 72076, Baden-Württemberg, Germany
- Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen 72076, Baden-Württemberg, Germany
2
Waterhouse RM, Adam-Blondon AF, Balech B, Barta E, Ying Shi Chua P, Di Cola V, Heil KF, Hughes GM, Jermiin LS, Kalaš M, Lanfear J, Pafilis E, Palagi PM, Papageorgiou AC, Paupério J, Psomopoulos F, Raes N, Burgin J, Gabaldón T. The ELIXIR Biodiversity Community: Understanding short- and long-term changes in biodiversity. F1000Res 2024; 12:ELIXIR-499. [PMID: 38882711 PMCID: PMC11179050 DOI: 10.12688/f1000research.133724.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/09/2024] [Indexed: 06/18/2024]
Abstract
Biodiversity loss is now recognised as one of the major challenges for humankind to address over the next few decades. Unless major actions are taken, the sixth mass extinction will lead to catastrophic effects on the Earth's biosphere and human health and well-being. ELIXIR can help address the technical challenges of biodiversity science, through leveraging its suite of services and expertise to enable data management and analysis activities that enhance our understanding of life on Earth and facilitate biodiversity preservation and restoration. This white paper, prepared by the ELIXIR Biodiversity Community, summarises the current status and responses, and presents a set of plans, both technical and community-oriented, that should both enhance how ELIXIR Services are applied in the biodiversity field and how ELIXIR builds connections across the many other infrastructures active in this area. We discuss the areas of highest priority, how they can be implemented in cooperation with the ELIXIR Platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for a Biodiversity Community in ELIXIR and is an appeal to identify and involve new stakeholders.
Affiliation(s)
- Robert M. Waterhouse
- Department of Ecology and Evolution, SIB Swiss Institute of Bioinformatics, Universite de Lausanne, Lausanne, Vaud, 1015, Switzerland
- Anne-Françoise Adam-Blondon
- INRAE, BioinfOmics, Plant Bioinformatics Facility, Universite Paris-Saclay, Gif-sur-Yvette, Île-de-France, 78026, France
- Bachir Balech
- Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Bari, 70126, Italy
- Endre Barta
- Institute of Genetics and Biotechnology, Magyar Agrar- es Elettudomanyi Egyetem, Gödöllő, Pest County, Hungary
- Valeria Di Cola
- SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
- Graham M. Hughes
- School of Biology and Environmental Science, University College Dublin, Dublin, Leinster, Ireland
- Lars S. Jermiin
- Systems Biology Ireland, School of Medicine, University College Dublin, Dublin, Leinster, Ireland
- School of Mathematical and Statistical Sciences, University of Galway, Galway, Ireland
- Matúš Kalaš
- Department of Informatics, Universitetet i Bergen, Bergen, Hordaland, Norway
- Jerry Lanfear
- ELIXIR, Wellcome Genome Campus, Hinxton, England, CB10 1SD, UK
- Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Heraklion, 71003, Greece
- Patricia M. Palagi
- SIB Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
- Joana Paupério
- EMBL-EBI, Wellcome Genome Campus, Hinxton, England, CB10 1SD, UK
- Fotis Psomopoulos
- Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Niels Raes
- Naturalis Biodiversity Center, Leiden, South Holland, The Netherlands
- Josephine Burgin
- EMBL-EBI, Wellcome Genome Campus, Hinxton, England, CB10 1SD, UK
- Toni Gabaldón
- Institut de Recerca Biomedica, Barcelona, Catalonia, Spain
- Centro Nacional de Supercomputacion, Barcelona, Catalonia, Spain
3
Salignon J, Millan-Ariño L, Garcia MU, Riedel CG. Cactus: A user-friendly and reproducible ATAC-Seq and mRNA-Seq analysis pipeline for data preprocessing, differential analysis, and enrichment analysis. Genomics 2024; 116:110858. [PMID: 38735595 DOI: 10.1016/j.ygeno.2024.110858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 05/06/2024] [Accepted: 05/09/2024] [Indexed: 05/14/2024]
Abstract
The ever decreasing cost of Next-Generation Sequencing coupled with the emergence of efficient and reproducible analysis pipelines has rendered genomic methods more accessible. However, downstream analyses are basic or missing in most workflows, creating a significant barrier for non-bioinformaticians. To help close this gap, we developed Cactus, an end-to-end pipeline for analyzing ATAC-Seq and mRNA-Seq data, either separately or jointly. Its Nextflow-, container-, and virtual environment-based architecture ensures efficient and reproducible analyses. Cactus preprocesses raw reads, conducts differential analyses between conditions, and performs enrichment analyses in various databases, including DNA-binding motifs, ChIP-Seq binding sites, chromatin states, and ontologies. We demonstrate the utility of Cactus in a multi-modal and multi-species case study as well as by showcasing its unique capabilities as compared to other ATAC-Seq pipelines. In conclusion, Cactus can assist researchers in gaining comprehensive insights from chromatin accessibility and gene expression data in a quick, user-friendly, and reproducible manner.
Affiliation(s)
- Jérôme Salignon
- Department of Bioscience and Nutrition, Karolinska Institute, Blickagången 16, Huddinge SE-141 83, Sweden.
- Lluís Millan-Ariño
- Department of Bioscience and Nutrition, Karolinska Institute, Blickagången 16, Huddinge SE-141 83, Sweden
- Maxime U Garcia
- National Genomics Infrastructure, Science for Life Laboratory, Tomtebodavägen 23A, Solna SE-171 65, Sweden; Department of Oncology-Pathology, Karolinska Institute, Visionsgatan 4, Solna SE-171 64, Sweden
- Christian G Riedel
- Department of Bioscience and Nutrition, Karolinska Institute, Blickagången 16, Huddinge SE-141 83, Sweden.
4
Renton AI, Dao TT, Johnstone T, Civier O, Sullivan RP, White DJ, Lyons P, Slade BM, Abbott DF, Amos TJ, Bollmann S, Botting A, Campbell MEJ, Chang J, Close TG, Dörig M, Eckstein K, Egan GF, Evas S, Flandin G, Garner KG, Garrido MI, Ghosh SS, Grignard M, Halchenko YO, Hannan AJ, Heinsfeld AS, Huber L, Hughes ME, Kaczmarzyk JR, Kasper L, Kuhlmann L, Lou K, Mantilla-Ramos YJ, Mattingley JB, Meier ML, Morris J, Narayanan A, Pestilli F, Puce A, Ribeiro FL, Rogasch NC, Rorden C, Schira MM, Shaw TB, Sowman PF, Spitz G, Stewart AW, Ye X, Zhu JD, Narayanan A, Bollmann S. Neurodesk: an accessible, flexible and portable data analysis environment for reproducible neuroimaging. Nat Methods 2024; 21:804-808. [PMID: 38191935 PMCID: PMC11180540 DOI: 10.1038/s41592-023-02145-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Accepted: 11/27/2023] [Indexed: 01/10/2024]
Abstract
Neuroimaging research requires purpose-built analysis software, which is challenging to install and may produce different results across computing environments. The community-oriented, open-source Neurodesk platform ( https://www.neurodesk.org/ ) harnesses a comprehensive and growing suite of neuroimaging software containers. Neurodesk includes a browser-accessible virtual desktop, command-line interface and computational notebook compatibility, allowing for accessible, flexible, portable and fully reproducible neuroimaging analysis on personal workstations, high-performance computers and the cloud.
Affiliation(s)
- Angela I Renton
- The University of Queensland, Queensland Brain Institute, St Lucia, Brisbane, Queensland, Australia.
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia.
- Thuy T Dao
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia
- Tom Johnstone
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn, Victoria, Australia
- Oren Civier
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn, Victoria, Australia
- Ryan P Sullivan
- The University of Sydney, School of Biomedical Engineering, Sydney, New South Wales, Australia
- David J White
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn, Victoria, Australia
- Paris Lyons
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn, Victoria, Australia
- Benjamin M Slade
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn, Victoria, Australia
- David F Abbott
- The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Melbourne, Victoria, Australia
- Toluwani J Amos
- School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, China
- Saskia Bollmann
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia
- Andy Botting
- Australian Research Data Commons (ARDC), Sydney, New South Wales, Australia
- Megan E J Campbell
- School of Psychological Sciences, University of Newcastle, Newcastle, New South Wales, Australia
- Hunter Medical Research Institute Imaging Centre, Newcastle, New South Wales, Australia
- Jeryn Chang
- The University of Queensland, School of Biomedical Sciences, St Lucia, Brisbane, Queensland, Australia
- Thomas G Close
- The University of Sydney, School of Biomedical Engineering, Sydney, New South Wales, Australia
- Monika Dörig
- Integrative Spinal Research, Department of Chiropractic Medicine, Balgrist University Hospital, University of Zurich, Zurich, Switzerland
- Korbinian Eckstein
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia
- Gary F Egan
- The Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Melbourne, Victoria, Australia
- Monash Biomedical Imaging, Monash University, Melbourne, Victoria, Australia
- Stefanie Evas
- School of Psychology, University of Adelaide, Adelaide, South Australia, Australia
- Human Health, Health & Biosecurity, CSIRO, Adelaide, South Australia, Australia
- Guillaume Flandin
- Wellcome Centre for Human Neuroimaging, University College London, London, UK
- Kelly G Garner
- School of Psychology, University of New South Wales, Sydney, New South Wales, Australia
- The University of Queensland, School of Psychology, St Lucia, Brisbane, Queensland, Australia
- Marta I Garrido
- Melbourne School of Psychological Sciences, The University of Melbourne, Melbourne, Victoria, Australia
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne, Melbourne, Victoria, Australia
- Satrajit S Ghosh
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Otolaryngology - Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
- Martin Grignard
- GIGA CRC In-Vivo Imaging, University of Liège, Liège, Belgium
- Yaroslav O Halchenko
- Center for Open Neuroscience, Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
- Anthony J Hannan
- The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Melbourne, Victoria, Australia
- Anibal S Heinsfeld
- Department of Psychology, Center for Perceptual Systems, Institute for Neuroscience, Center For Learning and Memory, The University of Texas at Austin, Austin, TX, USA
- Laurentius Huber
- National Institute of Mental Health (NIMH), National Institutes of Health, Bethesda, MD, USA
- Matthew E Hughes
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn, Victoria, Australia
- Jakub R Kaczmarzyk
- Department of Biomedical Informatics, Stony Brook University, New York, NY, USA
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, New York, NY, USA
- Lars Kasper
- BRAIN-TO Lab, Krembil Brain Institute, University Health Network, Toronto, Ontario, Canada
- Translational Neuromodeling Unit, Institute for Biomedical Engineering, University of Zurich and ETH Zurich, Zurich, Switzerland
- Levin Kuhlmann
- Department of Data Science and AI, Faculty of Information Technology, Monash University, Melbourne, Victoria, Australia
- Kexin Lou
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, China
- Yorguin-Jose Mantilla-Ramos
- Grupo Neuropsicología y Conducta (GRUNECO), Facultad de Medicina, Universidad de Antioquia, Medellín, Colombia
- Jason B Mattingley
- The University of Queensland, Queensland Brain Institute, St Lucia, Brisbane, Queensland, Australia
- The University of Queensland, School of Psychology, St Lucia, Brisbane, Queensland, Australia
- Michael L Meier
- Integrative Spinal Research, Department of Chiropractic Medicine, Balgrist University Hospital, University of Zurich, Zurich, Switzerland
- Jo Morris
- Australian Research Data Commons (ARDC), Sydney, New South Wales, Australia
- Akshaiy Narayanan
- School of Computer Science, The University of Auckland, Auckland, New Zealand
- Franco Pestilli
- Department of Psychology, Center for Perceptual Systems, Institute for Neuroscience, Center For Learning and Memory, The University of Texas at Austin, Austin, TX, USA
- Aina Puce
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA
- Fernanda L Ribeiro
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia
- Nigel C Rogasch
- The Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Melbourne, Victoria, Australia
- Discipline of Psychiatry, Adelaide Medical School, University of Adelaide, Adelaide, South Australia, Australia
- Hopwood Centre for Neurobiology, Lifelong Health Theme, South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
- Chris Rorden
- McCausland Center for Brain Imaging, Department of Psychology, University of South Carolina, Columbia, SC, USA
- Mark M Schira
- School of Psychology, University of Wollongong, Wollongong, New South Wales, Australia
- Thomas B Shaw
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia
- The University of Queensland, Centre for Advanced Imaging, St Lucia, Brisbane, Queensland, Australia
- Department of Neurology, Royal Brisbane and Women's Hospital, Brisbane, Queensland, Australia
- Paul F Sowman
- Macquarie University, School of Psychological Sciences, Sydney, New South Wales, Australia
- Gershon Spitz
- Department of Neuroscience, Central Clinical School, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Victoria, Australia
- Monash-Epworth Rehabilitation Research Centre, Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Melbourne, Victoria, Australia
- Ashley W Stewart
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia
- ARC Training Centre for Innovation in Biomedical Imaging Technology, The University of Queensland, Brisbane, Queensland, Australia
- Xincheng Ye
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia
- Judy D Zhu
- Macquarie University, School of Psychological Sciences, Sydney, New South Wales, Australia
- Aswin Narayanan
- The University of Queensland, Centre for Advanced Imaging, St Lucia, Brisbane, Queensland, Australia
- Steffen Bollmann
- The University of Queensland, School of Electrical Engineering and Computer Science, St Lucia, Brisbane, Queensland, Australia.
- The University of Queensland, Centre for Advanced Imaging, St Lucia, Brisbane, Queensland, Australia.
- ARC Training Centre for Innovation in Biomedical Imaging Technology, The University of Queensland, Brisbane, Queensland, Australia.
- Queensland Digital Health Centre, The University of Queensland, Brisbane, Queensland, Australia.
5
Alser M, Lawlor B, Abdill RJ, Waymost S, Ayyala R, Rajkumar N, LaPierre N, Brito J, Ribeiro-Dos-Santos AM, Almadhoun N, Sarwal V, Firtina C, Osinski T, Eskin E, Hu Q, Strong D, Kim BDBD, Abedalthagafi MS, Mutlu O, Mangul S. Packaging and containerization of computational methods. Nat Protoc 2024:10.1038/s41596-024-00986-0. [PMID: 38565959 DOI: 10.1038/s41596-024-00986-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 02/12/2024] [Indexed: 04/04/2024]
Abstract
Methods for analyzing the full complement of a biomolecule type, e.g., proteomics or metabolomics, generate large amounts of complex data. The software tools used to analyze omics data have reshaped the landscape of modern biology and become an essential component of biomedical research. These tools are themselves quite complex and often require the installation of other supporting software, libraries and/or databases. A researcher may also be using multiple different tools that require different versions of the same supporting materials. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging and containerization are different approaches to satisfy this need by delivering omics tools already wrapped in additional software that makes the tools easier to install and use. In this systematic review, we describe and compare the features of prominent packaging and containerization platforms. We outline the challenges, advantages and limitations of each approach and some of the most widely used platforms from the perspectives of users, software developers and system administrators. We also propose principles to make the distribution of omics software more sustainable and robust to increase the reproducibility of biomedical and life science research.
Affiliation(s)
- Mohammed Alser
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Brendan Lawlor
- Department of Computer Science, Munster Technological University, Cork, Ireland
- Department of Biological Sciences, Munster Technological University, Cork, Ireland
- Richard J Abdill
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Sharon Waymost
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Ram Ayyala
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Neha Rajkumar
- Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA, USA
- Nathan LaPierre
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Jaqueline Brito
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA
- Nour Almadhoun
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Varuni Sarwal
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Can Firtina
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Tomasz Osinski
- Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Eleazar Eskin
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Qiyang Hu
- Office of Advanced Research Computing, University of California, Los Angeles, CA, USA
- Derek Strong
- Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Byoung-Do B D Kim
- Center for Advanced Research Computing, University of Southern California, Los Angeles, CA, USA
- Malak S Abedalthagafi
- Department of Pathology & Laboratory Medicine, Emory University Hospital, Atlanta, GA, USA
- King Salman Center for Disability Research, Riyadh, Saudi Arabia
- Onur Mutlu
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Serghei Mangul
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
- Titus Family Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA.
6
Holcomb M, Marshall A, Flinn H, Lozano M, Soriano S, Gomez-Pinilla F, Treangen TJ, Villapol S. Probiotic treatment causes sex-specific neuroprotection after traumatic brain injury in mice. RESEARCH SQUARE 2024:rs.3.rs-4196801. [PMID: 38645104 PMCID: PMC11030542 DOI: 10.21203/rs.3.rs-4196801/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Background: Recent studies have shed light on the potential role of gut dysbiosis in shaping traumatic brain injury (TBI) outcomes. Changes in the levels and types of Lactobacillus bacteria present might impact the immune system disturbances, neuroinflammatory responses, anxiety and depressive-like behaviors, and compromised neuroprotection mechanisms triggered by TBI. Objective: This study aimed to investigate the effects of a daily pan-probiotic (PP) mixture in drinking water containing strains of Lactobacillus plantarum, L. reuteri, L. helveticus, L. fermentum, L. rhamnosus, L. gasseri, and L. casei, administered for either two or seven weeks before inducing TBI on both male and female mice. Methods: Mice were subjected to controlled cortical impact (CCI) injury. Short-chain fatty acids (SCFAs) analysis was performed for metabolite measurements. The taxonomic profiles of murine fecal samples were evaluated using 16S rRNA V1-V3 sequencing analysis. Histological analyses were used to assess neuroinflammation and gut changes post-TBI, while behavioral tests were conducted to evaluate sensorimotor and cognitive functions. Results: Our findings suggest that PP administration modulates the diversity and composition of the microbiome and increases the levels of SCFAs in a sex-dependent manner. We also observed a reduction of lesion volume, cell death, and microglial and macrophage activation after PP treatment following TBI in male mice. Furthermore, PP-treated mice show motor function improvements and decreases in anxiety and depressive-like behaviors. Conclusion: Our findings suggest that PP administration can mitigate neuroinflammation and ameliorate motor and anxiety and depressive-like behavior deficits following TBI. These results underscore the potential of probiotic interventions as a viable therapeutic strategy to address TBI-induced impairments, emphasizing the need for gender-specific treatment approaches.
7
Li J, Xiong Y, Feng S, Pan C, Guo X. CloudProteoAnalyzer: scalable processing of big data from proteomics using cloud computing. BIOINFORMATICS ADVANCES 2024; 4:vbae024. [PMID: 38495055 PMCID: PMC10942798 DOI: 10.1093/bioadv/vbae024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 02/05/2024] [Accepted: 02/21/2024] [Indexed: 03/19/2024]
Abstract
Summary: Shotgun proteomics is widely used in many system biology studies to determine the global protein expression profiles of tissues, cultures, and microbiomes. Many non-distributed computer algorithms have been developed for users to process proteomics data on their local computers. However, the amount of data acquired in a typical proteomics study has grown rapidly in recent years, owing to the increasing throughput of mass spectrometry and the expanding scale of study designs. This presents a big data challenge for researchers to process proteomics data in a timely manner. To overcome this challenge, we developed a cloud-based parallel computing application to offer end-to-end proteomics data analysis software as a service (SaaS). A web interface was provided to users to upload mass spectrometry-based proteomics data, configure parameters, submit jobs, and monitor job status. The data processing was distributed across multiple nodes in a supercomputer to achieve scalability for large datasets. Our study demonstrated SaaS for proteomics as a viable solution for the community to scale up the data processing using cloud computing. Availability and implementation: This application is available online at https://sipros.oscer.ou.edu/ or https://sipros.unt.edu for free use. The source code is available at https://github.com/Biocomputing-Research-Group/CloudProteoAnalyzer under the GPL version 3.0 license.
Affiliation(s)
- Jiancheng Li
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Yi Xiong
- School of Biological Sciences, University of Oklahoma, Norman, OK 73019, United States
- Shichao Feng
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
- Chongle Pan
- School of Biological Sciences, University of Oklahoma, Norman, OK 73019, United States
- School of Computer Science, University of Oklahoma, Norman, OK 73019, United States
- Xuan Guo
- Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
8
Schmied C, Nelson MS, Avilov S, Bakker GJ, Bertocchi C, Bischof J, Boehm U, Brocher J, Carvalho MT, Chiritescu C, Christopher J, Cimini BA, Conde-Sousa E, Ebner M, Ecker R, Eliceiri K, Fernandez-Rodriguez J, Gaudreault N, Gelman L, Grunwald D, Gu T, Halidi N, Hammer M, Hartley M, Held M, Jug F, Kapoor V, Koksoy AA, Lacoste J, Le Dévédec S, Le Guyader S, Liu P, Martins GG, Mathur A, Miura K, Montero Llopis P, Nitschke R, North A, Parslow AC, Payne-Dwyer A, Plantard L, Ali R, Schroth-Diez B, Schütz L, Scott RT, Seitz A, Selchow O, Sharma VP, Spitaler M, Srinivasan S, Strambio-De-Castillia C, Taatjes D, Tischer C, Jambor HK. Community-developed checklists for publishing images and image analyses. Nat Methods 2024; 21:170-181. [PMID: 37710020 PMCID: PMC10922596 DOI: 10.1038/s41592-023-01987-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/17/2023] [Accepted: 07/26/2023] [Indexed: 09/16/2023]
Abstract
Images document scientific discoveries and are prevalent in modern biomedical research. Microscopy imaging in particular is currently undergoing rapid technological advancements. However, for scientists wishing to publish obtained images and image-analysis results, there are currently no unified guidelines for best practices. Consequently, microscopy images and image data in publications may be unclear or difficult to interpret. Here, we present community-developed checklists for preparing light microscopy images and describing image analyses for publications. These checklists offer authors, readers and publishers key recommendations for image formatting and annotation, color selection, data availability and reporting image-analysis workflows. The goal of our guidelines is to increase the clarity and reproducibility of image figures and thereby to heighten the quality and explanatory power of microscopy data.
Affiliation(s)
- Christopher Schmied
- Fondazione Human Technopole, Milano, Italy.
- Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), Berlin, Germany.
- Michael S Nelson
- Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
- Sergiy Avilov
- Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany
- Gert-Jan Bakker
- Medical BioSciences Department, Radboud University Medical Centre, Nijmegen, the Netherlands
- Cristina Bertocchi
- Laboratory for Molecular Mechanics of Cell Adhesions, Pontificia Universidad Católica de Chile Santiago, Santiago de Chile, Chile
- Graduate School of Engineering Science, Osaka University, Osaka, Japan
- Jan Brocher
- Scientific Image Processing and Analysis, BioVoxxel, Ludwigshafen, Germany
- Mariana T Carvalho
- Nanophotonics and BioImaging Facility at INL, International Iberian Nanotechnology Laboratory, Braga, Portugal
- Jana Christopher
- Biochemistry Center Heidelberg, Heidelberg University, Heidelberg, Germany
- Beth A Cimini
- Imaging Platform, Broad Institute, Cambridge, MA, USA
- Eduardo Conde-Sousa
- i3S, Instituto de Investigação e Inovação Em Saúde and INEB, Instituto de Engenharia Biomédica, Universidade do Porto, Porto, Portugal
- Michael Ebner
- Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), Berlin, Germany
- Rupert Ecker
- Translational Research Institute, Queensland University of Technology, Woolloongabba, Queensland, Australia
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, Queensland, Australia
- TissueGnostics GmbH, Vienna, Austria
- Kevin Eliceiri
- Department of Medical Physics and Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
- Julia Fernandez-Rodriguez
- Centre for Cellular Imaging Core Facility, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Laurent Gelman
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- David Grunwald
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Nadia Halidi
- Advanced Light Microscopy Unit, Centre for Genomic Regulation, Barcelona, Spain
- Mathias Hammer
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Matthew Hartley
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute, Hinxton, UK
- Marie Held
- Centre for Cell Imaging, the University of Liverpool, Liverpool, UK
- Varun Kapoor
- Department of AI Research, Kapoor Labs, Paris, France
- Sylvia Le Dévédec
- Division of Drug Discovery and Safety, Cell Observatory, Leiden Academic Centre for Drug Research, Leiden University, Leiden, the Netherlands
- Penghuan Liu
- Key Laboratory for Modern Measurement Technology and Instruments of Zhejiang Province, College of Optical and Electronic Technology, China Jiliang University, Hangzhou, China
- Gabriel G Martins
- Advanced Imaging Facility, Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Kota Miura
- Bioimage Analysis and Research, Heidelberg, Germany
- Roland Nitschke
- Life Imaging Center, Signalling Research Centres CIBSS and BIOSS, University of Freiburg, Freiburg, Germany
- Alison North
- Bio-Imaging Resource Center, the Rockefeller University, New York, NY, USA
- Adam C Parslow
- Baker Institute Microscopy Platform, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Alex Payne-Dwyer
- School of Physics, Engineering and Technology, University of York, Heslington, UK
- Laure Plantard
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
- Rizwan Ali
- King Abdullah International Medical Research Center (KAIMRC), Medical Research Core Facility and Platforms (MRCFP), King Saud bin Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
- Britta Schroth-Diez
- Light Microscopy Facility, Max Planck Institute of Molecular Cell Biology and Genetics Dresden, Dresden, Germany
- Ryan T Scott
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA, USA
- Arne Seitz
- BioImaging and Optics Platform, Faculty of Life Sciences (SV), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Olaf Selchow
- Microscopy and BioImaging Consulting, Image Processing and Large Data Handling, Gera, Germany
- Ved P Sharma
- Bio-Imaging Resource Center, the Rockefeller University, New York, NY, USA
- Sathya Srinivasan
- Imaging and Morphology Support Core, Oregon National Primate Research Center, OHSU West Campus, Beaverton, OR, USA
- Douglas Taatjes
- Department of Pathology and Laboratory Medicine, Microscopy Imaging Center, Center for Biomedical Shared Resources, University of Vermont, Burlington, VT, USA
|
9
|
Gogoberidze N, Cimini BA. Defining the boundaries: challenges and advances in identifying cells in microscopy images. Curr Opin Biotechnol 2024; 85:103055. [PMID: 38142646 PMCID: PMC11170924 DOI: 10.1016/j.copbio.2023.103055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Revised: 11/28/2023] [Accepted: 11/28/2023] [Indexed: 12/26/2023]
Abstract
Segmentation, or the outlining of objects within images, is a critical step in the measurement and analysis of cells within microscopy images. While improvements continue to be made in tools that rely on classical methods for segmentation, deep learning-based tools increasingly dominate advances in the technology. Specialist models such as Cellpose continue to improve in accuracy and user-friendliness, and segmentation challenges such as the Multi-Modality Cell Segmentation Challenge continue to push innovation in accuracy across widely varying test data as well as efficiency and usability. Increased attention on documentation, sharing, and evaluation standards is leading to increased user-friendliness and acceleration toward the goal of a truly universal method.
Affiliation(s)
- Beth A Cimini
- Imaging Platform, Broad Institute, Cambridge, MA 02142, USA.
|
10
|
Gabernet G, Marquez S, Bjornson R, Peltzer A, Meng H, Aron E, Lee NY, Jensen C, Ladd D, Hanssen F, Heumos S, Yaari G, Kowarik MC, Nahnsen S, Kleinstein SH. nf-core/airrflow: an adaptive immune receptor repertoire analysis workflow employing the Immcantation framework. bioRxiv 2024:2024.01.18.576147. [PMID: 38293151 PMCID: PMC10827190 DOI: 10.1101/2024.01.18.576147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19 infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets. nf-core/airrflow is available free of charge, under the MIT license on GitHub (https://github.com/nf-core/airrflow). 
Detailed documentation and example results are available on the nf-core website at https://nf-co.re/airrflow.
|
11
|
Bouyssié D, Altıner P, Capella-Gutierrez S, Fernández JM, Hagemeijer YP, Horvatovich P, Hubálek M, Levander F, Mauri P, Palmblad M, Raffelsberger W, Rodríguez-Navas L, Di Silvestre D, Kunkli BT, Uszkoreit J, Vandenbrouck Y, Vizcaíno JA, Winkelhardt D, Schwämmle V. WOMBAT-P: Benchmarking Label-Free Proteomics Data Analysis Workflows. J Proteome Res 2024; 23:418-429. [PMID: 38038272 DOI: 10.1021/acs.jproteome.3c00636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.
Affiliation(s)
- David Bouyssié
- Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III─Paul Sabatier (UT3), 31062 Toulouse, France
- Proteomics French Infrastructure, ProFI, FR 2048 Toulouse, France
- Pınar Altıner
- Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III─Paul Sabatier (UT3), 31062 Toulouse, France
- José M Fernández
- Life Sciences Department, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Yanick Paco Hagemeijer
- Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, 9712 CP Groningen, The Netherlands
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, 9713 GZ Groningen, The Netherlands
- Peter Horvatovich
- Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, 9712 CP Groningen, The Netherlands
- Martin Hubálek
- Institute of Organic Chemistry and Biochemistry, CAS, 160 00 Prague, Czech Republic
- Fredrik Levander
- National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Department of Immunotechnology, Lund University, 22100 Lund, Sweden
- Pierluigi Mauri
- Institute for Biomedical Technologies (ITB), Department of Biomedical Sciences, National Research Council (CNR), Segrate, 20054 Milan, Italy
- Magnus Palmblad
- Leiden University Medical Center, Postbus 9600, 2300 RC Leiden, The Netherlands
- Wolfgang Raffelsberger
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Université de Strasbourg, CNRS UMR7104, INSERM U1258, Illkirch, 1 Rue Laurent Fries, 67404 Illkirch, France
- Laura Rodríguez-Navas
- Life Sciences Department, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Dario Di Silvestre
- Institute for Biomedical Technologies (ITB), Department of Biomedical Sciences, National Research Council (CNR), Segrate, 20054 Milan, Italy
- Balázs Tibor Kunkli
- Department of Biochemistry and Molecular Biology, University of Debrecen, 4032 Debrecen, Hungary
- Julian Uszkoreit
- Medical Faculty, Medical Bioinformatics, Ruhr University Bochum, 44801 Bochum, Germany
- Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
- Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Yves Vandenbrouck
- Proteomics French Infrastructure, ProFI, FR 2048 Toulouse, France
- CEA, Fundamental Research Division, Proteomics French Infrastructure, 91191 Gif-sur-Yvette, France
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI), Wellcome Trust, Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Dirk Winkelhardt
- Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Veit Schwämmle
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
|
12
|
Bemis KA, Föll MC, Guo D, Lakkimsetty SS, Vitek O. Cardinal v.3: a versatile open-source software for mass spectrometry imaging analysis. Nat Methods 2023; 20:1883-1886. [PMID: 37996752 DOI: 10.1038/s41592-023-02070-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Accepted: 10/06/2023] [Indexed: 11/25/2023]
Abstract
Cardinal v.3 is an open-source software for reproducible analysis of mass spectrometry imaging experiments. A major update from its previous versions, Cardinal v.3 supports most mass spectrometry imaging workflows. Its analytical capabilities include advanced data processing such as mass recalibration, advanced statistical analyses such as single-ion segmentation and rough annotation-based classification, and memory-efficient analyses of large-scale multitissue experiments.
Affiliation(s)
- Kylie Ariel Bemis
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Melanie Christine Föll
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Institute of Surgical Pathology, Medical Center, University of Freiburg, Faculty of Medicine, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Dan Guo
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Olga Vitek
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.
|
13
|
Stribling D, Gay LA, Renne R. Hybkit: a Python API and command-line toolkit for hybrid sequence data from chimeric RNA methods. Bioinformatics 2023; 39:btad721. [PMID: 38006335 PMCID: PMC10701094 DOI: 10.1093/bioinformatics/btad721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/21/2023] [Accepted: 11/24/2023] [Indexed: 11/27/2023] Open
Abstract
Summary: Experimental methods using microRNA/target ligation have recently provided significant insights into microRNA functioning through generation of chimeric (hybrid) RNA sequences. Here, we introduce Hybkit, a Python 3 API and command-line toolkit for analysis of hybrid sequence data in the "hyb" file format to enable customizable evaluation and annotation of hybrid characteristics. The Hybkit API includes a suite of Python objects for developing custom analyses of hybrid data as well as miRNA-specific analysis methods, built-in plotting of analysis results, and incorporation of predicted miRNA/target interactions in Vienna format. Availability and implementation: Hybkit is provided free and open source under the GNU GPL license at github.com/RenneLab/hybkit and archived on Zenodo (doi.org/10.5281/zenodo.7834299). Hybkit distributions are also provided via PyPI (pypi.org/project/hybkit), Conda (bioconda.github.io/recipes/hybkit/README.html), and Docker (quay.io/repository/biocontainers/hybkit).
Affiliation(s)
- Daniel Stribling
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610, United States
- UF Genetics Institute, University of Florida, Gainesville, FL 32610, United States
- UF Health Cancer Center, University of Florida, Gainesville, FL 32610, United States
- Lauren A Gay
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610, United States
- Rolf Renne
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610, United States
- UF Genetics Institute, University of Florida, Gainesville, FL 32610, United States
- UF Health Cancer Center, University of Florida, Gainesville, FL 32610, United States
|
14
|
Thalén F, Köhne CG, Bleidorn C. Patchwork: Alignment-Based Retrieval and Concatenation of Phylogenetic Markers from Genomic Data. Genome Biol Evol 2023; 15:evad227. [PMID: 38085033 PMCID: PMC10735302 DOI: 10.1093/gbe/evad227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2023] [Indexed: 12/23/2023] Open
Abstract
Low-coverage whole-genome sequencing (also known as "genome skimming") is becoming an increasingly affordable approach to large-scale phylogenetic analyses. While already routinely used to recover organellar genomes, genome skimming is rarely used for recovering single-copy nuclear markers. One reason might be that only a few tools exist to work with this data type within a phylogenomic context, especially to deal with fragmented genome assemblies. Here we present a new software tool called Patchwork for mining phylogenetic markers from highly fragmented short-read assemblies as well as directly from sequence reads. Patchwork is an alignment-based tool that utilizes the sequence aligner DIAMOND and is written in the programming language Julia. Homologous regions are obtained via a sequence similarity search, followed by a "hit stitching" phase, in which adjacent or overlapping regions are merged into a single unit. The novel sliding window algorithm trims away any noncoding regions from the resulting sequence. We demonstrate the utility of Patchwork by recovering near-universal single-copy orthologs within a benchmarking study, and we additionally assess the performance of Patchwork in comparison with other programs. We find that Patchwork allows for accurate retrieval of (putatively) single-copy genes from genome skimming data sets at different sequencing depths with high computational speed, outperforming existing software targeting similar tasks. Patchwork is released under the GNU General Public License version 3. Installation instructions, additional documentation, and the source code itself are all available via GitHub at https://github.com/fethalen/Patchwork.
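The "hit stitching" phase mentioned in this abstract merges adjacent or overlapping hit regions into a single unit. As a rough illustration of that idea only (this is not Patchwork's actual Julia implementation), the Python sketch below merges half-open coordinate intervals. The function name `stitch_hits`, the `max_gap` parameter, and the sample coordinates are hypothetical and chosen purely for illustration.

```python
from typing import List, Tuple

# A hit is a half-open coordinate range (start, end) on a reference sequence.
Interval = Tuple[int, int]


def stitch_hits(hits: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge hits that overlap, touch, or lie within max_gap positions.

    Hypothetical helper for illustration; with max_gap=0 only overlapping
    or directly adjacent hits are merged.
    """
    stitched: List[Interval] = []
    # Sorting by start coordinate lets each hit be compared only with
    # the most recently stitched region.
    for start, end in sorted(hits):
        if stitched and start <= stitched[-1][1] + max_gap:
            prev_start, prev_end = stitched[-1]
            # Extend the previous region; max() handles nested hits.
            stitched[-1] = (prev_start, max(prev_end, end))
        else:
            stitched.append((start, end))
    return stitched


if __name__ == "__main__":
    example_hits = [(120, 180), (10, 50), (45, 90), (90, 110)]
    print(stitch_hits(example_hits))              # [(10, 110), (120, 180)]
    print(stitch_hits(example_hits, max_gap=10))  # [(10, 180)]
```

Sorting first keeps the merge to a single pass; a real tool would also need to track which query each hit belongs to and its reading frame, which this sketch leaves out.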
Affiliation(s)
- Felix Thalén
- Department for Animal Evolution and Biodiversity, Georg-August-Universität Göttingen, Göttingen 37073, Germany
- Cardio-CARE AG, Medizincampus Davos, Davos Wolfgang 7265, Switzerland
- Clara G Köhne
- Department for Animal Evolution and Biodiversity, Georg-August-Universität Göttingen, Göttingen 37073, Germany
- Christoph Bleidorn
- Department for Animal Evolution and Biodiversity, Georg-August-Universität Göttingen, Göttingen 37073, Germany
|
15
|
Bryce-Smith S, Burri D, Gazzara MR, Herrmann CJ, Danecka W, Fitzsimmons CM, Wan YK, Zhuang F, Fansler MM, Fernández JM, Ferret M, Gonzalez-Uriarte A, Haynes S, Herdman C, Kanitz A, Katsantoni M, Marini F, McDonnel E, Nicolet B, Poon CL, Rot G, Schärfen L, Wu PJ, Yoon Y, Barash Y, Zavolan M. Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data. RNA 2023; 29:1839-1855. [PMID: 37816550 PMCID: PMC10653393 DOI: 10.1261/rna.079849.123] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/21/2023] [Accepted: 09/21/2023] [Indexed: 10/12/2023]
Abstract
The tremendous rate with which data is generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, limitations, and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for continuous extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies, while the containers and reproducible workflows could easily be deployed and extended to evaluate new methods or data sets.
Affiliation(s)
- Sam Bryce-Smith
- Department of Neuromuscular Diseases, UCL Queen Square Motor Neuron Disease Centre, UCL Queen Square Institute of Neurology, UCL, London WC1N 3BG, United Kingdom
- Dominik Burri
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Matthew R Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Christina J Herrmann
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Weronika Danecka
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3FF, United Kingdom
- Christina M Fitzsimmons
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Yuk Kei Wan
- Genome Institute of Singapore, Buona Vista, Singapore 138672
- Yong Loo Lin School of Medicine, National University of Singapore, Kent Ridge, Singapore 119228
- Farica Zhuang
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Mervin M Fansler
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Graduate Studies, New York, New York 10065, USA
- Cancer Biology and Genetics, Sloan-Kettering Institute, MSKCC, New York, New York 10065, USA
- José M Fernández
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
- Meritxell Ferret
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
- Asier Gonzalez-Uriarte
- Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), 28029 Madrid, Spain
- Samuel Haynes
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh EH9 3FF, United Kingdom
- Chelsea Herdman
- Department of Neurobiology, University of Utah, Salt Lake City, Utah 84132, USA
- Alexander Kanitz
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Maria Katsantoni
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg-University Mainz, 55118 Mainz, Germany
- Euan McDonnel
- Leeds Institute for Data Analytics, School of Molecular and Cellular Biology, University of Leeds, Leeds LS2 9NL, United Kingdom
- Ben Nicolet
- Department of Hematopoiesis, Sanquin Research, Landsteiner Laboratory, Amsterdam UMC, University of Amsterdam, 1066 CX Amsterdam, The Netherlands
- Oncode Institute, 3521 AL Utrecht, The Netherlands
- Chi-Lam Poon
- Graduate School of Medical Sciences, Weill Cornell Medicine, New York, New York 10065, USA
- Gregor Rot
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
- Leonard Schärfen
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Pin-Jou Wu
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, 72076 Tübingen, Germany
- Yoseop Yoon
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California Irvine, Irvine, California 92617, USA
- Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Mihaela Zavolan
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
|
16
|
Rossini R, Oshaghi M, Nekrasov M, Bellanger A, Domaschenz R, Dijkwel Y, Abdelhalim M, Collas P, Tremethick D, Paulsen J. Multi-level 3D genome organization deteriorates during breast cancer progression. bioRxiv 2023:2023.11.26.568711. [PMID: 38076897 PMCID: PMC10705249 DOI: 10.1101/2023.11.26.568711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Breast cancer entails intricate alterations in genome organization and expression. However, how three-dimensional (3D) chromatin structure changes in the progression from a normal to a breast cancer malignant state remains unknown. To address this, we conducted an analysis combining Hi-C data with lamina-associated domains (LADs), epigenomic marks, and gene expression in an in vitro model of breast cancer progression. Our results reveal that while the fundamental properties of topologically associating domains (TADs) remain largely stable, significant changes occur in the organization of compartments and subcompartments. These changes are closely correlated with alterations in the expression of oncogenic genes. We also observe a restructuring of TAD-TAD interactions, coinciding with a loss of spatial compartmentalization and radial positioning of the 3D genome. Notably, we identify a previously unrecognized interchromosomal insertion event, wherein a locus on chromosome 8 housing the MYC oncogene is inserted into a highly active subcompartment on chromosome 10. This insertion leads to the formation of de novo enhancer contacts and activation of the oncogene, illustrating how structural variants can interact with the 3D genome to drive oncogenic states. In summary, our findings provide evidence for the degradation of genome organization at multiple scales during breast cancer progression revealing novel relationships between genome 3D structure and oncogenic processes.
Affiliation(s)
- Roberto Rossini
- Department of Biosciences, Faculty of Mathematics and Natural Sciences, University of Oslo, 0316 Oslo, Norway
- Mohammadsaleh Oshaghi
- Department of Biosciences, Faculty of Mathematics and Natural Sciences, University of Oslo, 0316 Oslo, Norway
- Maxim Nekrasov
- Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Aurélie Bellanger
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway
- Renae Domaschenz
- Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Yasmin Dijkwel
- Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Mohamed Abdelhalim
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway
- Philippe Collas
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway
- Department of Immunology and Transfusion Medicine, Oslo University Hospital, 0424 Oslo, Norway
- David Tremethick
- Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Jonas Paulsen
- Department of Biosciences, Faculty of Mathematics and Natural Sciences, University of Oslo, 0316 Oslo, Norway
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
17
Ziemann M, Poulain P, Bora A. The five pillars of computational reproducibility: bioinformatics and beyond. Brief Bioinform 2023; 24:bbad375. [PMID: 37870287 PMCID: PMC10591307 DOI: 10.1093/bib/bbad375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/26/2023] [Accepted: 09/30/2023] [Indexed: 10/24/2023] Open
Abstract
Computational reproducibility is a simple premise in theory, but is difficult to achieve in practice. Building upon past efforts and proposals to maximize reproducibility and rigor in bioinformatics, we present a framework called the five pillars of reproducible computational research. These include (1) literate programming, (2) code version control and sharing, (3) compute environment control, (4) persistent data sharing and (5) documentation. These practices will ensure that computational research work can be reproduced quickly and easily, long into the future. This guide is designed for bioinformatics data analysts and bioinformaticians in training, but should be relevant to other domains of study.
Affiliation(s)
- Mark Ziemann
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia
- Burnet Institute, Melbourne, Australia
| | - Pierre Poulain
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, France
| | - Anusuiya Bora
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia
18
Schmied C, Nelson MS, Avilov S, Bakker GJ, Bertocchi C, Bischof J, Boehm U, Brocher J, Carvalho M, Chiritescu C, Christopher J, Cimini BA, Conde-Sousa E, Ebner M, Ecker R, Eliceiri K, Fernandez-Rodriguez J, Gaudreault N, Gelman L, Grunwald D, Gu T, Halidi N, Hammer M, Hartley M, Held M, Jug F, Kapoor V, Koksoy AA, Lacoste J, Dévédec SL, Guyader SL, Liu P, Martins GG, Mathur A, Miura K, Montero Llopis P, Nitschke R, North A, Parslow AC, Payne-Dwyer A, Plantard L, Ali R, Schroth-Diez B, Schütz L, Scott RT, Seitz A, Selchow O, Sharma VP, Spitaler M, Srinivasan S, Strambio-De-Castillia C, Taatjes D, Tischer C, Jambor HK. Community-developed checklists for publishing images and image analyses. ARXIV 2023:arXiv:2302.07005v2. [PMID: 36824427 PMCID: PMC9949169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Images document scientific discoveries and are prevalent in modern biomedical research. Microscopy imaging in particular is currently undergoing rapid technological advancements. However, for scientists wishing to publish the obtained images and image analysis results, there are to date no unified guidelines. Consequently, microscopy images and image data in publications may be unclear or difficult to interpret. Here we present community-developed checklists for preparing light microscopy images and image analysis for publications. These checklists offer authors, readers, and publishers key recommendations for image formatting and annotation, color selection, data availability, and for reporting image analysis workflows. The goal of our guidelines is to increase the clarity and reproducibility of image figures and thereby heighten the quality and explanatory power of microscopy data in publications.
Affiliation(s)
- Christopher Schmied
- Fondazione Human Technopole, Viale Rita Levi-Montalcini 1, 20157 Milano, Italy
- Present address: Leibniz-Forschungsinstitut für Molekulare Pharmakologie (FMP), Robert-Rössle-Str. 10, 13125 Berlin, Germany
| | - Michael S Nelson
- Department of Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Sergiy Avilov
- Max Planck Institute of Immunobiology and Epigenetics, 79108 Freiburg, Germany
| | - Gert-Jan Bakker
- Medical BioSciences department, Radboud University Medical Centre, Nijmegen, Netherlands
| | - Cristina Bertocchi
- Laboratory for Molecular mechanics of cell adhesions, Pontificia Universidad Católica de Chile, Santiago
- Osaka University, Graduate School of Engineering Science, Japan
| | - Johanna Bischof
- Euro-BioImaging ERIC, Bio-Hub, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Ulrike Boehm
- Carl Zeiss AG, Carl-Zeiss-Straße 22, 73447 Oberkochen, Germany
| | - Jan Brocher
- BioVoxxel, Scientific Image Processing and Analysis, Eugen-Roth-Strasse 8, 67071 Ludwigshafen, Germany
| | - Mariana Carvalho
- Nanophotonics and BioImaging Facility at INL, International Iberian Nanotechnology Laboratory, 4715-330, Portugal
| | | | | | - Beth A Cimini
- Imaging Platform, Broad Institute, Cambridge, MA 02142
| | - Eduardo Conde-Sousa
- i3S, Instituto de Investigação e Inovação Em Saúde and INEB, Instituto de Engenharia Biomédica, Universidade do Porto, Porto, Portugal
| | - Michael Ebner
- Fondazione Human Technopole, Viale Rita Levi-Montalcini 1, 20157 Milano, Italy
| | - Rupert Ecker
- Translational Research Institute, Queensland University of Technology, 37 Kent Street, Woolloongabba, QLD 4102, Australia
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, QLD 4059, Australia
- TissueGnostics GmbH, 1020 Vienna, Austria
| | - Kevin Eliceiri
- Department of Medical Physics and Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | | | | | - Laurent Gelman
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
| | - David Grunwald
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | | | - Nadia Halidi
- Advanced Light Microscopy Unit, Centre for Genomic Regulation, Barcelona, Spain
| | - Mathias Hammer
- RNA Therapeutics Institute, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
| | - Matthew Hartley
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Marie Held
- Centre for Cell Imaging, The University of Liverpool, UK
| | - Florian Jug
- Fondazione Human Technopole, Viale Rita Levi-Montalcini 1, 20157 Milano, Italy
| | - Varun Kapoor
- Department of AI research, Kapoor Labs, Paris, 75005, France
| | | | | | - Sylvia Le Dévédec
- Division of Drug Discovery and Safety, Cell Observatory, Leiden Academic Centre for Drug Research, Leiden University, 2333 CC Leiden, The Netherlands
| | | | - Penghuan Liu
- Key Laboratory for Modern Measurement Technology and Instruments of Zhejiang Province, College of Optical and Electronic Technology, China Jiliang University, Hangzhou, China
| | - Gabriel G Martins
- Advanced Imaging Facility, Instituto Gulbenkian de Ciência, Oeiras 2780-156 - Portugal
| | - Aastha Mathur
- Euro-BioImaging ERIC, Bio-Hub, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Kota Miura
- Bioimage Analysis & Research, 69127 Heidelberg/Germany
| | | | - Roland Nitschke
- Life Imaging Center, Signalling Research Centres CIBSS and BIOSS, University of Freiburg, Germany
| | - Alison North
- Bio-Imaging Resource Center, The Rockefeller University, New York, NY USA
| | - Adam C Parslow
- Baker Institute Microscopy Platform, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
| | - Alex Payne-Dwyer
- School of Physics, Engineering and Technology, University of York, Heslington, YO10 5DD, UK
| | - Laure Plantard
- Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
| | - Rizwan Ali
- King Abdullah International Medical Research Center (KAIMRC), Medical Research Core Facility and Platforms (MRCFP), King Saud bin Abdulaziz University for Health Sciences (KSAU-HS), Ministry of National Guard Health Affairs (MNGHA), Riyadh 11481, Saudi Arabia
| | - Britta Schroth-Diez
- Light Microscopy Facility, Max Planck Institute of Molecular Cell Biology and Genetics Dresden, Pfotenhauerstrasse 108, 01307 Dresden, Germany
| | - Lucas Schütz
- ariadne.ai (Germany) GmbH, 69115 Heidelberg, Germany
| | - Ryan T Scott
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA, 94035, USA
| | - Arne Seitz
- BioImaging & Optics Platform (BIOP), Ecole Polytechnique Fédérale de Lausanne (EPFL), Faculty of Life sciences (SV), CH-1015 Lausanne
| | - Olaf Selchow
- Microscopy & BioImaging Consulting, Image Processing & Large Data Handling, Tobias-Hoppe-Strasse 3, 07548 Gera, Germany
| | - Ved P Sharma
- Bio-Imaging Resource Center, The Rockefeller University, New York, NY USA
| | - Martin Spitaler
- Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany
| | - Sathya Srinivasan
- Imaging and Morphology Support Core, Oregon National Primate Research Center - (ONPRC - OHSU West Campus), Beaverton, Oregon 97006, USA
| | | | - Douglas Taatjes
- Department of Pathology and Laboratory Medicine, Microscopy Imaging Center (RRID# SCR_018821), Center for Biomedical Shared Resources, University of Vermont, Burlington, VT 05405 USA
| | - Christian Tischer
- Centre for Bioimage Analysis, EMBL Heidelberg, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Helena Klara Jambor
- NCT-UCC, Medizinische Fakultät TU Dresden, Fetscherstrasse 105, 01307 Dresden/Germany
19
Weisbart E, Tromans-Coia C, Diaz-Rohrer B, Stirling DR, Garcia-Fossa F, Senft RA, Hiner MC, de Jesus MB, Eliceiri KW, Cimini BA. CellProfiler plugins - An easy image analysis platform integration for containers and Python tools. J Microsc 2023. [PMID: 37690102 PMCID: PMC10924770 DOI: 10.1111/jmi.13223] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/02/2023] [Revised: 08/10/2023] [Accepted: 09/05/2023] [Indexed: 09/12/2023]
Abstract
CellProfiler is a widely used software for creating reproducible, reusable image analysis workflows without needing to code. In addition to the >90 modules that make up the main CellProfiler program, CellProfiler has a plugins system that allows for the creation of new modules which integrate with other Python tools or tools that are packaged in software containers. The CellProfiler-plugins repository contains a number of these CellProfiler modules, especially modules that are experimental and/or dependency-heavy. Here, we present an upgraded CellProfiler-plugins repository, an example of accessing containerised tools, improved documentation and added citation/reference tools to facilitate the use and contribution of the community.
Affiliation(s)
- Erin Weisbart
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Callum Tromans-Coia
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Barbara Diaz-Rohrer
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - David R Stirling
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Fernanda Garcia-Fossa
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas, Campinas, São Paulo, Brazil
| | - Rebecca A Senft
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Mark C Hiner
- Center for Quantitative Cell Imaging, University of Wisconsin-Madison, Madison, Wisconsin, USA
| | - Marcelo B de Jesus
- Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas, Campinas, São Paulo, Brazil
| | - Kevin W Eliceiri
- Center for Quantitative Cell Imaging, University of Wisconsin-Madison, Madison, Wisconsin, USA
| | - Beth A Cimini
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
20
Weisbart E, Tromans-Coia C, Diaz-Rohrer B, Stirling DR, Garcia-Fossa F, Senft RA, Hiner MC, de Jesus MB, Eliceiri KW, Cimini BA. CellProfiler plugins -- an easy image analysis platform integration for containers and Python tools. ARXIV 2023:arXiv:2306.01915v2. [PMID: 37645041 PMCID: PMC10462170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
CellProfiler is a widely used software for creating reproducible, reusable image analysis workflows without needing to code. In addition to the >90 modules that make up the main CellProfiler program, CellProfiler has a plugins system that allows for the creation of new modules which integrate with other Python tools or tools that are packaged in software containers. The CellProfiler-plugins repository contains a number of these CellProfiler modules, especially modules that are experimental and/or dependency-heavy. Here, we present an upgraded CellProfiler-plugins repository, an example of accessing containerized tools, improved documentation, and added citation/reference tools to facilitate the use and contribution of the community.
Affiliation(s)
- Erin Weisbart
- Broad Institute of MIT and Harvard, Cambridge MA, USA
| | | | | | | | - Fernanda Garcia-Fossa
- Broad Institute of MIT and Harvard, Cambridge MA, USA
- Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas, Campinas, São Paulo, Brazil
| | | | - Mark C Hiner
- University of Wisconsin-Madison, Madison, WI, USA
| | - Marcelo B de Jesus
- Department of Biochemistry and Tissue Biology, Institute of Biology, University of Campinas, Campinas, São Paulo, Brazil
| | | | - Beth A Cimini
- Broad Institute of MIT and Harvard, Cambridge MA, USA
21
Weisbart E, Cimini BA. Distributed-Something: scripts to leverage AWS storage and computing for distributed workflows at scale. Nat Methods 2023; 20:1120-1121. [PMID: 37277559 PMCID: PMC10594640 DOI: 10.1038/s41592-023-01918-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/07/2023]
Affiliation(s)
- Erin Weisbart
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Beth A Cimini
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
22
Bryce-Smith S, Burri D, Gazzara MR, Herrmann CJ, Danecka W, Fitzsimmons CM, Wan YK, Zhuang F, Fansler MM, Fernández JM, Ferret M, Gonzalez-Uriarte A, Haynes S, Herdman C, Kanitz A, Katsantoni M, Marini F, McDonnel E, Nicolet B, Poon CL, Rot G, Schärfen L, Wu PJ, Yoon Y, Barash Y, Zavolan M. Extensible benchmarking of methods that identify and quantify polyadenylation sites from RNA-seq data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.23.546284. [PMID: 37425672 PMCID: PMC10327023 DOI: 10.1101/2023.06.23.546284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
The tremendous rate with which data is generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, and limitations and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for seamless extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies. Furthermore, the containers and reproducible workflows generated in the course of this project can be seamlessly deployed and extended in the future to evaluate new methods or datasets.
Affiliation(s)
- Sam Bryce-Smith
- UCL Queen Square Motor Neuron Disease Centre, Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Dominik Burri
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Matthew R. Gazzara
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
| | - Christina J. Herrmann
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Weronika Danecka
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
| | - Christina M. Fitzsimmons
- Laboratory of Cell Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore, Buona Vista, Singapore
- National University of Singapore, Kent Ridge, Singapore
| | - Farica Zhuang
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, USA
| | - Mervin M. Fansler
- Tri-Institutional Program in Computational Biology and Medicine, Weill Cornell Graduate Studies, New York, NY, USA
- Cancer Biology and Genetics, Sloan-Kettering Institute, MSKCC, New York, NY, USA
| | - José M. Fernández
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Meritxell Ferret
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Asier Gonzalez-Uriarte
- Barcelona Supercomputing Center, Barcelona, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES)
| | - Samuel Haynes
- Institute for Cell Biology, School of Biological Sciences, The University of Edinburgh, Edinburgh, United Kingdom
| | | | - Alexander Kanitz
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Maria Katsantoni
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Federico Marini
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI) - University Medical Center of the Johannes Gutenberg University Mainz, Germany
| | - Euan McDonnel
- Leeds Institute for Data Analytics, School of Molecular and Cellular Biology, University of Leeds, United Kingdom
| | - Ben Nicolet
- Department of Hematopoiesis, Sanquin Research, Landsteiner Laboratory, Amsterdam UMC, University of Amsterdam, and Oncode Institute, Amsterdam, The Netherlands
| | | | - Gregor Rot
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Life Sciences, Zurich, Switzerland
| | - Leonard Schärfen
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven CT, USA
| | - Pin-Jou Wu
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, Germany
| | - Yoseop Yoon
- Department of Microbiology and Molecular Genetics, School of Medicine, University of California Irvine, Irvine, California, USA
| | - Yoseph Barash
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
- Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, USA
| | - Mihaela Zavolan
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
23
Renton AI, Dao TT, Johnstone T, Civier O, Sullivan RP, White DJ, Lyons P, Slade BM, Abbott DF, Amos TJ, Bollmann S, Botting A, Campbell MEJ, Chang J, Close TG, Eckstein K, Egan GF, Evas S, Flandin G, Garner KG, Garrido MI, Ghosh SS, Grignard M, Hannan AJ, Huber R, Kaczmarzyk JR, Kasper L, Kuhlmann L, Lou K, Mantilla-Ramos YJ, Mattingley JB, Morris J, Narayanan A, Pestilli F, Puce A, Ribeiro FL, Rogasch NC, Rorden C, Schira M, Shaw TB, Sowman PF, Spitz G, Stewart A, Ye X, Zhu JD, Hughes ME, Narayanan A, Bollmann S. Neurodesk: An accessible, flexible, and portable data analysis environment for reproducible neuroimaging. RESEARCH SQUARE 2023:rs.3.rs-2649734. [PMID: 36993557 PMCID: PMC10055538 DOI: 10.21203/rs.3.rs-2649734/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Neuroimaging data analysis often requires purpose-built software, which can be challenging to install and may produce different results across computing environments. Beyond being a roadblock to neuroscientists, these issues of accessibility and portability can hamper the reproducibility of neuroimaging data analysis pipelines. Here, we introduce the Neurodesk platform, which harnesses software containers to support a comprehensive and growing suite of neuroimaging software (https://www.neurodesk.org/). Neurodesk includes a browser-accessible virtual desktop environment and a command line interface, mediating access to containerized neuroimaging software libraries on various computing platforms, including personal and high-performance computers, cloud computing and Jupyter Notebooks. This community-oriented, open-source platform enables a paradigm shift for neuroimaging data analysis, allowing for accessible, flexible, fully reproducible, and portable data analysis pipelines.
Affiliation(s)
- Angela I. Renton
- The University of Queensland, Queensland Brain Institute, St Lucia 4072, Australia
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
| | - Thuy T. Dao
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
| | - Tom Johnstone
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn 3122, Australia
| | - Oren Civier
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn 3122, Australia
| | - Ryan P. Sullivan
- The University of Sydney, School of Biomedical Engineering, Sydney, Australia
| | - David J. White
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn 3122, Australia
| | - Paris Lyons
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn 3122, Australia
| | - Benjamin M. Slade
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn 3122, Australia
| | - David F. Abbott
- The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Victoria, Australia
| | - Toluwani J. Amos
- School of Life Science and Technology, University of Electronic Science and Technology, China
| | - Saskia Bollmann
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
| | - Andy Botting
- Australian Research Data Commons (ARDC), Australia
| | - Megan E. J. Campbell
- School of Psychological Sciences, University of Newcastle, Australia
- Hunter Medical Research Institute Imaging Centre, Newcastle, Australia
| | - Jeryn Chang
- The University of Queensland, School of Biomedical Sciences, St Lucia 4072, Australia
| | - Thomas G. Close
- The University of Sydney, School of Biomedical Engineering, Sydney, Australia
| | - Korbinian Eckstein
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
| | - Gary F. Egan
- The Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Victoria, Australia
- Monash Biomedical Imaging, Monash University, Victoria, Australia
| | - Stefanie Evas
- School of Psychology, University of Adelaide, Adelaide, 5000, Australia
- Human Health, Health & Biosecurity, CSIRO, Adelaide, 5000, Australia
| | - Guillaume Flandin
- Wellcome Centre for Human Neuroimaging, University College London, London, UK
| | - Kelly G. Garner
- The University of Queensland, Queensland Brain Institute, St Lucia 4072, Australia
- The University of Queensland, School of Psychology, St Lucia 4072, Australia
| | - Marta I. Garrido
- Melbourne School of Psychological Sciences, The University of Melbourne
- Graeme Clark Institute for Biomedical Engineering, The University of Melbourne
| | - Satrajit S. Ghosh
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Otolaryngology - Head and Neck Surgery, Harvard Medical School, Boston, MA, USA
| | - Martin Grignard
- GIGA CRC In-Vivo Imaging, University of Liege, Liege, Belgium
| | - Anthony J. Hannan
- The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Victoria, Australia
| | - Renzo Huber
- Functional Magnetic Resonance Imaging Core Facility (FMRIF), National Institute of Mental Health (NIMH), USA
| | - Jakub R. Kaczmarzyk
- Medical Scientist Training Program, Stony Brook University, Stony Brook, NY, United States of America
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, United States of America
| | - Lars Kasper
- Techna Institute, University Health Network, Toronto, Canada
| | - Levin Kuhlmann
- Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton VIC 3800, Australia
| | - Kexin Lou
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
| | | | - Jason B. Mattingley
- The University of Queensland, Queensland Brain Institute, St Lucia 4072, Australia
- The University of Queensland, School of Psychology, St Lucia 4072, Australia
| | - Jo Morris
- Australian Research Data Commons (ARDC), Australia
| | | | - Franco Pestilli
- Department of Psychology, Center for Perceptual Systems, Center for Theoretical and Computational Neuroscience, Center on Aging and Population Sciences, Center for Learning and Memory, The University of Texas at Austin, 108 E Dean Keeton St, Austin, TX 78712, USA
| | - Aina Puce
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA
| | - Fernanda L. Ribeiro
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
| | - Nigel C. Rogasch
- Discipline of Psychiatry, Adelaide Medical School, University of Adelaide, Australia
- Hopwood Centre for Neurobiology, Lifelong Health Theme, South Australian Health and Medical Research Institute (SAHMRI), Adelaide, SA, Australia
- The Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Victoria, Australia
| | - Chris Rorden
- McCausland Center for Brain Imaging, Department of Psychology, University of South Carolina, Columbia SC, 29208, USA
| | - Mark Schira
- School of Psychology, University of Wollongong, Wollongong, 2522, Australia
| | - Thomas B. Shaw
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
- The University of Queensland, Centre for Advanced Imaging, St Lucia 4072, Australia
- Department of Neurology, Royal Brisbane and Women's Hospital, Brisbane, Australia
| | - Paul F. Sowman
- Macquarie University, School of Psychological Sciences, North Ryde 2112, Australia
| | - Gershon Spitz
- Department of Neuroscience, Central Clinical School, Faculty of Medicine, Nursing and Health Sciences, Monash University, Australia
- Monash-Epworth Rehabilitation Research Centre, Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Clayton, 3168, Australia
| | - Ashley Stewart
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
| | - Xincheng Ye
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
| | - Judy D. Zhu
- Macquarie University, School of Psychological Sciences, North Ryde 2112, Australia
| | - Matthew E. Hughes
- Centre for Mental Health & Brain Sciences, Swinburne University of Technology, Hawthorn 3122, Australia
| | - Aswin Narayanan
- The University of Queensland, Centre for Advanced Imaging, St Lucia 4072, Australia
| | - Steffen Bollmann
- The University of Queensland, School of Information Technology and Electrical Engineering, St Lucia 4072, Australia
24
Djaffardjy M, Marchment G, Sebe C, Blanchet R, Bellajhame K, Gaignard A, Lemoine F, Cohen-Boulakia S. Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems. Comput Struct Biotechnol J 2023; 21:2075-2085. [PMID: 36968012 PMCID: PMC10030817 DOI: 10.1016/j.csbj.2023.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 03/03/2023] [Accepted: 03/03/2023] [Indexed: 03/09/2023] Open
Abstract
Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analysis and experiments. While scripting languages, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are by no means enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and provides a mapping of existing solutions that fulfill them. We then highlight the benefits of using scientific workflow systems to get modular, reproducible and reusable bioinformatics data analysis pipelines. We finally discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows.
25
Contrasting Genetic Diversity of Listeria Pathogenicity Islands 3 and 4 Harbored by Nonpathogenic Listeria spp. Appl Environ Microbiol 2023; 89:e0209722. [PMID: 36728444 PMCID: PMC9973017 DOI: 10.1128/aem.02097-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023] Open
Abstract
Listeria monocytogenes causes the severe foodborne disease listeriosis. Several clonal groups of L. monocytogenes possess the pathogenicity islands Listeria pathogenicity island 3 (LIPI-3) and LIPI-4. Here, we investigated the prevalence and genetic diversity of LIPI-3 and LIPI-4 among 63 strains of seven nonpathogenic Listeria spp. from the natural environment, i.e., wildlife (black bears [Ursus americanus]) and surface water. Analysis of the whole-genome sequence data suggested that both islands were horizontally acquired but differed considerably in their incidence and genetic diversity. LIPI-3 was identified among half of the L. innocua strains in the same genomic location as in L. monocytogenes (guaA hot spot) in a truncated form, with only three strains harboring full-length LIPI-3, and a highly divergent partial LIPI-3 was observed in three Listeria seeligeri strains, outside the guaA hot spot. Premature stop codons (PMSCs) and frameshifts were frequently noted in the LIPI-3 gene encoding listeriolysin S. On the other hand, full-length LIPI-4 without any PMSCs was found in all Listeria innocua strains, in the same genomic location as L. monocytogenes and with ~85% similarity to the L. monocytogenes counterpart. Our study provides intriguing examples of genetic changes that pathogenicity islands may undergo in nonpathogenic bacterial species, potentially in response to environmental pressures that promote either maintenance or degeneration of the islands. Investigations of the roles that LIPI-3 and LIPI-4 play in nonpathogenic Listeria spp. are warranted to further understand the differential evolution of genetic elements in pathogenic versus nonpathogenic hosts of the same genus. IMPORTANCE Listeria monocytogenes is a serious foodborne pathogen that can harbor the pathogenicity islands Listeria pathogenicity island 3 (LIPI-3) and LIPI-4. Intriguingly, these have also been reported in nonpathogenic L. innocua from food and farm environments, though limited information is available for strains from the natural environment. Here, we analyzed whole-genome sequence data of nonpathogenic Listeria spp. from wildlife and surface water to further elucidate the genetic diversity and evolution of LIPI-3 and LIPI-4 in Listeria. While the full-length islands were found only in L. innocua, LIPI-3 was uncommon and exhibited frequent truncation and genetic diversification, while LIPI-4 was remarkable in being ubiquitous, albeit diversified from L. monocytogenes. These contrasting features demonstrate that pathogenicity islands in nonpathogenic hosts can evolve along different trajectories, leading to either degeneration or maintenance, and highlight the need to examine their physiological roles in nonpathogenic hosts.
|
26
|
Bemis KA, Föll MC, Guo D, Lakkimsetty SS, Vitek O. Cardinal v3 - a versatile open source software for mass spectrometry imaging analysis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.20.529280. [PMID: 36865170 PMCID: PMC9980127 DOI: 10.1101/2023.02.20.529280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Cardinal v3 is open-source software for reproducible analysis of mass spectrometry imaging experiments. A major update from its previous versions, Cardinal v3 supports most mass spectrometry imaging workflows. Its analytical capabilities include advanced data processing such as mass re-calibration, advanced statistical analyses such as single-ion segmentation and rough annotation-based classification, and memory-efficient analyses of large-scale multi-tissue experiments.
Affiliation(s)
- Kylie Ariel Bemis
- Khoury College of Computer Sciences, Northeastern University, Boston, USA
- Melanie Christine Föll
- Khoury College of Computer Sciences, Northeastern University, Boston, USA
- Institute of Surgical Pathology, Medical Center – University of Freiburg, Faculty of Medicine, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Dan Guo
- Khoury College of Computer Sciences, Northeastern University, Boston, USA
- Olga Vitek
- Khoury College of Computer Sciences, Northeastern University, Boston, USA
|
27
|
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Jimmy K Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
- Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
|
28
|
Debrie E, Malfait M, Gabriels R, Declerq A, Sticker A, Martens L, Clement L. Quality Control for the Target Decoy Approach for Peptide Identification. J Proteome Res 2023; 22:350-358. [PMID: 36648107 DOI: 10.1021/acs.jproteome.2c00423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
Reliable peptide identification is key in mass spectrometry (MS) based proteomics. To this end, the target decoy approach (TDA) has become the cornerstone for extracting a set of reliable peptide-to-spectrum matches (PSMs) that will be used in downstream analysis. Indeed, TDA is now the default method to estimate the false discovery rate (FDR) for a given set of PSMs, and users typically view it as a universal solution for assessing the FDR in the peptide identification step. However, the TDA also relies on a minimal set of assumptions, which are typically never verified in practice. We argue that a violation of these assumptions can lead to poor FDR control, which can be detrimental to any downstream data analysis. We therefore first spell out these TDA assumptions and then introduce TargetDecoy, a Bioconductor package with all the necessary functionality to control the TDA quality and its underlying assumptions for a given set of PSMs.
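As a rough illustration of the target decoy idea summarized in this abstract, the sketch below estimates the FDR at a score threshold as the ratio of decoy to target PSMs scoring at or above that threshold. This is the commonly used basic estimator and not the API of the TargetDecoy package; the function name, the assumption that higher scores are better, and the toy scores are all hypothetical.

```python
from typing import List, Tuple


def estimate_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target PSMs with score >= threshold.

    psms: (score, is_decoy) pairs. Assumption: a higher score means a
    better peptide-to-spectrum match.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted target PSMs, so there is nothing to be wrong about.
        return 0.0
    # Decoy hits stand in for the expected number of false target hits.
    return min(1.0, decoys / targets)


# Hypothetical toy data: (score, is_decoy)
toy_psms = [
    (9.1, False), (8.7, False), (8.2, True), (7.9, False),
    (7.5, False), (6.0, True), (5.5, False),
]

fdr_at_7 = estimate_fdr(toy_psms, 7.0)  # 1 decoy / 4 targets
```

The estimate is only meaningful when the assumptions the abstract refers to hold, for example that decoys and incorrect target matches are equally likely to score well.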
Affiliation(s)
- Elke Debrie
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
- Milan Malfait
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium; Statistics and Decision Sciences, Janssen Pharmaceutical Companies of Johnson and Johnson, 2340 Beerse, Belgium
- Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Arthur Declerq
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Adriaan Sticker
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Lieven Clement
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000 Ghent, Belgium
|
29
|
Manconi A, Gnocchi M, Milanesi L, Marullo O, Armano G. Framing Apache Spark in life sciences. Heliyon 2023; 9:e13368. [PMID: 36852030 PMCID: PMC9958288 DOI: 10.1016/j.heliyon.2023.e13368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 01/19/2023] [Accepted: 01/29/2023] [Indexed: 02/11/2023] Open
Abstract
Advances in high-throughput and digital technologies have required the adoption of big data for handling complex tasks in life sciences. However, the shift to big data has confronted researchers with technical and infrastructural challenges in storing, sharing, and analysing them. In fact, this kind of task requires distributed computing systems and algorithms able to ensure efficient processing. Cutting-edge distributed programming frameworks make it possible to implement flexible algorithms able to adapt the computation to the data over on-premise HPC clusters or cloud architectures. In this context, Apache Spark is a very powerful HPC engine for large-scale data processing on clusters. Thanks also to specialised libraries for working with structured and relational data, it supports machine learning, graph-based computation, and stream processing. This review article is aimed at helping life sciences researchers to ascertain the features of Apache Spark and to assess whether it can be successfully used in their research activities.
Affiliation(s)
- Andrea Manconi
- Institute of Biomedical Technologies - National Research Council of Italy, Segrate (Mi), Italy
- Matteo Gnocchi
- Institute of Biomedical Technologies - National Research Council of Italy, Segrate (Mi), Italy
- Luciano Milanesi
- Institute of Biomedical Technologies - National Research Council of Italy, Segrate (Mi), Italy
- Osvaldo Marullo
- Department of Mathematics and Computer science - University of Cagliari, Cagliari, Italy
- Giuliano Armano
- Department of Mathematics and Computer science - University of Cagliari, Cagliari, Italy
|
30
|
Schackart KE, Graham JB, Ponsero AJ, Hurwitz BL. Evaluation of computational phage detection tools for metagenomic datasets. Front Microbiol 2023; 14:1078760. [PMID: 36760501 PMCID: PMC9902911 DOI: 10.3389/fmicb.2023.1078760] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/24/2022] [Accepted: 01/09/2023] [Indexed: 01/25/2023] Open
Abstract
Introduction: As new computational tools for detecting phage in metagenomes are being rapidly developed, a critical need has emerged to develop systematic benchmarks. Methods: In this study, we surveyed 19 metagenomic phage detection tools, 9 of which could be installed and run at scale. Those 9 tools were assessed on several benchmark challenges. Fragmented reference genomes are used to assess the effects of fragment length, low viral content, phage taxonomy, robustness to eukaryotic contamination, and computational resource usage. Simulated metagenomes are used to assess the effects of sequencing and assembly quality on the tool performances. Finally, real human gut metagenomes and viromes are used to assess the differences and similarities in the phage communities predicted by the tools. Results: We find that the various tools yield strikingly different results. Generally, tools that use a homology approach (VirSorter, MARVEL, viralVerify, VIBRANT, and VirSorter2) demonstrate low false positive rates and robustness to eukaryotic contamination. Conversely, tools that use a sequence composition approach (VirFinder, DeepVirFinder, Seeker), and MetaPhinder, have higher sensitivity, including to phages with less representation in reference databases. These differences led to widely differing predicted phage communities in human gut metagenomes, with nearly 80% of contigs being marked as phage by at least one tool and a maximum overlap of 38.8% between any two tools. While the results were more consistent among the tools on viromes, the differences in results were still significant, with a maximum overlap of 60.65%. Discussion: Importantly, the benchmark datasets developed in this study are publicly available and reusable to enable the future comparability of new tools developed.
Affiliation(s)
- Kenneth E. Schackart
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
- Jessica B. Graham
- BIO5 Institute, The University of Arizona, Tucson, AZ, United States
- Alise J. Ponsero
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
- BIO5 Institute, The University of Arizona, Tucson, AZ, United States
- Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Bonnie L. Hurwitz
- Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States
- BIO5 Institute, The University of Arizona, Tucson, AZ, United States
|
31
|
Fahlgren N, Kapoor M, Yordanova G, Papatheodorou I, Waese J, Cole B, Harrison P, Ware D, Tickle T, Paten B, Burdett T, Elsik CG, Tuggle CK, Provart NJ. Toward a data infrastructure for the Plant Cell Atlas. PLANT PHYSIOLOGY 2023; 191:35-46. [PMID: 36200899 PMCID: PMC9806565 DOI: 10.1093/plphys/kiac468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas that currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.
Affiliation(s)
- Noah Fahlgren
- Donald Danforth Plant Science Center, Saint Louis, Missouri 63132, USA
- Muskan Kapoor
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Jamie Waese
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
- Benjamin Cole
- DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, 1, Cyclotron Road, Berkeley, California 94720, USA
- Peter Harrison
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Doreen Ware
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
- Timothy Tickle
- Data Sciences Platform, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, Baskin School of Engineering, 1156 High Street, Santa Cruz, California 95064, USA
- Tony Burdett
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine G Elsik
- Division of Animal Sciences/Division of Plant Science & Technology/Institute for Data Science & Informatics, University of Missouri, Columbia, Missouri 65211, USA
- Christopher K Tuggle
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Nicholas J Provart
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
|
32
|
Mendes de Farias T, Wollbrett J, Robinson-Rechavi M, Bastian F. Lessons learned to boost a bioinformatics knowledge base reusability, the Bgee experience. Gigascience 2022; 12:giad058. [PMID: 37589308 PMCID: PMC10433096 DOI: 10.1093/gigascience/giad058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 05/30/2023] [Accepted: 07/07/2023] [Indexed: 08/18/2023] Open
Abstract
BACKGROUND Enhancing interoperability of bioinformatics knowledge bases is a high-priority requirement to maximize data reusability and thus increase their utility, such as the return on investment for biomedical research. A knowledge base may provide useful information for life scientists and other knowledge bases, but it only acquires exchange value once the knowledge base is (re)used, and without interoperability, the utility lies dormant. RESULTS In this article, we discuss several approaches to boost interoperability depending on the interoperable parts. The findings are driven by several real-world scenario examples that were mostly implemented by Bgee, a well-established gene expression knowledge base. To better justify that the findings are transferable, for each Bgee interoperability experience, we also highlight similar implementations by major bioinformatics knowledge bases. Moreover, we discuss ten main general lessons learned. These lessons can be applied in the context of any bioinformatics knowledge base to foster data reusability. CONCLUSIONS This work provides pragmatic methods and transferable skills to promote reusability of bioinformatics knowledge bases by focusing on interoperability.
Affiliation(s)
- Tarcisio Mendes de Farias
- SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Julien Wollbrett
- SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Marc Robinson-Rechavi
- SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Frederic Bastian
- SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
|
33
|
Suetake H, Fukusato T, Igarashi T, Ohta T. Workflow sharing with automated metadata validation and test execution to improve the reusability of published workflows. Gigascience 2022; 12:giad006. [PMID: 36810800 PMCID: PMC9944229 DOI: 10.1093/gigascience/giad006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 10/21/2022] [Accepted: 01/30/2023] [Indexed: 02/24/2023] Open
Abstract
BACKGROUND Many open-source workflow systems have made bioinformatics data analysis procedures portable. Sharing these workflows gives researchers easy access to high-quality analysis methods without requiring computational expertise. However, published workflows are not always guaranteed to be reliably reusable. Therefore, a system is needed to lower the cost of sharing workflows in a reusable form. RESULTS We introduce Yevis, a system to build a workflow registry that automatically validates and tests workflows to be published. The validation and test are based on the requirements we defined for a workflow being reusable with confidence. Yevis runs on GitHub and Zenodo and allows workflow hosting without the need for dedicated computing resources. A Yevis registry accepts workflow registration via a GitHub pull request, followed by an automatic validation and test process for the submitted workflow. As a proof of concept, we built a registry using Yevis to host workflows from a community to demonstrate how a workflow can be shared while fulfilling the defined requirements. CONCLUSIONS Yevis helps in the building of a workflow registry to share reusable workflows without requiring extensive human resources. By following Yevis's workflow-sharing procedure, one can operate a registry while satisfying the reusable workflow criteria. This system is particularly useful to individuals or communities that want to share workflows but lack the specific technical expertise to build and maintain a workflow registry from scratch.
Affiliation(s)
- Hirotaka Suetake
- Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-0033, Japan
- Tsukasa Fukusato
- Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-0033, Japan
- Takeo Igarashi
- Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-0033, Japan
- Tazro Ohta
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka 411-8540, Japan
|
34
|
Suetake H, Fukusato T, Igarashi T, Ohta T. A workflow reproducibility scale for automatic validation of biological interpretation results. Gigascience 2022; 12:giad031. [PMID: 37150537 PMCID: PMC10164546 DOI: 10.1093/gigascience/giad031] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2022] [Revised: 01/26/2023] [Accepted: 04/28/2023] [Indexed: 05/09/2023] Open
Abstract
BACKGROUND Reproducibility of data analysis workflow is a key issue in the field of bioinformatics. Recent computing technologies, such as virtualization, have made it possible to reproduce workflow execution with ease. However, the reproducibility of results is not well discussed; that is, there is no standard way to verify whether the biological interpretation of reproduced results is the same. Therefore, it still remains a challenge to automatically evaluate the reproducibility of results. RESULTS We propose a new metric, a reproducibility scale of workflow execution results, to evaluate the reproducibility of results. This metric is based on the idea of evaluating the reproducibility of results using biological feature values (e.g., number of reads, mapping rate, and variant frequency) representing their biological interpretation. We also implemented a prototype system that automatically evaluates the reproducibility of results using the proposed metric. To demonstrate our approach, we conducted an experiment using workflows used by researchers in real research projects and the use cases that are frequently encountered in the field of bioinformatics. CONCLUSIONS Our approach enables automatic evaluation of the reproducibility of results using a fine-grained scale. By introducing our approach, it is possible to evolve from a binary view of whether the results are superficially identical or not to a more graduated view. We believe that our approach will contribute to more informed discussion on reproducibility in bioinformatics.
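To make the idea of a graduated, feature-based comparison in this abstract more concrete, the sketch below compares biological feature values from two workflow runs and returns one of three levels. It is only an illustration under stated assumptions: the level names, the 5% relative tolerance, the feature names, and the function itself are hypothetical and are not taken from the authors' prototype.

```python
import math
from typing import Dict


def compare_runs(original: Dict[str, float],
                 reproduced: Dict[str, float],
                 rel_tol: float = 0.05) -> str:
    """Grade how closely a reproduced run matches the original run.

    Returns one of three hypothetical levels:
      "identical"        - every feature value is exactly equal
      "within_tolerance" - every value agrees within rel_tol
      "different"        - a feature is missing or outside rel_tol
    """
    if original.keys() != reproduced.keys():
        return "different"
    if all(original[k] == reproduced[k] for k in original):
        return "identical"
    if all(math.isclose(original[k], reproduced[k], rel_tol=rel_tol)
           for k in original):
        return "within_tolerance"
    return "different"


# Hypothetical feature values extracted from workflow outputs.
run_a = {"number_of_reads": 1_000_000, "mapping_rate": 0.95}
run_b = {"number_of_reads": 1_000_000, "mapping_rate": 0.94}
run_c = {"number_of_reads": 1_000_000, "mapping_rate": 0.50}

level_ab = compare_runs(run_a, run_b)
level_ac = compare_runs(run_a, run_c)
```

A graded result of this kind is what lets a comparison move beyond a binary check of whether two output files are byte-for-byte identical.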
Affiliation(s)
- Hirotaka Suetake
- Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, 113-0033, Japan
- Tsukasa Fukusato
- Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, 113-0033, Japan
- Takeo Igarashi
- Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, 113-0033, Japan
- Tazro Ohta
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka, 411-8540, Japan
|
35
|
Hoang V, Hung LH, Perez D, Deng H, Schooley R, Arumilli N, Yeung KY, Lloyd W. Container Profiler: Profiling resource utilization of containerized big data pipelines. Gigascience 2022; 12:giad069. [PMID: 37624874 PMCID: PMC10452954 DOI: 10.1093/gigascience/giad069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 08/02/2023] [Accepted: 08/15/2023] [Indexed: 08/27/2023] Open
Abstract
BACKGROUND This article presents the Container Profiler, a software tool that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of containerized tasks collecting over 60 Linux operating system metrics at the virtual machine, container, and process levels. The Container Profiler supports performing time-series profiling at a configurable sampling interval to enable continuous monitoring of the resources consumed by containerized tasks and pipelines. RESULTS To investigate the utility of the Container Profiler, we profile the resource utilization requirements of a multistage bioinformatics analytical pipeline (RNA sequencing using unique molecular identifiers). We examine profiling metrics to assess patterns of CPU, disk, and network resource utilization across the different stages of the pipeline. We also quantify the profiling overhead of our Container Profiler tool to assess the impact of profiling a running pipeline with different levels of profiling granularity, verifying that impacts are negligible. CONCLUSIONS The Container Profiler provides a useful tool that can be used to continuously monitor the resource consumption of long and complex containerized applications that run locally or on the cloud. This can help identify bottlenecks where more resources are needed to improve performance.
Affiliation(s)
- Varik Hoang
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
- Ling-Hong Hung
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
- David Perez
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
- Huazeng Deng
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
- Raymond Schooley
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
- Niharika Arumilli
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
- Ka Yee Yeung
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
- Wes Lloyd
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
|
36
|
Tanizawa Y, Fujisawa T, Kodama Y, Kosuge T, Mashima J, Tanjo T, Nakamura Y. DNA Data Bank of Japan (DDBJ) update report 2022. Nucleic Acids Res 2022; 51:D101-D105. [PMID: 36420889 PMCID: PMC9825463 DOI: 10.1093/nar/gkac1083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 10/24/2022] [Accepted: 11/22/2022] [Indexed: 11/26/2022] Open
Abstract
The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) maintains database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), our primary mission is to collect and distribute nucleotide sequence data, as well as their study and sample information, in collaboration with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute. In addition to INSDC resources, the Center operates databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank), and human genetic and phenotypic data (JGA: Japanese Genotype-Phenotype Archive). These databases are built on the supercomputer of the National Institute of Genetics, whose remaining computational capacity is actively utilized by domestic researchers for large-scale biological data analyses. Here, we report our recent updates and the activities of our services.
Affiliation(s)
- Yasuhiro Tanizawa
- To whom correspondence should be addressed. Tel: +55 981 6859; Fax: +55 981 6889;
- Takatomo Fujisawa
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yuichi Kodama
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Takehide Kosuge
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Jun Mashima
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Tomoya Tanjo
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yasukazu Nakamura
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
|
37
|
hgtseq: A Standard Pipeline to Study Horizontal Gene Transfer. Int J Mol Sci 2022; 23:ijms232314512. [PMID: 36498841 PMCID: PMC9738810 DOI: 10.3390/ijms232314512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/24/2022] [Revised: 11/14/2022] [Accepted: 11/18/2022] [Indexed: 11/23/2022] Open
Abstract
Horizontal gene transfer (HGT) is well described in prokaryotes: it plays a crucial role in evolution, and has functional consequences in insects and plants. However, less is known about HGT in humans. Studies have reported bacterial integrations in cancer patients, and microbial sequences have been detected in data from well-known human sequencing projects. Few of the existing tools for investigating HGT are highly automated. Thanks to the adoption of Nextflow for life sciences workflows, and to the standards and best practices curated by communities such as nf-core, fully automated, portable, and scalable pipelines can now be developed. Here we present nf-core/hgtseq to facilitate the analysis of HGT from sequencing data in different organisms. We showcase its performance by analysing six exome datasets from five mammals. Hgtseq can be run seamlessly in any computing environment and accepts data generated by existing exome and whole-genome sequencing projects; this will enable researchers to expand their analyses into this area. Fundamental questions are still open about the mechanisms and the extent or role of horizontal gene transfer: by releasing hgtseq we provide a standardised tool which will enable a systematic investigation of this phenomenon, thus paving the way for a better understanding of HGT.
|
38
|
Zhao XJG, Cao H. Linking research of biomedical datasets. Brief Bioinform 2022; 23:6712704. [PMID: 36151775 DOI: 10.1093/bib/bbac373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 08/03/2022] [Accepted: 08/08/2022] [Indexed: 12/14/2022] Open
Abstract
Biomedical data preprocessing and efficient computing can be as important as the statistical methods used to fit the data; data processing needs to consider application scenarios, data acquisition and individual rights and interests. We review common principles, knowledge and methods of integrated research according to the whole-pipeline processing mechanism: diverse, coherent, sharing, auditable and ecological. First, neuromorphic and native algorithms integrate diverse datasets, providing linear scalability and high visualization. Second, the choice mechanism of different preprocessing, analysis and transaction methods from raw to neuromorphic is summarized on the node and coordinator platforms. Third, a combination of node, network, cloud, edge, swarm and graph builds an ecosystem of cohort integrated research and clinical diagnosis and treatment. Looking forward, it is vital to simultaneously combine deep computing, mass data storage and massively parallel communication.
Collapse
Affiliation(s)
- Xiu-Ju George Zhao
- Wuhan Institute of Physics and Mathematics (WIPM), China
- Wuhan Polytechnic University, China
| | - Hui Cao
- Wuhan Polytechnic University, China
| |
|
39
|
Martins dos Santos V, Anton M, Szomolay B, Ostaszewski M, Arts I, Benfeitas R, Dominguez Del Angel V, Domínguez-Romero E, Ferk P, Fey D, Goble C, Golebiewski M, Gruden K, Heil KF, Hermjakob H, Kahlem P, Klapa MI, Koehorst J, Kolodkin A, Kutmon M, Leskošek B, Moretti S, Müller W, Pagni M, Rezen T, Rocha M, Rozman D, Šafránek D, Scott WT, Sheriff RSM, Suarez Diez M, Van Steen K, Westerhoff HV, Wittig U, Wolstencroft K, Zupanic A, Evelo CT, Hancock JM. Systems Biology in ELIXIR: modelling in the spotlight. F1000Res 2022; 11:ELIXIR-1265. [PMID: 36742342 PMCID: PMC9871403 DOI: 10.12688/f1000research.126734.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/24/2022] [Indexed: 11/09/2022] Open
Abstract
In this white paper, we describe the founding of a new ELIXIR Community - the Systems Biology Community - and its proposed future contributions to both ELIXIR and the broader community of systems biologists in Europe and worldwide. The Community believes that the infrastructure aspects of systems biology - databases, (modelling) tools and standards development, as well as training and access to cloud infrastructure - are not only appropriate components of the ELIXIR infrastructure, but will prove key components of ELIXIR's future support of advanced biological applications and personalised medicine. By way of a series of meetings, the Community identified seven key areas for its future activities, reflecting both future needs and previous and current activities within ELIXIR Platforms and Communities. These are: overcoming barriers to the wider uptake of systems biology; linking new and existing data to systems biology models; interoperability of systems biology resources; further development and embedding of systems medicine; provisioning of modelling as a service; building and coordinating capacity building and training resources; and supporting industrial embedding of systems biology. A set of objectives for the Community has been identified under four main headline areas: Standardisation and Interoperability, Technology, Capacity Building and Training, and Industrial Embedding. These are grouped into short-term (3-year), mid-term (6-year) and long-term (10-year) objectives.
Affiliation(s)
- Vitor Martins dos Santos
- Laboratory of Bioprocess Engineering, Wageningen University & Research, Wageningen, 6708 PB, The Netherlands
| | - Mihail Anton
- Department of Biology and Biological Engineering, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Chalmers University of Technology, Gothenburg, SE-41258, Sweden
| | - Barbara Szomolay
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
| | - Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, L-4367, Luxembourg
| | - Ilja Arts
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6200 MD, The Netherlands
| | - Rui Benfeitas
- National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | | | | | - Polonca Ferk
- Faculty of Medicine, Institute for Biostatistics and Medical Informatics, Centre ELIXIR-SI, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| | - Dirk Fey
- Systems Biology Ireland, School of Medicine, University College Dublin, Dublin, 4, Ireland
| | - Carole Goble
- Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
| | - Martin Golebiewski
- Heidelberg Institute for Theoretical Studies - HITS, Heidelberg, 69118, Germany
| | - Kristina Gruden
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, SI-1000, Slovenia
| | | | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
| | - Pascal Kahlem
- Scientific Network Management SL, Barcelona, 08015, Spain
| | - Maria I. Klapa
- Metabolic Engineering & Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research & Technology - Hellas (FORTH/ICE-HT), Patras, 26504, Greece
| | - Jasper Koehorst
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, 6708WE, The Netherlands
| | - Alexey Kolodkin
- Competence Center for Methodology and Statistics; Transversal Translational Medicine, Translational Medicine Operations Hub, Luxembourg Institute of Health, Strassen, L-1445, Luxembourg
- ISBE.NL, VU University of Amsterdam, Amsterdam, The Netherlands
| | - Martina Kutmon
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6200 MD, The Netherlands
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6200 MD, The Netherlands
| | - Brane Leskošek
- Faculty of Medicine, Institute for Biostatistics and Medical Informatics, Centre ELIXIR-SI, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| | | | - Wolfgang Müller
- Heidelberg Institute for Theoretical Studies - HITS, Heidelberg, 69118, Germany
| | - Marco Pagni
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Tadeja Rezen
- Faculty of Medicine, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| | - Miguel Rocha
- Centre of Biological Engineering, University of Minho, Braga, Portugal
| | - Damjana Rozman
- Faculty of Medicine, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| | - David Šafránek
- Faculty of Informatics, Masaryk University, Brno, 602 00, Czech Republic
| | - William T. Scott
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, 6708WE, The Netherlands
- UNLOCK, Wageningen University & Research, 6708 PB Wageningen, The Netherlands
| | - Rahuman S. Malik Sheriff
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
| | - Maria Suarez Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, 6708WE, The Netherlands
| | - Kristel Van Steen
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, 3000, Belgium
- BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liege, Liege, 4000, Belgium
| | | | - Ulrike Wittig
- Heidelberg Institute for Theoretical Studies - HITS, Heidelberg, 69118, Germany
| | - Katherine Wolstencroft
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, 2333 CA, The Netherlands
| | - Anze Zupanic
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, SI-1000, Slovenia
| | - Chris T. Evelo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6200 MD, The Netherlands
| | - John M. Hancock
- Faculty of Medicine, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| |
|
40
|
Martins dos Santos V, Anton M, Szomolay B, Ostaszewski M, Arts I, Benfeitas R, Dominguez Del Angel V, Domínguez-Romero E, Ferk P, Fey D, Goble C, Golebiewski M, Gruden K, Heil KF, Hermjakob H, Kahlem P, Klapa MI, Koehorst J, Kolodkin A, Kutmon M, Leskošek B, Moretti S, Müller W, Pagni M, Rezen T, Rocha M, Rozman D, Šafránek D, Scott WT, Sheriff RSM, Suarez Diez M, Van Steen K, Westerhoff HV, Wittig U, Wolstencroft K, Zupanic A, Evelo CT, Hancock JM. Systems Biology in ELIXIR: modelling in the spotlight. F1000Res 2022; 11:ELIXIR-1265. [PMID: 36742342 PMCID: PMC9871403 DOI: 10.12688/f1000research.126734.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/20/2024] [Indexed: 06/05/2024] Open
Abstract
In this white paper, we describe the founding of a new ELIXIR Community - the Systems Biology Community - and its proposed future contributions to both ELIXIR and the broader community of systems biologists in Europe and worldwide. The Community believes that the infrastructure aspects of systems biology - databases, (modelling) tools and standards development, as well as training and access to cloud infrastructure - are not only appropriate components of the ELIXIR infrastructure, but will prove key components of ELIXIR's future support of advanced biological applications and personalised medicine. By way of a series of meetings, the Community identified seven key areas for its future activities, reflecting both future needs and previous and current activities within ELIXIR Platforms and Communities. These are: overcoming barriers to the wider uptake of systems biology; linking new and existing data to systems biology models; interoperability of systems biology resources; further development and embedding of systems medicine; provisioning of modelling as a service; building and coordinating capacity building and training resources; and supporting industrial embedding of systems biology. A set of objectives for the Community has been identified under four main headline areas: Standardisation and Interoperability, Technology, Capacity Building and Training, and Industrial Embedding. These are grouped into short-term (3-year), mid-term (6-year) and long-term (10-year) objectives.
Affiliation(s)
- Vitor Martins dos Santos
- Laboratory of Bioprocess Engineering, Wageningen University & Research, Wageningen, 6708 PB, The Netherlands
| | - Mihail Anton
- Department of Biology and Biological Engineering, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Chalmers University of Technology, Gothenburg, SE-41258, Sweden
| | - Barbara Szomolay
- Division of Infection and Immunity, School of Medicine, Cardiff University, Cardiff, UK
| | - Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, L-4367, Luxembourg
| | - Ilja Arts
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6200 MD, The Netherlands
| | - Rui Benfeitas
- National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | | | | | - Polonca Ferk
- Faculty of Medicine, Institute for Biostatistics and Medical Informatics, Centre ELIXIR-SI, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| | - Dirk Fey
- Systems Biology Ireland, School of Medicine, University College Dublin, Dublin, 4, Ireland
| | - Carole Goble
- Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
| | - Martin Golebiewski
- Heidelberg Institute for Theoretical Studies - HITS, Heidelberg, 69118, Germany
| | - Kristina Gruden
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, SI-1000, Slovenia
| | | | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
| | - Pascal Kahlem
- Scientific Network Management SL, Barcelona, 08015, Spain
| | - Maria I. Klapa
- Metabolic Engineering & Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research & Technology - Hellas (FORTH/ICE-HT), Patras, 26504, Greece
| | - Jasper Koehorst
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, 6708WE, The Netherlands
| | - Alexey Kolodkin
- Competence Center for Methodology and Statistics; Transversal Translational Medicine, Translational Medicine Operations Hub, Luxembourg Institute of Health, Strassen, L-1445, Luxembourg
- ISBE.NL, VU University of Amsterdam, Amsterdam, The Netherlands
| | - Martina Kutmon
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6200 MD, The Netherlands
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6200 MD, The Netherlands
| | - Brane Leskošek
- Faculty of Medicine, Institute for Biostatistics and Medical Informatics, Centre ELIXIR-SI, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| | | | - Wolfgang Müller
- Heidelberg Institute for Theoretical Studies - HITS, Heidelberg, 69118, Germany
| | - Marco Pagni
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Tadeja Rezen
- Faculty of Medicine, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| | - Miguel Rocha
- Centre of Biological Engineering, University of Minho, Braga, Portugal
| | - Damjana Rozman
- Faculty of Medicine, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| | - David Šafránek
- Faculty of Informatics, Masaryk University, Brno, 602 00, Czech Republic
| | - William T. Scott
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, 6708WE, The Netherlands
- UNLOCK, Wageningen University & Research, 6708 PB Wageningen, The Netherlands
| | - Rahuman S. Malik Sheriff
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK
| | - Maria Suarez Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, 6708WE, The Netherlands
| | - Kristel Van Steen
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, 3000, Belgium
- BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liege, Liege, 4000, Belgium
| | | | - Ulrike Wittig
- Heidelberg Institute for Theoretical Studies - HITS, Heidelberg, 69118, Germany
| | - Katherine Wolstencroft
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, 2333 CA, The Netherlands
| | - Anze Zupanic
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, SI-1000, Slovenia
| | - Chris T. Evelo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, 6200 MD, The Netherlands
| | - John M. Hancock
- Faculty of Medicine, University of Ljubljana, Ljubljana, SI-1000, Slovenia
| |
|
41
|
Sheffield NC, Bonazzi VR, Bourne PE, Burdett T, Clark T, Grossman RL, Spjuth O, Yates AD. From biomedical cloud platforms to microservices: next steps in FAIR data and analysis. Sci Data 2022; 9:553. [PMID: 36075919 PMCID: PMC9458632 DOI: 10.1038/s41597-022-01619-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 08/08/2022] [Indexed: 11/29/2022] Open
Affiliation(s)
- Nathan C Sheffield
- Center for Public Health Genomics, School of Medicine, University of Virginia, 22908, Charlottesville, VA, USA.
- School of Data Science, University of Virginia, Charlottesville VA 22904, Charlottesville, VA, USA.
- Department of Biomedical Engineering, School of Medicine, University of Virginia, 22904, Charlottesville, VA, USA.
- Department of Public Health Sciences, School of Medicine, University of Virginia, 22908, Charlottesville, VA, USA.
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, 22908, Charlottesville, VA, USA.
| | | | - Philip E Bourne
- School of Data Science, University of Virginia, Charlottesville VA 22904, Charlottesville, VA, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, 22904, Charlottesville, VA, USA
| | - Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Timothy Clark
- School of Data Science, University of Virginia, Charlottesville VA 22904, Charlottesville, VA, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, 22908, Charlottesville, VA, USA
| | - Robert L Grossman
- Center for Translational Data Science, University of Chicago, Chicago, IL, 60615, USA
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, 75124, Uppsala, Sweden
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
|
42
|
Waterhouse RM, Adam-Blondon AF, Agosti D, Baldrian P, Balech B, Corre E, Davey RP, Lantz H, Pesole G, Quast C, Glöckner FO, Raes N, Sandionigi A, Santamaria M, Addink W, Vohradsky J, Nunes-Jorge A, Willassen NP, Lanfear J. Recommendations for connecting molecular sequence and biodiversity research infrastructures through ELIXIR. F1000Res 2022; 10. [PMID: 35999898 PMCID: PMC9360911 DOI: 10.12688/f1000research.73825.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/27/2022] [Indexed: 12/03/2022] Open
Abstract
Threats to global biodiversity are increasingly recognised by scientists and the public as a critical challenge. Molecular sequencing technologies offer means to catalogue, explore, and monitor the richness and biogeography of life on Earth. However, exploiting their full potential requires tools that connect biodiversity infrastructures and resources. As a research infrastructure developing services and technical solutions that help integrate and coordinate life science resources across Europe, ELIXIR is a key player. To identify opportunities, highlight priorities, and aid strategic thinking, here we survey approaches by which molecular technologies help inform understanding of biodiversity. We detail example use cases to highlight how DNA sequencing is: resolving taxonomic issues; increasing knowledge of marine biodiversity; helping understand how agriculture and biodiversity are critically linked; and playing an essential role in ecological studies. Together with examples of national biodiversity programmes, the use cases show where progress is being made but also highlight common challenges and opportunities for future enhancement of underlying technologies and services that connect molecular and wider biodiversity domains. Based on emerging themes, we propose key recommendations to guide future funding for biodiversity research: biodiversity and bioinformatic infrastructures need to collaborate closely and strategically; taxonomic efforts need to be aligned and harmonised across domains; metadata needs to be standardised and common data management approaches widely adopted; current approaches need to be scaled up dramatically to address the anticipated explosion of molecular data; bioinformatics support for biodiversity research needs to be enabled and sustained; training for end users of biodiversity research infrastructures needs to be prioritised; and community initiatives need to be proactive and focused on enabling solutions.
For sequencing data to deliver their full potential they must be connected to knowledge: together, molecular sequence data collection initiatives and biodiversity research infrastructures can advance global efforts to prevent further decline of Earth’s biodiversity.
Affiliation(s)
- Robert M. Waterhouse
- Department of Ecology and Evolution and Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Vaud, 1015, Switzerland
| | | | | | - Petr Baldrian
- Institute of Microbiology of the Czech Academy of Sciences, Praha, 142 20, Czech Republic
| | - Bachir Balech
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR, Bari, 70126, Italy
| | - Erwan Corre
- CNRS/Sorbonne Université, Station Biologique de Roscoff, Roscoff, 29680, France
| | | | - Henrik Lantz
- Department of Medical Biochemistry and Microbiology/NBIS, Uppsala University, Uppsala, Sweden
| | - Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR, Bari, 70126, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari “A. Moro”, Bari, 70126, Italy
| | - Christian Quast
- Life Sciences & Chemistry, Jacobs University Bremen gGmbH, Bremen, Germany
| | - Frank Oliver Glöckner
- MARUM - Center for Marine Environmental Sciences, University of Bremen, Bremerhaven, 27570, Germany
- Alfred Wegener Institute, Helmholtz Center for Polar- and Marine Research, Bremerhaven, 27570, Germany
| | - Niels Raes
- NLBIF - Netherlands Biodiversity Information Facility, Naturalis Biodiversity Center, Leiden, 2300 RA, The Netherlands
| | | | - Monica Santamaria
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR, Bari, 70126, Italy
| | - Wouter Addink
- DiSSCo - Distributed System of Scientific Collections, Naturalis Biodiversity Center, Leiden, 2300 RA, The Netherlands
| | - Jiri Vohradsky
- Laboratory of Bioinformatics, Institute of Microbiology, Prague, 142 20, Czech Republic
| | | | | | - Jerry Lanfear
- ELIXIR Hub, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
| |
|
43
|
Poulain P, Camadro JM. AutoClassWeb: a simple web interface for Bayesian clustering of omics data. BMC Res Notes 2022; 15:241. [PMID: 35799281 PMCID: PMC9264669 DOI: 10.1186/s13104-022-06129-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Accepted: 06/21/2022] [Indexed: 11/10/2022] Open
Abstract
Objective: Data clustering is a common exploration step in the omics era, notably in genomics and proteomics where many genes or proteins can be quantified from one or more experiments. Bayesian clustering is a powerful unsupervised algorithm that can classify several thousands of genes or proteins. AutoClass C, its original implementation, handles missing data and automatically determines the best number of clusters, but is not user-friendly. Results: We developed an online tool called AutoClassWeb, which provides an easy-to-use and simple web interface for Bayesian clustering with AutoClass. Input data are entered as TSV files and quality controlled. Results are provided in formats that ease further analyses with spreadsheet programs or with programming languages, such as Python or R. AutoClassWeb is implemented in Python and is published under the 3-Clause BSD license. The source code is available at https://github.com/pierrepo/autoclassweb along with detailed documentation.
Affiliation(s)
- Pierre Poulain
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, F-75013, France.
| | | |
|
44
|
The human "contaminome": bacterial, viral, and computational contamination in whole genome sequences from 1000 families. Sci Rep 2022; 12:9863. [PMID: 35701436 PMCID: PMC9198055 DOI: 10.1038/s41598-022-13269-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 01/11/2022] [Accepted: 05/18/2022] [Indexed: 01/11/2023] Open
Abstract
The unmapped readspace of whole genome sequencing data tends to be large but is often ignored. We posit that it contains valuable signals of both human infection and contamination. Using unmapped and poorly aligned reads from whole genome sequences (WGS) of over 1000 families and nearly 5000 individuals, we present insights into common viral, bacterial, and computational contamination that plagues whole genome sequencing studies. We present several notable results: (1) In addition to known contaminants such as Epstein-Barr virus and phiX, sequences from whole blood and lymphocyte cell lines contain many other contaminants, likely originating from storage, prep, and sequencing pipelines. (2) The sequencing plate and biological source of a sample strongly influence its contamination profile. (3) Y-chromosome fragments not on the human reference genome commonly mismap to bacterial reference genomes. Both experiment-derived and computational contamination are prominent in next-generation sequencing data. Such contamination can compromise results from WGS as well as metagenomics studies, and standard protocols for identifying and removing contamination should be developed to ensure the fidelity of sequencing-based studies.
|
45
|
Bayarri G, Andrio P, Hospital A, Orozco M, Gelpí JL. BioExcel Building Blocks Workflows (BioBB-Wfs), an integrated web-based platform for biomolecular simulations. Nucleic Acids Res 2022; 50:W99-W107. [PMID: 35639735 PMCID: PMC9252775 DOI: 10.1093/nar/gkac380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 04/19/2022] [Accepted: 05/02/2022] [Indexed: 11/15/2022] Open
Abstract
We present BioExcel Building Blocks Workflows, a web-based graphical user interface (GUI) offering access to a collection of transversal pre-configured biomolecular simulation workflows assembled with the BioExcel Building Blocks library. Available workflows include Molecular Dynamics setup, protein-ligand docking, trajectory analyses and small molecule parameterization. Workflows can be launched in the platform or downloaded to be run on the users’ own premises. Long executions can be launched remotely on High-Performance computers available to the user, which only requires configuring the appropriate access credentials. The web-based graphical user interface offers a high level of interactivity, with integration with the NGL viewer to visualize and check 3D structures, MDsrv to visualize trajectories, and Plotly to explore 2D plots. The server requires no login, but logging in is recommended to store the users’ projects and manage sensitive information such as remote credentials. Private projects can be made public and shared with colleagues with a simple URL. The tool will help biomolecular simulation users with the most common and repetitive processes by means of a very intuitive and interactive graphical user interface. The server is accessible at https://mmb.irbbarcelona.org/biobb-wfs.
Affiliation(s)
- Genís Bayarri
- Institute for Research in Biomedicine (IRB Barcelona). The Barcelona Institute of Science and Technology. Baldiri Reixac 10-12, 08028 Barcelona, Spain
| | - Pau Andrio
- Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
| | - Adam Hospital
- Institute for Research in Biomedicine (IRB Barcelona). The Barcelona Institute of Science and Technology. Baldiri Reixac 10-12, 08028 Barcelona, Spain
| | - Modesto Orozco
- Institute for Research in Biomedicine (IRB Barcelona). The Barcelona Institute of Science and Technology. Baldiri Reixac 10-12, 08028 Barcelona, Spain
- Department of Biochemistry and Molecular Biology, University of Barcelona, 08028 Barcelona, Spain
| | - Josep Lluís Gelpí
- Barcelona Supercomputing Center (BSC), Jordi Girona 29, 08034, Barcelona, Spain
- Department of Biochemistry and Molecular Biology, University of Barcelona, 08028 Barcelona, Spain
| |
|
46
|
Pinter N, Glätzer D, Fahrner M, Fröhlich K, Johnson J, Grüning BA, Warscheid B, Drepper F, Schilling O, Föll MC. MaxQuant and MSstats in Galaxy Enable Reproducible Cloud-Based Analysis of Quantitative Proteomics Experiments for Everyone. J Proteome Res 2022; 21:1558-1565. [PMID: 35503992 DOI: 10.1021/acs.jproteome.2c00051] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
Quantitative mass spectrometry-based proteomics has become a high-throughput technology for the identification and quantification of thousands of proteins in complex biological samples. Two frequently used tools, MaxQuant and MSstats, allow for the analysis of raw data and finding proteins with differential abundance between conditions of interest. To enable accessible and reproducible quantitative proteomics analyses in a cloud environment, we have integrated MaxQuant (including TMTpro 16/18plex), Proteomics Quality Control (PTXQC), MSstats, and MSstatsTMT into the open-source Galaxy framework. This enables the web-based analysis of label-free and isobaric labeling proteomics experiments via Galaxy's graphical user interface on public clouds. MaxQuant and MSstats in Galaxy can be applied in conjunction with thousands of existing Galaxy tools and integrated into standardized, sharable workflows. Galaxy tracks all metadata and intermediate results in analysis histories, which can be shared privately for collaborations or publicly, allowing full reproducibility and transparency of published analysis. To further increase accessibility, we provide detailed hands-on training materials. The integration of MaxQuant and MSstats into the Galaxy framework enables their usage in a reproducible way on accessible large computational infrastructures, hence realizing the foundation for high-throughput proteomics data science for everyone.
Affiliation(s)
- Niko Pinter
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
- Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
| | - Damian Glätzer
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Matthias Fahrner
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
- Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Klemens Fröhlich
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
- Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
- Spemann Graduate School of Biology and Medicine (SGBM), Albert-Ludwigs-University Freiburg, 79104 Freiburg, Germany
| | - James Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
| | | | - Bettina Warscheid
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
- Faculty of Chemistry and Pharmacy, Department of Biochemistry, Julius Maximilian University of Würzburg, 97074 Würzburg, Germany
| | - Friedel Drepper
- Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
| | - Oliver Schilling
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
- Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), 79106 Freiburg, Germany
| | - Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center, University of Freiburg, 79106 Freiburg, Germany
- Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
- Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts 02115, United States
| |
|
47
|
Kadri S, Sboner A, Sigaras A, Roy S. Containers in Bioinformatics: Applications, Practical Considerations, and Best Practices in Molecular Pathology. J Mol Diagn 2022; 24:442-454. [PMID: 35189355 DOI: 10.1016/j.jmoldx.2022.01.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/29/2021] [Revised: 11/15/2021] [Accepted: 01/21/2022] [Indexed: 12/19/2022] Open
Abstract
Systematic implementation of bioinformatics resources for next generation sequencing (NGS)-based clinical testing is an arduous undertaking. One of the key challenges involves developing an ecosystem of information technology infrastructure for enabling scalable and reproducible bioinformatics services that is resilient and secure for handling genetic and protected health information, often embedded in an existing non-bioinformatics-oriented infrastructure. Container technology provides an ideal and infrastructure-agnostic solution for molecular laboratories developing and using bioinformatics pipelines, whether on-premise or using the cloud. A container is a technology that provides a consistent computational environment and enables reproducibility, scalability, and security when developing NGS bioinformatics analysis pipelines. Containers can increase the bioinformatics team's productivity by automating and simplifying the maintenance of complex bioinformatics resources, as well as facilitate validation, version control, and documentation necessary for clinical laboratory regulatory compliance. Although there is increasing popularity in adopting containers for developing NGS bioinformatics pipelines, there is wide variability and inconsistency in the usage of containers that may result in suboptimal performance and potentially compromise the security and privacy of protected health information. In this article, the authors highlight the current state and provide best or recommended practices for building and using containers in NGS bioinformatics solutions in a clinical setting, with a focus on scalability, optimization, maintainability, and data security.
Affiliation(s)
- Sabah Kadri
- Department of Bioinformatics, Ann & Robert H Lurie Children's Hospital, Chicago, Illinois
- Andrea Sboner
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, New York; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York
- Alexandros Sigaras
- Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, New York; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York
- Somak Roy
- Department of Molecular Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
|
48
|
Galaxy workflows for fragment-based virtual screening: a case study on the SARS-CoV-2 main protease. J Cheminform 2022; 14:22. [PMID: 35414112 PMCID: PMC9003163 DOI: 10.1186/s13321-022-00588-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/14/2021] [Accepted: 02/09/2022] [Indexed: 12/03/2022]
Abstract
We present several workflows for protein-ligand docking and free energy calculation for use in the workflow management system Galaxy. The workflows are composed of several widely used open-source tools, including rDock and GROMACS, and can be executed on public infrastructure using either Galaxy’s graphical interface or the command line. We demonstrate the utility of the workflows by running a high-throughput virtual screening of around 50000 compounds against the SARS-CoV-2 main protease, a system which has been the subject of intense study in the last year.
|
49
|
Serrano-Solano B, Fouilloux A, Eguinoa I, Kalaš M, Grüning B, Coppens F. Galaxy: A Decade of Realising CWFR Concepts. Data Intelligence 2022. [DOI: 10.1162/dint_a_00136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
Despite recent encouragement to follow the FAIR principles, the day-to-day research practices have not changed substantially. Due to new developments and the increasing pressure to apply best practices, initiatives to improve the efficiency and reproducibility of scientific workflows are becoming more prevalent. In this article, we discuss the importance of well-annotated tools and the specific requirements to ensure reproducible research with FAIR outputs. We detail how Galaxy, an open-source workflow management system with a web-based interface, has implemented the concepts that are put forward by the Canonical Workflow Framework for Research (CWFR), whilst minimising changes to the practices of scientific communities. Although we showcase concrete applications from two different domains, this approach is generalisable to any domain and particularly useful in interdisciplinary research and science-based applications.
Affiliation(s)
- Anne Fouilloux
- Department of Geosciences, University of Oslo, Oslo 0316, Norway
- Ignacio Eguinoa
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB, Gent, Oost-Vlaanderen 9052, Belgium
- Matúš Kalaš
- Department of Informatics, University of Bergen, Bergen, Hordaland 5008, Norway
- Björn Grüning
- Bioinformatics Group, University of Freiburg, Baden-Württemberg 79098, Germany
- Frederik Coppens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB, Gent, Oost-Vlaanderen 9052, Belgium
|
50
|
Czmil A, Wronski M, Czmil S, Sochacka-Pietal M, Cmil M, Gawor J, Wołkowicz T, Plewczynski D, Strzalka D, Pietal M. NanoForms: an integrated server for processing, analysis and assembly of raw sequencing data of microbial genomes, from Oxford Nanopore technology. PeerJ 2022; 10:e13056. [PMID: 35368340 PMCID: PMC8973472 DOI: 10.7717/peerj.13056] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/23/2021] [Accepted: 02/13/2022] [Indexed: 01/11/2023]
Abstract
Background: Next Generation Sequencing (NGS) techniques dominate today's landscape of genetics and genomics research. Though Illumina still dominates worldwide sequencing, Oxford Nanopore is one of the leading technologies currently being used by biologists, medics and geneticists across various applications. Oxford Nanopore is automated and relatively simple for conducting experiments, but generates gigabytes of raw data, to be processed by an often ambiguous set of alternative bioinformatics command-line tools and genomics frameworks that require a knowledge of bioinformatics to run.
Results: We established an inter-collegiate collaboration across experimentalists and bioinformaticians in order to provide a novel bioinformatics tool, free for academics. This tool allows people without extensive bioinformatics knowledge to simply process their raw genome sequencing data. Currently, due to ICT resources' maintenance reasons, our server is only capable of handling small genomes (up to 15 Mb). In this paper, we introduce our tool, NanoForms: an intuitive and integrated web server for the processing and analysis of raw prokaryotic genome data, coming from Oxford Nanopore. NanoForms is freely available for academics at the following locations: http://nanoforms.tech (webserver) and https://github.com/czmilanna/nanoforms (GitHub source repository).
Affiliation(s)
- Anna Czmil
- Department of Complex Systems, Rzeszow University of Technology, Rzeszow, Subcarpathian, Poland
- Michal Wronski
- Department of Complex Systems, Rzeszow University of Technology, Rzeszow, Subcarpathian, Poland
- Sylwester Czmil
- Department of Complex Systems, Rzeszow University of Technology, Rzeszow, Subcarpathian, Poland
- Marta Sochacka-Pietal
- Department of Biotechnology and Bioinformatics, Rzeszow University of Technology, Rzeszow, Subcarpathian, Poland
- Michal Cmil
- Department of Complex Systems, Rzeszow University of Technology, Rzeszow, Subcarpathian, Poland
- Jan Gawor
- DNA Sequencing and Oligonucleotide Synthesis Laboratory, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Masovian, Poland
- Tomasz Wołkowicz
- Department of Bacteriology and Biocontamination Control, National Institute of Public Health-National Institute of Hygiene, Warsaw, Masovian, Poland
- Dariusz Plewczynski
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Masovian, Poland; Laboratory of Bioinformatics and Computational Genomics, Warsaw University of Technology, Warsaw, Masovian, Poland
- Dominik Strzalka
- Department of Complex Systems, Rzeszow University of Technology, Rzeszow, Subcarpathian, Poland
- Michal Pietal
- Department of Complex Systems, Rzeszow University of Technology, Rzeszow, Subcarpathian, Poland
|