1
Zhao Q, Zhou X, Wu J, Cai J, Bao X, Tang L, Wang C, Liu C, Wang Y, Teng Y, Zheng M, Mu W, Zuo Z, Xie Y, Luo X, Ren J. BioTreasury: a community-based repository enabling indexing and rating of bioinformatics tools. Sci China Life Sci 2024; 67:221-229. [PMID: 38157107 DOI: 10.1007/s11427-023-2509-x] [Received: 11/11/2023] [Accepted: 12/12/2023] [Indexed: 01/03/2024]
Abstract
The exponential growth of bioinformatics tools in recent years has made it difficult for scientists to select the most suitable tool for their data analysis tasks. To help scientists make informed choices, a community-based platform that indexes and rates bioinformatics tools is urgently needed. In this study, we introduce BioTreasury (http://biotreasury.rjmart.cn), an integrated community-based repository that provides an interactive platform for users and developers to share their experiences with various bioinformatics tools. BioTreasury offers a comprehensive collection of well-indexed bioinformatics software, tools, and databases, totaling over 10,000 entries. In the past two years, we have continuously improved and maintained BioTreasury, adding several features, including structured homepages for every tool and user, a hierarchical categorization of bioinformatics tools, and tool classification using a large language model (LLM). BioTreasury streamlines the tool submission process with intelligent auto-completion. Additionally, BioTreasury provides a wide range of social features, for example, enabling users to participate in interactive discussions, rate tools, and build tool collections and share them with the public. We believe BioTreasury can be a valuable resource and knowledge-sharing platform for the biomedical community. It empowers researchers to effectively discover and evaluate bioinformatics tools, fostering collaboration and advancing bioinformatics research.
Affiliation(s)
- Qi Zhao
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Xin Zhou
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Jingxing Wu
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Jieyi Cai
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Xiaoqiong Bao
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Lin Tang
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Chaoye Wang
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Chunlei Liu
  - Institute of Precision Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510080, China
- Yukai Wang
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Yuyan Teng
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Mohan Zheng
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Weiping Mu
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Zhixiang Zuo
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
- Yubin Xie
  - Institute of Precision Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510080, China
- Xiaotong Luo
  - Guangdong Institute of Gastroenterology, Department of General Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510275, China
- Jian Ren
  - School of Life Sciences, State Key Laboratory of Oncology in South China, Cancer Center, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University, Guangzhou, 510060, China
2
Špačková A, Vávra O, Raček T, Bazgier V, Sehnal D, Damborský J, Svobodová R, Bednář D, Berka K. ChannelsDB 2.0: a comprehensive database of protein tunnels and pores in AlphaFold era. Nucleic Acids Res 2024; 52:D413-D418. [PMID: 37956324 PMCID: PMC10767935 DOI: 10.1093/nar/gkad1012] [Received: 09/14/2023] [Revised: 10/12/2023] [Accepted: 10/23/2023] [Indexed: 11/15/2023] [Open Access]
Abstract
ChannelsDB 2.0 is an updated database providing structural information about the position, geometry and physicochemical properties of protein channels (tunnels and pores) within deposited biomacromolecular structures from the PDB and AlphaFoldDB databases. The newly deposited information originates from several sources. Firstly, we included data calculated using the popular CAVER tool to complement the data obtained using the original MOLE tool for detection and analysis of protein tunnels and pores. Secondly, we added tunnels starting from cofactors within the AlphaFill database to enlarge the scope of the database to protein models based on UniProt. This has enlarged the available channel annotations ∼4.6 times as of 1 September 2023. The database stores information about geometrical features, e.g. length and radius, and physicochemical properties based on channel-lining amino acids. The stored data are interlinked with the available UniProt mutation annotation data. ChannelsDB 2.0 provides an excellent resource for deep analysis of the role of biomacromolecular tunnels and pores. The database is available free of charge: https://channelsdb2.biodata.ceitec.cz.
Affiliation(s)
- Anna Špačková
  - Department of Physical Chemistry, Faculty of Science, Palacký University, tř. 17. listopadu 12, 771 46 Olomouc, Czech Republic
- Ondřej Vávra
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekařská 53, 656 91 Brno, Czech Republic
- Tomáš Raček
  - CEITEC – Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic
- Václav Bazgier
  - Department of Physical Chemistry, Faculty of Science, Palacký University, tř. 17. listopadu 12, 771 46 Olomouc, Czech Republic
- David Sehnal
  - CEITEC – Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic
- Jiří Damborský
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekařská 53, 656 91 Brno, Czech Republic
- Radka Svobodová
  - CEITEC – Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic
  - National Centre for Biomolecular Research, Faculty of Science, Masaryk University Brno, Kamenice 5, 625 00 Brno, Czech Republic
- David Bednář
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
  - International Clinical Research Center, St. Anne's University Hospital Brno, Pekařská 53, 656 91 Brno, Czech Republic
- Karel Berka
  - Department of Physical Chemistry, Faculty of Science, Palacký University, tř. 17. listopadu 12, 771 46 Olomouc, Czech Republic
3
Bouyssié D, Altıner P, Capella-Gutierrez S, Fernández JM, Hagemeijer YP, Horvatovich P, Hubálek M, Levander F, Mauri P, Palmblad M, Raffelsberger W, Rodríguez-Navas L, Di Silvestre D, Kunkli BT, Uszkoreit J, Vandenbrouck Y, Vizcaíno JA, Winkelhardt D, Schwämmle V. WOMBAT-P: Benchmarking Label-Free Proteomics Data Analysis Workflows. J Proteome Res 2024; 23:418-429. [PMID: 38038272 DOI: 10.1021/acs.jproteome.3c00636] [Indexed: 12/02/2023]
Abstract
The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.
Affiliation(s)
- David Bouyssié
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III-Paul Sabatier (UT3), 31062 Toulouse, France
  - Proteomics French Infrastructure, ProFI, FR 2048 Toulouse, France
- Pınar Altıner
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III-Paul Sabatier (UT3), 31062 Toulouse, France
- José M Fernández
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Yanick Paco Hagemeijer
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, 9712 CP Groningen, The Netherlands
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, 9713 GZ Groningen, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, 9712 CP Groningen, The Netherlands
- Martin Hubálek
  - Institute of Organic Chemistry and Biochemistry, CAS, 160 00 Prague, Czech Republic
- Fredrik Levander
  - National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Department of Immunotechnology, Lund University, 22100 Lund, Sweden
- Pierluigi Mauri
  - Institute for Biomedical Technologies (ITB), Department of Biomedical Sciences, National Research Council (CNR), Segrate, 20054 Milan, Italy
- Magnus Palmblad
  - Leiden University Medical Center, Postbus 9600, 2300 RC Leiden, The Netherlands
- Wolfgang Raffelsberger
  - Institut de Génétique et de Biologie Moléculaire et Cellulaire, Université de Strasbourg, CNRS UMR7104, INSERM U1258, Illkirch, 1 Rue Laurent Fries, 67404 Illkirch, France
- Laura Rodríguez-Navas
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Dario Di Silvestre
  - Institute for Biomedical Technologies (ITB), Department of Biomedical Sciences, National Research Council (CNR), Segrate, 20054 Milan, Italy
- Balázs Tibor Kunkli
  - Department of Biochemistry and Molecular Biology, University of Debrecen, 4032 Debrecen, Hungary
- Julian Uszkoreit
  - Medical Faculty, Medical Bioinformatics, Ruhr University Bochum, 44801 Bochum, Germany
  - Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
  - Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Yves Vandenbrouck
  - Proteomics French Infrastructure, ProFI, FR 2048 Toulouse, France
  - CEA, Fundamental Research Division, Proteomics French Infrastructure, 91191 Gif-sur-Yvette, France
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI), Wellcome Trust, Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Dirk Winkelhardt
  - Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
4
Martens M, Stierum R, Schymanski EL, Evelo CT, Aalizadeh R, Aladjov H, Arturi K, Audouze K, Babica P, Berka K, Bessems J, Blaha L, Bolton EE, Cases M, Damalas DE, Dave K, Dilger M, Exner T, Geerke DP, Grafström R, Gray A, Hancock JM, Hollert H, Jeliazkova N, Jennen D, Jourdan F, Kahlem P, Klanova J, Kleinjans J, Kondic T, Kone B, Lynch I, Maran U, Martinez Cuesta S, Ménager H, Neumann S, Nymark P, Oberacher H, Ramirez N, Remy S, Rocca-Serra P, Salek RM, Sallach B, Sansone SA, Sanz F, Sarimveis H, Sarntivijai S, Schulze T, Slobodnik J, Spjuth O, Tedds J, Thomaidis N, Weber RJ, van Westen GJ, Wheelock CE, Williams AJ, Witters H, Zdrazil B, Županič A, Willighagen EL. ELIXIR and Toxicology: a community in development. F1000Res 2023; 10:ELIXIR-1129. [PMID: 37842337 PMCID: PMC10568213 DOI: 10.12688/f1000research.74502.2] [Accepted: 09/28/2023] [Indexed: 10/17/2023] [Open Access]
Abstract
Toxicology has been an active research field for many decades, with academic, industrial and government involvement. Modern omics and computational approaches are changing the field from merely disease-specific observational models into target-specific predictive models. Traditionally, toxicology has strong links with other fields such as biology, chemistry, pharmacology and medicine. With the rise of synthetic and new engineered materials, alongside ongoing prioritisation needs in chemical risk assessment for existing chemicals, early predictive evaluations are becoming of utmost importance for both scientific and regulatory purposes. ELIXIR is an intergovernmental organisation that brings together life science resources from across Europe. To coordinate the linkage of various life science efforts around modern predictive toxicology, the establishment of a new ELIXIR Community is seen as instrumental. In the past few years, joint efforts, building on incidental overlap, have been piloted in the context of ELIXIR. For example, the EU-ToxRisk, diXa, HeCaToS, transQST, and the nanotoxicology community have worked with the ELIXIR TeSS, Bioschemas, and Compute Platforms and activities. In 2018, a core group of interested parties wrote a proposal, outlining a sketch of what this new ELIXIR Toxicology Community would look like. A recent workshop (held September 30th to October 1st, 2020) extended this into an ELIXIR Toxicology roadmap and a shortlist of limited-investment, high-gain collaborations to give body to this new community. This Whitepaper outlines the results of these efforts and defines our vision of the ELIXIR Toxicology Community and how it complements other ELIXIR activities.
Affiliation(s)
- Marvin Martens
  - Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, 6229 ER, The Netherlands
- Rob Stierum
  - Risk Analysis for Products In Development (RAPID), Netherlands Organisation for applied scientific research TNO, Utrecht, 3584 CB, The Netherlands
- Emma L. Schymanski
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, 4367, Luxembourg
- Chris T. Evelo
  - Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6229 EN, The Netherlands
- Reza Aalizadeh
  - Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, 15771, Greece
- Hristo Aladjov
  - Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences, Sofia, 1113, Bulgaria
- Kasia Arturi
  - Department Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, 8600, Switzerland
- Pavel Babica
  - RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- Karel Berka
  - Department of Physical Chemistry, Palacky University Olomouc, Olomouc, 77146, Czech Republic
- Ludek Blaha
  - RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- Evan E. Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Dimitrios E. Damalas
  - Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, 15771, Greece
- Kirtan Dave
  - School of Science, GSFC University, Gujarat, 391750, India
- Marco Dilger
  - Forschungs- und Beratungsinstitut Gefahrstoffe (FoBiG) GmbH, Freiburg im Breisgau, 79106, Germany
- Daan P. Geerke
  - AIMMS Division of Molecular Toxicology, Vrije Universiteit, Amsterdam, 1081 HZ, The Netherlands
- Roland Grafström
  - Department of Toxicology, Misvik Biology, Turku, 20520, Finland
  - Institute of Environmental Medicine, Karolinska Institute, Stockholm, 17177, Sweden
- Alasdair Gray
  - Department of Computer Science, Heriot-Watt University, Edinburgh, UK
- Henner Hollert
  - Department Evolutionary Ecology & Environmental Toxicology (E3T), Goethe-University, Frankfurt, D-60438, Germany
- Danyel Jennen
  - Department of Toxicogenomics, Maastricht University, Maastricht, 6200 MD, The Netherlands
- Fabien Jourdan
  - MetaboHUB, French metabolomics infrastructure in Metabolomics and Fluxomics, Toulouse, France
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, Toulouse, France
- Pascal Kahlem
  - Scientific Network Management SL, Barcelona, 08015, Spain
- Jana Klanova
  - RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- Jos Kleinjans
  - Department of Toxicogenomics, Maastricht University, Maastricht, 6200 MD, The Netherlands
- Todor Kondic
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, 4367, Luxembourg
- Boï Kone
  - Faculty of Pharmacy, Malaria Research and Training Center, Bamako, BP:1805, Mali
- Iseult Lynch
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Uko Maran
  - Institute of Chemistry, University of Tartu, Tartu, 50411, Estonia
- Hervé Ménager
  - Institut Français de Bioinformatique, Evry, F-91000, France
  - Bioinformatics and Biostatistics Hub, Institut Pasteur, Paris, F-75015, France
- Steffen Neumann
  - Research group Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, Halle, 06120, Germany
- Penny Nymark
  - Institute of Environmental Medicine, Karolinska Institute, Stockholm, 17177, Sweden
- Herbert Oberacher
  - Institute of Legal Medicine and Core Facility Metabolomics, Medical University of Innsbruck, Innsbruck, A-6020, Austria
- Noelia Ramirez
  - Institut d'Investigacio Sanitaria Pere Virgili-Universitat Rovira i Virgili, Tarragona, 43007, Spain
- Philippe Rocca-Serra
  - Data Readiness Group, Department of Engineering Science, University of Oxford, Oxford, UK
- Reza M. Salek
  - International Agency for Research on Cancer, World Health Organisation, Lyon, 69372, France
- Brett Sallach
  - Department of Environment and Geography, University of York, York, YO10 5NG, UK
- Ferran Sanz
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University, Barcelona, 08003, Spain
- Tobias Schulze
  - Helmholtz Centre for Environmental Research - UFZ, Leipzig, 04318, Germany
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, SE-75124, Sweden
- Jonathan Tedds
  - ELIXIR Hub, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
- Nikolaos Thomaidis
  - Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, 15771, Greece
- Ralf J.M. Weber
  - School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK
- Gerard J.P. van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research, Leiden, 2333 CC, The Netherlands
- Craig E. Wheelock
  - Department of Respiratory Medicine and Allergy, Karolinska University Hospital, Stockholm SE-141-86, Sweden
  - Department of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm, 17177, Sweden
- Antony J. Williams
  - Center for Computational Toxicology and Exposure, United States Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Barbara Zdrazil
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Anže Županič
  - Department Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, 1000, Slovenia
- Egon L. Willighagen
  - Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, 6229 ER, The Netherlands
5
de Visser C, Johansson LF, Kulkarni P, Mei H, Neerincx P, van der Velde KJ, Horvatovich P, van Gool AJ, Swertz MA, 't Hoen PAC, Niehues A. Ten quick tips for building FAIR workflows. PLoS Comput Biol 2023; 19:e1011369. [PMID: 37768885 PMCID: PMC10538699 DOI: 10.1371/journal.pcbi.1011369] [Indexed: 09/30/2023] [Open Access]
Abstract
Research data is accumulating rapidly and with it the challenge of fully reproducible science. As a consequence, implementation of high-quality management of scientific data has become a global priority. The FAIR (Findable, Accessible, Interoperable and Reusable) principles provide practical guidelines for maximizing the value of research data; however, processing data using workflows (systematic executions of a series of computational tools) is equally important for good data management. The FAIR principles have recently been adapted to Research Software (FAIR4RS Principles) to promote the reproducibility and reusability of any type of research software. Here, we propose a set of 10 quick tips, drafted by experienced workflow developers, that will help researchers to apply FAIR4RS principles to workflows. The tips have been arranged according to the FAIR acronym, clarifying the purpose of each tip with respect to the FAIR4RS principles. Altogether, these tips can be seen as practical guidelines for workflow developers who aim to contribute to more reproducible and sustainable computational science, aiming to positively impact the open science and FAIR community.
Affiliation(s)
- Casper de Visser
  - Medical BioSciences Department, Radboud University Medical Center, Nijmegen, the Netherlands
- Lennart F. Johansson
  - Genomics Coordination Center and Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Purva Kulkarni
  - Medical BioSciences Department, Radboud University Medical Center, Nijmegen, the Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
- Hailiang Mei
  - Sequencing Analysis Support Core, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
- Pieter Neerincx
  - Genomics Coordination Center and Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- K. Joeri van der Velde
  - Genomics Coordination Center and Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Péter Horvatovich
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, Groningen, the Netherlands
- Alain J. van Gool
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
- Morris A. Swertz
  - Genomics Coordination Center and Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Peter A. C. 't Hoen
  - Medical BioSciences Department, Radboud University Medical Center, Nijmegen, the Netherlands
- Anna Niehues
  - Medical BioSciences Department, Radboud University Medical Center, Nijmegen, the Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
6
Caro H, Dollin S, Biton A, Brancotte B, Desvillechabrol D, Dufresne Y, Li B, Kornobis E, Lemoine F, Maillet N, Perrin A, Traut N, Néron B, Cokelaer T. BioConvert: a comprehensive format converter for life sciences. NAR Genom Bioinform 2023; 5:lqad074. [PMID: 37608802 PMCID: PMC10440784 DOI: 10.1093/nargab/lqad074] [Received: 03/15/2023] [Revised: 07/18/2023] [Accepted: 08/15/2023] [Indexed: 08/24/2023] [Open Access]
Abstract
Bioinformatics is a field known for the numerous standards and formats that have been developed over the years. This plethora of formats, sometimes complementary and often redundant, poses many challenges to bioinformatics data analysts. They constantly need to find the best tool to convert their data into a suitable format, which is often a complex, technical and time-consuming task. Moreover, these small yet important tasks are often difficult to make reproducible. To overcome these difficulties, we initiated BioConvert, a collaborative project to facilitate the conversion of life science data from one format to another. BioConvert aggregates existing software within a single framework and complements it with original code when needed. It provides a common interface that streamlines the user experience, so that users do not have to learn tens of separate tools. Currently, BioConvert supports about 50 formats and 100 direct conversions in areas such as alignment, sequencing, phylogeny, and variant calling. In addition to being useful for end-users, BioConvert can also be utilized by developers as a universal benchmarking framework for evaluating and comparing numerous conversion tools. Additionally, we provide a web server implementing a user-friendly online interface to BioConvert, allowing direct use by the community.
Affiliation(s)
- Hugo Caro
  - Institut Pasteur, Université Paris Cité, Plate-forme Technologique Biomics, F-75015 Paris, France
- Sulyvan Dollin
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Anne Biton
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Bryan Brancotte
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Dimitri Desvillechabrol
  - Institut Pasteur, Université Paris Cité, Plate-forme Technologique Biomics, F-75015 Paris, France
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Yoann Dufresne
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
  - Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, Paris, France
- Blaise Li
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Etienne Kornobis
  - Institut Pasteur, Université Paris Cité, Plate-forme Technologique Biomics, F-75015 Paris, France
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Frédéric Lemoine
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
  - Institut Pasteur, Université Paris Cité, G5 Evolutionary Genomics of RNA Viruses, Paris, France
- Nicolas Maillet
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Amandine Perrin
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
  - Institut Pasteur, Université Paris Cité, CNRS UMR3525, Microbial Evolutionary Genomics, Paris, France
- Nicolas Traut
  - Institut Pasteur, Université Paris Cité, Unité de Neuroanatomie Appliquée et Théorique, F-75015 Paris, France
- Bertrand Néron
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Thomas Cokelaer
  - Institut Pasteur, Université Paris Cité, Plate-forme Technologique Biomics, F-75015 Paris, France
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
7
Patel B, Soundarajan S, Ménager H, Hu Z. Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool. Sci Data 2023; 10:557. [PMID: 37612312 PMCID: PMC10447492 DOI: 10.1038/s41597-023-02463-x] [Received: 06/09/2022] [Accepted: 08/10/2023] [Indexed: 08/25/2023] [Open Access]
Abstract
Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles tailored for research software have been proposed by the FAIR for Research Software (FAIR4RS) Working Group. They provide a foundation for optimizing the reuse of research software. The FAIR4RS principles are, however, aspirational and do not provide practical instructions to the researchers. To fill this gap, we propose in this work the first actionable step-by-step guidelines for biomedical researchers to make their research software compliant with the FAIR4RS principles. We designate them as the FAIR Biomedical Research Software (FAIR-BioRS) guidelines. Our process for developing these guidelines, presented here, is based on an in-depth study of the FAIR4RS principles and a thorough review of current practices in the field. To support researchers, we have also developed a workflow that streamlines the process of implementing these guidelines. This workflow is incorporated in FAIRshare, a free and open-source software application aimed at simplifying the curation and sharing of FAIR biomedical data and software through user-friendly interfaces and automation. Details about this tool are also presented.
Affiliation(s)
- Bhavesh Patel
- FAIR Data Innovations Hub, California Medical Innovations Institute, San Diego, CA, 92121, USA.
| | - Sanjay Soundarajan
- FAIR Data Innovations Hub, California Medical Innovations Institute, San Diego, CA, 92121, USA
| | - Hervé Ménager
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 75015, Paris, France
| | - Zicheng Hu
- Computational Health Science, University of California San Francisco, San Francisco, CA, 94158, USA
| |
|
8
|
Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023] Open
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Roma, Italy
| | - Allegra Via
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
| | - Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Giulia Babbi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | | | - Claudio Carta
- National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
| | - Rita Casadio
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Andrea Cicconardi
- Department of Physics, University of Genova, Genova, Italy
- Istituto Italiano di Tecnologia (IIT), Genova, Italy
| | - Angelo Facchiano
- National Research Council, Institute of Food Science, Avellino, Italy
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
| | - Deborah Giordano
- National Research Council, Institute of Food Science, Avellino, Italy
| | - Federica Isidori
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Anna Marabotti
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
| | - Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Stefano Pascarella
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
| | - Michele Pinelli
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
| | - Tommaso Pippucci
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
| | - Roberta Russo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
| | - Castrense Savojardo
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Bernardina Scafuri
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
| | | | - Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| |
|
9
|
Djaffardjy M, Marchment G, Sebe C, Blanchet R, Bellajhame K, Gaignard A, Lemoine F, Cohen-Boulakia S. Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems. Comput Struct Biotechnol J 2023; 21:2075-2085. [PMID: 36968012 PMCID: PMC10030817 DOI: 10.1016/j.csbj.2023.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 03/03/2023] [Accepted: 03/03/2023] [Indexed: 03/09/2023] Open
Abstract
Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analysis and experiments. While scripting languages, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are by no means enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and provides a mapping of existing solutions that fulfill them. We then highlight the benefits of using scientific workflow systems to get modular, reproducible and reusable bioinformatics data analysis pipelines. We finally discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows.
|
10
|
Du X, Dastmalchi F, Ye H, Garrett TJ, Diller MA, Liu M, Hogan WR, Brochhausen M, Lemas DJ. Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software. Metabolomics 2023; 19:11. [PMID: 36745241 DOI: 10.1007/s11306-023-01974-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/08/2022] [Accepted: 01/20/2023] [Indexed: 02/07/2023]
Abstract
BACKGROUND Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles - Findability, Accessibility, Interoperability, and Reusability - were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS). AIM OF REVIEW This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in the LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software. KEY SCIENTIFIC CONCEPTS OF REVIEW We evaluated 124 LC-HRMS metabolomics data processing software tools obtained from a systematic review and selected 61 of them for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature along with internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment of software were 21.6%, 47.7%, and 71.8%. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that cover multiple FAIR4RS categories but had low fulfillment percentages: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software had official software containerization or a virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. Based on these results, we discussed improvement strategies and future directions.
Affiliation(s)
- Xinsong Du
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Farhad Dastmalchi
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Hao Ye
- Health Science Center Libraries, University of Florida, Florida, USA
| | - Timothy J Garrett
- Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Florida, USA
| | - Matthew A Diller
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Mei Liu
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - William R Hogan
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
| | - Mathias Brochhausen
- Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, USA
| | - Dominick J Lemas
- Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA.
- Department of Obstetrics and Gynecology, University of Florida College of Medicine, Florida, Gainesville, United States.
- Center for Perinatal Outcomes Research, University of Florida College of Medicine, Gainesville, United States.
| |
|
11
|
Garijo D, Ménager H, Hwang L, Trisovic A, Hucka M, Morrell T, Allen A. Nine best practices for research software registries and repositories. PeerJ Comput Sci 2022; 8:e1023. [PMID: 36092012 PMCID: PMC9455149 DOI: 10.7717/peerj-cs.1023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Accepted: 06/09/2022] [Indexed: 06/15/2023]
Abstract
Scientific software registries and repositories improve software findability and research transparency, provide information for software citations, and foster preservation of computational methods in a wide range of disciplines. Registries and repositories play a critical role by supporting research reproducibility and replicability, but developing them takes effort and few guidelines are available to help prospective creators of these resources. To address this need, the FORCE11 Software Citation Implementation Working Group convened a Task Force to distill the experiences of the managers of existing resources in setting expectations for all stakeholders. In this article, we describe the resultant best practices which include defining the scope, policies, and rules that govern individual registries and repositories, along with the background, examples, and collaborative work that went into their development. We believe that establishing specific policies such as those presented here will help other scientific software registries and repositories better serve their users and their disciplines.
Affiliation(s)
| | - Hervé Ménager
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, France
| | - Lorraine Hwang
- University of California, Davis, Davis, California, United States
| | - Ana Trisovic
- Harvard University, Boston, Massachusetts, United States
| | - Michael Hucka
- California Institute of Technology, Pasadena, California, United States
| | - Thomas Morrell
- California Institute of Technology, Pasadena, California, United States
| | - Alice Allen
- University of Maryland, College Park, MD, United States
| | | | | |
|
12
|
Serrano-Solano B, Fouilloux A, Eguinoa I, Kalaš M, Grüning B, Coppens F. Galaxy: A Decade of Realising CWFR Concepts. DATA INTELLIGENCE 2022. [DOI: 10.1162/dint_a_00136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
Abstract
Despite recent encouragement to follow the FAIR principles, the day-to-day research practices have not changed substantially. Due to new developments and the increasing pressure to apply best practices, initiatives to improve the efficiency and reproducibility of scientific workflows are becoming more prevalent. In this article, we discuss the importance of well-annotated tools and the specific requirements to ensure reproducible research with FAIR outputs. We detail how Galaxy, an open-source workflow management system with a web-based interface, has implemented the concepts that are put forward by the Canonical Workflow Framework for Research (CWFR), whilst minimising changes to the practices of scientific communities. Although we showcase concrete applications from two different domains, this approach is generalisable to any domain and particularly useful in interdisciplinary research and science-based applications.
Affiliation(s)
| | - Anne Fouilloux
- Department of Geosciences, University of Oslo, Oslo 0316, Norway
| | - Ignacio Eguinoa
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB, Gent, Oost-Vlaanderen 9052, Belgium
| | - Matúš Kalaš
- Department of Informatics, University of Bergen, Bergen, Hordaland 5008, Norway
| | - Björn Grüning
- Bioinformatics Group, University of Freiburg, Baden-Württemberg 79098, Germany
| | - Frederik Coppens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
- VIB, Gent, Oost-Vlaanderen 9052, Belgium
| |
|
13
|
Noor A. Improving bioinformatics software quality through incorporation of software engineering practices. PeerJ Comput Sci 2022; 8:e839. [PMID: 35111923 PMCID: PMC8771759 DOI: 10.7717/peerj-cs.839] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/01/2021] [Accepted: 12/13/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND Bioinformatics software is developed for collecting, analyzing, integrating, and interpreting life science datasets that are often enormous. Bioinformatics engineers often lack the software engineering skills necessary for developing robust, maintainable, reusable software. This study presents a review and discussion of the findings and efforts made to improve the quality of bioinformatics software. METHODOLOGY A systematic review was conducted of related literature that identifies core software engineering concepts for improving bioinformatics software development: requirements gathering, documentation, testing, and integration. The findings are presented with the aim of illuminating trends within the research that could lead to viable solutions to the struggles faced by bioinformatics engineers when developing scientific software. RESULTS The findings suggest that bioinformatics engineers could significantly benefit from the incorporation of software engineering principles into their development efforts. This suggests both cultural changes within bioinformatics research communities and the adoption of software engineering disciplines into the formal education of bioinformatics engineers. Open management of scientific bioinformatics development projects can result in improved software quality through collaboration amongst both bioinformatics engineers and software engineers. CONCLUSIONS While strides have been made in both the identification and the solution of issues of particular import to bioinformatics software development, there is still room for improvement in terms of shifts in both the formal education of bioinformatics engineers and the culture and approaches of managing scientific bioinformatics research and development efforts.
|
14
|
Turning Data to Knowledge: Online Tools, Databases, and Resources in microRNA Research. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1385:133-160. [DOI: 10.1007/978-3-031-08356-3_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
|
15
|
Tangaro MA, Mandreoli P, Chiara M, Donvito G, Antonacci M, Parisi A, Bianco A, Romano A, Bianchi DM, Cangelosi D, Uva P, Molineris I, Nosi V, Calogero RA, Alessandri L, Pedrini E, Mordenti M, Bonetti E, Sangiorgi L, Pesole G, Zambelli F. Laniakea@ReCaS: exploring the potential of customisable Galaxy on-demand instances as a cloud-based service. BMC Bioinformatics 2021; 22:544. [PMID: 34749633 PMCID: PMC8574934 DOI: 10.1186/s12859-021-04401-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2021] [Accepted: 09/24/2021] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Improving the availability and usability of data and analytical tools is a critical precondition for further advancing modern biological and biomedical research. For instance, one of the many ramifications of the COVID-19 global pandemic has been to make even more evident the importance of having bioinformatics tools and data readily actionable by researchers through convenient access points and supported by adequate IT infrastructures. One of the most successful efforts in improving the availability and usability of bioinformatics tools and data is represented by the Galaxy workflow manager and its thriving community. In 2020 we introduced Laniakea, a software platform conceived to streamline the configuration and deployment of "on-demand" Galaxy instances over the cloud. By facilitating the set-up and configuration of Galaxy web servers, Laniakea provides researchers with a powerful and highly customisable platform for executing complex bioinformatics analyses. The system can be accessed through a dedicated and user-friendly web interface that allows the Galaxy web server's initial configuration and deployment. RESULTS "Laniakea@ReCaS", the first instance of a Laniakea-based service, is managed by ELIXIR-IT and was officially launched in February 2020, after about one year of development and testing that involved several users. Researchers can request access to Laniakea@ReCaS through an open-ended call for use-cases. Ten project proposals have been accepted since then, totalling 18 Galaxy on-demand virtual servers that employ ~ 100 CPUs, ~ 250 GB of RAM and ~ 5 TB of storage and serve several different communities and purposes. Herein, we present eight use cases demonstrating the versatility of the platform. 
CONCLUSIONS During this first year of activity, the Laniakea-based service emerged as a flexible platform that facilitated the rapid development of bioinformatics tools, the efficient delivery of training activities, and the provision of public bioinformatics services in different settings, including food safety and clinical research. Laniakea@ReCaS provides a proof of concept of how enabling access to appropriate, reliable IT resources and ready-to-use bioinformatics tools can considerably streamline researchers' work.
Affiliation(s)
- Marco Antonio Tangaro
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
- National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126, Bari, Italy
| | - Pietro Mandreoli
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
- Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milano, Italy
| | - Matteo Chiara
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy
- Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milano, Italy
| | - Giacinto Donvito
- National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126, Bari, Italy
| | - Marica Antonacci
- National Institute for Nuclear Physics (INFN), Section of Bari, Via Orabona 4, 70126, Bari, Italy
| | - Antonio Parisi
- Istituto Zooprofilattico Sperimentale Della Puglia e Della Basilicata, Via Manfredonia 20, 71121, Foggia, Italy
| | - Angelica Bianco
- Istituto Zooprofilattico Sperimentale Della Puglia e Della Basilicata, Via Manfredonia 20, 71121, Foggia, Italy
| | - Angelo Romano
- National Reference Laboratory for Coagulase-Positive Staphylococci Including Staphylococcus Aureus, Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
| | - Daniela Manila Bianchi
- National Reference Laboratory for Coagulase-Positive Staphylococci Including Staphylococcus Aureus, Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
| | - Davide Cangelosi
- Clinical Bioinformatics Unit, Scientific Direction, IRCCS Istituto Giannina Gaslini, Via Gerolamo Gaslini 5, 16147, Genova, Italy
| | - Paolo Uva
- Clinical Bioinformatics Unit, Scientific Direction, IRCCS Istituto Giannina Gaslini, Via Gerolamo Gaslini 5, 16147, Genova, Italy
- Italian Institute of Technology, Via Morego 30, 16163, Genova, Italy
| | - Ivan Molineris
- Department of Life Science and System Biology, University of Turin, Via Accademia Albertina, 13-1023, Turin, Italy
| | - Vladimir Nosi
- Department of Computer Science, University of Turin, Via Pessinetto 12, 10049, Turin, Italy
| | - Raffaele A Calogero
- Department of Molecular Biotechnology and Health Sciences, Via Nizza 52, 10126, Turin, Italy
| | - Luca Alessandri
- Department of Molecular Biotechnology and Health Sciences, Via Nizza 52, 10126, Turin, Italy
| | - Elena Pedrini
- Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
| | - Marina Mordenti
- Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
| | - Emanuele Bonetti
- Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
- Department of Experimental Oncology, European Institute of Oncology, Via Adamello 16, 20139, Milan, Italy
| | - Luca Sangiorgi
- Department of Rare Skeletal Disorders, IRCCS Istituto Ortopedico Rizzoli, Via di Barbiano 1/10, 40136, Bologna, Italy
| | - Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy.
- Department of Biosciences, Biotechnologies and Biopharmaceutics, University of Bari, Via Orabona 4, 70126, Bari, Italy.
| | - Federico Zambelli
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Via Giovanni Amendola 122/O, 70126, Bari, Italy.
- Department of Biosciences, University of Milan, Via Celoria 26, 20133, Milano, Italy.
| |
|
16
|
Converting Biomedical Text Annotated Resources into FAIR Research Objects with an Open Science Platform. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11209648] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
Abstract
Today, there are excellent resources for the semantic annotation of biomedical text. These resources range from ontologies and NLP tools to annotators and web services. Most of these are available either in the form of open-source components (e.g., MetaMap) or as web services that offer free access (e.g., Whatizit). In order to use these resources in automatic text annotation pipelines, researchers face significant technical challenges. For open-source tools, the challenges include the setting up of the computational environment, the resolution of dependencies, as well as the compilation and installation of the software. For web services, the challenge is implementing clients to communicate with the respective web APIs. Even resources that are available as Docker containers (e.g., NCBO annotator) require significant technical skills for installation and setup. This work deals with the task of creating ready-to-install-and-run Research Objects (ROs) for a large collection of components in biomedical text analysis. These components include (a) tools such as cTAKES, NOBLE Coder, MetaMap, NCBO annotator, BeCAS, and Neji; (b) ontologies from BioPortal, NCBI BioSystems, and Open Biomedical Ontologies; and (c) text corpora such as BC4GO, Mantra Gold Standard Corpus, and the COVID-19 Open Research Dataset. We make these resources available in OpenBio.eu, an open-science RO repository and workflow management system. All ROs can be searched, shared, edited, downloaded, commented on, and rated. We also demonstrate how one can easily connect these ROs to form a large variety of text annotation pipelines.
|
17
|
Mayer G, Müller W, Schork K, Uszkoreit J, Weidemann A, Wittig U, Rey M, Quast C, Felden J, Glöckner FO, Lange M, Arend D, Beier S, Junker A, Scholz U, Schüler D, Kestler HA, Wibberg D, Pühler A, Twardziok S, Eils J, Eils R, Hoffmann S, Eisenacher M, Turewicz M. Implementing FAIR data management within the German Network for Bioinformatics Infrastructure (de.NBI) exemplified by selected use cases. Brief Bioinform 2021; 22:bbab010. [PMID: 33589928 PMCID: PMC8425304 DOI: 10.1093/bib/bbab010] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/17/2020] [Revised: 12/21/2020] [Accepted: 01/06/2021] [Indexed: 12/21/2022] Open
Abstract
This article describes some use case studies and self-assessments of the FAIR status of de.NBI services to illustrate the challenges and requirements of adhering to the FAIR (findable, accessible, interoperable and reusable) data principles in a large distributed bioinformatics infrastructure. We address the challenge of heterogeneity of wet lab technologies, data, metadata, software, computational workflows and the levels of implementation and monitoring of FAIR principles within the different bioinformatics sub-disciplines joined in de.NBI. On the one hand, this broad service landscape and the excellent network of experts are a strong basis for the development of useful research data management plans. On the other hand, the large number of tools and techniques maintained by distributed teams renders FAIR compliance challenging.
Affiliation(s)
- Gerhard Mayer
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Ulm University, Institute of Medical Systems Biology, Ulm, Germany
| | - Wolfgang Müller
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
| | - Karin Schork
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
| | - Julian Uszkoreit
- Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
- Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
| | - Andreas Weidemann
- Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Ulrike Wittig
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Maja Rey
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Janine Felden
  - Jacobs University Bremen gGmbH, Bremen, Germany
  - University of Bremen, MARUM - Center for Marine Environmental Sciences, Bremen, Germany
- Frank Oliver Glöckner
  - Jacobs University Bremen gGmbH, Bremen, Germany
  - University of Bremen, MARUM - Center for Marine Environmental Sciences, Bremen, Germany
  - Alfred Wegener Institute - Helmholtz Center for Polar- and Marine Research, Bremerhaven, Germany
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Daniel Arend
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Sebastian Beier
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Astrid Junker
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Danuta Schüler
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Hans A Kestler
  - Ulm University, Institute of Medical Systems Biology, Ulm, Germany
  - Leibniz Institute on Ageing - Fritz Lipmann Institute, Jena
- Daniel Wibberg
  - Bielefeld University, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Alfred Pühler
  - Bielefeld University, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Sven Twardziok
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Jürgen Eils
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Roland Eils
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
  - Heidelberg University Hospital and BioQuant, Health Data Science Unit, Heidelberg, Germany
- Steve Hoffmann
  - Leibniz Institute on Ageing - Fritz Lipmann Institute, Jena
- Martin Eisenacher
  - Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
  - Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Michael Turewicz
  - Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
  - Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
|
18
|
Fehlmann T, Kern F, Hirsch P, Steinhaus R, Seelow D, Keller A. Aviator: a web service for monitoring the availability of web services. Nucleic Acids Res 2021; 49:W46-W51. [PMID: 34038559 PMCID: PMC8262725 DOI: 10.1093/nar/gkab396] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/24/2021] [Revised: 04/26/2021] [Accepted: 04/28/2021] [Indexed: 02/06/2023] Open
Abstract
With Aviator, we present a web service and repository that facilitates surveillance of online tools. Aviator consists of a user-friendly website and two modules, a literature-mining based general and a manually curated module. The general module currently checks 9417 websites twice a day with respect to their availability and stores many features (frontend and backend response time, required RAM and size of the web page, security certificates, analytic tools and trackers embedded in the webpage and others) in a data warehouse. Aviator is also equipped with an analysis functionality, for example authors can check and evaluate the availability of their own tools or those of their peers. Likewise, users can check the availability of a certain tool they intend to use in research or teaching to avoid including unstable tools. The curated section of Aviator offers additional services. We provide API snippets for common programming languages (Perl, PHP, Python, JavaScript) as well as an OpenAPI documentation for embedding in the backend of own web services for an automatic test of their function. We query the respective APIs twice a day and send automated notifications in case of an unexpected result. Naturally, the same analysis functionality as for the literature-based module is available for the curated section. Aviator can freely be used at https://www.ccb.uni-saarland.de/aviator.
Affiliation(s)
- Tobias Fehlmann
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Fabian Kern
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Pascal Hirsch
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Robin Steinhaus
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, 13353 Berlin, Germany
- Dominik Seelow
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, 10117 Berlin, Germany
  - Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, 13353 Berlin, Germany
- Andreas Keller
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
  - Center for Bioinformatics, Saarland Informatics Campus, Saarland University, 66123 Saarbrücken, Germany
  - Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, USA
|
19
|
Duvaud S, Gabella C, Lisacek F, Stockinger H, Ioannidis V, Durinx C. Expasy, the Swiss Bioinformatics Resource Portal, as designed by its users. Nucleic Acids Res 2021; 49:W216-W227. [PMID: 33849055 PMCID: PMC8265094 DOI: 10.1093/nar/gkab225] [Citation(s) in RCA: 265] [Impact Index Per Article: 88.3] [Received: 02/02/2021] [Revised: 03/11/2021] [Accepted: 04/01/2021] [Indexed: 12/16/2022] Open
Abstract
The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss) creates, maintains and disseminates a portfolio of reliable and state-of-the-art bioinformatics services and resources for the storage, analysis and interpretation of biological data. Through Expasy (https://www.expasy.org), the Swiss Bioinformatics Resource Portal, the scientific community worldwide, freely accesses more than 160 SIB resources supporting a wide range of life science and biomedical research areas. In 2020, Expasy was redesigned through a user-centric approach, known as User-Centred Design (UCD), whose aim is to create user interfaces that are easy-to-use, efficient and targeting the intended community. This approach, widely used in other fields such as marketing, e-commerce, and design of mobile applications, is still scarcely explored in bioinformatics. In total, around 50 people were actively involved, including internal stakeholders and end-users. In addition to an optimised interface that meets users' needs and expectations, the new version of Expasy provides an up-to-date and accurate description of high-quality resources based on a standardised ontology, allowing to connect functionally-related resources.
Affiliation(s)
- Séverine Duvaud
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
- Chiara Gabella
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
- Frédérique Lisacek
  - Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, and Computer Science Department, University of Geneva, CH-1227 Geneva, Switzerland
  - Section of Biology, University of Geneva, CH-1205 Geneva, Switzerland
- Heinz Stockinger
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
- Vassilios Ioannidis
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
- Christine Durinx
  - SIB Swiss Institute of Bioinformatics, Quartier Sorge - Bâtiment Amphipôle, CH-1015 Lausanne, Switzerland
|
20
|
Paul-Gilloteaux P, Tosi S, Hériché JK, Gaignard A, Ménager H, Marée R, Baecker V, Klemm A, Kalaš M, Zhang C, Miura K, Colombelli J. Bioimage analysis workflows: community resources to navigate through a complex ecosystem. F1000Res 2021; 10:320. [PMID: 34136134 PMCID: PMC8182692 DOI: 10.12688/f1000research.52569.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 04/14/2021] [Indexed: 11/20/2022] Open
Abstract
Workflows are the keystone of bioimage analysis, and the NEUBIAS (Network of European BioImage AnalystS) community is trying to gather the actors of this field and organize the information around them. One of its most recent outputs is the opening of the F1000Research NEUBIAS gateway, whose main objective is to offer a channel of publication for bioimage analysis workflows and associated resources. In this paper we want to express some personal opinions and recommendations related to finding, handling and developing bioimage analysis workflows. The emergence of "big data" in bioimaging and resource-intensive analysis algorithms make local data storage and computing solutions a limiting factor. At the same time, the need for data sharing with collaborators and a general shift towards remote work, have created new challenges and avenues for the execution and sharing of bioimage analysis workflows. These challenges are to reproducibly run workflows in remote environments, in particular when their components come from different software packages, but also to document them and link their parameters and results by following the FAIR principles (Findable, Accessible, Interoperable, Reusable) to foster open and reproducible science. In this opinion paper, we focus on giving some directions to the reader to tackle these challenges and navigate through this complex ecosystem, in order to find and use workflows, and to compare workflows addressing the same problem. We also discuss tools to run workflows in the cloud and on High Performance Computing resources, and suggest ways to make these workflows FAIR.
Affiliation(s)
- Perrine Paul-Gilloteaux
  - Université de Nantes, CNRS, INSERM, l’institut du thorax, Nantes, F-44000, France
  - Université de Nantes, CHU Nantes, Inserm, CNRS, SFR Santé, Inserm UMS 016, CNRS UMS 3556, Nantes, F-44000, France
- Sébastien Tosi
  - Institute for Research in Biomedicine, IRB Barcelona, Barcelona Institute of Science and Technology, BIST, Barcelona, Spain
- Jean-Karim Hériché
  - Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, 69117, Germany
- Alban Gaignard
  - Université de Nantes, CNRS, INSERM, l’institut du thorax, Nantes, F-44000, France
- Hervé Ménager
  - Hub de Bioinformatique et Biostatistique, Département Biologie Computationnelle, Institut Pasteur, USR 3756, CNRS, Paris, 75015, France
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, Evry, 91000, France
- Raphaël Marée
  - Montefiore Institute, University of Liège, Liège, Belgium
- Volker Baecker
  - Montpellier Ressources Imagerie, BioCampus Montpellier, CNRS, INSERM, University of Montpellier, Montpellier, F-34000, France
- Anna Klemm
  - BioImage Informatics Facility, SciLifeLab, Stockholm, Sweden
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Chong Zhang
  - Department of Information and Communication Technologies, University Pompeu Fabra, Barcelona, Spain
- Kota Miura
  - Nikon Imaging Center, University of Heidelberg, Heidelberg, Germany
- Julien Colombelli
  - Institute for Research in Biomedicine, IRB Barcelona, Barcelona Institute of Science and Technology, BIST, Barcelona, Spain
|
21
|
Bai J, Bandla C, Guo J, Alvarez RV, Bai M, Vizcaíno JA, Moreno P, Grüning B, Sallou O, Perez-Riverol Y. BioContainers Registry: Searching Bioinformatics and Proteomics Tools, Packages, and Containers. J Proteome Res 2021; 20:2056-2061. [PMID: 33625229 PMCID: PMC7611561 DOI: 10.1021/acs.jproteome.0c00904] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 02/06/2023]
Abstract
BioContainers is an open-source project that aims to create, store, and distribute bioinformatics software containers and packages. The BioContainers community has developed a set of guidelines to standardize software containers including the metadata, versions, licenses, and software dependencies. BioContainers supports multiple packaging and container technologies such as Conda, Docker, and Singularity. The BioContainers provide over 9000 bioinformatics tools, including more than 200 proteomics and mass spectrometry tools. Here we introduce the BioContainers Registry and Restful API to make containerized bioinformatics tools more findable, accessible, interoperable, and reusable (FAIR). The BioContainers Registry provides a fast and convenient way to find and retrieve bioinformatics tool packages and containers. By doing so, it will increase the use of bioinformatics packages and containers while promoting replicability and reproducibility in research.
Affiliation(s)
- Jingwen Bai
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chakradhar Bandla
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jiaxin Guo
  - College of Bioinformation, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Roberto Vera Alvarez
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mingze Bai
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing, 400065, China
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Pablo Moreno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Björn Grüning
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, 79110, Germany
- Olivier Sallou
  - Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA/INRIA) - GenOuest Platform, Université de Rennes, Rennes, France
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
22
|
Schwämmle V, Harrow J, Ienasescu H. Proteomics Software in bio.tools: Coverage and Annotations. J Proteome Res 2021; 20:1821-1825. [PMID: 33720718 DOI: 10.1021/acs.jproteome.0c00978] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
The large diversity of experimental methods in proteomics as well as their increasing usage across biological and clinical research has led to the development of hundreds if not thousands of software tools to aid in the analysis and interpretation of the resulting data. Detailed information about these tools needs to be collected, categorized, and validated to guarantee their optimal utilization. A tools registry like bio.tools enables users and developers to identify new tools with more powerful algorithms or to find tools with similar functions for comparison. Here we present the content of the registry, which now comprises more than 1000 proteomics tool entries. Furthermore, we discuss future applications and engagement with other community efforts resulting in a high impact on the bioinformatics landscape.
Affiliation(s)
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Jennifer Harrow
  - ELIXIR-Hub, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Hans Ienasescu
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
|
23
|
Ison J, Ienasescu H, Rydza E, Chmura P, Rapacki K, Gaignard A, Schwämmle V, van Helden J, Kalaš M, Ménager H. biotoolsSchema: a formalized schema for bioinformatics software description. Gigascience 2021; 10:giaa157. [PMID: 33506265 PMCID: PMC7842104 DOI: 10.1093/gigascience/giaa157] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/08/2020] [Revised: 11/10/2020] [Accepted: 12/07/2020] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description (and cataloguing) of bioinformatics resources.
FINDINGS: Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability.
CONCLUSIONS: biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.
Affiliation(s)
- Jon Ison
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- Hans Ienasescu
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
- Emil Rydza
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
- Piotr Chmura
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
- Kristoffer Rapacki
  - Department of Health Technology, Ørsteds Plads, Building 345C, DK-2800 Kongens, Lyngby, Denmark
- Alban Gaignard
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - L'institut du Thorax, INSERM, CNRS, University of Nantes, 44007 Nantes, France
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Jacques van Helden
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - Département de Biologie, Aix-Marseille Université (AMU), 3 place Victor Hugo, 13003 Marseille, France
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
- Hervé Ménager
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - Hub de Bioinformatique et Biostatistique–Département Biologie Computationnelle, Institut Pasteur, USR 3756, CNRS, Paris 75015, France
|
24
|
Jespersgaard C, Syed A, Chmura P, Løngreen P. Supercomputing and Secure Cloud Infrastructures in Biology and Medicine. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-012920-013357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
The increasing amounts of healthcare data stored in health registries, in combination with genomic and other types of data, have the potential to enable better decision making and pave the path for personalized medicine. However, reaping the full benefits of big, sensitive data for the benefit of patients requires greater access to data across organizations and institutions in various regions. This overview first introduces cloud computing and takes stock of the challenges to enhancing data availability in the healthcare system. Four models for ensuring higher data accessibility are then discussed. Finally, several cases are discussed that explore how enhanced access to data would benefit the end user.
Affiliation(s)
- Ali Syed
  - Danish National Genome Center, DK-2300 Copenhagen S, Denmark
- Piotr Chmura
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Peter Løngreen
  - Danish National Genome Center, DK-2300 Copenhagen S, Denmark
|
25
|
Lamprecht AL, Garcia L, Kuzak M, Martinez C, Arcila R, Martin Del Pico E, Dominguez Del Angel V, van de Sandt S, Ison J, Martinez PA, McQuilton P, Valencia A, Harrow J, Psomopoulos F, Gelpi JL, Chue Hong N, Goble C, Capella-Gutierrez S. Towards FAIR principles for research software. Data Science 2020. [DOI: 10.3233/ds-190026] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Indexed: 11/15/2022]
Affiliation(s)
- Leyla Garcia
  - ZBMED Information Centre for Life Sciences, Germany
- Mateusz Kuzak
  - Netherlands eScience Center, The Netherlands
  - Dutch Techcentre for Life Sciences, The Netherlands
- Jon Ison
  - National Life Science Supercomputing Center, Technical University of Denmark, Denmark
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC), Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Spain
- Josep Ll. Gelpi
  - Barcelona Supercomputing Center (BSC), Spain
  - University of Barcelona, Spain
- Neil Chue Hong
  - Software Sustainability Institute, UK
  - EPCC, University of Edinburgh, UK
|
26
|
Tsiamis V, Ienasescu HI, Gabrielaitis D, Palmblad M, Schwämmle V, Ison J. One Thousand and One Software for Proteomics: Tales of the Toolmakers of Science. J Proteome Res 2019; 18:3580-3585. [DOI: 10.1021/acs.jproteome.9b00219] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Affiliation(s)
- Vasileios Tsiamis
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Postzone S3-P, Postbus 9600, 2300 RC Leiden, The Netherlands
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
|