1
Harris NL, Fields CJ, Hokamp K, Just J, Khetani R, Maia J, Ménager H, Munoz-Torres MC, Unni D, Williams J. BOSC 2023, the 24th annual Bioinformatics Open Source Conference. F1000Res 2023; 12:1568. [PMID: 38076297 PMCID: PMC10704065 DOI: 10.12688/f1000research.143015.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/29/2023] [Indexed: 12/18/2023]
Abstract
The 24th annual Bioinformatics Open Source Conference (BOSC 2023) was part of the 2023 conference on Intelligent Systems for Molecular Biology and the European Conference on Computational Biology (ISMB/ECCB 2023). Launched in 2000 and held yearly since, BOSC is the premier meeting covering open-source bioinformatics and open science. Like ISMB 2022, the 2023 meeting was a hybrid conference, with the in-person component hosted in Lyon, France. ISMB/ECCB attracted a near-record number of attendees, with over 2100 in person and about 900 more online. Approximately 200 people participated in BOSC sessions. In addition to 43 talks and 49 posters, BOSC featured two keynotes: Sara El-Gebali, who spoke about "A New Odyssey: Pioneering the Future of Scientific Progress Through Open Collaboration", and Joseph Yracheta, who spoke about "The Dissonance between Scientific Altruism & Capitalist Extraction: The Zero Trust and Federated Data Sovereignty Solution." Once again, a joint session brought together BOSC and the Bio-Ontologies COSI. The conference ended with a panel on Open and Ethical Data Sharing. As in prior years, BOSC was preceded by a CollaborationFest, a collaborative work event that brought together about 40 participants interested in synergistically combining ideas, shaping project plans, developing software, and more.
Affiliation(s)
- Nomi L. Harris
  - Lawrence Berkeley National Laboratory, Berkeley, California, 94720, USA
- Christopher J. Fields
  - Carver Biotechnology Center, University of Illinois Urbana-Champaign, Urbana, Illinois, 61801, USA
- Karsten Hokamp
  - Smurfit Institute of Genetics, Trinity College of Dublin, Dublin, D02 PN40, Ireland
- Jérémy Just
  - Ecole Normale Superieure de Lyon, Lyon, Auvergne-Rhône-Alpes, 69364, France
- Radhika Khetani
  - Bioinformatics Core, Harvard T.H. Chan School of Public Health, Cambridge, Massachusetts, 02115, USA
- Jessica Maia
  - BD Technologies and Innovation, Research Triangle Park, North Carolina, 27709, USA
- Deepak Unni
  - Swiss Institute of Bioinformatics, Basel, 4051, Switzerland
- Jason Williams
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA
2
Suzuki Y, Ménager H, Brancotte B, Vernet R, Nerin C, Boetto C, Auvergne A, Linhard C, Torchet R, Lechat P, Troubat L, Cho MH, Bouzigon E, Aschard H, Julienne H. Trait selection strategy in multi-trait GWAS: Boosting SNPs discoverability. bioRxiv 2023:2023.10.27.564319. [PMID: 37961722 PMCID: PMC10634875 DOI: 10.1101/2023.10.27.564319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Since the first Genome-Wide Association Studies (GWAS), thousands of variant-trait associations have been discovered. However, the sample size required to detect additional variants using standard univariate association screening is increasingly prohibitive. Multi-trait GWAS offers a relevant alternative: it can improve statistical power and lead to new insights about gene function and the joint genetic architecture of human phenotypes. Although many methodological hurdles of multi-trait testing have been discussed, the strategy for selecting traits, among overwhelming possibilities, has been overlooked. In this study, we conducted extensive multi-trait tests using JASS (Joint Analysis of Summary Statistics) and assessed which genetic features of the analysed sets were associated with an increased detection of variants as compared to univariate screening. Our analyses identified multiple factors associated with the gain in association detection in multi-trait tests. Together, these factors of the analysed sets are predictive of the gain of the multi-trait test (Pearson's ρ equal to 0.43 between the observed and predicted gain, P < 1.6 × 10⁻⁶⁰). Applying an alternative multi-trait approach (MTAG, multi-trait analysis of GWAS), we found that in most scenarios, but particularly those with larger numbers of traits, JASS outperformed MTAG. Finally, we benchmark several strategies for selecting sets of traits, including the prevalent strategy of selecting clinically similar traits, which systematically underperformed selecting clinically heterogeneous traits or selecting sets derived from our data-driven models. This work provides a unique picture of the determinants of multi-trait GWAS statistical power and outlines practical strategies for multi-trait testing.
Affiliation(s)
- Yuka Suzuki
  - Institut Pasteur, Université Paris Cité, Department of Computational Biology, Paris, 75015 France
- Hervé Ménager
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Bryan Brancotte
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Raphaël Vernet
  - Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM), UMR-1124, Group of Genomic Epidemiology of Multifactorial Diseases, Paris, France
- Cyril Nerin
  - Institut Pasteur, Université Paris Cité, Department of Computational Biology, Paris, 75015 France
- Christophe Boetto
  - Institut Pasteur, Université Paris Cité, Department of Computational Biology, Paris, 75015 France
- Antoine Auvergne
  - Institut Pasteur, Université Paris Cité, Department of Computational Biology, Paris, 75015 France
- Christophe Linhard
  - Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM), UMR-1124, Group of Genomic Epidemiology of Multifactorial Diseases, Paris, France
- Rachel Torchet
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Pierre Lechat
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- Lucie Troubat
  - Institut Pasteur, Université Paris Cité, Department of Computational Biology, Paris, 75015 France
- Michael H. Cho
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, 181 Longwood Ave, Boston, MA, 02115, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Emmanuelle Bouzigon
  - Université Paris Cité, Institut National de la Santé et de la Recherche Médicale (INSERM), UMR-1124, Group of Genomic Epidemiology of Multifactorial Diseases, Paris, France
- Hugues Aschard
  - Institut Pasteur, Université Paris Cité, Department of Computational Biology, Paris, 75015 France
- Hanna Julienne
  - Institut Pasteur, Université Paris Cité, Department of Computational Biology, Paris, 75015 France
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
3
Martens M, Stierum R, Schymanski EL, Evelo CT, Aalizadeh R, Aladjov H, Arturi K, Audouze K, Babica P, Berka K, Bessems J, Blaha L, Bolton EE, Cases M, Damalas DΕ, Dave K, Dilger M, Exner T, Geerke DP, Grafström R, Gray A, Hancock JM, Hollert H, Jeliazkova N, Jennen D, Jourdan F, Kahlem P, Klanova J, Kleinjans J, Kondic T, Kone B, Lynch I, Maran U, Martinez Cuesta S, Ménager H, Neumann S, Nymark P, Oberacher H, Ramirez N, Remy S, Rocca-Serra P, Salek RM, Sallach B, Sansone SA, Sanz F, Sarimveis H, Sarntivijai S, Schulze T, Slobodnik J, Spjuth O, Tedds J, Thomaidis N, Weber RJ, van Westen GJ, Wheelock CE, Williams AJ, Witters H, Zdrazil B, Županič A, Willighagen EL. ELIXIR and Toxicology: a community in development. F1000Res 2023; 10:ELIXIR-1129. [PMID: 37842337 PMCID: PMC10568213 DOI: 10.12688/f1000research.74502.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/28/2023] [Indexed: 10/17/2023]
Abstract
Toxicology has been an active research field for many decades, with academic, industrial and government involvement. Modern omics and computational approaches are changing the field, from merely disease-specific observational models into target-specific predictive models. Traditionally, toxicology has strong links with other fields such as biology, chemistry, pharmacology and medicine. With the rise of synthetic and new engineered materials, alongside ongoing prioritisation needs in chemical risk assessment for existing chemicals, early predictive evaluations are becoming of utmost importance to both scientific and regulatory purposes. ELIXIR is an intergovernmental organisation that brings together life science resources from across Europe. To coordinate the linkage of various life science efforts around modern predictive toxicology, the establishment of a new ELIXIR Community is seen as instrumental. In the past few years, joint efforts, building on incidental overlap, have been piloted in the context of ELIXIR. For example, the EU-ToxRisk, diXa, HeCaToS, transQST, and the nanotoxicology community have worked with the ELIXIR TeSS, Bioschemas, and Compute Platforms and activities. In 2018, a core group of interested parties wrote a proposal, outlining a sketch of what this new ELIXIR Toxicology Community would look like. A recent workshop (held September 30th to October 1st, 2020) extended this into an ELIXIR Toxicology roadmap and a shortlist of limited investment-high gain collaborations to give body to this new community. This Whitepaper outlines the results of these efforts and defines our vision of the ELIXIR Toxicology Community and how it complements other ELIXIR activities.
Affiliation(s)
- Marvin Martens
  - Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, 6229 ER, The Netherlands
- Rob Stierum
  - Risk Analysis for Products In Development (RAPID), Netherlands Organisation for applied scientific research TNO, Utrecht, 3584 CB, The Netherlands
- Emma L. Schymanski
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, 4367, Luxembourg
- Chris T. Evelo
  - Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, 6229 ER, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, 6229 EN, The Netherlands
- Reza Aalizadeh
  - Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, 15771, Greece
- Hristo Aladjov
  - Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences, Sofia, 1113, Bulgaria
- Kasia Arturi
  - Department Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, 8600, Switzerland
- Pavel Babica
  - RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- Karel Berka
  - Department of Physical Chemistry, Palacky University Olomouc, Olomouc, 77146, Czech Republic
- Ludek Blaha
  - RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- Evan E. Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Dimitrios Ε. Damalas
  - Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, 15771, Greece
- Kirtan Dave
  - School of Science, GSFC University, Gujarat, 391750, India
- Marco Dilger
  - Forschungs- und Beratungsinstitut Gefahrstoffe (FoBiG) GmbH, Freiburg im Breisgau, 79106, Germany
- Daan P. Geerke
  - AIMMS Division of Molecular Toxicology, Vrije Universiteit, Amsterdam, 1081 HZ, The Netherlands
- Roland Grafström
  - Department of Toxicology, Misvik Biology, Turku, 20520, Finland
  - Institute of Environmental Medicine, Karolinska Institute, Stockholm, 17177, Sweden
- Alasdair Gray
  - Department of Computer Science, Heriot-Watt University, Edinburgh, UK
- Henner Hollert
  - Department Evolutionary Ecology & Environmental Toxicology (E3T), Goethe-University, Frankfurt, D-60438, Germany
- Danyel Jennen
  - Department of Toxicogenomics, Maastricht University, Maastricht, 6200 MD, The Netherlands
- Fabien Jourdan
  - MetaboHUB, French metabolomics infrastructure in Metabolomics and Fluxomics, Toulouse, France
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, Toulouse, France
- Pascal Kahlem
  - Scientific Network Management SL, Barcelona, 08015, Spain
- Jana Klanova
  - RECETOX, Faculty of Science, Masaryk University, Brno, Czech Republic
- Jos Kleinjans
  - Department of Toxicogenomics, Maastricht University, Maastricht, 6200 MD, The Netherlands
- Todor Kondic
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Belvaux, 4367, Luxembourg
- Boï Kone
  - Faculty of Pharmacy, Malaria Research and Training Center, Bamako, BP:1805, Mali
- Iseult Lynch
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, UK, Birmingham, B15 2TT, UK
- Uko Maran
  - Institute of Chemistry, University of Tartu, Tartu, 50411, Estonia
- Hervé Ménager
  - Institut Français de Bioinformatique, Evry, F-91000, France
  - Bioinformatics and Biostatistics Hub, Institut Pasteur, Paris, F-75015, France
- Steffen Neumann
  - Research group Bioinformatics and Scientific Data, Leibniz Institute of Plant Biochemistry, Halle, 06120, Germany
- Penny Nymark
  - Institute of Environmental Medicine, Karolinska Institute, Stockholm, 17177, Sweden
- Herbert Oberacher
  - Institute of Legal Medicine and Core Facility Metabolomics, Medical University of Innsbruck, Innsbruck, A-6020, Austria
- Noelia Ramirez
  - Institut d'Investigacio Sanitaria Pere Virgili-Universitat Rovira i Virgili, Tarragona, 43007, Spain
- Philippe Rocca-Serra
  - Data Readiness Group, Department of Engineering Science, University of Oxford, Oxford, UK
- Reza M. Salek
  - International Agency for Research on Cancer, World Health Organisation, Lyon, 69372, France
- Brett Sallach
  - Department of Environment and Geography, University of York, UK, York, YO10 5NG, UK
- Ferran Sanz
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University, Barcelona, 08003, Spain
- Tobias Schulze
  - Helmholtz Centre for Environmental Research - UFZ, Leipzig, 04318, Germany
- Ola Spjuth
  - Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, SE-75124, Sweden
- Jonathan Tedds
  - ELIXIR Hub, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
- Nikolaos Thomaidis
  - Laboratory of Analytical Chemistry, Department of Chemistry, National and Kapodistrian University of Athens, Athens, 15771, Greece
- Ralf J.M. Weber
  - School of Biosciences, University of Birmingham, UK, Birmingham, B15 2TT, UK
- Gerard J.P. van Westen
  - Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research, Leiden, 2333 CC, The Netherlands
- Craig E. Wheelock
  - Department of Respiratory Medicine and Allergy, Karolinska University Hospital, Stockholm SE-141-86, Sweden
  - Department of Medical Biochemistry and Biophysics, Karolinska Institute, Stockholm, 17177, Sweden
- Antony J. Williams
  - Center for Computational Toxicology and Exposure, United States Environmental Protection Agency, Research Triangle Park, NC 27711, USA
- Barbara Zdrazil
  - Department of Pharmaceutical Sciences, University of Vienna, Vienna, 1090, Austria
- Anže Županič
  - Department Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, 1000, Slovenia
- Egon L. Willighagen
  - Department of Bioinformatics - BiGCaT, Maastricht University, Maastricht, 6229 ER, The Netherlands
4
Patel B, Soundarajan S, Ménager H, Hu Z. Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool. Sci Data 2023; 10:557. [PMID: 37612312 PMCID: PMC10447492 DOI: 10.1038/s41597-023-02463-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 08/10/2023] [Indexed: 08/25/2023]
Abstract
Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles tailored for research software have been proposed by the FAIR for Research Software (FAIR4RS) Working Group. They provide a foundation for optimizing the reuse of research software. The FAIR4RS principles are, however, aspirational and do not provide practical instructions to the researchers. To fill this gap, we propose in this work the first actionable step-by-step guidelines for biomedical researchers to make their research software compliant with the FAIR4RS principles. We designate them as the FAIR Biomedical Research Software (FAIR-BioRS) guidelines. Our process for developing these guidelines, presented here, is based on an in-depth study of the FAIR4RS principles and a thorough review of current practices in the field. To support researchers, we have also developed a workflow that streamlines the process of implementing these guidelines. This workflow is incorporated in FAIRshare, a free and open-source software application aimed at simplifying the curation and sharing of FAIR biomedical data and software through user-friendly interfaces and automation. Details about this tool are also presented.
Affiliation(s)
- Bhavesh Patel
  - FAIR Data Innovations Hub, California Medical Innovations Institute, San Diego, CA, 92121, USA
- Sanjay Soundarajan
  - FAIR Data Innovations Hub, California Medical Innovations Institute, San Diego, CA, 92121, USA
- Hervé Ménager
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 75015, Paris, France
- Zicheng Hu
  - Computational Health Science, University of California San Francisco, San Francisco, CA, 94158, USA
5
Harris NL, Hokamp K, Ménager H, Munoz-Torres M, Unni D, Vasilevsky N, Williams J. BOSC 2022: the first hybrid and 23rd annual Bioinformatics Open Source Conference. F1000Res 2022; 11:1034. [PMID: 36128559 PMCID: PMC9468630 DOI: 10.12688/f1000research.125043.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/19/2022] [Indexed: 11/20/2022]
Abstract
The 23rd annual Bioinformatics Open Source Conference (BOSC 2022) was part of this year’s conference on Intelligent Systems for Molecular Biology (ISMB). Launched in 2000 and held every year since, BOSC is the premier meeting covering open source bioinformatics and open science. ISMB 2022 was, for the first time, a hybrid conference, with the in-person component hosted in Madison, Wisconsin (USA). About 1000 people attended ISMB 2022 in person, with another 800 online. Approximately 200 people participated in BOSC sessions, which included 28 talks chosen from submitted abstracts, 46 posters, and a panel discussion, “Building and Sustaining Inclusive Open Science Communities”. BOSC 2022 included joint keynotes with two other COSIs. Jason Williams gave a BOSC / Education COSI keynote entitled “Riding the bicycle: Including all scientists on a path to excellence”. A joint session with Bio-Ontologies featured a keynote by Melissa Haendel, “The open data highway: turbo-boosting translational traffic with ontologies.”
Affiliation(s)
- Nomi L. Harris
  - Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Karsten Hokamp
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
- Deepak Unni
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Jason Williams
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
6
Panei FP, Torchet R, Ménager H, Gkeka P, Bonomi M. HARIBOSS: a curated database of RNA-small molecules structures to aid rational drug design. Bioinformatics 2022; 38:4185-4193. [PMID: 35799352 DOI: 10.1093/bioinformatics/btac483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 07/04/2022] [Accepted: 07/06/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: RNA molecules are implicated in numerous fundamental biological processes and many human pathologies, such as cancer, neurodegenerative disorders, muscular diseases and bacterial infections. Modulating the mode of action of disease-implicated RNA molecules can lead to the discovery of new therapeutic agents and even address pathologies linked to 'undruggable' protein targets. This modulation can be achieved by direct targeting of RNA with small molecules. As of today, only a few RNA-targeting small molecules are used clinically. One of the main obstacles that has hampered the development of a rational drug design protocol to target RNA with small molecules is the lack of a comprehensive understanding of the molecular mechanisms underlying RNA-small molecule (RNA-SM) recognition. RESULTS: Here, we present Harnessing RIBOnucleic acid-Small molecule Structures (HARIBOSS), a curated collection of RNA-SM structures determined by X-ray crystallography, nuclear magnetic resonance spectroscopy and cryo-electron microscopy. HARIBOSS facilitates the exploration of drug-like compounds known to bind RNA, the analysis of ligand and pocket properties and ultimately the development of in silico strategies to identify RNA-targeting small molecules. AVAILABILITY AND IMPLEMENTATION: HARIBOSS can be explored via a web interface available at http://hariboss.pasteur.cloud. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- F P Panei
  - Sanofi, R&D, Data & In Silico Sciences, 91385 Chilly Mazarin, France
  - Department of Structural Biology and Chemistry, Institut Pasteur, Université Paris Cité, CNRS UMR 3528, 75015 Paris, France
  - Ecole Doctorale Complexité du Vivant, Sorbonne Université, 75005 Paris, France
- R Torchet
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- H Ménager
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015 Paris, France
- P Gkeka
  - Sanofi, R&D, Data & In Silico Sciences, 91385 Chilly Mazarin, France
- M Bonomi
  - Department of Structural Biology and Chemistry, Institut Pasteur, Université Paris Cité, CNRS UMR 3528, 75015 Paris, France
7
Garijo D, Ménager H, Hwang L, Trisovic A, Hucka M, Morrell T, Allen A. Nine best practices for research software registries and repositories. PeerJ Comput Sci 2022; 8:e1023. [PMID: 36092012 PMCID: PMC9455149 DOI: 10.7717/peerj-cs.1023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Accepted: 06/09/2022] [Indexed: 06/15/2023]
Abstract
Scientific software registries and repositories improve software findability and research transparency, provide information for software citations, and foster preservation of computational methods in a wide range of disciplines. Registries and repositories play a critical role by supporting research reproducibility and replicability, but developing them takes effort and few guidelines are available to help prospective creators of these resources. To address this need, the FORCE11 Software Citation Implementation Working Group convened a Task Force to distill the experiences of the managers of existing resources in setting expectations for all stakeholders. In this article, we describe the resultant best practices which include defining the scope, policies, and rules that govern individual registries and repositories, along with the background, examples, and collaborative work that went into their development. We believe that establishing specific policies such as those presented here will help other scientific software registries and repositories better serve their users and their disciplines.
Affiliation(s)
- Hervé Ménager
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris, France
- Lorraine Hwang
  - University of California, Davis, Davis, California, United States
- Ana Trisovic
  - Harvard University, Boston, Massachusetts, United States
- Michael Hucka
  - California Institute of Technology, Pasadena, California, United States
- Thomas Morrell
  - California Institute of Technology, Pasadena, California, United States
- Alice Allen
  - University of Maryland, College Park, MD, United States
8
Lamprecht AL, Palmblad M, Ison J, Schwämmle V, Al Manir MS, Altintas I, Baker CJO, Ben Hadj Amor A, Capella-Gutierrez S, Charonyktakis P, Crusoe MR, Gil Y, Goble C, Griffin TJ, Groth P, Ienasescu H, Jagtap P, Kalaš M, Kasalica V, Khanteymoori A, Kuhn T, Mei H, Ménager H, Möller S, Richardson RA, Robert V, Soiland-Reyes S, Stevens R, Szaniszlo S, Verberne S, Verhoeven A, Wolstencroft K. Perspectives on automated composition of workflows in the life sciences. F1000Res 2021; 10:897. [PMID: 34804501 PMCID: PMC8573700 DOI: 10.12688/f1000research.54159.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/27/2021] [Indexed: 12/29/2022]
Abstract
Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
Affiliation(s)
- Magnus Palmblad
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Jon Ison
  - French Institute of Bioinformatics, 91057 Évry, France
- Ilkay Altintas
  - University of California San Diego, La Jolla, CA, 92093, USA
- Christopher J. O. Baker
  - University of New Brunswick, Saint John, E2L 4L5, Canada
  - IPSNP Computing Inc., Saint John, E2L 4S6, Canada
- Yolanda Gil
  - University of Southern California, Marina Del Rey, CA, 90292, USA
- Carole Goble
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Paul Groth
  - University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Hans Ienasescu
  - Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Pratik Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Tobias Kuhn
  - VU Amsterdam, 1081 HV Amsterdam, The Netherlands
- Hailiang Mei
  - Sequencing Analysis Support Core, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Steffen Möller
  - IBIMA, Rostock University Medical Center, 18057 Rostock, Germany
- Stian Soiland-Reyes
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
  - Informatics Institute, University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Robert Stevens
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Suzan Verberne
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
- Aswin Verhoeven
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Katherine Wolstencroft
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
9
Julienne H, Laville V, McCaw ZR, He Z, Guillemot V, Lasry C, Ziyatdinov A, Nerin C, Vaysse A, Lechat P, Ménager H, Le Goff W, Dube MP, Kraft P, Ionita-Laza I, Vilhjálmsson BJ, Aschard H. Multitrait GWAS to connect disease variants and biological mechanisms. PLoS Genet 2021; 17:e1009713. [PMID: 34460823 PMCID: PMC8437297 DOI: 10.1371/journal.pgen.1009713] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/20/2021] [Revised: 09/13/2021] [Accepted: 07/12/2021] [Indexed: 12/30/2022]
Abstract
Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.
Affiliation(s)
- Hanna Julienne
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Vincent Laville
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Zachary R. McCaw
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California, United States of America
| | - Vincent Guillemot
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Carla Lasry
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Andrey Ziyatdinov
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Cyril Nerin
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Amaury Vaysse
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Pierre Lechat
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Hervé Ménager
- Department of Computational Biology, Institut Pasteur, Paris, France
| | - Wilfried Le Goff
- Sorbonne Université, INSERM, Institute of Cardiometabolism and Nutrition (ICAN), UMR_S 1166, Paris, France
| | - Marie-Pierre Dube
- Université de Montréal Beaulieu-Saucier Pharmacogenomics Centre, Montreal Heart Institute, Montreal, Canada
- Université de Montréal, Faculty of Medicine, Department of medicine, Université de Montréal, Montreal, Canada
| | - Peter Kraft
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Iuliana Ionita-Laza
- Department of Biostatistics, Columbia University, New York, New York, United States of America
| | - Bjarni J. Vilhjálmsson
- National Centre for Register-based Research, Department of Economics and Business Economics, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
| | - Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Paris, France
- Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, United States of America
10
Paul-Gilloteaux P, Tosi S, Hériché JK, Gaignard A, Ménager H, Marée R, Baecker V, Klemm A, Kalaš M, Zhang C, Miura K, Colombelli J. Bioimage analysis workflows: community resources to navigate through a complex ecosystem. F1000Res 2021; 10:320. [PMID: 34136134 PMCID: PMC8182692 DOI: 10.12688/f1000research.52569.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 04/14/2021] [Indexed: 11/20/2022]
Abstract
Workflows are the keystone of bioimage analysis, and the NEUBIAS (Network of European BioImage AnalystS) community is trying to gather the actors of this field and organize the information around them. One of its most recent outputs is the opening of the F1000Research NEUBIAS gateway, whose main objective is to offer a channel of publication for bioimage analysis workflows and associated resources. In this paper we want to express some personal opinions and recommendations related to finding, handling and developing bioimage analysis workflows. The emergence of "big data" in bioimaging and resource-intensive analysis algorithms make local data storage and computing solutions a limiting factor. At the same time, the need for data sharing with collaborators and a general shift towards remote work have created new challenges and avenues for the execution and sharing of bioimage analysis workflows. These challenges are to reproducibly run workflows in remote environments, in particular when their components come from different software packages, but also to document them and link their parameters and results by following the FAIR principles (Findable, Accessible, Interoperable, Reusable) to foster open and reproducible science. In this opinion paper, we focus on giving some directions to the reader to tackle these challenges and navigate through this complex ecosystem, in order to find and use workflows, and to compare workflows addressing the same problem. We also discuss tools to run workflows in the cloud and on High Performance Computing resources, and suggest ways to make these workflows FAIR.
Affiliation(s)
- Perrine Paul-Gilloteaux
- Université de Nantes, CNRS, INSERM, l’institut du thorax, Nantes, F-44000, France
- Université de Nantes, CHU Nantes, Inserm, CNRS, SFR Santé, Inserm UMS 016, CNRS UMS 3556, Nantes, F-44000, France
| | - Sébastien Tosi
- Institute for Research in Biomedicine, IRB Barcelona, Barcelona Institute of Science and Technology, BIST, Barcelona, Spain
| | - Jean-Karim Hériché
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory, Heidelberg, 69117, Germany
| | - Alban Gaignard
- Université de Nantes, CNRS, INSERM, l’institut du thorax, Nantes, F-44000, France
| | - Hervé Ménager
- Hub de Bioinformatique et Biostatistique, Département Biologie Computationnelle, Institut Pasteur, USR 3756, CNRS, Paris, 75015, France
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, Evry, 91000, France
| | - Raphaël Marée
- Montefiore Institute, University of Liège, Liège, Belgium
| | - Volker Baecker
- Montpellier Ressources Imagerie, BioCampus Montpellier, CNRS, INSERM, University of Montpellier, Montpellier, F-34000, France
| | - Anna Klemm
- BioImage Informatics Facility, SciLifeLab, Stockholm, Sweden
| | - Matúš Kalaš
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Chong Zhang
- Department of Information and Communication Technologies, University Pompeu Fabra, Barcelona, Spain
| | - Kota Miura
- Nikon Imaging Center, University of Heidelberg, Heidelberg, Germany
| | - Julien Colombelli
- Institute for Research in Biomedicine, IRB Barcelona, Barcelona Institute of Science and Technology, BIST, Barcelona, Spain
11
Lamy-Besnier Q, Brancotte B, Ménager H, Debarbieux L. Viral Host Range database, an online tool for recording, analyzing and disseminating virus-host interactions. Bioinformatics 2021; 37:2798-2801. [PMID: 33594411 PMCID: PMC8428608 DOI: 10.1093/bioinformatics/btab070] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/02/2020] [Revised: 01/11/2021] [Accepted: 02/15/2021] [Indexed: 11/13/2022]
Abstract
Motivation: Viruses are ubiquitous in the living world, and their ability to infect more than one host defines their host range. However, information about which virus infects which host, and about which host is infected by which virus, is not readily available. Results: We developed a web-based tool called the Viral Host Range database to record, analyze and disseminate experimental host range data for viruses infecting archaea, bacteria and eukaryotes. Availability and implementation: The ViralHostRangeDB application is available from https://viralhostrangedb.pasteur.cloud. Its source code is freely available from the Gitlab instance of Institut Pasteur (https://gitlab.pasteur.fr/hub/viralhostrangedb).
Affiliation(s)
- Quentin Lamy-Besnier
- Bacteriophage, Bacterium, Host Laboratory, Department of Microbiology, Institut Pasteur, Paris, F-75015, France
- Université de Paris, Paris, France
| | - Bryan Brancotte
- Bioinformatics and Biostatistics, Institut Pasteur, Paris, F-75015, France
| | - Hervé Ménager
- Bioinformatics and Biostatistics, Institut Pasteur, Paris, F-75015, France
| | - Laurent Debarbieux
- Bacteriophage, Bacterium, Host Laboratory, Department of Microbiology, Institut Pasteur, Paris, F-75015, France
12
Ison J, Ienasescu H, Rydza E, Chmura P, Rapacki K, Gaignard A, Schwämmle V, van Helden J, Kalaš M, Ménager H. biotoolsSchema: a formalized schema for bioinformatics software description. Gigascience 2021; 10:giaa157. [PMID: 33506265 PMCID: PMC7842104 DOI: 10.1093/gigascience/giaa157] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/08/2020] [Revised: 11/10/2020] [Accepted: 12/07/2020] [Indexed: 11/19/2022]
Abstract
BACKGROUND: Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description (and cataloguing) of bioinformatics resources. FINDINGS: Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability. CONCLUSIONS: biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.
Affiliation(s)
- Jon Ison
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
| | - Hans Ienasescu
- National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
| | - Emil Rydza
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
| | - Piotr Chmura
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
| | - Kristoffer Rapacki
- Department of Health Technology, Ørsteds Plads, Building 345C, DK-2800 Kongens, Lyngby, Denmark
| | - Alban Gaignard
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- L'institut du Thorax, INSERM, CNRS, University of Nantes, 44007 Nantes, France
| | - Veit Schwämmle
- Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
| | - Jacques van Helden
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- Département de Biologie, Aix-Marseille Université (AMU), 3 place Victor Hugo, 13003 Marseille, France
| | - Matúš Kalaš
- Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
| | - Hervé Ménager
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- Hub de Bioinformatique et Biostatistique–Département Biologie Computationnelle, Institut Pasteur, USR 3756, CNRS, Paris 75015, France
13
Torchet R, Druart K, Ruano LC, Moine-Franel A, Borges H, Doppelt-Azeroual O, Brancotte B, Mareuil F, Nilges M, Ménager H, Sperandio O. The iPPI-DB initiative: A Community-centered database of Protein-Protein Interaction modulators. Bioinformatics 2021; 37:89-96. [PMID: 33416858 PMCID: PMC8034526 DOI: 10.1093/bioinformatics/btaa1091] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/04/2020] [Revised: 11/25/2020] [Accepted: 12/23/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: One avenue to address the paucity of clinically testable targets is to reinvestigate the druggable genome by tackling complicated types of targets such as Protein-Protein Interactions (PPIs). Given the challenge to target those interfaces with small chemical compounds, it has become clear that learning from successful examples of PPI modulation is a powerful strategy. Freely-accessible databases of PPI modulators that provide the community with tractable chemical and pharmacological data, as well as powerful tools to query them, are therefore essential to stimulate new drug discovery projects on PPI targets. RESULTS: Here, we present the new version of iPPI-DB, our manually curated database of PPI modulators. In this completely redesigned version of the database, we introduce a new web interface relying on crowdsourcing for the maintenance of the database. This interface was created to enable community contributions, whereby external experts can suggest new database entries. Moreover, the data model, the graphical interface, and the tools to query the database have been completely modernized and improved. We added new PPI modulators, new PPI targets, and extended our focus to stabilizers of PPIs as well. AVAILABILITY AND IMPLEMENTATION: The iPPI-DB server is available at https://ippidb.pasteur.fr. The source code for this server is available at https://gitlab.pasteur.fr/ippidb/ippidb-web/ and is distributed under GPL licence (http://www.gnu.org/licences/gpl). Queries can be shared through persistent links according to the FAIR data standards. Data can be downloaded from the website as csv files. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rachel Torchet
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Karen Druart
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
| | - Luis Checa Ruano
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
| | | | - Hélène Borges
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
| | - Olivia Doppelt-Azeroual
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Bryan Brancotte
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Fabien Mareuil
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Michael Nilges
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
| | - Hervé Ménager
- Hub de Bioinformatique et Biostatistique-Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Olivier Sperandio
- Department of Structural Biology and Chemistry, Institut Pasteur, Paris, 75015, France
14
Allain F, Mareuil F, Ménager H, Nilges M, Bardiaux B. ARIAweb: a server for automated NMR structure calculation. Nucleic Acids Res 2020; 48:W41-W47. [PMID: 32383755 PMCID: PMC7319541 DOI: 10.1093/nar/gkaa362] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/10/2020] [Revised: 04/14/2020] [Accepted: 04/28/2020] [Indexed: 11/13/2022]
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is a method of choice to study the dynamics and determine the atomic structure of macromolecules in solution. The standalone program ARIA (Ambiguous Restraints for Iterative Assignment) for automated assignment of nuclear Overhauser enhancement (NOE) data and structure calculation is well established in the NMR community. To ultimately provide a perfectly transparent and easy to use service, we designed an online user interface to ARIA with additional functionalities. Data conversion, structure calculation setup and execution, followed by interactive visualization of the generated 3D structures are all integrated in ARIAweb and freely accessible at https://ariaweb.pasteur.fr.
Affiliation(s)
- Fabrice Allain
- Structural Bioinformatics Unit, Department of Structural Biology and Chemistry, CNRS UMR 3528, Institut Pasteur, Paris, 75015, France
| | - Fabien Mareuil
- Bioinformatics and Biostatistics Hub, Department of Computational Biology, CNRS USR 3756, Institut Pasteur, Paris, 75015, France
| | - Hervé Ménager
- Bioinformatics and Biostatistics Hub, Department of Computational Biology, CNRS USR 3756, Institut Pasteur, Paris, 75015, France
| | - Michael Nilges
- Structural Bioinformatics Unit, Department of Structural Biology and Chemistry, CNRS UMR 3528, Institut Pasteur, Paris, 75015, France
| | - Benjamin Bardiaux
- Structural Bioinformatics Unit, Department of Structural Biology and Chemistry, CNRS UMR 3528, Institut Pasteur, Paris, 75015, France
15
Calvo-Villamañán A, Ng JW, Planel R, Ménager H, Chen A, Cui L, Bikard D. On-target activity predictions enable improved CRISPR-dCas9 screens in bacteria. Nucleic Acids Res 2020; 48:e64. [PMID: 32352514 PMCID: PMC7293049 DOI: 10.1093/nar/gkaa294] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 12/02/2019] [Revised: 04/13/2020] [Accepted: 04/17/2020] [Indexed: 12/26/2022]
Abstract
The ability to block gene expression in bacteria with the catalytically inactive mutant of Cas9, known as dCas9, is quickly becoming a standard methodology to probe gene function, perform high-throughput screens, and engineer cells for desired purposes. Yet, we still lack a good understanding of the design rules that determine on-target activity for dCas9. Taking advantage of high-throughput screening data, we fit a model to predict the ability of dCas9 to block the RNA polymerase based on the target sequence, and validate its performance on independently generated datasets. We further design a novel genome-wide guide RNA library for E. coli MG1655, EcoWG1, using our model to choose guides with high activity while avoiding guides which might be toxic or have off-target effects. A screen performed using the EcoWG1 library during growth in rich medium improved upon previously published screens, demonstrating that very good performances can be attained using only a small number of well-designed guides. Being able to design effective, smaller libraries will help make CRISPRi screens even easier to perform and more cost-effective. Our model and materials are available to the community through crispr.pasteur.fr and Addgene.
Affiliation(s)
- Alicia Calvo-Villamañán
- Synthetic Biology Group, Microbiology Department, Institut Pasteur, Paris 75015, France
- Université Paris Diderot, Sorbonne Paris Cité, Paris 75013, France
| | - Jérome Wong Ng
- Synthetic Biology Group, Microbiology Department, Institut Pasteur, Paris 75015, France
| | - Rémi Planel
- Hub de Bioinformatique et Biostatistique – Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris 75015, France
| | - Hervé Ménager
- Hub de Bioinformatique et Biostatistique – Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris 75015, France
| | - Arthur Chen
- Synthetic Biology Group, Microbiology Department, Institut Pasteur, Paris 75015, France
| | - Lun Cui
- Synthetic Biology Group, Microbiology Department, Institut Pasteur, Paris 75015, France
| | - David Bikard
- Synthetic Biology Group, Microbiology Department, Institut Pasteur, Paris 75015, France
16
Julienne H, Lechat P, Guillemot V, Lasry C, Yao C, Araud R, Laville V, Vilhjalmsson B, Ménager H, Aschard H. JASS: command line and web interface for the joint analysis of GWAS results. NAR Genom Bioinform 2020; 2:lqaa003. [PMID: 32002517 PMCID: PMC6978790 DOI: 10.1093/nargab/lqaa003] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/04/2019] [Revised: 12/03/2019] [Accepted: 01/09/2020] [Indexed: 12/11/2022]
Abstract
Genome-wide association study (GWAS) has been the driving force for identifying association between genetic variants and human phenotypes. Thousands of GWAS summary statistics covering a broad range of human traits and diseases are now publicly available. These GWAS have proven their utility for a range of secondary analyses, including in particular the joint analysis of multiple phenotypes to identify new associated genetic variants. However, although several methods have been proposed, there are very few large-scale applications published so far because of challenges in implementing these methods on real data. Here, we present JASS (Joint Analysis of Summary Statistics), a polyvalent Python package that addresses this need. Our package incorporates recently developed joint tests such as the omnibus approach and various weighted sum of Z-score tests while solving all practical and computational barriers for large-scale multivariate analysis of GWAS summary statistics. This includes data cleaning and harmonization tools, an efficient algorithm for fast derivation of joint statistics, an optimized data management process and a web interface for exploration purposes. Both benchmark analyses and real data applications demonstrated the robustness and strong potential of JASS for the detection of new associated genetic variants. Our package is freely available at https://gitlab.pasteur.fr/statistical-genetics/jass.
Affiliation(s)
- Hanna Julienne
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Pierre Lechat
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Vincent Guillemot
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Carla Lasry
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Chunzi Yao
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Robinson Araud
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Vincent Laville
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Bjarni Vilhjalmsson
- National Center for Register-Based Research, Aarhus University, DK-8210 Aarhus, Denmark
| | - Hervé Ménager
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
| | - Hugues Aschard
- Department of Computational Biology—USR 3756 CNRS, Institut Pasteur, 75015 Paris, France
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, 02115 Boston, MA, USA
17
Ison J, Ménager H, Brancotte B, Jaaniso E, Salumets A, Raček T, Lamprecht AL, Palmblad M, Kalaš M, Chmura P, Hancock JM, Schwämmle V, Ienasescu HI. Community curation of bioinformatics software and data resources. Brief Bioinform 2019; 21:1697-1705. [PMID: 31624831 PMCID: PMC7947956 DOI: 10.1093/bib/bbz075] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/01/2019] [Revised: 05/13/2019] [Accepted: 05/30/2019] [Indexed: 11/13/2022]
Abstract
The corpus of bioinformatics resources is huge and expanding rapidly, presenting life scientists with a growing challenge in selecting tools that fit the desired purpose. To address this, the European Infrastructure for Biological Information is supporting a systematic approach towards a comprehensive registry of tools and databases for all domains of bioinformatics, provided under a single portal (https://bio.tools). We describe here the practical means by which scientific communities, including individual developers and projects, through major service providers and research infrastructures, can describe their own bioinformatics resources and share these via bio.tools.
Affiliation(s)
- Jon Ison
- National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
| | - Hervé Ménager
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Bryan Brancotte
- Hub de Bioinformatique et Biostatistique - Département Biologie Computationnelle, Institut Pasteur, USR 3756 CNRS, Paris, France
| | - Erik Jaaniso
- ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
| | - Ahto Salumets
- ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
| | - Tomáš Raček
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00 Brno-Bohunice, Czech Republic
- Faculty of Informatics, Masaryk University, Botanická 68a, 602 00 Brno, Czech Republic
| | - Anna-Lena Lamprecht
- Department of Information and Computing Sciences, Utrecht University, Utrecht, Netherlands
| | - Magnus Palmblad
- Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, Netherlands
| | - Matúš Kalaš
- Computational Biology Unit, Department of Informatics, University of Bergen, N-5020 Bergen, Norway
| | - Piotr Chmura
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen
| | - John M Hancock
- ELIXIR-Hub, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
| | - Veit Schwämmle
- Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
| | - Hans-Ioan Ienasescu
- National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
18
Ison J, Ienasescu H, Chmura P, Rydza E, Ménager H, Kalaš M, Schwämmle V, Grüning B, Beard N, Lopez R, Duvaud S, Stockinger H, Persson B, Vařeková RS, Raček T, Vondrášek J, Peterson H, Salumets A, Jonassen I, Hooft R, Nyrönen T, Valencia A, Capella S, Gelpí J, Zambelli F, Savakis B, Leskošek B, Rapacki K, Blanchet C, Jimenez R, Oliveira A, Vriend G, Collin O, van Helden J, Løngreen P, Brunak S. The bio.tools registry of software tools and data resources for the life sciences. Genome Biol 2019; 20:164. [PMID: 31405382 PMCID: PMC6691543 DOI: 10.1186/s13059-019-1772-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 04/12/2019] [Accepted: 07/22/2019] [Indexed: 11/28/2022]
Abstract
Bioinformaticians and biologists rely increasingly upon workflows for the flexible utilization of the many life science tools that are needed to optimally convert data into knowledge. We outline a pan-European enterprise to provide a catalogue (https://bio.tools) of tools and databases that can be used in these workflows. bio.tools not only lists where to find resources, but also provides a wide variety of practical information.
Affiliation(s)
- Jon Ison
- National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark
| | - Hans Ienasescu
- National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark
| | - Piotr Chmura
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen, Denmark
| | - Emil Rydza
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen, Denmark
| | - Hervé Ménager
- Hub de Bioinformatique et de Biostatistiques, Institut Pasteur, C3BI USR, 3756 IP CNRS, Paris, France
| | - Matúš Kalaš
- Computational Biology Unit, Department of Informatics, University of Bergen, N-5020, Bergen, Norway
| | - Veit Schwämmle
- Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230, Odense, Denmark
| | - Björn Grüning
- Department of Computer Science, Albert-Ludwigs-University Freiburg, Georges-Köhler-Allee 106, 79110, Freiburg, Germany
| | - Niall Beard
- School of Computer Science, The University of Manchester, Oxford Road, Manchester, M13 9PL, UK
| | - Rodrigo Lopez
- The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Severine Duvaud
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Amphipole, CH-1015, Lausanne, Switzerland
| | - Heinz Stockinger
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Amphipole, CH-1015, Lausanne, Switzerland
| | - Bengt Persson
- Bioinformatics Infrastructure for Life Sciences, Science for Life Laboratory, Dept of Cell and Molecular Biology, Uppsala University, S-75124, Uppsala, Sweden
| | - Radka Svobodová Vařeková
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00, Brno-Bohunice, Czech Republic
| | - Tomáš Raček
- CEITEC - Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 625 00, Brno-Bohunice, Czech Republic
| | - Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Flemingovo namesti 2, 160 00, Prague, Czech Republic
| | - Hedi Peterson
- ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
| | - Ahto Salumets
- ELIXIR-EE, Institute of Computer Science, University of Tartu. J Liivi 2, Tartu, Estonia
| | - Inge Jonassen
- Computational Biology Unit, Department of Informatics, University of Bergen, N-5020, Bergen, Norway
| | - Rob Hooft
- Dutch Techcentre for Life Sciences, Jaarbeursplein 6, 3521, AL, Utrecht, The Netherlands
| | - Tommi Nyrönen
- CSC - IT Center for Science, PO BOX 405, FI-02101, Espoo, Finland
| | - Alfonso Valencia
- Barcelona Supercomputing Centre (BSC), 08034, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010, Barcelona, Spain
| | | | - Josep Gelpí
- Barcelona Supercomputing Centre (BSC), 08034, Barcelona, Spain
- Department of Biochemistry and Molecular Biomedicine, University of Barcelona, INB / BSC-CNS, Barcelona, Spain
| | - Federico Zambelli
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), via Amendola 165/A, Bari, Italy
- Department of Biosciences, University of Milano, Via Celoria 26, Milan, Italy
| | - Babis Savakis
- Biomedical Sciences Research Centre "Alexander Fleming", 34 Al. Fleming Str, 16672, Vari, Greece
| | - Brane Leskošek
- Faculty of Medicine / ELIXIR-SI, University of Ljubljana, Vrazov trg 2, SI-1000, Ljubljana, Slovenia
| | - Kristoffer Rapacki
- National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark
| | - Christophe Blanchet
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000, Evry, France
| | - Rafael Jimenez
- ELIXIR-Hub, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Arlindo Oliveira
- INESC-ID / Instituto Superior Técnico, R. Alves Redol 9, Lisbon, Portugal
| | - Gert Vriend
- Radboud University Medical Centre, CMBI, Postbus 9101, 6500 HB, Nijmegen, Netherlands
| | - Olivier Collin
- Plateforme GenOuest Univ Rennes, Inria, CNRS, IRISA, F-35000, Rennes, France
| | - Jacques van Helden
- Aix-Marseille Univ, INSERM, lab. Theory and Approaches of Genome Complexity (TAGC), Marseille, France
| | - Peter Løngreen
- National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark
| | - Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200, Copenhagen, Denmark
- Department of Bio and Health Informatics, Technical University of Denmark, Building 208, DK-2800, Kongens Lyngby, Denmark
|
19
|
Gruening B, Sallou O, Moreno P, da Veiga Leprevost F, Ménager H, Søndergaard D, Röst H, Sachsenberg T, O'Connor B, Madeira F, Dominguez Del Angel V, Crusoe MR, Varma S, Blankenberg D, Jimenez RC, Perez-Riverol Y. Recommendations for the packaging and containerizing of bioinformatics software. F1000Res 2018; 7. [PMID: 31543945 DOI: 10.12688/f1000research.15140.1] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Accepted: 06/01/2018] [Indexed: 11/20/2022] Open
Abstract
Software containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced under certain rules and standards in order to be reusable, compatible and easy to integrate into pipelines and analysis workflows. Here, we present a set of recommendations developed by the BioContainers Community to produce standardized bioinformatics packages and containers. These recommendations provide practical guidelines to make bioinformatics software more discoverable, reusable and transparent. They aim to guide developers, organisations, journals and funders in increasing the quality and sustainability of research software.
Affiliation(s)
- Bjorn Gruening
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, 79110, Germany
| | - Olivier Sallou
- Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA/INRIA) - GenOuest Platform, Université de Rennes, Rennes, France
| | - Pablo Moreno
- EMBL European Bioinformatics Institute, Cambridge, UK
| | | | - Hervé Ménager
- Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
| | - Dan Søndergaard
- Bioinformatics Research Centre, Aarhus University, Aarhus, DK-8000, Denmark
| | - Hannes Röst
- The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
| | - Timo Sachsenberg
- Applied Bioinformatics Group, Wilhelm Schickard Institut für Informatik, Universität Tübingen, Tübingen, D-72076, Germany
| | - Brian O'Connor
- Computational Genomics Lab, UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, USA
| | - Fábio Madeira
- EMBL European Bioinformatics Institute, Cambridge, UK
| | | | - Michael R Crusoe
- Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
| | - Susheel Varma
- EMBL European Bioinformatics Institute, Cambridge, UK
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
|
20
|
Gruening B, Sallou O, Moreno P, da Veiga Leprevost F, Ménager H, Søndergaard D, Röst H, Sachsenberg T, O'Connor B, Madeira F, Dominguez Del Angel V, Crusoe MR, Varma S, Blankenberg D, Jimenez RC, Perez-Riverol Y. Recommendations for the packaging and containerizing of bioinformatics software. F1000Res 2018; 7. [PMID: 31543945 PMCID: PMC6738188 DOI: 10.12688/f1000research.15140.2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Accepted: 03/18/2019] [Indexed: 11/22/2022] Open
Abstract
Software containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced under certain rules and standards in order to be reusable, compatible and easy to integrate into pipelines and analysis workflows. Here, we present a set of recommendations developed by the BioContainers Community to produce standardized bioinformatics packages and containers. These recommendations provide practical guidelines to make bioinformatics software more discoverable, reusable and transparent. They aim to guide developers, organisations, journals and funders in increasing the quality and sustainability of research software.
Affiliation(s)
- Bjorn Gruening
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, 79110, Germany
| | - Olivier Sallou
- Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA/INRIA) - GenOuest Platform, Université de Rennes, Rennes, France
| | - Pablo Moreno
- EMBL European Bioinformatics Institute, Cambridge, UK
| | | | - Hervé Ménager
- Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
| | - Dan Søndergaard
- Bioinformatics Research Centre, Aarhus University, Aarhus, DK-8000, Denmark
| | - Hannes Röst
- The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
| | - Timo Sachsenberg
- Applied Bioinformatics Group, Wilhelm Schickard Institut für Informatik, Universität Tübingen, Tübingen, D-72076, Germany
| | - Brian O'Connor
- Computational Genomics Lab, UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, USA
| | - Fábio Madeira
- EMBL European Bioinformatics Institute, Cambridge, UK
| | | | - Michael R Crusoe
- Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
| | - Susheel Varma
- EMBL European Bioinformatics Institute, Cambridge, UK
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
|
21
|
Doppelt-Azeroual O, Mareuil F, Deveaud E, Kalaš M, Soranzo N, van den Beek M, Grüning B, Ison J, Ménager H. ReGaTE: Registration of Galaxy Tools in Elixir. Gigascience 2018; 6:1-4. [PMID: 28402416 PMCID: PMC5530318 DOI: 10.1093/gigascience/gix022] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/11/2016] [Accepted: 03/21/2017] [Indexed: 11/14/2022] Open
Abstract
Background: Bioinformaticians routinely use multiple software tools and data sources in their day-to-day work and have been guided in their choices by a number of cataloguing initiatives. The ELIXIR Tools and Data Services Registry (bio.tools) aims to provide a central information point, independent of any specific scientific scope within bioinformatics or technological implementation. Meanwhile, efforts to integrate bioinformatics software in workbench and workflow environments have accelerated to enable the design, automation, and reproducibility of bioinformatics experiments. One such popular environment is the Galaxy framework, with currently more than 80 publicly available Galaxy servers around the world. In the context of a generic registry for bioinformatics software, such as bio.tools, Galaxy instances constitute a major source of valuable content. Yet there has been, to date, no convenient mechanism to register such services en masse. We present ReGaTE (Registration of Galaxy Tools in Elixir), a software utility that automates the process of registering the services available in a Galaxy instance. This utility uses the BioBlend application program interface to extract service metadata from a Galaxy server, enhance the metadata with the scientific information required by bio.tools, and push it to the registry. ReGaTE provides a fast and convenient way to publish Galaxy services in bio.tools. By doing so, service providers may increase the visibility of their services while enriching the software discovery function that bio.tools provides for its users. The source code of ReGaTE is freely available on GitHub at https://github.com/C3BI-pasteur-fr/ReGaTE.
Affiliation(s)
- Olivia Doppelt-Azeroual
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 25 rue du Docteur Roux, Paris, France
| | - Fabien Mareuil
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 25 rue du Docteur Roux, Paris, France
| | - Eric Deveaud
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 25 rue du Docteur Roux, Paris, France
| | - Matúš Kalaš
- Computational Biology Unit, Department of Informatics, University of Bergen, Thormøhlensgate 55, Bergen, Norway
| | - Nicola Soranzo
- Earlham Institute, Norwich Research Park, NR4 7UG Norwich, United Kingdom
| | - Marius van den Beek
- Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Paris, France
| | - Björn Grüning
- Department of Computer Science, Albert-Ludwigs-University, Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
| | - Jon Ison
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Building 208, 2800 Kongens, Lyngby, Denmark
| | - Hervé Ménager
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 25 rue du Docteur Roux, Paris, France
|
22
|
Hillion KH, Kuzmin I, Khodak A, Rasche E, Crusoe M, Peterson H, Ison J, Ménager H. Using bio.tools to generate and annotate workbench tool descriptions. F1000Res 2017; 6:ELIXIR-2074. [PMID: 29333231 PMCID: PMC5747335 DOI: 10.12688/f1000research.12974.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 11/26/2017] [Indexed: 11/20/2022] Open
Abstract
Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks facilitate access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time-consuming and error-prone process. A major consequence is incomplete or outdated tool descriptions that often lack important information, including parameters and metadata such as publications or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools that have been registered in the ELIXIR tools registry (https://bio.tools) into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins, and generates a skeleton for a Galaxy XML or CWL tool description. The second module is dedicated to the enrichment of the generated tool description, using metadata provided by bio.tools. This second module can also be used on its own to complete or correct existing tool descriptions with missing metadata.
Affiliation(s)
- Kenzo-Hugo Hillion
- Bioinformatics and Biostatistics HUB, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), Paris, France
| | - Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Anton Khodak
- Igor Sikorsky Kyiv Polytechnic Institute, National Technical University of Ukraine, Kyiv, Ukraine
| | - Eric Rasche
- Lehrstuhl für Bioinformatik, Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
| | | | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Jon Ison
- DTU Bioinformatics, Technical University of Denmark, Copenhagen, Denmark
| | - Hervé Ménager
- Bioinformatics and Biostatistics HUB, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), Paris, France
|
23
|
Abstract
Linux container technologies, as represented by Docker, provide an alternative to complex and time-consuming installation processes needed for scientific software. The ease of deployment and the process isolation they enable, as well as the reproducibility they permit across environments and versions, are among the qualities that make them interesting candidates for the construction of bioinformatic infrastructures, at any scale from single workstations to high throughput computing architectures. The Docker Hub is a public registry which can be used to distribute bioinformatic software as Docker images. However, its lack of curation and its genericity make it difficult for a bioinformatics user to find the most appropriate images. BioShaDock is a bioinformatics-focused Docker registry, which provides a local and fully controlled environment to build and publish bioinformatic software as portable Docker images. It provides a number of improvements over the base Docker registry on authentication and permissions management, which enable its integration in existing bioinformatic infrastructures such as computing platforms. The metadata associated with the registered images are domain-centric, including for instance concepts defined in the EDAM ontology, a shared and structured vocabulary of commonly used terms in bioinformatics. The registry also includes user-defined tags to facilitate discovery, as well as a link to the tool description in the ELIXIR registry if it already exists. If it does not, the BioShaDock registry will synchronize with the ELIXIR registry to create a new description there, based on the BioShaDock entry metadata. This link will help users get more information on the tool, such as its EDAM operations and its input and output types. This allows integration with the ELIXIR Tools and Data Services Registry, thus providing the appropriate visibility of such images to the bioinformatics community.
Affiliation(s)
| | - Olivier Sallou
- Genouest Bioinformatics Facility, University of Rennes 1/IRISA, Rennes, France
| | - Hervé Ménager
- Centre d'Informatique pour la Biologie, C3BI, Institut Pasteur, Paris, France
| | - Yvan Le Bras
- Genouest Bioinformatics Facility, University of Rennes 1/IRISA, Rennes, France
| | - Cyril Monjeaud
- Genouest Bioinformatics Facility, University of Rennes 1/IRISA, Rennes, France
| | - Christophe Blanchet
- Genouest Bioinformatics Facility, University of Rennes 1/IRISA, Rennes, France
| | - Olivier Collin
- French Institute of Bioinformatics, CNRS IFB-Core, Gif-sur-Yvette, France
|
24
|
Ison J, Rapacki K, Ménager H, Kalaš M, Rydza E, Chmura P, Anthon C, Beard N, Berka K, Bolser D, Booth T, Bretaudeau A, Brezovsky J, Casadio R, Cesareni G, Coppens F, Cornell M, Cuccuru G, Davidsen K, Vedova GD, Dogan T, Doppelt-Azeroual O, Emery L, Gasteiger E, Gatter T, Goldberg T, Grosjean M, Grüning B, Helmer-Citterich M, Ienasescu H, Ioannidis V, Jespersen MC, Jimenez R, Juty N, Juvan P, Koch M, Laibe C, Li JW, Licata L, Mareuil F, Mičetić I, Friborg RM, Moretti S, Morris C, Möller S, Nenadic A, Peterson H, Profiti G, Rice P, Romano P, Roncaglia P, Saidi R, Schafferhans A, Schwämmle V, Smith C, Sperotto MM, Stockinger H, Vařeková RS, Tosatto SCE, de la Torre V, Uva P, Via A, Yachdav G, Zambelli F, Vriend G, Rost B, Parkinson H, Løngreen P, Brunak S. Tools and data services registry: a community effort to document bioinformatics resources. Nucleic Acids Res 2015; 44:D38-47. [PMID: 26538599 PMCID: PMC4702812 DOI: 10.1093/nar/gkv1116] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 09/17/2015] [Accepted: 10/13/2015] [Indexed: 01/24/2023] Open
Abstract
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR—the European infrastructure for biological information—that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.
Affiliation(s)
- Jon Ison
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Kristoffer Rapacki
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Hervé Ménager
- Centre d'Informatique pour la Biologie, C3BI, Institut Pasteur, France
| | - Matúš Kalaš
- Computational Biology Unit, Department of Informatics, University of Bergen, Norway
| | - Emil Rydza
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Piotr Chmura
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Christian Anthon
- Department of Veterinary Clinical and Animal Sciences, Faculty for Health and Medical Sciences, University of Copenhagen, Denmark
| | - Niall Beard
- School of Computer Science, University of Manchester, UK
| | - Karel Berka
- Department of Physical Chemistry, RCPTM, Faculty of Science, Palacky University, Czech Republic
| | - Dan Bolser
- The European Bioinformatics Institute (EMBL-EBI), UK
| | - Tim Booth
- NEBC Wallingford, Centre for Ecology and Hydrology, UK
| | - Anthony Bretaudeau
- INRA, UMR Institut de Génétique, Environnement et Protection des Plantes (IGEPP), BioInformatics Platform for Agroecosystems Arthropods (BIPAA), France; INRIA, IRISA, GenOuest Core Facility, France
| | - Jan Brezovsky
- Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, Czech Republic
| | - Rita Casadio
- Bologna Biocomputing Group, University of Bologna, Italy
| | | | - Frederik Coppens
- Department of Plant Systems Biology, VIB, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Belgium
| | | | | | - Kristian Davidsen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | | | - Tunca Dogan
- UniProt, European Bioinformatics Institute (EMBL-EBI), UK
| | | | - Laura Emery
- The European Bioinformatics Institute (EMBL-EBI), UK
| | | | - Thomas Gatter
- Faculty of Technology and Center for Biotechnology, Universität Bielefeld, Germany
| | | | - Marie Grosjean
- Institut Français de Bioinformatique (French Institute of Bioinformatics), CNRS, UMS3601, France
| | - Björn Grüning
- Albert-Ludwigs-Universität Freiburg, Fahnenbergplatz, 79085 Freiburg
| | | | - Hans Ienasescu
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Denmark
| | | | - Martin Closter Jespersen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | | | - Nick Juty
- The European Bioinformatics Institute (EMBL-EBI), UK
| | - Peter Juvan
- Centre for Functional Genomics and Biochips, Faculty of Medicine, University of Ljubljana, Slovenia
| | | | - Camille Laibe
- The European Bioinformatics Institute (EMBL-EBI), UK
| | - Jing-Woei Li
- Faculty of Medicine, The Chinese University of Hong Kong, China; Hong Kong Bioinformatics Centre, School of Life Sciences, The Chinese University of Hong Kong, China
| | - Luana Licata
- Dept. of Biology, University of Rome Tor Vergata, Italy
| | - Fabien Mareuil
- Centre d'Informatique pour la Biologie, C3BI, Institut Pasteur, France
| | - Ivan Mičetić
- Department of Biomedical Sciences, University of Padua, Italy
| | | | - Sebastien Moretti
- SIB Swiss Institute of Bioinformatics, Switzerland; Department of Ecology and Evolution, Biophore, Evolutionary Bioinformatics group, University of Lausanne, Switzerland
| | | | - Steffen Möller
- Department of Dermatology, University of Lübeck, Germany; Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, Germany
| | | | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Estonia
| | | | - Peter Rice
- Department of Computing, William Penney Laboratory, Imperial College London, UK
| | | | | | - Rabie Saidi
- UniProt, European Bioinformatics Institute (EMBL-EBI), UK
| | | | - Veit Schwämmle
- Protein Research Group, Department for Biochemistry and Molecular Biology, University of Southern Denmark, Denmark
| | | | - Maria Maddalena Sperotto
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | | | | | | | - Victor de la Torre
- National Bioinformatics Institute Unit (INB), Fundacion Centro Nacional de Investigaciones Oncologicas, Spain
| | | | - Allegra Via
- Dept. of Physics, Sapienza University, Italy
| | - Guy Yachdav
- Department of Informatics, Bioinformatics-I12, TUM, Germany
| | - Federico Zambelli
- Institute of Biomembranes and Bioenergetics, National Research Council (CNR), and Dept. of Biosciences, University of Milano, Italy
| | - Gert Vriend
- Radboud University Medical Centre, CMBI, Netherlands
| | - Burkhard Rost
- Department of Informatics, Bioinformatics-I12, TUM, Germany
| | | | - Peter Løngreen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark
| | - Søren Brunak
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Denmark; Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
|
25
|
Abby SS, Néron B, Ménager H, Touchon M, Rocha EPC. MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems. PLoS One 2014; 9:e110726. [PMID: 25330359 PMCID: PMC4201578 DOI: 10.1371/journal.pone.0110726] [Citation(s) in RCA: 206] [Impact Index Per Article: 20.6] [Received: 07/15/2014] [Accepted: 09/15/2014] [Indexed: 01/21/2023] Open
Abstract
Motivation: Biologists often wish to use their knowledge of a few experimental models of a given molecular system to identify homologs in genomic data. We developed a generic tool for this purpose. Results: Macromolecular System Finder (MacSyFinder) provides a flexible framework to model the properties of molecular systems (cellular machinery or pathway) including their components, evolutionary associations with other systems and genetic architecture. Modelled features also include functional analogs, and the multiple uses of the same component by different systems. Models are used to search for molecular systems in complete genomes or in unstructured data like metagenomes. The components of the systems are searched by sequence similarity using Hidden Markov model (HMM) protein profiles. The assignment of hits to a given system is decided based on compliance with the content and organization of the system model. A graphical interface, MacSyView, facilitates the analysis of the results by showing overviews of component content and genomic context. To exemplify the use of MacSyFinder we built models to detect and classify CRISPR-Cas systems following a previously established classification. We show that MacSyFinder makes it easy to define an accurate “Cas-finder” using publicly available protein profiles. Availability and Implementation: MacSyFinder is a standalone application implemented in Python. It requires Python 2.7, Hmmer and makeblastdb (version 2.2.28 or higher). It is freely available with its source code under a GPLv3 license at https://github.com/gem-pasteur/macsyfinder. It is compatible with all platforms supporting Python and Hmmer/makeblastdb. The “Cas-finder” (models and HMM profiles) is distributed as a compressed tarball archive as Supporting Information.
Affiliation(s)
- Sophie S. Abby
- Microbial Evolutionary Genomics, Institut Pasteur, Paris, France
- UMR3525, CNRS, Paris, France
| | - Bertrand Néron
- Centre d’Informatique pour la Biologie, Institut Pasteur, Paris, France
| | - Hervé Ménager
- Centre d’Informatique pour la Biologie, Institut Pasteur, Paris, France
| | - Marie Touchon
- Microbial Evolutionary Genomics, Institut Pasteur, Paris, France
- UMR3525, CNRS, Paris, France
| | - Eduardo P. C. Rocha
- Microbial Evolutionary Genomics, Institut Pasteur, Paris, France
- UMR3525, CNRS, Paris, France
|
26
|
Abstract
Motivation: For the biologist, running bioinformatics analyses involves time-consuming management of data and tools. Users need support to organize their work, retrieve parameters and reproduce their analyses. They also need to be able to combine their analytic tools using a safe data flow software mechanism. Finally, given that scientific tools can be difficult to install, it is particularly helpful for biologists to be able to use these tools through a web user interface. However, providing a web interface for a set of tools raises the problem that a single web portal cannot offer all the existing and possible services: it is the user, again, who has to cope with copying data among a number of different services. A framework enabling portal administrators to build a network of cooperating services would therefore clearly be beneficial. Results: We have designed a system, Mobyle, to provide a flexible and usable Web environment for defining and running bioinformatics analyses. It embeds simple yet powerful data management features that allow the user to reproduce analyses and to combine tools using a hierarchical typing system. Mobyle offers invocation of services distributed over remote Mobyle servers, thus enabling a federated network of curated bioinformatics portals without the user having to learn complex concepts or to install sophisticated software. While being focused on the end user, the Mobyle system also addresses the need, for the bioinformatician, to automate remote services execution: PlayMOBY is a companion tool that automates the publication of BioMOBY web services, using Mobyle program definitions. Availability: The Mobyle system is distributed under the terms of the GNU GPLv2 on the project web site (http://bioweb2.pasteur.fr/projects/mobyle/). It is already deployed on three servers: http://mobyle.pasteur.fr, http://mobyle.rpbs.univ-paris-diderot.fr and http://lipm-bioinfo.toulouse.inra.fr/Mobyle. The PlayMOBY companion is distributed under the terms of the CeCILL license, and is available at http://lipm-bioinfo.toulouse.inra.fr/biomoby/PlayMOBY/. Contact: mobyle-support@pasteur.fr; mobyle-support@rpbs.univ-paris-diderot.fr; letondal@pasteur.fr Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bertrand Néron
- Groupe Logiciels et Banques de Données, Institut Pasteur, 28, rue du Dr Roux, 75724 Paris Cedex, France.
|