1. Palmblad M, Asein E, Bergman NP, Ivanova A, Ramasauskas L, Reyes HM, Ruchti S, Soto-Jácome L, Bergquist J. Semantic Annotation of Experimental Methods in Analytical Chemistry. Anal Chem 2022; 94:15464-15471. [PMID: 36281827 PMCID: PMC9647698 DOI: 10.1021/acs.analchem.2c03565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Accepted: 10/12/2022] [Indexed: 11/30/2022]
Abstract
A major obstacle for reusing and integrating existing data is finding the data that is most relevant in a given context. The primary metadata resource is the scientific literature describing the experiments that produced the data. To stimulate the development of natural language processing methods for extracting this information from articles, we have manually annotated 100 recent open access publications in Analytical Chemistry as semantic graphs. We focused on articles mentioning mass spectrometry in their experimental sections, as we are particularly interested in the topic, which is also within the domain of several ontologies and controlled vocabularies. The resulting gold standard dataset is publicly available and directly applicable to validating automated methods for retrieving this metadata from the literature. In the process, we also made a number of observations on the structure and description of experiments and open access publication in this journal.
Affiliation(s)
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
- Enahoro Asein
  - Institute of Chemistry, University of Tartu, Ravila 14a, 50411 Tartu, Estonia
- Nina P. Bergman
  - Analytical Pharmaceutical Chemistry, Department of Medicinal Chemistry - BMC, Uppsala University, SE-75123 Uppsala, Sweden
- Arina Ivanova
  - Analytical Chemistry and Neurochemistry, Department of Chemistry - BMC, Uppsala University, SE-75124 Uppsala, Sweden
- Lukas Ramasauskas
  - Analytical Chemistry and Neurochemistry, Department of Chemistry - BMC, Uppsala University, SE-75124 Uppsala, Sweden
- Stefan Ruchti
  - Institute of Chemistry, University of Tartu, Ravila 14a, 50411 Tartu, Estonia
  - Analytical Chemistry and Neurochemistry, Department of Chemistry - BMC, Uppsala University, SE-75124 Uppsala, Sweden
- Jonas Bergquist
  - Analytical Chemistry and Neurochemistry, Department of Chemistry - BMC, Uppsala University, SE-75124 Uppsala, Sweden
2. Serrano-Solano B, Fouilloux A, Eguinoa I, Kalaš M, Grüning B, Coppens F. Galaxy: A Decade of Realising CWFR Concepts. Data Intelligence 2022. [DOI: 10.1162/dint_a_00136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
Abstract
Despite recent encouragement to follow the FAIR principles, the day-to-day research practices have not changed substantially. Due to new developments and the increasing pressure to apply best practices, initiatives to improve the efficiency and reproducibility of scientific workflows are becoming more prevalent. In this article, we discuss the importance of well-annotated tools and the specific requirements to ensure reproducible research with FAIR outputs. We detail how Galaxy, an open-source workflow management system with a web-based interface, has implemented the concepts that are put forward by the Canonical Workflow Framework for Research (CWFR), whilst minimising changes to the practices of scientific communities. Although we showcase concrete applications from two different domains, this approach is generalisable to any domain and particularly useful in interdisciplinary research and science-based applications.
Affiliation(s)
- Anne Fouilloux
  - Department of Geosciences, University of Oslo, Oslo 0316, Norway
- Ignacio Eguinoa
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB, Gent, Oost-Vlaanderen 9052, Belgium
- Matúš Kalaš
  - Department of Informatics, University of Bergen, Bergen, Hordaland 5008, Norway
- Björn Grüning
  - Bioinformatics Group, University of Freiburg, Baden-Württemberg 79098, Germany
- Frederik Coppens
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium
  - VIB, Gent, Oost-Vlaanderen 9052, Belgium
3. Lamprecht AL, Palmblad M, Ison J, Schwämmle V, Al Manir MS, Altintas I, Baker CJO, Ben Hadj Amor A, Capella-Gutierrez S, Charonyktakis P, Crusoe MR, Gil Y, Goble C, Griffin TJ, Groth P, Ienasescu H, Jagtap P, Kalaš M, Kasalica V, Khanteymoori A, Kuhn T, Mei H, Ménager H, Möller S, Richardson RA, Robert V, Soiland-Reyes S, Stevens R, Szaniszlo S, Verberne S, Verhoeven A, Wolstencroft K. Perspectives on automated composition of workflows in the life sciences. F1000Res 2021; 10:897. [PMID: 34804501 PMCID: PMC8573700 DOI: 10.12688/f1000research.54159.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/27/2021] [Indexed: 12/29/2022] Open
Abstract
Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
Affiliation(s)
- Magnus Palmblad
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Jon Ison
  - French Institute of Bioinformatics, 91057 Évry, France
- Ilkay Altintas
  - University of California San Diego, La Jolla, CA, 92093, USA
- Christopher J. O. Baker
  - University of New Brunswick, Saint John, E2L 4L5, Canada
  - IPSNP Computing Inc., Saint John, E2L 4S6, Canada
- Yolanda Gil
  - University of Southern California, Marina Del Rey, CA, 90292, USA
- Carole Goble
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Paul Groth
  - University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Hans Ienasescu
  - Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Pratik Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Tobias Kuhn
  - VU Amsterdam, 1081 HV Amsterdam, The Netherlands
- Hailiang Mei
  - Sequencing Analysis Support Core, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Steffen Möller
  - IBIMA, Rostock University Medical Center, 18057 Rostock, Germany
- Stian Soiland-Reyes
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
  - Informatics Institute, University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Robert Stevens
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Suzan Verberne
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
- Aswin Verhoeven
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Katherine Wolstencroft
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
4. Schwämmle V, Harrow J, Ienasescu H. Proteomics Software in bio.tools: Coverage and Annotations. J Proteome Res 2021; 20:1821-1825. [PMID: 33720718 DOI: 10.1021/acs.jproteome.0c00978] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
The large diversity of experimental methods in proteomics as well as their increasing usage across biological and clinical research has led to the development of hundreds if not thousands of software tools to aid in the analysis and interpretation of the resulting data. Detailed information about these tools needs to be collected, categorized, and validated to guarantee their optimal utilization. A tools registry like bio.tools enables users and developers to identify new tools with more powerful algorithms or to find tools with similar functions for comparison. Here we present the content of the registry, which now comprises more than 1000 proteomics tool entries. Furthermore, we discuss future applications and engagement with other community efforts resulting in a high impact on the bioinformatics landscape.
Affiliation(s)
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Jennifer Harrow
  - ELIXIR-Hub, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Hans Ienasescu
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
5. Kasalica V, Schwämmle V, Palmblad M, Ison J, Lamprecht AL. APE in the Wild: Automated Exploration of Proteomics Workflows in the bio.tools Registry. J Proteome Res 2021; 20:2157-2165. [PMID: 33720735 PMCID: PMC8041394 DOI: 10.1021/acs.jproteome.0c00983] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
The bio.tools registry is a main catalogue of computational tools in the life sciences. More than 17 000 tools have been registered by the international bioinformatics community. The bio.tools metadata schema includes semantic annotations of tool functions, that is, formal descriptions of tools' data types, formats, and operations with terms from the EDAM bioinformatics ontology. Such annotations enable the automated composition of tools into multistep pipelines or workflows. In this Technical Note, we revisit a previous case study on the automated composition of proteomics workflows. We use the same four workflow scenarios but instead of using a small set of tools with carefully handcrafted annotations, we explore workflows directly on bio.tools. We use the Automated Pipeline Explorer (APE), a reimplementation and extension of the workflow composition method previously used. Moving "into the wild" opens up an unprecedented wealth of tools and a huge number of alternative workflows. Automated composition tools can be used to explore this space of possibilities systematically. Inevitably, the mixed quality of semantic annotations in bio.tools leads to unintended or erroneous tool combinations. However, our results also show that additional control mechanisms (tool filters, configuration options, and workflow constraints) can effectively guide the exploration toward smaller sets of more meaningful workflows.
Affiliation(s)
- Vedran Kasalica
  - Department of Information and Computing Sciences, Utrecht University, Utrecht 3584 CC, The Netherlands
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense 5230, Denmark
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden 2300 RC, The Netherlands
- Jon Ison
  - Institut Français de Bioinformatique, CNRS, Crémieux F-91000, France
- Anna-Lena Lamprecht
  - Department of Information and Computing Sciences, Utrecht University, Utrecht 3584 CC, The Netherlands
6. Ison J, Ienasescu H, Rydza E, Chmura P, Rapacki K, Gaignard A, Schwämmle V, van Helden J, Kalaš M, Ménager H. biotoolsSchema: a formalized schema for bioinformatics software description. Gigascience 2021; 10:giaa157. [PMID: 33506265 PMCID: PMC7842104 DOI: 10.1093/gigascience/giaa157] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/08/2020] [Revised: 11/10/2020] [Accepted: 12/07/2020] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description-and cataloguing-of bioinformatics resources. FINDINGS Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability. CONCLUSIONS biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.
Affiliation(s)
- Jon Ison
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- Hans Ienasescu
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
- Emil Rydza
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
- Piotr Chmura
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
- Kristoffer Rapacki
  - Department of Health Technology, Ørsteds Plads, Building 345C, DK-2800 Kongens, Lyngby, Denmark
- Alban Gaignard
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - L'institut du Thorax, INSERM, CNRS, University of Nantes, 44007 Nantes, France
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Jacques van Helden
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - Département de Biologie, Aix-Marseille Université (AMU), 3 place Victor Hugo, 13003 Marseille, France
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
- Hervé Ménager
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - Hub de Bioinformatique et Biostatistique–Département Biologie Computationnelle, Institut Pasteur, USR 3756, CNRS, Paris 75015, France
7. Salgado D, Armean IM, Baudis M, Beltran S, Capella-Gutierrez S, Carvalho-Silva D, Dominguez Del Angel V, Dopazo J, Furlong LI, Gao B, Garcia L, Gerloff D, Gut I, Gyenesei A, Habermann N, Hancock JM, Hanauer M, Hovig E, Johansson LF, Keane T, Korbel J, Lauer KB, Laurie S, Leskošek B, Lloyd D, Marques-Bonet T, Mei H, Monostory K, Piñero J, Poterlowicz K, Rath A, Samarakoon P, Sanz F, Saunders G, Sie D, Swertz MA, Tsukanov K, Valencia A, Vidak M, Yenyxe González C, Ylstra B, Béroud C. The ELIXIR Human Copy Number Variations Community: building bioinformatics infrastructure for research. F1000Res 2020; 9. [PMID: 34367618 PMCID: PMC8311797 DOI: 10.12688/f1000research.24887.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 07/27/2020] [Indexed: 02/02/2023] Open
Abstract
Copy number variations (CNVs) are major causative contributors both in the genesis of genetic diseases and human neoplasias. While “High-Throughput” sequencing technologies are increasingly becoming the primary choice for genomic screening analysis, their ability to efficiently detect CNVs is still heterogeneous and remains to be developed. The aim of this white paper is to provide a guiding framework for the future contributions of ELIXIR’s recently established human CNV Community, with implications beyond human disease diagnostics and population genomics. This white paper is the direct result of a strategy meeting that took place in September 2018 in Hinxton (UK) and involved representatives of 11 ELIXIR Nodes. The meeting led to the definition of priority objectives and tasks, to address a wide range of CNV-related challenges ranging from detection and interpretation to sharing and training. Here, we provide suggestions on how to align these tasks within the ELIXIR Platforms strategy, and on how to frame the activities of this new ELIXIR Community in the international context.
Affiliation(s)
- Irina M Armean
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Michael Baudis
  - Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Sergi Beltran
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 4, Barcelona 08028, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Salvador Capella-Gutierrez
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Spanish National Bioinformatics Institute (INB)/ELIXIR-ES, Barcelona, Spain
- Denise Carvalho-Silva
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Joaquin Dopazo
  - Clinical Bioinformatics Area, Fundación Progreso y Salud, CDCA, Hospital Virgen del Rocio, Sevilla, Spain
- Laura I Furlong
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
- Bo Gao
  - Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Leyla Garcia
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
  - ZB MED Information Centre for Life Sciences, Cologne, Germany
  - ELIXIR Hub, Hinxton, UK
- Dietlind Gerloff
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg
- Ivo Gut
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 4, Barcelona 08028, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Attila Gyenesei
  - Szentágothai Research Center, University of Pécs, Pécs, Hungary
- Nina Habermann
  - Genome Biology, European Molecular Biological Laboratory, Heidelberg, Germany
- Eivind Hovig
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Centre for bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Lennart F Johansson
  - Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Thomas Keane
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Jan Korbel
  - Genome Biology, European Molecular Biological Laboratory, Heidelberg, Germany
- Steve Laurie
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri Reixac 4, Barcelona 08028, Spain
- Brane Leskošek
  - Faculty of Medicine - ELIXIR Slovenia, University of Ljubljana, Ljubljana, Slovenia
- Tomas Marques-Bonet
  - Institute of Evolutionary Biology (UPF-CSIC), Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- Hailiang Mei
  - Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, The Netherlands
- Katalin Monostory
  - Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Janet Piñero
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
- Pubudu Samarakoon
  - Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- Ferran Sanz
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
- Daoud Sie
  - Department of Clinical Genetics, Cancer Center Amsterdam, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Morris A Swertz
  - Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Kirill Tsukanov
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
  - Spanish National Bioinformatics Institute (INB)/ELIXIR-ES, Barcelona, Spain
  - Catalan Institution of Research and Advanced Studies, Barcelona, Spain
- Marko Vidak
  - Faculty of Medicine - ELIXIR Slovenia, University of Ljubljana, Ljubljana, Slovenia
- Cristina Yenyxe González
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Bauke Ylstra
  - Department of Pathology, Cancer Center Amsterdam, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Christophe Béroud
  - Aix Marseille Univ, INSERM, MMG, Marseille, France
  - Département de Génétique Médicale et de Biologie Cellulaire, APHM, Hôpital d'enfants de la Timone, 13385 Marseille, France
8. Lamprecht AL, Garcia L, Kuzak M, Martinez C, Arcila R, Martin Del Pico E, Dominguez Del Angel V, van de Sandt S, Ison J, Martinez PA, McQuilton P, Valencia A, Harrow J, Psomopoulos F, Gelpi JL, Chue Hong N, Goble C, Capella-Gutierrez S. Towards FAIR principles for research software. Data Science 2020. [DOI: 10.3233/ds-190026] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Indexed: 11/15/2022]
Affiliation(s)
- Leyla Garcia
  - ZBMED Information Centre for Life Sciences, Germany
- Mateusz Kuzak
  - Netherlands eScience Center, The Netherlands
  - Dutch Techcentre for Life Sciences, The Netherlands
- Jon Ison
  - National Life Science Supercomputing Center, Technical University of Denmark, Denmark
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC), Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Spain
- Josep Ll. Gelpi
  - Barcelona Supercomputing Center (BSC), Spain
  - University of Barcelona, Spain
- Neil Chue Hong
  - Software Sustainability Institute, UK
  - EPCC, University of Edinburgh, UK
9. Kasalica V, Lamprecht AL. APE: A Command-Line Tool and API for Automated Workflow Composition. In: Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J, editors. Lecture Notes in Computer Science 2020. [PMCID: PMC7304703 DOI: 10.1007/978-3-030-50436-6_34] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/22/2023]
Abstract
Automated workflow composition is bound to take the work with scientific workflows to the next level. On top of today’s comprehensive eScience infrastructure, it enables the automated generation of possible workflows for a given specification. However, functionality for automated workflow composition tends to be integrated with one of the many available workflow management systems, and is thus difficult or impossible to apply in other environments. Therefore we have developed APE (the Automated Pipeline Explorer) as a command-line tool and API for automated composition of scientific workflows. APE is easily configured to a new application domain by providing it with a domain ontology and semantically annotated tools. It can then be used to synthesize purpose-specific workflows based on a specification of the available workflow inputs, desired outputs and possibly additional constraints. The workflows can further be transformed into executable implementations and/or exported into standard workflow formats. In this paper we describe APE v1.0 and discuss lessons learned from applications in bioinformatics and geosciences.