1
Bonde B. Edge, Fog, and Cloud Against Disease: The Potential of High-Performance Cloud Computing for Pharma Drug Discovery. Methods Mol Biol 2024; 2716:181-202. [PMID: 37702940 DOI: 10.1007/978-1-0716-3449-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
The high-performance computing (HPC) platform for large-scale drug discovery simulation demands significant investment in speciality hardware, maintenance, resource management, and running costs. The rapid growth in computing hardware has made it possible to provide cost-effective, robust, secure, and scalable alternatives to on-premise (on-prem) HPC via Cloud, Fog, and Edge computing. It has enabled recent state-of-the-art machine learning (ML) and artificial intelligence (AI)-based tools for drug discovery, such as BERT, BARD, AlphaFold2, and GPT. This chapter gives an overview of the types of software architecture for developing scientific software or applications with deployment-agnostic (on-prem to cloud and hybrid) use cases. Furthermore, the chapter outlines how this innovation is disrupting the orthodox mindset of monolithic software running on on-prem HPC, and describes the paradigm shift toward microservices-driven application programming interface (API) and message passing interface (MPI)-based scientific computing across distributed, highly available infrastructure. This is coupled with agile DevOps, good coding practices, and low-code and no-code application development frameworks for cost-efficient, secure, automated, and robust scientific application life cycle management.
Affiliation(s)
- Bhushan Bonde
- Evotec (UK) Ltd., Dorothy Crowfoot Hodgkin Campus, Abingdon, Oxfordshire, UK.
- Digital Futures Institute, University of Suffolk, Ipswich, United Kingdom.
2
Fahlgren N, Kapoor M, Yordanova G, Papatheodorou I, Waese J, Cole B, Harrison P, Ware D, Tickle T, Paten B, Burdett T, Elsik CG, Tuggle CK, Provart NJ. Toward a data infrastructure for the Plant Cell Atlas. Plant Physiol 2023; 191:35-46. [PMID: 36200899 PMCID: PMC9806565 DOI: 10.1093/plphys/kiac468] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/14/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas that currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.
Affiliation(s)
- Noah Fahlgren
- Donald Danforth Plant Science Center, Saint Louis, Missouri 63132, USA
- Muskan Kapoor
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Jamie Waese
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
- Benjamin Cole
- DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, 1, Cyclotron Road, Berkeley, California 94720, USA
- Peter Harrison
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Doreen Ware
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
- Timothy Tickle
- Data Sciences Platform, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, Baskin School of Engineering, 1156 High Street, Santa Cruz, California 95064, USA
- Tony Burdett
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine G Elsik
- Division of Animal Sciences/Division of Plant Science & Technology/Institute for Data Science & Informatics, University of Missouri, Columbia, Missouri 65211, USA
- Christopher K Tuggle
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Nicholas J Provart
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
3
Sturtevant C, DeRego E, Metzger S, Ayres E, Allen D, Burlingame T, Catolico N, Cawley K, Csavina J, Durden D, Florian C, Frost S, Gaddie R, Knapp E, Laney C, Lee R, Lenz D, Litt G, Luo H, Roberti J, Slemmons C, Styers K, Tran C, Vance T, SanClements M. A process approach to quality management doubles NEON sensor data quality. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Cove Sturtevant
- National Ecological Observatory Network, Battelle Boulder CO USA
- Elizabeth DeRego
- National Ecological Observatory Network, Battelle Boulder CO USA
- Stefan Metzger
- National Ecological Observatory Network, Battelle Boulder CO USA
- Edward Ayres
- National Ecological Observatory Network, Battelle Boulder CO USA
- Dan Allen
- National Ecological Observatory Network, Battelle Boulder CO USA
- Nora Catolico
- National Ecological Observatory Network, Battelle Boulder CO USA
- Kaelin Cawley
- National Ecological Observatory Network, Battelle Boulder CO USA
- Janae Csavina
- National Ecological Observatory Network, Battelle Boulder CO USA
- David Durden
- National Ecological Observatory Network, Battelle Boulder CO USA
- Shalane Frost
- National Ecological Observatory Network, Battelle Boulder CO USA
- Ross Gaddie
- National Ecological Observatory Network, Battelle Boulder CO USA
- Elizabeth Knapp
- National Ecological Observatory Network, Battelle Boulder CO USA
- Christine Laney
- National Ecological Observatory Network, Battelle Boulder CO USA
- Robert Lee
- National Ecological Observatory Network, Battelle Boulder CO USA
- Dawn Lenz
- National Ecological Observatory Network, Battelle Boulder CO USA
- Guy Litt
- National Ecological Observatory Network, Battelle Boulder CO USA
- Hongyan Luo
- National Ecological Observatory Network, Battelle Boulder CO USA
- Joshua Roberti
- National Ecological Observatory Network, Battelle Boulder CO USA
- Caleb Slemmons
- National Ecological Observatory Network, Battelle Boulder CO USA
- Kevin Styers
- National Ecological Observatory Network, Battelle Boulder CO USA
- Chau Tran
- National Ecological Observatory Network, Battelle Boulder CO USA
- Tanya Vance
- National Ecological Observatory Network, Battelle Boulder CO USA
4
Elisseev V, Gardiner LJ, Krishna R. Scalable in-memory processing of omics workflows. Comput Struct Biotechnol J 2022; 20:1914-1924. [PMID: 35521547 PMCID: PMC9052061 DOI: 10.1016/j.csbj.2022.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 04/11/2022] [Accepted: 04/11/2022] [Indexed: 11/17/2022] [Open Access]
Affiliation(s)
- Vadim Elisseev
- IBM Research Europe, Hartree Centre, Daresbury Laboratory, Keckwick Lane, Warrington WA4 4AD, Cheshire, UK
- Wrexham Glyndwr University, Mold Rd, Wrexham LL11 2AW, Wales, UK
- Laura-Jayne Gardiner
- IBM Research Europe, Hartree Centre, Daresbury Laboratory, Keckwick Lane, Warrington WA4 4AD, Cheshire, UK
- Ritesh Krishna
- IBM Research Europe, Hartree Centre, Daresbury Laboratory, Keckwick Lane, Warrington WA4 4AD, Cheshire, UK
5
Dwight Z. Data Innovation Provides a Smooth Road to Production: Bioinformatics Needs to Accelerate. Clin Chem 2021; 68:264-265. [DOI: 10.1093/clinchem/hvab247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Accepted: 10/19/2021] [Indexed: 11/14/2022]
Affiliation(s)
- Zachary Dwight
- Clinical Pathology Labs, Sonic Healthcare USA—Data Science, Austin, TX, USA
6
Benítez-Hidalgo A, Barba-González C, García-Nieto J, Gutiérrez-Moncayo P, Paneque M, Nebro AJ, Roldán-García MDM, Aldana-Montes JF, Navas-Delgado I. TITAN: A knowledge-based platform for Big Data workflow management. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
7
Wratten L, Wilm A, Göke J. Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. Nat Methods 2021; 18:1161-1168. [PMID: 34556866 DOI: 10.1038/s41592-021-01254-9] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 10/20/2020] [Accepted: 07/29/2021] [Indexed: 02/08/2023]
Abstract
The rapid growth of high-throughput technologies has transformed biomedical research. With the increasing amount and complexity of data, scalability and reproducibility have become essential not just for experiments, but also for computational analysis. However, transforming data into information involves running a large number of tools, optimizing parameters, and integrating dynamically changing reference data. Workflow managers were developed in response to such challenges. They simplify pipeline development, optimize resource usage, handle software installation and versions, and run on different compute platforms, enabling workflow portability and sharing. In this Perspective, we highlight key features of workflow managers, compare commonly used approaches for bioinformatics workflows, and provide a guide for computational and noncomputational users. We outline community-curated pipeline initiatives that enable novice and experienced users to perform complex, best-practice analyses without having to manually assemble workflows. In sum, we illustrate how workflow managers contribute to making computational analysis in biomedical research shareable, scalable, and reproducible.
Affiliation(s)
- Jonathan Göke
- Genome Institute of Singapore, Singapore, Singapore.
8
Ma L, Peterson EA, Shin IJ, Muesse J, Marino K, Steliga MA, Johann DJ. NPARS-A Novel Approach to Address Accuracy and Reproducibility in Genomic Data Science. Front Big Data 2021; 4:725095. [PMID: 34647017 PMCID: PMC8503682 DOI: 10.3389/fdata.2021.725095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Accepted: 09/07/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
Background: Accuracy and reproducibility are vital in science and present a significant challenge in the emerging discipline of data science, especially when the data are scientifically complex and massive in size. Further complicating matters, in the field of genomic-based science, high-throughput sequencing technologies generate considerable amounts of data that need to be stored, manipulated, and analyzed using a plethora of software tools. Researchers are rarely able to reproduce published genomic studies. Results: Presented is a novel approach which facilitates accuracy and reproducibility for large genomic research data sets. All data needed are loaded into a portable local database, which serves as an interface for well-known software frameworks. These include Python-based Jupyter Notebooks and the use of RStudio projects and R Markdown. All software is encapsulated using Docker containers and managed by Git, simplifying software configuration management. Conclusion: Accuracy and reproducibility in science are of paramount importance. For the biomedical sciences, advances in high-throughput technologies, molecular biology and quantitative methods are providing unprecedented insights into disease mechanisms. With these insights comes the associated challenge of scientific data that are complex and massive in size. This makes collaboration, verification, validation, and reproducibility of findings difficult. To address these challenges, the NGS post-pipeline accuracy and reproducibility system (NPARS) was developed. NPARS is a robust software infrastructure and methodology that can encapsulate data, code, and reporting for large genomic studies. This paper demonstrates the successful use of NPARS on large and complex genomic data sets across different computational platforms.
Affiliation(s)
- Li Ma
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, AR, United States
- Erich A. Peterson
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Ik Jae Shin
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Jason Muesse
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Katy Marino
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Matthew A. Steliga
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Donald J. Johann
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, United States
9
John A, Muenzen K, Ausmees K. Evaluation of serverless computing for scalable execution of a joint variant calling workflow. PLoS One 2021; 16:e0254363. [PMID: 34242357 PMCID: PMC8270184 DOI: 10.1371/journal.pone.0254363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2020] [Accepted: 06/24/2021] [Indexed: 11/18/2022] [Open Access]
Abstract
Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been widely evaluated for standardized genomic workflows. In this study, we define and execute a best-practice joint variant calling workflow using the SWEEP workflow management system. We present an analysis of performance and scalability, and discuss the utility of the serverless paradigm for executing workflows in the field of genomics research. The GATK best-practice short germline joint variant calling pipeline was implemented as a SWEEP workflow comprising 18 tasks. The workflow was executed on Illumina paired-end read samples from the European and African super populations of the 1000 Genomes project phase III. Cost and runtime increased linearly with increasing sample size, although runtime was driven primarily by a single task for larger problem sizes. Execution took a minimum of around 3 hours for 2 samples, up to nearly 13 hours for 62 samples, with costs ranging from $2 to $70.
Affiliation(s)
- Aji John
- Department of Biology, University of Washington, Seattle, Washington, United States of America
- Kathleen Muenzen
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
- Kristiina Ausmees
- Department of Information Technology, Uppsala University, Uppsala, Sweden
10
Spjuth O, Frid J, Hellander A. The machine learning life cycle and the cloud: implications for drug discovery. Expert Opin Drug Discov 2021; 16:1071-1079. [PMID: 34057379 DOI: 10.1080/17460441.2021.1932812] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/30/2022]
Abstract
Introduction: Artificial intelligence (AI) and machine learning (ML) are increasingly used in many aspects of drug discovery. Larger data sizes and methods such as Deep Neural Networks contribute to challenges in data management, the required software stack, and computational infrastructure. There is an increasing need in drug discovery to continuously re-train models and make them available in production environments. Areas covered: This article describes how cloud computing can aid the ML life cycle in drug discovery. The authors discuss opportunities with containerization and scientific workflows and introduce the concept of MLOps and describe how it can facilitate reproducible and robust ML modeling in drug discovery organizations. They also discuss ML on private, sensitive and regulated data. Expert opinion: Cloud computing offers a compelling suite of building blocks to sustain the ML life cycle integrated in iterative drug discovery. Containerization and platforms such as Kubernetes together with scientific workflows can enable reproducible and resilient analysis pipelines, and the elasticity and flexibility of cloud infrastructures enables scalable and efficient access to compute resources. Drug discovery commonly involves working with sensitive or private data, and cloud computing and federated learning can contribute toward enabling collaborative drug discovery within and between organizations. Abbreviations: AI = Artificial Intelligence; DL = Deep Learning; GPU = Graphics Processing Unit; IaaS = Infrastructure as a Service; K8S = Kubernetes; ML = Machine Learning; MLOps = Machine Learning and Operations; PaaS = Platform as a Service; QC = Quality Control; SaaS = Software as a Service.
Affiliation(s)
- Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Scaleout Systems AB, Sweden
- Andreas Hellander
- Scaleout Systems AB, Sweden
- Department of Information Technology, Uppsala University, Sweden
11
Blamey B, Toor S, Dahlö M, Wieslander H, Harrison PJ, Sintorn IM, Sabirsh A, Wählby C, Spjuth O, Hellander A. Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit. Gigascience 2021; 10:giab018. [PMID: 33739401 PMCID: PMC7976223 DOI: 10.1093/gigascience/giab018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/14/2020] [Revised: 01/26/2021] [Accepted: 02/23/2021] [Indexed: 11/22/2022] [Open Access]
Abstract
BACKGROUND Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered "data hierarchy". We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources. FINDINGS In our pipeline model, an "interestingness function" assigns an interestingness score to data objects in the stream, inducing a data hierarchy. From this score, a "policy" guides decisions on how to prioritize computational resource use for a given object. The HASTE Toolkit is a collection of tools to adopt this approach. We evaluate with 2 microscopy imaging case studies. The first is a high content screening experiment, where images are analyzed in an on-premise container cloud to prioritize storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for real-time control of a transmission electron microscope. CONCLUSIONS Through our evaluation, we created smart data pipelines capable of effective use of storage, compute, and network resources, enabling more efficient data-intensive experiments. We note a beneficial separation between scientific concerns of data priority, and the implementation of this behaviour for different resources in different deployment contexts. The toolkit allows intelligent prioritization to be "bolted on" to new and existing systems, and is intended for use with a range of technologies in different deployment scenarios.
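The pipeline model summarized in this abstract (an interestingness function scores each object, and a policy maps the score to a resource decision) can be illustrated with a minimal Python sketch. This is not the HASTE Toolkit API: the `DataObject` type, the scoring rule, the thresholds, and the action names below are all hypothetical and chosen only to show the separation between scoring and policy.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class DataObject:
    """Hypothetical stand-in for one item in the stream (e.g. an image)."""
    name: str
    payload: List[float]


def interestingness(obj: DataObject) -> float:
    """Hypothetical score in [0, 1]: fraction of payload values above 0.5."""
    if not obj.payload:
        return 0.0
    return sum(v > 0.5 for v in obj.payload) / len(obj.payload)


def policy(score: float) -> str:
    """Hypothetical policy: map a score to a resource decision."""
    if score >= 0.75:
        return "process_now"
    if score >= 0.25:
        return "store"
    return "discard"


def prioritize(stream: Iterable[DataObject]) -> List[Tuple[str, float, str]]:
    """Score every object, order by score (highest first), attach the decision."""
    scored = [(interestingness(obj), obj) for obj in stream]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(obj.name, score, policy(score)) for score, obj in scored]


if __name__ == "__main__":
    stream = [
        DataObject("c", [0.1, 0.2]),
        DataObject("a", [0.9, 0.8, 0.7, 0.6]),
        DataObject("b", [0.9, 0.1, 0.2, 0.6]),
    ]
    for name, score, action in prioritize(stream):
        print(name, score, action)
```

Keeping `interestingness` and `policy` as separate functions mirrors the separation the abstract notes between the scientific question of data priority and how that priority is acted on in a given deployment.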
Affiliation(s)
- Ben Blamey
- Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden
- Salman Toor
- Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden
- Martin Dahlö
- Department of Pharmaceutical Biosciences, Uppsala University, Husargatan 3, 75237, Uppsala, Sweden
- Science for Life Laboratory, Uppsala University, Husargatan 3, 75237 Uppsala, Sweden
- Håkan Wieslander
- Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden
- Philip J Harrison
- Department of Pharmaceutical Biosciences, Uppsala University, Husargatan 3, 75237, Uppsala, Sweden
- Science for Life Laboratory, Uppsala University, Husargatan 3, 75237 Uppsala, Sweden
- Ida-Maria Sintorn
- Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden
- Science for Life Laboratory, Uppsala University, Husargatan 3, 75237 Uppsala, Sweden
- Vironova AB, Gävlegatan 22, 11330 Stockholm, Sweden
- Alan Sabirsh
- Advanced Drug Delivery, Pharmaceutical Sciences, R&D, AstraZeneca, Pepparedsleden 1, 43183 Mölndal, Sweden
- Carolina Wählby
- Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden
- Science for Life Laboratory, Uppsala University, Husargatan 3, 75237 Uppsala, Sweden
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Husargatan 3, 75237, Uppsala, Sweden
- Science for Life Laboratory, Uppsala University, Husargatan 3, 75237 Uppsala, Sweden
- Andreas Hellander
- Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden
12
DeepCell Kiosk: scaling deep learning-enabled cellular image analysis with Kubernetes. Nat Methods 2021; 18:43-45. [PMID: 33398191 DOI: 10.1038/s41592-020-01023-0] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 10/15/2019] [Accepted: 11/23/2020] [Indexed: 12/24/2022]
Abstract
Deep learning is transforming the analysis of biological images, but applying these models to large datasets remains challenging. Here we describe the DeepCell Kiosk, cloud-native software that dynamically scales deep learning workflows to accommodate large imaging datasets. To demonstrate the scalability and affordability of this software, we identified cell nuclei in 10^6 1-megapixel images in ~5.5 h for ~US$250, with a cost below US$100 achievable depending on cluster configuration. The DeepCell Kiosk can be downloaded at https://github.com/vanvalenlab/kiosk-console; a persistent deployment is available at https://deepcell.org/.
13
Capuccini M, Dahlö M, Toor S, Spjuth O. MaRe: Processing Big Data with application containers on Apache Spark. Gigascience 2020; 9:giaa042. [PMID: 32369166 PMCID: PMC7199472 DOI: 10.1093/gigascience/giaa042] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/09/2019] [Revised: 02/10/2020] [Accepted: 04/07/2020] [Indexed: 11/18/2022] [Open Access]
Abstract
BACKGROUND Life science is increasingly driven by Big Data analytics, and the MapReduce programming model has been proven successful for data-intensive analyses. However, current MapReduce frameworks offer poor support for reusing existing processing tools in bioinformatics pipelines. Furthermore, these frameworks do not have native support for application containers, which are becoming popular in scientific data processing. RESULTS Here we present MaRe, an open source programming library that introduces support for Docker containers in Apache Spark. Apache Spark and Docker are the MapReduce framework and container engine that have collected the largest open source community; thus, MaRe provides interoperability with the cutting-edge software ecosystem. We demonstrate MaRe on 2 data-intensive applications in life science, showing ease of use and scalability. CONCLUSIONS MaRe enables scalable data-intensive processing in life science with Apache Spark and application containers. When compared with current best practices, which involve the use of workflow systems, MaRe has the advantage of providing data locality, ingestion from heterogeneous storage systems, and interactive processing. MaRe is generally applicable and available as open source software.
Affiliation(s)
- Marco Capuccini
- Department of Information Technology, Uppsala University, Box 337, 75105, Uppsala, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
- Martin Dahlö
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
- Science for Life Laboratory, Uppsala University, Box 591, 751 24, Uppsala, Sweden
- Uppsala Multidisciplinary Center for Advanced Computational Science, Uppsala University, Box 337, 75105, Uppsala, Sweden
- Salman Toor
- Department of Information Technology, Uppsala University, Box 337, 75105, Uppsala, Sweden
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
14
Carlsson H, Abujrais S, Herman S, Khoonsari PE, Åkerfeldt T, Svenningsson A, Burman J, Kultima K. Targeted metabolomics of CSF in healthy individuals and patients with secondary progressive multiple sclerosis using high-resolution mass spectrometry. Metabolomics 2020; 16:26. [PMID: 32052189 PMCID: PMC7015966 DOI: 10.1007/s11306-020-1648-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/27/2019] [Accepted: 02/01/2020] [Indexed: 12/24/2022]
Abstract
INTRODUCTION Standardized commercial kits enable targeted metabolomics analysis and may thus provide an attractive complement to the more explorative approaches. The kits are typically developed for triple quadrupole mass spectrometers using serum and plasma. OBJECTIVES Here we measure the concentrations of preselected metabolites in cerebrospinal fluid (CSF) using a kit developed for high-resolution mass spectrometry (HRMS). Secondarily, the study aimed to investigate metabolite alterations in patients with secondary progressive multiple sclerosis (SPMS) compared to controls. METHODS We performed targeted metabolomics in human CSF on twelve SPMS patients and twelve age and sex-matched healthy controls using the Absolute IDQ-p400 kit (Biocrates Life Sciences AG) developed for HRMS. The extracts were analysed using two methods; liquid chromatography-mass spectrometry (LC-HRMS) and flow injection analysis-MS (FIA-HRMS). RESULTS Out of 408 targeted metabolites, 196 (48%) were detected above limit of detection and 35 were absolutely quantified. Metabolites analyzed using LC-HRMS had a median coefficient of variation (CV) of 3% and 2.5% between reinjections the same day and after prolonged storage, respectively. The corresponding results for the FIA-HRMS were a median CV of 27% and 21%, respectively. We found significantly (p < 0.05) elevated levels of glycine, asymmetric dimethylarginine (ADMA), glycerophospholipid PC-O (34:0) and sum of hexoses in SPMS patients compared to controls. CONCLUSION The Absolute IDQ-p400 kit could successfully be used for quantifying targeted metabolites in the CSF. Metabolites quantified using LC-HRMS showed superior reproducibility compared to FIA-HRMS.
Affiliation(s)
- Henrik Carlsson
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
- Sandy Abujrais
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
- Stephanie Herman
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
- Payam Emami Khoonsari
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
- Torbjörn Åkerfeldt
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden
- Anders Svenningsson
- Department of Clinical Sciences, Danderyd Hospital, Karolinska Institutet, Stockholm, Sweden
- Joachim Burman
- Department of Neuroscience, Uppsala University, Uppsala, Sweden
- Kim Kultima
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, Uppsala University Hospital, Entrance 61, 3rd Floor, Dag Hammarskjölds Väg 18, 751 85, Uppsala, Sweden.
15
Capuccini M, Larsson A, Carone M, Novella JA, Sadawi N, Gao J, Toor S, Spjuth O. On-demand virtual research environments using microservices. PeerJ Comput Sci 2019; 5:e232. [PMID: 33816885 PMCID: PMC7924445 DOI: 10.7717/peerj-cs.232] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/21/2019] [Accepted: 10/10/2019] [Indexed: 06/12/2023]
Abstract
The computational demands for scientific applications are continuously increasing. The emergence of cloud computing has enabled on-demand resource allocation. However, relying solely on infrastructure as a service does not achieve the degree of flexibility required by the scientific community. Here we present a microservice-oriented methodology, where scientific applications run in a distributed orchestration platform as software containers, referred to as on-demand, virtual research environments. The methodology is vendor agnostic and we provide an open source implementation that supports the major cloud providers, offering scalable management of scientific pipelines. We demonstrate applicability and scalability of our methodology in life science applications, but the methodology is general and can be applied to other scientific domains.
Affiliation(s)
- Marco Capuccini
- Department of Information Technology, Uppsala University, Uppsala, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Anders Larsson
- National Bioinformatics Infrastructure Sweden, Uppsala University, Uppsala, Sweden
| | - Matteo Carone
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| | - Jon Ander Novella
- National Bioinformatics Infrastructure Sweden, Uppsala University, Uppsala, Sweden
| | - Noureddin Sadawi
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| | - Jianliang Gao
- Department of Surgery and Cancer, Imperial College London, London, United Kingdom
| | - Salman Toor
- Department of Information Technology, Uppsala University, Uppsala, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden
| |
|
16
|
Mammoliti A, Smirnov P, Safikhani Z, Ba-Alawi W, Haibe-Kains B. Creating reproducible pharmacogenomic analysis pipelines. Sci Data 2019; 6:166. [PMID: 31481707 PMCID: PMC6722117 DOI: 10.1038/s41597-019-0174-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/17/2019] [Accepted: 08/13/2019] [Indexed: 02/01/2023] Open
Abstract
The field of pharmacogenomics presents great challenges for researchers that are willing to make their studies reproducible and shareable. This is attributed to the generation of large volumes of high-throughput multimodal data, and the lack of standardized workflows that are robust, scalable, and flexible to perform large-scale analyses. To address this issue, we developed pharmacogenomic workflows in the Common Workflow Language to process two breast cancer datasets in a reproducible and transparent manner. Our pipelines combine both pharmacological and molecular profiles into a portable data object that can be used for future analyses in cancer research. Our data objects and workflows are shared on Harvard Dataverse and Code Ocean where they have been assigned a unique Digital Object Identifier, providing a level of data provenance and a persistent location to access and share our data with the community.
Affiliation(s)
- Anthony Mammoliti
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
| | - Petr Smirnov
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
| | - Zhaleh Safikhani
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
| | - Wail Ba-Alawi
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
| | - Benjamin Haibe-Kains
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Ontario Institute of Cancer Research, Toronto, Ontario, Canada.
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada.
| |
|
17
|
Lampa S, Dahlö M, Alvarsson J, Spjuth O. SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines. Gigascience 2019; 8:giz044. [PMID: 31029061 PMCID: PMC6486472 DOI: 10.1093/gigascience/giz044] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/06/2018] [Revised: 03/03/2019] [Accepted: 03/28/2019] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as with nested loops, dynamic scheduling, and parametrization, which is common in, e.g., machine learning.
FINDINGS: SciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics, and a transcriptomics pipeline.
CONCLUSIONS: SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.
Affiliation(s)
- Samuel Lampa
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, 751 24, Uppsala, Sweden
- Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Svante Arrhenius väg 16C, 106 91, Solna, Sweden
| | - Martin Dahlö
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Jonathan Alvarsson
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratory, Uppsala University, Box 591, 751 24, Uppsala, Sweden
| |
|
18
|
Peters K, Bradbury J, Bergmann S, Capuccini M, Cascante M, de Atauri P, Ebbels TMD, Foguet C, Glen R, Gonzalez-Beltran A, Günther UL, Handakas E, Hankemeier T, Haug K, Herman S, Holub P, Izzo M, Jacob D, Johnson D, Jourdan F, Kale N, Karaman I, Khalili B, Emami Khonsari P, Kultima K, Lampa S, Larsson A, Ludwig C, Moreno P, Neumann S, Novella JA, O'Donovan C, Pearce JTM, Peluso A, Piras ME, Pireddu L, Reed MAC, Rocca-Serra P, Roger P, Rosato A, Rueedi R, Ruttkies C, Sadawi N, Salek RM, Sansone SA, Selivanov V, Spjuth O, Schober D, Thévenot EA, Tomasoni M, van Rijswijk M, van Vliet M, Viant MR, Weber RJM, Zanetti G, Steinbeck C. PhenoMeNal: processing and analysis of metabolomics data in the cloud. Gigascience 2019; 8:giy149. [PMID: 30535405 PMCID: PMC6377398 DOI: 10.1093/gigascience/giy149] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 09/06/2018] [Revised: 10/19/2018] [Accepted: 11/20/2018] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological, and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories, and data analysis tools. However, the rapid progress has resulted in a mosaic of independent, and sometimes incompatible, analysis methods that are difficult to connect into a useful and complete data analysis solution.
FINDINGS: PhenoMeNal (Phenome and Metabolome aNalysis) is an advanced and complete solution to set up Infrastructure-as-a-Service (IaaS) that brings workflow-oriented, interoperable metabolomics data analysis platforms into the cloud. PhenoMeNal seamlessly integrates a wide array of existing open-source tools that are tested and packaged as Docker containers through the project's continuous integration process and deployed based on a Kubernetes orchestration framework. It also provides a number of standardized, automated, and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi, and Pachyderm.
CONCLUSIONS: PhenoMeNal constitutes a keystone solution in cloud e-infrastructures available for metabolomics. PhenoMeNal is a unique and complete solution for setting up cloud e-infrastructures through easy-to-use web interfaces that can be scaled to any custom public and private cloud environment. By harmonizing and automating software installation and configuration and through ready-to-use scientific workflow user interfaces, PhenoMeNal has succeeded in providing scientists with workflow-driven, reproducible, and shareable metabolomics data analysis platforms that are interfaced through standard data formats, representative datasets, versioned, and have been tested for reproducibility and interoperability. The elastic implementation of PhenoMeNal further allows easy adaptation of the infrastructure to other application areas and 'omics research domains.
Affiliation(s)
- Kristian Peters
- Leibniz Institute of Plant Biochemistry, Stress and Developmental Biology, Weinberg 3, 06120 Halle (Saale), Germany
| | - James Bradbury
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
| | - Sven Bergmann
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Marco Capuccini
- Division of Scientific Computing, Department of Information Technology, Uppsala University, Sweden
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24 Uppsala, Sweden
| | - Marta Cascante
- Department of Biochemistry and Molecular Biomedicine, Universitat de Barcelona; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III (ISCIII), Spain
| | - Pedro de Atauri
- Department of Biochemistry and Molecular Biomedicine, Universitat de Barcelona; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III (ISCIII), Spain
| | - Timothy M D Ebbels
- Department of Surgery & Cancer, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom
| | - Carles Foguet
- Department of Biochemistry and Molecular Biomedicine, Universitat de Barcelona; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III (ISCIII), Spain
| | - Robert Glen
- Department of Surgery & Cancer, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB21EW, United Kingdom
| | - Alejandra Gonzalez-Beltran
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, OX1 3QG, Oxford, United Kingdom
| | - Ulrich L Günther
- Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
| | - Evangelos Handakas
- Department of Surgery & Cancer, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom
| | - Thomas Hankemeier
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research (LACDR), Leiden University, Leiden, 2333 CC, The Netherlands
| | - Kenneth Haug
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Stephanie Herman
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24 Uppsala, Sweden
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, 751 85 Uppsala, Sweden
| | | | - Massimiliano Izzo
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, OX1 3QG, Oxford, United Kingdom
| | - Daniel Jacob
- INRA, University of Bordeaux, Plateforme Métabolome Bordeaux-MetaboHUB, 33140 Villenave d'Ornon, France
| | - David Johnson
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, OX1 3QG, Oxford, United Kingdom
- Department of Informatics and Media, Uppsala University, Box 513, 751 20 Uppsala, Sweden
| | - Fabien Jourdan
- INRA - French National Institute for Agricultural Research, UMR1331, Toxalim, Research Centre in Food Toxicology, Toulouse, France
| | - Namrata Kale
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Ibrahim Karaman
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, St. Mary's Campus, Norfolk Place, W2 1PG, London, United Kingdom
| | - Bita Khalili
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Payam Emami Khonsari
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, 751 85 Uppsala, Sweden
| | - Kim Kultima
- Department of Medical Sciences, Clinical Chemistry, Uppsala University, 751 85 Uppsala, Sweden
| | - Samuel Lampa
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24 Uppsala, Sweden
| | - Anders Larsson
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24 Uppsala, Sweden
- National Bioinformatics Infrastructure Sweden, Uppsala University, Uppsala, Sweden
| | - Christian Ludwig
- Institute of Metabolism and Systems Research (IMSR), University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
| | - Pablo Moreno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Steffen Neumann
- Leibniz Institute of Plant Biochemistry, Stress and Developmental Biology, Weinberg 3, 06120 Halle (Saale), Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany
| | - Jon Ander Novella
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24 Uppsala, Sweden
- National Bioinformatics Infrastructure Sweden, Uppsala University, Uppsala, Sweden
| | - Claire O'Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Jake T M Pearce
- Department of Surgery & Cancer, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom
| | - Alina Peluso
- Department of Surgery & Cancer, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom
| | | | | | - Michelle A C Reed
- Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
| | - Philippe Rocca-Serra
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, OX1 3QG, Oxford, United Kingdom
| | - Pierrick Roger
- CEA, LIST, Laboratory for Data Analysis and Systems’ Intelligence, MetaboHUB, Gif-Sur-Yvette F-91191, France
| | - Antonio Rosato
- Magnetic Resonance Center (CERM) and Department of Chemistry, University of Florence and CIRMMP, 50019 Sesto Fiorentino, Florence, Italy
| | - Rico Rueedi
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Christoph Ruttkies
- Leibniz Institute of Plant Biochemistry, Stress and Developmental Biology, Weinberg 3, 06120 Halle (Saale), Germany
| | - Noureddin Sadawi
- Department of Computer Science, College of Engineering, Design and Physical Sciences, Brunel University, London, UK
- Department of Surgery & Cancer, Imperial College London, South Kensington, London, SW7 2AZ, United Kingdom
| | - Reza M Salek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, OX1 3QG, Oxford, United Kingdom
| | - Vitaly Selivanov
- Department of Biochemistry and Molecular Biomedicine, Universitat de Barcelona; Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Instituto de Salud Carlos III (ISCIII), Spain
| | - Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24 Uppsala, Sweden
| | - Daniel Schober
- Leibniz Institute of Plant Biochemistry, Stress and Developmental Biology, Weinberg 3, 06120 Halle (Saale), Germany
| | - Etienne A Thévenot
- CEA, LIST, Laboratory for Data Analysis and Systems’ Intelligence, MetaboHUB, Gif-Sur-Yvette F-91191, France
| | - Mattia Tomasoni
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Merlijn van Rijswijk
- Netherlands Metabolomics Center, Leiden, 2333 CC, Netherlands
- ELIXIR-NL, Dutch Techcentre for Life Sciences, Utrecht, 3503 RM, Netherlands
| | - Michael van Vliet
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research (LACDR), Leiden University, Leiden, 2333 CC, The Netherlands
| | - Mark R Viant
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
- Phenome Centre Birmingham, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
| | - Ralf J M Weber
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
- Phenome Centre Birmingham, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
| | | | - Christoph Steinbeck
- Cheminformatics and Computational Metabolomics, Institute for Analytical Chemistry, Lessingstr. 8, 07743 Jena, Germany
| |
|