1. Bouyssié D, Altıner P, Capella-Gutierrez S, Fernández JM, Hagemeijer YP, Horvatovich P, Hubálek M, Levander F, Mauri P, Palmblad M, Raffelsberger W, Rodríguez-Navas L, Di Silvestre D, Kunkli BT, Uszkoreit J, Vandenbrouck Y, Vizcaíno JA, Winkelhardt D, Schwämmle V. WOMBAT-P: Benchmarking Label-Free Proteomics Data Analysis Workflows. J Proteome Res 2024; 23:418-429. [PMID: 38038272 DOI: 10.1021/acs.jproteome.3c00636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.
Affiliation(s)
- David Bouyssié
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III-Paul Sabatier (UT3), 31062 Toulouse, France
  - Proteomics French Infrastructure, ProFI, FR 2048 Toulouse, France
- Pınar Altıner
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III-Paul Sabatier (UT3), 31062 Toulouse, France
- José M Fernández
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Yanick Paco Hagemeijer
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, 9712 CP Groningen, The Netherlands
  - European Research Institute for the Biology of Ageing, University Medical Center Groningen, 9713 GZ Groningen, The Netherlands
- Peter Horvatovich
  - Department of Analytical Biochemistry, University of Groningen, Groningen Research Institute of Pharmacy, 9712 CP Groningen, The Netherlands
- Martin Hubálek
  - Institute of Organic Chemistry and Biochemistry, CAS, 160 00 Prague, Czech Republic
- Fredrik Levander
  - National Bioinformatics Infrastructure Sweden (NBIS), Science for Life Laboratory, Department of Immunotechnology, Lund University, 22100 Lund, Sweden
- Pierluigi Mauri
  - Institute for Biomedical Technologies (ITB), Department of Biomedical Sciences, National Research Council (CNR), Segrate, 20054 Milan, Italy
- Magnus Palmblad
  - Leiden University Medical Center, Postbus 9600, 2300 RC Leiden, The Netherlands
- Wolfgang Raffelsberger
  - Institut de Génétique et de Biologie Moléculaire et Cellulaire, Université de Strasbourg, CNRS UMR7104, INSERM U1258, Illkirch, 1 Rue Laurent Fries, 67404 Illkirch, France
- Laura Rodríguez-Navas
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Dario Di Silvestre
  - Institute for Biomedical Technologies (ITB), Department of Biomedical Sciences, National Research Council (CNR), Segrate, 20054 Milan, Italy
- Balázs Tibor Kunkli
  - Department of Biochemistry and Molecular Biology, University of Debrecen, 4032 Debrecen, Hungary
- Julian Uszkoreit
  - Medical Faculty, Medical Bioinformatics, Ruhr University Bochum, 44801 Bochum, Germany
  - Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Ruhr University Bochum, 44801 Bochum, Germany
  - Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Yves Vandenbrouck
  - Proteomics French Infrastructure, ProFI, FR 2048 Toulouse, France
  - CEA, Fundamental Research Division, Proteomics French Infrastructure, 91191 Gif-sur-Yvette, France
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI), Wellcome Trust, Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Dirk Winkelhardt
  - Medical Faculty, Medizinisches Proteom-Center, Ruhr University Bochum, 44801 Bochum, Germany
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark
2. Du X, Dastmalchi F, Diller MA, Brochhausen M, Garrett TJ, Hogan WR, Lemas DJ. An Automated Workflow Composition System for Liquid Chromatography-Mass Spectrometry Metabolomics Data Processing. J Am Soc Mass Spectrom 2023; 34:2857-2863. [PMID: 37874901 DOI: 10.1021/jasms.3c00248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2023]
Abstract
Liquid chromatography-mass spectrometry (LC-MS) metabolomics studies produce high-dimensional data that must be processed by a complex network of informatics tools to generate analysis-ready data sets. As the first computational step in metabolomics, data processing is increasingly becoming a challenge for researchers to develop customized computational workflows that are applicable for LC-MS metabolomics analysis. Ontology-based automated workflow composition (AWC) systems provide a feasible approach for developing computational workflows that consume high-dimensional molecular data. We used the Automated Pipeline Explorer (APE) to create an AWC for LC-MS metabolomics data processing across three use cases. Our results show that APE predicted 145 data processing workflows across all the three use cases. We identified six traditional workflows and six novel workflows. Through manual review, we found that one-third of novel workflows were executable whereby the data processing function could be completed without obtaining an error. When selecting the top six workflows from each use case, the computational viable rate of our predicted workflows reached 45%. Collectively, our study demonstrates the feasibility of developing an AWC system for LC-MS metabolomics data processing.
Affiliation(s)
- Xinsong Du
  - Division of General Internal Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, United States
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, United States
- Farhad Dastmalchi
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida 32610, United States
- Matthew A Diller
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida 32610, United States
- Mathias Brochhausen
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas 72205, United States
- Timothy J Garrett
  - Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, Florida 32610, United States
- William R Hogan
  - Data Science Institute, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, United States
- Dominick J Lemas
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida 32610, United States
  - Department of Obstetrics and Gynecology, College of Medicine, University of Florida, Gainesville, Florida 32610, United States
  - Center for Perinatal Outcomes Research, College of Medicine, University of Florida, Gainesville, Florida 32610, United States
3. Fu J, Zhu F, Xu CJ, Li Y. Metabolomics meets systems immunology. EMBO Rep 2023; 24:e55747. [PMID: 36916532 PMCID: PMC10074123 DOI: 10.15252/embr.202255747] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/08/2022] [Revised: 12/24/2022] [Accepted: 02/24/2023] [Indexed: 03/16/2023] Open
Abstract
Metabolic processes play a critical role in immune regulation. Metabolomics is the systematic analysis of small molecules (metabolites) in organisms or biological samples, providing an opportunity to comprehensively study interactions between metabolism and immunity in physiology and disease. Integrating metabolomics into systems immunology allows the exploration of the interactions of multilayered features in the biological system and the molecular regulatory mechanism of these features. Here, we provide an overview on recent technological developments of metabolomic applications in immunological research. To begin, two widely used metabolomics approaches are compared: targeted and untargeted metabolomics. Then, we provide a comprehensive overview of the analysis workflow and the computational tools available, including sample preparation, raw spectra data preprocessing, data processing, statistical analysis, and interpretation. Third, we describe how to integrate metabolomics with other omics approaches in immunological studies using available tools. Finally, we discuss new developments in metabolomics and its prospects for immunology research. This review provides guidance to researchers using metabolomics and multiomics in immunity research, thus facilitating the application of systems immunology to disease research.
Affiliation(s)
- Jianbo Fu
  - Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany
  - TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Cheng-Jian Xu
  - Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany
  - TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
  - Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
- Yang Li
  - Centre for Individualised Infection Medicine (CiiM), a joint venture between the Helmholtz Centre for Infection Research (HZI) and Hannover Medical School (MHH), Hannover, Germany
  - TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Helmholtz Centre for Infection Research (HZI) and the Hannover Medical School (MHH), Hannover, Germany
  - Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, Nijmegen, The Netherlands
4. Du X, Dastmalchi F, Ye H, Garrett TJ, Diller MA, Liu M, Hogan WR, Brochhausen M, Lemas DJ. Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software. Metabolomics 2023; 19:11. [PMID: 36745241 DOI: 10.1007/s11306-023-01974-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/08/2022] [Accepted: 01/20/2023] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles - Findability, Accessibility, Interoperability, and Reusability - were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS).
AIM OF REVIEW: This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in the LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software.
KEY SCIENTIFIC CONCEPTS OF REVIEW: We evaluated 124 LC-HRMS metabolomics data processing software obtained from a systematic review and selected 61 software for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature along with internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment of software were 21.6%, 47.7%, and 71.8%. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria covering multiple FAIR4RS categories but had a low %fulfillment: (1) No software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software had official software containerization or virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. According to the results, we discussed improvement strategies and future directions.
Affiliation(s)
- Xinsong Du
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Farhad Dastmalchi
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Hao Ye
  - Health Science Center Libraries, University of Florida, Florida, USA
- Timothy J Garrett
  - Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Florida, USA
- Matthew A Diller
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Mei Liu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Mathias Brochhausen
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, USA
- Dominick J Lemas
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
  - Department of Obstetrics and Gynecology, University of Florida College of Medicine, Florida, Gainesville, United States
  - Center for Perinatal Outcomes Research, University of Florida College of Medicine, Gainesville, United States
5. Du X, Aristizabal-Henao JJ, Garrett TJ, Brochhausen M, Hogan WR, Lemas DJ. A Checklist for Reproducible Computational Analysis in Clinical Metabolomics Research. Metabolites 2022; 12:87. [PMID: 35050209 PMCID: PMC8779534 DOI: 10.3390/metabo12010087] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/10/2021] [Revised: 12/25/2021] [Accepted: 01/10/2022] [Indexed: 12/15/2022] Open
Abstract
Clinical metabolomics emerged as a novel approach for biomarker discovery with the translational potential to guide next-generation therapeutics and precision health interventions. However, reproducibility in clinical research employing metabolomics data is challenging. Checklists are a helpful tool for promoting reproducible research. Existing checklists that promote reproducible metabolomics research primarily focused on metadata and may not be sufficient to ensure reproducible metabolomics data processing. This paper provides a checklist including actions that need to be taken by researchers to make computational steps reproducible for clinical metabolomics studies. We developed an eight-item checklist that includes criteria related to reusable data sharing and reproducible computational workflow development. We also provided recommended tools and resources to complete each item, as well as a GitHub project template to guide the process. The checklist is concise and easy to follow. Studies that follow this checklist and use recommended resources may facilitate other researchers to reproduce metabolomics results easily and efficiently.
Affiliation(s)
- Xinsong Du
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Timothy J. Garrett
  - Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Mathias Brochhausen
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- William R. Hogan
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Dominick J. Lemas
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
6. Lamprecht AL, Palmblad M, Ison J, Schwämmle V, Al Manir MS, Altintas I, Baker CJO, Ben Hadj Amor A, Capella-Gutierrez S, Charonyktakis P, Crusoe MR, Gil Y, Goble C, Griffin TJ, Groth P, Ienasescu H, Jagtap P, Kalaš M, Kasalica V, Khanteymoori A, Kuhn T, Mei H, Ménager H, Möller S, Richardson RA, Robert V, Soiland-Reyes S, Stevens R, Szaniszlo S, Verberne S, Verhoeven A, Wolstencroft K. Perspectives on automated composition of workflows in the life sciences. F1000Res 2021; 10:897. [PMID: 34804501 PMCID: PMC8573700 DOI: 10.12688/f1000research.54159.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/27/2021] [Indexed: 12/29/2022] Open
Abstract
Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
Affiliation(s)
- Magnus Palmblad
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Jon Ison
  - French Institute of Bioinformatics, 91057 Évry, France
- Ilkay Altintas
  - University of California San Diego, La Jolla, CA, 92093, USA
- Christopher J. O. Baker
  - University of New Brunswick, Saint John, E2L 4L5, Canada
  - IPSNP Computing Inc., Saint John, E2L 4S6, Canada
- Yolanda Gil
  - University of Southern California, Marina Del Rey, CA, 90292, USA
- Carole Goble
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Paul Groth
  - University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Hans Ienasescu
  - Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Pratik Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA
- Tobias Kuhn
  - VU Amsterdam, 1081 HV Amsterdam, The Netherlands
- Hailiang Mei
  - Sequencing Analysis Support Core, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Steffen Möller
  - IBIMA, Rostock University Medical Center, 18057 Rostock, Germany
- Stian Soiland-Reyes
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
  - Informatics Institute, University of Amsterdam, 1090 GH Amsterdam, The Netherlands
- Robert Stevens
  - Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK
- Suzan Verberne
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands
- Aswin Verhoeven
  - Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands
- Katherine Wolstencroft
  - Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands