1
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. mSphere 2024:e0079323. [PMID: 38780289 DOI: 10.1128/msphere.00793-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/17/2024] [Indexed: 05/25/2024]
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification, and prioritization of microbial proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, offering the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), Unipept (for taxonomic and functional annotation), and MSstatsTMT (for statistical analysis). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.
IMPORTANCE Clinical metaproteomics has immense potential to offer functional insights into the microbiome and its contributions to human disease. However, there are numerous challenges in the metaproteomic analysis of clinical samples, including handling of very large protein sequence databases for sensitive and accurate peptide and protein identification from mass spectrometry data, as well as taxonomic and functional annotation of quantified peptides and proteins to enable interpretation of results. To address these challenges, we have developed a novel clinical metaproteomics workflow that provides customized bioinformatic identification, verification, quantification, and taxonomic and functional annotation. This bioinformatic workflow is implemented in the Galaxy ecosystem and has been used to characterize diverse clinical sample types, such as nasopharyngeal swabs and bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness and availability for use by the research community via analysis of residual fluid from cervical swabs.
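As a reading aid, the modular structure named in this abstract can be written down as an ordered list of stages. The sketch below is only an illustration: the `Stage` class, the `WORKFLOW` tuple, and the `describe` helper are hypothetical names, and treating the order in which the abstract lists the tools as the execution order is an assumption. It does not call Galaxy or any of the named tools.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One module of the workflow: the tool(s) involved and their stated role."""
    tools: Tuple[str, ...]
    purpose: str


# Tools and roles are copied from the abstract; the ordering is an assumption.
WORKFLOW = (
    Stage(("MetaNovo",), "generate a reduced protein database"),
    Stage(("SearchGUI/PeptideShaker", "MaxQuant"), "generate PSMs and quantification"),
    Stage(("PepQuery2",), "verify the quality of PSMs"),
    Stage(("Unipept",), "taxonomic and functional annotation"),
    Stage(("MSstatsTMT",), "statistical analysis"),
)


def describe(workflow: Tuple[Stage, ...]) -> List[str]:
    """Return one numbered, human-readable line per stage."""
    return [
        f"{index}. {' + '.join(stage.tools)}: {stage.purpose}"
        for index, stage in enumerate(workflow, start=1)
    ]


if __name__ == "__main__":
    print("\n".join(describe(WORKFLOW)))
```

Keeping each stage as data rather than as code mirrors the modularity the abstract emphasizes: a stage can be swapped or reordered without touching the others.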
Affiliation(s)
- Katherine Do
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Reid Wagner
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
- Dechen Bhuming
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Andrew T Rajczewski
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Amy P N Skubitz
  - Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota, USA
- James E Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
- Timothy J Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Pratik D Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
2
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. bioRxiv 2023:2023.11.21.568121. [PMID: 38045370 PMCID: PMC10690215 DOI: 10.1101/2023.11.21.568121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, which are usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification and prioritization of microbial and host proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, offering the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant (to generate peptide-spectral matches (PSMs) and quantification), PepQuery2 (to verify the quality of PSMs), and Unipept and MSstatsTMT (for taxonomy and functional annotation). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.
3
Wang XY, Xu YM, Lau ATY. Proteogenomics in Cancer: Then and Now. J Proteome Res 2023; 22:3103-3122. [PMID: 37725793 DOI: 10.1021/acs.jproteome.3c00196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/21/2023]
Abstract
For years, sequencing technologies and mass spectrometry have developed in isolation, each with its own unique culture and expertise. These two technologies are crucial for inspecting complementary aspects of the molecular phenotype across the central dogma. Integrative multiomics strives to bridge the analysis gap among different fields to build a more comprehensive picture of the mechanisms of life events and diseases. Proteogenomics is one integrated multiomics field. In this review, we mainly summarize and discuss three aspects: the workflow of proteogenomics, proteogenomics applications in cancer research, and the SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis of proteogenomics in cancer research. In conclusion, proteogenomics has a promising future as it clarifies the functional consequences of many unannotated genomic abnormalities or noncanonical variants and identifies driver genes and novel therapeutic targets across cancers, which would substantially accelerate the development of precision oncology.
Affiliation(s)
- Xiu-Yun Wang
  - Laboratory of Cancer Biology and Epigenetics, Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, People's Republic of China
- Yan-Ming Xu
  - Laboratory of Cancer Biology and Epigenetics, Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, People's Republic of China
- Andy T Y Lau
  - Laboratory of Cancer Biology and Epigenetics, Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, People's Republic of China
4
Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements include availability of software that spans diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains this software and the associated training resources to empower researcher-driven analyses. EXPERT OPINION The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
Affiliation(s)
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Matthias Bernt
  - Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
- Matthias Fahrner
  - Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Melanie Christine Föll
  - Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Bjoern Gruening
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Carlos Horro
  - Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- James E Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Valentin Loux
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
  - Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
- Andrew T Rajczewski
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Oliver Schilling
  - Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- W C Mike Thang
  - Queensland Cyber Infrastructure Foundation (QCIF), Australia
  - Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Cameron Hyde
  - Queensland Cyber Infrastructure Foundation (QCIF), Australia
  - Sippy Downs, University of the Sunshine Coast, Australia
- Gareth Price
  - Queensland Cyber Infrastructure Foundation (QCIF), Australia
  - Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Pratik D Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Timothy J Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
5
Gardner L, Kostarelos K, Mallick P, Dive C, Hadjidemetriou M. Nano-omics: nanotechnology-based multidimensional harvesting of the blood-circulating cancerome. Nat Rev Clin Oncol 2022; 19:551-561. [PMID: 35739399 DOI: 10.1038/s41571-022-00645-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Accepted: 05/10/2022] [Indexed: 02/08/2023]
Abstract
Over the past decade, the development of 'simple' blood tests that enable cancer screening, diagnosis or monitoring and facilitate the design of personalized therapies without the need for invasive tumour biopsy sampling has been a core ambition in cancer research. Data emerging from ongoing biomarker development efforts indicate that multiple markers, used individually or as part of a multimodal panel, are required to enhance the sensitivity and specificity of assays for early stage cancer detection. The discovery of cancer-associated molecular alterations that are reflected in blood at multiple dimensions (genome, epigenome, transcriptome, proteome and metabolome) and integration of the resultant multi-omics data have the potential to uncover novel biomarkers as well as to further elucidate the underlying molecular pathways. Herein, we review key advances in multi-omics liquid biopsy approaches and introduce the 'nano-omics' paradigm: the development and utilization of nanotechnology tools for the enrichment and subsequent omics analysis of the blood-circulating cancerome.
Affiliation(s)
- Lois Gardner
  - Nanomedicine Lab, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
  - Cancer Research UK Manchester Institute Cancer Biomarker Centre, The University of Manchester, Manchester, UK
- Kostas Kostarelos
  - Nanomedicine Lab, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
  - Catalan Institute of Nanoscience & Nanotechnology (ICN2), UAB Campus, Barcelona, Spain
- Parag Mallick
  - Canary Center at Stanford for Cancer Early Detection, Stanford University, California, USA
- Caroline Dive
  - Cancer Research UK Manchester Institute Cancer Biomarker Centre, The University of Manchester, Manchester, UK
- Marilena Hadjidemetriou
  - Nanomedicine Lab, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
6
Rajczewski AT, Han Q, Mehta S, Kumar P, Jagtap PD, Knutson CG, Fox JG, Tretyakova NY, Griffin TJ. Quantitative Proteogenomic Characterization of Inflamed Murine Colon Tissue Using an Integrated Discovery, Verification, and Validation Proteogenomic Workflow. Proteomes 2022; 10:proteomes10020011. [PMID: 35466239 PMCID: PMC9036229 DOI: 10.3390/proteomes10020011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Revised: 03/27/2022] [Accepted: 04/07/2022] [Indexed: 11/24/2022]
Abstract
Chronic inflammation of the colon causes genomic and/or transcriptomic events, which can lead to expression of non-canonical protein sequences contributing to oncogenesis. To better understand these mechanisms, Rag2−/−Il10−/− mice were infected with Helicobacter hepaticus to induce chronic inflammation of the cecum and the colon. Transcriptomic data from harvested proximal colon samples were used to generate a customized FASTA database containing non-canonical protein sequences. Using a proteogenomic approach, mass spectrometry data for proximal colon proteins were searched against this custom FASTA database using the Galaxy for Proteomics (Galaxy-P) platform. In addition to the increased abundance in inflammatory response proteins, we also discovered several non-canonical peptide sequences derived from unique proteoforms. We confirmed the veracity of these novel sequences using an automated bioinformatics verification workflow with targeted MS-based assays for peptide validation. Our bioinformatics discovery workflow identified 235 putative non-canonical peptide sequences, of which 58 were verified with high confidence and 39 were validated in targeted proteomics assays. This study provides insights into challenges faced when identifying non-canonical peptides using a proteogenomics approach and demonstrates an integrated workflow addressing these challenges. Our bioinformatic discovery and verification workflow is publicly available and accessible via the Galaxy platform and should be valuable in non-canonical peptide identification using proteogenomics.
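The peptide counts reported in this abstract (235 putative, 58 verified, 39 validated) can be turned into retention percentages with a few lines of arithmetic. The sketch below is illustrative only: the `COUNTS` dictionary and the `retention` helper are hypothetical names, and reading the 39 validated peptides as a subset of the 58 verified ones is an assumption, since the abstract does not state it explicitly.

```python
# Counts as reported in the abstract.
COUNTS = {"putative": 235, "verified": 58, "validated": 39}


def retention(kept: int, total: int) -> float:
    """Percentage of `total` that is retained in `kept`, rounded to one decimal."""
    return round(100 * kept / total, 1)


if __name__ == "__main__":
    # Share of putative peptides that passed bioinformatic verification.
    print(retention(COUNTS["verified"], COUNTS["putative"]))    # 24.7
    # Share of verified peptides validated by targeted assays
    # (assumes validated peptides are a subset of verified ones).
    print(retention(COUNTS["validated"], COUNTS["verified"]))   # 67.2
    # Share of the original putative set that survived both steps.
    print(retention(COUNTS["validated"], COUNTS["putative"]))   # 16.6
```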
Affiliation(s)
- Andrew T. Rajczewski
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Qiyuan Han
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Praveen Kumar
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Charles G. Knutson
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- James G. Fox
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Natalia Y. Tretyakova
  - Department of Medicinal Chemistry, the Masonic Cancer Center, University of Minnesota, Minneapolis, MN 55455, USA
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
7
Karimi MR, Karimi AH, Abolmaali S, Sadeghi M, Schmitz U. Prospects and challenges of cancer systems medicine: from genes to disease networks. Brief Bioinform 2021; 23:6361045. [PMID: 34471925 PMCID: PMC8769701 DOI: 10.1093/bib/bbab343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 12/20/2022]
Abstract
It is becoming evident that holistic perspectives toward cancer are crucial in deciphering the overwhelming complexity of tumors. Single-layer analysis of genome-wide data has greatly contributed to our understanding of cellular systems and their perturbations. However, fundamental gaps in our knowledge persist and hamper the design of effective interventions. It is becoming more apparent than ever that cancer should be viewed not only as a disease of the genome but also as a disease of the cellular system. Integrative multilayer approaches are emerging as vigorous assets in our endeavors to achieve systemic views on cancer biology. Herein, we provide a comprehensive review of the approaches, methods and technologies that can serve to achieve systemic perspectives of cancer. We start with genome-wide single-layer approaches of omics analyses of cellular systems and move on to multilayer integrative approaches in which in-depth descriptions of proteogenomics and network-based data analysis are provided. Proteogenomics is a remarkable example of how the integration of multiple levels of information can reduce our blind spots and increase the accuracy and reliability of our interpretations, while network-based data analysis is a major approach for data interpretation and a robust scaffold for data integration and modeling. Overall, this review aims to increase cross-field awareness of the approaches and challenges regarding the omics-based study of cancer and to facilitate the necessary shift toward holistic approaches.
Affiliation(s)
- Mehdi Sadeghi
  - Department of Cell & Molecular Biology, Semnan University, Semnan, Iran
- Ulf Schmitz
  - Department of Molecular & Cell Biology, James Cook University, Townsville, QLD 4811, Australia
8
Tsang O, Wong JWH. Proteogenomic interrogation of cancer cell lines: an overview of the field. Expert Rev Proteomics 2021; 18:221-232. [PMID: 33877947 DOI: 10.1080/14789450.2021.1914594] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
Introduction: Cancer cell lines (CCLs) have been a major resource for cancer research. Over the past couple of decades, they have been instrumental in omic profiling method development and as model systems to generate new knowledge in cell and cancer biology. More recently, with the increasing amount of genomic, transcriptomic and proteomic data being generated in hundreds of CCLs, there is growing potential for integrative proteogenomic data analyses to be performed. Areas covered: In this review, we first describe the most commonly used proteome profiling methods in CCLs. We then discuss how these proteomics data can be integrated with genomics data for proteogenomics analyses. Finally, we highlight some of the recent biological discoveries that have arisen from proteogenomics analyses of CCLs. Expert opinion: Proteogenomics analyses of CCLs have so far enabled the discovery of novel proteins and proteoforms. They have also improved our understanding of biological processes including post-transcriptional regulation of protein abundance and the presentation of antigens by major histocompatibility complex alleles. With proteomics data to be generated in hundreds to thousands of CCLs in coming years, there will be further potential for large-scale proteogenomics analyses and data integration with the phenotypically well-characterized CCLs.
Affiliation(s)
- Olson Tsang
  - Centre for PanorOmic Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
- Jason W H Wong
  - Centre for PanorOmic Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
  - School of Biomedical Sciences, The University of Hong Kong, Pokfulam, Hong Kong SAR
9
Sajulga R, Easterly C, Riffle M, Mesuere B, Muth T, Mehta S, Kumar P, Johnson J, Gruening BA, Schiebenhoefer H, Kolmeder CA, Fuchs S, Nunn BL, Rudney J, Griffin TJ, Jagtap PD. Survey of metaproteomics software tools for functional microbiome analysis. PLoS One 2020; 15:e0241503. [PMID: 33170893 PMCID: PMC7654790 DOI: 10.1371/journal.pone.0241503] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/12/2020] [Accepted: 10/15/2020] [Indexed: 11/23/2022]
Abstract
To gain a thorough appreciation of microbiome dynamics, researchers characterize the functional relevance of expressed microbial genes or proteins. This can be accomplished through metaproteomics, which characterizes the protein expression of microbiomes. Several software tools exist for analyzing microbiomes at the functional level by measuring their combined proteome-level response to environmental perturbations. In this survey, we explore the performance of six available tools, to enable researchers to make informed decisions regarding software choice based on their research goals. Tandem mass spectrometry-based proteomic data obtained from dental caries plaque samples grown with and without sucrose in paired biofilm reactors were used as representative data for this evaluation. Microbial peptides from one sample pair were identified by the X! Tandem search algorithm via SearchGUI and subjected to functional analysis using software tools including eggNOG-mapper, MEGAN5, MetaGOmics, MetaProteomeAnalyzer (MPA), ProPHAnE, and Unipept to generate functional annotation through Gene Ontology (GO) terms. Among these software tools, notable differences in functional annotation were detected after comparing differentially expressed protein functional groups. Based on the generated GO terms of these tools we performed a peptide-level comparison to evaluate the quality of their functional annotations. A BLAST analysis against the NCBI non-redundant database revealed that the sensitivity and specificity of functional annotation varied between tools. For example, eggNOG-mapper mapped to the largest number of GO terms, while Unipept generated more accurate GO terms. Based on our evaluation, metaproteomics researchers can choose the software according to their analytical needs and developers can use the resulting feedback to further optimize their algorithms. To make more of these tools accessible via scalable metaproteomics workflows, eggNOG-mapper and Unipept 4.0 were incorporated into the Galaxy platform.
Affiliation(s)
- Ray Sajulga
  - University of Minnesota, Minneapolis, Minnesota, United States of America
- Caleb Easterly
  - University of Minnesota, Minneapolis, Minnesota, United States of America
- Michael Riffle
  - University of Washington, Seattle, Washington, United States of America
- Thilo Muth
  - Federal Institute for Materials Research and Testing, Berlin, Germany
- Subina Mehta
  - University of Minnesota, Minneapolis, Minnesota, United States of America
- Praveen Kumar
  - University of Minnesota, Minneapolis, Minnesota, United States of America
- James Johnson
  - University of Minnesota, Minneapolis, Minnesota, United States of America
- Brook L. Nunn
  - University of Washington, Seattle, Washington, United States of America
- Joel Rudney
  - University of Minnesota, Minneapolis, Minnesota, United States of America
- Timothy J. Griffin
  - University of Minnesota, Minneapolis, Minnesota, United States of America
- Pratik D. Jagtap
  - University of Minnesota, Minneapolis, Minnesota, United States of America
10
Precursor Intensity-Based Label-Free Quantification Software Tools for Proteomic and Multi-Omic Analysis within the Galaxy Platform. Proteomes 2020; 8:proteomes8030015. [PMID: 32650610 PMCID: PMC7563855 DOI: 10.3390/proteomes8030015] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/12/2020] [Revised: 07/06/2020] [Accepted: 07/07/2020] [Indexed: 01/15/2023]
Abstract
For mass spectrometry-based peptide and protein quantification, label-free quantification (LFQ) based on precursor mass peak (MS1) intensities is considered reliable due to its dynamic range, reproducibility, and accuracy. LFQ enables peptide-level quantitation, which is useful in proteomics (analyzing peptides carrying post-translational modifications) and multi-omics studies such as metaproteomics (analyzing taxon-specific microbial peptides) and proteogenomics (analyzing non-canonical sequences). Bioinformatics workflows accessible via the Galaxy platform have proven useful for analysis of such complex multi-omic studies. However, workflows within the Galaxy platform have lacked well-tested LFQ tools. In this study, we have evaluated moFF and FlashLFQ, two open-source LFQ tools, and implemented them within the Galaxy platform to offer access and use via established workflows. Through rigorous testing and communication with the tool developers, we have optimized the performance of each tool. Software features evaluated include: (a) match-between-runs (MBR); (b) using multiple file-formats as input for improved quantification; (c) use of containers and/or conda packages; (d) parameters needed for analyzing large datasets; and (e) optimization and validation of software performance. This work establishes a process for software implementation, optimization, and validation, and offers access to two robust software tools for LFQ-based analysis within the Galaxy platform.
11
McGowan T, Johnson JE, Kumar P, Sajulga R, Mehta S, Jagtap PD, Griffin TJ. Multi-omics Visualization Platform: An extensible Galaxy plug-in for multi-omics data visualization and exploration. Gigascience 2020; 9:giaa025. [PMID: 32236523 PMCID: PMC7102281 DOI: 10.1093/gigascience/giaa025] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/22/2019] [Revised: 02/13/2020] [Accepted: 02/24/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND Proteogenomics integrates genomics, transcriptomics, and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integration of disparate 'omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing, and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation. FINDINGS MVP is built as an HTML Galaxy plug-in, primarily based on JavaScript. Via the Galaxy API, MVP uses SQLite databases as input: a custom data type (mzSQLite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides based on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information, along with genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrative Genomics Viewer (IGV) JavaScript framework, enabling interactive visualization of peptides and corresponding transcript and genomic coding information within the MVP interface. CONCLUSIONS MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomic results within the Galaxy environment, adding a unique and critically needed tool for empowering exploration and interpretation of results. The platform is extensible, providing a basis for further development of new functionalities for proteogenomic data visualization.
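The filtering step described in this abstract (selecting identified peptides by sequence and by a quality threshold from an SQLite input) can be illustrated with Python's standard `sqlite3` module. This is a sketch under stated assumptions, not MVP's implementation: the `psm` table, its `sequence` and `score` columns, the example rows, and the `filter_peptides` helper are all hypothetical and do not reflect the actual mzSQLite schema.

```python
import sqlite3
from typing import List, Tuple


def filter_peptides(
    conn: sqlite3.Connection, min_score: float, contains: str = ""
) -> List[Tuple[str, float]]:
    """Return (sequence, score) rows at or above `min_score`, best first.

    `contains` optionally restricts results to sequences holding that substring.
    """
    cursor = conn.execute(
        "SELECT sequence, score FROM psm "
        "WHERE score >= ? AND sequence LIKE ? "
        "ORDER BY score DESC",
        (min_score, f"%{contains}%"),
    )
    return cursor.fetchall()


# Hypothetical in-memory stand-in for a peptide identification table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE psm (sequence TEXT, score REAL)")
conn.executemany(
    "INSERT INTO psm VALUES (?, ?)",
    [("PEPTIDEK", 0.99), ("ELVISLIVESK", 0.42), ("SAMPLER", 0.87)],
)

if __name__ == "__main__":
    print(filter_peptides(conn, 0.8))          # two rows pass the threshold
    print(filter_peptides(conn, 0.8, "PEP"))   # only the PEPTIDEK row
```

Parameterized queries (`?` placeholders) keep user-supplied thresholds and sequence fragments out of the SQL text itself, which matters for any tool that builds filters interactively.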
Affiliation(s)
- Thomas McGowan
  - Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant Street SE, Minneapolis, MN 55455, USA
- James E Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, 599 Walter Library, 117 Pleasant Street SE, Minneapolis, MN 55455, USA
- Praveen Kumar
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
  - Bioinformatics and Computational Biology program, University of Minnesota-Rochester, 111 South Broadway, Suite 300, Rochester, MN 55904, USA
- Ray Sajulga
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
- Pratik D Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
- Timothy J Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, 6–155 Jackson Hall, 321 Church Street SE, Minneapolis, MN 55455, USA
|
12
|
Hulstaert N, Shofstahl J, Sachsenberg T, Walzer M, Barsnes H, Martens L, Perez-Riverol Y. ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion. J Proteome Res 2019; 19:537-542. [PMID: 31755270 DOI: 10.1021/acs.jproteome.9b00328] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Indexed: 12/22/2022]
Abstract
The field of computational proteomics is approaching the big data age, driven both by continuous growth in the number of samples analyzed per experiment and by the growing amount of data obtained in each analytical run. To process these large amounts of data, it is increasingly necessary to use elastic compute resources such as Linux-based cluster environments and cloud infrastructures. Unfortunately, the vast majority of cross-platform proteomics tools cannot operate directly on the proprietary formats generated by the diverse mass spectrometers. Here, we present ThermoRawFileParser, an open-source, cross-platform tool that converts Thermo RAW files into open file formats such as MGF and the HUPO-PSI standard file format mzML. To ensure the broadest possible availability and to increase integration capabilities with popular workflow systems such as Galaxy or Nextflow, we have also built a Conda package and a BioContainers container around ThermoRawFileParser. In addition, we implemented a user-friendly interface (ThermoRawFileParserGUI) for users not familiar with command-line tools. Finally, we benchmarked ThermoRawFileParser against msconvert to verify that the converted mzML files contain reliable quantitative results.
Affiliation(s)
- Niels Hulstaert
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent B-9000, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent B-9000, Belgium
- Jim Shofstahl
  - Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, California 95134, United States
- Timo Sachsenberg
  - Applied Bioinformatics, Department for Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Germany
- Mathias Walzer
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Harald Barsnes
  - Computational Biology Unit (CBU), Department of Informatics, University of Bergen, Bergen 5020, Norway
  - Proteomics Unit (PROBE), Department of Biomedicine, University of Bergen, Bergen 5020, Norway
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent B-9000, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent B-9000, Belgium
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
|
13
|
Hubler SL, Kumar P, Mehta S, Easterly C, Johnson JE, Jagtap PD, Griffin TJ. Challenges in Peptide-Spectrum Matching: A Robust and Reproducible Statistical Framework for Removing Low-Accuracy, High-Scoring Hits. J Proteome Res 2019; 19:161-173. [DOI: 10.1021/acs.jproteome.9b00478] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
|
14
|
Ang MY, Low TY, Lee PY, Wan Mohamad Nazarie WF, Guryev V, Jamal R. Proteogenomics: From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine. Clin Chim Acta 2019; 498:38-46. [DOI: 10.1016/j.cca.2019.08.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/22/2019] [Revised: 08/13/2019] [Accepted: 08/13/2019] [Indexed: 12/14/2022]
|
15
|
González-Gomariz J, Guruceaga E, López-Sánchez M, Segura V. Proteogenomics in the context of the Human Proteome Project (HPP). Expert Rev Proteomics 2019; 16:267-275. [PMID: 30654666 DOI: 10.1080/14789450.2019.1571916] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
Introduction: The technological and scientific progress achieved in the Human Proteome Project (HPP) has given the scientific community a new set of experimental and bioinformatic methods in the challenging field of shotgun and SRM/MRM-based proteomics. The requirements for a protein to be considered experimentally validated are now well established, and information about the human proteome is available in the neXtProt database, while targeted proteomic assays are stored in SRMAtlas. However, the study of the missing proteins remains an outstanding issue.
Areas covered: This review focuses on the implementation of proteogenomic methods designed to improve the detection and validation of the missing proteins. The evolution of methodological strategies based on combining different omic technologies and using large publicly available datasets is presented, taking the Chromosome 16 Consortium as a reference.
Expert commentary: Proteogenomics and other data analysis strategies implemented within the C-HPP initiative could serve as guidance for completing the catalog of human proteins in the near future. In addition, in the coming years we will probably see them used in the B/D-HPP initiative to advance understanding of the role of proteins in human biology and disease.
Affiliation(s)
- José González-Gomariz
  - Bioinformatics Platform, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
  - IdiSNA, Navarra Institute for Health Research, Pamplona, Spain
- Elizabeth Guruceaga
  - Bioinformatics Platform, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
  - IdiSNA, Navarra Institute for Health Research, Pamplona, Spain
- Macarena López-Sánchez
  - Bioinformatics Platform, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
- Victor Segura
  - Bioinformatics Platform, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
  - IdiSNA, Navarra Institute for Health Research, Pamplona, Spain
|
16
|
Guillot L, Delage L, Viari A, Vandenbrouck Y, Com E, Ritter A, Lavigne R, Marie D, Peterlongo P, Potin P, Pineau C. Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes. BMC Genomics 2019; 20:56. [PMID: 30654742 PMCID: PMC6337836 DOI: 10.1186/s12864-019-5431-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/26/2018] [Accepted: 01/03/2019] [Indexed: 01/02/2023]
Abstract
Background: Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. The prediction of gene structure remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, taking advantage of the combination of proteomics datasets and bioinformatics tools, to identify novel protein-coding genes and splice isoforms, assign correct start sites, and validate predicted exons and genes.
Results: Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus sp., a key reference genome for both the brown algal lineage and stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus sp. strains and sub-cellular fractions using a shotgun approach. First, we directly generated peptide sequence tags (PSTs) from the proteomics data. Second, we mapped PSTs onto the translated genomic sequence. Closely located hits (i.e., PST locations on the genome) were then clustered to detect potential coding regions based on parameters optimized for the organism. Third, we evaluated each cluster and compared it to gene predictions from existing conventional genome annotation approaches. Finally, we integrated cluster locations into GFF files for use in a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase, and corrected the gene structure of a dihydrolipoamide acetyltransferase. We experimentally validated the results by RT-PCR and using transcriptomics data.
Conclusions: Peptimapper is a complementary tool for the expert annotation of genomes. It is suitable for any organism and is distributed through a Docker image available on two public bioinformatics Docker repositories: Docker Hub and BioShaDock. This workflow is also accessible through the Galaxy framework, for use by non-computer scientists, at https://galaxy.protim.eu.
Data are available via ProteomeXchange under identifier PXD010618.
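Illustrative sketch (not taken from the article): the step in which closely located hits are clustered to detect potential coding regions can be pictured as a simple gap-based grouping of sorted genomic positions. This is a stand-in technique chosen for illustration, not Peptimapper's actual algorithm; the function names and the `max_gap` and `min_hits` parameters are hypothetical, and the sketch ignores strand, reading frame, and contig.

```python
def cluster_hits(positions, max_gap):
    """Group positions so that consecutive members are at most max_gap apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            # Close enough to the previous hit: extend the current cluster.
            clusters[-1].append(pos)
        else:
            # Gap too large (or first hit): start a new cluster.
            clusters.append([pos])
    return clusters


def cluster_spans(positions, max_gap, min_hits=2):
    """Return (start, end) for every cluster with at least min_hits members."""
    return [
        (cluster[0], cluster[-1])
        for cluster in cluster_hits(positions, max_gap)
        if len(cluster) >= min_hits
    ]


if __name__ == "__main__":
    hits = [900, 100, 5000, 130, 950]
    print(cluster_hits(hits, max_gap=100))
    print(cluster_spans(hits, max_gap=100))
```

The resulting spans are the kind of candidate regions that could then be compared with existing gene predictions; an isolated hit is dropped by the `min_hits` filter.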
Affiliation(s)
- Laetitia Guillot
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
- Ludovic Delage
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
- Alain Viari
  - INRIA Grenoble-Rhône-Alpes, F-38330, Montbonnot-Saint-Martin, France
- Yves Vandenbrouck
  - University Grenoble Alpes, CEA, Inserm, BIG-BGE, 38000, Grenoble, France
- Emmanuelle Com
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
- Andrés Ritter
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
  - Present address: Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
- Régis Lavigne
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
- Dominique Marie
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
- Philippe Potin
  - Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
- Charles Pineau
  - Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France
  - Protim, Univ Rennes, F-35042, Rennes cedex, France
|
17
|
Kumar P, Panigrahi P, Johnson J, Weber WJ, Mehta S, Sajulga R, Easterly C, Crooker BA, Heydarian M, Anamika K, Griffin TJ, Jagtap PD. QuanTP: A Software Resource for Quantitative Proteo-Transcriptomic Comparative Data Analysis and Informatics. J Proteome Res 2018; 18:782-790. [DOI: 10.1021/acs.jproteome.8b00727] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Affiliation(s)
- Praveen Kumar
  - Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, Minnesota 55904, United States
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- James Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Wanda J. Weber
  - Department of Animal Science, University of Minnesota, St. Paul, Minnesota 55108, United States
- Subina Mehta
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Ray Sajulga
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Caleb Easterly
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Brian A. Crooker
  - Department of Animal Science, University of Minnesota, St. Paul, Minnesota 55108, United States
- Mohammad Heydarian
  - Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Krishanpal Anamika
  - LABS, Persistent Systems, Aryabhata-Pingala, Erandwane, Pune 411004, India
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
|
18
|
Johnson JE, Kumar P, Easterly C, Esler M, Mehta S, Eschenlauer AC, Hegeman AD, Jagtap PD, Griffin TJ. Improve your Galaxy text life: The Query Tabular Tool. F1000Res 2018; 7:1604. [PMID: 30519459 PMCID: PMC6248266 DOI: 10.12688/f1000research.16450.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 01/02/2019] [Indexed: 11/20/2022]
Abstract
Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Applications of such sophisticated workflows are many, including those that integrate software from different 'omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats that are compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, overly complicating the process. In some cases, the limitations of existing text manipulation tools are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of even advanced users and developers. For users with some SQL knowledge, these text operations could be combined into a single, concise query on a relational database. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be utilized for even more sophisticated manipulations, such as find and replace and other filtering actions. Using several Galaxy-based multi-omic workflows as examples, we demonstrate how the Query Tabular tool dramatically streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.
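Illustrative sketch (not taken from the article): the idea of loading tabular text into an SQLite database and replacing several text-manipulation steps with one SQL query, including a regular-expression filter, can be reproduced with Python's standard library. The function names, the table name `t1`, and the sample columns are assumptions made for this example; this is not the Query Tabular tool's own code. SQLite ships no built-in `REGEXP` implementation, so the sketch registers one as a user-defined function.

```python
import csv
import io
import re
import sqlite3


def load_tabular(conn, table, text):
    """Create a table from tab-separated text whose first row is the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)


def regexp(pattern, value):
    """Back the SQL 'value REGEXP pattern' operator with Python's re module."""
    return int(value is not None and re.search(pattern, value) is not None)


def query_tabular(text, sql):
    """Load text into an in-memory database as table t1 and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    load_tabular(conn, "t1", text)
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    tsv = (
        "peptide\tprotein\tscore\n"
        "PEPTIDEK\tsp|P12345|TEST\t88\n"
        "AAAK\ttr|Q99999|X\t12\n"
        "LLLR\tsp|P67890|Y\t45\n"
    )
    # One query filters by accession prefix and score, then sorts the output.
    sql = (
        "SELECT peptide, CAST(score AS INTEGER) FROM t1 "
        r"WHERE protein REGEXP '^sp\|' AND CAST(score AS INTEGER) > 40 "
        "ORDER BY peptide"
    )
    for row in query_tabular(tsv, sql):
        print(row)
```

Columns are created without declared types, so values are stored as text; the explicit `CAST` keeps the score comparison numeric rather than lexical.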
Affiliation(s)
- James E Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, 55455, USA
- Praveen Kumar
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
  - Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, MN, 55904, USA
- Caleb Easterly
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
- Mark Esler
  - Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
- Arthur C Eschenlauer
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
  - Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
- Adrian D Hegeman
  - Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
- Pratik D Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
- Timothy J Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
|
19
|
Johnson JE, Kumar P, Easterly C, Esler M, Mehta S, Eschenlauer AC, Hegeman AD, Jagtap PD, Griffin TJ. Improve your Galaxy text life: The Query Tabular Tool. F1000Res 2018; 7:1604. [PMID: 30519459 PMCID: PMC6248266 DOI: 10.12688/f1000research.16450.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 01/02/2019] [Indexed: 10/04/2023]
Abstract
Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Applications of such sophisticated workflows are many, including those that integrate software from different 'omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats that are compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, overly complicating the process. In some cases, the limitations of existing text manipulation tools are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of even advanced users and developers. For users with some SQL knowledge, these text operations could be combined into a single, concise query on a relational database. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be utilized for even more sophisticated manipulations, such as find and replace and other filtering actions. Using several Galaxy-based multi-omic workflows as examples, we demonstrate how the Query Tabular tool dramatically streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.
Affiliation(s)
- James E. Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, 55455, USA
- Praveen Kumar
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
  - Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, MN, 55904, USA
- Caleb Easterly
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
- Mark Esler
  - Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
- Arthur C. Eschenlauer
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
  - Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
- Adrian D. Hegeman
  - Department of Horticulture, University of Minnesota, St. Paul, MN, 55108, USA
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, 55455, USA
|
20
|
Sajulga R, Mehta S, Kumar P, Johnson JE, Guerrero CR, Ryan MC, Karchin R, Jagtap PD, Griffin TJ. Bridging the Chromosome-centric and Biology/Disease-driven Human Proteome Projects: Accessible and Automated Tools for Interpreting the Biological and Pathological Impact of Protein Sequence Variants Detected via Proteogenomics. J Proteome Res 2018; 17:4329-4336. [DOI: 10.1021/acs.jproteome.8b00404] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023]
Affiliation(s)
- Ray Sajulga
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Praveen Kumar
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
  - Bioinformatics and Computational Biology Program, University of Minnesota-Rochester, Rochester, Minnesota 55904, United States
- James E. Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Candace R. Guerrero
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Michael C. Ryan
  - In-Silico Solutions, Falls Church, Virginia 22043, United States
- Rachel Karchin
  - Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, United States
  - The Institute for Computational Medicine, The Johns Hopkins University, Baltimore, Maryland 21218, United States
  - Department of Oncology, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21217, United States
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Timothy J. Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, United States
|
21
|
Barsnes H, Vaudel M. SearchGUI: A Highly Adaptable Common Interface for Proteomics Search and de Novo Engines. J Proteome Res 2018; 17:2552-2555. [PMID: 29774740 DOI: 10.1021/acs.jproteome.8b00175] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Indexed: 12/24/2022]
Abstract
Mass-spectrometry-based proteomics has become the standard approach for identifying and quantifying proteins. A vital step consists of analyzing experimentally generated mass spectra to identify the underlying peptide sequences for later mapping to the originating proteins. We here present the latest developments in SearchGUI, a common open-source interface for the most frequently used freely available proteomics search and de novo engines that has evolved into a central component in numerous bioinformatics workflows.
Affiliation(s)
- Marc Vaudel
  - Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5021 Bergen, Norway
|