Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chambers MC, Jagtap PD, Johnson JE, McGowan T, Kumar P, Onsongo G, Guerrero CR, Barsnes H, Vaudel M, Martens L, Grüning B, Cooke IR, Heydarian M, Reddy KL, Griffin TJ. An Accessible Proteogenomics Informatics Resource for Cancer Researchers. Cancer Res 2017;77:e43-e46. [PMID: 29092937 DOI: 10.1158/0008-5472.can-17-0331] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 02/04/2017] [Revised: 04/07/2017] [Accepted: 06/30/2017] [Indexed: 11/16/2022]

Cited by Other Article(s)

Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. mSphere 2024:e0079323. [PMID: 38780289 DOI: 10.1128/msphere.00793-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/17/2024] [Indexed: 05/25/2024] Open

Abstract

Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification, and prioritization of microbial proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, offering the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), Unipept (for taxonomic and functional annotation), and MSstatsTMT (for statistical analysis). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.

IMPORTANCE

Clinical metaproteomics has immense potential to offer functional insights into the microbiome and its contributions to human disease. However, there are numerous challenges in the metaproteomic analysis of clinical samples, including handling of very large protein sequence databases for sensitive and accurate peptide and protein identification from mass spectrometry data, as well as taxonomic and functional annotation of quantified peptides and proteins to enable interpretation of results. To address these challenges, we have developed a novel clinical metaproteomics workflow that provides customized bioinformatic identification, verification, quantification, and taxonomic and functional annotation. This bioinformatic workflow is implemented in the Galaxy ecosystem and has been used to characterize diverse clinical sample types, such as nasopharyngeal swabs and bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness and availability for use by the research community via analysis of residual fluid from cervical swabs.

Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. bioRxiv 2023:2023.11.21.568121. [PMID: 38045370 PMCID: PMC10690215 DOI: 10.1101/2023.11.21.568121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]

Wang XY, Xu YM, Lau ATY. Proteogenomics in Cancer: Then and Now. J Proteome Res 2023;22:3103-3122. [PMID: 37725793 DOI: 10.1021/acs.jproteome.3c00196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/21/2023]

Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023;20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]

Affiliation(s)

Subina Mehta: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
Matthias Bernt: Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
Matthew Chambers: Bioinformatics Consultant, Stamford, CT, USA
Matthias Fahrner: Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany; German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
Melanie Christine Föll: Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany; German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany; Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
Bjoern Gruening: Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
Carlos Horro: Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
James E Johnson: Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
Valentin Loux: Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France; Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
Andrew T Rajczewski: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
Oliver Schilling: Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany; German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
Yves Vandenbrouck: Proteomics French Infrastructure, CEA, Grenoble, France
Ove Johan Ragnar Gustafsson: Australian BioCommons, University of Melbourne, Melbourne, Australia
W C Mike Thang: Queensland Cyber Infrastructure Foundation (QCIF), Australia; Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
Cameron Hyde: Queensland Cyber Infrastructure Foundation (QCIF), Australia; Sippy Downs, University of the Sunshine Coast, Australia
Gareth Price: Queensland Cyber Infrastructure Foundation (QCIF), Australia; Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
Pratik D Jagtap: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
Timothy J Griffin: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA

Gardner L, Kostarelos K, Mallick P, Dive C, Hadjidemetriou M. Nano-omics: nanotechnology-based multidimensional harvesting of the blood-circulating cancerome. Nat Rev Clin Oncol 2022;19:551-561. [PMID: 35739399 DOI: 10.1038/s41571-022-00645-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Accepted: 05/10/2022] [Indexed: 02/08/2023]

Rajczewski AT, Han Q, Mehta S, Kumar P, Jagtap PD, Knutson CG, Fox JG, Tretyakova NY, Griffin TJ. Quantitative Proteogenomic Characterization of Inflamed Murine Colon Tissue Using an Integrated Discovery, Verification, and Validation Proteogenomic Workflow. Proteomes 2022;10:proteomes10020011. [PMID: 35466239 PMCID: PMC9036229 DOI: 10.3390/proteomes10020011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Revised: 03/27/2022] [Accepted: 04/07/2022] [Indexed: 11/24/2022] Open

Karimi MR, Karimi AH, Abolmaali S, Sadeghi M, Schmitz U. Prospects and challenges of cancer systems medicine: from genes to disease networks. Brief Bioinform 2021;23:6361045. [PMID: 34471925 PMCID: PMC8769701 DOI: 10.1093/bib/bbab343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 12/20/2022] Open

Tsang O, Wong JWH. Proteogenomic interrogation of cancer cell lines: an overview of the field. Expert Rev Proteomics 2021;18:221-232. [PMID: 33877947 DOI: 10.1080/14789450.2021.1914594] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]

Sajulga R, Easterly C, Riffle M, Mesuere B, Muth T, Mehta S, Kumar P, Johnson J, Gruening BA, Schiebenhoefer H, Kolmeder CA, Fuchs S, Nunn BL, Rudney J, Griffin TJ, Jagtap PD. Survey of metaproteomics software tools for functional microbiome analysis. PLoS One 2020;15:e0241503. [PMID: 33170893 PMCID: PMC7654790 DOI: 10.1371/journal.pone.0241503] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/12/2020] [Accepted: 10/15/2020] [Indexed: 11/23/2022] Open

Abstract

To gain a thorough appreciation of microbiome dynamics, researchers characterize the functional relevance of expressed microbial genes or proteins. This can be accomplished through metaproteomics, which characterizes the protein expression of microbiomes. Several software tools exist for analyzing microbiomes at the functional level by measuring their combined proteome-level response to environmental perturbations. In this survey, we explore the performance of six available tools, to enable researchers to make informed decisions regarding software choice based on their research goals. Tandem mass spectrometry-based proteomic data obtained from dental caries plaque samples grown with and without sucrose in paired biofilm reactors were used as representative data for this evaluation. Microbial peptides from one sample pair were identified by the X! tandem search algorithm via SearchGUI and subjected to functional analysis using software tools including eggNOG-mapper, MEGAN5, MetaGOmics, MetaProteomeAnalyzer (MPA), ProPHAnE, and Unipept to generate functional annotation through Gene Ontology (GO) terms. Among these software tools, notable differences in functional annotation were detected after comparing differentially expressed protein functional groups. Based on the generated GO terms of these tools we performed a peptide-level comparison to evaluate the quality of their functional annotations. A BLAST analysis against the NCBI non-redundant database revealed that the sensitivity and specificity of functional annotation varied between tools. For example, eggNOG-mapper mapped to the most number of GO terms, while Unipept generated more accurate GO terms. Based on our evaluation, metaproteomics researchers can choose the software according to their analytical needs and developers can use the resulting feedback to further optimize their algorithms. 
To make more of these tools accessible via scalable metaproteomics workflows, eggNOG-mapper and Unipept 4.0 were incorporated into the Galaxy platform.

Precursor Intensity-Based Label-Free Quantification Software Tools for Proteomic and Multi-Omic Analysis within the Galaxy Platform. Proteomes 2020;8:proteomes8030015. [PMID: 32650610 PMCID: PMC7563855 DOI: 10.3390/proteomes8030015] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/12/2020] [Revised: 07/06/2020] [Accepted: 07/07/2020] [Indexed: 01/15/2023] Open

McGowan T, Johnson JE, Kumar P, Sajulga R, Mehta S, Jagtap PD, Griffin TJ. Multi-omics Visualization Platform: An extensible Galaxy plug-in for multi-omics data visualization and exploration. Gigascience 2020;9:giaa025. [PMID: 32236523 PMCID: PMC7102281 DOI: 10.1093/gigascience/giaa025] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/22/2019] [Revised: 02/13/2020] [Accepted: 02/24/2020] [Indexed: 12/22/2022] Open

Abstract

BACKGROUND

Proteogenomics integrates genomics, transcriptomics, and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integration of disparate 'omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing, and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation.

FINDINGS

MVP is built as an HTML Galaxy plug-in, primarily based on JavaScript. Via the Galaxy API, MVP uses SQLite databases as input: a custom data type (mzSQLite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides based on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information, along with genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrated Genomics Viewer JavaScript framework, enabling interactive visualization of peptides and corresponding transcript and genomic coding information within the MVP interface.

CONCLUSIONS

MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomic results within the Galaxy environment, adding a unique and critically needed tool for empowering exploration and interpretation of results. The platform is extensible, providing a basis for further development of new functionalities for proteogenomic data visualization.

Hulstaert N, Shofstahl J, Sachsenberg T, Walzer M, Barsnes H, Martens L, Perez-Riverol Y. ThermoRawFileParser: Modular, Scalable, and Cross-Platform RAW File Conversion. J Proteome Res 2019;19:537-542. [PMID: 31755270 DOI: 10.1021/acs.jproteome.9b00328] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Indexed: 12/22/2022]

Hubler SL, Kumar P, Mehta S, Easterly C, Johnson JE, Jagtap PD, Griffin TJ. Challenges in Peptide-Spectrum Matching: A Robust and Reproducible Statistical Framework for Removing Low-Accuracy, High-Scoring Hits. J Proteome Res 2019;19:161-173. [DOI: 10.1021/acs.jproteome.9b00478] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]

Ang MY, Low TY, Lee PY, Wan Mohamad Nazarie WF, Guryev V, Jamal R. Proteogenomics: From next-generation sequencing (NGS) and mass spectrometry-based proteomics to precision medicine. Clin Chim Acta 2019;498:38-46. [DOI: 10.1016/j.cca.2019.08.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/22/2019] [Revised: 08/13/2019] [Accepted: 08/13/2019] [Indexed: 12/14/2022]

González-Gomariz J, Guruceaga E, López-Sánchez M, Segura V. Proteogenomics in the context of the Human Proteome Project (HPP). Expert Rev Proteomics 2019;16:267-275. [PMID: 30654666 DOI: 10.1080/14789450.2019.1571916] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]

Guillot L, Delage L, Viari A, Vandenbrouck Y, Com E, Ritter A, Lavigne R, Marie D, Peterlongo P, Potin P, Pineau C. Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes. BMC Genomics 2019;20:56. [PMID: 30654742 PMCID: PMC6337836 DOI: 10.1186/s12864-019-5431-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/26/2018] [Accepted: 01/03/2019] [Indexed: 01/02/2023] Open

Abstract

Background

Accurate structural annotation of genomes is still a challenge, despite the progress made over the past decade. The prediction of gene structure remains difficult, especially for eukaryotic species, and is often erroneous and incomplete. We used a proteogenomics strategy, taking advantage of the combination of proteomics datasets and bioinformatics tools, to identify novel protein coding-genes and splice isoforms, assign correct start sites, and validate predicted exons and genes.

Results

Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus sp., a key reference genome for both the brown algal lineage and stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus sp. strains and sub-cellular fractions using a shotgun approach. First, we directly generated peptide sequence tags (PSTs) from the proteomics data. Second, we mapped PSTs onto the translated genomic sequence. Closely located hits (i.e., PSTs locations on the genome) were then clustered to detect potential coding regions based on parameters optimized for the organism. Third, we evaluated each cluster and compared it to gene predictions from existing conventional genome annotation approaches. Finally, we integrated cluster locations into GFF files to use a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase and corrected the gene structure of a dihydrolipoamide acetyltransferase. We experimentally validated the results by RT-PCR and using transcriptomics data.

Conclusions

Peptimapper is a complementary tool for the expert annotation of genomes. It is suitable for any organism and is distributed through a Docker image available on two public bioinformatics docker repositories: Docker Hub and BioShaDock. This workflow is also accessible through the Galaxy framework and for use by non-computer scientists at https://galaxy.protim.eu.

Data are available via ProteomeXchange under identifier PXD010618.
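
The clustering step described in the Results above (grouping closely located PST hits into candidate coding regions) can be illustrated with a minimal sketch. This is not Peptimapper's implementation: the function name `cluster_hits`, the parameters `max_gap` and `min_hits`, and the sample coordinates are all hypothetical, and a simple gap-based grouping of sorted positions is assumed.

```python
def cluster_hits(positions, max_gap, min_hits):
    """Group sorted genomic hit positions into clusters.

    Assumed rule (illustrative only): consecutive hits no more than
    `max_gap` bases apart belong to the same cluster, and a cluster is
    reported only if it contains at least `min_hits` hits.
    Returns a list of (start, end, hit_count) tuples.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_hits:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final cluster.
    if len(current) >= min_hits:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Hypothetical hit coordinates on one strand of one scaffold.
hits = [100, 160, 220, 5000, 9000, 9050]
regions = cluster_hits(hits, max_gap=500, min_hits=2)
print(regions)
```

In this sketch the isolated hit at position 5000 is discarded because it does not reach `min_hits`; in practice such thresholds would be tuned per organism, as the abstract notes.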

Affiliation(s)

Laetitia Guillot: Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France; Protim, Univ Rennes, F-35042, Rennes cedex, France
Ludovic Delage: Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
Alain Viari: INRIA Grenoble-Rhône-Alpes, F-38330, Montbonnot-Saint-Martin, France
Yves Vandenbrouck: University Grenoble Alpes, CEA, Inserm, BIG-BGE, 38000, Grenoble, France
Emmanuelle Com: Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France; Protim, Univ Rennes, F-35042, Rennes cedex, France
Andrés Ritter: Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France; Present address: Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France
Régis Lavigne: Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France; Protim, Univ Rennes, F-35042, Rennes cedex, France
Dominique Marie: Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
Pierre Peterlongo: University Rennes, Inria, CNRS, IRISA, F-35042, Rennes, France
Philippe Potin: Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France
Charles Pineau: Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France; Protim, Univ Rennes, F-35042, Rennes cedex, France

Kumar P, Panigrahi P, Johnson J, Weber WJ, Mehta S, Sajulga R, Easterly C, Crooker BA, Heydarian M, Anamika K, Griffin TJ, Jagtap PD. QuanTP: A Software Resource for Quantitative Proteo-Transcriptomic Comparative Data Analysis and Informatics. J Proteome Res 2018;18:782-790. [DOI: 10.1021/acs.jproteome.8b00727] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]

Johnson JE, Kumar P, Easterly C, Esler M, Mehta S, Eschenlauer AC, Hegeman AD, Jagtap PD, Griffin TJ. Improve your Galaxy text life: The Query Tabular Tool. F1000Res 2018;7:1604. [PMID: 30519459 PMCID: PMC6248266 DOI: 10.12688/f1000research.16450.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 01/02/2019] [Indexed: 11/20/2022] Open

Abstract

Galaxy provides an accessible platform where multi-step data analysis workflows integrating disparate software can be run, even by researchers with limited programming expertise. Applications of such sophisticated workflows are many, including those which integrate software from different 'omic domains (e.g. genomics, proteomics, metabolomics). In these complex workflows, intermediate outputs are often generated as tabular text files, which must be transformed into customized formats which are compatible with the next software tools in the pipeline. Consequently, many text manipulation steps are added to an already complex workflow, overly complicating the process. In some cases, limitations to existing text manipulation are such that desired analyses can only be carried out using highly sophisticated processing steps beyond the reach of even advanced users and developers. For users with some SQL knowledge, these text operations could be combined into a single, concise query on a relational database. As a solution, we have developed the Query Tabular Galaxy tool, which leverages a SQLite database generated from tabular input data. This database can be queried and manipulated to produce transformed and customized tabular outputs compatible with downstream processing steps. Regular expressions can also be utilized for even more sophisticated manipulations, such as find and replace and other filtering actions. Using several Galaxy-based multi-omic workflows as an example, we demonstrate how the Query Tabular tool dramatically streamlines and simplifies the creation of multi-step analyses, efficiently enabling complicated textual manipulations and processing. This tool should find broad utility for users of the Galaxy platform seeking to develop and use sophisticated workflows involving text manipulation on tabular outputs.
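
The general idea in this abstract (load tabular text into SQLite, then express several text-manipulation steps as one SQL query, optionally with regular expressions) can be sketched with Python's standard library. This is an illustration only, not the Query Tabular tool's code or interface: the function `query_tabular`, the SQL helper `re_sub`, the table and column names, and the sample rows are all hypothetical.

```python
import csv
import io
import re
import sqlite3


def query_tabular(tsv_text, table, columns, sql):
    """Load tab-separated text into an in-memory SQLite table and run one query."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no built-in regex replace, so register a Python helper
    # (hypothetical name `re_sub`) that SQL statements can call.
    conn.create_function("re_sub", 3, lambda pat, repl, s: re.sub(pat, repl, s))
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Hypothetical peptide-spectrum-match table: sequence, protein header, confidence.
psm_tsv = "PEPTIDEK\tsp|P12345|PROT_A\t0.99\nSAMPLER\tsp|Q67890|PROT_B\t0.42\n"

# One query both filters rows and rewrites a column with a regular expression,
# replacing what would otherwise be separate filter and find/replace steps.
hits = query_tabular(
    psm_tsv,
    "psm",
    ["sequence", "protein", "confidence"],
    r"SELECT sequence, re_sub('^sp\|([^|]+)\|.*$', '\1', protein) "
    r"FROM psm WHERE CAST(confidence AS REAL) >= 0.9",
)
print(hits)
```

Columns are loaded as text and cast inside the query, which keeps the loader generic at the cost of pushing type handling into SQL.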

Johnson JE, Kumar P, Easterly C, Esler M, Mehta S, Eschenlauer AC, Hegeman AD, Jagtap PD, Griffin TJ. Improve your Galaxy text life: The Query Tabular Tool. F1000Res 2018;7:1604. [PMID: 30519459 PMCID: PMC6248266 DOI: 10.12688/f1000research.16450.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 01/02/2019] [Indexed: 10/04/2023] Open

Sajulga R, Mehta S, Kumar P, Johnson JE, Guerrero CR, Ryan MC, Karchin R, Jagtap PD, Griffin TJ. Bridging the Chromosome-centric and Biology/Disease-driven Human Proteome Projects: Accessible and Automated Tools for Interpreting the Biological and Pathological Impact of Protein Sequence Variants Detected via Proteogenomics. J Proteome Res 2018;17:4329-4336. [DOI: 10.1021/acs.jproteome.8b00404] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023]

Barsnes H, Vaudel M. SearchGUI: A Highly Adaptable Common Interface for Proteomics Search and de Novo Engines. J Proteome Res 2018;17:2552-2555. [PMID: 29774740 DOI: 10.1021/acs.jproteome.8b00175] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Indexed: 12/24/2022]