1
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. mSphere 2024:e0079323. [PMID: 38780289 DOI: 10.1128/msphere.00793-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/17/2024] [Indexed: 05/25/2024]
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, which are usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification, and prioritization of microbial proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, which supports the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), Unipept (for taxonomic and functional annotation), and MSstatsTMT (for statistical analysis). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.
IMPORTANCE Clinical metaproteomics has immense potential to offer functional insights into the microbiome and its contributions to human disease. However, there are numerous challenges in the metaproteomic analysis of clinical samples, including handling of very large protein sequence databases for sensitive and accurate peptide and protein identification from mass spectrometry data, as well as taxonomic and functional annotation of quantified peptides and proteins to enable interpretation of results.
To address these challenges, we have developed a novel clinical metaproteomics workflow that provides customized bioinformatic identification, verification, quantification, and taxonomic and functional annotation. This bioinformatic workflow is implemented in the Galaxy ecosystem and has been used to characterize diverse clinical sample types, such as nasopharyngeal swabs and bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness and availability for use by the research community via analysis of residual fluid from cervical swabs.
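Unipept's taxonomic annotation, one module of this workflow, is built around assigning each peptide to the lowest common ancestor (LCA) of all taxa whose proteins contain it. The Python sketch below illustrates only that LCA idea; the lineages, peptide sequences, and the function name `lowest_common_ancestor` are hypothetical placeholders, and it omits the details of the real Unipept service, such as its precomputed peptide index.

```python
from typing import Dict, List, Optional

# Hypothetical root-to-leaf lineages; placeholders, not real taxa.
LINEAGES: Dict[str, List[str]] = {
    "species_A1": ["root", "phylum_A", "genus_A", "species_A1"],
    "species_A2": ["root", "phylum_A", "genus_A", "species_A2"],
    "species_B1": ["root", "phylum_B", "genus_B", "species_B1"],
}

# Hypothetical index: peptide -> taxa whose proteins contain that peptide.
PEPTIDE_TAXA: Dict[str, List[str]] = {
    "PEPTIDEAK": ["species_A1", "species_A2"],
    "SAMPLEPEPK": ["species_A1"],
    "SHAREDPEPR": ["species_A1", "species_B1"],
}


def lowest_common_ancestor(taxa: List[str], lineages: Dict[str, List[str]]) -> Optional[str]:
    """Return the deepest lineage node shared by every taxon, or None if taxa is empty."""
    if not taxa:
        return None
    paths = [lineages[t] for t in taxa]
    lca: Optional[str] = None
    # zip walks all lineages rank by rank and stops at the shortest one.
    for nodes in zip(*paths):
        if all(node == nodes[0] for node in nodes):
            lca = nodes[0]
        else:
            break
    return lca


if __name__ == "__main__":
    for peptide, taxa in PEPTIDE_TAXA.items():
        print(peptide, "->", lowest_common_ancestor(taxa, LINEAGES))
```

With these toy inputs, a peptide found only in `species_A1` stays species-specific, one shared by two species of `genus_A` is assigned to the genus, and one shared across phyla falls back to `root`. This is why a peptide's taxonomic resolution depends on how widely its sequence is shared across the search database.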
Affiliation(s)
- Katherine Do
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Reid Wagner
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
- Dechen Bhuming
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Andrew T Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Amy P N Skubitz
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota, USA
- James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota, USA
- Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
- Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota, USA
2
Nebauer DJ, Pearson LA, Neilan BA. Critical steps in an environmental metaproteomics workflow. Environ Microbiol 2024; 26:e16637. [PMID: 38760994 DOI: 10.1111/1462-2920.16637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 04/30/2024] [Indexed: 05/20/2024]
Abstract
Environmental metaproteomics is a rapidly advancing field that provides insights into the structure, dynamics, and metabolic activity of microbial communities. As the field is still maturing, it lacks consistent workflows, making it challenging for non-expert researchers to navigate. This review aims to introduce the workflow of environmental metaproteomics. It outlines the standard practices for sample collection, processing, and analysis, and offers strategies to overcome the unique challenges presented by common environmental matrices such as soil, freshwater, marine environments, biofilms, sludge, and symbionts. The review also highlights the bottlenecks in data analysis that are specific to metaproteomics samples and provides suggestions for researchers to obtain high-quality datasets. It includes recent benchmarking studies and descriptions of software packages specifically built for metaproteomics analysis. The article is written without assuming the reader's familiarity with single-organism proteomic workflows, making it accessible to those new to proteomics or mass spectrometry in general. This primer for environmental metaproteomics aims to improve accessibility to this exciting technology and empower researchers to tackle challenging and ambitious research questions. While it is primarily a resource for those new to the field, it should also be useful for established researchers looking to streamline or troubleshoot their metaproteomics experiments.
Affiliation(s)
- Daniel J Nebauer
- School of Environmental and Life Sciences, The University of Newcastle, Callaghan, New South Wales, Australia
- Centre of Excellence in Synthetic Biology, Australian Research Council, Sydney, New South Wales, Australia
- Leanne A Pearson
- School of Environmental and Life Sciences, The University of Newcastle, Callaghan, New South Wales, Australia
- Centre of Excellence in Synthetic Biology, Australian Research Council, Sydney, New South Wales, Australia
- Brett A Neilan
- School of Environmental and Life Sciences, The University of Newcastle, Callaghan, New South Wales, Australia
- Centre of Excellence in Synthetic Biology, Australian Research Council, Sydney, New South Wales, Australia
3
Messer LF, Lee CE, Wattiez R, Matallana-Surget S. Novel functional insights into the microbiome inhabiting marine plastic debris: critical considerations to counteract the challenges of thin biofilms using multi-omics and comparative metaproteomics. Microbiome 2024; 12:36. [PMID: 38389111 PMCID: PMC10882806 DOI: 10.1186/s40168-024-01751-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 01/03/2024] [Indexed: 02/24/2024]
Abstract
BACKGROUND Microbial functioning on marine plastic surfaces has been poorly documented, especially within cold climates where temperature likely impacts microbial activity and the presence of hydrocarbonoclastic microorganisms. To date, only two studies have used metaproteomics to unravel microbial genotype-phenotype linkages in the marine 'plastisphere', and these have revealed the dominance of photosynthetic microorganisms within warm climates. Advancing the functional representation of the marine plastisphere is vital for the development of specific databases cataloging the functional diversity of the associated microorganisms and their peptide and protein sequences, to fuel biotechnological discoveries. Here, we provide a comprehensive assessment for plastisphere metaproteomics, using multi-omics and data mining on thin plastic biofilms to provide unique insights into plastisphere metabolism. Our robust experimental design assessed DNA/protein co-extraction and cell lysis strategies, proteomics workflows, and diverse protein search databases, to resolve the active plastisphere taxa and their expressed functions from an understudied cold environment.
RESULTS For the first time, we demonstrate the predominance and activity of hydrocarbonoclastic genera (Psychrobacter, Flavobacterium, Pseudomonas) within a primarily heterotrophic plastisphere. Correspondingly, oxidative phosphorylation, the citrate cycle, and carbohydrate metabolism were the dominant pathways expressed. Quorum sensing and toxin-associated proteins of Streptomyces were indicative of inter-community interactions. Stress response proteins expressed by Psychrobacter, Planococcus, and Pseudoalteromonas and proteins mediating xenobiotics degradation in Psychrobacter and Pseudoalteromonas suggested phenotypic adaptations to the toxic chemical microenvironment of the plastisphere. Interestingly, a targeted search strategy identified plastic biodegradation enzymes, including polyamidase, hydrolase, and depolymerase, expressed by rare taxa. The expression of virulence factors and mechanisms of antimicrobial resistance suggested pathogenic genera were active, despite representing a minor component of the plastisphere community.
CONCLUSION Our study addresses a critical gap in understanding the functioning of the marine plastisphere, contributing new insights into the function and ecology of an emerging and important microbial niche. Our comprehensive multi-omics and comparative metaproteomics experimental design enhances biological interpretations to provide new perspectives on microorganisms of potential biotechnological significance beyond biodegradation and to improve the assessment of the risks associated with microorganisms colonizing marine plastic pollution.
Affiliation(s)
- Lauren F Messer
- Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, FK9 4LA, Scotland
- Charlotte E Lee
- Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, FK9 4LA, Scotland
- Ruddy Wattiez
- Proteomics and Microbiology Department, University of Mons, Mons, 7000, Belgium
- Sabine Matallana-Surget
- Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, FK9 4LA, Scotland
4
Do K, Mehta S, Wagner R, Bhuming D, Rajczewski AT, Skubitz APN, Johnson JE, Griffin TJ, Jagtap PD. A novel clinical metaproteomics workflow enables bioinformatic analysis of host-microbe dynamics in disease. bioRxiv 2023:2023.11.21.568121. [PMID: 38045370 PMCID: PMC10690215 DOI: 10.1101/2023.11.21.568121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Clinical metaproteomics has the potential to offer insights into the host-microbiome interactions underlying diseases. However, the field faces challenges in characterizing microbial proteins found in clinical samples, which are usually present at low abundance relative to the host proteins. As a solution, we have developed an integrated workflow coupling mass spectrometry-based analysis with customized bioinformatic identification, quantification and prioritization of microbial and host proteins, enabling targeted assay development to investigate host-microbe dynamics in disease. The bioinformatics tools are implemented in the Galaxy ecosystem, which supports the development and dissemination of complex bioinformatic workflows. The modular workflow integrates MetaNovo (to generate a reduced protein database), SearchGUI/PeptideShaker and MaxQuant [to generate peptide-spectral matches (PSMs) and quantification], PepQuery2 (to verify the quality of PSMs), Unipept (for taxonomic and functional annotation), and MSstatsTMT (for statistical analysis). We have utilized this workflow in diverse clinical samples, from the characterization of nasopharyngeal swab samples to bronchoalveolar lavage fluid. Here, we demonstrate its effectiveness via analysis of residual fluid from cervical swabs. The complete workflow, including training data and documentation, is available via the Galaxy Training Network, empowering non-expert researchers to utilize these powerful tools in their clinical studies.
5
Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements include availability of software spanning diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software.
AREAS COVERED The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains this software and the associated training resources to empower researcher-driven analyses.
EXPERT OPINION The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
Affiliation(s)
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Matthias Bernt
- Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
- Matthias Fahrner
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Melanie Christine Föll
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Bjoern Gruening
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Carlos Horro
- Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- James E Johnson
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Valentin Loux
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
- Andrew T Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Oliver Schilling
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- W C Mike Thang
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Cameron Hyde
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Sippy Downs, University of the Sunshine Coast, Australia
- Gareth Price
- Queensland Cyber Infrastructure Foundation (QCIF), Australia
- Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Timothy J Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
6
Miura N, Okuda S. Current progress and critical challenges to overcome in the bioinformatics of mass spectrometry-based metaproteomics. Comput Struct Biotechnol J 2023; 21:1140-1150. [PMID: 36817962 PMCID: PMC9925844 DOI: 10.1016/j.csbj.2023.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 01/14/2023] [Accepted: 01/14/2023] [Indexed: 01/18/2023]
Abstract
Metaproteomics is a relatively young field that has been studied for only about 15 years. Nevertheless, it has the potential to play a key role in disease research by elucidating the mechanisms of communication between the human host and the microbiome. Although it has been useful in developing an understanding of various diseases, its analytical strategies remain limited to the extended application of proteomics. The sequence databases in metaproteomics must be large because a typical sample contains thousands of species, and this size causes problems unique to large databases. In this review, we demonstrate the usefulness of metaproteomics in disease research through examples from several studies. Additionally, we discuss the challenges of applying conventional proteomics analysis methods to metaproteomics and introduce studies that may provide clues to the solutions. We also discuss the need for a standard false discovery rate control method for metaproteomics to replace the target-decoy search approaches common in proteomics, and for a method to ensure the reliability of peptide-spectrum matches.
Affiliation(s)
- Nobuaki Miura
- Division of Bioinformatics, Niigata University Graduate School of Medical and Dental Sciences, 2-5274 Gakkocho-dori, Chuo-ku, Niigata 951-8514, Japan
- Shujiro Okuda
- Division of Bioinformatics, Niigata University Graduate School of Medical and Dental Sciences, 2-5274 Gakkocho-dori, Chuo-ku, Niigata 951-8514, Japan
- Medical AI Center, Niigata University School of Medicine, 2-5274 Gakkocho-dori, Chuo-ku, Niigata 951-8514, Japan (corresponding author address)
7
Armengaud J. Metaproteomics to understand how microbiota function: The crystal ball predicts a promising future. Environ Microbiol 2023; 25:115-125. [PMID: 36209500 PMCID: PMC10091800 DOI: 10.1111/1462-2920.16238] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 09/25/2022] [Accepted: 09/30/2022] [Indexed: 01/21/2023]
Abstract
In the medical, environmental, and biotechnological fields, microbial communities have attracted much attention due to their roles and numerous possible applications. The study of these communities is challenging due to their diversity and complexity. Innovative methods are needed to identify the taxonomic components of individual microbiota and their changes over time, and to determine how microorganisms interact and function. Metaproteomics is based on the identification and quantification of proteins, and can potentially provide this full picture. Due to the wide molecular panorama and functional insights it provides, metaproteomics is gaining momentum in microbiome and holobiont research. Its full potential should be unleashed in the coming years with progress in speed and cost of analyses. In this exploratory crystal ball exercise, I discuss the technical and conceptual advances in metaproteomics that I expect to drive innovative research over the next few years in microbiology. I also debate the concepts of 'microbial dark matter' and 'Metaproteomics-Assembled Proteomes (MAPs)' and present some long-term prospects for metaproteomics in clinical diagnostics and personalized medicine, environmental monitoring, agriculture, and biotechnology.
Affiliation(s)
- Jean Armengaud
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris-Saclay, CEA, INRAE, Bagnols-sur-Cèze, France
8
Fancello L, Burger T. An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics. Genome Biol 2022; 23:132. [PMID: 35725496 PMCID: PMC9208142 DOI: 10.1186/s13059-022-02701-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/29/2021] [Accepted: 06/09/2022] [Indexed: 12/03/2022]
Abstract
Background Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics, by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this, including the generation of reduced transcriptome-informed protein databases, which only contain proteins whose transcripts are detected in the sample-matched transcriptome. These were found to increase peptide identification sensitivity. Here, we present a detailed evaluation of this approach.
Results We establish that the increased sensitivity in peptide identification is in fact a statistical artifact, directly resulting from the limited capability of target-decoy competition to accurately model incorrect target matches when using excessively small databases. As anti-conservative false discovery rates (FDRs) are likely to hamper the robustness of the resulting biological conclusions, we advocate for alternative FDR control methods that are less sensitive to database size. Nevertheless, reduced transcriptome-informed databases are useful, as they reduce the ambiguity of protein identifications, yielding fewer shared peptides. Furthermore, searching the reference database and subsequently filtering proteins whose transcripts are not expressed reduces protein identification ambiguity to a similar extent, but is more transparent and reproducible.
Conclusions In summary, using transcriptome information is an interesting strategy that has not been promoted for the right reasons. While the increase in peptide identifications from searching reduced transcriptome-informed databases is an artifact caused by the use of an FDR control method unsuitable for excessively small databases, transcriptome information can reduce the ambiguity of protein identifications.
Supplementary Information The online version contains supplementary material available at 10.1186/s13059-022-02701-2.
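The argument above turns on how target-decoy competition (TDC) estimates the FDR. A minimal Python sketch of the usual TDC estimate follows, assuming each spectrum has already been reduced to its single best match (target or decoy) and using the common (decoys + 1) / targets finite-sample correction. The scores, the function names, and the tie handling (ties are not treated specially) are illustrative assumptions, not the authors' code. The estimator itself is simple; the paper's point is that its key assumption, that decoy matches model incorrect target matches, can break down when the database is excessively small.

```python
from typing import List, Tuple

# (score, is_decoy): one best match per spectrum after target-decoy competition.
PSM = Tuple[float, bool]


def tdc_q_values(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) sorted by descending score.

    FDR at each score threshold is estimated as (decoys + 1) / targets.
    """
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for _, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append((decoys + 1) / max(targets, 1))

    # A q-value is the smallest estimated FDR at which this PSM is still accepted,
    # so take a running minimum from the lowest score upward (capped at 1).
    q_values = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        q_values[i] = min(running, 1.0)
    return [(s, d, q) for (s, d), q in zip(ordered, q_values)]


def accepted_targets(scored: List[Tuple[float, bool, float]], threshold: float) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(1 for _, is_decoy, q in scored if not is_decoy and q <= threshold)


if __name__ == "__main__":
    # Hypothetical scores for illustration only.
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
    scored = tdc_q_values(example)
    for row in scored:
        print(row)
    print("targets accepted at q <= 0.5:", accepted_targets(scored, 0.5))
```

With only five toy matches, the +1 correction dominates and the two top targets receive q = 0.5. Real searches involve thousands of spectra, where this correction matters much less.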
Affiliation(s)
- Laura Fancello
- CNRS, CEA, Inserm, BioSanté U1292, Profi FR2048, Université Grenoble Alpes, Grenoble, France
- Thomas Burger
- CNRS, CEA, Inserm, BioSanté U1292, Profi FR2048, Université Grenoble Alpes, Grenoble, France
9
Aggarwal S, Raj A, Kumar D, Dash D, Yadav AK. False discovery rate: the Achilles' heel of proteogenomics. Brief Bioinform 2022; 23:6582880. [PMID: 35534181 DOI: 10.1093/bib/bbac163] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/30/2021] [Revised: 03/14/2022] [Accepted: 04/12/2022] [Indexed: 12/25/2022]
Abstract
Proteogenomics refers to the integrated analysis of the genome and proteome that leverages mass-spectrometry (MS)-based proteomics data to improve genome annotations, understand gene expression control through proteoforms and find sequence variants to develop novel insights for disease classification and therapeutic strategies. However, proteogenomic studies often suffer from reduced sensitivity and specificity due to inflated database size. To control the error rates, proteogenomics depends on the target-decoy search strategy, the de-facto method for false discovery rate (FDR) estimation in proteomics. The proteogenomic databases constructed from three- or six-frame nucleotide database translation not only increase the search space and compute-time but also violate the equivalence of target and decoy databases. These searches result in poorer separation between target and decoy scores, leading to stringent FDR thresholds. Understanding these factors and applying modified strategies such as two-pass database search or peptide-class-specific FDR can result in a better interpretation of MS data without introducing additional statistical biases. Based on these considerations, a user can interpret the proteogenomics results appropriately and control false positives and negatives in a more informed manner. In this review, first, we briefly discuss the proteogenomic workflows and limitations in database construction, followed by various considerations that can influence potential novel discoveries in a proteogenomic study. We conclude with suggestions to counter these challenges for better proteogenomic data interpretation.
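The review names peptide-class-specific FDR as one remedy. The idea can be sketched in Python as follows, under stated assumptions: PSMs have already been labeled with a class (here hypothetically "canonical" or "novel"), and ordinary target-decoy q-values, using the (decoys + 1) / targets estimate, are computed separately within each class rather than once over the pooled list. The class labels, scores, and function names are hypothetical, and this is one simple form of separate FDR estimation, not necessarily the exact procedure the review recommends.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (peptide_class, score, is_decoy); class labels are assumed to be assigned beforehand.
LabeledPSM = Tuple[str, float, bool]
Scored = List[Tuple[float, bool, float]]

# Hypothetical PSMs for illustration only.
EXAMPLE: List[LabeledPSM] = [
    ("canonical", 10.0, False), ("canonical", 9.5, False),
    ("canonical", 9.0, False), ("canonical", 3.0, True),
    ("novel", 8.0, False), ("novel", 7.5, True), ("novel", 7.0, False),
]


def q_values(psms: List[Tuple[float, bool]]) -> Scored:
    """Target-decoy q-values with FDR estimated as (decoys + 1) / targets."""
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for _, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append((decoys + 1) / max(targets, 1))
    qs = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):  # running minimum from the bottom up
        running = min(running, fdrs[i])
        qs[i] = min(running, 1.0)
    return [(s, d, q) for (s, d), q in zip(ordered, qs)]


def class_specific_q_values(psms: List[LabeledPSM]) -> Dict[str, Scored]:
    """Estimate q-values separately within each peptide class."""
    groups: Dict[str, List[Tuple[float, bool]]] = defaultdict(list)
    for peptide_class, score, is_decoy in psms:
        groups[peptide_class].append((score, is_decoy))
    return {cls: q_values(items) for cls, items in groups.items()}


def accepted(scored: Scored, threshold: float) -> int:
    """Count target PSMs at or below the q-value threshold."""
    return sum(1 for _, is_decoy, q in scored if not is_decoy and q <= threshold)


if __name__ == "__main__":
    pooled = q_values([(s, d) for _, s, d in EXAMPLE])
    print("pooled:", accepted(pooled, 0.5))
    for cls, scored in class_specific_q_values(EXAMPLE).items():
        print(cls + ":", accepted(scored, 0.5))
```

With these toy numbers, a single pooled threshold of 0.5 accepts five targets, two of them novel. The class-specific estimate accepts the three canonical targets and neither novel one. High-scoring canonical matches can mask a poorer target-decoy separation among novel sequences when the two are pooled, and this is the kind of effect that separate estimation is meant to expose.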
Affiliation(s)
- Suruchi Aggarwal
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd milestone, PO Box No. 04, Faridabad-Gurgaon Expressway, Faridabad-121001, Haryana, India
- Anurag Raj
- GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
- Dhirendra Kumar
- GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
- Debasis Dash
- GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
- Amit Kumar Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd milestone, PO Box No. 04, Faridabad-Gurgaon Expressway, Faridabad-121001, Haryana, India
10
Rajczewski AT, Han Q, Mehta S, Kumar P, Jagtap PD, Knutson CG, Fox JG, Tretyakova NY, Griffin TJ. Quantitative Proteogenomic Characterization of Inflamed Murine Colon Tissue Using an Integrated Discovery, Verification, and Validation Proteogenomic Workflow. Proteomes 2022; 10:proteomes10020011. [PMID: 35466239 PMCID: PMC9036229 DOI: 10.3390/proteomes10020011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Revised: 03/27/2022] [Accepted: 04/07/2022] [Indexed: 11/24/2022]
Abstract
Chronic inflammation of the colon causes genomic and/or transcriptomic events, which can lead to expression of non-canonical protein sequences contributing to oncogenesis. To better understand these mechanisms, Rag2−/−Il10−/− mice were infected with Helicobacter hepaticus to induce chronic inflammation of the cecum and the colon. Transcriptomic data from harvested proximal colon samples were used to generate a customized FASTA database containing non-canonical protein sequences. Using a proteogenomic approach, mass spectrometry data for proximal colon proteins were searched against this custom FASTA database using the Galaxy for Proteomics (Galaxy-P) platform. In addition to the increased abundance in inflammatory response proteins, we also discovered several non-canonical peptide sequences derived from unique proteoforms. We confirmed the veracity of these novel sequences using an automated bioinformatics verification workflow with targeted MS-based assays for peptide validation. Our bioinformatics discovery workflow identified 235 putative non-canonical peptide sequences, of which 58 were verified with high confidence and 39 were validated in targeted proteomics assays. This study provides insights into challenges faced when identifying non-canonical peptides using a proteogenomics approach and demonstrates an integrated workflow addressing these challenges. Our bioinformatic discovery and verification workflow is publicly available and accessible via the Galaxy platform and should be valuable in non-canonical peptide identification using proteogenomics.
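Expressed as proportions of the 235 putative non-canonical peptides, the counts reported above work out as follows. This is simple arithmetic on the abstract's numbers; the abstract does not state whether the 39 validated peptides are a subset of the 58 verified ones, so no verified-to-validated ratio is given.

```latex
\frac{58}{235} \approx 0.247 \quad (\text{about } 24.7\%\ \text{verified}), \qquad
\frac{39}{235} \approx 0.166 \quad (\text{about } 16.6\%\ \text{validated})
```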
Affiliation(s)
- Andrew T. Rajczewski
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Qiyuan Han
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Subina Mehta
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Praveen Kumar
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Pratik D. Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Charles G. Knutson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- James G. Fox
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Natalia Y. Tretyakova
- Department of Medicinal Chemistry, the Masonic Cancer Center, University of Minnesota, Minneapolis, MN 55455, USA
- Timothy J. Griffin
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
11
Rajczewski AT, Jagtap PD, Griffin TJ. An overview of technologies for MS-based proteomics-centric multi-omics. Expert Rev Proteomics 2022; 19:165-181. [PMID: 35466851 PMCID: PMC9613604 DOI: 10.1080/14789450.2022.2070476] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 01/07/2023]
Abstract
INTRODUCTION Mass spectrometry-based proteomics reveals dynamic molecular signatures underlying phenotypes reflecting normal and perturbed conditions in living systems. Although valuable on its own, the proteome represents only one level of molecular information, with the genome, epigenome, transcriptome, and metabolome all providing complementary information. Multi-omic analysis integrating information from one or more of these other domains with proteomic information provides a more complete picture of molecular contributors to dynamic biological systems. AREAS COVERED Here, we discuss improvements to mass spectrometry-based technologies, focused on peptide-based, bottom-up approaches that have enabled deep, quantitative characterization of complex proteomes. These advances are facilitating the integration of proteomics data with other 'omic information, providing a more complete picture of living systems. We also describe the current state of bioinformatics software and approaches for integrating proteomics and other 'omics data, critical for enabling new discoveries driven by multi-omics. EXPERT COMMENTARY Multi-omics, centered on the integration of proteomics information with other 'omic information, has tremendous promise for biological and biomedical studies. Continued advances in approaches for generating deep, reliable proteomic data and bioinformatics tools aimed at integrating data across 'omic domains will ensure the discoveries offered by these multi-omic studies continue to increase.
Affiliation(s)
- Andrew T. Rajczewski
- Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
- Pratik D. Jagtap
- Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
- Timothy J. Griffin
- Department of Biochemistry, Molecular and Cell Biology Building, University of Minnesota, 420 Washington Ave SE 7-129, Minneapolis, MN, 55455, USA
12
Nalpas N, Hoyles L, Anselm V, Ganief T, Martinez-Gili L, Grau C, Droste-Borel I, Davidovic L, Altafaj X, Dumas ME, Macek B. An integrated workflow for enhanced taxonomic and functional coverage of the mouse fecal metaproteome. Gut Microbes 2022; 13:1994836. [PMID: 34763597 PMCID: PMC8726736 DOI: 10.1080/19490976.2021.1994836] [Indexed: 02/04/2023]
Abstract
Intestinal microbiota plays a key role in shaping host homeostasis by regulating metabolism, immune responses and behavior. Its dysregulation has been associated with metabolic, immune and neuropsychiatric disorders and is accompanied by changes in bacterial metabolic regulation. Although proteomics is well suited for analysis of individual microbes, metaproteomics of fecal samples is challenging due to the physical structure of the sample, presence of contaminating host proteins and coexistence of hundreds of taxa. Furthermore, there is a lack of consensus regarding preparation of fecal samples, as well as downstream bioinformatic analyses following metaproteomics data acquisition. Here we assess sample preparation and data analysis strategies applied to mouse feces in a typical mass spectrometry-based metaproteomic experiment. We show that subtle changes in sample preparation protocols may influence interpretation of biological findings. Two-step database search strategies led to significant underestimation of false positive protein identifications. Unipept software provided the highest sensitivity and specificity in taxonomic annotation of the identified peptides of unknown origin. Comparison of matching metaproteome and metagenome data revealed a positive correlation between protein and gene abundances. Notably, nearly all functional categories of detected protein groups were differentially abundant in the metaproteome compared to what would be expected from the metagenome, highlighting the need to perform metaproteomics when studying complex microbiome samples.
Affiliation(s)
- Nicolas Nalpas
- Proteome Center Tuebingen, University of Tuebingen, Tuebingen, Germany
- Lesley Hoyles
- Biomolecular Medicine Section, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London, UK; Department of Biosciences, Nottingham Trent University, Nottingham, UK
- Viktoria Anselm
- Proteome Center Tuebingen, University of Tuebingen, Tuebingen, Germany
- Tariq Ganief
- Proteome Center Tuebingen, University of Tuebingen, Tuebingen, Germany
- Laura Martinez-Gili
- Biomolecular Medicine Section, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London, UK
- Cristina Grau
- Pharmacology Unit, Bellvitge Biomedical Research Institute, University of Barcelona, Barcelona, Spain
- Xavier Altafaj
- Pharmacology Unit, Bellvitge Biomedical Research Institute, University of Barcelona, Barcelona, Spain; Neurophysiology Unit, University of Barcelona, IDIBAPS, Barcelona, Spain
- Marc-Emmanuel Dumas
- Biomolecular Medicine Section, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London, UK; Genomic and Environmental Medicine, National Heart & Lung Institute, Faculty of Medicine, Imperial College London, London, UK; European Genomic Institute for Diabetes, INSERM UMR 1283, CNRS UMR 8199, Institut Pasteur de Lille, Lille University Hospital, University of Lille, Lille, France
- Boris Macek
- Proteome Center Tuebingen, University of Tuebingen, Tuebingen, Germany; Proteome Center Tuebingen, Interfaculty Institute for Cell Biology, Auf der Morgenstelle 15, 72076 Tuebingen, Germany
13
Thuy-Boun PS, Wang AY, Crissien-Martinez A, Xu JH, Chatterjee S, Stupp GS, Su AI, Coyle WJ, Wolan DW. Quantitative metaproteomics and activity-based protein profiling of patient fecal microbiome identifies host and microbial serine-type endopeptidase activity associated with ulcerative colitis. Mol Cell Proteomics 2022; 21:100197. [PMID: 35033677 PMCID: PMC8941213 DOI: 10.1016/j.mcpro.2022.100197] [Received: 06/04/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
Abstract
The gut microbiota plays an important yet incompletely understood role in the induction and propagation of ulcerative colitis (UC). Organism-level efforts to identify UC-associated microbes have revealed the importance of community structure, but less is known about the molecular effectors of disease. We performed 16S rRNA gene sequencing in parallel with label-free data-dependent LC-MS/MS proteomics to characterize the stool microbiomes of healthy (n = 8) and UC (n = 10) patients. Comparisons of taxonomic composition between techniques revealed major differences in community structure partially attributable to the additional detection of host, fungal, viral, and food peptides by metaproteomics. Differential expression analysis of metaproteomic data identified 176 significantly enriched protein groups between healthy and UC patients. Gene ontology analysis revealed several enriched functions with serine-type endopeptidase activity overrepresented in UC patients. Using a biotinylated fluorophosphonate probe and streptavidin-based enrichment, we show that serine endopeptidases are active in patient fecal samples and that additional putative serine hydrolases are detectable by this approach compared with unenriched profiling. Finally, as metaproteomic databases expand, they are expected to asymptotically approach completeness. Using ComPIL and de novo peptide sequencing, we estimate the size of the probable peptide space unidentified (“dark peptidome”) by our large database approach to establish a rough benchmark for database sufficiency. Despite high variability inherent in patient samples, our analysis yielded a catalog of differentially enriched proteins between healthy and UC fecal proteomes. This catalog provides a clinically relevant jumping-off point for further molecular-level studies aimed at identifying the microbial underpinnings of UC.
Identified 176 significantly altered protein groups between healthy and UC patients. Serine-type endopeptidase activity is overrepresented in UC patients. Fluorophosphonate ABPP shows that endopeptidases are active in fecal samples. ABPP enrichment helps identify additional putative serine hydrolases in samples. De novo sequencing used to estimate number of MS2 spectra unidentified by ComPIL.
Affiliation(s)
- Peter S Thuy-Boun
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037
- Ana Y Wang
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037
- Janice H Xu
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037
- Sandip Chatterjee
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037
- Gregory S Stupp
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037
- Walter J Coyle
- Scripps Clinic Gastroenterology Division, La Jolla, CA 92037
- Dennis W Wolan
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA 92037; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037
14
Blakeley-Ruiz JA, Kleiner M. Considerations for Constructing a Protein Sequence Database for Metaproteomics. Comput Struct Biotechnol J 2022; 20:937-952. [PMID: 35242286 PMCID: PMC8861567 DOI: 10.1016/j.csbj.2022.01.018] [Received: 08/26/2021] [Revised: 01/14/2022] [Accepted: 01/18/2022] [Indexed: 12/14/2022]
Abstract
Mass spectrometry-based metaproteomics has emerged as a prominent technique for interrogating the functions of specific organisms in microbial communities, in addition to total community function. Identifying proteins by mass spectrometry requires matching mass spectra of fragmented peptide ions to a database of protein sequences corresponding to the proteins in the sample. This sequence database determines which protein sequences can be identified from the measurement, and as such the taxonomic and functional information that can be inferred from a metaproteomics measurement. Thus, the construction of the protein sequence database directly impacts the outcome of any metaproteomics study. Several factors, such as source of sequence information and database curation, need to be considered during database construction to maximize accurate protein identifications traceable to the species of origin. In this review, we provide an overview of existing strategies for database construction and the relevant studies that have sought to test and validate these strategies. Based on this review of the literature and our experience we provide a decision tree and best practices for choosing and implementing database construction strategies.
Affiliation(s)
- J. Alfredo Blakeley-Ruiz
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Corresponding authors at: Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA.
- Manuel Kleiner
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
- Corresponding authors at: Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA.
15
Van Den Bossche T, Kunath BJ, Schallert K, Schäpe SS, Abraham PE, Armengaud J, Arntzen MØ, Bassignani A, Benndorf D, Fuchs S, Giannone RJ, Griffin TJ, Hagen LH, Halder R, Henry C, Hettich RL, Heyer R, Jagtap P, Jehmlich N, Jensen M, Juste C, Kleiner M, Langella O, Lehmann T, Leith E, May P, Mesuere B, Miotello G, Peters SL, Pible O, Queiros PT, Reichl U, Renard BY, Schiebenhoefer H, Sczyrba A, Tanca A, Trappe K, Trezzi JP, Uzzau S, Verschaffelt P, von Bergen M, Wilmes P, Wolf M, Martens L, Muth T. Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows. Nat Commun 2021; 12:7305. [PMID: 34911965 PMCID: PMC8674281 DOI: 10.1038/s41467-021-27542-8] [Received: 04/02/2021] [Accepted: 11/24/2021] [Indexed: 12/17/2022]
Abstract
Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carry out a community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluate the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observe that variability at the peptide level is predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappear at the protein group level. While differences are observed for predicted community composition, similar functional profiles are obtained across workflows. CAMPI demonstrates the robustness of present-day metaproteomics research, serves as a template for multi-laboratory studies in metaproteomics, and provides publicly available data sets for benchmarking future developments.
Affiliation(s)
- Tim Van Den Bossche
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Benoit J Kunath
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Kay Schallert
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Stephanie S Schäpe
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Paul E Abraham
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jean Armengaud
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Magnus Ø Arntzen
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Ariane Bassignani
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Dirk Benndorf
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Microbiology, Department of Applied Biosciences and Process Technology, Anhalt University of Applied Sciences, Köthen, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Stephan Fuchs
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Timothy J Griffin
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Live H Hagen
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Rashi Halder
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Céline Henry
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Robert Heyer
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Pratik Jagtap
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Nico Jehmlich
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Marlene Jensen
- Department of Plant & Microbial Biology, North Carolina State University, Raleigh, USA
- Catherine Juste
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Manuel Kleiner
- Department of Plant & Microbial Biology, North Carolina State University, Raleigh, USA
- Olivier Langella
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- Theresa Lehmann
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Emma Leith
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Bart Mesuere
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Guylaine Miotello
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Samantha L Peters
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Olivier Pible
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Pedro T Queiros
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Udo Reichl
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
- Alessandro Tanca
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Jean-Pierre Trezzi
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Integrated Biobank of Luxembourg, Luxembourg Institute of Health, 1, rue Louis Rech, L-3555, Dudelange, Luxembourg
- Sergio Uzzau
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Pieter Verschaffelt
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Martin von Bergen
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Paul Wilmes
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, 6 avenue du Swing, L-4367, Belvaux, Luxembourg
- Maximilian Wolf
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Lennart Martens
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
16
Lin A, Plubell DL, Keich U, Noble WS. Accurately Assigning Peptides to Spectra When Only a Subset of Peptides Are Relevant. J Proteome Res 2021; 20:4153-4164. [PMID: 34236864 PMCID: PMC8489664 DOI: 10.1021/acs.jproteome.1c00483] [Indexed: 12/22/2022]
Abstract
The standard proteomics database search strategy involves searching spectra against a peptide database and estimating the false discovery rate (FDR) of the resulting set of peptide-spectrum matches. One assumption of this protocol is that all the peptides in the database are relevant to the hypothesis being investigated. However, in settings where researchers are interested in a subset of peptides, alternative search and FDR control strategies are needed. Recently, two methods were proposed to address this problem: subset-search and all-sub. We show that both methods fail to control the FDR. For subset-search, this failure is due to the presence of "neighbor" peptides, which are defined as irrelevant peptides with a similar precursor mass and fragmentation spectrum as a relevant peptide. Not considering neighbors compromises the FDR estimate because a spectrum generated by an irrelevant peptide can incorrectly match well to a relevant peptide. Therefore, we have developed a new method, "subset-neighbor search" (SNS), that accounts for neighbor peptides. We show evidence that SNS controls the FDR when neighbors are present and that SNS outperforms group-FDR, the only other method that appears to control the FDR relative to a subset of relevant peptides.
Affiliation(s)
- Andy Lin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Deanna L. Plubell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Uri Keich
- School of Mathematics and Statistics, University of Sydney, NSW, Australia
- William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School for Computer Science and Engineering, University of Washington, Seattle, WA, USA
17
Huang W, Kane MA. MAPLE: A Microbiome Analysis Pipeline Enabling Optimal Peptide Search and Comparative Taxonomic and Functional Analysis. J Proteome Res 2021; 20:2882-2894. [PMID: 33848166 DOI: 10.1021/acs.jproteome.1c00114] [Indexed: 11/29/2022]
Abstract
Metaproteomics by mass spectrometry (MS) is a powerful approach to profile a large number of proteins expressed by all organisms in a highly complex biological or ecological sample, which is able to provide a direct and quantitative assessment of the functional makeup of a microbiota. The human gastrointestinal microbiota has been found to play important roles in human physiology and health, and metaproteomics has been shown to shed light on multiple novel associations between microbiota and diseases. MS-powered proteomics generally relies on genome data to define search space. However, metaproteomics, which simultaneously analyzes all proteins from hundreds to thousands of species, faces significant challenges regarding database search and interpretation of results. To overcome these obstacles, we have developed a user-friendly microbiome analysis pipeline (MAPLE, freely downloadable at http://maple.rx.umaryland.edu/), which is able to define an optimal search space by inferring proteomes specific to samples following the principle of parsimony. MAPLE achieves peptide identification highly comparable to, or better than, that of a sample-specific metagenome-guided search. In addition, we implemented an automated peptide-centric enrichment analysis function in MAPLE to address issues of traditional protein-centric comparison, enabling straightforward and comprehensive comparison of taxonomic and functional makeup between microbiota.
Affiliation(s)
- Weiliang Huang
- Department of Pharmaceutical Sciences, University of Maryland, School of Pharmacy, Baltimore, Maryland 21201, United States
- Maureen A Kane
- Department of Pharmaceutical Sciences, University of Maryland, School of Pharmacy, Baltimore, Maryland 21201, United States
18
Salvato F, Hettich RL, Kleiner M. Five key aspects of metaproteomics as a tool to understand functional interactions in host-associated microbiomes. PLoS Pathog 2021; 17:e1009245. [PMID: 33630960 PMCID: PMC7906368 DOI: 10.1371/journal.ppat.1009245] [Indexed: 12/12/2022]
Affiliation(s)
- Fernanda Salvato
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, North Carolina, United States of America
- Robert L. Hettich
- Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, Tennessee, United States of America
- Manuel Kleiner
- Department of Plant and Microbial Biology, North Carolina State University, Raleigh, North Carolina, United States of America
19
Sajulga R, Easterly C, Riffle M, Mesuere B, Muth T, Mehta S, Kumar P, Johnson J, Gruening BA, Schiebenhoefer H, Kolmeder CA, Fuchs S, Nunn BL, Rudney J, Griffin TJ, Jagtap PD. Survey of metaproteomics software tools for functional microbiome analysis. PLoS One 2020; 15:e0241503. [PMID: 33170893 PMCID: PMC7654790 DOI: 10.1371/journal.pone.0241503] [Received: 02/12/2020] [Accepted: 10/15/2020] [Indexed: 11/23/2022]
Abstract
To gain a thorough appreciation of microbiome dynamics, researchers characterize the functional relevance of expressed microbial genes or proteins. This can be accomplished through metaproteomics, which characterizes the protein expression of microbiomes. Several software tools exist for analyzing microbiomes at the functional level by measuring their combined proteome-level response to environmental perturbations. In this survey, we explore the performance of six available tools, to enable researchers to make informed decisions regarding software choice based on their research goals. Tandem mass spectrometry-based proteomic data obtained from dental caries plaque samples grown with and without sucrose in paired biofilm reactors were used as representative data for this evaluation. Microbial peptides from one sample pair were identified by the X! Tandem search algorithm via SearchGUI and subjected to functional analysis using software tools including eggNOG-mapper, MEGAN5, MetaGOmics, MetaProteomeAnalyzer (MPA), ProPHAnE, and Unipept to generate functional annotation through Gene Ontology (GO) terms. Among these software tools, notable differences in functional annotation were detected after comparing differentially expressed protein functional groups. Based on the generated GO terms of these tools, we performed a peptide-level comparison to evaluate the quality of their functional annotations. A BLAST analysis against the NCBI non-redundant database revealed that the sensitivity and specificity of functional annotation varied between tools. For example, eggNOG-mapper mapped to the largest number of GO terms, while Unipept generated more accurate GO terms. Based on our evaluation, metaproteomics researchers can choose the software according to their analytical needs and developers can use the resulting feedback to further optimize their algorithms.
To make more of these tools accessible via scalable metaproteomics workflows, eggNOG-mapper and Unipept 4.0 were incorporated into the Galaxy platform.
Affiliation(s)
- Ray Sajulga
- University of Minnesota, Minneapolis, Minnesota, United States of America
- Caleb Easterly
- University of Minnesota, Minneapolis, Minnesota, United States of America
- Michael Riffle
- University of Washington, Seattle, Washington, United States of America
- Thilo Muth
- Federal Institute for Materials Research and Testing, Berlin, Germany
- Subina Mehta
- University of Minnesota, Minneapolis, Minnesota, United States of America
- Praveen Kumar
- University of Minnesota, Minneapolis, Minnesota, United States of America
- James Johnson
- University of Minnesota, Minneapolis, Minnesota, United States of America
- Brook L. Nunn
- University of Washington, Seattle, Washington, United States of America
- Joel Rudney
- University of Minnesota, Minneapolis, Minnesota, United States of America
- Timothy J. Griffin
- University of Minnesota, Minneapolis, Minnesota, United States of America
- Pratik D. Jagtap
- University of Minnesota, Minneapolis, Minnesota, United States of America