1
Wu E, Mallawaarachchi V, Zhao J, Yang Y, Liu H, Wang X, Shen C, Lin Y, Qiao L. Contigs directed gene annotation (ConDiGA) for accurate protein sequence database construction in metaproteomics. Microbiome 2024; 12:58. [PMID: 38504332 PMCID: PMC10949615 DOI: 10.1186/s40168-024-01775-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 02/05/2024] [Indexed: 03/21/2024]
Abstract
BACKGROUND Microbiota are closely associated with human health and disease. Metaproteomics can provide a direct means to identify microbial proteins in microbiota for compositional and functional characterization. However, in-depth and accurate metaproteomics is still limited due to the extreme complexity and high diversity of microbiota samples. It is generally recommended to use metagenomic data from the same samples to construct the protein sequence database for metaproteomic data analysis. Although different metagenomics-based database construction strategies have been developed, an optimization of gene taxonomic annotation has not been reported, which, however, is extremely important for accurate metaproteomic analysis. RESULTS Herein, we proposed an accurate taxonomic annotation pipeline for genes from metagenomic data, namely contigs directed gene annotation (ConDiGA), and used the method to build a protein sequence database for metaproteomic analysis. We compared our pipeline (ConDiGA or MD3) with two other popular annotation pipelines (MD1 and MD2). In MD1, genes were directly annotated against the whole bacterial genome database; in MD2, contigs were annotated against the whole bacterial genome database and the taxonomic information of contigs was assigned to the genes; in MD3, the most confident species from the contigs annotation results were taken as reference to annotate genes. Annotation tools, including BLAST, Kaiju, and Kraken2, were compared. Based on a synthetic microbial community of 12 species, it was found that Kaiju with the MD3 pipeline outperformed the others in the construction of protein sequence database from metagenomic data. Similar performance was also observed with a fecal sample, as well as in silico mixed datasets of the simulated microbial community and the fecal sample. CONCLUSIONS Overall, we developed an optimized pipeline for gene taxonomic annotation to construct protein sequence databases. Our study can tackle the current taxonomic annotation reliability problem in metagenomics-derived protein sequence database and can promote the in-depth metaproteomic analysis of microbiome. The unique metagenomic and metaproteomic datasets of the 12 bacterial species are publicly available as a standard benchmarking sample for evaluating various analysis pipelines. The code of ConDiGA is open access at GitHub for the analysis of microbiota samples. Video Abstract.
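The MD3 idea described in the abstract (take the most confident species from the contig annotations, then annotate genes only against those species) can be illustrated with a small sketch. This is not the ConDiGA code: the function names, the toy data, and the "at least two supporting contigs" rule are all assumptions made for illustration, and the abstract does not say how confidence is actually measured.

```python
from collections import Counter


def confident_species(contig_taxa, min_contigs=2):
    """Return species supported by at least `min_contigs` annotated contigs.

    The support threshold is a hypothetical stand-in for whatever
    confidence criterion a real pipeline would apply.
    """
    counts = Counter(t for t in contig_taxa.values() if t is not None)
    return {species for species, n in counts.items() if n >= min_contigs}


def annotate_genes(gene_hits, allowed):
    """Give each gene its best-scoring hit among the allowed species.

    gene_hits maps a gene id to a list of (species, score) candidates.
    Genes without any hit in `allowed` stay unannotated (None).
    """
    annotations = {}
    for gene, hits in gene_hits.items():
        kept = [(species, score) for species, score in hits if species in allowed]
        annotations[gene] = max(kept, key=lambda h: h[1])[0] if kept else None
    return annotations


# Toy contig-level annotations (contig id -> species, None = unclassified).
contig_taxa = {
    "c1": "Escherichia coli",
    "c2": "Escherichia coli",
    "c3": "Bacillus subtilis",
    "c4": "Bacillus subtilis",
    "c5": "Vibrio cholerae",  # supported by a single contig only
    "c6": None,
}

# Toy gene-level candidate hits against a whole-genome database.
gene_hits = {
    "g1": [("Vibrio cholerae", 0.99), ("Escherichia coli", 0.95)],
    "g2": [("Bacillus subtilis", 0.90)],
    "g3": [("Vibrio cholerae", 0.80)],
}

reference = confident_species(contig_taxa)
annotations = annotate_genes(gene_hits, reference)
print(annotations)
```

In this toy run, gene `g1` is assigned to Escherichia coli even though its top raw hit is a weakly supported species, which is the kind of correction a restricted reference set is meant to provide.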
Affiliation(s)
- Enhui Wu
  - Department of Chemistry, and Shanghai Stomatological Hospital, Fudan University, Shanghai, 200000, China
- Vijini Mallawaarachchi
  - School of Computing, College of Engineering, Computing and Cybernetics, The Australian National University, Canberra, ACT, 2600, Australia
  - Flinders Accelerator for Microbiome Exploration, College of Science and Engineering, Flinders University, Bedford Park, SA, 5042, Australia
- Jinzhi Zhao
  - Department of Chemistry, and Shanghai Stomatological Hospital, Fudan University, Shanghai, 200000, China
- Yi Yang
  - Department of Chemistry, and Shanghai Stomatological Hospital, Fudan University, Shanghai, 200000, China
- Hebin Liu
  - Shanghai Omicsolution Co., Ltd, Shanghai, 200000, China
- Xiaoqing Wang
  - Shanghai Omicsolution Co., Ltd, Shanghai, 200000, China
- Chengpin Shen
  - Shanghai Omicsolution Co., Ltd, Shanghai, 200000, China
- Yu Lin
  - School of Computing, College of Engineering, Computing and Cybernetics, The Australian National University, Canberra, ACT, 2600, Australia
- Liang Qiao
  - Department of Chemistry, and Shanghai Stomatological Hospital, Fudan University, Shanghai, 200000, China.
2
Kleikamp HBC, Grouzdev D, Schaasberg P, van Valderen R, van der Zwaan R, Wijgaart RVD, Lin Y, Abbas B, Pronk M, van Loosdrecht MCM, Pabst M. Metaproteomics, metagenomics and 16S rRNA sequencing provide different perspectives on the aerobic granular sludge microbiome. Water Res 2023; 246:120700. [PMID: 37866247 DOI: 10.1016/j.watres.2023.120700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 09/29/2023] [Accepted: 10/04/2023] [Indexed: 10/24/2023]
Abstract
The tremendous progress in sequencing technologies has made DNA sequencing routine for microbiome studies. Additionally, advances in mass spectrometric techniques have extended conventional proteomics into the field of microbial ecology. However, systematic studies that provide a better understanding of the complementary nature of these 'omics' approaches, particularly for complex environments such as wastewater treatment sludge, are urgently needed. Here, we describe a comparative metaomics study on aerobic granular sludge from three different wastewater treatment plants. For this, we employed metaproteomics, whole metagenome, and 16S rRNA amplicon sequencing to study the same granule material with uniform size. We furthermore compare the taxonomic profiles using the Genome Taxonomy Database (GTDB) to enhance the comparability between the different approaches. Though the major taxonomies were consistently identified in the different aerobic granular sludge samples, the taxonomic composition obtained by the different omics techniques varied significantly at the lower taxonomic levels, which impacts the interpretation of the nutrient removal processes. Nevertheless, as demonstrated by metaproteomics, the genera that were consistently identified in all techniques cover the majority of the protein biomass. The established metaomics data and the contig classification pipeline are publicly available, which provides a valuable resource for further studies on metabolic processes in aerobic granular sludge.
Affiliation(s)
- Hugo B C Kleikamp
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands.
- Pim Schaasberg
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Ramon van Valderen
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Ramon van der Zwaan
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Roel van de Wijgaart
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Yuemei Lin
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Ben Abbas
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Mario Pronk
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands
- Martin Pabst
  - Department of Biotechnology, Delft University of Technology, Delft, the Netherlands.
3
Lee EM, Srinivasan S, Purvine SO, Fiedler TL, Leiser OP, Proll SC, Minot SS, Deatherage Kaiser BL, Fredricks DN. Optimizing metaproteomics database construction: lessons from a study of the vaginal microbiome. mSystems 2023; 8:e0067822. [PMID: 37350639 PMCID: PMC10469846 DOI: 10.1128/msystems.00678-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 04/06/2023] [Indexed: 06/24/2023]
Abstract
Metaproteomics, a method for untargeted, high-throughput identification of proteins in complex samples, provides functional information about microbial communities and can tie functions to specific taxa. Metaproteomics often generates less data than other omics techniques, but analytical workflows can be improved to increase usable data in metaproteomic outputs. Identification of peptides in the metaproteomic analysis is performed by comparing mass spectra of sample peptides to a reference database of protein sequences. Although these protein databases are an integral part of the metaproteomic analysis, few studies have explored how database composition impacts peptide identification. Here, we used cervicovaginal lavage (CVL) samples from a study of bacterial vaginosis (BV) to compare the performance of databases built using six different strategies. We evaluated broad versus sample-matched databases, as well as databases populated with proteins translated from metagenomic sequencing of the same samples versus sequences from public repositories. Smaller sample-matched databases performed significantly better, driven by the statistical constraints on large databases. Additionally, large databases attributed up to 34% of significant bacterial hits to taxa absent from the sample, as determined orthogonally by 16S rRNA gene sequencing. We also tested a set of hybrid databases which included bacterial proteins from NCBI RefSeq and translated bacterial genes from the samples. These hybrid databases had the best overall performance, identifying 1,068 unique human and 1,418 unique bacterial proteins, ~30% more than a database populated with proteins from typical vaginal bacteria and fungi. Our findings can help guide the optimal identification of proteins while maintaining statistical power for reaching biological conclusions. 
IMPORTANCE Metaproteomic analysis can provide valuable insights into the functions of microbial and cellular communities by identifying a broad, untargeted set of proteins. The databases used in the analysis of metaproteomic data influence results by defining what proteins can be identified. Moreover, the size of the database impacts the number of identifications after accounting for false discovery rates (FDRs). Few studies have tested the performance of different strategies for building a protein database to identify proteins from metaproteomic data and those that have largely focused on highly diverse microbial communities. We tested a range of databases on CVL samples and found that a hybrid sample-matched approach, using publicly available proteins from organisms present in the samples, as well as proteins translated from metagenomic sequencing of the samples, had the best performance. However, our results also suggest that public sequence databases will continue to improve as more bacterial genomes are published.
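The point that database size affects how many identifications survive FDR filtering can be made concrete with a target-decoy estimate. The abstract does not say which FDR estimator the authors used; the sketch below assumes the common decoys/targets ratio on score-ranked peptide-spectrum matches (PSMs), with hypothetical scores and a hypothetical function name. Tied scores are not treated specially.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms is a list of (score, is_decoy) pairs, higher score = better match.
    FDR at a cutoff is estimated as (#decoys) / (#targets) among all PSMs
    scoring at or above it. Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Toy PSM list: (score, is_decoy). Values are invented for illustration.
psms = [
    (9.0, False), (8.5, False), (8.0, False),
    (7.0, True),
    (6.5, False), (6.0, False),
    (5.0, True),
]

cutoff = score_threshold_at_fdr(psms, max_fdr=0.25)
accepted = [s for s, is_decoy in psms if not is_decoy and s >= cutoff]
print(cutoff, len(accepted))
```

A larger search database tends to produce more high-scoring decoy (and false target) matches, so the cutoff moves up and fewer true matches are accepted, which is consistent with the smaller sample-matched databases performing better in the study above.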
Affiliation(s)
- Elliot M. Lee
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - University of Washington, Seattle, Washington, USA
- Samuel O. Purvine
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Tina L. Fiedler
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Owen P. Leiser
  - Pacific Northwest National Laboratory, Richland, Washington, USA
- Sean C. Proll
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Samuel S. Minot
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- David N. Fredricks
  - Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  - University of Washington, Seattle, Washington, USA
4
Blakeley-Ruiz JA, Kleiner M. Considerations for Constructing a Protein Sequence Database for Metaproteomics. Comput Struct Biotechnol J 2022; 20:937-952. [PMID: 35242286 PMCID: PMC8861567 DOI: 10.1016/j.csbj.2022.01.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/26/2021] [Revised: 01/14/2022] [Accepted: 01/18/2022] [Indexed: 12/14/2022]
Abstract
Mass spectrometry-based metaproteomics has emerged as a prominent technique for interrogating the functions of specific organisms in microbial communities, in addition to total community function. Identifying proteins by mass spectrometry requires matching mass spectra of fragmented peptide ions to a database of protein sequences corresponding to the proteins in the sample. This sequence database determines which protein sequences can be identified from the measurement, and as such the taxonomic and functional information that can be inferred from a metaproteomics measurement. Thus, the construction of the protein sequence database directly impacts the outcome of any metaproteomics study. Several factors, such as source of sequence information and database curation, need to be considered during database construction to maximize accurate protein identifications traceable to the species of origin. In this review, we provide an overview of existing strategies for database construction and the relevant studies that have sought to test and validate these strategies. Based on this review of the literature and our experience we provide a decision tree and best practices for choosing and implementing database construction strategies.
Affiliation(s)
- J. Alfredo Blakeley-Ruiz
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
  - Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Corresponding authors at: Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA.
- Manuel Kleiner
  - Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA
  - Corresponding authors at: Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC, USA.
5
Jouffret V, Miotello G, Culotta K, Ayrault S, Pible O, Armengaud J. Increasing the power of interpretation for soil metaproteomics data. Microbiome 2021; 9:195. [PMID: 34587999 PMCID: PMC8482631 DOI: 10.1186/s40168-021-01139-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/05/2021] [Accepted: 07/29/2021] [Indexed: 05/07/2023]
Abstract
BACKGROUND Soil and sediment microorganisms are highly phylogenetically diverse but are currently largely under-represented in public molecular databases. Their functional characterization by means of metaproteomics is usually performed using metagenomic sequences acquired for the same sample. However, such hugely diverse metagenomic datasets are difficult to assemble; in parallel, theoretical proteomes from isolates available in generic databases are of high quality. Both these factors advocate for the use of theoretical proteomes in metaproteomics interpretation pipelines. Here, we examined a number of database construction strategies with a view to increasing the outputs of metaproteomics studies performed on soil samples. RESULTS The number of peptide-spectrum matches was found to be of comparable magnitude when using public or sample-specific metagenomics-derived databases. However, numbers were significantly increased when a combination of both types of information was used in a two-step cascaded search. Our data also indicate that the functional annotation of the metaproteomics dataset can be maximized by using a combination of both types of databases. CONCLUSIONS A two-step strategy combining sample-specific metagenome database and public databases such as the non-redundant NCBI database and a massive soil gene catalog allows maximizing the metaproteomic interpretation both in terms of ratio of assigned spectra and retrieval of function-derived information. Video abstract.
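The two-step cascaded search mentioned in the abstract can be pictured as follows: spectra are searched against the first database, and only the spectra left unassigned are passed to the next one. The sketch below is a schematic under strong assumptions. `cascaded_search`, `toy_search`, and the toy data are hypothetical, the "search" is a plain substring lookup rather than real spectrum matching, and per-step FDR control is omitted.

```python
def cascaded_search(spectrum_ids, databases, search):
    """Search databases in order; unassigned spectra cascade to the next one.

    databases is an ordered list of (name, database) pairs.
    search(spectrum_ids, database) must return {spectrum_id: peptide}.
    Returns (assigned, remaining) where assigned maps each spectrum id
    to (peptide, name of the database that explained it).
    """
    assigned = {}
    remaining = list(spectrum_ids)
    for name, database in databases:
        for spec_id, peptide in search(remaining, database).items():
            assigned[spec_id] = (peptide, name)
        remaining = [s for s in remaining if s not in assigned]
    return assigned, remaining


# Toy stand-in for a search engine: each "spectrum" is reduced to the
# peptide string it would ideally yield, matched by substring lookup.
TOY_SPECTRA = {"s1": "LVNELTEFAK", "s2": "AGFAGDDAPR", "s3": "XXXXXX"}


def toy_search(spectrum_ids, database):
    hits = {}
    for spec_id in spectrum_ids:
        peptide = TOY_SPECTRA[spec_id]
        if any(peptide in protein for protein in database):
            hits[spec_id] = peptide
    return hits


sample_db = ["MKLVNELTEFAKTT"]                   # sample-specific metagenome
public_db = ["MAGFAGDDAPRAV", "MKLVNELTEFAKTT"]  # generic public database

assigned, unassigned = cascaded_search(
    ["s1", "s2", "s3"],
    [("sample", sample_db), ("public", public_db)],
    toy_search,
)
print(assigned, unassigned)
```

Here `s1` is explained by the sample-specific database and never reaches the second step, while `s2` is only recovered from the public database.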
Affiliation(s)
- Virginie Jouffret
  - Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France
  - Laboratoire des Sciences et de l'Environnement (LSCE-IPSL), UMR 8212 (CEA/CNRS/UVSQ), CEA Saclay, Université Paris-Saclay, Orme des Merisiers, F-91191, Gif-sur-Yvette, France
  - Laboratoire Innovations technologiques pour la Détection et le Diagnostic (Li2D), Université de Montpellier, F-30207, Bagnols-sur-Cèze, France
- Guylaine Miotello
  - Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France
- Karen Culotta
  - Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France
- Sophie Ayrault
  - Laboratoire des Sciences et de l'Environnement (LSCE-IPSL), UMR 8212 (CEA/CNRS/UVSQ), CEA Saclay, Université Paris-Saclay, Orme des Merisiers, F-91191, Gif-sur-Yvette, France
- Olivier Pible
  - Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France
- Jean Armengaud
  - Université Paris-Saclay, CEA, INRAE, Département Médicaments et Technologies pour la Santé (DMTS), SPI, F-30200, Bagnols-sur-Cèze, France.
6
Saito MA, Saunders JK, Chagnon M, Gaylord DA, Shepherd A, Held NA, Dupont C, Symmonds N, York A, Charron M, Kinkade DB. Development of an Ocean Protein Portal for Interactive Discovery and Education. J Proteome Res 2020; 20:326-336. [PMID: 32897077 DOI: 10.1021/acs.jproteome.0c00382] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/30/2023]
Abstract
Proteins are critical in catalyzing chemical reactions, forming key cellular structures, and in regulating cellular processes. Investigation of marine microbial proteins by metaproteomics methods enables the discovery of numerous aspects of microbial biogeochemical processes. However, these datasets present big data challenges as they often involve many samples collected across broad geospatial and temporal scales, resulting in thousands of protein identifications, abundances, and corresponding annotation information. The Ocean Protein Portal (OPP) was created to enable data sharing and discovery among multiple scientific domains and serve both research and education functions. The portal focuses on three use case questions: "Where is my protein of interest?", "Who makes it?", and "How much is there?" and provides profile and section visualizations, real-time taxonomic analysis, and links to metadata, sequence analysis, and other external resources to enable connections to be made between biogeochemical and proteomics datasets.
Affiliation(s)
- Mak A Saito
  - Woods Hole Oceanographic Institution, Woods Hole, Falmouth, Massachusetts 02543, United States
- Jaclyn K Saunders
  - Woods Hole Oceanographic Institution, Woods Hole, Falmouth, Massachusetts 02543, United States
- Michael Chagnon
  - RPS Group, South Kingston, Rhode Island 02879, United States
  - Kaimika Technology, Cumberland, Rhode Island 02864, United States
- David A Gaylord
  - Woods Hole Oceanographic Institution, Woods Hole, Falmouth, Massachusetts 02543, United States
- Adam Shepherd
  - Woods Hole Oceanographic Institution, Woods Hole, Falmouth, Massachusetts 02543, United States
- Noelle A Held
  - Woods Hole Oceanographic Institution, Woods Hole, Falmouth, Massachusetts 02543, United States
- Christopher Dupont
  - Woods Hole Oceanographic Institution, Falmouth, Massachusetts 02543, United States
- Nicholas Symmonds
  - Woods Hole Oceanographic Institution, Woods Hole, Falmouth, Massachusetts 02543, United States
- Amber York
  - Woods Hole Oceanographic Institution, Woods Hole, Falmouth, Massachusetts 02543, United States
- Matthew Charron
  - Kaimika Technology, Cumberland, Rhode Island 02864, United States
- Danie B Kinkade
  - Woods Hole Oceanographic Institution, Falmouth, Massachusetts 02543, United States
7
Saunders JK, Gaylord DA, Held NA, Symmonds N, Dupont CL, Shepherd A, Kinkade DB, Saito MA. METATRYP v 2.0: Metaproteomic Least Common Ancestor Analysis for Taxonomic Inference Using Specialized Sequence Assemblies-Standalone Software and Web Servers for Marine Microorganisms and Coronaviruses. J Proteome Res 2020; 19:4718-4729. [PMID: 32897080 PMCID: PMC7640959 DOI: 10.1021/acs.jproteome.0c00385] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/01/2020] [Indexed: 12/30/2022]
Abstract
We present METATRYP version 2 software that identifies shared peptides across the predicted proteomes of organisms within environmental metaproteomics studies to enable accurate taxonomic attribution of peptides during protein inference. Improvements include ingestion of complex sequence assembly data categories (metagenomic and metatranscriptomic assemblies, single cell amplified genomes, and metagenome assembled genomes), prediction of the least common ancestor (LCA) for a peptide shared across multiple organisms, increased performance through updates to the backend architecture, and development of a web portal (https://metatryp.whoi.edu). Major expansion of the marine METATRYP database with predicted proteomes from environmental sequencing confirms a low occurrence of shared tryptic peptides among disparate marine microorganisms, implying tractability for targeted metaproteomics. METATRYP was designed to facilitate ocean metaproteomics and has been integrated into the Ocean Protein Portal (https://oceanproteinportal.org); however, it can be readily applied to other domains. We describe the rapid deployment of a coronavirus-specific web portal (https://metatryp-coronavirus.whoi.edu/) to aid in use of proteomics on coronavirus research during the ongoing pandemic. A coronavirus-focused METATRYP database identified potential SARS-CoV-2 peptide biomarkers and indicated very few shared tryptic peptides between SARS-CoV-2 and other disparate taxa analyzed, sharing <1% peptides with taxa outside of the betacoronavirus group, establishing that taxonomic specificity is achievable using tryptic peptide-based proteomic diagnostic approaches.
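The least common ancestor (LCA) step described in the abstract can be sketched in a few lines: find every organism whose predicted proteome contains the peptide, then keep the part of their lineages that all of them share. This is an illustration only, not METATRYP's implementation; the function names, the placeholder organisms, and the three-rank lineages are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the longest root-to-leaf prefix shared by all lineages.

    Each lineage is a list of taxon names ordered from root to leaf.
    An empty input yields an empty list.
    """
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


def peptide_lca(peptide, proteomes, lineages):
    """LCA of all organisms whose peptide set contains `peptide`."""
    hosts = [org for org, peptides in proteomes.items() if peptide in peptides]
    return lowest_common_ancestor([lineages[org] for org in hosts])


# Placeholder taxonomy and peptide sets, invented for illustration.
lineages = {
    "orgA": ["Bacteria", "PhylumX", "GenusA"],
    "orgB": ["Bacteria", "PhylumX", "GenusB"],
    "orgC": ["Bacteria", "PhylumY", "GenusC"],
}
proteomes = {
    "orgA": {"PEPTIDEK", "SHAREDXR", "COMMONK"},
    "orgB": {"SHAREDXR", "COMMONK"},
    "orgC": {"COMMONK"},
}

print(peptide_lca("PEPTIDEK", proteomes, lineages))  # unique to orgA
print(peptide_lca("SHAREDXR", proteomes, lineages))  # shared within PhylumX
print(peptide_lca("COMMONK", proteomes, lineages))   # shared by all three
```

A peptide found in a single organism keeps its full lineage, while a widely shared peptide collapses toward the root and carries little taxonomic information.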
Affiliation(s)
- Jaclyn K. Saunders
  - Woods Hole Oceanographic Institution, 266 Woods Hole Road Mailstop #51, Woods Hole, Massachusetts 02543, United States
- David A. Gaylord
  - Woods Hole Oceanographic Institution, 266 Woods Hole Road Mailstop #51, Woods Hole, Massachusetts 02543, United States
- Noelle A. Held
  - Woods Hole Oceanographic Institution, 266 Woods Hole Road Mailstop #51, Woods Hole, Massachusetts 02543, United States
- Nicholas Symmonds
  - Woods Hole Oceanographic Institution, 266 Woods Hole Road Mailstop #51, Woods Hole, Massachusetts 02543, United States
- Adam Shepherd
  - Woods Hole Oceanographic Institution, 266 Woods Hole Road Mailstop #51, Woods Hole, Massachusetts 02543, United States
- Danie B. Kinkade
  - Woods Hole Oceanographic Institution, 266 Woods Hole Road Mailstop #51, Woods Hole, Massachusetts 02543, United States
- Mak A. Saito
  - Woods Hole Oceanographic Institution, 266 Woods Hole Road Mailstop #51, Woods Hole, Massachusetts 02543, United States
8
Djemiel C, Goulas E, Badalato N, Chabbert B, Hawkins S, Grec S. Targeted Metagenomics of Retting in Flax: The Beginning of the Quest to Harness the Secret Powers of the Microbiota. Front Genet 2020; 11:581664. [PMID: 33193706 PMCID: PMC7652851 DOI: 10.3389/fgene.2020.581664] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/09/2020] [Accepted: 09/21/2020] [Indexed: 12/13/2022]
Abstract
The mechanical and chemical properties of natural plant fibers are determined by many different factors, both intrinsic and extrinsic to the plant, during growth but also after harvest. A better understanding of how all these factors exert their effect and how they interact is necessary to be able to optimize fiber quality for use in different industries. One important factor is the post-harvest process known as retting, representing the first step in the extraction of bast fibers from the stem of species such as flax and hemp. During this process microorganisms colonize the stem and produce hydrolytic enzymes that target cell wall polymers thereby facilitating the progressive destruction of the stem and fiber bundles. Recent advances in sequencing technology have allowed researchers to implement targeted metagenomics leading to a much better characterization of the microbial communities involved in retting, as well as an improved understanding of microbial dynamics. In this paper we review how our current knowledge of the microbiology of retting has been improved by targeted metagenomics and discuss how related '-omics' approaches might be used to fully characterize the functional capability of the retting microbiome.
Affiliation(s)
- Christophe Djemiel
  - Univ. Lille, CNRS, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Estelle Goulas
  - Univ. Lille, CNRS, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Nelly Badalato
  - Univ. Lille, CNRS, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Brigitte Chabbert
  - Université de Reims Champagne Ardenne, INRAE, UMR FARE A 614, Reims, France
- Simon Hawkins
  - Univ. Lille, CNRS, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Sébastien Grec
  - Univ. Lille, CNRS, UMR 8576 - UGSF - Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
9
O'Bryon I, Jenson SC, Merkley ED. Flying blind, or just flying under the radar? The underappreciated power of de novo methods of mass spectrometric peptide identification. Protein Sci 2020; 29:1864-1878. [PMID: 32713088 PMCID: PMC7454419 DOI: 10.1002/pro.3919] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/15/2020] [Revised: 07/21/2020] [Accepted: 07/23/2020] [Indexed: 12/15/2022]
Abstract
Mass spectrometry-based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is called database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several interesting application areas, such as forensics, archaeology, venomics, and others, a genome sequence may not be available, or the correct genome sequence to use is not known. In these cases, de novo peptide identification can play an important role. De novo methods infer peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph-based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications from venomics, metaproteomics, forensics, and characterization of antibody drugs.
Affiliation(s)
- Isabelle O'Bryon
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Sarah C. Jenson
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Eric D. Merkley
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
10
Kumar P, Johnson JE, Easterly C, Mehta S, Sajulga R, Nunn B, Jagtap PD, Griffin TJ. A Sectioning and Database Enrichment Approach for Improved Peptide Spectrum Matching in Large, Genome-Guided Protein Sequence Databases. J Proteome Res 2020; 19:2772-2785. [DOI: 10.1021/acs.jproteome.0c00260] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Affiliation(s)
- Praveen Kumar
  - Bioinformatics and Computational Biology, University of Minnesota−Rochester, Rochester, Minnesota 55904, United States
  - Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- James E. Johnson
  - Minnesota Supercomputing Institute, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Caleb Easterly
  - Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Subina Mehta
  - Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Ray Sajulga
  - Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Brook Nunn
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Pratik D. Jagtap
  - Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
- Timothy J. Griffin
  - Biochemistry Molecular Biology and Biophysics, University of Minnesota−Twin Cities, Minneapolis, Minnesota 55455, United States
11
Johnson RS, Searle BC, Nunn BL, Gilmore JM, Phillips M, Amemiya CT, Heck M, MacCoss MJ. Assessing Protein Sequence Database Suitability Using De Novo Sequencing. Mol Cell Proteomics 2020; 19:198-208. [PMID: 31732549 PMCID: PMC6944239 DOI: 10.1074/mcp.tir119.001752] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/22/2019] [Revised: 10/31/2019] [Indexed: 11/06/2022]
Abstract
The analysis of samples from unsequenced and/or understudied species as well as samples where the proteome is derived from multiple organisms poses two key questions. The first is whether the proteomic data obtained from an unusual sample type even contains peptide tandem mass spectra. The second question is whether an appropriate protein sequence database is available for proteomic searches. We describe the use of automated de novo sequencing for evaluating both the quality of a collection of tandem mass spectra and the suitability of a given protein sequence database for searching that data. Applications of this method include the proteome analysis of closely related species, metaproteomics, and proteomics of extinct organisms.
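One way to picture "database suitability" is to ask what fraction of confidently de novo sequenced peptides can be found in the candidate database at all. The sketch below is a simplified proxy and should not be read as the authors' metric, which the abstract does not spell out. The function name and example sequences are hypothetical; the only domain fact relied on is that isoleucine and leucine have identical mass, so they are treated as interchangeable.

```python
def database_coverage(de_novo_peptides, proteins):
    """Fraction of de novo peptides found as exact substrings of any protein.

    Isoleucine (I) and leucine (L) are isobaric and cannot be told apart
    by mass alone, so both are normalised to L before matching.
    Returns 0.0 for an empty peptide list.
    """
    def normalise(sequence):
        return sequence.replace("I", "L")

    database = [normalise(p) for p in proteins]
    peptides = [normalise(p) for p in de_novo_peptides]
    if not peptides:
        return 0.0
    found = sum(any(pep in prot for prot in database) for pep in peptides)
    return found / len(peptides)


# Invented example: two of four de novo peptides are covered.
de_novo = ["LVNELTEFAK", "IVNELTEFAK", "AGFAGDDAPR", "PEPTIDER"]
candidate_db = ["MKLVNELTEFAKTT"]

coverage = database_coverage(de_novo, candidate_db)
print(coverage)
```

A low value would suggest that either the spectra are poor or the database is missing much of what is in the sample, which are the two questions the abstract raises.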
Affiliation(s)
- Richard S Johnson
- Department of Genome Sciences, University of Washington, Seattle, Washington.
- Brian C Searle
- Institute for Systems Biology, Seattle, Washington; Proteome Software, Portland, Oregon
- Brook L Nunn
- Department of Genome Sciences, University of Washington, Seattle, Washington
- Jason M Gilmore
- Department of Genome Sciences, University of Washington, Seattle, Washington
- Molly Phillips
- Department of Biology, University of Washington, Seattle, Washington; School of Natural Sciences, University of California, Merced, California
- Chris T Amemiya
- School of Natural Sciences, University of California, Merced, California
- Michelle Heck
- United States Department of Agriculture, Agricultural Research Service, Ithaca, New York
- Michael J MacCoss
- Department of Genome Sciences, University of Washington, Seattle, Washington
12
Géron A, Werner J, Wattiez R, Lebaron P, Matallana-Surget S. Deciphering the Functioning of Microbial Communities: Shedding Light on the Critical Steps in Metaproteomics. Front Microbiol 2019; 10:2395. [PMID: 31708885 PMCID: PMC6821674 DOI: 10.3389/fmicb.2019.02395] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/15/2019] [Accepted: 10/03/2019] [Indexed: 11/13/2022] Open
Abstract
Unraveling the complex structure and functioning of microbial communities is essential to accurately predict the impact of perturbations and/or environmental changes. From all molecular tools available today to resolve the dynamics of microbial communities, metaproteomics stands out, allowing the establishment of phenotype-genotype linkages. Despite its rapid development, this technology has faced many technical challenges that still hamper its potential power. How to maximize the number of protein identifications, improve the quality of protein annotation, and provide reliable ecological interpretation are questions of immediate urgency. In our study, we used a robust metaproteomic workflow combining two protein fractionation approaches (gel-based versus gel-free) and four protein search databases derived from the same metagenome to analyze the same seawater sample. The resulting eight metaproteomes provided different outcomes in terms of (i) total protein numbers, (ii) taxonomic structures, and (iii) protein functions. The characterization and/or representativeness of numerous proteins from ecologically relevant taxa such as Pelagibacterales, Rhodobacterales, and Synechococcales, as well as crucial environmental processes, such as nutrient uptake, nitrogen assimilation, light harvesting, and oxidative stress response, were found to be particularly affected by the methodology. Our results provide clear evidence that the use of different protein search databases significantly alters the biological conclusions in both gel-free and gel-based approaches. Our findings emphasize the importance of diversifying the experimental workflow for a comprehensive metaproteomic study.
Affiliation(s)
- Augustin Géron
- Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom
- Department of Proteomic and Microbiology, University of Mons, Mons, Belgium
- Johannes Werner
- Department of Biological Oceanography, Leibniz Institute for Baltic Sea Research, Rostock, Germany
- Ruddy Wattiez
- Department of Proteomic and Microbiology, University of Mons, Mons, Belgium
- Philippe Lebaron
- Sorbonne Universités, UPMC Université Paris 06, USR 3579, LBBM, Observatoire Océanologique, Banyuls-sur-Mer, France
- Sabine Matallana-Surget
- Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, United Kingdom
13
Mikan MP, Harvey HR, Timmins-Schiffman E, Riffle M, May DH, Salter I, Noble WS, Nunn BL. Metaproteomics reveal that rapid perturbations in organic matter prioritize functional restructuring over taxonomy in western Arctic Ocean microbiomes. ISME J 2020; 14:39-52. [PMID: 31492961 DOI: 10.1038/s41396-019-0503-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/29/2019] [Revised: 07/31/2019] [Accepted: 08/06/2019] [Indexed: 02/05/2023]
Abstract
We examined metaproteome profiles from two Arctic microbiomes during 10-day shipboard incubations to directly track early functional and taxonomic responses to a simulated algal bloom and an oligotrophic control. Using a novel peptide-based enrichment analysis, significant changes (p-value < 0.01) in biological and molecular functions associated with carbon and nitrogen recycling were observed. Within the first day under both organic matter conditions, Bering Strait surface microbiomes increased protein synthesis, carbohydrate degradation, and cellular redox processes while decreasing C1 metabolism. Taxonomic assignments revealed that the core microbiome collectively responded to algal substrates by assimilating carbon before select taxa utilize and metabolize nitrogen intracellularly. Incubations of Chukchi Sea bottom water microbiomes showed similar, but delayed functional responses to identical treatments. Although 24 functional terms were shared between experimental treatments, the timing, and degree of the remaining responses were highly variable, showing that organic matter perturbation directs community functionality prior to alterations to the taxonomic distribution at the microbiome class level. The dynamic responses of these two oceanic microbial communities have important implications for timing and magnitude of responses to organic perturbations within the Arctic Ocean and how community-level functions may forecast biogeochemical gradients in oceans.
14
Heyer R, Schallert K, Büdel A, Zoun R, Dorl S, Behne A, Kohrs F, Püttker S, Siewert C, Muth T, Saake G, Reichl U, Benndorf D. A Robust and Universal Metaproteomics Workflow for Research Studies and Routine Diagnostics Within 24 h Using Phenol Extraction, FASP Digest, and the MetaProteomeAnalyzer. Front Microbiol 2019; 10:1883. [PMID: 31474963 PMCID: PMC6707425 DOI: 10.3389/fmicb.2019.01883] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 02/20/2019] [Accepted: 07/30/2019] [Indexed: 01/29/2023] Open
Abstract
The investigation of microbial proteins by mass spectrometry (metaproteomics) is a key technology for simultaneously assessing the taxonomic composition and the functionality of microbial communities in medical, environmental, and biotechnological applications. We present an improved metaproteomics workflow using an updated sample preparation and a new version of the MetaProteomeAnalyzer software for data analysis. High resolution by multidimensional separation (GeLC, MudPIT) was sacrificed to aim at fast analysis of a broad range of different samples in less than 24 h. The improved workflow generated at least twice as many protein identifications as our previous workflow, and a drastic increase in taxonomic and functional annotations. Improvements of all aspects of the workflow, particularly the speed, are first steps toward potential routine clinical diagnostics (i.e., fecal samples) and analysis of technical and environmental samples. The MetaProteomeAnalyzer is provided to the scientific community as a central remote server solution at www.mpa.ovgu.de.
Affiliation(s)
- Robert Heyer
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Kay Schallert
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Anja Büdel
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Roman Zoun
- Database Research Group, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Sebastian Dorl
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg, Austria
- Fabian Kohrs
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Sebastian Püttker
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Corina Siewert
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems Magdeburg, Magdeburg, Germany
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Gunter Saake
- Database Research Group, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Udo Reichl
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems Magdeburg, Magdeburg, Germany
- Dirk Benndorf
- Bioprocess Engineering, Otto von Guericke University Magdeburg, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems Magdeburg, Magdeburg, Germany
15
Li S, Tang H, Ye Y. A Meta-proteogenomic Approach to Peptide Identification Incorporating Assembly Uncertainty and Genomic Variation. Mol Cell Proteomics 2019; 18:S183-S192. [PMID: 31142575 PMCID: PMC6692780 DOI: 10.1074/mcp.tir118.001233] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/21/2018] [Revised: 04/25/2019] [Indexed: 01/07/2023] Open
Abstract
Matching metagenomic and/or metatranscriptomic data, currently often under-used, can be a useful reference for metaproteomic tandem mass spectra (MS/MS) data analysis. Here we developed a software pipeline for identification of peptides and proteins from metaproteomic MS/MS data using proteins derived from matching metagenomic (and metatranscriptomic) data as the search database, based on two novel approaches: Graph2Pro (published) and Var2Pep (new). Graph2Pro retains and uses uncertainties of metagenome assembly for reference-based MS/MS data analysis. Var2Pep considers the variations found in metagenomic/metatranscriptomic sequencing reads that are not retained in the assemblies (contigs). The new software pipeline provides a one-stop application of both tools, and it supports the use of metagenome assemblies from commonly used assemblers including MegaHit and metaSPAdes. When tested on two collections of multi-omic microbiome data sets, our pipeline significantly improved the identification rate of the metaproteomic MS/MS spectra, by about twofold compared to conventional contig- or read-based approaches (Var2Pep alone identified 5.6% to 24.1% more unique peptides, depending on the data set). We also showed that identified variant peptides are important for functional profiling of microbiomes. All results suggested that it is important to take assembly uncertainties and genomic variants into consideration to facilitate metaproteomic MS/MS data interpretation.
Affiliation(s)
- Sujun Li
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN
- Haixu Tang
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN
- Yuzhen Ye
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN
16
Levi Mortera S, Soggiu A, Vernocchi P, Del Chierico F, Piras C, Carsetti R, Marzano V, Britti D, Urbani A, Roncada P, Putignani L. Metaproteomic investigation to assess gut microbiota shaping in newborn mice: A combined taxonomic, functional and quantitative approach. J Proteomics 2019; 203:103378. [PMID: 31102759 DOI: 10.1016/j.jprot.2019.103378] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/16/2019] [Revised: 04/23/2019] [Accepted: 05/13/2019] [Indexed: 12/16/2022]
Abstract
Breastfeeding is nowadays known to be one of the most critical factors contributing to the development of an efficient immune system. In the last decade, a consistent number of pieces of evidence demonstrated the relationship between a healthy organism and its gut microbiota. However, this link is still not fully understood and requires further investigation. We recently adopted a murine model to describe the impact of either maternal milk or parental genetic background, on the composition of the gut microbial population in the first weeks of life. A metaproteomic approach to such complex environments is a big challenge that requires a strong effort in both data production and analysis, including the set-up of dedicated multitasking bioinformatics pipelines. Herein we present an LC-MS/MS based investigation to monitor mouse gut microbiota in the early life, aiming at characterizing its functions and metabolic activities together with a taxonomic description in terms of operational taxonomic units. We provided a quantitative evaluation of bacterial metaproteins, taking into account differential expression results in relation to the functional and taxonomic classification, particularly with proteins from orthologues groups. This allowed the reduction of the bias arising from the presence of a high number of shared peptides, and proteins, among different bacterial species. We also focused on host mucosal proteome and its modulation, according to different microbiota composition. SIGNIFICANCE: This paper would represent a reference work for investigations on gut microbiota in early life, from both a microbiological and a functional proteomic point of view. We focused on the shaping of the mouse gut microbiota in dependence on the feeding modality, defining a reliable taxonomic description, highlighting some functional characteristics of the microbial community, and performing a first quantitative evaluation by data independent analysis in metaproteomics.
Affiliation(s)
- Alessio Soggiu
- Department of Veterinary Medicine, University of Milan, Milan, Italy
- Pamela Vernocchi
- Human Microbiome Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Cristian Piras
- Department of Veterinary Medicine, University of Milan, Milan, Italy
- Rita Carsetti
- B cell Pathophysiology Unit, Immunology Research Area and Unit of Diagnostic Immunology, Department of Laboratories, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Valeria Marzano
- Human Microbiome Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
- Domenico Britti
- C.I.S. - Interdepartmental Services Centre of Veterinary for Human and Animal Health, University of Catanzaro "Magna Græcia", Catanzaro, Italy; Department of Health Sciences, University of Catanzaro "Magna Græcia", Catanzaro, Italy
- Andrea Urbani
- Catholic University of Sacred Heart, Rome, Italy; Fondazione Policlinico Universitario A. Gemelli, IRCCS, Rome, Italy
- Paola Roncada
- Department of Health Sciences, University of Catanzaro "Magna Græcia", Catanzaro, Italy
- Lorenza Putignani
- Parasitology Unit and Human Microbiome Unit, Bambino Gesù Children's Hospital, IRCCS, Rome, Italy
17
Schiebenhoefer H, Van Den Bossche T, Fuchs S, Renard BY, Muth T, Martens L. Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis. Expert Rev Proteomics 2019; 16:375-390. [PMID: 31002542 DOI: 10.1080/14789450.2019.1609944] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Indexed: 02/07/2023]
Abstract
INTRODUCTION The study of microbial communities based on the combined analysis of genomic and proteomic data - called metaproteogenomics - has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications on human health and the environment. Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications. Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
Affiliation(s)
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Stephan Fuchs
- FG13 Division of Nosocomial Pathogens and Antibiotic Resistances, Robert Koch Institute, Wernigerode, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
18
Thorn CE, Bergesch C, Joyce A, Sambrano G, McDonnell K, Brennan F, Heyer R, Benndorf D, Abram F. A robust, cost-effective method for DNA, RNA and protein co-extraction from soil, other complex microbiomes and pure cultures. Mol Ecol Resour 2019; 19:439-455. [PMID: 30565880 DOI: 10.1111/1755-0998.12979] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/20/2017] [Revised: 12/03/2018] [Accepted: 12/06/2018] [Indexed: 11/29/2022]
Abstract
The soil microbiome is inherently complex with high biological diversity, and spatial heterogeneity typically occurring on the submillimetre scale. To study the microbial ecology of soils, and other microbiomes, biomolecules, that is, nucleic acids and proteins, must be efficiently and reliably co-recovered from the same biological samples. Commercial kits are currently available for the co-extraction of DNA, RNA and proteins but none has been developed for soil samples. We present a new protocol drawing on existing phenol-chloroform-based methods for nucleic acids co-extraction but incorporating targeted precipitation of proteins from the phenol phase. The protocol is cost-effective and robust, and easily implemented using reagents commonly available in laboratories. The method is estimated to be eight times cheaper than using disparate commercial kits for the isolation of DNA and/or RNA, and proteins, from soil. The method is effective, providing good quality biomolecules from a diverse range of soil types, with clay contents varying from 9.5% to 35.1%, which we successfully used for downstream, high-throughput gene sequencing and metaproteomics. Additionally, we demonstrate that the protocol can also be easily implemented for biomolecule co-extraction from other complex microbiome samples, including cattle slurry and microbial communities recovered from anaerobic bioreactors, as well as from Gram-positive and Gram-negative pure cultures.
Affiliation(s)
- Camilla E Thorn
- Functional Environmental Microbiology, School of Natural Sciences, National University of Ireland Galway, Galway, Ireland
- Christian Bergesch
- Functional Environmental Microbiology, School of Natural Sciences, National University of Ireland Galway, Galway, Ireland
- Aoife Joyce
- Functional Environmental Microbiology, School of Natural Sciences, National University of Ireland Galway, Galway, Ireland
- Gustavo Sambrano
- Functional Environmental Microbiology, School of Natural Sciences, National University of Ireland Galway, Galway, Ireland
- Kevin McDonnell
- Functional Environmental Microbiology, School of Natural Sciences, National University of Ireland Galway, Galway, Ireland
- Fiona Brennan
- Department of Environment, Soils and Land-use, Teagasc, Wexford, Ireland
- Robert Heyer
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Otto von Guericke University, Magdeburg, Germany
- Dirk Benndorf
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Otto von Guericke University, Magdeburg, Germany
- Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, National University of Ireland Galway, Galway, Ireland
19
Saito MA, Bertrand EM, Duffy ME, Gaylord DA, Held NA, Hervey WJ, Hettich RL, Jagtap PD, Janech MG, Kinkade DB, Leary DH, McIlvin MR, Moore EK, Morris RM, Neely BA, Nunn BL, Saunders JK, Shepherd AI, Symmonds NI, Walsh DA. Progress and Challenges in Ocean Metaproteomics and Proposed Best Practices for Data Sharing. J Proteome Res 2019; 18:1461-1476. [PMID: 30702898 DOI: 10.1021/acs.jproteome.8b00761] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Indexed: 11/30/2022]
Abstract
Ocean metaproteomics is an emerging field enabling discoveries about marine microbial communities and their impact on global biogeochemical processes. Recent ocean metaproteomic studies have provided insight into microbial nutrient transport, colimitation of carbon fixation, the metabolism of microbial biofilms, and dynamics of carbon flux in marine ecosystems. Future methodological developments could provide new capabilities such as characterizing long-term ecosystem changes, biogeochemical reaction rates, and in situ stoichiometries. Yet challenges remain for ocean metaproteomics due to the great biological diversity that produces highly complex mass spectra, as well as the difficulty in obtaining and working with environmental samples. This review summarizes the progress and challenges facing ocean metaproteomic scientists and proposes best practices for data sharing of ocean metaproteomic data sets, including the data types and metadata needed to enable intercomparisons of protein distributions and annotations that could foster global ocean metaproteomic capabilities.
Affiliation(s)
- Mak A Saito
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Erin M Bertrand
- Department of Biology, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada
- Megan E Duffy
- School of Oceanography, University of Washington, Seattle, Washington 98195-7940, United States
- David A Gaylord
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Noelle A Held
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Robert L Hettich
- Oak Ridge National Laboratory and Microbiology Department, University of Tennessee, Knoxville, Tennessee 37996, United States
- Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Saint Paul, Minnesota 55108, United States
- Michael G Janech
- College of Charleston, Charleston, South Carolina 29424, United States
- Danie B Kinkade
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Dagmar H Leary
- U.S. Naval Research Laboratory, Washington, D.C. 20375, United States
- Matthew R McIlvin
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Eli K Moore
- Department of Environmental Science, Rowan University, Glassboro, New Jersey 08028, United States
- Robert M Morris
- School of Oceanography, University of Washington, Seattle, Washington 98195-7940, United States
- Benjamin A Neely
- National Institute of Standards and Technology, Charleston, South Carolina 29412, United States
- Brook L Nunn
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Jaclyn K Saunders
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States; School of Oceanography, University of Washington, Seattle, Washington 98195-7940, United States
- Adam I Shepherd
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Nicholas I Symmonds
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- David A Walsh
- Department of Biology, Concordia University, Montreal, Quebec H4B 1R6, Canada
20
Rechenberger J, Samaras P, Jarzab A, Behr J, Frejno M, Djukovic A, Sanz J, González-Barberá EM, Salavert M, López-Hontangas JL, Xavier KB, Debrauwer L, Rolain JM, Sanz M, Garcia-Garcera M, Wilhelm M, Ubeda C, Kuster B. Challenges in Clinical Metaproteomics Highlighted by the Analysis of Acute Leukemia Patients with Gut Colonization by Multidrug-Resistant Enterobacteriaceae. Proteomes 2019; 7:2. [PMID: 30626002 DOI: 10.3390/proteomes7010002] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 10/31/2018] [Revised: 12/20/2018] [Accepted: 01/03/2019] [Indexed: 12/25/2022] Open
Abstract
The microbiome has a strong impact on human health and disease and is, therefore, increasingly studied in a clinical context. Metaproteomics is also attracting considerable attention, and such data can be efficiently generated today owing to improvements in mass spectrometry-based proteomics. As we will discuss in this study, there are still major challenges, notably in data analysis, that need to be overcome. Here, we analyzed 212 fecal samples from 56 hospitalized acute leukemia patients with multidrug-resistant Enterobacteriaceae (MRE) gut colonization using metagenomics and metaproteomics. This is one of the largest clinical metaproteomic studies to date, and the first metaproteomic study addressing the gut microbiome in MRE-colonized acute leukemia patients. Based on this substantial data set, we discuss major current limitations in clinical metaproteomic data analysis to provide guidance to researchers in the field. Notably, the results show that public metagenome databases are incomplete and that sample-specific metagenomes improve results. Furthermore, biological variation is tremendous, which challenges clinical study designs and argues that longitudinal measurements of individual patients are a valuable future addition to the analysis of patient cohorts.
21
Xie M, Wu J, An F, Yue X, Tao D, Wu R, Lee Y. An integrated metagenomic/metaproteomic investigation of microbiota in dajiang-meju, a traditional fermented soybean product in Northeast China. Food Res Int 2019; 115:414-424. [PMID: 30599960 DOI: 10.1016/j.foodres.2018.10.076] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 05/29/2018] [Revised: 10/15/2018] [Accepted: 10/25/2018] [Indexed: 01/09/2023]
Abstract
Dajiang-meju have been used as major ingredients for the preparation of traditional spontaneously fermented soybean paste in Northeast China. In this work, we sequenced and analyzed the metagenome of 12 dajiang-meju samples. To complement the metagenome analysis, we analyzed the taxonomic and functional diversity of the microbiota by metaproteomics (LC-MS/MS). The analysis of metagenomic data revealed that the communities were primarily dominated by Enterobacter, Enterococcus, Leuconostoc, Lactobacillus, Citrobacter and Leclercia. Moreover, changes in the functional levels were monitored, and metaproteomic analysis revealed that most of the proteins were mainly expressed by members of Rhizopus, Penicillium and Geotrichum. The number of sequences allocated to fungi in the fermentation process decreased, whereas the number of sequences assigned to bacteria increased with time of fermentation. In addition, functional metagenomic profiling indicated that a series of sequences related to carbohydrates and amino acids metabolism were enriched. Additionally, enzymes associated with glycolysis metabolic pathways were presumed to contribute to the generation of flavor in dajiang-meju. Proteins from different dajiang-meju samples involved in global and overview maps, carbohydrate metabolism, nucleic acid metabolism and energy metabolism were differentially expressed. This information improves the understanding of microbial metabolic patterns with respect to the metaproteomes of dajiang-meju and provides a powerful tool for studying the fermentation process of soybean products.
Affiliation(s)
- Mengxi Xie
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China
- Junrui Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China
- Feiyu An
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China
- Xiqing Yue
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China
- Dongbing Tao
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China
- Rina Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China
- Yuankun Lee
- Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore, 117545 Singapore, Singapore
22
Lin A, Howbert JJ, Noble WS. Combining High-Resolution and Exact Calibration To Boost Statistical Power: A Well-Calibrated Score Function for High-Resolution MS2 Data. J Proteome Res 2018; 17:3644-3656. [PMID: 30221945 DOI: 10.1021/acs.jproteome.8b00206] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/19/2023]
Abstract
To achieve accurate assignment of peptide sequences to observed fragmentation spectra, a shotgun proteomics database search tool must make good use of the very high-resolution information produced by state-of-the-art mass spectrometers. However, making use of this information while also ensuring that the search engine's scores are well calibrated, that is, that the score assigned to one spectrum can be meaningfully compared to the score assigned to a different spectrum, has proven to be challenging. Here we describe a database search score function, the "residue evidence" (res-ev) score, that achieves both of these goals simultaneously. We also demonstrate how to combine calibrated res-ev scores with calibrated XCorr scores to produce a "combined p value" score function. We provide a benchmark consisting of four mass spectrometry data sets, which we use to compare the combined p value to the score functions used by several existing search engines. Our results suggest that the combined p value achieves state-of-the-art performance, generally outperforming MS Amanda and Morpheus and performing comparably to MS-GF+. The res-ev and combined p value score functions are freely available as part of the Tide search engine in the Crux mass spectrometry toolkit (http://crux.ms).
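The abstract above does not say how the calibrated res-ev and XCorr p values are merged into the "combined p value". Purely as an illustration, the sketch below uses Fisher's method, a generic way to combine independent p values. It is a stand-in chosen for this note, not the procedure implemented in Tide, and the function name `fisher_combined_p` is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine p values with Fisher's method (assumes independent tests).

    NOTE: illustrative stand-in only; the source abstract does not specify
    the combination rule used for the res-ev and XCorr p values.
    """
    if not p_values:
        raise ValueError("need at least one p value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p). Under the null hypothesis it
    # follows a chi-square distribution with 2k degrees of freedom.
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    # For an even number of degrees of freedom the chi-square tail has a
    # closed form: P(chi2_{2k} >= X) = exp(-X/2) * sum_{i<k} (X/2)^i / i!
    tail_sum = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_sum


if __name__ == "__main__":
    # Two hypothetical per-spectrum p values, one per score function.
    print(fisher_combined_p([0.01, 0.01]))
```

Fisher's method assumes the combined tests are independent. Whether that holds for two scores computed on the same spectrum is a separate question that this sketch does not address.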
Affiliation(s)
- Andy Lin: Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- J Jeffry Howbert: Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- William Stafford Noble: Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States; Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
23
Jarman KH, Heller NC, Jenson SC, Hutchison JR, Kaiser BLD, Payne SH, Wunschel DS, Merkley ED. Proteomics Goes to Court: A Statistical Foundation for Forensic Toxin/Organism Identification Using Bottom-Up Proteomics. J Proteome Res 2018; 17:3075-3085. [DOI: 10.1021/acs.jproteome.8b00212] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/19/2023]
Affiliation(s)
- Kristin H. Jarman: Applied Statistics and Computational Modeling Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Natalie C. Heller: Applied Statistics and Computational Modeling Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Sarah C. Jenson: Chemical and Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Janine R. Hutchison: Chemical and Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Brooke L. Deatherage Kaiser: Chemical and Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Samuel H. Payne: Biological Sciences Division, Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- David S. Wunschel: Chemical and Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Eric D. Merkley: Chemical and Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
24
Xiao J, Tanca A, Jia B, Yang R, Wang B, Zhang Y, Li J. Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis. J Proteome Res 2018; 17:1596-1605. [DOI: 10.1021/acs.jproteome.7b00894] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 01/02/2023]
Affiliation(s)
- Jinqiu Xiao: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Alessandro Tanca: Porto Conte Ricerche, Science and Technology Park of Sardinia, Tramariglio, Alghero, Italy
- Ben Jia: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Runqing Yang: College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, People’s Republic of China
- Bo Wang: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Yu Zhang: Institute of Oceanography, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
- Jing Li: Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, People’s Republic of China
25
Blank C, Easterly C, Gruening B, Johnson J, Kolmeder CA, Kumar P, May D, Mehta S, Mesuere B, Brown Z, Elias JE, Hervey WJ, McGowan T, Muth T, Nunn B, Rudney J, Tanca A, Griffin TJ, Jagtap PD. Disseminating Metaproteomic Informatics Capabilities and Knowledge Using the Galaxy-P Framework. Proteomes 2018; 6:proteomes6010007. [PMID: 29385081 PMCID: PMC5874766 DOI: 10.3390/proteomes6010007] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 12/11/2017] [Revised: 01/26/2018] [Accepted: 01/26/2018] [Indexed: 01/12/2023]
Abstract
The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli, and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily-accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics "Contribution Fest" undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops. 
Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting edge metaproteomics software.
Affiliation(s)
- Clemens Blank: Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg im Breisgau, Germany
- Caleb Easterly: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Bjoern Gruening: Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg im Breisgau, Germany
- James Johnson: Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA
- Carolin A Kolmeder: Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
- Praveen Kumar: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Damon May: Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Subina Mehta: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Bart Mesuere: Computational Biology Group, Ghent University, Krijgslaan 281, B-9000 Ghent, Belgium
- Zachary Brown: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Joshua E Elias: Department of Chemical & Systems Biology, Stanford University, Stanford, CA 94305, USA
- W Judson Hervey: Center for Bio/Molecular Science & Engineering, Naval Research Laboratory, Washington, DC 20375, USA
- Thomas McGowan: Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA
- Thilo Muth: Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Brook Nunn: Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Joel Rudney: Department of Diagnostic and Biological Sciences, University of Minnesota, Minneapolis, MN 55455, USA
- Alessandro Tanca: Porto Conte Ricerche Science and Technology Park of Sardinia, 07041 Alghero, Italy
- Timothy J Griffin: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Pratik D Jagtap: Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
26
Riffle M, May DH, Timmins-Schiffman E, Mikan MP, Jaschob D, Noble WS, Nunn BL. MetaGOmics: A Web-Based Tool for Peptide-Centric Functional and Taxonomic Analysis of Metaproteomics Data. Proteomes 2017; 6:E2. [PMID: 29280960 DOI: 10.3390/proteomes6010002] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 11/21/2017] [Revised: 12/19/2017] [Accepted: 12/21/2017] [Indexed: 12/22/2022]
Abstract
Metaproteomics is the characterization of all proteins being expressed by a community of organisms in a complex biological sample at a single point in time. Applications of metaproteomics range from the comparative analysis of environmental samples (such as ocean water and soil) to microbiome data from multicellular organisms (such as the human gut). Metaproteomics research is often focused on the quantitative functional makeup of the metaproteome and which organisms are making those proteins. That is: What are the functions of the currently expressed proteins? How much of the metaproteome is associated with those functions? And, which microorganisms are expressing the proteins that perform those functions? However, traditional protein-centric functional analysis is greatly complicated by the large size, redundancy, and lack of biological annotations for the protein sequences in the database used to search the data. To help address these issues, we have developed an algorithm and web application (dubbed "MetaGOmics") that automates the quantitative functional (using Gene Ontology) and taxonomic analysis of metaproteomics data and subsequent visualization of the results. MetaGOmics is designed to overcome the shortcomings of traditional proteomics analysis when used with metaproteomics data. It is easy to use, requires minimal input, and fully automates most steps of the analysis-including comparing the functional makeup between samples. MetaGOmics is freely available at https://www.yeastrc.org/metagomics/.
27
Bergauer K, Fernandez-Guerra A, Garcia JAL, Sprenger RR, Stepanauskas R, Pachiadaki MG, Jensen ON, Herndl GJ. Organic matter processing by microbial communities throughout the Atlantic water column as revealed by metaproteomics. Proc Natl Acad Sci U S A 2018; 115:E400-8. [PMID: 29255014 DOI: 10.1073/pnas.1708779115] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Indexed: 11/18/2022]
Abstract
The phylogenetic composition of the heterotrophic microbial community is depth stratified in the oceanic water column down to abyssopelagic layers. In the layers below the euphotic zone, it has been suggested that heterotrophic microbes rely largely on solubilized particulate organic matter as a carbon and energy source rather than on dissolved organic matter. To decipher whether changes in the phylogenetic composition with depth are reflected in changes in the bacterial and archaeal transporter proteins, we generated an extensive metaproteomic and metagenomic dataset of microbial communities collected from 100- to 5,000-m depth in the Atlantic Ocean. By identifying which compounds of the organic matter pool are absorbed, transported, and incorporated into microbial cells, intriguing insights into organic matter transformation in the deep ocean emerged. On average, solute transporters accounted for 23% of identified protein sequences in the lower euphotic and ∼39% in the bathypelagic layer, indicating the central role of heterotrophy in the dark ocean. In the bathypelagic layer, substrate affinities of expressed transporters suggest that, in addition to amino acids, peptides and carbohydrates, carboxylic acids and compatible solutes may be essential substrates for the microbial community. Key players with highest expression of solute transporters were Alphaproteobacteria, Gammaproteobacteria, and Deltaproteobacteria, accounting for 40%, 11%, and 10%, respectively, of relative protein abundances. The in situ expression of solute transporters indicates that the heterotrophic prokaryotic community is geared toward the utilization of similar organic compounds throughout the water column, with yet higher abundances of transporters targeting aromatic compounds in the bathypelagic realm.
28
Cheng K, Ning Z, Zhang X, Li L, Liao B, Mayne J, Stintzi A, Figeys D. MetaLab: an automated pipeline for metaproteomic data analysis. Microbiome 2017; 5:157. [PMID: 29197424 PMCID: PMC5712144 DOI: 10.1186/s40168-017-0375-2] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 08/03/2017] [Accepted: 11/17/2017] [Indexed: 05/19/2023]
Abstract
BACKGROUND Research involving microbial ecosystems has drawn increasing attention in recent years. Studying microbe-microbe, host-microbe, and environment-microbe interactions are essential for the understanding of microbial ecosystems. Currently, metaproteomics provide qualitative and quantitative information of proteins, providing insights into the functional changes of microbial communities. However, computational analysis of large-scale data generated in metaproteomic studies remains a challenge. Conventional proteomic software have difficulties dealing with the extreme complexity and species diversity present in microbiome samples leading to lower rates of peptide and protein identification. To address this issue, we previously developed the MetaPro-IQ approach for highly efficient microbial protein/peptide identification and quantification. RESULT Here, we developed an integrated software platform, named MetaLab, providing a complete and automated, user-friendly pipeline for fast microbial protein identification, quantification, as well as taxonomic profiling, directly from mass spectrometry raw data. Spectral clustering adopted in the pre-processing step dramatically improved the speed of peptide identification from database searches. Quantitative information of identified peptides was used for estimating the relative abundance of taxa at all phylogenetic ranks. Taxonomy result files exported by MetaLab are fully compatible with widely used metagenomics tools. Herein, the potential of MetaLab is evaluated by reanalyzing a metaproteomic dataset from mouse gut microbiome samples. CONCLUSION MetaLab is a fully automatic software platform enabling an integrated data-processing pipeline for metaproteomics. The function of sample-specific database generation can be very advantageous for searching peptides against huge protein databases. 
It provides a seamless connection between peptide determination and taxonomic profiling; therefore, the peptide abundance is readily used for measuring the microbial variations. MetaLab is designed as a versatile, efficient, and easy-to-use tool which can greatly simplify the procedure of metaproteomic data analysis for researchers in microbiome studies.
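The abstract states that quantitative peptide information is used to estimate the relative abundance of taxa at each rank, but it does not give the calculation. The following is a minimal sketch under stated assumptions: each identified peptide carries an intensity and a possibly partial lineage, and abundance is taken as a taxon's share of the summed intensity at the chosen rank. The data layout, the function name `taxon_relative_abundance`, and the summing rule are all hypothetical and are not taken from MetaLab.

```python
from collections import defaultdict


def taxon_relative_abundance(peptides, rank):
    """Return {taxon: fraction of total intensity} at one taxonomic rank.

    peptides: iterable of (intensity, lineage) pairs, where lineage is a dict
              such as {"phylum": "Firmicutes", "genus": "Lactobacillus"}.
    rank:     the rank to summarize, e.g. "phylum" or "genus".

    Assumption (not from the source): abundance is the summed peptide
    intensity per taxon, normalized by the total at that rank.
    """
    totals = defaultdict(float)
    for intensity, lineage in peptides:
        taxon = lineage.get(rank)
        if taxon is None:
            # The peptide is not resolved at this rank, so it is skipped here.
            continue
        totals[taxon] += intensity

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {taxon: value / grand_total for taxon, value in totals.items()}


if __name__ == "__main__":
    # Hypothetical peptide-level results; intensities are arbitrary units.
    example = [
        (100.0, {"phylum": "Firmicutes", "genus": "Lactobacillus"}),
        (300.0, {"phylum": "Firmicutes"}),  # resolved only to phylum
        (100.0, {"phylum": "Bacteroidetes", "genus": "Bacteroides"}),
    ]
    print(taxon_relative_abundance(example, "phylum"))
    print(taxon_relative_abundance(example, "genus"))
```

Because peptides that are unresolved at a rank are dropped before normalizing, the fractions at each rank sum to one over the peptides that could be assigned there. A real pipeline may treat such peptides differently.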
Affiliation(s)
- Kai Cheng: Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Zhibin Ning: Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Xu Zhang: Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Leyuan Li: Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Bo Liao: Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Janice Mayne: Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Alain Stintzi: Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- Daniel Figeys: Department of Biochemistry, Microbiology and Immunology, Ottawa Institute of Systems Biology, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; Molecular Architecture of Life Program, Canadian Institute for Advanced Research, Toronto, Ontario, Canada
29
Heyer R, Schallert K, Zoun R, Becher B, Saake G, Benndorf D. Challenges and perspectives of metaproteomic data analysis. J Biotechnol 2017; 261:24-36. [PMID: 28663049 DOI: 10.1016/j.jbiotec.2017.06.1201] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 02/15/2017] [Revised: 06/20/2017] [Accepted: 06/23/2017] [Indexed: 02/07/2023]
Abstract
In nature microorganisms live in complex microbial communities. Comprehensive taxonomic and functional knowledge about microbial communities supports medical and technical application such as fecal diagnostics as well as operation of biogas plants or waste water treatment plants. Furthermore, microbial communities are crucial for the global carbon and nitrogen cycle in soil and in the ocean. Among the methods available for investigation of microbial communities, metaproteomics can approximate the activity of microorganisms by investigating the protein content of a sample. Although metaproteomics is a very powerful method, issues within the bioinformatic evaluation impede its success. In particular, construction of databases for protein identification, grouping of redundant proteins as well as taxonomic and functional annotation pose big challenges. Furthermore, growing amounts of data within a metaproteomics study require dedicated algorithms and software. This review summarizes recent metaproteomics software and addresses the introduced issues in detail.
Affiliation(s)
- Robert Heyer: Otto von Guericke University, Bioprocess Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany
- Kay Schallert: Otto von Guericke University, Bioprocess Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany
- Roman Zoun: Otto von Guericke University, Institute for Technical and Business Information Systems, Universitätsplatz 2, 39106 Magdeburg, Germany
- Beatrice Becher: Otto von Guericke University, Bioprocess Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany
- Gunter Saake: Otto von Guericke University, Institute for Technical and Business Information Systems, Universitätsplatz 2, 39106 Magdeburg, Germany
- Dirk Benndorf: Otto von Guericke University, Bioprocess Engineering, Universitätsplatz 2, 39106 Magdeburg, Germany; Max Planck Institute for Dynamics of Complex Technical Systems, Bioprocess Engineering, Sandtorstraße 1, 39106 Magdeburg, Germany
30
Abstract
In shotgun proteomics analysis, user-specified parameters are critical to database search performance and therefore to the yield of confident peptide-spectrum matches (PSMs). Two of the most important parameters are related to the accuracy of the mass spectrometer. Precursor mass tolerance defines the peptide candidates considered for each spectrum. Fragment mass tolerance or bin size determines how close observed and theoretical fragments must be to be considered a match. For either of these two parameters, too wide a setting yields randomly high-scoring false PSMs, whereas too narrow a setting erroneously excludes true PSMs, in both cases lowering the yield of peptides detected at a given false discovery rate. We describe a strategy for inferring optimal search parameters by assembling and analyzing pairs of spectra that are likely to have been generated by the same peptide ion to infer precursor and fragment mass error. This strategy does not rely on a database search, making it usable in a wide variety of settings. In our experiments on data from a variety of instruments including Orbitrap and Q-TOF acquisitions, this strategy yields more high-confidence PSMs than using settings based on instrument defaults or determined by experts. Param-Medic is open-source and cross-platform. It is available as a standalone tool (http://noble.gs.washington.edu/proj/param-medic/) and has been integrated into the Crux proteomics toolkit (http://crux.ms), providing automatic parameter selection for the Comet and Tide search engines.
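To make the role of precursor mass tolerance concrete, the sketch below shows how a tolerance expressed in parts per million (ppm) widens or narrows the set of candidate peptides for one spectrum. It illustrates only the general idea described in the abstract; it is not Param-Medic's inference procedure, and the function names, peptide labels, and masses are hypothetical.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million, relative to the theoretical mass."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def candidates_within_tolerance(observed_mass, peptide_masses, tolerance_ppm):
    """Return the peptides whose mass lies within +/- tolerance_ppm of the precursor.

    peptide_masses: dict mapping a peptide label to its theoretical mass (Da).
    """
    return [
        peptide
        for peptide, mass in peptide_masses.items()
        if abs(ppm_error(observed_mass, mass)) <= tolerance_ppm
    ]


if __name__ == "__main__":
    # Hypothetical database entries and one hypothetical observed precursor mass.
    database = {
        "PEPTIDE_A": 1000.0000,  # about 1 ppm away from the observation
        "PEPTIDE_B": 1000.0080,  # about 7 ppm away
        "PEPTIDE_C": 1000.0400,  # about 39 ppm away
    }
    observed = 1000.0010
    for tolerance in (5.0, 10.0, 50.0):
        print(tolerance, candidates_within_tolerance(observed, database, tolerance))
```

In this toy case a 5 ppm window keeps one candidate, while a 50 ppm window keeps all three. That is the trade-off the abstract describes: a wider window admits more chance matches, and a narrower one risks excluding the true peptide.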
Affiliation(s)
- Damon H May: Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Kaipo Tamura: Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- William S Noble: Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States; Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
31
Timmins-Schiffman E, May DH, Mikan M, Riffle M, Frazar C, Harvey HR, Noble WS, Nunn BL. Critical decisions in metaproteomics: achieving high confidence protein annotations in a sea of unknowns. ISME J 2017; 11:309-14. [PMID: 27824341 DOI: 10.1038/ismej.2016.132] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Indexed: 01/23/2023]