1
Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024] Open
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
Affiliation(s)
- Ronghui Lou
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
- Wenqing Shui
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
2
Abstract
Proteins are the key biological actors within cells, driving many biological processes integral to both healthy and diseased states. Understanding the depth of complexity represented within the proteome is crucial to our scientific understanding of cellular biology and to providing disease-specific insights for clinical applications. Mass spectrometry-based proteomics is the premier method for proteome analysis, with the ability to both identify and quantify proteins. Although proteomics continues to grow as a robust field of bioanalytical chemistry, advances are still necessary to enable a more comprehensive view of the proteome. In this review, we provide a broad overview of mass spectrometry-based proteomics in general, and highlight four developing areas of bottom-up proteomics: (1) protein inference, (2) alternative proteases, (3) sample-specific databases and (4) post-translational modification discovery.
Affiliation(s)
- Rachel M Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA.
- Lloyd M Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA.
3
Grabowsky ER, Saviola AJ, Alvarado-Díaz J, Mascareñas AQ, Hansen KC, Yates JR, Mackessy SP. Montane Rattlesnakes in México: Venoms of Crotalus tancitarensis and Related Species within the Crotalus intermedius Group. Toxins (Basel) 2023; 15:72. [PMID: 36668891 PMCID: PMC9867100 DOI: 10.3390/toxins15010072] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/20/2022] [Revised: 01/04/2023] [Accepted: 01/10/2023] [Indexed: 01/15/2023] Open
Abstract
The Crotalus intermedius group is a clade of rattlesnakes consisting of several species adapted to a high elevation habitat, primarily in México. Crotalus tancitarensis was previously classified as C. intermedius, until individuals occurring on Cerro Tancítaro in Michoacán, México, were reevaluated and classified as a new species (C. tancitarensis) based on scale pattern and geographic location. This study aimed to characterize the venom of C. tancitarensis and compare the venom profile to those of other species within the Crotalus intermedius group using gel electrophoresis, biochemical assays, reverse-phase high performance liquid chromatography, mass spectrometry, and lethal toxicity (LD50) assays. Results show that the venom profiles of species within the Crotalus intermedius group are similar, but with distinct differences in phospholipase A2 (PLA2), metalloproteinase PI (SVMP PI), and kallikrein-like serine proteinase (SVSP) activity and relative abundance. Proteomic analysis indicated that the highland forms produce venoms with 50-60 protein isoforms and a composition typical of type I rattlesnake venoms (abundant SVMPs, lack of presynaptic PLA2-based neurotoxins), as well as a diversity of typical Crotalus venom components such as serine proteinases, PLA2s, C-type lectins, and less abundant toxins (LAAOs, CRiSPs, etc.). The overall venom profile of C. tancitarensis appears most similar to C. transversus, which is consistent with a previous mitochondrial DNA analysis of the Crotalus intermedius group. These rattlesnakes of the Mexican highlands represent a radiation of high elevation specialists, and in spite of divergence of species in these Sky Island habitats, venom composition of species analyzed here has remained relatively conserved. 
The majority of protein family isoforms are conserved in all members of the clade, and as seen in other more broadly distributed rattlesnake species, differences in their venoms are largely due to relative concentrations of specific components.
Affiliation(s)
- Emily R. Grabowsky
- School of Biological Sciences, University of Northern Colorado, Greeley, CO 80639, USA
- Anthony J. Saviola
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Department of Molecular Medicine and Neurobiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Javier Alvarado-Díaz
- INIRENA (Instituto de Investigaciones sobre los Recursos Naturales), Morelia CP 58330, Michoacán, Mexico
- Kirk C. Hansen
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- John R. Yates
- Department of Molecular Medicine and Neurobiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Stephen P. Mackessy
- School of Biological Sciences, University of Northern Colorado, Greeley, CO 80639, USA
4
Cunsolo V, Di Francesco A, Pittalà MGG, Saletti R, Foti S. The TriMet_DB: A Manually Curated Database of the Metabolic Proteins of Triticum aestivum. Nutrients 2022; 14:5377. [PMID: 36558536 PMCID: PMC9781733 DOI: 10.3390/nu14245377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 12/07/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022] Open
Abstract
Mass-spectrometry-based wheat proteomics is challenging because the current interpretation of mass spectrometry data relies on public databases that are not exhaustive (UniProtKB/Swiss-Prot) or contain many redundant and poor or un-annotated entries (UniProtKB/TrEMBL). Here, we report the development of a manually curated database of the metabolic proteins of Triticum aestivum (hexaploid wheat), named TriMet_DB (Triticum aestivum Metabolic Proteins DataBase). The manually curated TriMet_DB was generated in FASTA format so that it can be read directly by programs used to interpret the mass spectrometry data. Furthermore, the complete list of entries included in the TriMet_DB is reported in a freely available resource, which includes for each protein the description, the gene code, the protein family, and the allergen name (if any). To evaluate its performance, the TriMet_DB was used to interpret the MS data acquired on the metabolic protein fraction extracted from the cultivar MEC of Triticum aestivum. Data are available via ProteomeXchange with identifier PXD037709.
5
Reanalysis of ProteomicsDB Using an Accurate, Sensitive, and Scalable False Discovery Rate Estimation Approach for Protein Groups. Mol Cell Proteomics 2022; 21:100437. [PMID: 36328188 PMCID: PMC9718969 DOI: 10.1016/j.mcpro.2022.100437] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/01/2022] [Revised: 10/16/2022] [Accepted: 10/28/2022] [Indexed: 11/07/2022] Open
Abstract
Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry-based proteomics, particularly when analyzing very large datasets. One performant method for this purpose is the Picked Protein FDR approach, which is based on a target-decoy competition strategy on the protein level that ensures that FDRs scale to large datasets. Here, we present an extension to this method that can also deal with protein groups, that is, proteins that share common peptides such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas: first, the picked group target-decoy strategy, and second, the rescued subset grouping strategy. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of the size of the data set. The validation analysis also uncovered that applying the commonly used Occam's razor principle leads to anticonservative FDR estimates for large datasets. This is not the case for the Picked Protein Group FDR method. Reanalysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the reanalysis of the entire human section of ProteomicsDB led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1250 genes were represented by ≥2 identified protein groups. To make the method accessible to the proteomics community, we provide a software tool including a graphical user interface that enables merging results from multiple MaxQuant searches into a single list of identified and quantified protein groups.
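The protein-level target-decoy competition mentioned in this abstract can be illustrated with a minimal sketch. It covers only a simplified picked competition between individual proteins, not the protein-group extension (picked group target-decoy, rescued subset grouping), which the abstract does not detail. The function name, the input dictionaries, and the assumption that each decoy shares a key with its target counterpart are all hypothetical choices made for illustration, not the published tool's interface.

```python
from typing import Dict, List, Tuple


def picked_protein_qvalues(
    target_scores: Dict[str, float],
    decoy_scores: Dict[str, float],
) -> List[Tuple[str, float, float]]:
    """Return (protein, score, q_value) for targets that win their pair.

    Assumption: both dicts are keyed by the same protein identifier, and a
    higher score means a more confident identification.
    """
    picked = []  # entries are (score, is_decoy, protein)
    for protein in set(target_scores) | set(decoy_scores):
        t = target_scores.get(protein, float("-inf"))
        d = decoy_scores.get(protein, float("-inf"))
        # Pairwise competition: keep only the better of target and decoy.
        if t >= d:
            picked.append((t, False, protein))
        else:
            picked.append((d, True, protein))

    # Best score first; ties are broken deterministically.
    picked.sort(key=lambda item: (-item[0], item[1], item[2]))

    # Running FDR estimate: decoys seen so far / targets seen so far.
    fdrs = []
    n_targets = n_decoys = 0
    for _score, is_decoy, _protein in picked:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        fdrs.append(n_decoys / max(n_targets, 1))

    # q-value: smallest FDR at which this entry would still be accepted.
    qvals = fdrs[:]
    for i in range(len(qvals) - 2, -1, -1):
        qvals[i] = min(qvals[i], qvals[i + 1])

    return [
        (protein, score, q)
        for (score, is_decoy, protein), q in zip(picked, qvals)
        if not is_decoy
    ]
```

Because each target-decoy pair contributes at most one entry to the ranked list, the number of surviving decoys does not grow simply because more spectra were searched. This is the scaling property the abstract attributes to the picked approach.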
6
Miller RM, Jordan BT, Mehlferber MM, Jeffery ED, Chatzipantsiou C, Kaur S, Millikin RJ, Dai Y, Tiberi S, Castaldi PJ, Shortreed MR, Luckey CJ, Conesa A, Smith LM, Deslattes Mays A, Sheynkman GM. Enhanced protein isoform characterization through long-read proteogenomics. Genome Biol 2022; 23:69. [PMID: 35241129 PMCID: PMC8892804 DOI: 10.1186/s13059-022-02624-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 07/13/2021] [Accepted: 02/02/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms. RESULTS: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. CONCLUSIONS: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.
Affiliation(s)
- Rachel M. Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI USA
- Ben T. Jordan
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA USA
- Madison M. Mehlferber
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA USA
- Erin D. Jeffery
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA USA
- Simi Kaur
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI USA
- Robert J. Millikin
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI USA
- Yunxiang Dai
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI USA
- Simone Tiberi
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Peter J. Castaldi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA USA
- Division of General Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA USA
- Michael R. Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI USA
- Chance John Luckey
- Department of Pathology, University of Virginia, Charlottesville, VA USA
- Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL USA
- Lloyd M. Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI USA
- Anne Deslattes Mays
- Office of Data Science and Sharing, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD USA
- Gloria M. Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
- UVA Cancer Center, University of Virginia, Charlottesville, VA USA
7
Simopoulos CMA, Figeys D, Lavallée-Adam M. Novel Bioinformatics Strategies Driving Dynamic Metaproteomic Studies. Methods Mol Biol 2022; 2456:319-338. [PMID: 35612752 DOI: 10.1007/978-1-0716-2124-0_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Constant improvements in mass spectrometry technologies and laboratory workflows have enabled the proteomics investigation of biological samples of growing complexity. Microbiomes represent such complex samples for which metaproteomics analyses are becoming increasingly popular. Metaproteomics experimental procedures create large amounts of data from which biologically relevant signal must be efficiently extracted to draw meaningful conclusions. Such data processing requires appropriate bioinformatics tools specifically developed for, or capable of handling, metaproteomics data. In this chapter, we outline current and novel tools that can perform the most commonly used steps in the analysis of cutting-edge metaproteomics data, such as peptide and protein identification and quantification, as well as data normalization, imputation, mining, and visualization. We also provide details about the experimental setups in which these tools should be used.
Affiliation(s)
- Caitlin M A Simopoulos
- Department of Biochemistry, Microbiology and Immunology and Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada
- Daniel Figeys
- Department of Biochemistry, Microbiology and Immunology and Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada
- School of Pharmaceutical Sciences, University of Ottawa, Ottawa, ON, Canada
- Mathieu Lavallée-Adam
- Department of Biochemistry, Microbiology and Immunology and Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada.
8
Van Den Bossche T, Kunath BJ, Schallert K, Schäpe SS, Abraham PE, Armengaud J, Arntzen MØ, Bassignani A, Benndorf D, Fuchs S, Giannone RJ, Griffin TJ, Hagen LH, Halder R, Henry C, Hettich RL, Heyer R, Jagtap P, Jehmlich N, Jensen M, Juste C, Kleiner M, Langella O, Lehmann T, Leith E, May P, Mesuere B, Miotello G, Peters SL, Pible O, Queiros PT, Reichl U, Renard BY, Schiebenhoefer H, Sczyrba A, Tanca A, Trappe K, Trezzi JP, Uzzau S, Verschaffelt P, von Bergen M, Wilmes P, Wolf M, Martens L, Muth T. Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows. Nat Commun 2021; 12:7305. [PMID: 34911965 PMCID: PMC8674281 DOI: 10.1038/s41467-021-27542-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/02/2021] [Accepted: 11/24/2021] [Indexed: 12/17/2022] Open
Abstract
Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carry out a community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluate the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observe that variability at the peptide level is predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappear at the protein group level. While differences are observed for predicted community composition, similar functional profiles are obtained across workflows. CAMPI demonstrates the robustness of present-day metaproteomics research, serves as a template for multi-laboratory studies in metaproteomics, and provides publicly available data sets for benchmarking future developments.
Affiliation(s)
- Tim Van Den Bossche
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Benoit J Kunath
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Kay Schallert
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Stephanie S Schäpe
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Paul E Abraham
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jean Armengaud
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Magnus Ø Arntzen
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Ariane Bassignani
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Dirk Benndorf
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Microbiology, Department of Applied Biosciences and Process Technology, Anhalt University of Applied Sciences, Köthen, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Stephan Fuchs
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Timothy J Griffin
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Live H Hagen
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Rashi Halder
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Céline Henry
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Robert Heyer
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Pratik Jagtap
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Nico Jehmlich
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Marlene Jensen
- Department of Plant & Microbial Biology, North Carolina State University, Raleigh, USA
- Catherine Juste
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Manuel Kleiner
- Department of Plant & Microbial Biology, North Carolina State University, Raleigh, USA
- Olivier Langella
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
- Theresa Lehmann
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Emma Leith
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Bart Mesuere
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Guylaine Miotello
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Samantha L Peters
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Olivier Pible
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Pedro T Queiros
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Udo Reichl
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
- Alessandro Tanca
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Jean-Pierre Trezzi
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Integrated Biobank of Luxembourg, Luxembourg Institute of Health, 1, rue Louis Rech, L-3555, Dudelange, Luxembourg
- Sergio Uzzau
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Pieter Verschaffelt
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Martin von Bergen
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Paul Wilmes
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, 6 avenue du Swing, L-4367, Belvaux, Luxembourg
- Maximilian Wolf
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Lennart Martens
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium.
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium.
- Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
9
Van Den Bossche T, Kunath BJ, Schallert K, Schäpe SS, Abraham PE, Armengaud J, Arntzen MØ, Bassignani A, Benndorf D, Fuchs S, Giannone RJ, Griffin TJ, Hagen LH, Halder R, Henry C, Hettich RL, Heyer R, Jagtap P, Jehmlich N, Jensen M, Juste C, Kleiner M, Langella O, Lehmann T, Leith E, May P, Mesuere B, Miotello G, Peters SL, Pible O, Queiros PT, Reichl U, Renard BY, Schiebenhoefer H, Sczyrba A, Tanca A, Trappe K, Trezzi JP, Uzzau S, Verschaffelt P, von Bergen M, Wilmes P, Wolf M, Martens L, Muth T. Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows. Nat Commun 2021; 12:7305. [PMID: 34911965 DOI: 10.1101/2021.03.05.433915] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/02/2021] [Accepted: 11/24/2021] [Indexed: 05/21/2023] Open
Abstract
Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carry out a community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluate the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observe that variability at the peptide level is predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappear at the protein group level. While differences are observed for predicted community composition, similar functional profiles are obtained across workflows. CAMPI demonstrates the robustness of present-day metaproteomics research, serves as a template for multi-laboratory studies in metaproteomics, and provides publicly available data sets for benchmarking future developments.
Affiliation(s)
- Tim Van Den Bossche
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Benoit J Kunath
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Kay Schallert
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Stephanie S Schäpe
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Paul E Abraham
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jean Armengaud
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
- Magnus Ø Arntzen
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Ariane Bassignani
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Dirk Benndorf
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Microbiology, Department of Applied Biosciences and Process Technology, Anhalt University of Applied Sciences, Köthen, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Stephan Fuchs
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Timothy J Griffin
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Live H Hagen
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Rashi Halder
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Céline Henry
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Robert L Hettich
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Robert Heyer
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Pratik Jagtap
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Nico Jehmlich
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
- Marlene Jensen
- Department of Plant & Microbial Biology, North Carolina State University, Raleigh, USA
- Catherine Juste
- INRAE, AgroParisTech, Micalis Institute, Université Paris-Saclay, 78350, Jouy-en-Josas, France
- Manuel Kleiner
- Department of Plant & Microbial Biology, North Carolina State University, Raleigh, USA
| | - Olivier Langella
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE - Le Moulon, 91190, Gif-sur-Yvette, France
| | - Theresa Lehmann
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
| | - Emma Leith
- Department of Biochemistry Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Bart Mesuere
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Guylaine Miotello
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
| | - Samantha L Peters
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Olivier Pible
- Département Médicaments et Technologies pour la Santé (DMTS), Université Paris Saclay, CEA, INRAE, SPI, 30200, Bagnols-sur-Cèze, France
| | - Pedro T Queiros
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Udo Reichl
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
| | - Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
| | - Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Data Analytics and Computational Statistics, Hasso-Plattner-Institute, Faculty of Digital Engineering, University of Potsdam, Potsdam, Germany
| | | | - Alessandro Tanca
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
| | - Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
| | - Jean-Pierre Trezzi
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Integrated Biobank of Luxembourg, Luxembourg Institute of Health, 1, rue Louis Rech, L-3555, Dudelange, Luxembourg
| | - Sergio Uzzau
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
| | - Pieter Verschaffelt
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Martin von Bergen
- Department of Molecular Systems Biology, Helmholtz-Centre for Environmental Research - UFZ GmbH, Leipzig, Germany
| | - Paul Wilmes
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Department of Life Sciences and Medicine, Faculty of Science, Technology and Medicine, University of Luxembourg, 6 avenue du Swing, L-4367, Belvaux, Luxembourg
| | - Maximilian Wolf
- Bioprocess Engineering, Otto-von-Guericke University Magdeburg, Magdeburg, Germany
| | - Lennart Martens
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium.
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium.
| | - Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
|
10
|
Cantrell LS, Schey KL. Data-Independent Acquisition Mass Spectrometry of the Human Lens Enhances Spatiotemporal Measurement of Fiber Cell Aging. Journal of the American Society for Mass Spectrometry 2021; 32:2755-2765. [PMID: 34705440 PMCID: PMC9685647 DOI: 10.1021/jasms.1c00193] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
The ocular lens proteome undergoes post-translational and progressive degradation as fiber cells age. The oldest fiber cells and the proteins therein are present at birth and are retained through death. Transparency of the lens is maintained in part by the high-abundance crystallin family proteins (up to 300 mg/mL), which establish a high dynamic range of protein abundance. As a result, previous data-dependent analysis (DDA) measurements of the lens proteome are less equipped to identify the lowest-abundance proteins. To probe more deeply into the lens proteome, we measured the insoluble lens proteome of an 18-year-old human with DDA and data-independent analysis (DIA) methods. By applying more recent library-free DIA search methods, 5,161 protein groups, 50,386 peptides, and 4,960 deamidation sites were detected, significantly more identifications than were obtained using DDA and pan-human DIA library searches. Finally, by segmenting the lens into multiple fiber cell-age-related regions, we uncovered cell-age-related changes in proteome composition and putative function.
Affiliation(s)
- Lee S Cantrell
- Chemical and Physical Biology Program, Vanderbilt University, Nashville, Tennessee 37212, United States
| | - Kevin L Schey
- Department of Biochemistry, Vanderbilt University, Nashville, Tennessee 37212, United States
|
11
|
Kirchner M, Deng H, Xu Y. Heterogeneity in proline hydroxylation of fibrillar collagens observed by mass spectrometry. PLoS One 2021; 16:e0250544. [PMID: 34464391 PMCID: PMC8407550 DOI: 10.1371/journal.pone.0250544] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/07/2021] [Accepted: 06/28/2021] [Indexed: 01/22/2023] Open
Abstract
Collagen is the major protein in the extracellular matrix and plays vital roles in tissue development and function. Collagen is also one of the most processed proteins in its biosynthesis. The most prominent post-translational modification (PTM) of collagen is the hydroxylation of Pro residues in the Y-position of the characteristic (Gly-Xaa-Yaa) repeating amino acid sequence of a collagen triple helix. Recent studies using mass spectrometry (MS) and tandem MS sequencing (MS/MS) have revealed unexpected hydroxylation of Pro residues in the X-positions (X-Hyp). The newly identified X-Hyp residues appear to be highly heterogeneous in location and percent occupancy. In order to understand the dynamic nature of the new X-Hyps and their potential impact on applications of MS and MS/MS for collagen research, we analyzed four different collagen samples using standard MS and MS/MS techniques. We found considerable variations in the degree of PTMs of the same collagen from different organisms and/or tissues. The rat tail tendon type I collagen is particularly variable in terms of both over-hydroxylation of Pro in the X-position and under-hydroxylation of Pro in the Y-position. In contrast, only a few unexpected PTMs in collagens type I and type III from human placenta were observed. Some observations are not reproducible between different sequencing efforts of the same sample, presumably due to a low population and/or the unpredictable nature of the ionization process. Additionally, despite the heterogeneous preparation and sourcing, collagen samples from commercial sources do not show elevated variations in PTMs compared to samples prepared from a single tissue and/or organism. These findings will contribute to the growing body of information regarding the PTMs of collagen by MS technology, and culminate in a more comprehensive understanding of the extent and the functional roles of the PTMs of collagen.
Affiliation(s)
- Michele Kirchner
- Department of Chemistry, Hunter College of CUNY, New York, NY, United States of America
- The Graduate Center, The City University of New York, New York, NY, United States of America
| | - Haiteng Deng
- Proteomics Resource Center, The Rockefeller University, New York, NY, United States of America
| | - Yujia Xu
- Department of Chemistry, Hunter College of CUNY, New York, NY, United States of America
- The Graduate Center, The City University of New York, New York, NY, United States of America
|
12
|
Rozanova S, Barkovits K, Nikolov M, Schmidt C, Urlaub H, Marcus K. Quantitative Mass Spectrometry-Based Proteomics: An Overview. Methods Mol Biol 2021; 2228:85-116. [PMID: 33950486 DOI: 10.1007/978-1-0716-1024-4_8] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/12/2022]
Abstract
In recent decades, mass spectrometry has moved more than ever before into the front line of protein-centered research. After mass spectrometry was established at the qualitative level, the more challenging question of quantifying proteins and peptides became a focus for further development. In this chapter, we discuss and review current strategies and problems of the methods for the quantitative analysis of peptides, proteins, and finally proteomes by mass spectrometry. The common themes, the differences, and the potential pitfalls of the main approaches are presented in order to provide a survey of the emerging field of quantitative, mass spectrometry-based proteomics.
Affiliation(s)
- Svitlana Rozanova
- Medizinisches Proteom-Center, Medical Faculty, Ruhr-University Bochum, Bochum, Germany
- Medical Proteome Analysis, Center for protein diagnostics (PRODI), Ruhr-University Bochum, Bochum, Germany
| | - Katalin Barkovits
- Medizinisches Proteom-Center, Medical Faculty, Ruhr-University Bochum, Bochum, Germany
- Medical Proteome Analysis, Center for protein diagnostics (PRODI), Ruhr-University Bochum, Bochum, Germany
| | - Miroslav Nikolov
- Bioanalytical Mass Spectrometry Group, Max Planck Institute for Biophysical Chemistry, Goettingen, Germany
| | - Carla Schmidt
- Interdisciplinary Research Center HALOmem, Charles Tanford Protein Center, Institute for Biochemistry and Biotechnology, Martin Luther University Halle-Wittenberg, Halle, Germany
| | - Henning Urlaub
- Bioanalytical Mass Spectrometry Group, Max Planck Institute for Biophysical Chemistry, Goettingen, Germany
- Bioanalytics Group, Institute of Clinical Chemistry, University Medical Center Goettingen, Goettingen, Germany
- Hematology/Oncology, Department of Medicine II, Johann Wolfgang Goethe University, Frankfurt, Germany
| | - Katrin Marcus
- Medizinisches Proteom-Center, Medical Faculty, Ruhr-University Bochum, Bochum, Germany
- Medical Proteome Analysis, Center for protein diagnostics (PRODI), Ruhr-University Bochum, Bochum, Germany
|
13
|
Peng Y, Jain S, Li YF, Greguš M, Ivanov AR, Vitek O, Radivojac P. New mixture models for decoy-free false discovery rate estimation in mass spectrometry proteomics. Bioinformatics 2020; 36:i745-i753. [PMID: 33381824 DOI: 10.1093/bioinformatics/btaa807] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation-efficiency and low signal-to-noise-ratio spectra. RESULTS: We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms. AVAILABILITY AND IMPLEMENTATION: https://github.com/shawn-peng/FDR-estimation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
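As a rough illustration of the decoy-free idea described in this abstract, the sketch below fits a two-component mixture to PSM scores with EM and derives an FDR estimate from the posterior probabilities. It is a deliberate simplification and not the authors' implementation: it assumes Gaussian components rather than the skew normal family, it ignores second-best PSMs, and the function names are hypothetical.

```python
import numpy as np

def fit_two_gaussians(scores, n_iter=200):
    """EM for a two-component 1-D Gaussian mixture (incorrect vs. correct PSMs)."""
    x = np.asarray(scores, dtype=float)
    # Initialise the two components from the lower and upper quartiles.
    mu = np.array([np.percentile(x, 25), np.percentile(x, 75)])
    sigma = np.array([x.std(), x.std()]) + 1e-6
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        z = (x[:, None] - mu) / sigma
        dens = weight * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return weight, mu, sigma, resp

def estimated_fdr(scores, resp, mu, threshold):
    """Expected fraction of incorrect PSMs among those scoring >= threshold."""
    incorrect = int(np.argmin(mu))  # assumption: lower-mean component = incorrect PSMs
    accepted = np.asarray(scores, dtype=float) >= threshold
    if not accepted.any():
        return 0.0
    return float(resp[accepted, incorrect].mean())
```

With synthetic scores drawn from two well-separated normal distributions, `estimated_fdr` falls as the score threshold rises, which is the behaviour a score cutoff at a chosen FDR relies on.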
Affiliation(s)
- Yisu Peng
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | - Shantanu Jain
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | | | - Michal Greguš
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
- Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA 02115, USA
| | - Alexander R Ivanov
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
- Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA 02115, USA
| | - Olga Vitek
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA 02115, USA
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, USA
- Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA 02115, USA
|
14
|
Couté Y, Bruley C, Burger T. Beyond Target-Decoy Competition: Stable Validation of Peptide and Protein Identifications in Mass Spectrometry-Based Discovery Proteomics. Anal Chem 2020; 92:14898-14906. [PMID: 32970414 DOI: 10.1021/acs.analchem.0c00328] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/17/2022]
Abstract
In bottom-up discovery proteomics, target-decoy competition (TDC) is the most popular method for false discovery rate (FDR) control. Despite unquestionable statistical foundations, this method has drawbacks, including its hitherto unknown intrinsic lack of stability vis-à-vis practical conditions of application. Although some consequences of this instability have already been empirically described, they may have been misinterpreted. This article provides evidence that TDC has become less reliable as the accuracy of modern mass spectrometers improved. We therefore propose to replace TDC by a totally different method to control the FDR at the spectrum, peptide, and protein levels, while benefiting from the theoretical guarantees of the Benjamini-Hochberg framework. As this method is simpler to use, faster to compute, and more stable than TDC, we argue that it is better adapted to the standardization and throughput constraints of current proteomic platforms.
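The Benjamini-Hochberg framework mentioned in this abstract can be illustrated with the standard step-up procedure. The sketch below is a generic implementation, not the authors' method: it assumes each identification already carries a p-value (how scores are converted to p-values is outside this sketch), and the function name is hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.01):
    """Return a boolean mask of discoveries under the BH step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    accepted = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))
        # Every identification ranked at or above k is accepted.
        accepted[order[: k + 1]] = True
    return accepted
```

Because the procedure works on p-values alone, it needs no decoy database, which is the practical difference from target-decoy competition that the abstract emphasizes.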
Affiliation(s)
- Yohann Couté
- Université Grenoble Alpes, CNRS, CEA, INSERM, IRIG, BGE, F-38000 Grenoble, France
| | - Christophe Bruley
- Université Grenoble Alpes, CNRS, CEA, INSERM, IRIG, BGE, F-38000 Grenoble, France
| | - Thomas Burger
- Université Grenoble Alpes, CNRS, CEA, INSERM, IRIG, BGE, F-38000 Grenoble, France
|
15
|
Aitekenov S, Gaipov A, Bukasov R. Review: Detection and quantification of proteins in human urine. Talanta 2020; 223:121718. [PMID: 33303164 PMCID: PMC7554478 DOI: 10.1016/j.talanta.2020.121718] [Citation(s) in RCA: 64] [Impact Index Per Article: 16.0] [Received: 07/09/2020] [Revised: 09/23/2020] [Accepted: 09/26/2020] [Indexed: 12/31/2022]
Abstract
Extensive medical research has shown that a high protein concentration in urine, referred to as proteinuria, is associated with various kinds of kidney disease. Urinary protein biomarkers are useful for the diagnosis of many health conditions, including kidney and cardiovascular diseases, cancers, diabetes, and infections. This review focuses on the instrumental quantification (electrophoresis, chromatography, immunoassays, mass spectrometry, fluorescence spectroscopy, infrared spectroscopy, and Raman spectroscopy) of proteins (above all, albumin) in the human urine matrix. Different techniques provide unique information on the constituents of urine. Owing to the complex nature of urine, a separation step by electrophoresis or chromatography is often used in proteomic studies of urine. Mass spectrometry is a powerful tool for the discovery and analysis of biomarkers in urine; however, the costs of analysis are high, especially for quantitative analysis. Immunoassays, which often come with fluorescence detection, are major qualitative and quantitative tools in clinical analysis. While infrared and Raman spectroscopies do not give extensive information about urine, they could become important tools for routine clinical diagnostics of kidney problems because they are rapid and low-cost. Thus, it is important to review all the applicable techniques and methods related to urine analysis. In this review, the principle of each technique is briefly introduced. Where applicable, research papers on protein determination in urine are summarized with the main figures of merit, such as the limit of detection, the detectable range, recovery, and accuracy, when available. Urinary protein biomarkers are useful for the diagnosis of many conditions: kidney and cardiovascular diseases, and cancers. Liquid chromatography-mass spectrometry is a powerful tool for urine proteomics but is used mostly in research. Immunoassays are widely used in both clinical and bioanalytical laboratories. IR and Raman spectroscopies are promising tools for urine diagnostics because they are rapid and low-cost.
Affiliation(s)
- Sultan Aitekenov
- School of Sciences and Humanities, Department of Chemistry, Nazarbaev University, Nur-Sultan, Kazakhstan
| | - Abduzhappar Gaipov
- School of Medicine, Department of Clinical Sciences, Nazarbaev University, Nur-Sultan, Kazakhstan
| | - Rostislav Bukasov
- School of Sciences and Humanities, Department of Chemistry, Nazarbaev University, Nur-Sultan, Kazakhstan.
|
16
|
Simopoulos CMA, Ning Z, Zhang X, Li L, Walker K, Lavallée-Adam M, Figeys D. pepFunk: a tool for peptide-centric functional analysis of metaproteomic human gut microbiome studies. Bioinformatics 2020; 36:4171-4179. [DOI: 10.1093/bioinformatics/btaa289] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/29/2019] [Revised: 03/20/2020] [Accepted: 04/27/2020] [Indexed: 12/13/2022] Open
Abstract
Motivation
Enzymatic digestion of proteins before mass spectrometry analysis is a key process in metaproteomic workflows. Canonical metaproteomic data processing pipelines typically involve matching spectra produced by the mass spectrometer to a theoretical spectra database, followed by matching the identified peptides back to parent-proteins. However, the nature of enzymatic digestion produces peptides that can be found in multiple proteins due to conservation or chance, presenting difficulties with protein and functional assignment.
Results
To combat this challenge, we developed pepFunk, a peptide-centric metaproteomic workflow focused on the analysis of human gut microbiome samples. Our workflow includes a curated peptide database annotated with Kyoto Encyclopedia of Genes and Genomes (KEGG) terms and a gene set variation analysis-inspired pathway enrichment adapted for peptide-level data. Analysis using our peptide-centric workflow is fast and highly correlated to a protein-centric analysis, and can identify more enriched KEGG pathways than analysis using protein-level data. Our workflow is open source and available as a web application or source code to be run locally.
Availability and implementation
pepFunk is available online as a web application at https://shiny.imetalab.ca/pepFunk/ with open-source code available from https://github.com/northomics/pepFunk.
Contact
dfigeys@uottawa.ca
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Caitlin M A Simopoulos
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Faculty of Medicine, SIMM-University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Zhibin Ning
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Faculty of Medicine, SIMM-University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Xu Zhang
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Faculty of Medicine, SIMM-University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Leyuan Li
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Faculty of Medicine, SIMM-University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Krystal Walker
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Faculty of Medicine, SIMM-University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Mathieu Lavallée-Adam
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
| | - Daniel Figeys
- Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Faculty of Medicine, SIMM-University of Ottawa Joint Research Center in Systems and Personalized Pharmacology, University of Ottawa, Ottawa, ON K1H 8M5, Canada
- Canadian Institute for Advanced Research, Toronto, ON M5G 1M1, Canada
|
17
|
Prieto G, Vázquez J. Protein Probability Model for High-Throughput Protein Identification by Mass Spectrometry-Based Proteomics. J Proteome Res 2020; 19:1285-1297. [PMID: 32037837 DOI: 10.1021/acs.jproteome.9b00819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
Shotgun proteomics is the method of choice for high-throughput protein identification; however, robust statistical methods are essential to automatize this task while minimizing the number of false identifications. The standard method for estimating the false discovery rate (FDR) of individual identifications and keeping it below a threshold (typically 1%) is the target-decoy approach. However, numerous works have shown that FDR at the protein level may become much larger than FDR at the peptide level. The development of an appropriate scoring model to identify proteins from their peptides using high-throughput shotgun proteomics is highly needed. In this study, we present a novel protein-level scoring algorithm that uses the scores of the identified peptides and maintains all of the properties expected for a true protein probability. We also present a refinement of the picked method to calculate FDR at the protein level. These algorithms can be used together as a robust identification workflow suitable for large-scale proteomics, and we show that the identification performance of this workflow is superior to that of other widely used methods in several samples and using different search engines. Our protein probability model offers the scientific community an algorithm that is easy to integrate into protein identification workflows for the automated analysis of shotgun proteomics data.
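The abstract refers to the picked method for protein-level FDR. A minimal sketch of the basic picked idea follows, in which each target protein competes with its own decoy and only the better-scoring member of the pair is counted. It does not reproduce the refinement or the protein probability model proposed in the paper; the function name, the dictionary layout, and the convention that a decoy shares the ID of its target are assumptions made for illustration.

```python
def picked_protein_fdr(target_scores, decoy_scores, threshold):
    """Estimate protein-level FDR with the basic picked target-decoy idea.

    target_scores and decoy_scores map a protein ID to its best score;
    a decoy is assumed to share the ID of the target it was derived from.
    """
    targets = 0
    decoys = 0
    for protein in set(target_scores) | set(decoy_scores):
        t = target_scores.get(protein, float("-inf"))
        d = decoy_scores.get(protein, float("-inf"))
        # Keep only the better-scoring member of each target/decoy pair
        # (ties are resolved in favour of the target in this sketch).
        if d > t:
            best, is_decoy = d, True
        else:
            best, is_decoy = t, False
        if best >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    # FDR estimate: surviving decoys relative to surviving targets.
    return decoys / targets if targets else 0.0
```

Pairing targets with their decoys keeps a large decoy database from accumulating decoy hits faster than true targets do, which is one reason protein-level FDR can otherwise exceed peptide-level FDR.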
Affiliation(s)
- Gorka Prieto
- Department of Communications Engineering, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain
| | - Jesús Vázquez
- Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28049 Madrid, Spain
|
18
|
Pfeuffer J, Sachsenberg T, Dijkstra TMH, Serang O, Reinert K, Kohlbacher O. EPIFANY: A Method for Efficient High-Confidence Protein Inference. J Proteome Res 2020; 19:1060-1072. [PMID: 31975601 DOI: 10.1021/acs.jproteome.9b00566] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/07/2023]
Abstract
Accurate protein inference in the presence of shared peptides is still one of the key problems in bottom-up proteomics. Most protein inference tools employing simple heuristic inference strategies are efficient but exhibit reduced accuracy. More advanced probabilistic methods often exhibit better inference quality but tend to be too slow for large data sets. Here, we present a novel protein inference method, EPIFANY, combining a loopy belief propagation algorithm with convolution trees for efficient processing of Bayesian networks. We demonstrate that EPIFANY combines the reliable protein inference of Bayesian methods with significantly shorter runtimes. On the 2016 iPRG protein inference benchmark data, EPIFANY is the only tested method that finds all true-positive proteins at a 5% protein false discovery rate (FDR) without strict prefiltering on the peptide-spectrum match (PSM) level, yielding an increase in identification performance (+10% in the number of true positives and +14% in partial AUC) compared to previous approaches. Even very large data sets with hundreds of thousands of spectra (which are intractable with other Bayesian and some non-Bayesian tools) can be processed with EPIFANY within minutes. The increased inference quality including shared peptides results in better protein inference results and thus increased robustness of the biological hypotheses generated. EPIFANY is available as open-source software for all major platforms at https://OpenMS.de/epifany.
Affiliation(s)
- Julianus Pfeuffer
- Applied Bioinformatics, Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
- Algorithmic Bioinformatics, Department of Bioinformatics, Freie Universität Berlin, 14195 Berlin, Germany
| | - Timo Sachsenberg
- Applied Bioinformatics, Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
| | - Tjeerd M H Dijkstra
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Oliver Serang
- Department of Computer Science, University of Montana, Missoula, Montana 59812, United States
| | - Knut Reinert
- Algorithmic Bioinformatics, Department of Bioinformatics, Freie Universität Berlin, 14195 Berlin, Germany
| | - Oliver Kohlbacher
- Applied Bioinformatics, Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Institute for Translational Bioinformatics, University Hospital Tübingen, 72076 Tübingen, Germany
- Quantitative Biology Center, University of Tübingen, 72076 Tübingen, Germany
|
19
|
Abstract
Mass spectrometry is extremely efficient for sequencing small peptides generated by, for example, trypsin digestion of a complex mixture. Current instruments have the capacity to generate 50,000-100,000 MS/MS spectra from a single run. Of these, ~30-50% are typically assigned to peptide matches at a 1% FDR threshold. Explaining the remaining spectra requires further research. Here we address whether the 30-50% of matched spectra yield consensus matches across different database-dependent search pipelines. Although the majority of spectrum-to-peptide assignments concur across search engines, we conclude that database-dependent search engines still require improvement.
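The 1% FDR threshold mentioned in this abstract is commonly applied through target-decoy q-values. The sketch below shows one generic way to compute them from a ranked list of PSMs; it assumes a simple decoys-over-targets FDR estimate, is not taken from any of the search pipelines the abstract compares, and uses a hypothetical function name.

```python
import numpy as np

def target_decoy_qvalues(scores, is_decoy):
    """Compute a q-value per PSM from scores and target/decoy labels."""
    scores = np.asarray(scores, dtype=float)
    is_decoy = np.asarray(is_decoy, dtype=bool)
    # Rank PSMs from best to worst score.
    order = np.argsort(-scores, kind="stable")
    decoys = np.cumsum(is_decoy[order])
    targets = np.cumsum(~is_decoy[order])
    # Estimated FDR if the list were cut after each position.
    fdr = decoys / np.maximum(targets, 1)
    # q-value: the lowest FDR at which this PSM would still be accepted.
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q = np.empty_like(q_sorted)
    q[order] = q_sorted
    return q
```

Target PSMs with a q-value of at most 0.01 would then make up the set reported at a 1% FDR threshold.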
Affiliation(s)
- Rune Matthiesen
- Computational and Experimental Biology Group, CEDOC, Chronic Diseases Research Centre, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisboa, Portugal.
| | - Gorka Prieto
- Department of Communications Engineering, Faculty of Engineering of Bilbao, University of the Basque Country (UPV/EHU), Bilbao, Spain
| | - Hans Christian Beck
- Department of Clinical Biochemistry and Pharmacology, Odense University Hospital, Odense C, Denmark
|
20
|
Inferring Protein-Protein Interaction Networks From Mass Spectrometry-Based Proteomic Approaches: A Mini-Review. Comput Struct Biotechnol J 2019; 17:805-811. [PMID: 31316724 PMCID: PMC6611912 DOI: 10.1016/j.csbj.2019.05.007] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 03/02/2019] [Revised: 05/20/2019] [Accepted: 05/26/2019] [Indexed: 01/06/2023] Open
Abstract
Studying protein-protein interaction networks provides key evidence for the underlying molecular mechanisms. Mass spectrometry-based proteomic approaches have been playing a pivotal role in deciphering these interaction networks, along with precise quantification of individual interactions. In this mini-review we discuss the available techniques and methods for qualitative and quantitative elucidation of protein-protein interaction networks. We then summarize the downstream computational strategies for identification and quantification of interactions from those techniques. Finally, we highlight the challenges and limitations of current computational pipelines in eliminating false-positive interactors, followed by a summary of the innovative algorithms that address these issues, along with the scope for future improvements.
|
21
|
Schiebenhoefer H, Van Den Bossche T, Fuchs S, Renard BY, Muth T, Martens L. Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis. Expert Rev Proteomics 2019; 16:375-390. [PMID: 31002542 DOI: 10.1080/14789450.2019.1609944] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Indexed: 02/07/2023]
Abstract
Introduction: The study of microbial communities based on the combined analysis of genomic and proteomic data, called metaproteogenomics, has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications for human health and the environment. Areas covered: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications. Expert opinion: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
Affiliation(s)
- Henning Schiebenhoefer
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Tim Van Den Bossche
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Stephan Fuchs
  - FG13 Division of Nosocomial Pathogens and Antibiotic Resistances, Robert Koch Institute, Wernigerode, Germany
- Bernhard Y Renard
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Thilo Muth
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
|
22
|
Devabhaktuni A, Lin S, Zhang L, Swaminathan K, Gonzalez CG, Olsson N, Pearlman SM, Rawson K, Elias JE. TagGraph reveals vast protein modification landscapes from large tandem mass spectrometry datasets. Nat Biotechnol 2019; 37:469-479. [PMID: 30936560 PMCID: PMC6447449 DOI: 10.1038/s41587-019-0067-5] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 06/13/2016] [Accepted: 02/12/2019] [Indexed: 02/06/2023]
Abstract
Although mass spectrometry is well suited to identifying thousands of potential protein post-translational modifications (PTMs), it has historically been biased towards just a few. To measure the entire set of PTMs across diverse proteomes, software must overcome the dual challenges of covering enormous search spaces and distinguishing correct from incorrect spectrum interpretations. Here, we describe TagGraph, a computational tool that overcomes both challenges with an unrestricted string-based search method that is as much as 350-fold faster than existing approaches, and a probabilistic validation model that we optimized for PTM assignments. We applied TagGraph to a published human proteomic dataset of 25 million mass spectra and tripled confident spectrum identifications compared to its original analysis. We identified thousands of modification types on almost 1 million sites in the proteome. We show alternative contexts for highly abundant yet understudied PTMs such as proline hydroxylation, and its unexpected association with cancer mutations. By enabling broad characterization of PTMs, TagGraph informs as to how their functions and regulation intersect.
Affiliation(s)
- Arun Devabhaktuni
  - Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Sarah Lin
  - Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Lichao Zhang
  - Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Kavya Swaminathan
  - Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Carlos G Gonzalez
  - Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Niclas Olsson
  - Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Samuel M Pearlman
  - Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Keith Rawson
  - Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
- Joshua E Elias
  - Department of Chemical and Systems Biology, Stanford School of Medicine, Stanford University, Stanford, CA, USA
|
23
|
Henning J, Tostengard A, Smith R. A Peptide-Level Fully Annotated Data Set for Quantitative Evaluation of Precursor-Aware Mass Spectrometry Data Processing Algorithms. J Proteome Res 2018; 18:392-398. [PMID: 30394759 DOI: 10.1021/acs.jproteome.8b00659] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022]
Abstract
Modern label-free quantitative mass spectrometry workflows are complex experimental chains for devising the composition of biological samples. With benchtop and in silico experimental steps that each have a significant effect on the accuracy, coverage, and statistical significance of the study result, it is crucial to understand the efficacy and biases of each protocol decision. Although many studies have been conducted on wet lab experimental protocols, postacquisition data processing methods have not been adequately evaluated in large part due to a lack of available ground truth data. In this study, we provide a novel ground truth data set for mass spectrometry data analysis at the precursor (MS1) signal level comprised of isolated peptide signals from UPS2, a popular complex standard for proteomics analysis, requiring more than 1000 h of manual curation. The data set consists of more than 62 million points with 1,294,008 grouped into 57,518 extracted ion chromatograms and those grouped into 14,111 isotopic envelopes. This data set can be used to evaluate many aspects of mass spectrometry data processing, including precursor mapping and signal extraction algorithms.
Affiliation(s)
- Jessica Henning
  - Department of Computer Science, University of Montana, Missoula, Montana 59812, United States
- Annika Tostengard
  - Department of Computer Science, University of Montana, Missoula, Montana 59812, United States
- Rob Smith
  - Department of Computer Science, University of Montana, Missoula, Montana 59812, United States
  - Prime Laboratories, Inc., Missoula, Montana, United States
|
24
|
Discrimination and quantification of homologous keratins from goat and sheep with dual protease digestion and PRM assays. J Proteomics 2018; 186:38-46. [DOI: 10.1016/j.jprot.2018.07.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/23/2018] [Revised: 07/03/2018] [Accepted: 07/13/2018] [Indexed: 01/25/2023]
|
25
|
Jarnuczak AF, Albornoz MG, Eyers CE, Grant CM, Hubbard SJ. A quantitative and temporal map of proteostasis during heat shock in Saccharomyces cerevisiae. Mol Omics 2018; 14:37-52. [PMID: 29570196 DOI: 10.1039/c7mo00050b] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 01/24/2023]
Abstract
Temperature fluctuation is a common environmental stress that elicits a molecular response in order to maintain intracellular protein levels. Here, for the first time, we report a comprehensive temporal and quantitative study of the proteome during a 240 minute heat stress, using label-free mass spectrometry. We report temporal expression changes of the hallmark heat stress proteins, including many molecular chaperones, tightly coupled to their protein clients. A notable lag of 30 to 120 minutes was evident between transcriptome and proteome levels for differentially expressed genes. This targeted molecular response buffers the global proteome; fewer than 15% of proteins display significant abundance change. Additionally, a parallel study in a Hsp70 chaperone mutant (ssb1Δ) demonstrated a significantly attenuated response, at odds with the modest phenotypic effects that are observed on growth rate. We cast the global changes in temporal protein expression into protein interaction and functional networks, to afford a unique, time-resolved and quantitative description of the heat shock response in an important model organism.
Affiliation(s)
- Andrew F Jarnuczak
  - School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Oxford Road, Manchester M13 9PT, UK
|
26
|
Dowsey AW. The need for statistical contributions to bioinformatics at scale, with illustration to mass spectrometry. Stat Model 2017. [DOI: 10.1177/1471082x17708519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
In their article, Morris and Baladandayuthapani clearly evidence the influence of statisticians in recent methodological advances throughout the bioinformatics pipeline and advocate for the expansion of this role. The latest acquisition platforms, such as next generation sequencing (genomics/transcriptomics) and hyphenated mass spectrometry (proteomics/metabolomics), output raw datasets in the order of gigabytes; it is not unusual to acquire a terabyte or more of data per study. The increasing computational burden this brings is a further impediment against the use of statistically rigorous methodology in the pre-processing stages of the bioinformatics pipeline. In this discussion I describe the mass spectrometry pipeline and use it as an example to show that beneath this challenge lies a two-fold opportunity: (a) Biological complexity and dynamic range are still well beyond what is captured by current processing methodology; hence, potential biomarkers and mechanistic insights are consistently missed; (b) Statistical science could play a larger role in optimizing the acquisition process itself. Data rates will continue to increase as routine clinical omics analysis moves to large-scale facilities with systematic, standardized protocols. Key inferential gains will be achieved by borrowing strength across the sum total of all analyzed studies, a task best underpinned by appropriate statistical modelling.
Affiliation(s)
- Andrew W Dowsey
  - School of Social & Community Medicine and School of Veterinary Sciences, Faculty of Health Sciences, University of Bristol, United Kingdom
|
27
|
Rosenberger G, Bludau I, Schmitt U, Heusel M, Hunter CL, Liu Y, MacCoss MJ, MacLean BX, Nesvizhskii AI, Pedrioli PGA, Reiter L, Röst HL, Tate S, Ting YS, Collins BC, Aebersold R. Statistical control of peptide and protein error rates in large-scale targeted data-independent acquisition analyses. Nat Methods 2017; 14:921-927. [PMID: 28825704 PMCID: PMC5581544 DOI: 10.1038/nmeth.4398] [Citation(s) in RCA: 139] [Impact Index Per Article: 19.9] [Received: 09/12/2016] [Accepted: 07/07/2017] [Indexed: 12/18/2022]
Abstract
Liquid chromatography coupled to tandem mass spectrometry is the main method for high-throughput identification and quantification of peptides and inferred proteins. Within this field, data-independent acquisition (DIA) combined with peptide-centric scoring, exemplified by SWATH-MS, emerged as a scalable method to achieve deep and consistent proteome coverage across large-scale datasets. Here we discuss the adaptation of statistical concepts developed for discovery proteomics based on spectrum-centric scoring to large-scale DIA experiments analyzed with peptide-centric scoring strategies and provide guidance on their application. We show that optimal tradeoffs between sensitivity and specificity require careful considerations of the relationship between proteins in the samples and proteins represented in the spectral library. We propose the application of a global analyte constraint to prevent accumulation of false positives across large-scale datasets. Furthermore, to increase the quality and reproducibility of published proteomic results, well-established confidence criteria should be reported for detected peptide queries, peptides and inferred proteins.
Affiliation(s)
- George Rosenberger
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland
- Isabell Bludau
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland
- Uwe Schmitt
  - ID Scientific IT Services, ETH Zurich, Zurich, Switzerland
- Moritz Heusel
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - PhD program in Molecular and Translational Biomedicine, Competence Center Personalized Medicine (CC-PM), ETH Zurich and University of Zurich, Zurich, Switzerland
- Yansheng Liu
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Michael J MacCoss
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Brendan X MacLean
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Alexey I Nesvizhskii
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
  - Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA
- Patrick G A Pedrioli
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Hannes L Röst
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Ying S Ting
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Ben C Collins
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - Faculty of Science, University of Zurich, Zurich, Switzerland
|
28
|
Proteomic differences in amyloid plaques in rapidly progressive and sporadic Alzheimer's disease. Acta Neuropathol 2017; 133:933-954. [PMID: 28258398 DOI: 10.1007/s00401-017-1691-0] [Citation(s) in RCA: 128] [Impact Index Per Article: 18.3] [Received: 12/23/2016] [Revised: 02/22/2017] [Accepted: 02/26/2017] [Indexed: 12/16/2022]
Abstract
Rapidly progressive Alzheimer's disease (rpAD) is a particularly aggressive form of Alzheimer's disease, with a median survival time of 7-10 months after diagnosis. Why these patients have such a rapid progression of Alzheimer's disease is currently unknown. To further understand pathological differences between rpAD and typical sporadic Alzheimer's disease (sAD) we used localized proteomics to analyze the protein differences in amyloid plaques in rpAD and sAD. Label-free quantitative LC-MS/MS was performed on amyloid plaques microdissected from rpAD and sAD patients (n = 22 for each patient group) and protein expression differences were quantified. On average, 913 ± 30 (mean ± SEM) proteins were quantified in plaques from each patient and 279 of these proteins were consistently found in plaques from every patient. We found significant differences in protein composition between rpAD and sAD plaques. We found that rpAD plaques contained significantly higher levels of neuronal proteins (p = 0.0017) and significantly lower levels of astrocytic proteins (p = 1.08 × 10⁻⁶). Unexpectedly, cumulative protein differences in rpAD plaques did not suggest accelerated typical sAD. Plaques from patients with rpAD were particularly abundant in synaptic proteins, especially those involved in synaptic vesicle release, highlighting the potential importance of synaptic dysfunction in the accelerated development of plaque pathology in rpAD. Combined, our data provide new direct evidence that amyloid plaques do not all have the same protein composition and that the proteomic differences in plaques could provide important insight into the factors that contribute to plaque development. The cumulative protein differences in rpAD plaques suggest rpAD may be a novel subtype of Alzheimer's disease.
|
29
|
Zhang B, Pirmoradian M, Zubarev R, Käll L. Covariation of Peptide Abundances Accurately Reflects Protein Concentration Differences. Mol Cell Proteomics 2017; 16:936-948. [PMID: 28302922 PMCID: PMC5417831 DOI: 10.1074/mcp.o117.067728] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 02/06/2017] [Revised: 03/13/2017] [Indexed: 12/29/2022] Open
Abstract
Most implementations of mass spectrometry-based proteomics involve enzymatic digestion of proteins, expanding the analysis to multiple proteolytic peptides for each protein. Currently, there is no consensus on how to summarize peptides' abundances to protein concentrations, and such efforts are complicated by the fact that error control normally is applied to the identification process, and does not directly control errors linking peptide abundance measures to protein concentration. Peptides resulting from suboptimal digestion or being partially modified are not representative of the protein concentration. Without a mechanism to remove such unrepresentative peptides, their abundance adversely impacts the estimation of their protein's concentration. Here, we present a relative quantification approach, Diffacto, that applies factor analysis to extract the covariation of peptides' abundances. The method enables a weighted geometric average summarization and automatic elimination of incoherent peptides. We demonstrate, based on a set of controlled label-free experiments using standard mixtures of proteins, that the covariation structure extracted by the factor analysis accurately reflects protein concentrations. In the 1% peptide-spectrum match-level FDR data set, as many as 11% of the peptides have abundance differences incoherent with the other peptides attributed to the same protein. If not controlled, such contradictory peptide abundances have a severe impact on protein quantifications. When adding the quantities of each protein's three most abundant peptides, we note as many as 14% of the proteins being estimated as having a negative correlation with their actual concentration differences between samples. Diffacto reduced the proportion of such obviously incorrectly quantified proteins to 1.6%. Furthermore, by analyzing clinical data sets from two breast cancer studies, our method revealed the persistent proteomic signatures linked to three subtypes of breast cancer. We conclude that Diffacto can facilitate the interpretation and enhance the utility of most types of proteomics data.
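The contrast this abstract draws between summing the three most abundant peptides and a weighted geometric average can be illustrated with a minimal Python sketch. It is not the Diffacto implementation: the function names and the example weights are hypothetical, and in Diffacto the weights would be derived from the factor analysis, which is not reproduced here.

```python
import math


def top3_sum(peptide_abundances):
    """Sum the three largest peptide abundances for one protein in one sample."""
    return sum(sorted(peptide_abundances, reverse=True)[:3])


def weighted_geometric_mean(peptide_abundances, weights):
    """Weighted geometric mean of strictly positive peptide abundances.

    A peptide with weight 0 is excluded, which mimics dropping a peptide
    judged incoherent with the rest (the weights here are assumptions).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    log_sum = sum(
        w * math.log(a)
        for a, w in zip(peptide_abundances, weights)
        if w > 0
    )
    return math.exp(log_sum / total)


# One protein with three peptides; the third is an outlier.
abundances = [2.0, 8.0, 1000.0]
print(top3_sum(abundances))                                   # dominated by the outlier
print(weighted_geometric_mean(abundances, [1.0, 1.0, 0.0]))   # outlier excluded, close to 4.0
```

With the outlier down-weighted to zero, the summary depends only on the two coherent peptides, whereas the top-three sum is driven almost entirely by the outlier.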
Affiliation(s)
- Bo Zhang
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheeles väg 2, SE-17177 Solna, Sweden
- Mohammad Pirmoradian
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheeles väg 2, SE-17177 Solna, Sweden
  - Department of Laboratory Medicine, Karolinska University Hospital Huddinge, SE-14186 Huddinge, Sweden
- Roman Zubarev
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheeles väg 2, SE-17177 Solna, Sweden
- Lukas Käll
  - Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology-KTH, SE-17165 Solna, Sweden
|
30
|
The M, MacCoss MJ, Noble WS, Käll L. Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0. J Am Soc Mass Spectrom 2016; 27:1719-1727. [PMID: 27572102 PMCID: PMC5059416 DOI: 10.1007/s13361-016-1460-7] [Citation(s) in RCA: 225] [Impact Index Per Article: 28.1] [Received: 04/01/2016] [Revised: 06/15/2016] [Accepted: 07/20/2016] [Indexed: 05/21/2023]
Abstract
Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method (grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein) in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PMID: 24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
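The protein-level scoring idea mentioned in this abstract (keep only the best-scoring peptide per protein) and decoy-based q-value estimation can be sketched in Python as follows. This is an illustrative simplification, not Percolator's code: the function names and input layout are hypothetical, each peptide is assumed to map to exactly one protein group, and the FDR estimate is the plain decoys-to-targets ratio.

```python
def best_peptide_protein_scores(peptide_scores, peptide_to_protein):
    """Score each protein (group) by its single best-scoring peptide."""
    best = {}
    for peptide, score in peptide_scores.items():
        protein = peptide_to_protein[peptide]
        if protein not in best or score > best[protein]:
            best[protein] = score
    return best


def decoy_q_values(scored):
    """Estimate q-values from (score, is_decoy) pairs; higher score is better.

    FDR at a threshold is estimated as decoys / targets above it; the
    q-value is the minimum FDR at that threshold or any more permissive one.
    Returns q-values in the same order as the input.
    """
    order = sorted(range(len(scored)), key=lambda i: scored[i][0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if scored[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    q_values = [0.0] * len(scored)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values


# Hypothetical toy input: three peptides mapping to two proteins.
scores = best_peptide_protein_scores(
    {"PEPA": 2.0, "PEPB": 5.0, "PEPC": 1.0},
    {"PEPA": "P1", "PEPB": "P1", "PEPC": "P2"},
)
print(scores)
```

Taking the maximum rather than combining peptide scores keeps the protein score easy to interpret, at the cost of ignoring supporting evidence from additional peptides.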
Affiliation(s)
- Matthew The
  - Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, Box 1031, 17121, Solna, Sweden
- Michael J MacCoss
  - Department of Genome Sciences, School of Medicine, University of Washington, Seattle, WA, 98195, USA
- William S Noble
  - Department of Genome Sciences, School of Medicine, University of Washington, Seattle, WA, 98195, USA
  - Department of Computer Science and Engineering, University of Washington, Seattle, WA, 98195, USA
- Lukas Käll
  - Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, Box 1031, 17121, Solna, Sweden
|
31
|
The M, Tasnim A, Käll L. How to talk about protein-level false discovery rates in shotgun proteomics. Proteomics 2016; 16:2461-2469. [PMID: 27503675 PMCID: PMC5096025 DOI: 10.1002/pmic.201500431] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 11/12/2015] [Revised: 05/12/2016] [Accepted: 07/20/2016] [Indexed: 12/04/2022]
Abstract
A frequently sought output from a shotgun proteomics experiment is a list of proteins that we believe to have been present in the analyzed sample before proteolytic digestion. The standard technique to control for errors in such lists is to enforce a preset threshold for the false discovery rate (FDR). Many consider protein-level FDRs a difficult and vague concept, as the measurement entities, spectra, are manifestations of peptides and not proteins. Here, we argue that this confusion is unnecessary and provide a framework on how to think about protein-level FDRs, starting from its basic principle: the null hypothesis. Specifically, we point out that two competing null hypotheses are used concurrently in today's protein inference methods, which has gone unnoticed by many. Using simulations of a shotgun proteomics experiment, we show how confusing one null hypothesis for the other can lead to serious discrepancies in the FDR. Furthermore, we demonstrate how the same simulations can be used to verify FDR estimates of protein inference methods. In particular, we show that, for a simple protein inference method, decoy models can be used to accurately estimate protein-level FDRs for both competing null hypotheses.
Affiliation(s)
- Matthew The
  - Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Solna, Sweden
- Ayesha Tasnim
  - Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Solna, Sweden
- Lukas Käll
  - Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Solna, Sweden
|
32
|
Riley NM, Bern M, Westphall MS, Coon JJ. Full-Featured Search Algorithm for Negative Electron-Transfer Dissociation. J Proteome Res 2016; 15:2768-2776. [PMID: 27402189 DOI: 10.1021/acs.jproteome.6b00319] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Negative electron-transfer dissociation (NETD) has emerged as a premier tool for peptide anion analysis, offering access to acidic post-translational modifications and regions of the proteome that are intractable with traditional positive-mode approaches. Whole-proteome scale characterization is now possible with NETD, but proper informatic tools are needed to capitalize on advances in instrumentation. Currently only one database search algorithm (OMSSA) can process NETD data. Here we implement NETD search capabilities into the Byonic platform to improve the sensitivity of negative-mode data analyses, and we benchmark these improvements using 90 min LC-MS/MS analyses of tryptic peptides from human embryonic stem cells. With this new algorithm for searching NETD data, we improved the number of successfully identified spectra by as much as 80% and identified 8665 unique peptides, 24,639 peptide spectral matches, and 1338 proteins in activated-ion NETD analyses, more than doubling identifications from previous negative-mode characterizations of the human proteome. Furthermore, we reanalyzed our recently published large-scale, multienzyme negative-mode yeast proteome data, improving peptide and peptide spectral match identifications and considerably increasing protein sequence coverage. In all, we show that new informatics tools, in combination with recent advances in data acquisition, can significantly improve proteome characterization in negative-mode approaches.
Affiliation(s)
- Nicholas M Riley
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Marshall Bern
  - Protein Metrics, Inc., San Carlos, California 94070, United States
- Michael S Westphall
  - Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Joshua J Coon
  - Department of Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
  - Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
|
33
|
Muth T, Renard BY, Martens L. Metaproteomic data analysis at a glance: advances in computational microbial community proteomics. Expert Rev Proteomics 2016; 13:757-769. [DOI: 10.1080/14789450.2016.1209418] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 02/07/2023]
|
34
|
McDowell G, Philpott A. New Insights Into the Role of Ubiquitylation of Proteins. Int Rev Cell Mol Biol 2016; 325:35-88. [DOI: 10.1016/bs.ircmb.2016.02.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/25/2022]
|
35
|
Latosinska A, Vougas K, Makridakis M, Klein J, Mullen W, Abbas M, Stravodimos K, Katafigiotis I, Merseburger AS, Zoidakis J, Mischak H, Vlahou A, Jankowski V. Comparative Analysis of Label-Free and 8-Plex iTRAQ Approach for Quantitative Tissue Proteomic Analysis. PLoS One 2015; 10:e0137048. [PMID: 26331617 PMCID: PMC4557910 DOI: 10.1371/journal.pone.0137048] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 02/06/2015] [Accepted: 08/12/2015] [Indexed: 11/18/2022] Open
Abstract
High resolution proteomics approaches have been successfully utilized for the comprehensive characterization of the cell proteome. However, in the case of quantitative proteomics an open question remains: which quantification strategy is best suited for identification of biologically relevant changes, especially in clinical specimens. In this study, a thorough comparison of a label-free approach (intensity-based) and 8-plex iTRAQ was conducted as applied to the analysis of tumor tissue samples from non-muscle invasive and muscle-invasive bladder cancer. For the latter, two acquisition strategies were tested including analysis of unfractionated and fractionated iTRAQ-labeled peptides. To reduce variability, aliquots of the same protein extract were used as starting material, whereas to obtain representative results per method further sample processing and MS analysis were conducted according to routinely applied protocols. Considering only multiple-peptide identifications, LC-MS/MS analysis resulted in the identification of 910, 1092 and 332 proteins by label-free, fractionated and unfractionated iTRAQ, respectively. The label-free strategy provided higher protein sequence coverage compared to both iTRAQ experiments. Even though pre-fractionation of the iTRAQ-labeled peptides allowed for a higher number of identifications, this was not accompanied by a corresponding increase in the number of differential expression changes detected. Validity of the proteomics output related to protein identification and differential expression was determined by comparison to existing data in the field (Protein Atlas and published data on the disease). All methods predicted changes which to a large extent agreed with published data, with label-free providing a higher number of significant changes than iTRAQ. In conclusion, both label-free and iTRAQ (when combined with peptide fractionation) provide high proteome coverage and apparently valid predictions in terms of differential expression; nevertheless, label-free provides higher sequence coverage and ultimately detects a higher number of differentially expressed proteins. The risk of obtaining false associations still exists, particularly when analyzing highly heterogeneous biological samples, raising the need for the analysis of higher sample numbers and/or application of adjustment for multiple testing.
Affiliation(s)
- Agnieszka Latosinska
- Biotechnology Division, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
- Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Konstantinos Vougas
- Biotechnology Division, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
| | - Manousos Makridakis
- Biotechnology Division, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
| | - Julie Klein
- Institut National de la Santé et de la Recherche Médicale (INSERM), U1048, Institute of Cardiovascular and Metabolic Diseases, Toulouse, France
- Université Toulouse III Paul-Sabatier, Toulouse, France
| | - William Mullen
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow, United Kingdom
| | - Mahmoud Abbas
- Department of Pathology, Hannover Medical School, Hannover, Germany
| | | | - Ioannis Katafigiotis
- Department of Urology, Medical School of Athens, Laikon Hospital, Athens, Greece
| | | | - Jerome Zoidakis
- Biotechnology Division, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
| | - Harald Mischak
- BHF Glasgow Cardiovascular Research Centre, University of Glasgow, Glasgow, United Kingdom
- Mosaiques Diagnostics GmbH, Hannover, Germany
| | - Antonia Vlahou
- Biotechnology Division, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
| | - Vera Jankowski
- RWTH-Aachen, Institute for Molecular Cardiovascular Research (IMCAR), Aachen, Germany
- * E-mail:
|
36
|
Filip S, Vougas K, Zoidakis J, Latosinska A, Mullen W, Spasovski G, Mischak H, Vlahou A, Jankowski J. Comparison of Depletion Strategies for the Enrichment of Low-Abundance Proteins in Urine. PLoS One 2015. [PMID: 26208298 PMCID: PMC4514849 DOI: 10.1371/journal.pone.0133773] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Indexed: 12/18/2022] Open
Abstract
Proteome analysis of complex biological samples for biomarker identification remains challenging, due, among other factors, to the wide range of protein concentrations. High-abundance proteins in plasma and urine, such as albumin or IgG, may interfere with the detection of potential disease biomarkers. Currently, several options are available for the depletion of abundant proteins in plasma. However, the applicability of these methods to urine has not been thoroughly investigated. In this study, we compared different commercially available immunodepletion and ion-exchange-based approaches on urine samples from both healthy subjects and CKD patients, for their reproducibility and efficiency in protein depletion. A starting urine volume of 500 μL was used to simulate conditions of a multi-institutional biomarker discovery study. All depletion approaches showed satisfactory reproducibility (n=5) in protein identification as well as protein abundance. Comparison of the depletion efficiency between the unfractionated and fractionated samples and the different depletion strategies showed efficient depletion in all cases, with the exception of the ion-exchange kit. The depletion efficiency was slightly higher in normal than in CKD samples, and normal samples yielded more protein identifications than CKD samples for both the initial and the corresponding depleted fractions. Along these lines, a decrease in the amount of albumin and other targets, as applicable, was observed following depletion. Nevertheless, these depletion strategies did not yield a higher number of identifications in urine from either normal subjects or CKD patients. Collectively, when analyzing urine in the context of CKD biomarker identification, no added value of depletion strategies can be observed and analysis of unfractionated starting urine appears to be preferable.
Affiliation(s)
- Szymon Filip
- Biomedical Research Foundation Academy of Athens, Biotechnology Division, Athens, Greece
- Charité–Universitätsmedizin Berlin, Berlin, Germany
| | - Konstantinos Vougas
- Biomedical Research Foundation Academy of Athens, Biotechnology Division, Athens, Greece
| | - Jerome Zoidakis
- Biomedical Research Foundation Academy of Athens, Biotechnology Division, Athens, Greece
| | - Agnieszka Latosinska
- Biomedical Research Foundation Academy of Athens, Biotechnology Division, Athens, Greece
- Charité–Universitätsmedizin Berlin, Berlin, Germany
| | - William Mullen
- University of Glasgow Institute of Cardiovascular and Medical Sciences, Glasgow, United Kingdom
| | - Goce Spasovski
- Ss. Cyril and Methodius University in Skopje, Nephrology Department, Skopje, Former Yugoslav Republic of Macedonia
| | - Harald Mischak
- University of Glasgow Institute of Cardiovascular and Medical Sciences, Glasgow, United Kingdom
- Mosaiques Diagnostics GmbH, Hannover, Germany
| | - Antonia Vlahou
- Biomedical Research Foundation Academy of Athens, Biotechnology Division, Athens, Greece
| | - Joachim Jankowski
- University Hospital RWTH Aachen, Institute for Molecular Cardiovascular Research, Aachen, Germany
- * E-mail:
|
37
|
Serang O. A Fast Numerical Method for Max-Convolution and the Application to Efficient Max-Product Inference in Bayesian Networks. J Comput Biol 2015; 22:770-83. [PMID: 26161499 DOI: 10.1089/cmb.2015.0013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
Observations depending on sums of random variables are common throughout many fields; however, no efficient solution is currently known for performing max-product inference on these sums of general discrete distributions (max-product inference can be used to obtain maximum a posteriori estimates). The limiting step to max-product inference is the max-convolution problem (sometimes presented in log-transformed form and denoted as "infimal convolution," "min-convolution," or "convolution on the tropical semiring"), for which no O(k log(k)) method is currently known. Presented here is an O(k log(k)) numerical method for estimating the max-convolution of two nonnegative vectors (e.g., two probability mass functions), where k is the length of the larger vector. This numerical max-convolution method is then demonstrated by performing fast max-product inference on a convolution tree, a data structure for performing fast inference given information on the sum of n discrete random variables in O(nk log(nk)log(n)) steps (where each random variable has an arbitrary prior distribution on k contiguous possible states). The numerical max-convolution method can be applied to specialized classes of hidden Markov models to reduce the runtime of computing the Viterbi path from nk^2 to nk log(k), and has potential application to the all-pairs shortest paths problem.
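To make the max-convolution problem in this abstract concrete, the sketch below first computes it exactly in quadratic time and then approximates it through a p-norm. The p-norm route is an assumption chosen for illustration (the abstract does not state which numerical technique is used), and the function names and the default p = 64 are hypothetical.

```python
def max_convolve(u, v):
    """Exact max-convolution: w[m] = max over i + j == m of u[i] * v[j]."""
    w = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] = max(w[i + j], ui * vj)
    return w


def max_convolve_pnorm(u, v, p=64):
    """Approximate max-convolution of nonnegative vectors through a p-norm.

    For nonnegative x, (sum_i x_i**p) ** (1/p) approaches max_i x_i as p grows,
    so an ordinary (sum-product) convolution of u**p and v**p followed by a
    p-th root approximates the max-convolution. The inner convolution is
    written naively here; replacing it with an FFT-based convolution is what
    would give an O(k log k) cost. Very large p can underflow to zero.
    """
    up = [x ** p for x in u]
    vp = [x ** p for x in v]
    acc = [0.0] * (len(u) + len(v) - 1)
    for i, a in enumerate(up):
        for j, b in enumerate(vp):
            acc[i + j] += a * b
    return [s ** (1.0 / p) for s in acc]
```

The p-norm always overestimates the true maximum slightly, and the gap shrinks as p increases, so in practice p trades accuracy against numerical underflow.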
Affiliation(s)
- Oliver Serang
- Department of Informatik, Freie Universität Berlin, Berlin, Germany; Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
|
38
|
Webb-Robertson BJM, Matzke MM, Datta S, Payne SH, Kang J, Bramer LM, Nicora CD, Shukla AK, Metz TO, Rodland KD, Smith RD, Tardiff MF, McDermott JE, Pounds JG, Waters KM. Bayesian proteoform modeling improves protein quantification of global proteomic measurements. Mol Cell Proteomics 2015; 13:3639-46. [PMID: 25433089 DOI: 10.1074/mcp.m113.030932] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 11/06/2022] Open
Abstract
As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which has the benefit of offering a systems view of protein expression. However, a major challenge is that, with an increase in throughput, protein quantification estimation from the native measured peptides has become a computational task. A limitation of existing computationally driven protein quantification methods is that most ignore protein variation, such as alternate splicing of the RNA transcript and post-translational modifications or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses, such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian Proteoform Quantification model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that are outside the dominant pattern, or the existence of multiple overexpressed patterns, to improve relative protein abundance estimates. It is a research-driven approach that utilizes the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides exhibiting similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves similar accuracy to the current state-of-the-art methods at proteoform identification with significantly better specificity. BP-Quant is available as MATLAB® and R packages.
Affiliation(s)
- Bobbie-Jo M Webb-Robertson
- Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Melissa M Matzke
- Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Susmita Datta
- Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202
| | - Samuel H Payne
- Omics Technology Development and Production, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Jiyun Kang
- Omics Technology Development and Production, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Lisa M Bramer
- Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Carrie D Nicora
- Omics Technology Development and Production, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Anil K Shukla
- Omics Technology Development and Production, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Thomas O Metz
- Omics Biological Applications, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Karin D Rodland
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Richard D Smith
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Mark F Tardiff
- Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Jason E McDermott
- Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Joel G Pounds
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
| | - Katrina M Waters
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99354
|
39
|
Sikdar S, Gill R, Datta S. Improving protein identification from tandem mass spectrometry data by one-step methods and integrating data from other platforms. Brief Bioinform 2015; 17:262-9. [PMID: 26141827 DOI: 10.1093/bib/bbv043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/10/2015] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION: Many approaches have been proposed for the protein identification problem based on tandem mass spectrometry (MS/MS) data. In these experiments, proteins are digested into peptides and the resulting peptide mixture is subjected to mass spectrometry. Some interesting putative peptide features (peaks) are selected from the mass spectra. Following that, the precursor ions undergo fragmentation and are analyzed by MS/MS. The process of identifying peptides from the mass spectra, and from them the constituent proteins in the sample, is called protein identification from MS/MS data. There are many two-step protein identification procedures, reviewed in the literature, which first attempt to identify the peptides in a separate process and then use these results to infer the proteins. However, in recent years, there have been attempts to provide a one-step solution to protein identification, which simultaneously identifies the proteins and the peptides in the sample. RESULTS: In this review, we briefly introduce the most popular two-step protein identification procedure, PeptideProphet coupled with ProteinProphet. Following that, we describe the difficulties with two-step procedures and review some recently introduced one-step protein/peptide identification procedures that do not suffer from these issues. The focus of this review is on one-step procedures that are based on statistical likelihood-based models, but some discussion of other one-step procedures is also included. We report comparative performances of one-step and two-step methods, which support the overall superiority of one-step procedures. We also cover some recent efforts to improve protein identification by incorporating other molecular data along with MS/MS data.
|
40
|
Savitski MM, Wilhelm M, Hahne H, Kuster B, Bantscheff M. A Scalable Approach for Protein False Discovery Rate Estimation in Large Proteomic Data Sets. Mol Cell Proteomics 2015; 14:2394-404. [PMID: 25987413 DOI: 10.1074/mcp.m114.046995] [Citation(s) in RCA: 283] [Impact Index Per Article: 31.4] [Received: 11/30/2014] [Indexed: 02/06/2023] Open
Abstract
Calculating the number of confidently identified proteins and estimating false discovery rate (FDR) is a challenge when analyzing very large proteomic data sets such as entire human proteomes. Biological and technical heterogeneity in proteomic experiments further add to the challenge and there are strong differences in opinion regarding the conceptual validity of a protein FDR and no consensus regarding the methodology for protein FDR determination. There are also limitations inherent to the widely used classic target-decoy strategy that particularly show when analyzing very large data sets and that lead to a strong over-representation of decoy identifications. In this study, we investigated the merits of the classic, as well as a novel target-decoy-based protein FDR estimation approach, taking advantage of a heterogeneous data collection comprised of ∼19,000 LC-MS/MS runs deposited in ProteomicsDB (https://www.proteomicsdb.org). The "picked" protein FDR approach treats target and decoy sequences of the same protein as a pair rather than as individual entities and chooses either the target or the decoy sequence depending on which receives the highest score. We investigated the performance of this approach in combination with q-value based peptide scoring to normalize sample-, instrument-, and search engine-specific differences. The "picked" target-decoy strategy performed best when protein scoring was based on the best peptide q-value for each protein yielding a stable number of true positive protein identifications over a wide range of q-value thresholds. We show that this simple and unbiased strategy eliminates a conceptual issue in the commonly used "classic" protein FDR approach that causes overprediction of false-positive protein identification in large data sets. 
The approach scales from small to very large data sets without losing performance, consistently increases the number of true-positive protein identifications and is readily implemented in proteomics analysis software.
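The "picked" strategy described in this abstract can be sketched in a few lines: each protein's target and decoy sequences compete as a pair, only the winner enters the ranked list, and FDR is then estimated from the surviving decoys. This is a minimal illustration, not the authors' implementation; the function name, the input layout, the decoys/targets FDR estimate, and breaking ties in favor of the target are all assumptions.

```python
def picked_protein_qvalues(pairs):
    """pairs maps protein id -> (target_score, decoy_score); higher is better.

    Returns {protein id: q-value} for proteins whose target sequence won.
    """
    # Step 1: within each target/decoy pair keep only the better-scoring member.
    picked = []
    for protein, (target_score, decoy_score) in pairs.items():
        if target_score >= decoy_score:  # tie-breaking toward the target is an assumption
            picked.append((target_score, False, protein))
        else:
            picked.append((decoy_score, True, protein))

    # Step 2: walk down the ranked list and estimate FDR as decoys / targets.
    picked.sort(key=lambda item: item[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy, _ in picked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Step 3: q-value = smallest FDR at this rank or at any lower-scoring rank.
    qvalues = {}
    running_min = float("inf")
    for (_, is_decoy, protein), fdr in zip(reversed(picked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        if not is_decoy:
            qvalues[protein] = running_min
    return qvalues
```

Because a decoy is counted only when it outscores its own target, a protein that is genuinely present removes its decoy from the list, which is how the pairing avoids the decoy over-representation that the abstract attributes to the classic approach in very large data sets.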
Affiliation(s)
| | - Mathias Wilhelm
- Chair for Proteomics and Bioanalytics, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354 Freising, Germany; SAP SE, Dietmar-Hopp-Allee 16, 69190 Walldorf, Germany
| | - Hannes Hahne
- Chair for Proteomics and Bioanalytics, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354 Freising, Germany
| | - Bernhard Kuster
- Chair for Proteomics and Bioanalytics, Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354 Freising, Germany; Center for Integrated Protein Science Munich, Emil Erlenmeyer Forum 5, 85354 Freising, Germany
| | - Marcus Bantscheff
- Cellzome GmbH, Meyerhofstrasse 1, 69117 Heidelberg, Germany
|
41
|
Alves G, Yu YK. Mass spectrometry-based protein identification with accurate statistical significance assignment. Bioinformatics 2014; 31:699-706. [PMID: 25362092 DOI: 10.1093/bioinformatics/btu717] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein-level statistics remain challenging. RESULTS: We have constructed a protein ID method that combines the peptide evidence for a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein-level E-values, eliminating the need for empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. AVAILABILITY AND IMPLEMENTATION: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit.
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
|
42
|
Serang O. The probabilistic convolution tree: efficient exact Bayesian inference for faster LC-MS/MS protein inference. PLoS One 2014; 9:e91507. [PMID: 24626234 PMCID: PMC3953406 DOI: 10.1371/journal.pone.0091507] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 11/29/2013] [Accepted: 02/12/2014] [Indexed: 11/18/2022] Open
Abstract
Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called “causal independence”). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustrative example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log(k)^2) and the space to O(k log(k)), where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions.
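The core idea of a convolution tree, combining the distributions of many independent discrete variables pairwise until a single distribution for their sum remains, can be sketched as follows. Only the bottom-up pass is shown; the paper's method also propagates evidence about the sum back down to the individual variables, which is omitted here. The function names are hypothetical, and the pairwise convolution is written naively rather than with an FFT.

```python
def convolve(p, q):
    """PMF of X + Y for independent X ~ p and Y ~ q (list index = value)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def sum_distribution(pmfs):
    """Combine PMFs pairwise, layer by layer, until one PMF for the total remains.

    Expects at least one PMF in `pmfs`.
    """
    layer = list(pmfs)
    while len(layer) > 1:
        nxt = [convolve(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:  # an unpaired node is carried up unchanged
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]
```

For example, passing three fair Bernoulli variables, `[[0.5, 0.5]] * 3`, yields the binomial distribution over how many of them are present.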
|
43
|
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, provided that enough data are available to train and subsequently evaluate an algorithm. Since MS-based proteomics has no shortage of complex problems, and since public data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present here an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
- Department of Medical Protein Research, VIB, Ghent, Belgium; Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium; Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium
|
44
|
Drift time-specific collision energies enable deep-coverage data-independent acquisition proteomics. Nat Methods 2013; 11:167-70. [DOI: 10.1038/nmeth.2767] [Citation(s) in RCA: 324] [Impact Index Per Article: 29.5] [Received: 02/01/2013] [Accepted: 11/05/2013] [Indexed: 12/30/2022]
|
45
|
Yang C, He Z, Yu W. A combinatorial perspective of the protein inference problem. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1542-1547. [PMID: 24407311 DOI: 10.1109/tcbb.2013.110] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we take a combinatorial perspective on the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. Our method achieves comparable results with ProteinProphet in a more efficient manner in experiments on two data sets of standard protein mixtures and two data sets of real samples. Based on our model, we study the impact of unique peptides and degenerate peptides (degenerate peptides are peptides shared by at least two proteins) on protein probabilities. We also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document and experimental results are available at: http://bioinformatics.ust.hk/proteininfer.
Affiliation(s)
- Chao Yang
- The Hong Kong University of Science and Technology, Hong Kong
| | | | - Weichuan Yu
- The Hong Kong University of Science and Technology, Hong Kong
|
46
|
Serang O, Cansizoglu AE, Käll L, Steen H, Steen JA. Nonparametric Bayesian evaluation of differential protein quantification. J Proteome Res 2013; 12:4556-65. [PMID: 24024742 DOI: 10.1021/pr400678m] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 01/19/2023]
Abstract
Arbitrary cutoffs are ubiquitous in quantitative computational proteomics: maximum acceptable MS/MS PSM or peptide q value, minimum ion intensity to calculate a fold change, the minimum number of peptides that must be available to trust the estimated protein fold change (or the minimum number of PSMs that must be available to trust the estimated peptide fold change), and the "significant" fold change cutoff. Here we introduce a novel experimental setup and nonparametric Bayesian algorithm for determining the statistical quality of a proposed differential set of proteins or peptides. By comparing putatively nonchanging case-control evidence to an empirical null distribution derived from a control-control experiment, we successfully avoid some of these common parameters. We then apply our method to evaluating different fold-change rules and find that for our data a 1.2-fold change is the most permissive of the plausible fold-change rules.
Affiliation(s)
- Oliver Serang
- Thermo Fisher Scientific Bremen, Hanna-Kunath-Straße 11, Bremen 28199, Germany
|
47
|
McDowell GS, Philpott A. Non-canonical ubiquitylation: mechanisms and consequences. Int J Biochem Cell Biol 2013; 45:1833-42. [PMID: 23732108 DOI: 10.1016/j.biocel.2013.05.026] [Citation(s) in RCA: 113] [Impact Index Per Article: 10.3] [Received: 02/28/2013] [Revised: 05/10/2013] [Accepted: 05/22/2013] [Indexed: 01/04/2023]
Abstract
Post-translational protein modifications initiate, regulate, propagate and terminate a wide variety of processes in cells, and in particular, ubiquitylation targets substrate proteins for degradation, subcellular translocation, cell signaling and multiple other cellular events. Modification of substrate proteins is widely observed to occur via covalent linkages of ubiquitin to the amine groups of lysine side-chains. However, in recent years several new modes of ubiquitin chain attachment have emerged. For instance, covalent modification of non-lysine sites in substrate proteins is theoretically possible according to basic chemical principles underlying the ubiquitylation process, and evidence is building that sites such as the N-terminal amine group of a protein, the hydroxyl group of serine and threonine residues and even the thiol groups of cysteine residues are all employed as sites of ubiquitylation. However, the potential importance of this "non-canonical ubiquitylation" of substrate proteins on sites other than lysine residues has been largely overlooked. This review aims to highlight the unusual features of the process of non-canonical ubiquitylation and the consequences of these events on the activity and fate of a protein.
Affiliation(s)
- Gary S McDowell
- Department of Oncology, University of Cambridge, Hutchison/Medical Research Council (MRC) Research Centre, Cambridge, UK
|
48
|
Serang O, Paulo J, Steen H, Steen JA. A non-parametric cutout index for robust evaluation of identified proteins. Mol Cell Proteomics 2013; 12:807-12. [PMID: 23292186 DOI: 10.1074/mcp.o112.022863] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/17/2023] Open
Abstract
This paper proposes a novel, automated method for evaluating sets of proteins identified using mass spectrometry. The remaining peptide-spectrum match score distributions of protein sets are compared to an empirical absent peptide-spectrum match score distribution, and a Bayesian non-parametric method reminiscent of the Dirichlet process is presented to accurately perform this comparison. Thus, for a given protein set, the process computes the likelihood that the proteins identified are correctly identified. First, the method is used to evaluate protein sets chosen using different protein-level false discovery rate (FDR) thresholds, assigning each protein set a likelihood. The protein set assigned the highest likelihood is used to choose a non-arbitrary protein-level FDR threshold. Because the method can be used to evaluate any protein identification strategy (and is not limited to mere comparisons of different FDR thresholds), we subsequently use the method to compare and evaluate multiple simple methods for merging peptide evidence over replicate experiments. The general statistical approach can be applied to other types of data (e.g. RNA sequencing) and generalizes to multivariate problems.
Affiliation(s)
- Oliver Serang
- Department of Neurobiology, Harvard Medical School, Boston, MA, USA
|
49
|
Serang O, Moruz L, Hoopmann MR, Käll L. Recognizing uncertainty increases robustness and reproducibility of mass spectrometry-based protein inferences. J Proteome Res 2012; 11:5586-91. [PMID: 23148905 DOI: 10.1021/pr300426s] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 01/18/2023]
Abstract
Parsimony and protein grouping are widely employed to enforce economy in the number of identified proteins, with the goal of increasing the quality and reliability of protein identifications; however, in a counterintuitive manner, parsimony and protein grouping may actually decrease the reproducibility and interpretability of protein identifications. We present a simple illustration demonstrating ways in which parsimony and protein grouping may lower the reproducibility or interpretability of results. We then provide an example of a data set where a probabilistic method increases the reproducibility and interpretability of identifications made on replicate analyses of Human Du145 prostate cancer cell lines.
Affiliation(s)
- Oliver Serang
- Department of Neurobiology, Harvard Medical School Children's Hospital Boston, Boston, Massachusetts, United States.
|
50
|
Abstract
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.
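The second task named in this abstract, mapping assigned peptides to proteins, is often approached through parsimony: find a small set of proteins that explains every observed peptide. The review frames this class of methods as integer programming; the sketch below swaps in a greedy set-cover heuristic purely for illustration, so it is not the formulation used in the review and is not guaranteed to find the smallest set. The function name and input layout are hypothetical.

```python
def greedy_parsimony(protein_to_peptides):
    """Pick proteins greedily until every observed peptide is explained.

    protein_to_peptides maps protein id -> set of identified peptides.
    """
    uncovered = set().union(*protein_to_peptides.values()) if protein_to_peptides else set()
    chosen = []
    while uncovered:
        # Take the protein explaining the most still-unexplained peptides;
        # iterating over sorted ids breaks ties alphabetically.
        best = max(sorted(protein_to_peptides),
                   key=lambda prot: len(protein_to_peptides[prot] & uncovered))
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen
```

With proteins A = {p1, p2, p3}, B = {p3, p4} and C = {p4}, the heuristic selects A and then B; the choice of B over C rests only on the tie-break, which illustrates why shared (degenerate) peptides make rule-based selections ambiguous.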
Affiliation(s)
- Yong Fuga Li
- School of Informatics and Computing, Indiana University, Bloomington, 150 S. Woodlawn Avenue, Bloomington, Indiana 47405, USA
|