1
Li H, Na S, Hwang KB, Paek E. TIDD: tool-independent and data-dependent machine learning for peptide identification. BMC Bioinformatics 2022; 23:109. [PMID: 35354356 PMCID: PMC8969291 DOI: 10.1186/s12859-022-04640-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 03/16/2022] [Indexed: 11/10/2022]
Abstract
Background In shotgun proteomics, database search engines have been developed to assign peptides to tandem mass (MS/MS) spectra, and post-processing (or rescoring) approaches applied to the search results have been proposed to increase the number of confident peptide identifications. The most popular post-processing approaches, such as Percolator and PeptideProphet, have improved peptide identification rates by combining multiple scores from database search engines using machine learning techniques. Existing post-processing approaches, however, are limited when dealing with results from new search engines because their machine learning features must be optimized specifically for each search engine. Results We propose a universal post-processing tool, called TIDD, which supports confident peptide identification regardless of the search engine adopted. TIDD can work with any search engine (including newly developed ones) because it calculates universal features that assess peptide-spectrum match (PSM) quality, while also allowing additional features provided by search engines (or users). Even though it relies on universal features independent of search tools, TIDD showed similar or better performance than Percolator in terms of peptide identification. TIDD identified 10.23–38.95% more PSMs than target-decoy estimation for MSFragger, which is not supported by Percolator. TIDD offers a simple, easy-to-use graphical user interface. Conclusions TIDD eliminates the need for optimal feature engineering for each database search tool and can thus be applied directly to any database search results, including those from newly developed tools. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04640-y.
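The abstract does not list TIDD's universal features. As a hypothetical illustration of an engine-independent PSM-quality feature of the kind it describes, the Python sketch below computes the fraction of theoretical singly charged b- and y-ions that are matched by observed peaks within a fragment tolerance. The feature itself, the function names and the 0.02 Da default tolerance are assumptions for illustration, not TIDD's actual implementation; modifications and higher fragment charge states are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_ions(peptide):
    """Return singly charged b- and y-ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment of length i
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragment of length i
    return b_ions, y_ions


def matched_ion_fraction(peptide, peak_mzs, tol=0.02):
    """Hypothetical PSM feature: share of theoretical b/y ions with a peak within +/- tol Da."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    if not theoretical:
        return 0.0
    matched = sum(
        1 for ion in theoretical if any(abs(ion - mz) <= tol for mz in peak_mzs)
    )
    return matched / len(theoretical)
```

For example, `matched_ion_fraction("PEPTIDEK", [147.1128, 227.1026, 500.0])` matches y1 and b2, i.e. 2 of 14 theoretical ions. Because a feature like this depends only on the peptide sequence and the spectrum, it can be computed the same way for results from any search engine.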
Affiliation(s)
- Honglan Li
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
- Seungjin Na
- Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
- Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea

2
Fahrner M, Kook L, Fröhlich K, Biniossek ML, Schilling O. A Systematic Evaluation of Semispecific Peptide Search Parameter Enables Identification of Previously Undescribed N-Terminal Peptides and Conserved Proteolytic Processing in Cancer Cell Lines. Proteomes 2021; 9:proteomes9020026. [PMID: 34070654 PMCID: PMC8162549 DOI: 10.3390/proteomes9020026] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/04/2021] [Revised: 05/21/2021] [Accepted: 05/22/2021] [Indexed: 01/07/2023]
Abstract
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) has become the most commonly used technique in explorative proteomic research. A variety of open-source tools for peptide-spectrum matching have become available. Most analyses of explorative MS data are performed using conventional settings, such as fully specific enzymatic constraints. Here we evaluated the impact of the fragment mass tolerance, in combination with the enzymatic constraints, on the performance of three open-source search engines (Myrimatch, X! Tandem, and MSGF+). We assessed their suitability for semi- and unspecific searches as well as the importance of accurate fragment mass spectra in non-specific peptide searches. We then performed a semispecific reanalysis of the published NCI-60 deep proteome data, applying the most suitable parameters. Semi- and unspecific LC-MS/MS data analyses particularly benefit from accurate fragment mass spectra, while this effect is less pronounced for conventional, fully specific peptide-spectrum matching. Search speed differed notably among the three search engines for semi- and non-specific peptide-spectrum matching. Semispecific reanalysis of the NCI-60 proteome data revealed hundreds of previously undescribed N-terminal peptides, including cases of proteolytic processing or likely alternative translation start sites, some of which were ubiquitously present in all cell lines of the reanalyzed panel. Highly accurate MS2 fragment data, in combination with modern open-source search algorithms, enable the confident identification of semispecific peptides from large proteomic datasets. The identification of previously undescribed N-terminal peptides in published studies highlights the potential of future reanalysis and data mining of proteomic datasets.
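Whether a peptide counts as fully specific, semispecific or unspecific depends on how many of its termini agree with the protease's cleavage rule. The Python sketch below classifies a peptide against a protein sequence using a simplified trypsin rule (cleave after K or R, but not before P); that rule is an assumption, since the search engines evaluated here have their own configurable enzyme definitions. The sketch also ignores protein N-terminal methionine removal and considers only the first occurrence of the peptide.

```python
def _n_term_specific(protein, start):
    """N-terminus is tryptic if the peptide starts the protein or follows K/R (and does not start with P)."""
    if start == 0:
        return True
    return protein[start - 1] in "KR" and protein[start] != "P"


def _c_term_specific(protein, end):
    """C-terminus is tryptic if the peptide ends the protein or ends in K/R not followed by P."""
    if end == len(protein):
        return True
    return protein[end - 1] in "KR" and protein[end] != "P"


def tryptic_specificity(peptide, protein):
    """Classify the first occurrence of a peptide as 'specific', 'semi' or 'nonspecific'."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    n_ok = _n_term_specific(protein, start)
    c_ok = _c_term_specific(protein, end)
    if n_ok and c_ok:
        return "specific"
    if n_ok or c_ok:
        return "semi"  # exactly one terminus follows the cleavage rule
    return "nonspecific"
```

For example, `tryptic_specificity("AYIAK", "MKTAYIAKQRQ")` returns `"semi"`: the C-terminus is tryptic but the N-terminus is not, which is the kind of pattern a semispecific search can pick up for N-terminal peptides arising from proteolytic processing.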
Affiliation(s)
- Matthias Fahrner
- Institute for Surgical Pathology, Medical Center–University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
- Faculty of Biology, Albert-Ludwigs-University Freiburg, 79104 Freiburg, Germany
- Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, 79104 Freiburg, Germany
- Lucas Kook
- Epidemiology, Biostatistics & Prevention Institute, University of Zurich, 8001 Zurich, Switzerland
- Institute for Data Analysis and Process Design, Zurich University of Applied Sciences, 8401 Winterthur, Switzerland
- Klemens Fröhlich
- Institute for Surgical Pathology, Medical Center–University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
- Faculty of Biology, Albert-Ludwigs-University Freiburg, 79104 Freiburg, Germany
- Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, 79104 Freiburg, Germany
- Martin L. Biniossek
- Institute for Molecular Medicine and Cell Research, University of Freiburg, 79104 Freiburg, Germany
- Oliver Schilling
- Institute for Surgical Pathology, Medical Center–University of Freiburg, Faculty of Medicine, University of Freiburg, 79106 Freiburg, Germany
- Faculty of Biology, Albert-Ludwigs-University Freiburg, 79104 Freiburg, Germany
- German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- BIOSS Centre for Biological Signaling Studies, University of Freiburg, 79104 Freiburg, Germany
- Correspondence: Tel.: +49-761-270-80610

3
Li T, Chen L, Gan M. Quality control of imbalanced mass spectra from isotopic labeling experiments. BMC Bioinformatics 2019; 20:549. [PMID: 31694522 PMCID: PMC6833298 DOI: 10.1186/s12859-019-3170-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/01/2019] [Accepted: 10/22/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Mass spectra in isotope-labeling proteomics experiments are usually acquired by liquid chromatography-mass spectrometry (LC-MS). In such experiments, the mass profiles of labeled (heavy) and unlabeled (light) peptide pairs are represented by isotope clusters (2D or 3D) that provide valuable information about the studied biological samples under different conditions. The core task of quality control in quantitative LC-MS experiments is to filter out low-quality peptides with questionable profiles. Classification approaches are commonly used for this problem; however, data imbalance is often ignored or mishandled in previous quality control methods. In this study, we introduce a quality control framework based on extreme gradient boosting (XGBoost) that carefully addresses the imbalanced data problem. RESULTS In the XGBoost-based framework, we apply the synthetic minority over-sampling technique (SMOTE) to rebalance the data and use the balanced data to train boosted trees as the classifier. The classifier is then applied to other data for peptide quality assessment. Experimental results show that the proposed framework significantly increases the reliability of peptide heavy-light ratio estimation. CONCLUSIONS Our results indicate that this framework is a powerful method for peptide quality assessment. For feature extraction, extracted ion chromatogram (XIC)-based features contribute to peptide quality assessment. For the imbalanced data problem, SMOTE yields much better classification performance. Finally, XGBoost is well suited to peptide quality control. Overall, the proposed framework provides reliable results for further proteomics studies.
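SMOTE rebalances a training set by creating synthetic minority-class samples on the line segment between a minority sample and one of its nearest minority-class neighbours. The pure-Python sketch below shows only that oversampling step; the feature vectors, `k` and the seeding are illustrative assumptions, and neither the XGBoost classifier nor the XIC-based features from the paper are reproduced. In practice a maintained implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would normally be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic
```

The synthetic samples are appended to the minority class before the classifier is trained. Oversampling should be applied to the training split only, so that synthetic points derived from test samples do not leak into the evaluation.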
Affiliation(s)
- Tianjun Li
- Department of Computer and Information Science, University of Macau, Taipa, Macau, China
- Long Chen
- Department of Computer and Information Science, University of Macau, Taipa, Macau, China
- Min Gan
- College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian, China

4
Halloran JT, Zhang H, Kara K, Renggli C, The M, Zhang C, Rocke DM, Käll L, Noble WS. Speeding Up Percolator. J Proteome Res 2019; 18:3353-3359. [PMID: 31407580 PMCID: PMC6884961 DOI: 10.1021/acs.jproteome.9b00288] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]
Abstract
The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.
Affiliation(s)
- John T. Halloran
- Department of Public Health Sciences, University of California, Davis, Davis, CA, USA
- Hantian Zhang
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Kaan Kara
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Cédric Renggli
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Matthew The
- Science for Life Laboratory, KTH Royal Institute of Technology, Solna, Sweden
- Ce Zhang
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- David M. Rocke
- Department of Public Health Sciences, University of California, Davis, Davis, CA, USA
- Lukas Käll
- Science for Life Laboratory, KTH Royal Institute of Technology, Solna, Sweden
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA

5
Li H, Joh YS, Kim H, Paek E, Lee SW, Hwang KB. Evaluating the effect of database inflation in proteogenomic search on sensitive and reliable peptide identification. BMC Genomics 2016; 17:1031. [PMID: 28155652 PMCID: PMC5259817 DOI: 10.1186/s12864-016-3327-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 11/16/2022]
Abstract
Background Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such database inflation makes it possible to identify novel peptides. However, it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may result in an underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification due to the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have been scarce. Results To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability of proteogenomic search. However, no single method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches. Conclusions We propose using a set of search result validation methods with separate filtering for sensitive and reliable identification of peptides in proteogenomic search.
Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3327-5) contains supplementary material, which is available to authorized users.
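Separate filtering means that the target-decoy FDR threshold is applied independently to known and novel peptides instead of to the pooled result list, so that a large, well-scoring set of known peptides cannot mask a poorly supported novel set. A minimal Python sketch, assuming each PSM is already labelled with its score, its target/decoy status and a hypothetical group label (`"known"` or `"novel"`), and assuming the simple #decoys/#targets FDR estimator (other estimators exist):

```python
from collections import defaultdict


def accept_at_fdr(psms, fdr=0.01):
    """Return the target PSMs accepted at the given target-decoy FDR.

    psms: list of (score, is_decoy) pairs; higher score is better.
    The FDR at a cutoff is estimated as #decoys / #targets at or above it.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_index = -1  # deepest rank whose estimated FDR is within the threshold
    for index, (_, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            best_index = index
    return [p for p in ranked[: best_index + 1] if not p[1]]


def separate_filtering(psms, fdr=0.01):
    """Apply the FDR filter independently within each peptide group.

    psms: list of (score, is_decoy, group) triples, e.g. group 'known' or 'novel'.
    """
    by_group = defaultdict(list)
    for score, is_decoy, group in psms:
        by_group[group].append((score, is_decoy))
    return {group: accept_at_fdr(items, fdr) for group, items in by_group.items()}
```

With pooled filtering, all groups would share a single score cutoff; filtering per group gives each group its own cutoff at the same nominal FDR.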
Affiliation(s)
- Honglan Li
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea
- Yoon Sung Joh
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
- Hyunwoo Kim
- Scientific Data Research Center, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea
- Eunok Paek
- Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
- Sang-Won Lee
- Department of Chemistry, Research Institute for Natural Sciences, Korea University, Seoul, 02841, Republic of Korea
- Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul, 06978, Republic of Korea

6
Seyfried NT, Dammer EB, Swarup V, Nandakumar D, Duong DM, Yin L, Deng Q, Nguyen T, Hales CM, Wingo T, Glass J, Gearing M, Thambisetty M, Troncoso JC, Geschwind DH, Lah JJ, Levey AI. A Multi-network Approach Identifies Protein-Specific Co-expression in Asymptomatic and Symptomatic Alzheimer's Disease. Cell Syst 2016; 4:60-72.e4. [PMID: 27989508 DOI: 10.1016/j.cels.2016.11.006] [Citation(s) in RCA: 323] [Impact Index Per Article: 35.9] [Received: 01/31/2016] [Revised: 09/23/2016] [Accepted: 11/09/2016] [Indexed: 01/07/2023]
Abstract
Here, we report proteomic analyses of 129 human cortical tissues to define changes associated with the asymptomatic and symptomatic stages of Alzheimer's disease (AD). Network analysis revealed 16 modules of co-expressed proteins, 10 of which correlated with AD phenotypes. A subset of modules overlapped with RNA co-expression networks, including those associated with neurons and astroglial cell types, showing altered expression in AD, even in the asymptomatic stages. Overlap of RNA and protein networks was otherwise modest, with many modules specific to the proteome, including those linked to microtubule function and inflammation. Proteomic modules were validated in an independent cohort, demonstrating some module expression changes unique to AD and several observed in other neurodegenerative diseases. AD genetic risk loci were concentrated in glial-related modules in the proteome and transcriptome, consistent with their causal role in AD. This multi-network analysis reveals protein- and disease-specific pathways involved in the etiology, initiation, and progression of AD.
Affiliation(s)
- Nicholas T Seyfried
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Eric B Dammer
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Vivek Swarup
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- Divya Nandakumar
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Duc M Duong
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Luming Yin
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Qiudong Deng
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Tram Nguyen
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Chadwick M Hales
- Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Thomas Wingo
- Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Division of Neurology, Atlanta VA Medical Center, Decatur, GA 30033, USA
- Jonathan Glass
- Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Marla Gearing
- Department of Experimental Pathology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Madhav Thambisetty
- Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90095, USA
- James J Lah
- Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA
- Allan I Levey
- Department of Neurology, Emory University School of Medicine, Atlanta, GA 30322, USA

7
Jian L, Xia Z, Niu X, Liang X, Samir P, Link AJ. l2 Multiple Kernel Fuzzy SVM-Based Data Fusion for Improving Peptide Identification. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:804-809. [PMID: 26394437 DOI: 10.1109/tcbb.2015.2480084] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
SEQUEST is a database search engine that calculates the correlation score between an observed spectrum and theoretical spectra deduced from protein sequences stored in a flat text file, rather than in a relational or object-oriented repository. However, the SEQUEST score functions fail to discriminate accurately between true and false peptide-spectrum matches (PSMs). Some approaches, such as PeptideProphet and Percolator, have been proposed to address the task of distinguishing true from false PSMs. However, most of these methods employ time-consuming learning algorithms to validate peptide assignments [1]. In this paper, we propose a fast algorithm for validating peptide identifications by incorporating heterogeneous information from SEQUEST scores and knowledge of peptide digestion. To automate the peptide identification process and incorporate this additional information, we employ l2 multiple kernel learning (MKL). Results on experimental datasets indicate that, compared with state-of-the-art methods (PeptideProphet and Percolator), our data fusion strategy achieves comparable performance while significantly reducing the running time.
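In multiple kernel learning, several base kernels are combined into one kernel, K = Σ_m β_m K_m, and in l2-norm MKL the non-negative weights β are constrained by ||β||₂ ≤ 1. The Python sketch below only builds such a combined Gram matrix for given weights, rescaled to unit l2 norm; the choice of a linear and an RBF base kernel, `gamma`, and the weights are assumptions, and neither the weight optimisation nor the fuzzy SVM described in the paper is implemented.

```python
import math


def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); gamma is an assumed default."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_gram(samples, kernels, weights):
    """Gram matrix of a weighted kernel sum with weights rescaled to unit l2 norm.

    Learning the weights themselves (and training a classifier on the result)
    is not shown; the weights are taken as given.
    """
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0:
        raise ValueError("at least one kernel weight must be positive")
    betas = [w / norm for w in weights]
    n = len(samples)
    return [
        [sum(b * k(samples[i], samples[j]) for b, k in zip(betas, kernels)) for j in range(n)]
        for i in range(n)
    ]
```

In this setting, one base kernel could be computed on SEQUEST scores and another on digestion-related features; a weighted sum of valid kernels with non-negative weights is itself a valid kernel, which is what allows such heterogeneous features to be fused.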

8
Collins MO, Wright JC, Jones M, Rayner JC, Choudhary JS. Confident and sensitive phosphoproteomics using combinations of collision induced dissociation and electron transfer dissociation. J Proteomics 2014; 103:1-14. [PMID: 24657495 PMCID: PMC4047622 DOI: 10.1016/j.jprot.2014.03.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 01/08/2014] [Revised: 02/26/2014] [Accepted: 03/09/2014] [Indexed: 01/28/2023]
Abstract
We present a workflow using an ETD-optimised version of Mascot Percolator and a modified version of SLoMo (turbo-SLoMo) for analysis of phosphoproteomic data. We have benchmarked this against several database searching algorithms and phosphorylation site localisation tools and show that it offers highly sensitive and confident phosphopeptide identification and site assignment with PSM-level statistics, enabling rigorous comparison of data acquisition methods. We analysed the Plasmodium falciparum schizont phosphoproteome using, for the first time, a data-dependent neutral loss-triggered ETD (DDNL) strategy and a conventional decision-tree method. At a posterior error probability threshold of 0.01, similar numbers of PSMs were identified using both methods, with a 73% overlap in phosphopeptide identifications. The false discovery rate associated with spectral pairs where DDNL CID/ETD identified the same phosphopeptide was < 1%. Without any score filtering, 72% of phosphorylation site assignments made using turbo-SLoMo were identical, and 99.8% of these cases were associated with a false localisation rate of < 5%. We show that DDNL acquisition is a useful approach for phosphoproteomics and results in increased confidence in phosphopeptide identification without compromising sensitivity or duty cycle. Furthermore, the combination of Mascot Percolator and turbo-SLoMo represents a robust workflow for phosphoproteomic data analysis using CID and ETD fragmentation. Biological significance Protein phosphorylation is a ubiquitous post-translational modification that regulates protein function. Mass spectrometry-based approaches have revolutionised its analysis on a large scale, but phosphorylation sites are often identified by single phosphopeptides and therefore require more rigorous data analysis to ensure that sites are identified with high confidence for follow-up experiments investigating their biological significance.
The coverage and confidence of phosphoproteomic experiments can be enhanced by the use of multiple complementary fragmentation methods. Here we have benchmarked a data analysis pipeline for phosphoproteomic data generated using CID and ETD fragmentation and used it to demonstrate the utility of a data-dependent neutral loss triggered ETD fragmentation strategy for high-confidence phosphopeptide identification and phosphorylation site localisation.
Highlights
- We report and benchmark a data analysis pipeline for phosphoproteomic data analysis.
- Mascot Percolator and turbo-SLoMo are used in combination to compare CID and ETD fragmentation for phosphorylation site identification.
- We demonstrate the utility of data-dependent neutral loss triggered ETD fragmentation.
- ETD/CID spectral pairs provide high confidence in phosphoproteomic analysis.
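A posterior error probability (PEP) is the probability that an individual PSM is incorrect, so the expected number of incorrect PSMs in an accepted set is the sum of their PEPs, and the FDR of that set can be estimated as their mean. The Python sketch below applies a PEP threshold such as the 0.01 used here and reports that estimate. It assumes the PEPs have already been computed (for example by Mascot Percolator); the `(psm_id, pep)` input format is hypothetical.

```python
def filter_by_pep(psms, max_pep=0.01):
    """Keep PSMs whose posterior error probability is at most max_pep.

    psms: list of (psm_id, pep) pairs. Returns the accepted PSMs and the
    estimated FDR of that set, taken as the mean PEP of the accepted PSMs.
    """
    accepted = [(psm_id, pep) for psm_id, pep in psms if pep <= max_pep]
    if not accepted:
        return [], 0.0
    # Expected number of false PSMs = sum of PEPs; divide by set size for FDR.
    est_fdr = sum(pep for _, pep in accepted) / len(accepted)
    return accepted, est_fdr
```

Because every accepted PSM has a PEP of at most 0.01, the estimated FDR of the accepted set is necessarily at most 0.01, and usually well below it.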
Affiliation(s)
- Mark O Collins
- Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- James C Wright
- Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Matthew Jones
- Malaria Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Julian C Rayner
- Malaria Programme, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Jyoti S Choudhary
- Proteomic Mass Spectrometry, The Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK

9
Ivanov MV, Levitsky LI, Lobas AA, Panic T, Laskay ÜA, Mitulovic G, Schmid R, Pridatchenko ML, Tsybin YO, Gorshkov MV. Empirical Multidimensional Space for Scoring Peptide Spectrum Matches in Shotgun Proteomics. J Proteome Res 2014; 13:1911-20. [DOI: 10.1021/pr401026y] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 01/08/2023]
Affiliation(s)
- Mark V. Ivanov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Institutskii per., 9, Dolgoprudny 141700, Moscow region, Russia
- Lev I. Levitsky
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Institutskii per., 9, Dolgoprudny 141700, Moscow region, Russia
- Anna A. Lobas
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Institutskii per., 9, Dolgoprudny 141700, Moscow region, Russia
- Tanja Panic
- Medical University of Vienna, Spitalgasse 23, Vienna 1090, Austria
- Ünige A. Laskay
- Biomolecular Mass Spectrometry Laboratory, Ecole Polytechnique Fédérale de Lausanne, 2 av. Forel, Lausanne 1015, Switzerland
- Goran Mitulovic
- Medical University of Vienna, Spitalgasse 23, Vienna 1090, Austria
- Rainer Schmid
- Medical University of Vienna, Spitalgasse 23, Vienna 1090, Austria
- Marina L. Pridatchenko
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Yury O. Tsybin
- Biomolecular Mass Spectrometry Laboratory, Ecole Polytechnique Fédérale de Lausanne, 2 av. Forel, Lausanne 1015, Switzerland
- Mikhail V. Gorshkov
- Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Moscow Institute of Physics and Technology (State University), Institutskii per., 9, Dolgoprudny 141700, Moscow region, Russia

10
Perez-Riverol Y, Wang R, Hermjakob H, Müller M, Vesada V, Vizcaíno JA. Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective. Biochim Biophys Acta 2014; 1844:63-76. [PMID: 23467006 PMCID: PMC3898926 DOI: 10.1016/j.bbapap.2013.02.032] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Received: 10/01/2012] [Revised: 02/05/2013] [Accepted: 02/22/2013] [Indexed: 12/23/2022]
Abstract
Data processing, management and visualization are central and critical components of a state-of-the-art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often among the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. Although the objectives of these libraries and packages, which range from database search analysis to post-processing of identification results, can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of output files from various proteomics search engines, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries and open-source frameworks, and we also give information on some of the freely available applications that make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
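Parsing spectrum and result files is one of the shared use cases the review lists. As a hypothetical minimal example, the Python sketch below reads Mascot Generic Format (MGF) peak lists, a common plain-text spectrum format, into dictionaries. It handles only the basic `BEGIN IONS`/`END IONS` block structure, `KEY=value` header lines and `m/z intensity` peak lines, and it assumes comment lines start with `#`, `;`, `!` or `/`; a maintained library (for example the `pyteomics.mgf` module) covers much more of the format.

```python
def parse_mgf(text):
    """Parse Mascot Generic Format (MGF) text into a list of spectrum dicts.

    Each dict holds header fields (e.g. TITLE, PEPMASS, CHARGE) as strings
    under 'params' and the fragment peaks as (m/z, intensity) float pairs.
    """
    spectra = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";", "!", "/")):
            continue  # blank line or (assumed) comment line
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # global parameters outside a spectrum block are ignored here
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)
            current["params"][key.upper()] = value
        else:
            fields = line.split()  # m/z, optional intensity, optional charge
            mz = float(fields[0])
            intensity = float(fields[1]) if len(fields) > 1 else 0.0
            current["peaks"].append((mz, intensity))
    return spectra
```

Header values are kept as strings (e.g. `CHARGE` as `"2+"`) so that callers can decide how to interpret them.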
Affiliation(s)
- Yasset Perez-Riverol
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba
- Rui Wang
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Henning Hermjakob
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Markus Müller
- Proteome Informatics Group, Swiss Institute of Bioinformatics, CMU - 1, rue Michel Servet CH-1211 Geneva, Switzerland
- Vladimir Vesada
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ciudad de la Habana, Cuba
- Juan Antonio Vizcaíno
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

11
Granholm V, Kim S, Navarro JCF, Sjölund E, Smith RD, Käll L. Fast and accurate database searches with MS-GF+Percolator. J Proteome Res 2013; 13:890-7. [PMID: 24344789 DOI: 10.1021/pr400937n] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Indexed: 01/14/2023]
Abstract
One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.
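Percolator reports q-values for PSMs, peptides and proteins. The Python sketch below is not Percolator's implementation; it is a generic target-decoy illustration of what a PSM-level q-value means: the estimated FDR at each score cutoff (#decoys / #targets at or above it), made monotone by taking, for each PSM, the minimum estimated FDR over all cutoffs that still include it. The `(score, is_decoy)` input format is an assumption.

```python
def target_decoy_qvalues(psms):
    """Compute simple target-decoy q-values for PSMs (not Percolator's own estimator).

    psms: list of (score, is_decoy) pairs; higher score is better.
    Returns (score, is_decoy, q_value) triples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this cutoff
    # Walk from the lowest score upwards so each q-value is a running minimum.
    qvalues = [0.0] * len(ranked)
    running_min = float("inf")
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]
```

Accepting all target PSMs with q ≤ 0.01 then yields the longest top-ranked list whose estimated FDR is at most 1%.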
Affiliation(s)
- Viktor Granholm
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Solna, Sweden

12
Xu M, Li Z, Li L. Combining Percolator with X!Tandem for Accurate and Sensitive Peptide Identification. J Proteome Res 2013; 12:3026-33. [DOI: 10.1021/pr4001256] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Affiliation(s)
- Mingguo Xu
- Department of Chemistry, University of Alberta, Edmonton, Alberta T6G 2G2, Canada
- Zhendong Li
- Department of Chemistry, University of Alberta, Edmonton, Alberta T6G 2G2, Canada
- Liang Li
- Department of Chemistry, University of Alberta, Edmonton, Alberta T6G 2G2, Canada