1
Zhou WJ, Yang H, Zeng WF, Zhang K, Chi H, He SM. pValid: Validation Beyond the Target-Decoy Approach for Peptide Identification in Shotgun Proteomics. J Proteome Res 2019; 18:2747-2758. [PMID: 31244209 DOI: 10.1021/acs.jproteome.8b00993] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/24/2023]
Abstract
As the de facto validation method in mass spectrometry-based proteomics, the target-decoy approach determines a threshold to estimate the false discovery rate and then filters those identifications beyond the threshold. However, the incorrect identifications within the threshold are still unknown and further validation methods are needed. In this study, we characterized a framework of validation and investigated a number of common and novel validation methods. We first defined the accuracy of a validation method by its false-positive rate (FPR) and false-negative rate (FNR) and, further, proved that a validation method with lower FPR and FNR led to identifications with higher sensitivity and precision. Then we proposed a validation method named pValid that incorporated an open database search and a theoretical spectrum prediction strategy via a machine-learning technology. pValid was compared with four common validation methods as well as a synthetic peptide validation method. Tests on three benchmark data sets indicated that pValid had an FPR of 0.03% and an FNR of 1.79% on average, both superior to the other four common validation methods. Tests on a synthetic peptide data set also indicated that the FPR and FNR of pValid were better than those of the synthetic peptide validation method. Tests on a large-scale human proteome data set indicated that pValid successfully flagged the highest number of incorrect identifications among all five methods. Further considering its cost-effectiveness, pValid has the potential to be a feasible validation tool for peptide identification.
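The two quantities the abstract relies on, a target-decoy score threshold and the FPR/FNR of a validation method, can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption: FDR is estimated as decoys divided by targets above the threshold, "positive" is taken to mean "accepted as correct by the validator", and the names `ValidationCounts` and `target_decoy_threshold` are hypothetical rather than part of pValid.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """Confusion counts for a validation method (hypothetical container)."""
    tp: int  # correct identifications accepted by the validator
    fp: int  # incorrect identifications accepted by the validator
    tn: int  # incorrect identifications flagged by the validator
    fn: int  # correct identifications flagged by the validator


def false_positive_rate(c: ValidationCounts) -> float:
    # Assumed definition: share of incorrect identifications that slip through.
    return c.fp / (c.fp + c.tn)


def false_negative_rate(c: ValidationCounts) -> float:
    # Assumed definition: share of correct identifications wrongly flagged.
    return c.fn / (c.fn + c.tp)


def target_decoy_threshold(psms, max_fdr=0.01):
    """Return the lowest score whose estimated FDR stays within max_fdr.

    psms is a list of (score, is_decoy) pairs. The FDR estimate used here is
    decoys / targets among all matches scoring at or above the candidate
    threshold; tied scores are not treated specially in this sketch.
    Returns None if no threshold satisfies the bound.
    """
    targets = decoys = 0
    threshold = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold
```

Identifications that pass such a threshold are the ones a validation method would then examine, which is where the FPR and FNR above become relevant.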
Affiliation(s)
- Wen-Jing Zhou
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190; University of Chinese Academy of Sciences, Beijing, China 100049
- Hao Yang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190; University of Chinese Academy of Sciences, Beijing, China 100049
- Wen-Feng Zeng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190; University of Chinese Academy of Sciences, Beijing, China 100049
- Kun Zhang
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190; University of Chinese Academy of Sciences, Beijing, China 100049
- Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190; University of Chinese Academy of Sciences, Beijing, China 100049
- Si-Min He
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China 100190; University of Chinese Academy of Sciences, Beijing, China 100049
2
Grover H, Wallstrom G, Wu CC, Gopalakrishnan V. Context-sensitive Markov models for peptide scoring and identification from tandem mass spectrometry. OMICS: A Journal of Integrative Biology 2013; 17:94-105. [PMID: 23289783 PMCID: PMC3567622 DOI: 10.1089/omi.2012.0073] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Peptide and protein identification via tandem mass spectrometry (MS/MS) lies at the heart of proteomic characterization of biological samples. Several algorithms are able to search, score, and assign peptides to large MS/MS datasets. Most popular methods, however, underutilize the intensity information available in the tandem mass spectrum due to the complex nature of the peptide fragmentation process, thus contributing to loss of potential identifications. We present a novel probabilistic scoring algorithm called Context-Sensitive Peptide Identification (CSPI) based on highly flexible Input-Output Hidden Markov Models (IO-HMM) that capture the influence of peptide physicochemical properties on their observed MS/MS spectra. We use several local and global properties of peptides and their fragment ions from the literature. Comparison with two popular algorithms, Crux (re-implementation of SEQUEST) and X!Tandem, on multiple datasets of varying complexity, shows that peptide identification scores from our models are able to achieve greater discrimination between true and false peptides, identifying up to ∼25% more peptides at a False Discovery Rate (FDR) of 1%. We evaluated two alternative normalization schemes for fragment ion intensities: a global rank-based scheme and a local window-based scheme. Our results indicate the importance of appropriate normalization methods for learning superior models. Further, combining our scores with Crux using a state-of-the-art procedure, Percolator, we demonstrate the utility of using scoring features from intensity-based models, yielding ∼4-8% additional identifications over Percolator at 1% FDR. IO-HMMs offer a scalable and flexible framework with several modeling choices to learn complex patterns embedded in MS/MS data.
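The abstract names two normalization schemes for fragment-ion intensities, a global rank-based one and a local window-based one, without giving their details. The Python sketch below shows one plausible reading of each, under stated assumptions: ranks are scaled to (0, 1] across the whole spectrum, and windows are fixed-width m/z bins in which each peak is divided by the bin maximum. The function names and the 100 m/z window width are hypothetical and are not taken from CSPI.

```python
def rank_normalize(intensities):
    """Global rank-based normalization (assumed form).

    Each intensity is replaced by rank / n, so the weakest peak maps to 1/n
    and the strongest to 1.0. Ties are broken by input order in this sketch.
    """
    n = len(intensities)
    order = sorted(range(n), key=lambda i: intensities[i])
    normalized = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        normalized[idx] = rank / n
    return normalized


def window_normalize(mzs, intensities, width=100.0):
    """Local window-based normalization (assumed form).

    Peaks are grouped into fixed-width m/z bins, and each intensity is
    divided by the largest intensity found in its own bin.
    """
    window_max = {}
    for mz, intensity in zip(mzs, intensities):
        key = int(mz // width)
        window_max[key] = max(window_max.get(key, 0.0), intensity)
    normalized = []
    for mz, intensity in zip(mzs, intensities):
        top = window_max[int(mz // width)]
        normalized.append(intensity / top if top > 0 else 0.0)
    return normalized
```

The rank-based form discards absolute intensity differences across the whole spectrum, while the window-based form preserves relative heights within each m/z region, which is one way the two schemes could lead to different learned models.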
Affiliation(s)
- Himanshu Grover
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Garrick Wallstrom
- Department of Biomedical Informatics, Arizona State University, Scottsdale, Arizona
- Christine C. Wu
- Department of Cell Biology and Physiology, University of Pittsburgh, Pittsburgh, Pennsylvania
- Vanathi Gopalakrishnan
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
3
Grover H, Gopalakrishnan V. Efficient Processing of Models for Large-scale Shotgun Proteomics Data. International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom) 2012; 2012:591-596. [PMID: 25309967 DOI: 10.4108/icst.collaboratecom.2012.250716] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/22/2023]
Abstract
Mass-spectrometry (MS) based proteomics has become a key enabling technology for the systems approach to biology, providing insights into the protein complement of an organism. Bioinformatics analyses play a critical role in interpretation of large, and often replicated, MS datasets generated across laboratories and institutions. A significant amount of computational effort in the workflow is spent on the identification of protein and peptide components of complex biological samples, and consists of a series of steps relying on large database searches and intricate scoring algorithms. In this work, we share our efforts and experience in efficient handling of these large MS datasets through database indexing and parallelization based on multiprocessor architectures. We also identify important challenges and opportunities that are relevant specifically to the task of peptide and protein identification, and more generally to similar multi-step problems that are inherently parallelizable.
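The combination of database indexing and multiprocessor parallelization described above can be sketched in Python as follows. This is an illustrative assumption about what such a pipeline might look like, not the authors' implementation: the index is a list of peptide masses sorted for binary search, candidate lookup is a mass-tolerance window, and the function names, tolerance, and worker count are all hypothetical.

```python
import bisect
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def build_mass_index(peptides):
    """Sort (sequence, mass) pairs by mass and return parallel lists."""
    ordered = sorted(peptides, key=lambda p: p[1])
    masses = [mass for _, mass in ordered]
    sequences = [seq for seq, _ in ordered]
    return masses, sequences


def candidates(precursor_mass, index, tol=0.5):
    """Return sequences whose mass lies within tol of the precursor mass.

    Binary search on the sorted mass list avoids scanning the full database.
    """
    masses, sequences = index
    lo = bisect.bisect_left(masses, precursor_mass - tol)
    hi = bisect.bisect_right(masses, precursor_mass + tol)
    return sequences[lo:hi]


def search_all(precursor_masses, index, tol=0.5, workers=4):
    """Look up candidates for many spectra in parallel worker processes.

    Each spectrum is independent, so the lookups can be distributed freely;
    results come back in the same order as the input masses.
    """
    lookup = partial(candidates, index=index, tol=tol)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lookup, precursor_masses, chunksize=64))
```

On platforms that start worker processes by spawning a fresh interpreter, `search_all` should be called from under an `if __name__ == "__main__":` guard. In a real pipeline the per-spectrum work would also include scoring each candidate, which is typically where most of the parallel speedup would come from.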
Affiliation(s)
- Himanshu Grover
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206-3701 USA
- Vanathi Gopalakrishnan
- Department of Biomedical Informatics, with joint appointments in the Intelligent Systems Program and the Computational & Systems Biology Department, University of Pittsburgh, Pittsburgh, PA 15206-3701 USA. She is also the corresponding author (phone: 412-624-3290; fax: 412-624-5310)