1
Freestone J, Käll L, Noble WS, Keich U. How to Train a Postprocessor for Tandem Mass Spectrometry Proteomics Database Search While Maintaining Control of the False Discovery Rate. J Proteome Res 2025; 24:2266-2279. [PMID: 40163043 DOI: 10.1021/acs.jproteome.4c00742] [Indexed: 04/02/2025]
Abstract
Decoy-based methods are a popular choice for the statistical validation of peptide detection in tandem mass spectrometry and proteomics data. Such methods can achieve a substantial boost in statistical power when coupled with postprocessors such as Percolator that use auxiliary features to learn a better-discriminating scoring function. However, we recently showed that Percolator can struggle to control the false discovery rate (FDR) when reporting the list of discovered peptides. To address this problem, we introduce Percolator-RESET, which is an adaptation of our recently developed RESET meta-procedure to the peptide detection problem. Specifically, Percolator-RESET fuses Percolator's iterative SVM training procedure with RESET's general framework to provide valid false discovery rate control. Percolator-RESET operates in both a standard single-decoy mode and a two-decoy mode, with the latter requiring the generation of two decoys per target. We demonstrate that Percolator-RESET controls the FDR in both modes, both theoretically and empirically, while typically reporting only a marginally smaller number of discoveries than Percolator in the single-decoy mode. The two-decoy mode is marginally more powerful than both Percolator and the single-decoy mode and exhibits less variability than the latter.
Affiliation(s)
- Jack Freestone
  - School of Mathematics and Statistics F07, University of Sydney, New South Wales 2006, Australia
- Lukas Käll
  - Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm SE-100 44, Sweden
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
  - School of Mathematics and Statistics F07, University of Sydney, New South Wales 2006, Australia
2
Freestone J, Noble WS, Keich U. Analysis of Tandem Mass Spectrometry Data with CONGA: Combining Open and Narrow Searches with Group-Wise Analysis. J Proteome Res 2024; 23:1894-1906. [PMID: 38652578 DOI: 10.1021/acs.jproteome.3c00399] [Indexed: 04/25/2024]
Abstract
Searching tandem mass spectrometry proteomics data against a database is a well-established method for assigning peptide sequences to observed spectra but typically cannot identify peptides harboring unexpected post-translational modifications (PTMs). Open modification searching aims to address this problem by allowing a spectrum to match a peptide even if the spectrum's precursor mass differs from the peptide mass. However, expanding the search space in this way can lead to a loss of statistical power to detect peptides. We therefore developed a method, called CONGA (combining open and narrow searches with group-wise analysis), that takes into account results from both types of searches, a traditional "narrow window" search and an open modification search, while carrying out rigorous false discovery rate control. The result is an algorithm that provides the best of both worlds: the ability to detect unexpected PTMs without a concomitant loss of power to detect unmodified peptides.
Affiliation(s)
- Jack Freestone
  - School of Mathematics and Statistics F07, University of Sydney, NSW 2006, Australia
- William S Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
  - School of Mathematics and Statistics F07, University of Sydney, NSW 2006, Australia
3
Kertesz-Farkas A, Nii Adoquaye Acquaye FL, Bhimani K, Eng JK, Fondrie WE, Grant C, Hoopmann MR, Lin A, Lu YY, Moritz RL, MacCoss MJ, Noble WS. The Crux Toolkit for Analysis of Bottom-Up Tandem Mass Spectrometry Proteomics Data. J Proteome Res 2023; 22:561-569. [PMID: 36598107 DOI: 10.1021/acs.jproteome.2c00615] [Indexed: 01/05/2023]
Abstract
The Crux tandem mass spectrometry data analysis toolkit provides a collection of algorithms for analyzing bottom-up proteomics tandem mass spectrometry data. Many publications have described various individual components of Crux, but a comprehensive summary has not been published since 2014. The goal of this work is to summarize the functionality of Crux, focusing on developments since 2014. We begin with empirical results demonstrating our recently implemented speedups to the Tide search engine. Other new features include a new score function in Tide, two new confidence estimation procedures, as well as three new tools: Param-medic for estimating search parameters directly from mass spectrometry data, Kojak for searching cross-linked mass spectra, and DIAmeter for searching data independent acquisition data against a sequence database.
Affiliation(s)
- Attila Kertesz-Farkas
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
- Frank Lawrence Nii Adoquaye Acquaye
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
- Kishankumar Bhimani
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
- Jimmy K Eng
  - Proteomics Resource, University of Washington, 850 Republican Street, Seattle, Washington 98109-4725, United States
- William E Fondrie
  - Talus Bioscience, 550 17th Avenue, Seattle, Washington 98122, United States
- Charles Grant
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- Michael R Hoopmann
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, Washington 98109, United States
- Andy Lin
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- Yang Y Lu
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- Robert L Moritz
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, Washington 98109, United States
- Michael J MacCoss
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, 185 E Stevens Way NE, Seattle, Washington 98195-2350, United States