1
Freestone J, Noble WS, Keich U. Analysis of Tandem Mass Spectrometry Data with CONGA: Combining Open and Narrow Searches with Group-Wise Analysis. J Proteome Res 2024. [PMID: 38652578 DOI: 10.1021/acs.jproteome.3c00399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]
Abstract
Searching tandem mass spectrometry proteomics data against a database is a well-established method for assigning peptide sequences to observed spectra but typically cannot identify peptides harboring unexpected post-translational modifications (PTMs). Open modification searching aims to address this problem by allowing a spectrum to match a peptide even if the spectrum's precursor mass differs from the peptide mass. However, expanding the search space in this way can lead to a loss of statistical power to detect peptides. We therefore developed a method, called CONGA (combining open and narrow searches with group-wise analysis), that takes into account results from both types of searches, a traditional "narrow window" search and an open modification search, while carrying out rigorous false discovery rate control. The result is an algorithm that provides the best of both worlds: the ability to detect unexpected PTMs without a concomitant loss of power to detect unmodified peptides.
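The difference between a narrow-window search and an open search, as the abstract above describes it, comes down to how wide a precursor-mass tolerance is used when selecting candidate peptides for a spectrum. The following is a minimal sketch of that idea only, not the CONGA algorithm; the function name `candidates_in_window`, the peptide masses, and both tolerance values are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_left, bisect_right


def candidates_in_window(sorted_masses, precursor_mass, tolerance_da):
    """Return indices of peptides whose mass is within +/- tolerance_da
    of the observed precursor mass (sorted_masses must be ascending)."""
    lo = bisect_left(sorted_masses, precursor_mass - tolerance_da)
    hi = bisect_right(sorted_masses, precursor_mass + tolerance_da)
    return list(range(lo, hi))


# Hypothetical peptide masses (Da), sorted ascending.
peptide_masses = [800.40, 1000.50, 1000.52, 1080.47, 1500.75]

# Hypothetical precursor: could be the unmodified peptide at index 3, or a
# peptide near 1000.5 Da carrying an unexpected modification of about 80 Da.
precursor = 1080.48

# Narrow window (assumed 0.05 Da): only near-exact mass matches are candidates.
narrow = candidates_in_window(peptide_masses, precursor, 0.05)

# Open window (assumed 100 Da): peptides with a large mass offset also qualify.
open_ = candidates_in_window(peptide_masses, precursor, 100.0)

print(narrow)  # indices considered by the narrow search
print(open_)   # indices considered by the open search
```

The open window admits more candidates per spectrum, which is the enlarged search space that the abstract links to a potential loss of statistical power.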
Affiliation(s)
- Jack Freestone
  - School of Mathematics and Statistics F07, University of Sydney, NSW 2006, Australia
- William S Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
  - School of Mathematics and Statistics F07, University of Sydney, NSW 2006, Australia
2
Hollin T, Abel S, Banks C, Hristov B, Prudhomme J, Hales K, Florens L, Stafford Noble W, Le Roch KG. Proteome-Wide Identification of RNA-dependent proteins and an emerging role for RNAs in Plasmodium falciparum protein complexes. Nat Commun 2024; 15:1365. [PMID: 38355719 PMCID: PMC10866993 DOI: 10.1038/s41467-024-45519-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 01/26/2024] [Indexed: 02/16/2024] Open
Abstract
Ribonucleoprotein complexes are composed of RNA, RNA-dependent proteins (RDPs) and RNA-binding proteins (RBPs), and play fundamental roles in RNA regulation. However, in the human malaria parasite, Plasmodium falciparum, identification and characterization of these proteins are particularly limited. In this study, we use an unbiased proteome-wide approach, called R-DeeP, a method based on sucrose density gradient ultracentrifugation, to identify RDPs. Quantitative analysis by mass spectrometry identifies 898 RDPs, including 545 proteins not yet associated with RNA. Results are further validated using a combination of computational and molecular approaches. Overall, this method provides the first snapshot of the Plasmodium protein-protein interaction network in the presence and absence of RNA. R-DeeP also helps to reconstruct Plasmodium multiprotein complexes based on co-segregation and deciphers their RNA-dependence. One RDP candidate, PF3D7_0823200, is functionally characterized and validated as a true RBP. Using enhanced crosslinking and immunoprecipitation followed by high-throughput sequencing (eCLIP-seq), we demonstrate that this protein interacts with various Plasmodium non-coding transcripts, including the var genes and ap2 transcription factors.
Affiliation(s)
- Thomas Hollin
  - Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, CA, USA
- Steven Abel
  - Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, CA, USA
- Charles Banks
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Borislav Hristov
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jacques Prudhomme
  - Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, CA, USA
- Kianna Hales
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Karine G Le Roch
  - Department of Molecular, Cell and Systems Biology, University of California Riverside, Riverside, CA, USA
3
Kertesz-Farkas A, Nii Adoquaye Acquaye FL, Bhimani K, Eng JK, Fondrie WE, Grant C, Hoopmann MR, Lin A, Lu YY, Moritz RL, MacCoss MJ, Noble WS. The Crux Toolkit for Analysis of Bottom-Up Tandem Mass Spectrometry Proteomics Data. J Proteome Res 2023; 22:561-569. [PMID: 36598107 DOI: 10.1021/acs.jproteome.2c00615] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/05/2023]
Abstract
The Crux tandem mass spectrometry data analysis toolkit provides a collection of algorithms for analyzing bottom-up proteomics tandem mass spectrometry data. Many publications have described various individual components of Crux, but a comprehensive summary has not been published since 2014. The goal of this work is to summarize the functionality of Crux, focusing on developments since 2014. We begin with empirical results demonstrating our recently implemented speedups to the Tide search engine. Other new features include a new score function in Tide, two new confidence estimation procedures, as well as three new tools: Param-medic for estimating search parameters directly from mass spectrometry data, Kojak for searching cross-linked mass spectra, and DIAmeter for searching data independent acquisition data against a sequence database.
Affiliation(s)
- Attila Kertesz-Farkas
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
- Frank Lawrence Nii Adoquaye Acquaye
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
- Kishankumar Bhimani
  - Department of Data Analysis and Artificial Intelligence and Laboratory on AI for Computational Biology, Faculty of Computer Science, HSE University, 20 Myasnitskaya ulitsa, Moscow 101000, Russia
- Jimmy K Eng
  - Proteomics Resource, University of Washington, 850 Republican Street, Seattle, Washington 98109-4725, United States
- William E Fondrie
  - Talus Bioscience, 550 17th Avenue, Seattle, Washington 98122, United States
- Charles Grant
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- Michael R Hoopmann
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, Washington 98109, United States
- Andy Lin
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- Yang Y Lu
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- Robert L Moritz
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, Washington 98109, United States
- Michael J MacCoss
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, 185 E Stevens Way NE, Seattle, Washington 98195-2350, United States
4
Lin A, Short T, Noble WS, Keich U. Improving Peptide-Level Mass Spectrometry Analysis via Double Competition. J Proteome Res 2022; 21:2412-2420. [PMID: 36166314 PMCID: PMC10108709 DOI: 10.1021/acs.jproteome.2c00282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
The analysis of shotgun proteomics data often involves generating lists of inferred peptide-spectrum matches (PSMs) and/or of peptides. The canonical approach for generating these discovery lists is by controlling the false discovery rate (FDR), most commonly through target-decoy competition (TDC). At the PSM level, TDC is implemented by competing each spectrum's best-scoring target (real) peptide match with its best match against a decoy database. This PSM-level procedure can be adapted to the peptide level by selecting the top-scoring PSM per peptide prior to FDR estimation. Here, we first highlight and empirically augment a little-known previous work by He et al., which showed that TDC-based PSM-level FDR estimates can be liberally biased. We thus propose that researchers instead focus on peptide-level analysis. We then investigate three ways to carry out peptide-level TDC and show that the most common method ("PSM-only") offers the lowest statistical power in practice. An alternative approach that carries out a double competition, first at the PSM and then at the peptide level ("PSM-and-peptide"), is the most powerful method, yielding an average of 17% more discovered peptides at a 1% FDR threshold relative to the PSM-only method.
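The PSM-level competition and the "top-scoring PSM per peptide" step described in the abstract above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names, the toy scores, and the peptide labels are hypothetical, and the `(decoys + 1) / targets` estimator is an assumption (a commonly used conservative form) that the abstract itself does not specify. The second, peptide-level competition of the "PSM-and-peptide" method is not shown.

```python
def tdc_psm_winners(psms):
    """PSM-level competition: keep the single best-scoring match per spectrum,
    whether it is a target or a decoy.
    psms: iterable of (spectrum_id, peptide, score, is_decoy)."""
    best = {}
    for spectrum_id, peptide, score, is_decoy in psms:
        if spectrum_id not in best or score > best[spectrum_id][1]:
            best[spectrum_id] = (peptide, score, is_decoy)
    return list(best.values())


def best_psm_per_peptide(winners):
    """Peptide-level step: keep the top-scoring winning PSM for each peptide."""
    best = {}
    for peptide, score, is_decoy in winners:
        if peptide not in best or score > best[peptide][0]:
            best[peptide] = (score, is_decoy)
    return best


def estimated_fdr(peptide_best, threshold):
    """Estimate the FDR among target peptides scoring at or above threshold.
    Assumption: the (decoys + 1) / targets estimator."""
    targets = sum(1 for s, d in peptide_best.values() if s >= threshold and not d)
    decoys = sum(1 for s, d in peptide_best.values() if s >= threshold and d)
    return (decoys + 1) / max(targets, 1)


# Hypothetical toy data: one target and one decoy match per spectrum.
psms = [
    ("s1", "PEPTIDEA", 5.0, False), ("s1", "decoy_1", 2.0, True),
    ("s2", "PEPTIDEA", 4.0, False), ("s2", "decoy_2", 1.0, True),
    ("s3", "PEPTIDEB", 1.5, False), ("s3", "decoy_3", 3.0, True),
    ("s4", "PEPTIDEC", 3.5, False), ("s4", "decoy_4", 0.5, True),
]

peptide_best = best_psm_per_peptide(tdc_psm_winners(psms))
print(estimated_fdr(peptide_best, 3.5))
print(estimated_fdr(peptide_best, 3.0))
```

With so few peptides the estimates are far too coarse to be useful; the example is only meant to make the order of the steps concrete.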
Affiliation(s)
- Andy Lin
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Seattle, Washington 98109, United States
- Temana Short
  - School of Mathematics & Statistics, University of Sydney, New South Wales, 2006, Australia
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
  - School of Mathematics & Statistics, University of Sydney, New South Wales, 2006, Australia
5
Lin A, Plubell DL, Keich U, Noble WS. Accurately Assigning Peptides to Spectra When Only a Subset of Peptides Are Relevant. J Proteome Res 2021; 20:4153-4164. [PMID: 34236864 PMCID: PMC8489664 DOI: 10.1021/acs.jproteome.1c00483] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/22/2022]
Abstract
The standard proteomics database search strategy involves searching spectra against a peptide database and estimating the false discovery rate (FDR) of the resulting set of peptide-spectrum matches. One assumption of this protocol is that all the peptides in the database are relevant to the hypothesis being investigated. However, in settings where researchers are interested in a subset of peptides, alternative search and FDR control strategies are needed. Recently, two methods were proposed to address this problem: subset-search and all-sub. We show that both methods fail to control the FDR. For subset-search, this failure is due to the presence of "neighbor" peptides, which are defined as irrelevant peptides whose precursor mass and fragmentation spectrum are similar to those of a relevant peptide. Not considering neighbors compromises the FDR estimate because a spectrum generated by an irrelevant peptide can incorrectly match well to a relevant peptide. Therefore, we have developed a new method, "subset-neighbor search" (SNS), that accounts for neighbor peptides. We show evidence that SNS controls the FDR when neighbors are present and that SNS outperforms group-FDR, the only other method that appears to control the FDR relative to a subset of relevant peptides.
Affiliation(s)
- Andy Lin
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Deanna L. Plubell
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Uri Keich
  - School of Mathematics and Statistics, University of Sydney, NSW, Australia
- William S. Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Paul G. Allen School for Computer Science and Engineering, University of Washington, Seattle, WA, USA
6
Abstract
Proteomics studies rely on the accurate assignment of peptides to the acquired tandem mass spectra, a task where machine learning algorithms have proven invaluable. We describe mokapot, which provides a flexible semisupervised learning algorithm that allows for highly customized analyses. We demonstrate some of the unique features of mokapot by improving the detection of RNA-cross-linked peptides from an analysis of RNA-binding proteins and increasing the consistency of peptide detection in a single-cell proteomics study.
Affiliation(s)
- William E. Fondrie
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- William S. Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
7
Abstract
Computational tools used for genomic analyses are becoming more accurate but also increasingly sophisticated and complex. This introduces a new problem in that these pieces of software have a large number of tunable parameters that often have a large influence on the results that are reported. We quantify the impact of parameter choice on transcript assembly and take some first steps toward generating a truly automated genomic analysis pipeline by developing a method for automatically choosing input-specific parameter values for reference-based transcript assembly using the Scallop tool. By choosing parameter values for each input, the area under the receiver operating characteristic curve (AUC) when comparing assembled transcripts to a reference transcriptome is increased by an average of 28.9% over using only the default parameter choices on 1595 RNA-Seq samples in the Sequence Read Archive. This approach is general, and when applied to StringTie, it increases the AUC by an average of 13.1% on a set of 65 RNA-Seq experiments from ENCODE. Parameter advisors for both Scallop and StringTie are available on GitHub.
Affiliation(s)
- Dan Deblasio
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
  - Current affiliation: Department of Computer Science, The University of Texas at El Paso, El Paso, Texas, USA
- Kwanho Kim
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
  - Current affiliation: Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Carl Kingsford
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
8
Bioinformatics Methods for Mass Spectrometry-Based Proteomics Data Analysis. Int J Mol Sci 2020; 21:ijms21082873. [PMID: 32326049 PMCID: PMC7216093 DOI: 10.3390/ijms21082873] [Citation(s) in RCA: 109] [Impact Index Per Article: 27.3] [Received: 03/18/2020] [Revised: 04/16/2020] [Accepted: 04/18/2020] [Indexed: 01/15/2023] Open
Abstract
Recent advances in mass spectrometry (MS)-based proteomics have enabled tremendous progress in the understanding of cellular mechanisms, disease progression, and the relationship between genotype and phenotype. Though many popular bioinformatics methods in proteomics are derived from other omics studies, novel analysis strategies are required to deal with the unique characteristics of proteomics data. In this review, we discuss the current developments in the bioinformatics methods used in proteomics and how they facilitate the mechanistic understanding of biological processes. We first introduce bioinformatics software and tools designed for mass spectrometry-based protein identification and quantification, and then we review the different statistical and machine learning methods that have been developed to perform comprehensive analysis in proteomics studies. We conclude with a discussion of how quantitative protein data can be used to reconstruct protein interactions and signaling networks.
9
Fondrie WE, Noble WS. Machine Learning Strategy That Leverages Large Data sets to Boost Statistical Power in Small-Scale Experiments. J Proteome Res 2020; 19:1267-1274. [PMID: 32009418 PMCID: PMC8455073 DOI: 10.1021/acs.jproteome.9b00780] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/26/2023]
Abstract
Machine learning methods have proven invaluable for increasing the sensitivity of peptide detection in proteomics experiments. Most modern tools, such as Percolator and PeptideProphet, use semi-supervised algorithms to learn models directly from the datasets that they analyze. Although these methods are effective for many proteomics experiments, we suspected that they may be suboptimal for experiments of smaller scale. In this work, we found that the power and consistency of Percolator results were reduced as the size of the experiment was decreased. As an alternative, we propose a different operating mode for Percolator: learn a model with Percolator from a large dataset and use the learned model to evaluate the small-scale experiment. We call this a “static modeling” approach, in contrast to Percolator’s usual “dynamic model” that is trained anew for each dataset. We applied this static modeling approach to two settings: small, gel-based experiments and single-cell proteomics. In both cases, static models increased the yield of detected peptides and eliminated the model-induced variability of the standard dynamic approach. These results suggest that static models are a powerful tool for bringing the full benefits of Percolator and other semi-supervised algorithms to small-scale experiments.
Affiliation(s)
- William E Fondrie
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, United States
- William S Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-5065, United States
10
May DH, Tamura K, Noble WS. Detecting Modifications in Proteomics Experiments with Param-Medic. J Proteome Res 2019; 18:1902-1906. [PMID: 30714740 DOI: 10.1021/acs.jproteome.8b00954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Abstract
Searching tandem mass spectra against a peptide database requires accurate knowledge of various experimental parameters, including machine settings and details of the sample preparation protocol. In some cases, such as in reanalysis of public data sets, this experimental metadata may be missing or inaccurate. We describe a method for automatically inferring the presence of various types of modifications, including stable-isotope and isobaric labeling and tandem mass tags as well as the enrichment of phosphorylated peptides, directly from a given set of mass spectra. We demonstrate the sensitivity and specificity of the proposed approach, and we provide open-source Python and C++ implementations in a new version of the software tool Param-Medic.
Affiliation(s)
- Damon H May
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Kaipo Tamura
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- William S Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
11
Miller SE, Rizzo AI, Waldbauer JR. Postnovo: Postprocessing Enables Accurate and FDR-Controlled de Novo Peptide Sequencing. J Proteome Res 2018; 17:3671-3680. [PMID: 30277077 DOI: 10.1021/acs.jproteome.8b00278] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Abstract
De novo sequencing offers an alternative to database search methods for peptide identification from mass spectra. Since it does not rely on a predetermined database of expected or potential sequences in the sample, de novo sequencing is particularly appropriate for samples lacking a well-defined or comprehensive reference database. However, the low accuracy of many de novo sequence predictions has prevented the widespread use of the variety of sequencing tools currently available. Here, we present a new open-source tool, Postnovo, that postprocesses de novo sequence predictions to find high-accuracy results. Postnovo uses a predictive model to rescore and rerank candidate sequences in a manner akin to database search postprocessing tools such as Percolator. Postnovo leverages the output from multiple de novo sequencing tools in its own analyses, producing many times the length of amino acid sequence information (including both full- and partial-length peptide sequences) at an equivalent false discovery rate (FDR) compared to any individual tool. We present a methodology to reliably screen the sequence predictions to a desired FDR given the Postnovo sequence score. We validate Postnovo with multiple data sets and demonstrate its ability to identify proteins that are missed by database search even in samples with paired reference databases.
Affiliation(s)
- Samuel E Miller
  - Department of the Geophysical Sciences, University of Chicago, 5734 South Ellis Avenue, Chicago, Illinois 60637, United States
- Adriana I Rizzo
  - Department of the Geophysical Sciences, University of Chicago, 5734 South Ellis Avenue, Chicago, Illinois 60637, United States
- Jacob R Waldbauer
  - Department of the Geophysical Sciences, University of Chicago, 5734 South Ellis Avenue, Chicago, Illinois 60637, United States
12
Jo H, Paek E. Data-Dependent Scoring Parameter Optimization in MS-GF+ Using Spectrum Quality Filter. J Proteome Res 2018; 17:3593-3598. [PMID: 30033731 DOI: 10.1021/acs.jproteome.8b00415] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
Abstract
Most database search tools for proteomics have their own scoring parameter sets depending on experimental conditions such as fragmentation methods, instruments, digestion enzymes, and so on. These scoring parameter sets are usually predefined by tool developers and cannot be modified by users. The number of different experimental conditions grows as the technology develops, and the given set of scoring parameters could be suboptimal for tandem mass spectrometry data acquired using new sample preparation or fragmentation methods. Here we introduce a new approach to optimize scoring parameters in a data-dependent manner using a spectrum quality filter. The new approach conducts a preliminary search for the spectra selected by the spectrum quality filter. Search results from the preliminary search are used to generate data-dependent scoring parameters; then, the full search over the entire input spectra is conducted using the learned scoring parameters. We show that the new approach yields more and better peptide-spectrum matches than the conventional search using built-in scoring parameters when compared at the same 1% false discovery rate.
Affiliation(s)
- Hyunjin Jo
  - Department of Computer Science, Hanyang University, Seongdong-gu, Seoul 04763, Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seongdong-gu, Seoul 04763, Korea
13
Levitsky LI, Ivanov MV, Lobas AA, Bubis JA, Tarasova IA, Solovyeva EM, Pridatchenko ML, Gorshkov MV. IdentiPy: An Extensible Search Engine for Protein Identification in Shotgun Proteomics. J Proteome Res 2018; 17:2249-2255. [PMID: 29682971 DOI: 10.1021/acs.jproteome.7b00640] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 12/21/2022]
Abstract
We present an open-source, extensible search engine for shotgun proteomics. Implemented in the Python programming language, IdentiPy shows competitive processing speed and sensitivity compared with state-of-the-art search engines. It is equipped with a user-friendly web interface, IdentiPy Server, enabling the use of a single server installation accessed from multiple workstations. Using a simplified version of the X!Tandem scoring algorithm and its novel "autotune" feature, IdentiPy outperforms the popular alternatives on high-resolution data sets. Autotune adjusts the search parameters for the particular data set, resulting in improved search efficiency and simplifying the user experience. IdentiPy with the autotune feature shows higher sensitivity compared with the evaluated search engines. IdentiPy Server has built-in postprocessing and protein inference procedures and provides graphic visualization of the statistical properties of the data set and the search results. It is open-source and can be freely extended to use third-party scoring functions or processing algorithms and allows customization of the search workflow for specialized applications.
Affiliation(s)
- Lev I Levitsky
  - Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region 141700, Russian Federation
  - V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Mark V Ivanov
  - V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Anna A Lobas
  - V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Julia A Bubis
  - V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Irina A Tarasova
  - V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Elizaveta M Solovyeva
  - V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Marina L Pridatchenko
  - V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
- Mikhail V Gorshkov
  - V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld. 2, Moscow 119334, Russia
14
Solntsev SK, Shortreed MR, Frey BL, Smith LM. Enhanced Global Post-translational Modification Discovery with MetaMorpheus. J Proteome Res 2018; 17:1844-1851. [PMID: 29578715 DOI: 10.1021/acs.jproteome.7b00873] [Citation(s) in RCA: 151] [Impact Index Per Article: 25.2] [Indexed: 01/03/2023]
Abstract
Correct identification of protein post-translational modifications (PTMs) is crucial to understanding many aspects of protein function in biological processes. G-PTM-D is a recently developed technique for global identification and localization of PTMs. Spectral file calibration prior to applying G-PTM-D and algorithmic enhancements in the peptide database search significantly increase the accuracy, speed, and scope of PTM identification. We enhance G-PTM-D by using multinotch searches and demonstrate its effectiveness in identification of numerous types of PTMs, including high-mass modifications such as glycosylations. The changes described in this work lead to a 20% increase in the number of identified modifications and an order of magnitude decrease in search time. The complete workflow is implemented in MetaMorpheus, a software tool that integrates the database search procedure, identification of coisolated peptides, spectral calibration, and the enhanced G-PTM-D workflow. Multinotch searches are also shown to be useful in contexts other than G-PTM-D by producing superior results when used instead of standard narrow-window and open database searches.
15
Misra BB. Updates on resources, software tools, and databases for plant proteomics in 2016-2017. Electrophoresis 2018; 39:1543-1557. [PMID: 29420853 DOI: 10.1002/elps.201700401] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/21/2017] [Revised: 01/23/2018] [Accepted: 02/02/2018] [Indexed: 11/05/2022]
Abstract
Proteomics data processing, annotation, and analysis can often lead to major hurdles in large-scale high-throughput bottom-up proteomics experiments. Given the recent rise in protein-based big datasets being generated, in silico tool development has increased at an unprecedented rate, so much so that it has become increasingly difficult to keep track of all the advances in a particular academic year. However, these tools benefit the plant proteomics community in circumventing critical issues in data analysis and visualization, as these continually developing open-source and community-developed tools hold potential in future research efforts. This review will aim to introduce and summarize more than 50 software tools, databases, and resources developed and published during 2016-2017 under the following categories: tools for data pre-processing and analysis, statistical analysis tools, peptide identification tools, databases and spectral libraries, and data visualization and interpretation tools. Finally, for a well-informed proteomics community, efforts in data archiving and validation datasets will be discussed as well. Additionally, the author delineates the current and most commonly used proteomics tools in order to introduce novice readers to this -omics discovery platform.
Affiliation(s)
- Biswapriya B Misra
  - Department of Internal Medicine, Section of Molecular Medicine, Medical Center Boulevard, Winston-Salem, NC, USA