1
Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Received: 03/01/2022] [Accepted: 07/11/2022]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
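To make scoring against a predicted (in silico) library concrete, the sketch below ranks library entries by the normalized spectral contrast angle, a similarity measure commonly used to compare predicted and observed fragment intensities. It is a minimal illustration, not the review's method. It assumes each library entry is already an intensity vector over the same ordered fragment ions as the observed spectrum (real workflows first annotate observed peaks against each candidate's fragments), and the peptide names and intensities are hypothetical.

```python
import math


def normalized_spectral_angle(observed, predicted):
    """Return 1 - 2*angle/pi for two aligned intensity vectors.

    1.0 means identical relative intensities; 0.0 means orthogonal vectors.
    """
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def best_library_match(observed, library):
    """Score the observed spectrum against every predicted entry; return (peptide, score) of the best."""
    scores = {pep: normalized_spectral_angle(observed, pred) for pep, pred in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical predicted library: intensities over the same ordered fragment ions
library = {
    "PEPTIDEA": [0.9, 0.1, 0.5, 0.0],
    "PEPTIDEB": [0.1, 0.8, 0.0, 0.6],
}
observed = [0.85, 0.15, 0.45, 0.05]
print(best_library_match(observed, library))
```

Because the angle depends only on relative intensities, the score is insensitive to the absolute scale of the observed spectrum.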
Affiliation(s)
- Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany.
- Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway.
2
Verbruggen S, Gessulat S, Gabriels R, Matsaroki A, Van de Voorde H, Kuster B, Degroeve S, Martens L, Van Criekinge W, Wilhelm M, Menschaert G. Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics. Mol Cell Proteomics 2021; 20:100076. [PMID: 33823297 PMCID: PMC8214147 DOI: 10.1016/j.mcpro.2021.100076] [Received: 11/27/2020] [Revised: 03/04/2021] [Accepted: 03/25/2021]
Abstract
Proteogenomics approaches often struggle to distinguish true from false peptide-to-spectrum matches as the database size grows. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed from ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate and the validation stringency in a proteogenomic setting.
Highlights
- First proteogenomics with PSM rescoring using machine learning–predicted spectra
- Demonstrated on both ribosome profiling and nanopore RNA-Seq–derived databases
- Rescoring leads to elevated stringency and increased identification rates
- Rescoring compensates for the search space size issues in proteogenomics
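The rescoring idea, appending agreement between predicted and observed fragment intensities to a search engine's own score so that a postprocessor such as Percolator can weigh them together, can be sketched as follows. This is an illustrative sketch under stated assumptions: the two intensity features (Pearson correlation and mean absolute error of max-normalized intensities) and the column names `SearchScore`, `PredPearson` and `PredMAE` are hypothetical stand-ins, not the MS2PIP- or Prosit-derived feature set used in the paper. The row layout is only loosely modeled on Percolator's tab-delimited input (SpecId, Label, ScanNr, features, Peptide, Proteins, with Label 1 for targets and -1 for decoys).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def max_normalize(values):
    """Scale intensities so the most intense fragment equals 1.0."""
    top = max(values) or 1.0  # avoid division by zero for an all-zero vector
    return [v / top for v in values]


# Hypothetical column names for the feature table
HEADER = ["SpecId", "Label", "ScanNr", "SearchScore", "PredPearson", "PredMAE", "Peptide", "Proteins"]


def psm_row(spec_id, is_decoy, scan, search_score, observed, predicted, peptide, protein):
    """One tab-separated PSM row: the search-engine score plus two intensity-agreement features."""
    obs, pred = max_normalize(observed), max_normalize(predicted)
    mae = sum(abs(a - b) for a, b in zip(obs, pred)) / len(obs)
    fields = [spec_id, -1 if is_decoy else 1, scan, f"{search_score:.3f}",
              f"{pearson(obs, pred):.4f}", f"{mae:.4f}", peptide, protein]
    return "\t".join(str(f) for f in fields)


print("\t".join(HEADER))
print(psm_row("scan1_PEPTIDEK", False, 1, 87.2, [10.0, 4.0, 7.0], [0.9, 0.3, 0.7], "PEPTIDEK", "protA"))
```

A learned postprocessor would then decide how much weight each column deserves; the sketch only prepares the inputs.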
Affiliation(s)
- Steven Verbruggen
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium
- Siegfried Gessulat
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Ralf Gabriels
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Sven Degroeve
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Lennart Martens
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Wim Van Criekinge
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
- Gerben Menschaert
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium.
3
Cautereels J, Van Hee N, Chatterjee S, Van Alsenoy C, Lemière F, Blockhuys F. QCMS2 as a new method for providing insight into peptide fragmentation: The influence of the side-chain and inter-side-chain interactions. J Mass Spectrom 2020; 55:e4446. [PMID: 31652378 DOI: 10.1002/jms.4446] [Received: 06/27/2019] [Revised: 09/12/2019] [Accepted: 09/21/2019]
Abstract
The identification of peptides and proteins from tandem mass spectra is a difficult task, and multiple tools have been developed to aid this identification. We present a new method called quantum chemical mass spectrometry for materials science (QCMS2), which is based on quantum chemical calculations of bond orders and of reaction and transition-state energies at the DFT/B3LYP/6-311+G* level of theory. The method was used to describe the fragmentation pathways of five X-His-Ser tripeptides with X = Asn, Asp, Glu, Ser, and Trp, focusing on the influence of the side chain and inter-side-chain interactions on the fragmentation. The main features in the mass spectra of the five tripeptides were correctly reproduced, and a number of fragments were assigned to fragmentations involving the side chain and to the influence of inter-side-chain interactions. Product ion spectra were recorded to evaluate the capabilities and limitations of QCMS2 and a number of conventional tools.
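For comparison with the quantum chemical approach, the conventional starting point for annotating such product ion spectra is the singly charged b/y fragment ladder. The sketch below computes it for the five X-His-Ser tripeptides named in the abstract, using standard monoisotopic residue masses. It is a sketch of that conventional calculation only, not of QCMS2 (which computes bond orders and reaction and transition-state energies), and it ignores side-chain losses and all other ion types.

```python
# Monoisotopic residue masses (Da) for the residues in the five X-His-Ser tripeptides
RESIDUE_MASS = {
    "N": 114.04293, "D": 115.02694, "E": 129.04259,
    "S": 87.03203, "W": 186.07931, "H": 137.05891,
}
PROTON = 1.007276   # Da
WATER = 18.010565   # Da


def by_ions(peptide):
    """Singly charged b and y ion m/z values, b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y


for x in "NDESW":
    pep = x + "HS"
    b, y = by_ions(pep)
    mh = sum(RESIDUE_MASS[aa] for aa in pep) + WATER + PROTON  # [M+H]+ precursor
    print(pep, f"[M+H]+={mh:.4f}", "b:", [round(v, 4) for v in b], "y:", [round(v, 4) for v in y])
```

Peaks in a recorded spectrum that match none of these values are candidates for the side-chain-involving fragmentations the study attributes to QCMS2 pathways.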
Affiliation(s)
- Julie Cautereels
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
- Nils Van Hee
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
- Sneha Chatterjee
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
- Filip Lemière
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
- Frank Blockhuys
- Department of Chemistry, University of Antwerp, Antwerp, Belgium
4
Verbruggen S, Ndah E, Van Criekinge W, Gessulat S, Kuster B, Wilhelm M, Van Damme P, Menschaert G. PROTEOFORMER 2.0: Further Developments in the Ribosome Profiling-assisted Proteogenomic Hunt for New Proteoforms. Mol Cell Proteomics 2019; 18:S126-S140. [PMID: 31040227 PMCID: PMC6692777 DOI: 10.1074/mcp.ra118.001218] [Received: 11/16/2018] [Revised: 04/30/2019]
Abstract
PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates, and these can improve mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data of human HCT116 and Jurkat cells. The different proteoform calling strategies are used alongside one another and, in the end, combined with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space with MaxQuant. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5' and 3' extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Further, proof-of-concept is reported for the improvement of spectrum matching by including Prosit, a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the context of ribosome profiling-driven proteogenomics, it is shown that this allows the spectrum matches of newly identified proteoforms to be validated with elevated stringency. These updates and novel conclusions provide new insights and lessons for the ribosome profiling-based proteogenomic research field.
More practical information on the pipeline, raw code, the user manual (README) and explanations on the different modes of availability can be found at the GitHub repository of PROTEOFORMER: https://github.com/Biobix/proteoformer.
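The P-site offset is the distance from a ribosome footprint's 5' end to the first nucleotide of the codon held in the ribosomal P-site. A widely used heuristic, assumed here rather than taken from PROTEOFORMER's code, estimates it per read length as the most frequent distance between read 5' ends and annotated start codons contained in the read, because initiating ribosomes hold the start codon in the P-site. The function name and the single-transcript coordinate model are hypothetical simplifications.

```python
from collections import Counter, defaultdict


def estimate_psite_offsets(reads, start_positions):
    """Return {read_length: offset} using the start-codon heuristic.

    reads: iterable of (five_prime_pos, length) in transcript coordinates (0-based).
    start_positions: 0-based positions of the first nucleotide of annotated start codons.
    """
    distances = defaultdict(Counter)
    for five_prime, length in reads:
        for start in start_positions:
            # Count only reads that contain the whole start codon
            if five_prime <= start and start + 3 <= five_prime + length:
                distances[length][start - five_prime] += 1
    # Most frequent distance per read length
    return {length: counts.most_common(1)[0][0] for length, counts in sorted(distances.items())}


# Hypothetical footprints around a start codon at position 100
reads = [(88, 28), (88, 28), (87, 28), (87, 29), (87, 29), (86, 29), (60, 28)]
print(estimate_psite_offsets(reads, [100]))  # {28: 12, 29: 13}
```

Shifting each read's 5' end by the offset for its length then gives the P-site position used for codon-level occupancy profiles.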
Affiliation(s)
- Steven Verbruggen
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium.
- Elvis Ndah
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, Ghent, Belgium
- Wim Van Criekinge
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Siegfried Gessulat
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Munich, Germany; SAP SE, Potsdam, Germany
- Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Munich, Germany
- Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Munich, Germany
- Petra Van Damme
- VIB-UGent Center for Medical Biotechnology, Ghent, Belgium; Department of Biochemistry and Microbiology, Faculty of Sciences, Ghent University, Ghent, Belgium
- Gerben Menschaert
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium.
5
Dykstra AB, St Brice L, Rodriguez M, Raman B, Izquierdo J, Cook KD, Lynd LR, Hettich RL. Development of a multipoint quantitation method to simultaneously measure enzymatic and structural components of the Clostridium thermocellum cellulosome protein complex. J Proteome Res 2013; 13:692-701. [PMID: 24274857 DOI: 10.1021/pr400788e]
Abstract
Clostridium thermocellum has emerged as a leading bioenergy-relevant microbe due to its ability to solubilize cellulose into carbohydrates, mediated by multicomponent membrane-attached complexes termed cellulosomes. To probe microbial cellulose utilization rates, it is desirable to be able to measure the concentrations of saccharolytic enzymes and estimate the total amount of cellulosome present on a mass basis. Current cellulase determination methodologies involve labor-intensive purification procedures and only allow for indirect determination of abundance. We have developed a method using multiple reaction monitoring (MRM-MS) to simultaneously quantitate both enzymatic and structural components of the cellulosome protein complex in samples ranging in complexity from purified cellulosomes to whole cell lysates, as an alternative to a previously developed enzyme-linked immunosorbent assay (ELISA) method of cellulosome quantitation. The precision of the cellulosome mass concentration in technical replicates is better than 5% relative standard deviation for all samples, indicating high precision for determination of the mass concentration of cellulosome components.
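The precision criterion quoted above is a relative standard deviation (RSD): the sample standard deviation of the technical replicates divided by their mean, expressed as a percentage. A minimal sketch with hypothetical replicate values (the numbers are placeholders, not data from the study):

```python
import statistics


def percent_rsd(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


# Hypothetical technical-replicate mass concentrations of one cellulosome component
replicates = [41.2, 40.1, 42.0]
rsd = percent_rsd(replicates)
print(f"%RSD = {rsd:.2f}; below 5%: {rsd < 5.0}")
```

`statistics.stdev` uses the n-1 (sample) denominator, which is the usual choice for a small number of technical replicates.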
Affiliation(s)
- Andrew B Dykstra
- Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6131, United States
6
Abstract
Motivation: Tandem mass spectrometry provides the means to match mass spectrometry signal observations with the chemical entities that generated them. The technology produces signal spectra that contain information about the chemical dissociation pattern of a peptide that was forced to fragment using methods like collision-induced dissociation. The ability to predict these MS2 signals and to understand this fragmentation process is important for sensitive high-throughput proteomics research.
Results: We present a new tool called MS2PIP for predicting the intensity of the most important fragment ion signal peaks from a peptide sequence. MS2PIP pre-processes a large dataset with confident peptide-to-spectrum matches to facilitate data-driven model induction using a random forest regression learning algorithm. The intensity predictions of MS2PIP were evaluated on several independent evaluation sets and found to correlate significantly better with the observed fragment-ion intensities than those of the current state-of-the-art PeptideART tool.
Availability: MS2PIP code is available for both training and predicting at http://compomics.com/.
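Data-driven model induction of this kind requires turning each peptide into per-fragment feature vectors paired with observed intensities. The sketch below shows one plausible encoding, one row per backbone cleavage site; the feature set is a hypothetical illustration and not MS2PIP's published features. Rows like these, paired with (for example) log-transformed observed b- or y-ion intensities, could be fed to a random forest regressor such as scikit-learn's `RandomForestRegressor`.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BASIC = "KRH"


def one_hot(residue):
    """20-dimensional indicator vector for one amino acid."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def cleavage_features(peptide, charge):
    """One feature row per backbone cleavage site i (between residues i-1 and i, 0-based).

    Hypothetical features: peptide length, precursor charge, relative cleavage position,
    basic-residue counts in the N- and C-terminal fragments, and one-hot encodings of the
    two residues flanking the cleaved bond.
    """
    n = len(peptide)
    rows = []
    for i in range(1, n):
        n_frag, c_frag = peptide[:i], peptide[i:]
        row = [
            n,
            charge,
            i / n,
            sum(aa in BASIC for aa in n_frag),
            sum(aa in BASIC for aa in c_frag),
        ]
        row += one_hot(peptide[i - 1]) + one_hot(peptide[i])
        rows.append(row)
    return rows


X = cleavage_features("ELVISLIVESK", 2)
print(len(X), len(X[0]))  # 10 45: one row per cleavage site, 5 + 2 * 20 features
```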
Affiliation(s)
- Sven Degroeve
- Department of Medical Protein Research, VIB, Ghent 9000, Belgium and Department of Biochemistry, Ghent University, Ghent 9000, Belgium
7
Risk BA, Edwards NJ, Giddings MC. A peptide-spectrum scoring system based on ion alignment, intensity, and pair probabilities. J Proteome Res 2013; 12:4240-7. [PMID: 23875887 DOI: 10.1021/pr400286p]
Abstract
Peppy, the proteogenomic/proteomic search software, employs a novel method for assessing the match quality between an MS/MS spectrum and a theoretical peptide sequence. The scoring system uses three score factors calculated with binomial probabilities: the probability that a fragment ion will randomly align with a peptide ion, the probability that the aligning ions will be selected from subsets of the most intense peaks, and the probability that the intensities of fragment ions identified as y-ions are greater than those of their counterpart b-ions. The scores produced by the method act as global confidence scores, which facilitate the accurate comparison of results and the estimation of false discovery rates. Peppy has been integrated into the meta-search engine PepArML to produce meaningful comparisons with Mascot, MSGF+, OMSSA, X!Tandem, k-Score and s-Score. For two of the four data sets examined with the PepArML analysis, Peppy outperformed the other scoring systems in accuracy. Peppy is available for download at http://geneffects.com/peppy.
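The first of the three factors, the chance that fragment ions align with observed peaks at random, is a binomial tail probability. The sketch below computes such a score. The per-ion match probability p (observed peak count times twice the fragment tolerance, divided by the m/z range) is a common simplifying assumption and not necessarily Peppy's exact parameterization, and the function names are hypothetical.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def random_alignment_score(n_theoretical, n_matched, n_peaks, tolerance, mz_range):
    """-log10 of the probability that >= n_matched of n_theoretical ions align by chance."""
    # Assumed chance that one theoretical ion falls within tolerance of some observed peak
    p = min(1.0, n_peaks * 2 * tolerance / mz_range)
    prob = binomial_tail(n_theoretical, n_matched, p)
    return -math.log10(prob) if prob > 0 else math.inf


# Hypothetical spectrum: 100 peaks over 2000 m/z units, +/-0.5 tolerance, so p = 0.05
print(round(random_alignment_score(20, 12, 100, 0.5, 2000.0), 2))
```

Larger scores mean the observed number of aligned ions would be less likely under chance alone.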
Affiliation(s)
- Brian A Risk
- Department of Biochemistry & Biophysics, UNC School of Medicine, Chapel Hill, North Carolina 27599, United States.
8
Abstract
Proteogenomic searching is a useful method for identifying novel proteins, annotating genes and detecting peptides unique to an individual genome. The approach, however, can be laborious, as it often requires search segmentation and the use of several unintegrated tools. Furthermore, many proteogenomic efforts have been limited to small genomes, as large genomes can prove impractical due to the required amount of computer memory and computation time. We present Peppy, a software tool designed to perform every necessary task of proteogenomic searches quickly, accurately and automatically. The software generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns confidence values to those matches. Peppy automatically performs decoy database generation, search and analysis to return identifications at the desired false discovery rate threshold. Written in Java for cross-platform execution, the software is fully multithreaded for enhanced speed. The program can run on regular desktop computers, opening the doors of proteogenomic searching to a wider audience of proteomics and genomics researchers. Peppy is available at http://geneffects.com/peppy.
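Automated decoy generation and FDR filtering follow the general target-decoy strategy. A minimal sketch, assuming reversed protein sequences as decoys and an FDR estimate of decoys divided by targets above each score threshold; the abstract does not specify Peppy's exact decoy construction or FDR estimator, and the function names are hypothetical.

```python
import math


def reversed_decoys(proteins):
    """Build decoy entries by reversing each target sequence (one common strategy)."""
    return {f"DECOY_{name}": seq[::-1] for name, seq in proteins.items()}


def q_values(psms):
    """psms: list of (score, is_decoy). Return [(score, is_decoy, q)] sorted by descending score.

    FDR at each threshold = decoys / targets scoring at or above it; the q-value is the
    lowest FDR at which the PSM is still accepted (running minimum from the bottom up).
    Ties in score are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    qs, running = [0.0] * len(fdrs), math.inf
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qs[i] = running
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]


def accepted_targets(psms, fdr_threshold=0.01):
    """Scores of target PSMs whose q-value is within the threshold."""
    return [s for s, d, q in q_values(psms) if not d and q <= fdr_threshold]


print(reversed_decoys({"P1": "MKALV"}))
psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(accepted_targets(psms, 0.01))
```

For a concatenated target-decoy search, some tools use other estimators (for example counting decoys twice); the choice changes the numbers but not the procedure.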
Affiliation(s)
- Brian A Risk
- Department of Biochemistry & Biophysics, UNC School of Medicine, Chapel Hill, North Carolina 27599, United States.
9
Madsen JA, Xu H, Robinson MR, Horton AP, Shaw JB, Giles DK, Kaoud TS, Dalby KN, Trent MS, Brodbelt JS. High-throughput database search and large-scale negative polarity liquid chromatography-tandem mass spectrometry with ultraviolet photodissociation for complex proteomic samples. Mol Cell Proteomics 2013; 12:2604-14. [PMID: 23695934 DOI: 10.1074/mcp.o113.028258]
Abstract
The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS1 and MS2 data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halobacterium and HeLa tryptic digests, respectively, of which 655 and 645 were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, 49 and 50 of which were not identified by the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely available to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm were also assessed for low-resolution, low-mass-accuracy data on a linear ion trap.
Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of peptide cations, and when these methods were combined into a single search, an increase of up to 13% sequence coverage was observed for the kinases. The ability to sequence peptide anions and cations in alternating scans in the same chromatographic run was also demonstrated. Because ETD has a significant bias toward identifying highly basic peptides, negative UVPD was used to improve the identification of the more acidic peptides in conjunction with positive ETD for the more basic species. In this case, tryptic peptides from the cytosolic section of HeLa cells were analyzed by polarity switching nanoLC-MS/MS utilizing ETD for cation sequencing and UVPD for anion sequencing. Relative to searching using ETD alone, positive/negative polarity switching significantly improved sequence coverages across identified proteins, resulting in a 33% increase in unique peptide identifications and more than twice the number of peptide spectral matches.
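Searching negative polarity spectra changes the precursor arithmetic: a peptide anion carrying n negative charges is [M-nH]n-, so protons are subtracted rather than added. The sketch below handles both polarities with a signed charge and, with hypothetical peptide sets, shows the set difference behind counts of peptides "unique" to one activation method. It is illustrative only and not MassMatrix code; the function name is hypothetical.

```python
PROTON = 1.007276  # proton mass in Da


def precursor_mz(neutral_mass, charge):
    """m/z of [M+zH]z+ for charge > 0, or of [M-|z|H]|z|- for charge < 0."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return (neutral_mass + charge * PROTON) / abs(charge)


print(round(precursor_mz(1000.0, 2), 6), round(precursor_mz(1000.0, -2), 6))

# Hypothetical identification sets from two acquisition modes
uvpd_negative = {"DEEFLK", "AGDDEVR", "LVNELTEFAK"}
positive_mode = {"LVNELTEFAK", "HLVDEPQNLIK"}
print(sorted(uvpd_negative - positive_mode))  # peptides seen only with negative-mode UVPD
```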
Affiliation(s)
- James A Madsen
- Department of Chemistry and Biochemistry, The University of Texas at Austin, 1 University Station A5300, Austin, Texas 78712, USA
10
Zhang Y, Fonslow BR, Shan B, Baek MC, Yates JR. Protein analysis by shotgun/bottom-up proteomics. Chem Rev 2013; 113:2343-94. [PMID: 23438204 PMCID: PMC3751594 DOI: 10.1021/cr3003533]
Affiliation(s)
- Yaoyang Zhang
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Bryan R. Fonslow
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Bing Shan
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Moon-Chang Baek
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Department of Molecular Medicine, Cell and Matrix Biology Research Institute, School of Medicine, Kyungpook National University, Daegu 700-422, Republic of Korea
- John R. Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
11
A support for the identification of non-tryptic peptides based on low resolution tandem and sequential mass spectrometry data: The INSPIRE software. Anal Chim Acta 2012; 718:70-7. [DOI: 10.1016/j.aca.2012.01.001] [Received: 08/29/2011] [Revised: 12/28/2011] [Accepted: 01/02/2012]
12
Savidor A, Teper D, Gartemann KH, Eichenlaub R, Chalupowicz L, Manulis-Sasson S, Barash I, Tews H, Mayer K, Giannone RJ, Hettich RL, Sessa G. The Clavibacter michiganensis subsp. michiganensis–Tomato Interactome Reveals the Perception of Pathogen by the Host and Suggests Mechanisms of Infection. J Proteome Res 2011; 11:736-50. [DOI: 10.1021/pr200646a]
Affiliation(s)
- Alon Savidor
- Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv 69978, Israel
- Doron Teper
- Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv 69978, Israel
- Karl-Heinz Gartemann
- Department of Genetechnology/Microbiology, Faculty of Biology, University of Bielefeld, 33501 Bielefeld, Germany
- Rudolf Eichenlaub
- Department of Genetechnology/Microbiology, Faculty of Biology, University of Bielefeld, 33501 Bielefeld, Germany
- Laura Chalupowicz
- Department of Plant Pathology and Weed Research, ARO, The Volcani Center, Bet Dagan 50250, Israel
- Shulamit Manulis-Sasson
- Department of Plant Pathology and Weed Research, ARO, The Volcani Center, Bet Dagan 50250, Israel
- Isaac Barash
- Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv 69978, Israel
- Helena Tews
- Department of Genetechnology/Microbiology, Faculty of Biology, University of Bielefeld, 33501 Bielefeld, Germany
- Kerstin Mayer
- Department of Genetechnology/Microbiology, Faculty of Biology, University of Bielefeld, 33501 Bielefeld, Germany
- Richard J. Giannone
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Robert L. Hettich
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Guido Sessa
- Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv 69978, Israel
13
Li W, Song C, Bailey DJ, Tseng GC, Coon JJ, Wysocki VH. Statistical analysis of electron transfer dissociation pairwise fragmentation patterns. Anal Chem 2011; 83:9540-5. [PMID: 22022956 DOI: 10.1021/ac202327r]
Abstract
Electron transfer dissociation (ETD) is an alternative peptide dissociation method developed in recent years. Compared with traditional collision-induced dissociation (CID), which forms b and y ions, ETD generates c and z ions, and its backbone cleavage is believed to be less selective. We have previously reported the application of a statistical data mining strategy, K-means clustering, to discover fragmentation patterns for CID, and here we report the application of this approach to ETD spectra. We use ETD data sets from digestions with three different proteases. Data analysis shows that selective cleavages do exist for ETD, with the fragmentation patterns affected by protease, charge state, and amino acid residue composition. We also observed that the c(n-1) ion, corresponding to loss of the C-terminal amino acid residue, is statistically strong regardless of the residue at the C-terminus of the peptide, which suggests that the peptide gas-phase conformation plays an important role in the dissociation pathways. These patterns provide a basis for mechanism elucidation, spectral prediction, and improvement of ETD peptide identification algorithms.
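The clustering strategy can be sketched with plain k-means over per-spectrum fragmentation patterns. Here each spectrum is assumed (hypothetically) to be summarized as the fraction of fragment intensity observed at each backbone cleavage position for peptides of equal length; the paper's actual feature construction is not described in the abstract, and the data are invented for illustration.

```python
import math


def kmeans(vectors, k, iterations=100):
    """Lloyd's k-means; the first k vectors seed the centroids so the sketch is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid (Euclidean distance)
        new_assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no vector changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty)
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical fraction of fragment intensity at each of five cleavage positions
patterns = [
    [0.05, 0.10, 0.15, 0.20, 0.50],  # most intensity at the last cleavage site (c(n-1)-like)
    [0.04, 0.12, 0.14, 0.18, 0.52],
    [0.40, 0.30, 0.10, 0.10, 0.10],  # most intensity near the N-terminus
    [0.45, 0.25, 0.10, 0.10, 0.10],
]
labels, _ = kmeans(patterns, k=2)
print(labels)
```

Seeding from the first k vectors keeps the sketch reproducible; practical use would favor k-means++ seeding or several random restarts, since Lloyd's algorithm only reaches a local optimum.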
Affiliation(s)
- Wenzhou Li
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona 85721, United States
14
Yadav AK, Kumar D, Dash D. MassWiz: A Novel Scoring Algorithm with Target-Decoy Based Analysis Pipeline for Tandem Mass Spectrometry. J Proteome Res 2011; 10:2154-60. [DOI: 10.1021/pr200031z]
Affiliation(s)
- Amit Kumar Yadav
- Institute of Genomics and Integrative Biology (CSIR), Mall Road, Delhi, India
- Dhirendra Kumar
- Institute of Genomics and Integrative Biology (CSIR), Mall Road, Delhi, India
- Debasis Dash
- Institute of Genomics and Integrative Biology (CSIR), Mall Road, Delhi, India
15
Li W, Ji L, Goya J, Tan G, Wysocki VH. SQID: an intensity-incorporated protein identification algorithm for tandem mass spectrometry. J Proteome Res 2011; 10:1593-602. [PMID: 21204564 DOI: 10.1021/pr100959y]
Abstract
To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of the coarse intensity from a statistical analysis. The scoring scheme was validated by comparison with Sequest and X!Tandem using three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL license at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.
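The core idea, rewarding matched fragments according to how intense they are expected to be, can be sketched as follows. This covers only the general idea: the code bins observed peaks into coarse rank-based intensity classes and adds a log-odds term for each matched b- or y-ion, and the class boundaries, probabilities and function names are all hypothetical rather than SQID's actual scoring scheme.

```python
import math

# Hypothetical probabilities that a true b- or y-ion falls in each coarse intensity class,
# and that a random peak does (background); SQID's actual statistics are not reproduced here
EXPECTED = {
    "y": {"high": 0.5, "medium": 0.3, "low": 0.2},
    "b": {"high": 0.3, "medium": 0.4, "low": 0.3},
}
BACKGROUND = {"high": 0.1, "medium": 0.2, "low": 0.7}


def intensity_classes(peaks, high_n=10, medium_n=30):
    """Map each peak's m/z to 'high', 'medium' or 'low' by its intensity rank."""
    ranked = sorted(peaks, key=lambda peak: peak[1], reverse=True)
    return {
        mz: "high" if rank < high_n else "medium" if rank < medium_n else "low"
        for rank, (mz, _) in enumerate(ranked)
    }


def coarse_intensity_score(peaks, theoretical_ions, tol=0.5, high_n=10, medium_n=30):
    """Sum log-odds over theoretical (ion_type, m/z) pairs that match an observed peak."""
    classes = intensity_classes(peaks, high_n, medium_n)
    score = 0.0
    for ion_type, theo_mz in theoretical_ions:
        hits = [mz for mz in classes if abs(mz - theo_mz) <= tol]
        if hits:
            cls = classes[min(hits, key=lambda mz: abs(mz - theo_mz))]
            score += math.log(EXPECTED[ion_type][cls] / BACKGROUND[cls])
    return score


peaks = [(175.12, 900.0), (262.15, 300.0), (375.24, 40.0)]  # hypothetical (m/z, intensity)
ions = [("y", 175.119), ("y", 262.151), ("b", 500.0)]
print(round(coarse_intensity_score(peaks, ions, high_n=1, medium_n=2), 3))
```

Unmatched theoretical ions contribute nothing here; a fuller score would also penalize them.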
Affiliation(s)
- Wenzhou Li
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona 85721, United States
16
Walworth MJ, Stankovich JJ, Van Berkel GJ, Schulz M, Minarik S, Nichols J, Reich E. Hydrophobic Treatment Enabling Analysis of Wettable Surfaces Using a Liquid Microjunction Surface Sampling Probe/Electrospray Ionization-Mass Spectrometry System. Anal Chem 2010; 83:591-7. [DOI: 10.1021/ac102634e]
Affiliation(s)
- Matthew J. Walworth
- Organic and Biological Mass Spectrometry Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6131, United States; Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States; Thin-Layer Chromatography Laboratory, Performance and Life Science Chemicals, Merck KGaA, 64293 Darmstadt, Germany; CAMAG Scientific, Inc., Wilmington, North Carolina 28401, United States; and CAMAG-Laboratory, Muttenz, Switzerland
- Joseph J. Stankovich
- Organic and Biological Mass Spectrometry Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6131, United States; Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States; Thin-Layer Chromatography Laboratory, Performance and Life Science Chemicals, Merck KGaA, 64293 Darmstadt, Germany; CAMAG Scientific, Inc., Wilmington, North Carolina 28401, United States; and CAMAG-Laboratory, Muttenz, Switzerland
- Gary J. Van Berkel
- Organic and Biological Mass Spectrometry Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6131, United States; Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States; Thin-Layer Chromatography Laboratory, Performance and Life Science Chemicals, Merck KGaA, 64293 Darmstadt, Germany; CAMAG Scientific, Inc., Wilmington, North Carolina 28401, United States; and CAMAG-Laboratory, Muttenz, Switzerland
- Michael Schulz
- Organic and Biological Mass Spectrometry Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6131, United States; Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States; Thin-Layer Chromatography Laboratory, Performance and Life Science Chemicals, Merck KGaA, 64293 Darmstadt, Germany; CAMAG Scientific, Inc., Wilmington, North Carolina 28401, United States; and CAMAG-Laboratory, Muttenz, Switzerland
| | - Susanne Minarik
- Organic and Biological Mass Spectrometry Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6131, United States; Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States; Thin-Layer Chromatography Laboratory, Performance and Life Science Chemicals, Merck KGaA, 64293 Darmstadt, Germany; CAMAG Scientific, Inc., Wilmington, North Carolina 28401, United States; and CAMAG-Laboratory, Muttenz, Switzerland
| | - Judy Nichols
- Organic and Biological Mass Spectrometry Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6131, United States; Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States; Thin-Layer Chromatography Laboratory, Performance and Life Science Chemicals, Merck KGaA, 64293 Darmstadt, Germany; CAMAG Scientific, Inc., Wilmington, North Carolina 28401, United States; and CAMAG-Laboratory, Muttenz, Switzerland
| | - Eike Reich
- Organic and Biological Mass Spectrometry Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6131, United States; Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996-1600, United States; Thin-Layer Chromatography Laboratory, Performance and Life Science Chemicals, Merck KGaA, 64293 Darmstadt, Germany; CAMAG Scientific, Inc., Wilmington, North Carolina 28401, United States; and CAMAG-Laboratory, Muttenz, Switzerland
| |
Collapse
|
17
|
Tharakan R, Edwards N, Graham DRM. Data maximization by multipass analysis of protein mass spectra. Proteomics 2010; 10:1160-71. [DOI: 10.1002/pmic.200900433] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]

18
Gucinski AC, Dodds ED, Li W, Wysocki VH. Understanding and exploiting peptide fragment ion intensities using experimental and informatic approaches. Methods Mol Biol 2010; 604:73-94. [PMID: 20013365 DOI: 10.1007/978-1-60761-444-9_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/28/2023]
Abstract
Tandem mass spectrometry is a widely used tool in proteomics. This section will address the properties that describe how protonated peptides fragment when activated by collisions in a mass spectrometer and how that information can be used to identify proteins. A review of the mobile proton model is presented, along with a summary of commonly observed peptide cleavage enhancements, including the proline effect. The methods used to elucidate peptide dissociation chemistry by using both small groups of model peptides and large datasets are also discussed. Finally, the role of peak intensity in commercially available and developmental peptide identification algorithms is examined.
Affiliation(s)
- Ashley C Gucinski
- Department of Chemistry and Biochemistry, The University of Arizona, Tucson, AZ, USA

19
Emory JF, Walworth MJ, Van Berkel GJ, Schulz M, Minarik S. Direct analysis of reversed-phase high-performance thin layer chromatography separated tryptic protein digests using a liquid microjunction surface sampling probe/electrospray ionization mass spectrometry system. Eur J Mass Spectrom (Chichester) 2010; 16:21-33. [PMID: 20065522 DOI: 10.1255/ejms.1041] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 05/16/2023]
Abstract
The sampling, ionization and detection of tryptic peptides separated in one dimension on reversed-phase high-performance thin layer chromatography (HPTLC) plates was performed using liquid microjunction surface sampling probe electrospray ionization mass spectrometry. Tryptic digests of five proteins [cytochrome c, myoglobin, beta-casein, lysozyme and bovine serum albumin (BSA)] were spotted on reversed-phase HPTLC RP-8 F254s and HPTLC RP-18 F254s plates. The plates were then developed using 70/30 methanol/water with 0.1 M ammonium acetate. A dual-purpose extraction/electrospray solution containing 70/30/0.1 water/methanol/formic acid was infused through the sampling probe during analysis of the developed lanes. Both full-scan mass spectra and data-dependent tandem mass spectra were acquired for each development lane to detect and verify the peptide distributions. Data-dependent tandem mass spectra provided both protein identification and sequence coverage information. The highest sequence coverages were achieved for cytochrome c and myoglobin (62.5% and 58.3%, respectively) on reversed-phase RP-8 plates. While the tryptic peptides were separated enough for identification, the peptide bands overlapped somewhat, with most peptides located in the lower half of the development lane. Proteins whose peptides were better separated gave higher sequence coverage. Larger proteins such as beta-casein and BSA, which were spotted in lower relative amounts, gave much lower sequence coverage than the smaller proteins.
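Sequence coverage, as reported above, is the percentage of a protein's residues spanned by at least one identified peptide. The following is a minimal sketch of that calculation. It assumes exact substring matching of unmodified peptide sequences. The function name and the toy sequences are hypothetical and are not data from the study.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percentage of residues in `protein` covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in set(peptides):
        if not peptide:
            continue
        start = protein.find(peptide)
        # A peptide may occur more than once; mark every occurrence.
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy example: 7 of 10 residues are covered (overlapping residues count once).
print(sequence_coverage("ABCDEFGHIJ", ["ABC", "EFG", "FGH"]))  # 70.0
```

Overlapping peptides do not inflate the value, because each residue is counted once regardless of how many peptides span it.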
MESH Headings
- Amino Acid Sequence
- Animals
- Caseins/analysis
- Caseins/isolation & purification
- Cattle
- Chickens
- Chromatography, High Pressure Liquid/instrumentation
- Chromatography, High Pressure Liquid/methods
- Chromatography, Reverse-Phase/instrumentation
- Chromatography, Reverse-Phase/methods
- Chromatography, Thin Layer/instrumentation
- Chromatography, Thin Layer/methods
- Cytochromes c/analysis
- Cytochromes c/isolation & purification
- Equipment Design
- Horses
- Molecular Sequence Data
- Muramidase/analysis
- Muramidase/isolation & purification
- Myoglobin/analysis
- Myoglobin/isolation & purification
- Proteins/analysis
- Proteins/isolation & purification
- Serum Albumin, Bovine/analysis
- Serum Albumin, Bovine/isolation & purification
- Spectrometry, Mass, Electrospray Ionization/instrumentation
- Spectrometry, Mass, Electrospray Ionization/methods
Affiliation(s)
- Joshua F Emory
- Organic and Biological Mass Spectrometry Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6131, USA

20
Thompson MR, Chourey K, Froelich JM, Erickson BK, VerBerkmoes NC, Hettich RL. Experimental approach for deep proteome measurements from small-scale microbial biomass samples. Anal Chem 2008; 80:9517-25. [PMID: 19072265 DOI: 10.1021/ac801707s] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Many methods of microbial proteome characterization require large quantities of cellular biomass (>1-2 g) for sample preparation and protein identification. Our experimental approach differs from traditional techniques in that it can identify the proteomic state of a microbe from a few milligrams of starting cellular material. The small-scale guanidine lysis method minimizes sample loss by achieving cellular lysis and protein digestion in a single-tube experiment. For this experimental approach, the freshwater microbe Shewanella oneidensis MR-1 and the purple non-sulfur bacterium Rhodopseudomonas palustris CGA0010 were used as model organisms for technology development and evaluation. A 2-D LC-MS/MS comparison between a standard sonication lysis method and the small-scale guanidine lysis technique demonstrates that the guanidine lysis method is more efficient with smaller amounts of cell pellet (i.e., down to 1 mg). The described methodology enables deeper proteome measurements from a few milliliters of confluent bacterial cultures. We also report a new protocol for efficient lysis of small amounts of natural biofilm samples for deep proteome measurements, which should greatly enhance the emerging field of environmental microbial community proteomics. This straightforward sample boiling protocol is complementary to the small-scale guanidine lysis technique, is amenable to small sample quantities, and requires no special reagents that might complicate the MS measurements.
Affiliation(s)
- Melissa R Thompson
- Graduate School of Genome Science and Technology, Oak Ridge National Laboratory-University of Tennessee, Knoxville, Tennessee 37830, USA

21
Unell M, Abraham PE, Shah M, Zhang B, Rückert C, VerBerkmoes NC, Jansson JK. Impact of phenolic substrate and growth temperature on the Arthrobacter chlorophenolicus proteome. J Proteome Res 2009; 8:1953-64. [PMID: 19714879 DOI: 10.1021/pr800897c] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
We compared the Arthrobacter chlorophenolicus proteome during growth on 4-chlorophenol, 4-nitrophenol, or phenol at 5 and 28 °C, for both the wild-type and a mutant strain, using mass spectrometry-based proteomics. A label-free workflow employing spectral counting identified 3749 proteins across all growth conditions, representing over 70% of the predicted genome; 739 of these proteins form the core proteome. Statistically significant differences were found in the proteomes of cells grown under different conditions, including differentiation of hundreds of unknown proteins. The 4-chlorophenol degradation pathway was confirmed, but the phenol degradation pathway was not.
Affiliation(s)
- Maria Unell
- Department of Microbiology, Swedish University of Agricultural Sciences, Box 7025, 750 07 Uppsala, Sweden

22
Shao C, Sun W, Li F, Yang R, Zhang L, Gao Y. Oscore: a combined score to reduce false negative rates for peptide identification in tandem mass spectrometry analysis. J Mass Spectrom 2009; 44:25-31. [PMID: 18698557 DOI: 10.1002/jms.1466] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/26/2023]
Abstract
Tandem mass spectrometry (MS/MS) has been widely used in proteomics studies. Multiple algorithms have been developed for assessing matches between MS/MS spectra and peptide sequences in databases. However, it is still a challenge to reduce false negative rates without compromising the high confidence of peptide identification. In this study, we developed a combined score, Oscore, by logistic regression using SEQUEST and AMASS variables to identify fully tryptic peptides. Because these variables showed complex associations with each other, combining them, rather than applying each of them to a threshold model, improved the classification of correct and incorrect peptide identifications. Oscore achieved both a lower false negative rate and a lower false positive rate than PeptideProphet on datasets from 18 known protein mixtures and several proteome-scale samples of different complexity, database size and separation methods. Through a three-way comparison among Oscore, PeptideProphet and another logistic regression model that used PeptideProphet's variables, the main contributor to the improvement made by Oscore is discussed.
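Oscore combines several search-engine variables through logistic regression instead of thresholding each one separately. The sketch below illustrates only that general idea. It fits a two-feature logistic model by plain batch gradient descent. The feature values, labels, learning rate, epoch count and function names are all hypothetical assumptions. They are not the published Oscore variables or coefficients.

```python
import math


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(features, labels, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the cross-entropy loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def combined_score(x, w, b):
    """Probability-like combined score for one PSM's feature vector."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical toy PSMs: two search-engine variables per match; label 1 = correct.
X = [[2.8, 0.30], [3.1, 0.25], [2.5, 0.20], [1.2, 0.05], [1.5, 0.08], [0.9, 0.02]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(combined_score([3.1, 0.25], w, b), combined_score([0.9, 0.02], w, b))
```

The fitted model produces a single score per match. A cutoff can then be applied to that one value rather than to each variable separately.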
Affiliation(s)
- Chen Shao
- Department of Physiology and Pathophysiology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing, China

23
Pasilis SP, Kertesz V, Van Berkel GJ, Schulz M, Schorcht S. HPTLC/DESI-MS imaging of tryptic protein digests separated in two dimensions. J Mass Spectrom 2008; 43:1627-35. [PMID: 18563861 DOI: 10.1002/jms.1431] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 05/26/2023]
Abstract
Desorption electrospray ionization mass spectrometry (DESI-MS) was demonstrated as a method to detect and identify peptides from two-dimensional separations of cytochrome c and myoglobin tryptic digests on ProteoChrom HPTLC Cellulose sheets. Data-dependent tandem mass spectra were acquired during lane scans across the TLC plates. Peptides and the corresponding proteins were identified using protein database search software. Two-dimensional distributions of identified peptides were mapped for each separated protein digest. Sequence coverages for cytochrome c and myoglobin were 81% and 74%, respectively. These compared well with those determined using the more standard HPLC/ESI-MS/MS approach (89% and 84%, respectively). Preliminary results show that more sensitive instrumentation has the potential to improve detection of peptides with low Rf values and to improve sequence coverage. However, less multiple charging and more sodiation were observed in HPTLC/DESI-MS spectra than in HPLC/ESI-MS spectra, which can affect peptide identification by MS/MS. Methods to increase multiple charging and reduce the extent of sodiation are currently under investigation.
Affiliation(s)
- Sofie P Pasilis
- Organic and Biological Mass Spectrometry Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6131, USA

24
Nesatyy VJ, Suter MJF. Analysis of environmental stress response on the proteome level. Mass Spectrom Rev 2008; 27:556-74. [PMID: 18553564 DOI: 10.1002/mas.20177] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 05/26/2023]
Abstract
Thousands of man-made chemicals are released into the environment every year by agriculture, transport, industry, and other human activities. In general, chemical analysis of environmental samples to assess the pollution status of a specific ecosystem is hampered by the complexity of the mixture and, in some cases, by the very low toxicity thresholds of the chemicals present. In that sense, a proteomics approach, capable of detecting subtle changes in the level and structure of individual proteins within the whole proteome in response to altered surroundings, has obvious applications in the field of ecotoxicology. In addition to identifying new protein biomarkers, it can also provide insight into the underlying mechanisms of toxicity. Despite being a comparatively new field with a number of caveats, proteomics applications have spread from microorganisms and plants to invertebrates and vertebrates, gradually becoming an established technology in environmental research. This review article highlights recent advances in the field of environmental proteomics, focusing mainly on experimental approaches with the potential to reveal toxic modes of action and to identify novel ecotoxicological biomarkers.
Affiliation(s)
- Victor J Nesatyy
- Eawag-Swiss Federal Institute of Aquatic Science and Technology, Ueberlandstrasse 133, PO Box 611, 8600 Duebendorf, Switzerland

25
Savidor A, Donahoo RS, Hurtado-Gonzales O, Land ML, Shah MB, Lamour KH, McDonald WH. Cross-species global proteomics reveals conserved and unique processes in Phytophthora sojae and Phytophthora ramorum. Mol Cell Proteomics 2008; 7:1501-16. [PMID: 18316789 PMCID: PMC2500229 DOI: 10.1074/mcp.m700431-mcp200] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 09/11/2007] [Revised: 01/23/2008] [Indexed: 11/06/2022]
Abstract
Phytophthora ramorum and Phytophthora sojae are destructive plant pathogens. P. sojae has a narrow host range, whereas P. ramorum has a wide host range. A global proteomics comparison of the vegetative (mycelium) and infective (germinating cyst) life stages of P. sojae and P. ramorum was conducted to identify candidate proteins involved in host range, early infection, and vegetative growth. Sixty-two candidates for early infection, 26 candidates for vegetative growth, and numerous proteins that may be involved in defining host specificity were identified. In addition, common life stage proteomic trends between the organisms were observed. In mycelia, proteins involved in transport and metabolism of amino acids, carbohydrates, and other small molecules were up-regulated. In the germinating cysts, up-regulated proteins associated with lipid transport and metabolism, cytoskeleton, and protein synthesis were observed. It appears that the germinating cyst catabolizes lipid reserves through the beta-oxidation pathway to drive the extensive protein synthesis necessary to produce the germ tube and initiate infection. Once inside the host, the pathogen switches to vegetative growth in which energy is derived from glycolysis and utilized for synthesis of amino acids and other molecules that assist survival in the plant tissue.
Affiliation(s)
- Alon Savidor
- Graduate School of Genome Science and Technology, University of Tennessee-Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, USA

26
Zhou C, Bowler LD, Feng J. A machine learning approach to explore the spectra intensity pattern of peptides using tandem mass spectrometry data. BMC Bioinformatics 2008; 9:325. [PMID: 18664292 PMCID: PMC2529326 DOI: 10.1186/1471-2105-9-325] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 03/25/2008] [Accepted: 07/30/2008] [Indexed: 11/25/2022]
Abstract
Background: A better understanding of the mechanisms involved in gas-phase fragmentation of peptides is essential for the development of more reliable algorithms for high-throughput protein identification using mass spectrometry (MS). Current methodologies depend predominantly on the derived m/z values of fragment ions, and the knowledge provided by the intensity information in MS/MS spectra has not been fully exploited. Indeed, spectrum intensity information is very rarely used in the algorithms currently applied to high-throughput protein identification.
Results: In this work, a Bayesian neural network approach is employed to analyze ion intensity information present in 13878 different MS/MS spectra. The influence of a library of 35 features on peptide fragmentation is examined under different proton mobility conditions. Useful rules involved in peptide fragmentation are found, and subsets of features that have a significant influence on the fragmentation pathway of peptides are characterised. An intensity model is built from the selected features, and the model can accurately predict the intensity patterns of given MS/MS spectra. The predictions include not only the mean values of spectral intensity but also the variances, which can be used to tolerate noise and system biases in experimental MS/MS spectra.
Conclusion: The intensity patterns of fragmentation spectra are informative and can be used to analyze the influence of various characteristics of fragmented peptides on their fragmentation pathway. The features with significant influence can in turn be used to predict spectral intensities. Such information can help develop more reliable algorithms for peptide and protein identification.
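The abstract notes that the model predicts both a mean and a variance for each fragment intensity, so that noisy peaks can be tolerated. One simple way to use such predictions for scoring is a Gaussian log-likelihood of the observed intensities, sketched below. This scoring rule is an illustrative assumption, not the paper's method. The intensity values and the function name are hypothetical.

```python
import math


def gaussian_log_likelihood(observed, pred_mean, pred_var):
    """Sum of per-peak Gaussian log-densities of observed intensities.

    A peak with a large predicted variance is penalized less for deviating
    from its predicted mean, which is one way to tolerate noisy peaks.
    """
    total = 0.0
    for x, mu, var in zip(observed, pred_mean, pred_var):
        if var <= 0:
            raise ValueError("predicted variance must be positive")
        total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total


# Hypothetical normalized intensities for three fragment ions.
observed = [0.70, 0.20, 0.10]
pred_mean = [0.50, 0.25, 0.10]
tight = gaussian_log_likelihood(observed, pred_mean, [0.01, 0.01, 0.01])
loose = gaussian_log_likelihood(observed, pred_mean, [0.10, 0.01, 0.01])
print(tight, loose)  # the larger variance on peak 1 softens its 0.2 deviation
```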
Affiliation(s)
- Cong Zhou
- Department of Computer Science and Mathematics, University of Warwick, Coventry CV4 7AL, UK.

27
Froelich JM, Reid GE. The origin and control of ex vivo oxidative peptide modifications prior to mass spectrometry analysis. Proteomics 2008; 8:1334-45. [PMID: 18306178 DOI: 10.1002/pmic.200700792] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 11/08/2022]
Abstract
The origin and control of ex vivo, sample-handling-related oxidative modifications of methionine-, S-alkyl cysteine-, and tryptophan-containing peptides obtained from typical "in-solution" or "in-gel" proteolytic digestion strategies have been examined by capillary HPLC and MS/MS. Increased oxidation levels were found to originate predominantly from the extensive ex vivo sample handling steps required for gel electrophoresis and/or in-gel proteolytic digestion of proteins prior to analysis by MS. Conditions for deliberately controlling the oxidation state (both oxidation and reduction) of these peptides, as well as of those containing cysteine, have been evaluated using a series of model synthetic peptides and standard tryptic protein digests. Essentially complete oxidation of methionine- and S-alkyl cysteine-containing peptides was achieved by reaction with 30% hydrogen peroxide/5% acetic acid at room temperature for 30 min. Under these conditions, cysteine was also converted to cysteic acid, while only limited oxidation of tryptophan to oxindolylalanine, and of methionine and S-alkyl cysteine sulfoxides to their respective sulfones, was observed. Efficient reduction of methionine- and S-alkyl cysteine sulfoxide-containing peptides was achieved by reaction in 1 M dimethylsulfide/10 M hydrochloric acid at room temperature for 10 and 45 min, respectively. None of the reduction conditions evaluated resulted in the reduction of oxindolylalanine, cysteic acid, or methionine sulfone.
Affiliation(s)
- Jennifer M Froelich
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA

28
McClintock C, Kertesz V, Hettich RL. Development of an electrochemical oxidation method for probing higher order protein structure with mass spectrometry. Anal Chem 2008; 80:3304-17. [DOI: 10.1021/ac702493a] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 01/18/2023]
Affiliation(s)
- Carlee McClintock
- Vilmos Kertesz
- Robert L. Hettich
- Graduate School of Genome Science and Technology, University of Tennessee-Oak Ridge National Laboratory, 1060 Commerce Park, Oak Ridge, Tennessee 37830, and Chemical Sciences Division, Oak Ridge National Laboratory, P.O. Box 2008, MS 6131, Oak Ridge, Tennessee 37831

29
Using HPTLC/DESI-MS for peptide identification in 1D separations of tryptic protein digests. Anal Bioanal Chem 2008; 391:317-24. [PMID: 18264700 DOI: 10.1007/s00216-008-1874-6] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 11/06/2007] [Revised: 01/08/2008] [Accepted: 01/10/2008] [Indexed: 10/22/2022]
Abstract
Desorption electrospray ionization mass spectrometry (DESI-MS) was investigated as a method to detect and identify peptides from tryptic digests of cytochrome c and myoglobin separated on ProteoChrom HPTLC Silica gel 60 F254s plates and ProteoChrom HPTLC Cellulose sheets. Full-scan mass spectra and data-dependent tandem mass spectra were acquired in separate plate scans and used to identify peptide ions. Peptide distributions along the development lane were mapped for each separated protein digest. Signal levels ranged over several orders of magnitude. In general, the highest signal levels were obtained for the peptides with the highest Rf values on a plate, while peptides with very low Rf values were often not detected. Sequence coverages for cytochrome c were 58% for the digest separated on the silica gel plate and 72% for the separation on the cellulose sheet; myoglobin sequence coverages were 62% and 68% on silica gel and cellulose, respectively. Weak correlations between peptide hydrophilicity and Rf values on the silica gel and cellulose plates were found, with the more hydrophilic peptides having lower Rf values.

30
Khatun J, Hamlett E, Giddings MC. Incorporating sequence information into the scoring function: a hidden Markov model for improved peptide identification. Bioinformatics 2008; 24:674-81. [PMID: 18187442 DOI: 10.1093/bioinformatics/btn011] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
Motivation: The identification of peptides by tandem mass spectrometry (MS/MS) is a central method of proteomics research, but due to the complexity of MS/MS data and the large databases searched, the accuracy of peptide identification algorithms remains limited. To improve the accuracy of identification, we applied a machine-learning approach using a hidden Markov model (HMM) to capture the complex and often subtle links between a peptide sequence and its MS/MS spectrum.
Model: Our model, HMM_Score, represents ion types as HMM states and calculates the maximum joint probability for a peptide/spectrum pair using emission probabilities from three factors: the amino acids adjacent to each fragmentation site, the mass dependence of ion types and the intensity dependence of ion types. The Viterbi algorithm is used to calculate the most probable assignment between ion types in a spectrum and a peptide sequence; a correction factor is then added to account for the model's propensity to favor longer peptides. An expectation value is calculated from the model score to assess the significance of each peptide/spectrum match.
Results: We trained and tested HMM_Score on three data sets generated by two different mass spectrometer types. For a reference data set recently reported in the literature and validated using seven identification algorithms, HMM_Score produced 43% more positive identifications at a 1% false positive rate than the better of two other commonly used algorithms, Mascot and X!Tandem. HMM_Score is a highly accurate platform for peptide identification that works well for a variety of mass spectrometer and biological sample types.
Availability: The program is freely available on ProteomeCommons via an OpenSource license. See http://bioinfo.unc.edu/downloads/ for the download link.
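HMM_Score uses the Viterbi algorithm to find the most probable assignment of ion types. The sketch below is a generic, log-space Viterbi decoder that illustrates that step. The two states, the three discretized intensity bins and all probabilities are hypothetical toy values. They are not HMM_Score's trained parameters. The published model's emissions also depend on adjacent amino acids and mass, which this sketch omits.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log-probability, state path) of the most probable hidden-state path."""
    # best[s]: best log-probability of any path ending in state s at the current step
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[r] + math.log(trans_p[r][s]))
            new_best[s] = best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace the best final state back through the stored pointers.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Hypothetical toy model: is the peak at each site explained by a fragment ion or noise?
states = ("ion", "noise")
start_p = {"ion": 0.6, "noise": 0.4}
trans_p = {"ion": {"ion": 0.7, "noise": 0.3}, "noise": {"ion": 0.4, "noise": 0.6}}
emit_p = {
    "ion": {"high": 0.5, "medium": 0.4, "low": 0.1},
    "noise": {"high": 0.1, "medium": 0.3, "low": 0.6},
}
log_p, path = viterbi(["high", "medium", "low"], states, start_p, trans_p, emit_p)
print(path, math.exp(log_p))  # ['ion', 'ion', 'noise'] and approximately 0.01512
```

Working in log space avoids numerical underflow when many fragmentation sites are multiplied together.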
Affiliation(s)
- Jainab Khatun
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA

31
Palumbo AM, Tepe JJ, Reid GE. Mechanistic insights into the multistage gas-phase fragmentation behavior of phosphoserine- and phosphothreonine-containing peptides. J Proteome Res 2008; 7:771-9. [DOI: 10.1021/pr0705136] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Indexed: 01/29/2023]
Affiliation(s)
- Amanda M. Palumbo
- Jetze J. Tepe
- Gavin E. Reid
- Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824

32
Thompson MR, Thompson DK, Hettich RL. Systematic assessment of the benefits and caveats in mining microbial post-translational modifications from shotgun proteomic data: the response of Shewanella oneidensis to chromate exposure. J Proteome Res 2008; 7:648-58. [PMID: 18171020 DOI: 10.1021/pr070531n] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Microbes are known to regulate both gene expression and protein activity through the use of post-translational modifications (PTMs). Common PTMs involved in cellular signaling and gene control include methylations, acetylations, and phosphorylations, whereas oxidations have been implicated as an indicator of stress. Shewanella oneidensis MR-1 is a Gram-negative bacterium that demonstrates both respiratory versatility and the ability to sense and adapt to diverse environmental conditions. The data set used in this study consisted of tandem mass spectra derived from mid-log phase aerobic cultures of S. oneidensis, either native or shocked with 1 mM chromate [Cr(VI)]. In this study, three algorithms (DBDigger, Sequest, and InsPecT) were evaluated for their ability to scrutinize shotgun proteomic data for evidence of PTMs. The use of conservative scoring filters for peptides or proteins versus creating a subdatabase first from a nonmodification search was evaluated with DBDigger. The use of higher-scoring filters for peptide identifications was found to result in optimal identifications of PTM peptides with a 2% false discovery rate (FDR) for the total data set using the DBDigger algorithm. However, the FDR climbs to unacceptably high levels when only PTM peptides are considered. Sequest was evaluated as a method for confirming PTM peptides putatively identified using DBDigger; however, there was a low identification rate (approximately 25%) for the searched spectra. InsPecT was found to have a much lower, and thus more acceptable, FDR than DBDigger for PTM peptides. Comparisons between InsPecT and DBDigger were made with respect to both the FDR and PTM peptide identifications. As a demonstration of this approach, a number of S. oneidensis chemotaxis proteins as well as low-abundance signal transduction proteins were identified as being post-translationally modified in response to chromate challenge.
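The abstract reports a low FDR for the full data set that rises sharply when only PTM peptides are considered. The abstract does not state how the FDR was estimated. The sketch below assumes the common target-decoy estimate (decoy hits divided by target hits above a score threshold). It shows how a small subset can carry a much higher FDR than the whole. The `PSM` record, the function name and the counts are hypothetical.

```python
from typing import Iterable, NamedTuple


class PSM(NamedTuple):
    score: float
    is_decoy: bool
    is_modified: bool


def decoy_fdr(psms: Iterable[PSM], threshold: float) -> float:
    """Estimate FDR among PSMs scoring >= threshold as decoy hits / target hits."""
    passing = [p for p in psms if p.score >= threshold]
    decoys = sum(1 for p in passing if p.is_decoy)
    targets = len(passing) - decoys
    return decoys / targets if targets else 0.0


# Hypothetical PSMs above threshold: 50 unmodified targets, 1 unmodified decoy,
# 5 modified targets and 1 modified decoy.
psms = (
    [PSM(3.0, False, False)] * 50
    + [PSM(3.0, True, False)]
    + [PSM(3.0, False, True)] * 5
    + [PSM(3.0, True, True)]
)
fdr_all = decoy_fdr(psms, threshold=2.5)
fdr_ptm = decoy_fdr([p for p in psms if p.is_modified], threshold=2.5)
print(f"all: {fdr_all:.3f}, PTM-only: {fdr_ptm:.3f}")  # all: 0.036, PTM-only: 0.200
```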
Affiliation(s)
- Melissa R Thompson
- Graduate School of Genome Science and Technology, Oak Ridge National Laboratory and University of Tennessee, Knoxville, Tennessee 37830, USA

33
Dodds ED, Clowers BH, Hagerman PJ, Lebrilla CB. Systematic characterization of high mass accuracy influence on false discovery and probability scoring in peptide mass fingerprinting. Anal Biochem 2007; 372:156-66. [PMID: 17980142 DOI: 10.1016/j.ab.2007.10.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 06/25/2007] [Revised: 10/01/2007] [Accepted: 10/08/2007] [Indexed: 11/29/2022]
Abstract
Whereas the bearing of mass measurement error on protein identification is sometimes underestimated, uncertainty in observed peptide masses unavoidably translates to ambiguity in subsequent protein identifications. Although ongoing instrumental advances continue to make high accuracy mass spectrometry (MS) increasingly accessible, many proteomics experiments are still conducted with rather large mass error tolerances. In addition, the ranking schemes of most protein identification algorithms do not include a meaningful incorporation of mass measurement error. This article provides a critical evaluation of mass error tolerance as it pertains to false positive peptide and protein associations resulting from peptide mass fingerprint (PMF) database searching. High accuracy, high resolution PMFs of several model proteins were obtained using matrix-assisted laser desorption/ionization Fourier transform ion cyclotron resonance mass spectrometry (MALDI-FTICR-MS). Varying levels of mass accuracy were simulated by systematically modulating the mass error tolerance of the PMF query and monitoring the effect on figures of merit indicating the PMF quality. Importantly, the benefits of decreased mass error tolerance are not manifest in Mowse scores when operating at tolerances in the low parts-per-million range but become apparent with the consideration of additional metrics that are often overlooked. Furthermore, the outcomes of these experiments support the concept that false discovery is closely tied to mass measurement error in PMF analysis. Clear establishment of this relation demonstrates the need for mass error-aware protein identification routines and argues for a more prominent contribution of high accuracy mass measurement to proteomic science.
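The mass error tolerance discussed above is usually expressed in parts per million (ppm) relative to the theoretical mass. The sketch below uses that standard definition. It shows how the tolerance controls how many candidate peptide masses can match a single observed peak. The masses and function names are hypothetical.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million relative to the theoretical value."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def matches_within(observed_mz: float, candidates: list[float], tol_ppm: float) -> list[float]:
    """Candidate theoretical masses that fall within +/- tol_ppm of the observed mass."""
    return [c for c in candidates if abs(ppm_error(observed_mz, c)) <= tol_ppm]


# Hypothetical theoretical peptide masses near an observed peak at m/z 1000.000.
candidates = [1000.002, 1000.050, 1000.400]
print(matches_within(1000.000, candidates, tol_ppm=5))    # [1000.002]
print(matches_within(1000.000, candidates, tol_ppm=100))  # [1000.002, 1000.05]
```

Each extra candidate admitted by a wider tolerance is a potential false peptide association for that peak.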
Affiliation(s)
- Eric D Dodds
- Department of Chemistry, University of California, Davis, Davis, CA 95616, USA
34
Falkner JA, Kachman M, Veine DM, Walker A, Strahler JR, Andrews PC. Validated MALDI-TOF/TOF mass spectra for protein standards. J Am Soc Mass Spectrom 2007; 18:850-5. [PMID: 17329120 DOI: 10.1016/j.jasms.2007.01.010] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Received: 12/05/2006] [Revised: 01/14/2007] [Accepted: 01/16/2007] [Indexed: 05/14/2023]
Abstract
A current focus of proteomics research is the establishment of acceptable confidence measures in the assignment of protein identifications in an unknown sample. Development of new algorithmic approaches would greatly benefit from a standard reference set of spectra for known proteins for the purpose of testing and training. Here we describe an openly available library of mass spectra generated on an ABI 4700 MALDI TOF/TOF from 246 known, individually purified and trypsin-digested protein samples. The initial full release of the Aurum Dataset includes gel images, peak lists, spectra, search result files, decoy database analysis files, FASTA file of protein sequences, manual curation, and summary pages describing protein coverage and peptides matched by MS/MS followed by decoy database analysis using Mascot, Sequest, and X!Tandem. The data are publicly available for use at ProteomeCommons.org.
Affiliation(s)
- Jayson A Falkner
- National Resource for Proteomics and Pathway, Michigan Proteome Consortium, Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
35
Tabb DL, Fernando CG, Chambers MC. MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res 2007; 6:654-61. [PMID: 17269722 PMCID: PMC2525619 DOI: 10.1021/pr0604054] [Citation(s) in RCA: 428] [Impact Index Per Article: 25.2] [Indexed: 11/29/2022]
Abstract
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. This software and its source code are available under the Mozilla Public License at this URL: http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
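The intensity-stratified scoring idea can be sketched as the multivariate hypergeometric probability of the observed split of matched peaks across intensity classes. This is a minimal illustration only: how MyriMatch actually bins the m/z axis, chooses the number of classes, and converts this probability into its reported score is not reproduced, and the class sizes and match counts below are hypothetical.

```python
from math import comb, log10


def mvh_probability(class_sizes, matches):
    """Multivariate hypergeometric probability of an exact match pattern.

    class_sizes[i]: number of m/z bins in class i (e.g. intense, medium and
        weak peaks, plus a final class of empty bins).
    matches[i]: number of predicted fragment ions that fall into class i.
    """
    if len(class_sizes) != len(matches):
        raise ValueError("class_sizes and matches must have the same length")
    numerator = 1
    for bins_in_class, hits in zip(class_sizes, matches):
        numerator *= comb(bins_in_class, hits)  # comb() returns 0 if hits > bins
    return numerator / comb(sum(class_sizes), sum(matches))


def mvh_score(class_sizes, matches):
    """Higher is better: -log10 of the chance probability of the match pattern."""
    return -log10(mvh_probability(class_sizes, matches))


# Hypothetical spectrum: 5 intense, 10 medium and 20 weak peaks among 1000 bins.
sizes = [5, 10, 20, 965]
# Two hypothetical candidates, each with 10 predicted fragments and 5 matched peaks.
intense_hits = [3, 1, 1, 5]
weak_hits = [0, 1, 4, 5]
print(mvh_score(sizes, intense_hits), mvh_score(sizes, weak_hits))
```

Both candidates match five peaks, so a scorer that only counts matched peaks would tie them; the stratified probability is lower (and the score higher) for the candidate whose matches land on intense peaks.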
Affiliation(s)
- David L Tabb
- Mass Spectrometry Research Center / Departments of Biomedical Informatics and Biochemistry, Vanderbilt University Medical Center, Nashville, TN 37232-8575, USA.
36
Savidor A, Donahoo RS, Hurtado-Gonzales O, Verberkmoes NC, Shah MB, Lamour KH, McDonald WH. Expressed peptide tags: an additional layer of data for genome annotation. J Proteome Res 2006; 5:3048-58. [PMID: 17081056 DOI: 10.1021/pr060134x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Abstract
While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here, we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six-frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well-annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed a high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well-annotated, were also subjected to this method. A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller subdatabases to be searched against, and definition of EPT criteria that accommodate the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While approximately 76% of Phytophthora EPTs supported the current annotation, a portion of them (7.7% and 12.9% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.
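The database-construction step named in the abstract, a six-frame translation of the genome, can be sketched as below using the standard genetic code. The `stop_to_stop_segments` helper and its `min_length` cutoff are hypothetical additions for illustration; the authors' subdatabase generation and EPT criteria are not reproduced.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks ambiguous codons."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Translate frames +1..+3 on the given strand and -1..-3 on its reverse complement."""
    dna = dna.upper()
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


def stop_to_stop_segments(frames: dict, min_length: int = 7) -> dict:
    """Hypothetical helper: split each frame at stops, keep segments >= min_length residues."""
    return {name: [seg for seg in protein.split("*") if len(seg) >= min_length]
            for name, protein in frames.items()}


print(six_frame_translation("ATGGCCTAA"))
```

Keeping the frame label with each protein segment is what allows an identified peptide to be mapped back to a strand and position in the genome.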
Affiliation(s)
- Alon Savidor
- Graduate School of Genome Science and Technology, University of Tennessee-Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, USA
37
Sun S, Meyer-Arendt K, Eichelberger B, Brown R, Yen CY, Old WM, Pierce K, Cios KJ, Ahn NG, Resing KA. Improved validation of peptide MS/MS assignments using spectral intensity prediction. Mol Cell Proteomics 2007; 6:1-17. [PMID: 17018520 DOI: 10.1074/mcp.m600320-mcp200] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Indexed: 11/06/2022]
Abstract
A major limitation in identifying peptides from complex mixtures by shotgun proteomics is the ability of search programs to accurately assign peptide sequences using mass spectrometric fragmentation spectra (MS/MS spectra). Manual analysis is used to assess borderline identifications; however, it is error-prone and time-consuming, and criteria for acceptance or rejection are not well defined. Here we report a Manual Analysis Emulator (MAE) program that evaluates results from search programs by implementing two commonly used criteria: 1) consistency of fragment ion intensities with predicted gas phase chemistry and 2) whether a high proportion of the ion intensity (proportion of ion current (PIC)) in the MS/MS spectra can be derived from the peptide sequence. To evaluate chemical plausibility, MAE utilizes similarity (Sim) scoring against theoretical spectra simulated by MassAnalyzer software (Zhang, Z. (2004) Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 76, 3908-3922) using known gas phase chemical mechanisms. The results show that Sim scores provide significantly greater discrimination between correct and incorrect search results than achieved by Sequest XCorr scoring or Mascot Mowse scoring, allowing reliable automated validation of borderline cases. To evaluate PIC, MAE simplifies the DTA text files summarizing the MS/MS spectra and applies heuristic rules to classify the fragment ions. MAE output also provides data mining functions, which are illustrated by using PIC to identify spectral chimeras, where two or more peptide ions were sequenced together, as well as cases where fragmentation chemistry is not well predicted.
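The proportion of ion current (PIC) criterion can be sketched as the fraction of total peak intensity that lies near a fragment ion derivable from the candidate sequence. This is a simplified illustration under stated assumptions: only singly charged b and y ions are considered (no neutral losses, internal ions or modifications), the 0.02 m/z tolerance and the spectrum are hypothetical, and neither MAE's heuristic ion classification nor its MassAnalyzer-based Sim scoring is reproduced.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def by_ion_mz(peptide: str) -> list:
    """Singly charged b- and y-ion m/z values (no neutral losses or modifications)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b_i
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y_(n-i)
    return ions


def proportion_of_ion_current(peaks, peptide, tolerance=0.02):
    """Fraction of total peak intensity within `tolerance` (m/z) of a b or y ion."""
    fragments = by_ion_mz(peptide)
    total = sum(intensity for _, intensity in peaks)
    explained = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - frag) <= tolerance for frag in fragments)
    )
    return explained / total if total > 0 else 0.0


# Hypothetical centroided spectrum: (m/z, intensity) pairs.
spectrum = [(227.10, 100.0), (148.06, 50.0), (500.00, 50.0)]
print(proportion_of_ion_current(spectrum, "PEPTIDE"))
```

Here the peaks near b2 and y1 of PEPTIDE are explained while the peak at m/z 500 is not, so three quarters of the ion current is attributed to the sequence; a low PIC flags spectra such as the chimeras mentioned in the abstract.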
Affiliation(s)
- Shaojun Sun
- Department of Computer Science and Engineering, University of Colorado at Denver and Health Sciences Center, Denver, Colorado 80217-3364, USA
38
Tabb DL, Shah MB, Strader MB, Connelly HM, Hettich RL, Hurst GB. Determination of peptide and protein ion charge states by Fourier transformation of isotope-resolved mass spectra. J Am Soc Mass Spectrom 2006; 17:903-15. [PMID: 16713712 DOI: 10.1016/j.jasms.2006.02.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 12/02/2005] [Revised: 01/30/2006] [Accepted: 02/01/2006] [Indexed: 05/09/2023]
Abstract
We report an automated method for determining charge states from high-resolution mass spectra. Fourier transforms of isotope packets from high-resolution mass spectra are compared to Fourier transforms of modeled isotopic peak packets for a range of charge states. The charge state for the experimental ion packet is determined by the model isotope packet that yields the best match in the comparison of the Fourier transforms. This strategy is demonstrated for determining peptide ion charge states from "zoom scan" data from a linear quadrupole ion trap mass spectrometer, enabling the subsequent automated identification of singly- through quadruply-charged peptide ions, while reducing the numbers of conflicting identifications from ambiguous charge state assignments. We also apply this technique to determine the charges of intact protein ions from LC-FTICR data, demonstrating that it is more sensitive under these experimental conditions than two existing algorithms. The strategy outlined in this paper should be generally applicable to mass spectra obtained from any instrument capable of isotopic resolution.
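One way to realize the comparison described above is sketched below: model isotope packets for several charges, take the magnitude of their Fourier transforms, and pick the charge whose magnitude spectrum best matches the observed packet's. The Gaussian peak shapes, the fixed hypothetical isotope abundances, and the use of cosine similarity as the comparison are assumptions for illustration; the abstract does not specify these details.

```python
import numpy as np

ISOTOPE_SPACING = 1.00335  # Da; approximate 13C-12C mass difference


def isotope_packet(mz_grid, mono_mz, charge, abundances, width=0.005):
    """Gaussian peaks at mono_mz + k * ISOTOPE_SPACING / charge, scaled by abundances."""
    signal = np.zeros_like(mz_grid)
    for k, abundance in enumerate(abundances):
        center = mono_mz + k * ISOTOPE_SPACING / charge
        signal += abundance * np.exp(-0.5 * ((mz_grid - center) / width) ** 2)
    return signal


def fft_magnitude(signal):
    """Unit-normalized magnitude spectrum; phase (the packet's m/z offset) is discarded."""
    magnitude = np.abs(np.fft.rfft(signal))
    return magnitude / np.linalg.norm(magnitude)


def infer_charge(mz_grid, observed, abundances, charges=(1, 2, 3, 4)):
    """Return the charge whose model packet's FFT magnitude best matches the observed one."""
    observed_fft = fft_magnitude(observed)
    model_mono = mz_grid[0] + 0.5  # placement is arbitrary as long as the packet fits the window
    scores = {}
    for z in charges:
        model = isotope_packet(mz_grid, model_mono, z, abundances)
        scores[z] = float(np.dot(observed_fft, fft_magnitude(model)))
    return max(scores, key=scores.get), scores


grid = np.arange(499.0, 505.0, 0.001)
abundances = [1.0, 0.8, 0.4, 0.15]  # hypothetical relative isotope abundances
observed = isotope_packet(grid, 500.25, 2, abundances)
best_charge, scores = infer_charge(grid, observed, abundances)
print(best_charge, scores)
```

Because the magnitude spectrum is essentially unchanged when a packet shifts within the window, the model packets do not need to be aligned with the observed one in m/z; the charge is read from the periodicity of the isotope spacing (1.00335/z).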
Affiliation(s)
- David L Tabb
- Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Manesh B Shah
- Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Michael Brad Strader
- Chemical Sciences Division, Oak Ridge National Laboratory, MS 6131, P.O. Box 2008, 37831-6131, Oak Ridge, TN, USA
- Heather M Connelly
- Chemical Sciences Division, Oak Ridge National Laboratory, MS 6131, P.O. Box 2008, 37831-6131, Oak Ridge, TN, USA
- Graduate School of Genome Science and Technology, University of Tennessee-Oak Ridge National Laboratory, 1060 Commerce Park, 37830, Oak Ridge, TN
- Robert L Hettich
- Chemical Sciences Division, Oak Ridge National Laboratory, MS 6131, P.O. Box 2008, 37831-6131, Oak Ridge, TN, USA
- Gregory B Hurst
- Chemical Sciences Division, Oak Ridge National Laboratory, MS 6131, P.O. Box 2008, 37831-6131, Oak Ridge, TN, USA