1
Zhan Z, Wang L. Proteoform identification and quantification based on alignment graphs. Bioinformatics 2024; 41:btaf007. [PMID: 39786854 PMCID: PMC11769674 DOI: 10.1093/bioinformatics/btaf007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2024] [Revised: 01/02/2025] [Accepted: 01/06/2025] [Indexed: 01/12/2025] Open
Abstract
MOTIVATION: Proteoforms are the different forms of a protein generated from the genome with various sequence variations, splice isoforms, and post-translational modifications. Proteoforms regulate protein structures and functions. A single protein can have multiple proteoforms due to different modification sites. Proteoform identification is to find the proteoforms of a given protein that best fit the input spectrum. Proteoform quantification is to find the corresponding abundances of different proteoforms for a specific protein.
RESULTS: We proposed algorithms for proteoform identification and quantification based on the top-down tandem mass spectrum. In the combination alignments of the HomMTM spectrum and the reference protein, we need to give a correction of the mass for each matched peak within the pre-defined error range. After the correction, we impose that the mass between any two (not necessarily consecutive) matched nodes in the protein is identical to that of the corresponding two matched peaks in the HomMTM spectrum. We design a back-tracking graph to store this information and find a combinatorial path (k paths) with the minimum sum of peak intensity error in this back-tracking graph. The obtained alignment can also show the relative abundance of these proteoforms (paths). Our experimental results demonstrate the algorithm's capability to identify and quantify proteoform combinations encompassing a greater number of peaks. This advancement holds promise for enhancing the accuracy and comprehensiveness of proteoform quantification, addressing a crucial need in the field of top-down MS-based proteomics.
AVAILABILITY AND IMPLEMENTATION: The software package is available at https://github.com/Zeirdo/TopMGQuant.
Affiliation(s)
- Zhaohui Zhan
  - Department of Engineering, Shenzhen MSU-BIT University, Shenzhen, 518172, China
  - Department of Computer Science, City University of Hong Kong, Hong Kong, 999077, China
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, Hong Kong, 999077, China
  - City University of Hong Kong Shenzhen Research Institution, 518057, China
2
Jeong K, Kaulich PT, Jung W, Kim J, Tholey A, Kohlbacher O. Precursor deconvolution error estimation: The missing puzzle piece in false discovery rate in top-down proteomics. Proteomics 2024; 24:e2300068. [PMID: 37997224 DOI: 10.1002/pmic.202300068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 11/09/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023]
Abstract
Top-down proteomics (TDP) directly analyzes intact proteins and thus provides more comprehensive qualitative and quantitative proteoform-level information than conventional bottom-up proteomics (BUP) that relies on digested peptides and protein inference. While significant advancements have been made in TDP in sample preparation, separation, instrumentation, and data analysis, reliable and reproducible data analysis still remains one of the major bottlenecks in TDP. A key step for robust data analysis is the establishment of an objective estimation of proteoform-level false discovery rate (FDR) in proteoform identification. The most widely used FDR estimation scheme is based on the target-decoy approach (TDA), which has primarily been established for BUP. We present evidence that the TDA-based FDR estimation may not work at the proteoform-level due to an overlooked factor, namely the erroneous deconvolution of precursor masses, which leads to incorrect FDR estimation. We argue that the conventional TDA-based FDR in proteoform identification is in fact protein-level FDR rather than proteoform-level FDR unless precursor deconvolution error rate is taken into account. To address this issue, we propose a formula to correct for proteoform-level FDR bias by combining TDA-based FDR and precursor deconvolution error rate.
Affiliation(s)
- Kyowon Jeong
  - Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Philipp T Kaulich
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Wonhyeuk Jung
  - Department of Cell Biology, Yale School of Medicine, New Haven, Connecticut, USA
- Jihyung Kim
  - Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Andreas Tholey
  - Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, Kiel, Germany
- Oliver Kohlbacher
  - Applied Bioinformatics, Computer Science Department, University of Tübingen, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
  - Translational Bioinformatics, University Hospital Tübingen, Tübingen, Germany
3
Yu Q, Liu X, Keller MP, Navarrete-Perea J, Zhang T, Fu S, Vaites LP, Shuken SR, Schmid E, Keele GR, Li J, Huttlin EL, Rashan EH, Simcox J, Churchill GA, Schweppe DK, Attie AD, Paulo JA, Gygi SP. Sample multiplexing-based targeted pathway proteomics with real-time analytics reveals the impact of genetic variation on protein expression. Nat Commun 2023; 14:555. [PMID: 36732331 PMCID: PMC9894840 DOI: 10.1038/s41467-023-36269-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 08/12/2022] [Accepted: 01/20/2023] [Indexed: 02/04/2023] Open
Abstract
Targeted proteomics enables hypothesis-driven research by measuring the cellular expression of protein cohorts related by function, disease, or class after perturbation. Here, we present a pathway-centric approach and an assay builder resource for targeting entire pathways of up to 200 proteins selected from >10,000 expressed proteins to directly measure their abundances, exploiting sample multiplexing to increase throughput by 16-fold. The strategy, termed GoDig, requires only a single-shot LC-MS analysis, ~1 µg combined peptide material, a list of up to 200 proteins, and real-time analytics to trigger simultaneous quantification of up to 16 samples for hundreds of analytes. We apply GoDig to quantify the impact of genetic variation on protein expression in mice fed a high-fat diet. We create several GoDig assays to quantify the expression of multiple protein families (kinases, lipid metabolism- and lipid droplet-associated proteins) across 480 fully-genotyped Diversity Outbred mice, revealing protein quantitative trait loci and establishing potential linkages between specific proteins and lipid homeostasis.
Affiliation(s)
- Qing Yu
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Xinyue Liu
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Mark P Keller
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Tian Zhang
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Sipei Fu
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Laura P Vaites
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Steven R Shuken
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Ernst Schmid
  - Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Jiaming Li
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Edward L Huttlin
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Edrees H Rashan
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Judith Simcox
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Devin K Schweppe
  - Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
- Alan D Attie
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Joao A Paulo
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
- Steven P Gygi
  - Department of Cell Biology, Harvard Medical School, Boston, MA, 02115, USA
4
Jora M, Corcoran D, Parungao GG, Lobue PA, Oliveira LFL, Stan G, Addepalli B, Limbach PA. Higher-Energy Collisional Dissociation Mass Spectral Networks for the Rapid, Semi-automated Characterization of Known and Unknown Ribonucleoside Modifications. Anal Chem 2022; 94:13958-13967. [PMID: 36174068 DOI: 10.1021/acs.analchem.2c03172] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Abstract
Higher-energy collisional dissociation (HCD) of modified ribonucleosides generates characteristic and highly reproducible nucleoside-specific tandem mass spectra (MS/MS). Here, we demonstrate the capability of HCD spectra in combination with spectral matching for the semi-automated characterization of ribonucleosides. This process involved the generation of an HCD spectral library and the establishment of a mass spectral network for rapid detection with high sensitivity and specificity in a retention time-independent fashion. Systematic spectral matching analysis of the MS/MS spectra of tRNA hydrolysates from different organisms has helped us to uncover evidence for the existence of novel ribonucleoside modifications such as s2Cm and OHyW-14. Such an untargeted label-free approach has the potential to be integrated with other methods, including those that use isotope labeling, to simplify the characterization of unknown modified ribonucleosides. These findings suggest the compilation of a universal spectral network, for the characterization of known and unknown ribonucleosides, could accelerate discoveries in the epitranscriptome.
Affiliation(s)
- Manasses Jora
  - Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
- Daniel Corcoran
  - Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
- Gwenn G Parungao
  - Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
- Peter A Lobue
  - Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
- Luiz F L Oliveira
  - Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
- George Stan
  - Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
- Balasubrahmanyam Addepalli
  - Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
- Patrick A Limbach
  - Department of Chemistry, University of Cincinnati, P.O. Box 210172, Cincinnati, Ohio 45221-0172, United States
5
Trujillo EA, Hebert AS, Rivera Vazquez JC, Brademan DR, Tatli M, Amador-Noguez D, Meyer JG, Coon JJ. Rapid Targeted Quantitation of Protein Overexpression with Direct Infusion Shotgun Proteome Analysis (DISPA-PRM). Anal Chem 2022; 94:1965-1973. [PMID: 35044165 PMCID: PMC9007395 DOI: 10.1021/acs.analchem.1c03243] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/03/2023]
Abstract
While much effort has been placed on comprehensive quantitative proteome analysis, certain applications demand the measurement of only a few target proteins from complex systems. Traditional approaches to targeted proteomics rely on nanoliquid chromatography (nLC) and targeted mass spectrometry (MS) methods, e.g., parallel reaction monitoring (PRM). However, the time requirement for nLC can limit the throughput of targeted proteomics. To achieve rapid and high-throughput targeted methods, here we show that nLC separations can be eliminated and replaced with direct infusion shotgun proteome analysis (DISPA) using high-field asymmetric waveform ion mobility spectrometry (FAIMS) with PRM. We demonstrate the application of DISPA-PRM for rapid targeted quantification of bacterial enzymes utilized in the production of biofuels by monitoring temporal expression in 72 metabolically engineered bacterial cultures in less than 2.5 h, with a measured dynamic range >1200-fold. We conclude that DISPA-PRM presents a valuable innovative tool with results comparable to nLC-MS/MS, enabling rapid detection of targeted proteins in complex mixtures.
Affiliation(s)
- Edna A. Trujillo
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706
- Alexander S. Hebert
  - DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706
- Julio C. Rivera Vazquez
  - Bacteriology, University of Wisconsin-Madison, Madison, WI 53706
  - DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706
- Mehmet Tatli
  - Bacteriology, University of Wisconsin-Madison, Madison, WI 53706
  - DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706
- Daniel Amador-Noguez
  - Bacteriology, University of Wisconsin-Madison, Madison, WI 53706
  - DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI 53706
- Jesse G. Meyer
  - Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI 53706
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, WI 53226
- Joshua J. Coon
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706
  - Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI 53706
  - Morgridge Institute for Research, Madison, WI 53706
6
To PKP, Wu L, Chan CM, Hoque A, Lam H. ClusterSheep: A Graphics Processing Unit-Accelerated Software Tool for Large-Scale Clustering of Tandem Mass Spectra from Shotgun Proteomics. J Proteome Res 2021; 20:5359-5367. [PMID: 34734728 DOI: 10.1021/acs.jproteome.1c00485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Modern shotgun proteomics experiments generate gigabytes of spectra every hour, only a fraction of which is used to form biological conclusions. Instead of being stored as flat files in public data repositories, this large amount of data can be better organized to facilitate data reuse. Clustering these spectra by similarity can be helpful in building high-quality spectral libraries, correcting identification errors, and highlighting frequently observed but unidentified spectra. However, large-scale clustering is time-consuming. Here, we present ClusterSheep, a method utilizing Graphics Processing Units (GPUs) to accelerate the process. Unlike previously proposed algorithms for this purpose, our method performs true pairwise comparison of all spectra within a precursor mass-to-charge ratio tolerance, thereby preserving the full cluster structures. ClusterSheep was benchmarked against previously reported clustering tools, MS-Cluster, MaRaCluster, and msCRUSH. The software tool also functions as an interactive visualization tool with a persistent state, enabling the user to explore the resulting clusters visually and retrieve the clustering results as desired.
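The pairwise comparison within a precursor m/z tolerance described in this abstract can be illustrated with a small CPU-only sketch. This is not the ClusterSheep implementation (which runs on GPUs); the binning width, the cosine similarity measure, the default tolerance, and all function names are assumptions chosen for illustration.

```python
from math import sqrt

def binned(peaks, bin_width=1.0):
    """Sum (mz, intensity) peaks into fixed-width m/z bins (bin width is an assumption)."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz / bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(x * v.get(b, 0.0) for b, x in u.items())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def pairs_within_tolerance(spectra, tol=1.1):
    """Compare every pair of spectra whose precursor m/z values differ by at most tol.

    spectra: list of (precursor_mz, peaks). Returns (i, j, similarity) with i < j.
    Sorting by precursor m/z lets the inner loop stop at the first spectrum
    outside the tolerance window, while still visiting every pair inside it.
    """
    order = sorted(range(len(spectra)), key=lambda i: spectra[i][0])
    vecs = [binned(peaks) for _, peaks in spectra]
    out = []
    for a, i in enumerate(order):
        for j in order[a + 1:]:
            if spectra[j][0] - spectra[i][0] > tol:
                break
            out.append((min(i, j), max(i, j), cosine(vecs[i], vecs[j])))
    return out
```

The resulting similarity list can be read as the edge list of a graph over spectra; how such edges are turned into clusters is left open here because the abstract does not specify it.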
Affiliation(s)
- Paul Ka Po To
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Long Wu
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Chak Ming Chan
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Ayman Hoque
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Henry Lam
  - Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
7
Abstract
Direct infusion shotgun proteome analysis (DISPA) is a new paradigm for expedited mass spectrometry-based proteomics, but the original data analysis workflow was onerous. Here, we introduce CsoDIAq, a user-friendly software package for the identification and quantification of peptides and proteins from DISPA data. In addition to establishing a complete and automated analysis workflow with a graphical user interface, CsoDIAq introduces algorithmic concepts to spectrum-spectrum matching to improve peptide identification speed and sensitivity. These include spectra pooling to reduce search time complexity and a new spectrum-spectrum match score called match count and cosine, which improves target discrimination in a target-decoy analysis. Fragment mass tolerance correction also increased the number of peptide identifications. Finally, we adapt CsoDIAq to standard LC-MS DIA and show that it outperforms other spectrum-spectrum matching software.
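A spectrum-spectrum match score that combines a match count with a cosine similarity, as this abstract describes, might look like the sketch below. The abstract does not say how the two quantities are combined, so the product of the cosine with a root of the match count (`count_exponent`) is an assumption, as are the ppm tolerance, the greedy matching, and the function names; none of this is taken from the CsoDIAq code.

```python
from math import sqrt

def match_peaks(query, library, ppm_tol=10.0):
    """Greedy two-pointer matching of two m/z-sorted (mz, intensity) lists."""
    matches = []
    i = j = 0
    while i < len(query) and j < len(library):
        q_mz, q_int = query[i]
        l_mz, l_int = library[j]
        ppm = (q_mz - l_mz) / l_mz * 1e6
        if abs(ppm) <= ppm_tol:
            matches.append((q_int, l_int))
            i += 1
            j += 1
        elif ppm < 0:
            i += 1  # query peak lies below the library peak: advance the query
        else:
            j += 1  # library peak lies below the query peak: advance the library
    return matches

def match_count_and_cosine(query, library, ppm_tol=10.0, count_exponent=0.2):
    """Return (match count, cosine over matched peaks, combined score).

    The combined score, count ** count_exponent * cosine, is an assumed form.
    """
    matches = match_peaks(sorted(query), sorted(library), ppm_tol)
    if not matches:
        return 0, 0.0, 0.0
    dot = sum(q * l for q, l in matches)
    nq = sqrt(sum(q * q for q, _ in matches))
    nl = sqrt(sum(l * l for _, l in matches))
    cos = dot / (nq * nl) if nq and nl else 0.0
    return len(matches), cos, (len(matches) ** count_exponent) * cos
```

Weighting the cosine by the number of matched fragments penalizes matches that look perfect only because very few peaks were compared, which is one plausible reading of why such a score would help target-decoy discrimination.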
Affiliation(s)
- Caleb W Cranney
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, United States
- Jesse G Meyer
  - Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, United States
8
Zhang W, Liang Z, Chen X, Xin L, Shan B, Luo Z, Li M. ChimST: An Efficient Spectral Library Search Tool for Peptide Identification from Chimeric Spectra in Data-Dependent Acquisition. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1416-1425. [PMID: 31603795 DOI: 10.1109/tcbb.2019.2945954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Accurate and sensitive identification of peptides from MS/MS spectra is a very challenging problem in computational shotgun proteomics. To tackle this problem, spectral library search has been one of the competitive solutions. However, most existing library search tools were developed on the basis of one peptide per spectrum, which prevents them from working properly on chimeric spectra where two or more peptides are co-fragmented. In this work, we present a new library search tool called ChimST, which is particularly capable of reliably identifying multiple peptides from a chimeric spectrum. It starts with associating each query MS/MS spectrum with MS precursor features. For each precursor feature, there is a list of peptide candidates extracted from an input spectral library. Then, it takes one peptide candidate from each associated feature and scores how well they could collectively interpret the query spectrum. The highest-scoring set of peptide candidates is finally reported as the identification of the query spectrum. Our experimental tests show that ChimST could significantly outperform the three state-of-the-art library search tools, SpectraST, reSpect, and MSPLIT, in terms of the numbers of both peptide-spectrum matches and unique peptides, especially when the acquisition isolation window is broad.
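The step of taking one candidate per precursor feature and scoring the set collectively can be sketched as an exhaustive search over candidate combinations. The scoring function used here (fraction of query intensity explained by the union of the candidates' fragment m/z values), the fragment tolerance, and the data layout are assumptions for illustration only; the abstract does not describe ChimST's actual score.

```python
from itertools import product

def explained_intensity(query, candidate_set, tol=0.02):
    """Fraction of query intensity at m/z values matched by any candidate fragment."""
    fragment_mzs = [mz for cand in candidate_set for mz in cand["fragments"]]
    total = sum(intensity for _, intensity in query)
    hit = sum(
        intensity
        for mz, intensity in query
        if any(abs(mz - f) <= tol for f in fragment_mzs)
    )
    return hit / total if total else 0.0

def best_candidate_set(query, features):
    """Pick one candidate per feature so that the set best explains the query.

    features: one list of candidates per precursor feature; each candidate is a
    dict with hypothetical keys "peptide" and "fragments".
    """
    best, best_score = (), -1.0
    for combo in product(*features):
        score = explained_intensity(query, combo)
        if score > best_score:
            best, best_score = combo, score
    return [c["peptide"] for c in best], best_score
```

Exhaustive enumeration grows with the product of the candidate list lengths, so a practical tool would need to cap or prune the lists; the sketch ignores that concern.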
9
Chantada-Vázquez MDP, García Vence M, Serna A, Núñez C, Bravo SB. SWATH-MS Protocols in Human Diseases. Methods Mol Biol 2021; 2259:105-141. [PMID: 33687711 DOI: 10.1007/978-1-0716-1178-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
Identification of molecular biomarkers for human diseases is one of the most important disciplines in translational science as it helps to elucidate their origin and early progression. Thus, it is a key factor in better diagnosis, prognosis, and treatment. Proteomics can help to solve the problem of sample complexity when the most common primary sample specimens, easily accessible organic fluids, are analyzed. The latest developments in high-throughput and label-free quantitative proteomics (SWATH-MS), together with more advanced liquid chromatography, have enabled the analysis of large sample sets with the sensitivity and depth needed to succeed in this task. In this chapter, we show different sample processing methods (major protein depletion, digestion, etc.) and a micro LC-SWATH-MS protocol to identify/quantify several proteins in different types of samples (serum/plasma, saliva, urine, tears).
Affiliation(s)
- María García Vence
  - Proteomic Unit, Instituto de Investigaciones Sanitarias-IDIS, Complejo Hospitalario Universitario de Santiago de Compostela (CHUS), Santiago de Compostela, Spain
- Cristina Núñez
  - Research Unit, Hospital Universitario Lucus Augusti (HULA), Servizo Galego de Saúde (SERGAS), Lugo, Spain
- Susana B Bravo
  - Proteomic Unit, Instituto de Investigaciones Sanitarias-IDIS, Complejo Hospitalario Universitario de Santiago de Compostela (CHUS), Santiago de Compostela, Spain
10
Quantitative shotgun proteome analysis by direct infusion. Nat Methods 2020; 17:1222-1228. [PMID: 33230323 PMCID: PMC8009190 DOI: 10.1038/s41592-020-00999-z] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 05/20/2020] [Accepted: 10/21/2020] [Indexed: 11/15/2022]
Abstract
Liquid chromatography mass spectrometry (LC-MS) delivers sensitive peptide analysis for proteomics, but the methodology requires extensive analysis time, hampering throughput. Here, we demonstrate that using gas-phase peptide separation instead of LC enables fast proteome analysis. Using Direct Infusion – Shotgun Proteome Analysis (DI-SPA) by data-independent acquisition mass spectrometry (DIA-MS), we demonstrate the targeted quantification of over 500 proteins within minutes of MS data collection (~3.5 proteins/second). We show the utility of this technology to perform a complex multifactorial proteome study of interactions between nutrients, genotype, and mitochondrial toxins in a collection of cultured human cells. More than 45,000 quantitative protein measurements from 132 samples were achieved in only 4.4 hours of MS data collection. Enabling fast, unbiased proteome quantification without LC, DI-SPA offers an approach to boosting throughput critical to drug and biomarker discovery studies that require analysis of thousands of proteomes.
11
Fernández-Costa C, Martínez-Bartolomé S, McClatchy DB, Saviola AJ, Yu NK, Yates JR. Impact of the Identification Strategy on the Reproducibility of the DDA and DIA Results. J Proteome Res 2020; 19:3153-3161. [PMID: 32510229 PMCID: PMC7898222 DOI: 10.1021/acs.jproteome.0c00153] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Indexed: 02/06/2023]
Abstract
Data-independent acquisition (DIA) is a promising technique for the proteomic analysis of complex protein samples. A number of studies have claimed that DIA experiments are more reproducible than data-dependent acquisition (DDA), but these claims are unsubstantiated since different data analysis methods are used for the two acquisition methods. Data analysis in most DIA workflows depends on spectral library searches, whereas DDA typically employs sequence database searches. In this study, we examined the reproducibility of the DIA and DDA results using both sequence database and spectral library search. The comparison was first performed using a cell lysate and then extended to an interactome study. Protein overlap among the technical replicates in both DDA and DIA experiments was 30% higher with library-based identifications than with sequence database identifications. The reproducibility of quantification was also improved with library search compared to database search, with the mean of the coefficient of variation decreasing more than 30% and a reduction in the number of missing values of more than 35%. Our results show that regardless of the acquisition method, higher identification and quantification reproducibility is observed when library search is used.
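The two quantification metrics named in this abstract, the mean coefficient of variation (CV) across replicates and the number of missing values, can be computed as in the sketch below. The table layout (protein name mapped to one value per replicate, with `None` for a missing value) and the use of the sample standard deviation are assumptions; the abstract does not specify either.

```python
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation (sample stdev / mean) over the observed values.

    Returns None when fewer than two values are observed or the mean is zero.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2 or mean(observed) == 0:
        return None
    return stdev(observed) / mean(observed)

def reproducibility_summary(table):
    """Summarize a {protein: [abundance per replicate]} table."""
    cvs = [c for c in (cv(vals) for vals in table.values()) if c is not None]
    missing = sum(v is None for vals in table.values() for v in vals)
    return {"mean_cv": mean(cvs) if cvs else None, "missing_values": missing}
```

Running the same summary on a database-search table and a library-search table of the same runs would give the kind of side-by-side comparison the abstract reports.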
Affiliation(s)
- Carolina Fernández-Costa
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
- Daniel B. McClatchy
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
- Anthony J. Saviola
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
- Nam-Kyung Yu
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
- John R. Yates
  - Departments of Molecular Medicine & Neurobiology, The Scripps Research Institute, La Jolla, CA, USA
12
Fernández-Costa C, Martínez-Bartolomé S, McClatchy D, Yates JR. Improving Proteomics Data Reproducibility with a Dual-Search Strategy. Anal Chem 2020; 92:1697-1701. [PMID: 31880919 DOI: 10.1021/acs.analchem.9b04955] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Abstract
Mass spectrometry-based proteomics is an invaluable tool for addressing important biological questions. Data-dependent acquisition methods effectuate stochastic acquisition of data in complex mixtures, which results in missing identifications across replicates. We developed a search approach that improves the reproducibility of data acquired from any mass spectrometer. In our approach, a spectral library is built from the identification results from a database search, and then the library is used to re-search the same data files to obtain the final result. We showed that higher identification and quantification reproducibility is achieved with the dual-search approach than with a typical database search. Four datasets with different complexity were compared: (1) data from a cell lysate study performed in our lab, (2) data from an interactome study performed in our lab, (3) a publicly available extracellular vesicles dataset, and (4) a publicly available phosphoproteomics dataset. Our results show that the dual-search approach can be widely and easily used to improve data quality in proteomics data.
Affiliation(s)
- Carolina Fernández-Costa
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
- Salvador Martínez-Bartolomé
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
- Daniel McClatchy
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
- John R Yates
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
13
Ammar C, Berchtold E, Csaba G, Schmidt A, Imhof A, Zimmer R. Multi-Reference Spectral Library Yields Almost Complete Coverage of Heterogeneous LC-MS/MS Data Sets. J Proteome Res 2019; 18:1553-1566. [DOI: 10.1021/acs.jproteome.8b00819] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022]
Affiliation(s)
- Constantin Ammar
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
  - Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81337 München, Germany
- Evi Berchtold
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
- Gergely Csaba
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
- Andreas Schmidt
  - Zentrallabor für Proteinanalytik (Protein Analysis Unit), Ludwig-Maximilians-Universität München, Grosshaderner Strasse 9, 82152 Planegg-Martinsried, Germany
- Axel Imhof
  - Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81337 München, Germany
  - Zentrallabor für Proteinanalytik (Protein Analysis Unit), Ludwig-Maximilians-Universität München, Grosshaderner Strasse 9, 82152 Planegg-Martinsried, Germany
- Ralf Zimmer
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München, Amalienstrasse 17, 80333 München, Germany
  - Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81337 München, Germany
14
Deutsch EW, Perez-Riverol Y, Chalkley RJ, Wilhelm M, Tate S, Sachsenberg T, Walzer M, Käll L, Delanghe B, Böcker S, Schymanski EL, Wilmes P, Dorfer V, Kuster B, Volders PJ, Jehmlich N, Vissers JP, Wolan DW, Wang AY, Mendoza L, Shofstahl J, Dowsey AW, Griss J, Salek RM, Neumann S, Binz PA, Lam H, Vizcaíno JA, Bandeira N, Röst H. Expanding the Use of Spectral Libraries in Proteomics. J Proteome Res 2018; 17:4051-4060. [PMID: 30270626 PMCID: PMC6443480 DOI: 10.1021/acs.jproteome.8b00485] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 12/15/2022]
Abstract
The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.
Affiliation(s)
- Eric W. Deutsch
  - Institute for Systems Biology, Seattle, Washington, 98109, United States
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Robert J. Chalkley
  - University of California San Francisco, San Francisco, 94158, California, United States
- Mathias Wilhelm
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
- Timo Sachsenberg
  - Department of Computer Science, Center for Bioinformatics, University of Tübingen, Sand 14, Tübingen, 72076, Germany
- Mathias Walzer
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Lukas Käll
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH − Royal Institute of Technology, Stockholm 114 28, Sweden
- Bernard Delanghe
  - Thermo Fisher Scientific Bremen, Hanna-Kunath Str. 11, 28199 Bremen, Germany
- Sebastian Böcker
  - Chair for Bioinformatics, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Emma L. Schymanski
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Paul Wilmes
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
- Viktoria Dorfer
  - University of Applied Sciences Upper Austria, Bioinformatics Research Group, Hagenberg, 4232, Austria
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, 85354, Germany
  - Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich, Freising, 85354, Germany
- Nico Jehmlich
  - Helmholtz-Centre for Environmental Research - UFZ, Leipzig, Germany
- Dennis W. Wolan
  - Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
- Ana Y. Wang
  - Department of Molecular Medicine, The Scripps Research Institute, 92037, La Jolla, California, United States
- Luis Mendoza
  - Institute for Systems Biology, Seattle, Washington, 98109, United States
- Jim Shofstahl
  - Thermo Fisher Scientific, 355 River Oaks Parkway San Jose, CA 95134
- Andrew W. Dowsey
  - Department of Population Health Sciences and Bristol Veterinary School, Faculty of Health Sciences, University of Bristol, Bristol BS9 1BN, UK
- Johannes Griss
  - Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, Vienna 1090, Austria
- Reza M. Salek
  - The International Agency for Research on Cancer (IARC), 150 Cours Albert Thomas, 69372 Lyon CEDEX 08, France
- Steffen Neumann
  - Leibniz Institute of Plant Biochemistry, Department of Stress and Developmental Biology, 06120 Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
- Pierre-Alain Binz
  - Clinical Chemistry Service, Centre Hospitalier Universitaire Vaudois, 1011 Lausanne, Switzerland
- Henry Lam
  - Department of Chemical and Biological Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Nuno Bandeira
  - Center for Computational Mass Spectrometry, Department of Computer Science and Engineering, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 92093-0404, USA
- Hannes Röst
  - The Donnelly Centre, University of Toronto, 160 College St., Toronto, ON, M5S 3E1, Canada
|
15
|
Assembling the Community-Scale Discoverable Human Proteome. Cell Syst 2018; 7:412-421.e5. [PMID: 30172843 PMCID: PMC6279426 DOI: 10.1016/j.cels.2018.08.004] [Citation(s) in RCA: 113] [Impact Index Per Article: 16.1] [Received: 09/14/2017] [Revised: 12/22/2017] [Accepted: 08/03/2018] [Indexed: 01/15/2023]
Abstract
The increasing throughput and sharing of proteomics mass spectrometry data have now yielded over one-third of a million public mass spectrometry runs. However, these discoveries are not continuously aggregated in an open and error-controlled manner, which limits their utility. To facilitate the reusability of these data, we built the MassIVE Knowledge Base (MassIVE-KB), a community-wide, continuously updating knowledge base that aggregates proteomics mass spectrometry discoveries into an open reusable format with full provenance information for community scrutiny. Reusing >31 TB of public human data stored in a mass spectrometry interactive virtual environment (MassIVE), the MassIVE-KB contains >2.1 million precursors from 19,610 proteins (48% larger than before; 97% of the total) and doubles proteome coverage to 6 million amino acids (54% of the proteome) with strict library-scale false discovery controls, thereby providing evidence for 430 proteins for which sufficient protein-level evidence was previously missing. Furthermore, MassIVE-KB can inform experimental design, help identify and quantify new data, and provide tools for community construction of specialized spectral libraries. Wang et al. introduce MassIVE-KB, a program designed to distill the entire community’s mass spectrometry data into reusable spectral library resources. As a result, the statistically significant discovery of a peptide or protein in a single researcher’s data will be made available to the whole community to support its identification (in shotgun experiments) or quantitative detection (in targeted experiments) in all future analyses.
|
16
|
Zhu K, Liu X. A graph-based approach for proteoform identification and quantification using top-down homogeneous multiplexed tandem mass spectra. BMC Bioinformatics 2018; 19:280. [PMID: 30367573 PMCID: PMC6101081 DOI: 10.1186/s12859-018-2273-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022] Open
Abstract
Background: Top-down homogeneous multiplexed tandem mass (HomMTM) spectra are generated from modified proteoforms of the same protein with different post-translational modification patterns. They are frequently observed in the analysis of ultramodified proteins, some proteoforms of which have similar molecular weights and cannot be well separated by liquid chromatography in mass spectrometry analysis. Results: We formulate the top-down HomMTM spectral identification problem as the minimum error k-splittable flow problem on graphs and propose a graph-based algorithm for the identification and quantification of proteoforms using top-down HomMTM spectra. Conclusions: Experiments on a top-down mass spectrometry data set of the histone H4 protein showed that the proposed method identified many proteoform pairs that better explain the query spectra than single proteoforms. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2273-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kaiyuan Zhu
  - Department of Computer Science, Indiana University Bloomington, 700 N. Woodlawn Avenue, Bloomington, IN, 47408, USA
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 719 Indiana Avenue, Indianapolis, IN, 46202, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 W. 10th Street, Indianapolis, IN, 46202, USA
|
17
|
Liu Y, Ma B, Zhang K, Lajoie G. An Approach for Peptide Identification by De Novo Sequencing of Mixture Spectra. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:326-336. [PMID: 28368810 DOI: 10.1109/tcbb.2015.2407401] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Mixture spectra, which result from the concurrent fragmentation of multiple precursors, occur quite frequently in a typical wet-lab mass spectrometry experiment. The ability to identify mixture spectra efficiently and confidently is essential to alleviate the existing bottleneck of low mass spectrum identification rates. However, most traditional computational methods are not suitable for interpreting mixture spectra, because they still assume that the acquired spectra come from the fragmentation of a single precursor. In this manuscript, we formulate the mixture spectra de novo sequencing problem mathematically and propose a dynamic programming algorithm for the problem. Additionally, we use both simulated and real mixture spectra data sets to verify the merits of the proposed algorithm.
|
18
|
Deutsch EW, Csordas A, Sun Z, Jarnuczak A, Perez-Riverol Y, Ternent T, Campbell DS, Bernal-Llinares M, Okuda S, Kawano S, Moritz RL, Carver JJ, Wang M, Ishihama Y, Bandeira N, Hermjakob H, Vizcaíno JA. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res 2016; 45:D1100-D1106. [PMID: 27924013 PMCID: PMC5210636 DOI: 10.1093/nar/gkw936] [Citation(s) in RCA: 692] [Impact Index Per Article: 76.9] [Received: 09/15/2016] [Accepted: 10/07/2016] [Indexed: 11/13/2022] Open
Abstract
The ProteomeXchange (PX) Consortium of proteomics resources (http://www.proteomexchange.org) was formally started in 2011 to standardize data submission and dissemination of mass spectrometry proteomics data worldwide. We give an overview of the current consortium activities and describe the advances of the past few years. Augmenting the PX founding members (PRIDE and PeptideAtlas, including the PASSEL resource), two new members have joined the consortium: MassIVE and jPOST. ProteomeCentral remains as the common data access portal, providing the ability to search for data sets in all participating PX resources, now with enhanced data visualization components. We describe the updated submission guidelines, now expanded to include four members instead of two. As demonstrated by data submission statistics, PX is supporting a change in culture of the proteomics field: public data sharing is now an accepted standard, supported by requirements for journal submissions resulting in public data release becoming the norm. More than 4500 data sets have been submitted to the various PX resources since 2012. Human is the most represented species with approximately half of the data sets, followed by some of the main model organisms and a growing list of more than 900 diverse species. Data reprocessing activities are becoming more prominent, with both MassIVE and PeptideAtlas releasing the results of reprocessed data sets. Finally, we outline the upcoming advances for ProteomeXchange.
Affiliation(s)
- Attila Csordas
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Zhi Sun
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Andrew Jarnuczak
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Tobias Ternent
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Manuel Bernal-Llinares
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Shujiro Okuda
  - Niigata University Graduate School of Medical and Dental Sciences, Niigata 951-8510, Japan
- Shin Kawano
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Kashiwa 277-0871, Japan
- Jeremy J Carver
  - Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
  - Department Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Mingxun Wang
  - Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
  - Department Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Yasushi Ishihama
  - Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
- Nuno Bandeira
  - Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
  - Department Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, CA 92093, USA
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
  - National Center for Protein Sciences, Beijing, China
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
|
19
|
Griss J, Perez-Riverol Y, Lewis S, Tabb DL, Dianes JA, del-Toro N, Rurik M, Walzer MW, Kohlbacher O, Hermjakob H, Wang R, Vizcaíno JA. Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nat Methods 2016; 13:651-656. [PMID: 27493588 PMCID: PMC4968634 DOI: 10.1038/nmeth.3902] [Citation(s) in RCA: 120] [Impact Index Per Article: 13.3] [Received: 12/11/2015] [Accepted: 05/24/2016] [Indexed: 12/13/2022]
Abstract
Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large scale to shed light on these unidentified spectra. PRoteomics IDEntifications database (PRIDE) Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
Affiliation(s)
- Johannes Griss
  - Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Steve Lewis
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- David L. Tabb
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville
- José A. Dianes
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Noemi del-Toro
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Marc Rurik
  - Dept. of Computer Science, University of Tübingen, Germany
  - Center for Bioinformatics, University of Tübingen, Germany
- Mathias W. Walzer
  - Dept. of Computer Science, University of Tübingen, Germany
  - Center for Bioinformatics, University of Tübingen, Germany
- Oliver Kohlbacher
  - Dept. of Computer Science, University of Tübingen, Germany
  - Center for Bioinformatics, University of Tübingen, Germany
  - Quantitative Biology Center, University of Tübingen, Germany
  - Max Planck Institute for Developmental Biology, Germany
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
  - National Center for Protein Sciences, Beijing, China
- Rui Wang
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
|
20
|
Shi T, Song E, Nie S, Rodland KD, Liu T, Qian WJ, Smith RD. Advances in targeted proteomics and applications to biomedical research. Proteomics 2016; 16:2160-82. [PMID: 27302376 PMCID: PMC5051956 DOI: 10.1002/pmic.201500449] [Citation(s) in RCA: 167] [Impact Index Per Article: 18.6] [Received: 11/13/2015] [Revised: 05/09/2016] [Accepted: 06/10/2016] [Indexed: 12/17/2022]
Abstract
Targeted proteomics has emerged as a powerful protein quantification tool in systems biology, biomedical research, and, increasingly, clinical applications. The most widely used targeted proteomics approach, selected reaction monitoring (SRM), also known as multiple reaction monitoring (MRM), can be used for quantification of cellular signaling networks and preclinical verification of candidate protein biomarkers. As an extension to our previous review on advances in SRM sensitivity (Shi et al., Proteomics, 12, 1074-1092, 2012), herein we review recent advances in the method and technology for further enhancing SRM sensitivity (from 2012 to present) and highlight its broad biomedical applications in human bodily fluids, tissue and cell lines. Furthermore, we also review two recently introduced targeted proteomics approaches, parallel reaction monitoring (PRM) and data-independent acquisition (DIA) with targeted data extraction on fast scanning high-resolution accurate-mass (HR/AM) instruments. Such HR/AM targeted quantification, which monitors all target product ions, effectively addresses SRM limitations in specificity and multiplexing; however, compared to SRM, PRM and DIA are still in their infancy, with a limited number of applications. Thus, for HR/AM targeted quantification we focus our discussion on method development, data processing and analysis, and its advantages and limitations in targeted proteomics. Finally, general perspectives on the potential of achieving both high sensitivity and high sample throughput for large-scale quantification of hundreds of target proteins are discussed.
Affiliation(s)
- Tujin Shi
  - Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
- Ehwang Song
  - Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
- Song Nie
  - Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
- Karin D Rodland
  - Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
- Tao Liu
  - Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
- Wei-Jun Qian
  - Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
- Richard D Smith
  - Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
|
21
|
Bourmaud A, Gallien S, Domon B. Parallel reaction monitoring using quadrupole-Orbitrap mass spectrometer: Principle and applications. Proteomics 2016; 16:2146-59. [PMID: 27145088 DOI: 10.1002/pmic.201500543] [Citation(s) in RCA: 216] [Impact Index Per Article: 24.0] [Received: 12/21/2015] [Revised: 03/21/2016] [Accepted: 05/02/2016] [Indexed: 12/19/2022]
Abstract
Targeted mass spectrometry-based approaches are now widely used for quantitative proteomics studies and have more recently been implemented on high resolution/accurate mass (HRAM) instruments, resulting in a considerable performance improvement. More specifically, the parallel reaction monitoring (PRM) technique performed on quadrupole-Orbitrap mass spectrometers, leveraging the high resolution and trapping capabilities of the instrument, offers a clear advantage over conventional selected reaction monitoring (SRM) measurements executed on triple quadrupole instruments. Analyses performed in HRAM mode allow for improved discrimination between signals derived from analytes and those resulting from matrix interferences, translating into the reliable quantification of low abundance components. The study defines various implementation schemes of PRM, namely: (i) exploratory experiments assessing the detectability of very large sets of peptides (100-1000), (ii) wide-screen analyses using (crude) internal standards to obtain statistically meaningful (relative) quantitative analyses, and (iii) precise/accurate quantification of a limited number of analytes using calibrated internal standards. Each of the three implementation schemes requires specific acquisition methods with defined parameters to appropriately control the acquisition during the actual peptide elution. This tutorial describes the different PRM approaches and discusses their benefits and limitations in terms of quantification performance and confidence in analyte identification.
Affiliation(s)
- Adele Bourmaud
  - Luxembourg Clinical Proteomics Center, Luxembourg Institute of Health (LIH), Strassen, Luxembourg
  - Doctoral School in Systems and Molecular Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Sebastien Gallien
  - Luxembourg Clinical Proteomics Center, Luxembourg Institute of Health (LIH), Strassen, Luxembourg
- Bruno Domon
  - Luxembourg Clinical Proteomics Center, Luxembourg Institute of Health (LIH), Strassen, Luxembourg
  - Doctoral School in Systems and Molecular Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
|
22
|
Griss J. Spectral library searching in proteomics. Proteomics 2016; 16:729-40. [PMID: 26616598 DOI: 10.1002/pmic.201500296] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 07/16/2015] [Revised: 10/15/2015] [Accepted: 10/29/2015] [Indexed: 12/12/2022]
Abstract
Spectral library searching has become a mature method to identify tandem mass spectra in proteomics data analysis. This review provides a comprehensive overview of available spectral library search engines and highlights their distinct features. Additionally, resources providing spectral libraries are summarized, and tools that extend experimental spectral libraries by simulating spectra are presented. Finally, spectrum clustering algorithms are discussed; these use the same spectrum-to-spectrum matching algorithms as spectral library search engines and enable novel methods to analyse proteomics data.
Affiliation(s)
- Johannes Griss
  - Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
23
|
Liu Y, Sun W, John J, Lajoie G, Ma B, Zhang K. De Novo Sequencing Assisted Approach for Characterizing Mixture MS/MS Spectra. IEEE Trans Nanobioscience 2016; 15:166-76. [PMID: 26800542 DOI: 10.1109/tnb.2016.2519841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
Extensive research has been conducted on the computational analysis of mass spectrometry-based proteomics data. However, challenges remain, one of which is the low identification rate of the collected spectral data. A specific contributing factor is the presence of mixture spectra among the collected MS/MS spectra, which are generated by the concurrent fragmentation of multiple precursors in one sequencing attempt. The frequently observed mixture spectra necessitate the development of effective computational approaches to characterize such non-conventional spectral data. In this research, we propose an approach for matching query mixture spectra with a pair of peptide sequences acquired from the protein database by incorporating a special de novo assisted filtration strategy. The experimental results on two different datasets of MS/MS spectra containing mixed ion fragments from multiple peptides demonstrate the efficiency of the integrated filtration strategy in reducing the examination space and verify the effectiveness of the proposed matching scheme.
|
24
|
Holewinski RJ, Parker SJ, Matlock AD, Venkatraman V, Van Eyk JE. Methods for SWATH™: Data Independent Acquisition on TripleTOF Mass Spectrometers. Methods Mol Biol 2016; 1410:265-79. [PMID: 26867750 PMCID: PMC11552544 DOI: 10.1007/978-1-4939-3524-6_16] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 10/22/2022]
Abstract
Data independent acquisition (DIA, also termed SWATH) is an emerging technology in the field of mass spectrometry-based proteomics. Although the concept of DIA has been around for over a decade, recent advancements in mass analyzers, in particular in acquisition speed, have pushed the technique into the spotlight and allowed high-quality DIA data to be routinely acquired by proteomics labs. In this chapter we discuss the protocols used for DIA acquisition on Sciex TripleTOF mass spectrometers and for data analysis using the Sciex processing software.
Affiliation(s)
- Ronald J Holewinski
  - Advanced Clinical Biosystems Research Institute, The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Sarah J Parker
  - Advanced Clinical Biosystems Research Institute, The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Andrea D Matlock
  - Advanced Clinical Biosystems Research Institute, The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Vidya Venkatraman
  - Advanced Clinical Biosystems Research Institute, The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Jennifer E Van Eyk
  - Advanced Clinical Biosystems Research Institute, The Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
|
25
|
Perez-Riverol Y, Alpi E, Wang R, Hermjakob H, Vizcaíno JA. Making proteomics data accessible and reusable: current state of proteomics databases and repositories. Proteomics 2015; 15:930-49. [PMID: 25158685 PMCID: PMC4409848 DOI: 10.1002/pmic.201400302] [Citation(s) in RCA: 141] [Impact Index Per Article: 14.1] [Received: 07/01/2014] [Revised: 08/06/2014] [Accepted: 08/22/2014] [Indexed: 01/10/2023]
Abstract
Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has been recently developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we review each of the major proteomics resources independently, along with some tools that enable the integration, mining and reuse of the data. We also discuss some of the major challenges and current pitfalls in the integration and sharing of the data.
Affiliation(s)
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
26
|
Hook V, Bandeira N. Neuropeptidomics Mass Spectrometry Reveals Signaling Networks Generated by Distinct Protease Pathways in Human Systems. J Am Soc Mass Spectrom 2015; 26:1970-80. [PMID: 26483184 PMCID: PMC4749436 DOI: 10.1007/s13361-015-1251-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 06/30/2015] [Revised: 07/30/2015] [Accepted: 08/05/2015] [Indexed: 05/23/2023]
Abstract
Neuropeptides regulate intercellular signaling as neurotransmitters of the central and peripheral nervous systems, and as peptide hormones in the endocrine system. Diverse neuropeptides of distinct primary sequences of various lengths, often with post-translational modifications, coordinate and integrate regulation of physiological functions. Mass spectrometry-based analysis of the diverse neuropeptide structures in neuropeptidomics research is necessary to define the full complement of neuropeptide signaling molecules. Human neuropeptidomics has notable importance in defining normal and dysfunctional neuropeptide signaling in human health and disease. Neuropeptidomics has great potential for expansion in translational research opportunities for defining neuropeptide mechanisms of human diseases, providing novel neuropeptide drug targets for drug discovery, and monitoring neuropeptides as biomarkers of drug responses. In consideration of the high impact of human neuropeptidomics for health, an observed gap in this discipline is the few published articles in human neuropeptidomics compared with, for example, human proteomics and related mass spectrometry disciplines. Focus on human neuropeptidomics will advance new knowledge of the complex neuropeptide signaling networks participating in the fine control of neuroendocrine systems. This commentary review article discusses several human neuropeptidomics accomplishments that illustrate the rapidly expanding diversity of neuropeptides generated by protease processing of pro-neuropeptide precursors occurring within the secretory vesicle proteome. Of particular interest is the finding that human-specific cathepsin V participates in producing enkephalin and likely other neuropeptides, indicating unique proteolytic mechanisms for generating human neuropeptides. 
The field of human neuropeptidomics has great promise to solve new mechanisms in disease conditions, leading to new drug targets and therapeutic agents for human diseases.
Affiliation(s)
- Vivian Hook
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92093-0719, USA.
- School of Medicine, Department of Neurosciences and Department of Pharmacology, University of California, San Diego, La Jolla, CA, 92093-0719, USA.
- Nuno Bandeira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, CA, 92093-0719, USA
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, 92093-0719, USA
27
Shteynberg D, Mendoza L, Hoopmann MR, Sun Z, Schmidt F, Deutsch EW, Moritz RL. reSpect: software for identification of high and low abundance ion species in chimeric tandem mass spectra. J Am Soc Mass Spectrom 2015; 26:1837-1847. [PMID: 26419769 PMCID: PMC4750398 DOI: 10.1007/s13361-015-1252-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 03/03/2015] [Revised: 06/22/2015] [Accepted: 08/11/2015] [Indexed: 06/05/2023]
Abstract
Most shotgun proteomics data analysis workflows are based on the assumption that each fragment ion spectrum is explained by a single species of peptide ion isolated by the mass spectrometer; however, in reality mass spectrometers often isolate more than one peptide ion within the window of isolation that contribute to additional peptide fragment peaks in many spectra. We present a new tool called reSpect, implemented in the Trans-Proteomic Pipeline (TPP), which enables an iterative workflow whereby fragment ion peaks explained by a peptide ion identified in one round of sequence searching or spectral library search are attenuated based on the confidence of the identification, and then the altered spectrum is subjected to further rounds of searching. The reSpect tool is not implemented as a search engine, but rather as a post-search engine processing step where only fragment ion intensities are altered. This enables the application of any search engine combination in the iterations that follow. Thus, reSpect is compatible with all other protein sequence database search engines as well as peptide spectral library search engines that are supported by the TPP. We show that while some datasets are highly amenable to chimeric spectrum identification and lead to additional peptide identification boosts of over 30% with as many as four different peptide ions identified per spectrum, datasets with narrow precursor ion selection only benefit from such processing at the level of a few percent. We demonstrate a technique that facilitates the determination of the degree to which a dataset would benefit from chimeric spectrum analysis. The reSpect tool is free and open source, provided within the TPP and available at the TPP website.
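The attenuation step described in the abstract above can be illustrated with a small sketch. This is not reSpect's actual implementation: the function name `attenuate_explained_peaks`, the fixed m/z tolerance, and the rule of scaling each explained peak by (1 - confidence) are all assumptions made for illustration only.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def attenuate_explained_peaks(
    peaks: List[Peak],
    explained_mz: List[float],
    confidence: float,
    tol: float = 0.02,
) -> List[Peak]:
    """Scale down peaks already explained by an identified peptide.

    Hypothetical rule: a peak within `tol` of any explained m/z keeps
    (1 - confidence) of its intensity; all other peaks are unchanged.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    out: List[Peak] = []
    for mz, intensity in peaks:
        explained = any(abs(mz - e) <= tol for e in explained_mz)
        out.append((mz, intensity * (1.0 - confidence) if explained else intensity))
    return out


# Toy spectrum: the first two peaks are explained by a confident first-round hit.
spectrum = [(175.119, 100.0), (304.161, 80.0), (500.250, 40.0)]
residual = attenuate_explained_peaks(spectrum, [175.119, 304.161], confidence=0.9)
```

The residual spectrum could then be passed to another round of searching, so that a lower-abundance co-isolated peptide is no longer masked by the peaks of the first identification.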
Affiliation(s)
- Zhi Sun
- Institute for Systems Biology, Seattle, WA, USA
- Frank Schmidt
- ZIK-FunGene Junior Research Group Applied Proteomics, Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald, Germany
28
Parker SJ, Raedschelders K, Van Eyk JE. Emerging proteomic technologies for elucidating context-dependent cellular signaling events: A big challenge of tiny proportions. Proteomics 2015; 15:1486-502. [PMID: 25545106 DOI: 10.1002/pmic.201400448] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 09/18/2014] [Revised: 10/31/2014] [Accepted: 12/23/2014] [Indexed: 12/11/2022]
Abstract
Aberrant cell signaling events either drive or compensate for nearly all pathologies. A thorough description and quantification of maladaptive signaling flux in disease is a critical step in drug development, and complex proteomic approaches can provide valuable mechanistic insights. Traditional proteomics-based signaling analyses rely heavily on in vitro cellular monoculture. The characterization of these simplified systems generates a rich understanding of the basic components and complex interactions of many signaling networks, but they cannot capture the full complexity of the microenvironments in which pathologies are ultimately made manifest. Unfortunately, techniques that can directly interrogate signaling in situ often yield mass-limited starting materials that are incompatible with traditional proteomics workflows. This review provides an overview of established and emerging techniques that are applicable to context-dependent proteomics. Analytical approaches are illustrated through recent proteomics-based studies in which selective sample acquisition strategies preserve context-dependent information, and where the challenge of minimal starting material is met by optimized sensitivity and coverage. This review is organized into three major technological themes: (i) LC methods in line with MS; (ii) antibody-based approaches; (iii) MS imaging with a discussion of data integration and systems modeling. Finally, we conclude with future perspectives and implications of context-dependent proteomics.
Affiliation(s)
- Sarah J Parker
- Department of Medicine, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA; Advanced Clinical Biosystems Research Institute, Los Angeles, CA, USA; Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA; Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
29
Bilbao A, Varesio E, Luban J, Strambio-De-Castillia C, Hopfgartner G, Müller M, Lisacek F. Processing strategies and software solutions for data-independent acquisition in mass spectrometry. Proteomics 2015; 15:964-80. [DOI: 10.1002/pmic.201400323] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Received: 07/14/2014] [Revised: 10/08/2014] [Accepted: 11/24/2014] [Indexed: 11/10/2022]
Affiliation(s)
- Aivett Bilbao
- Proteome Informatics Group; SIB Swiss Institute of Bioinformatics; Geneva Switzerland
- Life Sciences Mass Spectrometry; School of Pharmaceutical Sciences; University of Geneva; University of Lausanne; Geneva Switzerland
- Emmanuel Varesio
- Life Sciences Mass Spectrometry; School of Pharmaceutical Sciences; University of Geneva; University of Lausanne; Geneva Switzerland
- Jeremy Luban
- Program in Molecular Medicine; University of Massachusetts Medical School; Worcester MA USA
- Gérard Hopfgartner
- Life Sciences Mass Spectrometry; School of Pharmaceutical Sciences; University of Geneva; University of Lausanne; Geneva Switzerland
- Markus Müller
- Proteome Informatics Group; SIB Swiss Institute of Bioinformatics; Geneva Switzerland
- Faculty of Sciences; University of Geneva; Geneva Switzerland
- Frédérique Lisacek
- Proteome Informatics Group; SIB Swiss Institute of Bioinformatics; Geneva Switzerland
- Faculty of Sciences; University of Geneva; Geneva Switzerland
30
Wang J, Bourne PE, Bandeira N. MixGF: spectral probabilities for mixture spectra from more than one peptide. Mol Cell Proteomics 2014; 13:3688-97. [PMID: 25225354 DOI: 10.1074/mcp.o113.037218] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022] Open
Abstract
In large-scale proteomic experiments, multiple peptide precursors are often cofragmented simultaneously in the same mixture tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data independent acquisition protocols promoting the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30-390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra.
Affiliation(s)
- Jian Wang
- Bioinformatics Program, University of California, San Diego, La Jolla, California
- Philip E Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California
- Nuno Bandeira
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California; Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, California; Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92092
31
Wilhelm T, Jones AME. Identification of related peptides through the analysis of fragment ion mass shifts. J Proteome Res 2014; 13:4002-11. [PMID: 25058668 DOI: 10.1021/pr500347e] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
Mass spectrometry (MS) has become the method of choice to identify and quantify proteins, typically by fragmenting peptides and inferring protein identification by reference to sequence databases. Well-established programs have largely solved the problem of identifying peptides in complex mixtures. However, to prevent the search space from becoming prohibitively large, most search engines need a list of expected modifications. Therefore, unexpected modifications limit both the identification of proteins and peptide-based quantification. We developed mass spectrometry-peak shift analysis (MS-PSA) to rapidly identify related spectra in large data sets without reference to databases or specified modifications. Peptide identifications from established tools, such as MASCOT or SEQUEST, may be propagated onto MS-PSA results. Modification of a peptide alters the mass of the precursor ion and some of the fragmentation ions. MS-PSA identifies characteristic fragmentation masses from MS/MS spectra. Related spectra are identified by pattern matching of unchanged and mass-shifted fragment ions. We illustrate the use of MS-PSA with simple and complex mixtures with both high and low mass accuracy data sets. MS-PSA is not limited to the analysis of peptides but can be used for the identification of related groups of spectra in any set of fragmentation patterns.
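The idea of pattern matching on unchanged and mass-shifted fragment ions, as described in the abstract above, can be sketched as follows. This is a simplified illustration and not the MS-PSA algorithm itself: the function name `shift_match_counts`, the tolerance, and the use of a single shift equal to the precursor mass difference are assumptions.

```python
from typing import List, Tuple


def shift_match_counts(
    frags_a: List[float],
    frags_b: List[float],
    delta: float,
    tol: float = 0.01,
) -> Tuple[int, int]:
    """Count fragments of spectrum A that reappear in spectrum B.

    Returns (unchanged, shifted): fragments found at the same mass, and
    fragments found at mass + `delta` (the assumed modification mass).
    """
    def present(target: float) -> bool:
        return any(abs(target - m) <= tol for m in frags_b)

    unchanged = sum(1 for m in frags_a if present(m))
    shifted = sum(1 for m in frags_a if present(m + delta))
    return unchanged, shifted


# Toy example: B looks like A carrying a +79.966 Da shift (roughly the mass of
# a phosphorylation) on its two heaviest fragments.
frags_a = [200.0, 300.0, 400.0, 500.0]
frags_b = [200.0, 300.0, 479.966, 579.966]
counts = shift_match_counts(frags_a, frags_b, delta=79.966)
print(counts)  # (2, 2)
```

A pair of spectra where most fragments fall into one of the two groups would be a candidate "related" pair, with no sequence database or modification list involved.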
Affiliation(s)
- Thomas Wilhelm
- Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, United Kingdom
32
Toprak UH, Gillet LC, Maiolica A, Navarro P, Leitner A, Aebersold R. Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol Cell Proteomics 2014; 13:2056-71. [PMID: 24623587 PMCID: PMC4125737 DOI: 10.1074/mcp.o113.036475] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Received: 11/20/2013] [Revised: 02/26/2014] [Indexed: 12/21/2022] Open
Abstract
Quantifying the similarity of spectra is an important task in various areas of spectroscopy, for example, to identify a compound by comparing sample spectra to those of reference standards. In mass spectrometry based discovery proteomics, spectral comparisons are used to infer the amino acid sequence of peptides. In targeted proteomics by selected reaction monitoring (SRM) or SWATH MS, predetermined sets of fragment ion signals integrated over chromatographic time are used to identify target peptides in complex samples. In both cases, confidence in peptide identification is directly related to the quality of spectral matches. In this study, we used sets of simulated spectra of well-controlled dissimilarity to benchmark different spectral comparison measures and to develop a robust scoring scheme that quantifies the similarity of fragment ion spectra. We applied the normalized spectral contrast angle score to quantify the similarity of spectra to objectively assess fragment ion variability of tandem mass spectrometric datasets, to evaluate portability of peptide fragment ion spectra for targeted mass spectrometry across different types of mass spectrometers and to discriminate target assays from decoys in targeted proteomics. Altogether, this study validates the use of the normalized spectral contrast angle as a sensitive spectral similarity measure for targeted proteomics, and more generally provides a methodology to assess the performance of spectral comparisons and to support the rational selection of the most appropriate similarity measure. The algorithms used in this study are made publicly available as an open source toolset with a graphical user interface.
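As a concrete illustration of the similarity measure named in the abstract above, the sketch below computes a normalized spectral contrast angle between two intensity vectors. The formula used, 1 - 2·arccos(cos θ)/π, is a commonly cited formulation and is assumed here rather than taken from the abstract; the function name and the requirement that both vectors list the same fragments in the same order are also assumptions.

```python
import math
from typing import Sequence


def normalized_spectral_contrast_angle(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] for non-negative intensity vectors of equal length.

    1.0 means identical relative intensities; 0.0 means orthogonal vectors.
    """
    if len(a) != len(b):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("intensity vectors must not be all zero")
    # Clamp to guard against rounding slightly outside acos' domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return 1.0 - 2.0 * math.acos(cos_theta) / math.pi


same = normalized_spectral_contrast_angle([10.0, 5.0, 1.0], [20.0, 10.0, 2.0])
orthogonal = normalized_spectral_contrast_angle([1.0, 0.0], [0.0, 1.0])
```

Because the angle is mapped linearly onto [0, 1], the score changes more noticeably than a plain cosine when two spectra are already very similar, which is one reason such a measure may be preferred for comparing replicate fragment ion spectra.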
Affiliation(s)
- Umut H Toprak
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Ludovic C Gillet
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Alessio Maiolica
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Pedro Navarro
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, 8093 Zurich, Switzerland
33
Law KP, Lim YP. Recent advances in mass spectrometry: data independent analysis and hyper reaction monitoring. Expert Rev Proteomics 2014; 10:551-66. [PMID: 24206228 DOI: 10.1586/14789450.2013.858022] [Citation(s) in RCA: 103] [Impact Index Per Article: 9.4] [Indexed: 11/08/2022]
Abstract
New mass spectrometry (MS) methods, collectively known as data independent analysis and hyper reaction monitoring, have recently emerged. These methods hold promise to address the shortcomings of data-dependent analysis and selected reaction monitoring (SRM) employed in shotgun and targeted proteomics, respectively. They allow MS analyses of all species in a complex sample indiscriminately, or permit SRM-like experiments conducted with full high-resolution product ion spectra, potentially leading to higher sequence coverage or analytical selectivity. These methods include MS(E), all-ion fragmentation, Fourier transform-all reaction monitoring, SWATH Acquisition, multiplexed MS/MS, pseudo-SRM (pSRM) and parallel reaction monitoring (PRM). In this review, the strengths and pitfalls of these methods are discussed and illustrated with examples. In essence, the suitability of each method is contingent on the biological questions posed. Although these methods do not fundamentally change the shape of proteomics, they are useful additional tools that should expedite biological discoveries.
Affiliation(s)
- Kai Pong Law
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, MD4, Level 1, 14 Medical Drive, 117599, Singapore
34
Shliaha PV, Jukes-Jones R, Christoforou A, Fox J, Hughes C, Langridge J, Cain K, Lilley KS. Additional Precursor Purification in Isobaric Mass Tagging Experiments by Traveling Wave Ion Mobility Separation (TWIMS). J Proteome Res 2014; 13:3360-9. [DOI: 10.1021/pr500220g] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 12/24/2022]
Affiliation(s)
- Pavel V. Shliaha
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, U.K.
- Andy Christoforou
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, U.K.
- Jonathan Fox
- Waters Corporation, HRMS, Stamford Avenue, Altrincham Road, Wilmslow, SK9 4AX, U.K.
- Chris Hughes
- Waters Corporation, HRMS, Stamford Avenue, Altrincham Road, Wilmslow, SK9 4AX, U.K.
- James Langridge
- Waters Corporation, HRMS, Stamford Avenue, Altrincham Road, Wilmslow, SK9 4AX, U.K.
- Kelvin Cain
- MRC Toxicology Unit, University of Leicester, Leicester, U.K.
- Kathryn S. Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Cambridge, U.K.
35
Howbert JJ, Noble WS. Computing exact p-values for a cross-correlation shotgun proteomics score function. Mol Cell Proteomics 2014; 13:2467-79. [PMID: 24895379 DOI: 10.1074/mcp.o113.036327] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 01/25/2023] Open
Abstract
The core of every protein mass spectrometry analysis pipeline is a function that assesses the quality of a match between an observed spectrum and a candidate peptide. We describe a procedure for computing exact p-values for the oldest and still widely used score function, SEQUEST XCorr. The procedure uses dynamic programming to enumerate efficiently the full distribution of scores for all possible peptides whose masses are close to that of the spectrum precursor mass. Ranking identified spectra by p-value rather than XCorr significantly reduces variance because of spectrum-specific effects on the score. In combination with the Percolator postprocessor, the XCorr p-value yields more spectrum and peptide identifications at a fixed false discovery rate than Mascot, X!Tandem, Comet, and MS-GF+ across a variety of data sets.
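The dynamic-programming idea in the abstract above, enumerating the score distribution over all candidate sequences of a given mass, can be shown on a toy problem. This sketch is an assumption-laden simplification rather than the published procedure: residue masses are small integers, the score of a sequence is the sum of an integer `evidence` value at each prefix mass it produces, and the names `score_distribution` and `p_value` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def score_distribution(
    residue_masses: List[int], evidence: List[int], target_mass: int
) -> Dict[int, int]:
    """Count sequences of total mass `target_mass` by score.

    evidence[m] is the (integer) score gained when a prefix reaches mass m,
    so `evidence` must have length target_mass + 1.
    """
    # counts[m][s] = number of residue sequences with total mass m and score s
    counts = [defaultdict(int) for _ in range(target_mass + 1)]
    counts[0][0] = 1
    for m in range(1, target_mass + 1):
        for r in residue_masses:
            prev = m - r
            if prev < 0:
                continue
            for s, n in counts[prev].items():
                counts[m][s + evidence[m]] += n
    return dict(counts[target_mass])


def p_value(dist: Dict[int, int], observed: int) -> float:
    """Fraction of all candidate sequences scoring at least `observed`."""
    total = sum(dist.values())
    return sum(n for s, n in dist.items() if s >= observed) / total


# Toy alphabet with residue masses 1 and 2; five sequences have total mass 4.
dist = score_distribution([1, 2], evidence=[0, 1, 0, 2, 0], target_mass=4)
print(p_value(dist, observed=3))  # 0.4
```

The table grows with mass and score range rather than with the number of sequences, which is what makes an exact distribution tractable; a realistic version would need discretized real masses and scores.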
Affiliation(s)
- J Jeffry Howbert
- Department of Genome Sciences, University of Washington, Seattle, Washington
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington; Department of Computer Science and Engineering, University of Washington, Seattle, Washington
36
Wang J, Anania VG, Knott J, Rush J, Lill JR, Bourne PE, Bandeira N. Combinatorial approach for large-scale identification of linked peptides from tandem mass spectrometry spectra. Mol Cell Proteomics 2014; 13:1128-36. [PMID: 24493012 DOI: 10.1074/mcp.m113.035758] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022] Open
Abstract
The combination of chemical cross-linking and mass spectrometry has recently been shown to constitute a powerful tool for studying protein-protein interactions and elucidating the structure of large protein complexes. However, computational methods for interpreting the complex MS/MS spectra from linked peptides are still in their infancy, making the high-throughput application of this approach largely impractical. Because of the lack of large annotated datasets, most current approaches do not capture the specific fragmentation patterns of linked peptides and therefore are not optimal for the identification of cross-linked peptides. Here we propose a generic approach to address this problem and demonstrate it using disulfide-bridged peptide libraries to (i) efficiently generate large mass spectral reference data for linked peptides at a low cost and (ii) automatically train an algorithm that can efficiently and accurately identify linked peptides from MS/MS spectra. We show that using this approach we were able to identify thousands of MS/MS spectra from disulfide-bridged peptides through comparison with proteome-scale sequence databases and significantly improve the sensitivity of cross-linked peptide identification. This allowed us to identify 60% more direct pairwise interactions between the protein subunits in the 20S proteasome complex than existing tools on cross-linking studies of the proteasome complexes. The basic framework of this approach and the MS/MS reference dataset generated should be valuable resources for the future development of new tools for the identification of linked peptides.
Affiliation(s)
- Jian Wang
- Bioinformatics Program, University of California, San Diego, La Jolla, California
37
Wang J, Anania VG, Knott J, Rush J, Lill JR, Bourne PE, Bandeira N. A turn-key approach for large-scale identification of complex posttranslational modifications. J Proteome Res 2014; 13:1190-9. [PMID: 24437954 PMCID: PMC3993922 DOI: 10.1021/pr400368u] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
The conjugation of complex post-translational modifications (PTMs) such as glycosylation and Small Ubiquitin-like Modification (SUMOylation) to a substrate protein can substantially change the resulting peptide fragmentation pattern compared to its unmodified counterpart, making current database search methods inappropriate for the identification of tandem mass (MS/MS) spectra from such modified peptides. Traditionally it has been difficult to develop new algorithms to identify these atypical peptides because of the lack of a large set of annotated spectra from which to learn the altered fragmentation pattern. Using SUMOylation as an example, we propose a novel approach to generate large MS/MS training data from modified peptides and derive an algorithm that learns properties of PTM-specific fragmentation from such training data. Benchmark tests on data sets of varying complexity show that our method is 80-300% more sensitive than current state-of-the-art approaches. The core concepts of our method are readily applicable to developing algorithms for the identification of peptides with other complex PTMs.
Affiliation(s)
- Jian Wang
- Bioinformatics Program, Skaggs School of Pharmacy and Pharmaceutical Sciences, Center for Computational Mass Spectrometry, and Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, United States
38
Shao W, Zhu K, Lam H. Refining similarity scoring to enable decoy-free validation in spectral library searching. Proteomics 2013; 13:3273-83. [DOI: 10.1002/pmic.201300232] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 06/14/2013] [Revised: 08/06/2013] [Accepted: 09/10/2013] [Indexed: 12/30/2022]
Affiliation(s)
- Wenguang Shao
- Division of Biomedical Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong China
- Kan Zhu
- Department of Chemical and Biomolecular Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong China
- Henry Lam
- Division of Biomedical Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong China
- Department of Chemical and Biomolecular Engineering; The Hong Kong University of Science and Technology; Clear Water Bay Hong Kong China
39
Wang M, Bandeira N. Spectral library generating function for assessing spectrum-spectrum match significance. J Proteome Res 2013; 12:3944-51. [PMID: 23808827 DOI: 10.1021/pr400230p] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/22/2023]
Abstract
Tandem mass spectrometry (MS/MS) continues to be the technology of choice for high-throughput analysis of complex proteomics samples. While MS/MS spectra are commonly identified by matching against a database of known protein sequences, the complementary approach of spectral library searching against collections of reference spectra consistently outperforms sequence-based searches by resulting in significantly more identified spectra. However, while spectral library searches benefit from the advance knowledge of the expected peptide fragmentation patterns recorded in library spectra, estimation of the statistical significance of spectrum-spectrum matches (SSMs) continues to be hindered by difficulties in finding an appropriate definition of "random" SSMs to use as a null model when estimating the significance of true SSMs. We propose to avoid this problem by changing the null hypothesis: instead of determining the probability of observing a high SSM score between randomly matched spectra, we estimate the probability of observing a low SSM score between replicate spectra of the same molecule. To this end, we explicitly model the variation in instrument measurements of MS/MS peak intensities and show how these models can be used to determine a theoretical distribution of SSM scores between reference and query spectra of the same molecule. While the proposed spectral library generating function (SLGF) approach can be used to calculate theoretical distributions for any additive SSM score (e.g., any dot product), we further show how it can be used to calculate the distribution of expected cosines between reference and query spectra. We developed a spectral library search tool, Tremolo, and demonstrate that this SLGF-based search tool significantly outperforms current state-of-the-art spectral library search tools and provide a detailed discussion of the multiple reasons behind the observed differences in the sets of identified MS/MS spectra.
Affiliation(s)
- Mingxun Wang
- Department of Computer Science and Engineering, Center for Computational Mass Spectrometry, CSE, and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Dr., La Jolla, California 92093, United States
40
Kryuchkov F, Verano-Braga T, Hansen TA, Sprenger RR, Kjeldsen F. Deconvolution of mixture spectra and increased throughput of peptide identification by utilization of intensified complementary ions formed in tandem mass spectrometry. J Proteome Res 2013; 12:3362-71. [PMID: 23725413 DOI: 10.1021/pr400210m] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 02/07/2023]
Abstract
A cornerstone of mass spectrometry based proteomics is to relate with high statistical significance experimentally obtained tandem mass spectrometry (MS/MS) data to peptide sequences from a protein database. Most sequence specific fragment ions in MS/MS spectra are represented by a subset of complementary ion pairs. Here, we investigated the reliabilities of complementary ion pairs formed in CAD and CAD/ETD MS/MS and developed a reliability-based approach of intensification of ion signals of complementary pairs prior to database searching. In a large-scale proteomics experiment using high-resolution orbitrap mass spectrometry, an increase in the number of peptide identifications was obtained relative to the original CAD MS/MS spectra when intensified golden complementary (+18.6%) and CAD complementary pairs (+17.2%) were submitted to the Mascot search engine. This also exceeded the results obtained by deisotoping/deconvolution of CAD MS/MS spectra. A novel approach for extracting sequence-specific fragment ions of co-isolated peptides was developed based on the complementarity rules. This technique demonstrated an impressive gain of 42.4% more peptide identifications as compared with the use of the initial data set.
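The complementarity rule mentioned in the abstract above can be illustrated for the simplest case. For singly protonated b and y ions of the same peptide, the two m/z values sum to the neutral peptide mass plus two proton masses. The sketch below finds such pairs with a two-pointer scan; it assumes singly charged fragments only, and the function name `complementary_pairs` and the tolerance are illustrative choices, not the authors' method.

```python
from typing import List, Tuple

PROTON = 1.007276  # proton mass in Da


def complementary_pairs(
    mzs: List[float], neutral_mass: float, tol: float = 0.01
) -> List[Tuple[float, float]]:
    """Return (low, high) m/z pairs whose sum matches neutral_mass + 2 protons.

    Assumes singly charged b/y-type fragments; each peak is used at most once.
    """
    target = neutral_mass + 2.0 * PROTON
    peaks = sorted(mzs)
    pairs: List[Tuple[float, float]] = []
    i, j = 0, len(peaks) - 1
    while i < j:
        total = peaks[i] + peaks[j]
        if abs(total - target) <= tol:
            pairs.append((peaks[i], peaks[j]))
            i += 1
            j -= 1
        elif total < target:
            i += 1  # sum too small: move to a heavier low-mass peak
        else:
            j -= 1  # sum too large: move to a lighter high-mass peak
    return pairs


# Toy spectrum for a peptide of neutral mass 1000.0 Da; 500.0 has no partner.
peaks = [200.0, 802.014552, 350.5, 651.514552, 500.0]
pairs = complementary_pairs(peaks, neutral_mass=1000.0)
```

In a mixture spectrum, running the same scan once per co-isolated precursor mass would assign pairs to the precursor they are consistent with, which is the intuition behind using complementarity to separate co-isolated peptides.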
Affiliation(s)
- Fedor Kryuchkov
- Protein Research Group, Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
41
Shao W, Lam H. Denoising Peptide Tandem Mass Spectra for Spectral Libraries: A Bayesian Approach. J Proteome Res 2013; 12:3223-32. [DOI: 10.1021/pr400080b] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
Affiliation(s)
- Wenguang Shao
- Division of Biomedical Engineering, and Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Henry Lam
- Division of Biomedical Engineering, and Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
42
Guthals A, Watrous JD, Dorrestein PC, Bandeira N. The spectral networks paradigm in high throughput mass spectrometry. Mol Biosyst 2013; 8:2535-44. [PMID: 22610447 DOI: 10.1039/c2mb25085c] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Indexed: 12/20/2022]
Abstract
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two-decade-old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and false negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids, and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.
Affiliation(s)
- Adrian Guthals
- Dept. Computer Science and Engineering, University of California, San Diego, USA
|
43
|
Abstract
The frequently used bottom-up strategy for identification of proteins and their associated modifications nowadays typically generates thousands of MS/MS spectra, which are normally matched automatically against a protein sequence database. Search engines that take MS/MS spectra and a protein sequence database as input are referred to as database-dependent search engines. Many programs, both commercial and freely available, exist for database-dependent search of MS/MS spectra, and most of them have excellent user documentation. The aim here is therefore to outline the algorithmic strategy behind different search engines rather than to provide software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put into comparing results from different software rather than discussing the underlying algorithms. Such practical comparisons can be cluttered by suboptimal implementation, and the observed differences are frequently caused by software parameter settings that have not been set properly to allow a fair comparison. In other words, an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim of this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference is much less developed in most search engines and is in many cases performed by external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program, SIR, for protein inference that can import a Mascot search result.
Affiliation(s)
- Rune Matthiesen
- Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
|
44
|
Abstract
Historically, many genome annotation strategies have lacked experimental evidence at the protein level and have instead relied heavily on ab initio gene prediction tools, which has resulted in many incorrectly annotated genomic sequences. Proteogenomics aims to address these issues using mass spectrometry (MS)-based proteomics, genomic mapping, and statistical significance measures such as false discovery rates (FDRs) to validate the mapped peptides. Presented here is a tool capable of meeting this goal, the UCSD proteogenomic pipeline, which maps peptide-spectrum matches (PSMs) to the genome using the Inspect MS/MS database search tool and assigns a statistical significance to each match using a target-decoy search approach to estimate FDRs. This pipeline also provides the option of using a more reliable approach to proteogenomics by determining the precise false-positive rates (FPRs) and p-values of each PSM by calculating their spectral probabilities and rescoring each PSM accordingly. In addition to the protein prediction challenges in the rapidly growing number of sequenced plant genomes, it is difficult to extract high-quality protein samples from many plant species. For that reason, this chapter contains methods for protein extraction and trypsin digestion that reliably produce samples suitable for proteogenomic analysis.
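The target-decoy estimate mentioned in this abstract can be illustrated with a minimal sketch. It is not the UCSD pipeline's or Inspect's implementation; it assumes each PSM is reduced to a hypothetical (score, is_decoy) pair, that higher scores are better, and that the FDR at a score threshold is estimated as decoys / targets among the PSMs at or above that threshold. Score ties are not handled specially.

```python
from typing import List, Optional, Tuple

# Hypothetical PSM representation: (score, is_decoy). Higher score = better match.
PSM = Tuple[float, bool]


def target_decoy_fdr(psms: List[PSM]) -> List[Tuple[float, float]]:
    """Return (score threshold, estimated FDR) pairs, best score first.

    Assumed estimator: decoy hits / target hits at or above the threshold.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    result = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        result.append((score, min(fdr, 1.0)))
    return result


def score_cutoff(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score threshold whose estimated FDR is <= max_fdr, or None."""
    best = None
    for score, fdr in target_decoy_fdr(psms):
        if fdr <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, False)]
    print(score_cutoff(example, max_fdr=0.01))
    print(score_cutoff(example, max_fdr=0.30))
```

Scanning the whole ranked list and keeping the last passing threshold accepts the largest set of PSMs that still satisfies the FDR limit, even when the running estimate briefly exceeds the limit higher up the list.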
|
45
|
Thalassinos K, Vissers JPC, Tenzer S, Levin Y, Thompson JW, Daniel D, Mann D, DeLong MR, Moseley MA, America AH, Ottens AK, Cavey GS, Efstathiou G, Scrivens JH, Langridge JI, Geromanos SJ. Design and application of a data-independent precursor and product ion repository. Journal of the American Society for Mass Spectrometry 2012; 23:1808-1820. [PMID: 22847389 DOI: 10.1007/s13361-012-0416-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 12/29/2011] [Revised: 05/09/2012] [Accepted: 05/13/2012] [Indexed: 06/01/2023]
Abstract
The functional design and application of a data-independent LC-MS precursor and product ion repository for protein identification, quantification, and validation is conceptually described. The ion repository was constructed from the sequence search results of a broad range of discovery experiments investigating various tissue types of two closely related mammalian species. The relatively high degree of similarity in protein complement, ion detection, and peptide and protein identification allows for the analysis of normalized precursor and product ion intensity values, as well as standardized retention times, creating a multidimensional/orthogonal queryable, qualitative, and quantitative space. Peptide ion map selection for identification and quantification is primarily based on replication and limited variation. The information is stored in a relational database and is used to create peptide- and protein-specific fragment ion maps that can be queried in a targeted fashion against the raw or time-aligned ion detections. These queries can be conducted either individually or as groups, where the latter affords pathway and molecular machinery analysis of the protein complement. The presented results also suggest that peptide ionization and fragmentation efficiencies are highly conserved between experiments and practically independent of the analyzed biological sample when using similar instrumentation. Moreover, the data illustrate only minor variation in ionization efficiency with amino acid sequence substitutions occurring between species. Finally, the data and the presented results illustrate how LC-MS performance metrics can be extracted and utilized to ensure optimal performance of the employed analytical workflows.
|
46
|
Affiliation(s)
- David L Tabb
- Departments of Biomedical Informatics and Biochemistry, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
47
|
Using ion purity scores for enhancing quantitative accuracy and precision in complex proteomics samples. Anal Bioanal Chem 2012; 404:1127-39. [DOI: 10.1007/s00216-012-6197-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 04/13/2012] [Revised: 06/06/2012] [Accepted: 06/13/2012] [Indexed: 11/25/2022]
|
48
|
Yuan ZFE, Liu C, Wang HP, Sun RX, Fu Y, Zhang JF, Wang LH, Chi H, Li Y, Xiu LY, Wang WP, He SM. pParse: A method for accurate determination of monoisotopic peaks in high-resolution mass spectra. Proteomics 2011; 12:226-35. [DOI: 10.1002/pmic.201100081] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Received: 02/10/2011] [Revised: 10/31/2011] [Accepted: 11/02/2011] [Indexed: 11/09/2022]
|
49
|
Lam H. Building and searching tandem mass spectral libraries for peptide identification. Mol Cell Proteomics 2011; 10:R111.008565. [PMID: 21900153 DOI: 10.1074/mcp.r111.008565] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Indexed: 11/06/2022] Open
Abstract
Spectral library searching is an emerging approach to peptide identification from tandem mass spectra, a critical step in proteomic data analysis. Conceptually, the premise of this approach is that the tandem MS fragmentation pattern of a peptide under some fixed conditions is a reproducible fingerprint of that peptide, such that unknown spectra acquired under the same conditions can be identified by spectral matching. In actual practice, a spectral library is first meticulously compiled from a large collection of previously observed and identified tandem MS spectra, usually obtained from shotgun proteomics experiments of complex mixtures. A query spectrum is then identified by spectral matching using recently developed spectral search engines. This review discusses the basic principles of the two pillars of this approach: spectral library construction and spectral library searching. An overview of the software tools available for these two tasks, as well as a high-level description of the underlying algorithms, will be given. Finally, several new methods that utilize spectral libraries for peptide identification in ways other than straightforward spectral matching will also be described.
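The spectral matching step described in this abstract can be illustrated with a minimal sketch. It is not taken from the review or from any particular search engine; it assumes spectra are lists of hypothetical (m/z, intensity) peaks, bins them at a fixed width, and scores a query against each library entry with a normalized dot product (cosine similarity), which is one common choice of similarity measure.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); illustrative representation


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a: List[Peak], b: List[Peak], bin_width: float = 1.0) -> float:
    """Normalized dot product of two binned spectra (0.0 if either is empty)."""
    va = bin_spectrum(a, bin_width)
    vb = bin_spectrum(b, bin_width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(
    query: List[Peak], library: Dict[str, List[Peak]], bin_width: float = 1.0
) -> Optional[Tuple[str, float]]:
    """Return (entry name, similarity) of the best-scoring library spectrum."""
    scored = ((name, cosine_similarity(query, peaks, bin_width))
              for name, peaks in library.items())
    return max(scored, key=lambda item: item[1], default=None)


if __name__ == "__main__":
    # Made-up peaks and entry names, for illustration only.
    library = {
        "entry_A": [(100.2, 10.0), (200.4, 5.0)],
        "entry_B": [(150.1, 10.0), (300.3, 5.0)],
    }
    query = [(100.3, 9.0), (200.1, 6.0)]
    print(best_library_match(query, library))
```

Fixed-width binning is the simplest way to tolerate small m/z deviations; it can split a peak pair that straddles a bin boundary, so a tolerance-based peak pairing would be a reasonable alternative.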
Affiliation(s)
- Henry Lam
- Department of Chemical and Biomolecular Engineering, the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
|
50
|
Wang J, Bourne PE, Bandeira N. Peptide identification by database search of mixture tandem mass spectra. Mol Cell Proteomics 2011; 10:M111.010017. [PMID: 21862760 DOI: 10.1074/mcp.m111.010017] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022] Open
Abstract
In high-throughput proteomics, the development of computational methods and the development of novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of computational methods to interpret the resulting tandem mass spectra. In particular, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still make the assumption that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated from data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (four orders of magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision.
Affiliation(s)
- Jian Wang
- Bioinformatics Program, University of California, San Diego, La Jolla, CA 92093, USA
|