1
|
Wu HT, Riggs DL, Lyon YA, Julian RR. Statistical Framework for Identifying Differences in Similar Mass Spectra: Expanding Possibilities for Isomer Identification. Anal Chem 2023; 95:6996-7005. [PMID: 37128750 PMCID: PMC10157605 DOI: 10.1021/acs.analchem.3c00495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 04/04/2023] [Indexed: 05/03/2023]
Abstract
Isomeric molecules are important analytes in many biological and chemical arenas, yet their similarity poses challenges for many analytical methods, including mass spectrometry (MS). Tandem-MS provides significantly more information about isomers than intact mass analysis, but highly similar fragmentation patterns are common and include cases where no unique m/z peaks are generated between isomeric pairs. However, even in such situations, differences in peak intensity can exist and potentially contain additional information. Herein, we present a framework for comparing mass spectra that differ only in terms of peak intensity and include calculation of a statistical probability that the spectra derive from different analytes. This framework allows for confident identification of peptide isomers by collision-induced dissociation, higher-energy collisional dissociation, electron-transfer dissociation, and radical-directed dissociation. The method successfully identified many types of isomers including various d/l amino acid substitutions, Leu/Ile, and Asp/IsoAsp. The method can accommodate a wide range of changes in instrumental settings including source voltages, isolation widths, and resolution without influencing the analysis. It is shown that quantification of the composition of isomeric mixtures can be enabled with calibration curves, which were found to be highly linear and reproducible. The analysis can be implemented with data collected by either direct infusion or liquid-chromatography MS. Although this framework is presented in the context of isomer characterization, it should also prove useful in many other contexts where similar mass spectra are generated.
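The calibration-curve idea mentioned in the abstract can be sketched as an ordinary least-squares line fit that is then inverted to estimate the composition of an unknown mixture. This is only an illustration under stated assumptions, not the authors' published procedure: the response variable (for example, the relative intensity of one fragment peak), the function names `fit_line` and `estimate_fraction`, and all numeric values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_fraction(response, slope, intercept):
    """Invert the calibration line to estimate the isomer fraction."""
    return (response - intercept) / slope


# Synthetic calibration points, for illustration only (not measured data):
# known fraction of isomer A in the mixture vs. a hypothetical response.
fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
responses = [0.20, 0.35, 0.50, 0.65, 0.80]

slope, intercept = fit_line(fractions, responses)
unknown_fraction = estimate_fraction(0.44, slope, intercept)
```

In practice the response would be averaged over replicate spectra, and the linearity of the curve would need to be checked before the fit is inverted.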
Affiliation(s)
- Hoi-Ting Wu
- Department of Chemistry, University of California, Riverside, California 92521, United States
| | - Dylan L. Riggs
- Department of Chemistry, University of California, Riverside, California 92521, United States
| | - Yana A. Lyon
- Department of Chemistry, University of California, Riverside, California 92521, United States
| | - Ryan R. Julian
- Department of Chemistry, University of California, Riverside, California 92521, United States
|
2
|
Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 03/01/2022] [Accepted: 07/11/2022] [Indexed: 01/21/2023]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
Affiliation(s)
- Jürgen Cox
- Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany.
- Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway.
|
3
|
Jürgens L, Wethmar K. The Emerging Role of uORF-Encoded uPeptides and HLA uLigands in Cellular and Tumor Biology. Cancers (Basel) 2022; 14:cancers14246031. [PMID: 36551517 PMCID: PMC9776223 DOI: 10.3390/cancers14246031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 11/29/2022] [Accepted: 11/30/2022] [Indexed: 12/13/2022] Open
Abstract
Recent technological advances have facilitated the detection of numerous non-canonical human peptides derived from regulatory regions of mRNAs, long non-coding RNAs, and other cryptic transcripts. In this review, we first give an overview of the classification of these novel peptides and summarize recent improvements in their annotation and detection by ribosome profiling, mass spectrometry, and individual experimental analysis. A large fraction of the novel peptides originates from translation at upstream open reading frames (uORFs) that are located within the transcript leader sequence of regular mRNA. In humans, uORF-encoded peptides (uPeptides) have been detected in both healthy and malignantly transformed cells and emerge as important regulators in cellular and immunological pathways. In the second part of the review, we focus on various functional implications of uPeptides. As uPeptides frequently act at the transition of translational regulation and individual peptide function, we describe the mechanistic modes of translational regulation through ribosome stalling, the involvement in cellular programs through protein interaction and complex formation, and their role within the human leukocyte antigen (HLA)-associated immunopeptidome as HLA uLigands. We delineate how malignant transformation may lead to the formation of novel uORFs, uPeptides, or HLA uLigands and explain their potential implication in tumor biology. Ultimately, we speculate on a potential use of uPeptides as peptide drugs and discuss how uPeptides and HLA uLigands may facilitate translational inhibition of oncogenic protein messages and immunotherapeutic approaches in cancer therapy.
|
4
|
Characterization of core fucosylation via sequential enzymatic treatments of intact glycopeptides and mass spectrometry analysis. Nat Commun 2022; 13:3910. [PMID: 35798744 PMCID: PMC9262967 DOI: 10.1038/s41467-022-31472-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/03/2020] [Accepted: 06/16/2022] [Indexed: 01/14/2023] Open
Abstract
Core fucosylation of N-linked glycoproteins has been linked to the functions of glycoproteins in physiological and pathological processes. However, quantitative characterization of core fucosylation remains challenging due to the complexity and heterogeneity of N-linked glycosylation. Here we report a mass spectrometry-based method that employs sequential treatment of intact glycopeptides with enzymes (STAGE) to analyze site-specific core fucosylation of glycoproteins. The STAGE method utilizes Endo F3 followed by PNGase F treatment to generate mass signatures for glycosites that were formerly modified by core-fucosylated N-linked glycans. We benchmark the STAGE method and use it to characterize site-specific core fucosylation of glycoproteins from human hepatocellular carcinoma and pancreatic ductal adenocarcinoma, resulting in the identification of 1130 and 782 core-fucosylated glycosites, respectively. These results indicate that our STAGE method enables quantitative characterization of core fucosylation events from complex protein mixtures, which may benefit our understanding of core fucosylation functions in various diseases.
|
5
|
Lee H, Kim SI. Review of Liquid Chromatography-Mass Spectrometry-Based Proteomic Analyses of Body Fluids to Diagnose Infectious Diseases. Int J Mol Sci 2022; 23:ijms23042187. [PMID: 35216306 PMCID: PMC8878692 DOI: 10.3390/ijms23042187] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/13/2021] [Revised: 02/11/2022] [Accepted: 02/14/2022] [Indexed: 01/27/2023] Open
Abstract
Rapid and precise diagnostic methods are required to control emerging infectious diseases effectively. Human body fluids are attractive clinical samples for discovering diagnostic targets because they reflect the clinical statuses of patients and most of them can be obtained with minimally invasive sampling processes. Body fluids are good reservoirs for infectious parasites, bacteria, and viruses. Therefore, recent clinical proteomics methods have focused on body fluids when aiming to discover human- or pathogen-originated diagnostic markers. Cutting-edge liquid chromatography-mass spectrometry (LC-MS)-based proteomics has been applied in this regard; it is considered one of the most sensitive and specific proteomics approaches. Here, the clinical characteristics of each body fluid, recent tandem mass spectrometry (MS/MS) data-acquisition methods, and applications of body fluids for proteomics regarding infectious diseases (including coronavirus disease 2019 [COVID-19]) are summarized and discussed.
Affiliation(s)
- Hayoung Lee
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute (KBSI), Ochang 28119, Korea;
- Bio-Analytical Science Division, University of Science and Technology (UST), Daejeon 34113, Korea
| | - Seung Il Kim
- Research Center for Bioconvergence Analysis, Korea Basic Science Institute (KBSI), Ochang 28119, Korea;
- Bio-Analytical Science Division, University of Science and Technology (UST), Daejeon 34113, Korea
- Correspondence:
|
6
|
Cao L, Huang C, Cui Zhou D, Hu Y, Lih TM, Savage SR, Krug K, Clark DJ, Schnaubelt M, Chen L, da Veiga Leprevost F, Eguez RV, Yang W, Pan J, Wen B, Dou Y, Jiang W, Liao Y, Shi Z, Terekhanova NV, Cao S, Lu RJH, Li Y, Liu R, Zhu H, Ronning P, Wu Y, Wyczalkowski MA, Easwaran H, Danilova L, Mer AS, Yoo S, Wang JM, Liu W, Haibe-Kains B, Thiagarajan M, Jewell SD, Hostetter G, Newton CJ, Li QK, Roehrl MH, Fenyö D, Wang P, Nesvizhskii AI, Mani DR, Omenn GS, Boja ES, Mesri M, Robles AI, Rodriguez H, Bathe OF, Chan DW, Hruban RH, Ding L, Zhang B, Zhang H. Proteogenomic characterization of pancreatic ductal adenocarcinoma. Cell 2021; 184:5031-5052.e26. [PMID: 34534465 PMCID: PMC8654574 DOI: 10.1016/j.cell.2021.08.023] [Citation(s) in RCA: 224] [Impact Index Per Article: 74.7] [Received: 10/08/2020] [Revised: 03/19/2021] [Accepted: 08/18/2021] [Indexed: 02/07/2023]
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with poor patient survival. Toward understanding the underlying molecular alterations that drive PDAC oncogenesis, we conducted comprehensive proteogenomic analysis of 140 pancreatic cancers, 67 normal adjacent tissues, and 9 normal pancreatic ductal tissues. Proteomic, phosphoproteomic, and glycoproteomic analyses were used to characterize proteins and their modifications. In addition, whole-genome sequencing, whole-exome sequencing, methylation, RNA sequencing (RNA-seq), and microRNA sequencing (miRNA-seq) were performed on the same tissues to facilitate an integrated proteogenomic analysis and determine the impact of genomic alterations on protein expression, signaling pathways, and post-translational modifications. To ensure robust downstream analyses, tumor neoplastic cellularity was assessed via multiple orthogonal strategies using molecular features and verified via pathological estimation of tumor cellularity based on histological review. This integrated proteogenomic characterization of PDAC will serve as a valuable resource for the community, paving the way for early detection and identification of novel therapeutic targets.
Affiliation(s)
- Liwei Cao
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | - Chen Huang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Daniel Cui Zhou
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Yingwei Hu
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | - T Mamie Lih
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | - Sara R Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Karsten Krug
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - David J Clark
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | - Michael Schnaubelt
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | - Lijun Chen
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | | | | | - Weiming Yang
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | - Jianbo Pan
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | - Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Yongchao Dou
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Nadezhda V Terekhanova
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Song Cao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Rita Jui-Hsien Lu
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Yize Li
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Ruiyang Liu
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Houxiang Zhu
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Peter Ronning
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Yige Wu
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Matthew A Wyczalkowski
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Hariharan Easwaran
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21287, USA
| | - Ludmila Danilova
- Department of Oncology, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Arvind Singh Mer
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON M5G 1L7, Canada
| | - Seungyeul Yoo
- Sema4, a Mount Sinai venture, Stamford, CT 06902, USA
| | - Joshua M Wang
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Wenke Liu
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Benjamin Haibe-Kains
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON M5G 1L7, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada
| | - Mathangi Thiagarajan
- Leidos Biomedical Research Inc., Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Scott D Jewell
- Van Andel Research Institute, Grand Rapids, MI 49503, USA
| | | | | | - Qing Kay Li
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
| | - Michael H Roehrl
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - David Fenyö
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Pei Wang
- Sema4, a Mount Sinai venture, Stamford, CT 06902, USA
| | | | - D R Mani
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Emily S Boja
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA
| | - Mehdi Mesri
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA
| | - Ana I Robles
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA
| | - Henry Rodriguez
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA
| | - Oliver F Bathe
- Departments of Surgery and Oncology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
| | - Daniel W Chan
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA; The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21287, USA
| | - Ralph H Hruban
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA; The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21287, USA; The Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University, Baltimore, MD 21231, USA
| | - Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA.
| | - Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
| | - Hui Zhang
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA; The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21287, USA.
|
7
|
Szabó D, Schlosser G, Vékey K, Drahos L, Révész Á. Collision energies on QTof and Orbitrap instruments: How to make proteomics measurements comparable? J Mass Spectrom 2021; 56:e4693. [PMID: 33277714 DOI: 10.1002/jms.4693] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/24/2020] [Revised: 11/23/2020] [Accepted: 11/24/2020] [Indexed: 06/12/2023]
Abstract
Quadrupole time-of-flight (QTof) collision-induced dissociation (CID) and Orbitrap higher-energy collisional dissociation (HCD) are the most commonly used fragmentation techniques in mass spectrometry-based proteomics workflows. The information content of the MS/MS spectra is first and foremost determined by the applied collision energy. How can we set up the two instrument types to achieve maximum transferability? To answer this question, we compared MS/MS spectra obtained on a Bruker QTof CID and a Thermo Q-Exactive Focus Orbitrap HCD instrument as a function of collision energy using the similarity index. Results show that with a few eV lower collision energy setting on HCD (Orbitrap-specific CID) than on QTof CID, nearly identical MS/MS spectra can be obtained for leucine enkephalin pentapeptide standard, for selected +2 and +3 enolase tryptic peptides and for a large number of peptides in a HeLa protein digest. The Bruker QTof was able to produce colder ions, which may be significant to study inherently labile compounds. Further, we examined energy dependence of peptide identification confidence, as characterized by Mascot scores, on the HeLa peptides. In line with earlier QTof results, this dependence shows one or two maxima (unimodal or bimodal behavior) on Orbitrap. The fraction of bimodal peptides is lower on Orbitrap. Optimal energies as a function of m/z show a similar linear trend on both instruments, which suggests that with appropriate collision energy adjustment, matching conditions for proteomics can be achieved. Data have been deposited in the MassIVE repository (MSV000086434).
Affiliation(s)
- Dániel Szabó
- MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok körútja 2., Budapest, H-1117, Hungary
- Hevesy György PhD School of Chemistry, Eötvös Loránd University, Faculty of Science, Institute of Chemistry, Pázmány Péter sétány 1/A, Budapest, H-1117, Hungary
| | - Gitta Schlosser
- MTA-ELTE Lendület Ion Mobility Mass Spectrometry Research Group, Eötvös Loránd University, Faculty of Science, Institute of Chemistry, Pázmány Péter sétány 1/A, Budapest, H-1117, Hungary
| | - Károly Vékey
- MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok körútja 2., Budapest, H-1117, Hungary
| | - László Drahos
- MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok körútja 2., Budapest, H-1117, Hungary
| | - Ágnes Révész
- MS Proteomics Research Group, Research Centre for Natural Sciences, Magyar Tudósok körútja 2., Budapest, H-1117, Hungary
|
8
|
Hu Y, Pan J, Shah P, Ao M, Thomas SN, Liu Y, Chen L, Schnaubelt M, Clark DJ, Rodriguez H, Boja ES, Hiltke T, Kinsinger CR, Rodland KD, Li QK, Qian J, Zhang Z, Chan DW, Zhang H. Integrated Proteomic and Glycoproteomic Characterization of Human High-Grade Serous Ovarian Carcinoma. Cell Rep 2020; 33:108276. [PMID: 33086064 PMCID: PMC7970828 DOI: 10.1016/j.celrep.2020.108276] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 03/01/2019] [Revised: 06/18/2020] [Accepted: 09/23/2020] [Indexed: 12/12/2022] Open
Abstract
Many gene products exhibit great structural heterogeneity because of an array of modifications. These modifications are not directly encoded in the genomic template but often affect the functionality of proteins. Protein glycosylation plays a vital role in proper protein functions. However, the analysis of glycoproteins has been challenging compared with other protein modifications, such as phosphorylation. Here, we perform an integrated proteomic and glycoproteomic analysis of 83 prospectively collected high-grade serous ovarian carcinoma (HGSC) and 23 non-tumor tissues. Integration of the expression data from global proteomics and glycoproteomics reveals tumor-specific glycosylation, uncovers different glycosylation associated with three tumor clusters, and identifies glycosylation enzymes that were correlated with the altered glycosylation. In addition to providing a valuable resource, these results provide insights into the potential roles of glycosylation in the pathogenesis of HGSC, with the possibility of distinguishing pathological outcomes of ovarian tumors from non-tumors, as well as classifying tumor clusters.
Affiliation(s)
- Yingwei Hu
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Jianbo Pan
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Punit Shah
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Minghui Ao
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Stefani N Thomas
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Yang Liu
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Lijun Chen
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Michael Schnaubelt
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - David J Clark
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Henry Rodriguez
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA
| | - Emily S Boja
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA
| | - Tara Hiltke
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA
| | - Christopher R Kinsinger
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, MD 20892, USA
| | - Karin D Rodland
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
| | - Qing Kay Li
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Jiang Qian
- Department of Ophthalmology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Zhen Zhang
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA
| | - Daniel W Chan
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA.
| | - Hui Zhang
- Department of Pathology, Johns Hopkins University, School of Medicine, Baltimore, MD 21287, USA.
|
9
|
Liu K, Li S, Wang L, Ye Y, Tang H. Full-Spectrum Prediction of Peptides Tandem Mass Spectra using Deep Neural Network. Anal Chem 2020; 92:4275-4283. [PMID: 32053352 DOI: 10.1021/acs.analchem.9b04867] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 01/03/2023]
Abstract
The ability to predict tandem mass (MS/MS) spectra from peptide sequences can significantly enhance our understanding of the peptide fragmentation process and could improve peptide identification in proteomics. However, current approaches for predicting high-energy collisional dissociation (HCD) spectra are limited to predicting the intensities of expected ion types, that is, the a/b/c/x/y/z ions and their neutral loss derivatives (referred to as backbone ions). In practice, backbone ions only account for <70% of total ion intensities in HCD spectra, indicating that many intense ions are ignored by current predictors. In this paper, we present a deep learning approach that can predict the complete spectra (both backbone and nonbackbone ions) directly from peptide sequences. We made no assumptions about which kinds of ions to predict but instead predicted the intensities for all possible m/z. Training this model requires neither annotations of fragment ions nor any prior knowledge of the fragmentation rules. Our analyses show that the predicted 2+ and 3+ HCD spectra are highly similar to the experimental spectra, with average full-spectrum cosine similarities of 0.820 (±0.088) and 0.786 (±0.085), respectively, very close to the similarities between the experimental replicated spectra. In contrast, the best-performing backbone-only models achieve average similarities below 0.75 and 0.70 for 2+ and 3+ spectra, respectively. Furthermore, we developed a multitask learning (MTL) approach for predicting spectra with insufficient training samples, which allows our model to make accurate predictions for electron transfer dissociation (ETD) spectra and HCD spectra of less abundant charges (1+ and 4+).
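The full-spectrum cosine similarity quoted in the abstract can be illustrated with a short sketch. It assumes spectra are given as lists of (m/z, intensity) pairs and binned at a fixed width before comparison; the bin width, the function names `bin_spectrum` and `cosine_similarity`, and the example peaks are assumptions made here for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin index -> intensity)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(spec_a, spec_b, bin_width=1.0):
    """Cosine of the angle between two binned intensity vectors (0 to 1)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(value * b.get(index, 0.0) for index, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical peak lists: same two m/z bins, different relative intensities.
experimental = [(100.1, 3.0), (200.2, 4.0)]
predicted = [(100.3, 4.0), (200.4, 3.0)]
score = cosine_similarity(experimental, predicted)
```

Because the measure is scale-invariant, two spectra that differ only by a constant intensity factor score 1, while spectra that share no occupied bins score 0.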
Affiliation(s)
- Kaiyuan Liu
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
| | - Sujun Li
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
| | - Lei Wang
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
| | - Yuzhen Ye
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
| | - Haixu Tang
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
|
10
|
Alkhalifah Y, Phillips I, Soltoggio A, Darnley K, Nailon WH, McLaren D, Eddleston M, Thomas CLP, Salman D. VOCCluster: Untargeted Metabolomics Feature Clustering Approach for Clinical Breath Gas Chromatography/Mass Spectrometry Data. Anal Chem 2020; 92:2937-2945. [PMID: 31791122 DOI: 10.1021/acs.analchem.9b03084] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 01/24/2023]
Abstract
Metabolic profiling of breath analysis involves processing, alignment, scaling, and clustering of thousands of features extracted from gas chromatography/mass spectrometry (GC/MS) data from hundreds of participants. The multistep data processing is complicated, prone to operator error, and time-consuming. Automated algorithmic clustering methods that are able to cluster features in a fast and reliable way are necessary. These accelerate metabolic profiling and discovery platforms for next-generation medical diagnostic tools. Our unsupervised clustering technique, VOCCluster, prototyped in Python, handles features of deconvolved GC/MS breath data. VOCCluster was created from a heuristic ontology based on the observation of experts undertaking data processing with a suite of software packages. VOCCluster identifies and clusters groups of volatile organic compounds (VOCs) from deconvolved GC/MS breath data with similar mass spectra and retention index profiles. VOCCluster was used to cluster more than 15 000 features extracted from 74 GC/MS clinical breath samples obtained from participants with cancer before and after radiation therapy. Results were evaluated against a panel of ground truth compounds and compared to other clustering methods (DBSCAN and OPTICS) that were used in previous metabolomics studies. VOCCluster was able to cluster those features into 1081 groups (including endogenous and exogenous compounds and instrumental artifacts) with an accuracy rate of 96% (±0.04 at 95% confidence interval).
Affiliation(s)
- Kareen Darnley
- Edinburgh Cancer Centre, NHS Lothian, Edinburgh EH4 2SP, U.K.
- Duncan McLaren
- Edinburgh Cancer Centre, NHS Lothian, Edinburgh EH4 2SP, U.K.
- Michael Eddleston
- Pharmacology, Toxicology and Therapeutics Unit, University of Edinburgh, Edinburgh EH8 9YL, U.K.
|
11
|
Hu Y, Shah P, Clark DJ, Ao M, Zhang H. Reanalysis of Global Proteomic and Phosphoproteomic Data Identified a Large Number of Glycopeptides. Anal Chem 2018; 90:8065-8071. [PMID: 29741879 PMCID: PMC6440470 DOI: 10.1021/acs.analchem.8b01137] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Indexed: 12/12/2022]
Abstract
Protein glycosylation plays fundamental roles in many cellular processes, and previous reports have shown dysregulation to be associated with several human diseases, including diabetes, cancer, and neurodegenerative disorders. Despite the vital role of glycosylation for proper protein function, the analysis of glycoproteins has lagged behind that of other protein modifications. In this study, we describe the reanalysis of global proteomic data from breast cancer xenograft tissues using the recently developed software package GPQuest 2.0, revealing a large number of previously unidentified N-linked glycopeptides. More importantly, we found that using immobilized metal affinity chromatography (IMAC) technology for the enrichment of phosphopeptides had coenriched a substantial number of sialoglycopeptides, allowing for a large-scale analysis of sialoglycopeptides in conjunction with the analysis of phosphopeptides. Collectively, combined tandem mass spectrometry (MS/MS) analyses of global proteomic and phosphoproteomic data sets resulted in the identification of 6 724 N-linked glycopeptides from 617 glycoproteins derived from two breast cancer xenograft tissues. Next, we utilized GPQuest 2.0 for the reanalysis of global and phosphoproteomic data generated from 108 human breast cancer tissues that were previously analyzed by the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Reanalysis of the CPTAC data set resulted in the identification of 2 683 glycopeptides from the global proteomic data set and 4 554 glycopeptides from the phosphoproteomic data set. Together, 11 292 N-linked glycopeptides corresponding to 1 731 N-linked glycosites from 883 human glycoproteins were identified from the two data sets. This analysis revealed an extensive number of glycopeptides hidden in the global proteomic data and enriched in the IMAC-based phosphoproteomic data, information that would otherwise have remained unknown from the original study. The reanalysis described herein can be readily applied to identify glycopeptides from already existing data sets, providing insight into many important facets of protein glycosylation in different biological, physiological, and pathological processes.
Affiliation(s)
- Yingwei Hu
- Department of Pathology, Johns Hopkins University, Baltimore, Maryland 21287, United States
- Punit Shah
- Department of Pathology, Johns Hopkins University, Baltimore, Maryland 21287, United States
- David J. Clark
- Department of Pathology, Johns Hopkins University, Baltimore, Maryland 21287, United States
- Minghui Ao
- Department of Pathology, Johns Hopkins University, Baltimore, Maryland 21287, United States
- Hui Zhang
- Department of Pathology, Johns Hopkins University, Baltimore, Maryland 21287, United States
|
12
|
Depke T, Franke R, Brönstrup M. Clustering of MS2 spectra using unsupervised methods to aid the identification of secondary metabolites from Pseudomonas aeruginosa. J Chromatogr B Analyt Technol Biomed Life Sci 2017. [DOI: 10.1016/j.jchromb.2017.06.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 11/25/2022]
|
13
|
Van Berkel GJ, Kertesz V. Rapid sample classification using an open port sampling interface coupled with liquid introduction atmospheric pressure ionization mass spectrometry. Rapid Commun Mass Spectrom 2017; 31:281-291. [PMID: 27862458 DOI: 10.1002/rcm.7792] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 09/19/2016] [Revised: 11/08/2016] [Accepted: 11/09/2016] [Indexed: 06/06/2023]
Abstract
RATIONALE An "Open Access"-like mass spectrometric platform to fully utilize the simplicity of the manual open port sampling interface for rapid characterization of unprocessed samples by liquid introduction atmospheric pressure ionization mass spectrometry has been lacking. The in-house developed integrated software with a simple, small and relatively low-cost mass spectrometry system introduced here fills this void. METHODS Software was developed to operate the mass spectrometer, to collect and process mass spectrometric data files, to build a database and to classify samples using such a database. These tasks were accomplished via the vendor-provided software libraries. Sample classification based on spectral comparison utilized the spectral contrast angle method. RESULTS Using the developed software platform, near real-time sample classification is exemplified using a series of commercially available blue ink rollerball pens and vegetable oils. In the case of the inks, full scan positive and negative ion ESI mass spectra were both used for database generation and sample classification. For the vegetable oils, full scan positive ion mode APCI mass spectra were recorded. The overall accuracy of the employed spectral contrast angle statistical model was 95.3% for the inks and 98% for the oils, using leave-one-out cross-validation. CONCLUSIONS This work illustrates that an open port sampling interface/mass spectrometer combination, with appropriate instrument control and data processing software, is a viable direct liquid extraction sampling and analysis system suitable for the non-expert user and near real-time sample classification via database matching. Published in 2016. This article is a U.S. Government work and is in the public domain in the USA.
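The classification scheme summarized above (database matching by spectral contrast angle, evaluated with leave-one-out cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' software: the function names, the nearest-entry decision rule, and the toy binned spectra are all assumptions made for the example.

```python
import math

def contrast_angle(a, b):
    """Spectral contrast angle (radians) between two intensity vectors
    that share the same m/z bins."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return math.pi / 2  # assumption: an empty spectrum is maximally dissimilar
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def classify(spectrum, database):
    """Return the label of the database entry with the smallest contrast angle."""
    return min(database, key=lambda entry: contrast_angle(spectrum, entry[1]))[0]

def leave_one_out_accuracy(database):
    """Classify each entry against all other entries and report the hit rate."""
    hits = 0
    for i, (label, spectrum) in enumerate(database):
        rest = database[:i] + database[i + 1:]
        hits += classify(spectrum, rest) == label
    return hits / len(database)

# Hypothetical binned spectra (label, intensities); not data from the paper.
toy_db = [
    ("ink_A", [10.0, 0.0, 5.0, 1.0]),
    ("ink_A", [9.0, 0.5, 5.5, 1.0]),
    ("oil_B", [0.0, 8.0, 1.0, 6.0]),
    ("oil_B", [0.5, 7.5, 1.0, 6.5]),
]
accuracy = leave_one_out_accuracy(toy_db)
```

A smaller angle means more similar spectra, so the nearest database entry decides the class; a real workflow would also need peak binning and probably a rejection threshold for unknown samples.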
Affiliation(s)
- Gary J Van Berkel
- Mass Spectrometry and Laser Spectroscopy Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831-6131, USA
- Vilmos Kertesz
- Mass Spectrometry and Laser Spectroscopy Group, Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831-6131, USA
|
14
|
Abstract
Scoring functions that assess spectrum similarity play a crucial role in many computational mass spectrometry algorithms. These functions are used to compare an experimentally acquired fragmentation (MS/MS) spectrum against two different types of target MS/MS spectra: either against a theoretical MS/MS spectrum derived from a peptide from a sequence database, or against another, previously acquired MS/MS spectrum. The former is typically encountered in database searching, while the latter is used in spectrum clustering and spectral library searching. The comparison between acquired and theoretical MS/MS spectra is most commonly performed using cross-correlations or probability-derived scoring functions, while the comparison of two acquired MS/MS spectra typically makes use of a normalized dot product, especially in spectrum library search algorithms. In addition to these scoring functions, Pearson's or Spearman's correlation coefficients, mean squared error, or median absolute deviation scores can also be used for the same purpose. Here, we describe and evaluate these scoring functions with regard to their ability to assess spectrum similarity for theoretical versus acquired, and acquired versus acquired spectra.
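Several of the scoring functions named in this abstract are simple enough to write out. The sketch below assumes the two spectra have already been aligned onto the same m/z bins, so that each is a plain list of intensities; that preprocessing step, and all function names, are assumptions of this example rather than details taken from the paper.

```python
import math

def normalized_dot(a, b):
    """Normalized dot product (cosine similarity) of two intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0

def ranks(v):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(a), ranks(b))

def mean_squared_error(a, b):
    """Mean squared intensity difference (lower means more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

Note that the first three scores increase with similarity while the mean squared error decreases, and that the mean squared error is only meaningful if both spectra are scaled the same way (for example, to unit total intensity) beforehand.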
Affiliation(s)
- Şule Yilmaz
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
- Elien Vandermarliere
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
- Lennart Martens
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
|
15
|
Ischenko D, Alexeev D, Shitikov E, Kanygina A, Malakhova M, Kostryukova E, Larin A, Kovalchuk S, Pobeguts O, Butenko I, Anikanov N, Altukhov I, Ilina E, Govorun V. Large scale analysis of amino acid substitutions in bacterial proteomics. BMC Bioinformatics 2016; 17:450. [PMID: 27821049 PMCID: PMC5100282 DOI: 10.1186/s12859-016-1301-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/05/2016] [Accepted: 10/21/2016] [Indexed: 11/17/2022] [Open Access]
Abstract
BACKGROUND Proteomics of bacterial pathogens is a developing field exploring microbial physiology, gene expression and the complex interactions between bacteria and their hosts. One of the complications in the proteomic approach is the micro- and macro-heterogeneity of bacterial species, which makes it impossible to build a comprehensive database of bacterial genomes for identification, while most of the existing algorithms rely largely on genomic data. RESULTS Here we present a large-scale study of the identification of single amino acid polymorphisms (SAPs) between bacterial strains. An ad hoc method was developed based on MS/MS spectra comparison without the support of a genomic database. Whole-genome sequencing was used to validate the accuracy of polymorphism detection. Several approaches presented earlier to the proteomics community as useful for polymorphism detection were tested on isolates of Helicobacter pylori, Neisseria gonorrhoeae and Escherichia coli. CONCLUSION The developed method represents a promising approach in the field of bacterial proteomics, allowing the identification of hundreds of peptides with novel SAPs from a single proteome.
Affiliation(s)
- Dmitry Ischenko
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
- Dmitry Alexeev
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
- Egor Shitikov
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Alexandra Kanygina
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
- Maja Malakhova
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Elena Kostryukova
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Andrey Larin
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Sergey Kovalchuk
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Olga Pobeguts
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Ivan Butenko
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Nikolay Anikanov
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Ilya Altukhov
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
- Elena Ilina
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Vadim Govorun
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
|
16
|
Bazsó FL, Ozohanics O, Schlosser G, Ludányi K, Vékey K, Drahos L. Quantitative Comparison of Tandem Mass Spectra Obtained on Various Instruments. J Am Soc Mass Spectrom 2016; 27:1357-1365. [PMID: 27206510 DOI: 10.1007/s13361-016-1408-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/22/2015] [Revised: 04/08/2016] [Accepted: 04/13/2016] [Indexed: 06/05/2023]
Abstract
The similarity between two tandem mass spectra, which were measured on different instruments, was compared quantitatively using the similarity index (SI), defined as the dot product of the square root of peak intensities in the respective spectra. This function was found to be useful for comparing energy-dependent tandem mass spectra obtained on various instruments. Spectral comparisons show the similarity index in a 2D "heat map", indicating which collision energy combinations result in similar spectra, and how good this agreement is. The results and methodology can be used in the pharma industry to design experiments and equipment well suited for good reproducibility. We suggest that to get good long-term reproducibility, it is best to adjust the collision energy to yield a spectrum very similar to a reference spectrum. It is likely to yield better results than using the same tuning file, which, for example, does not take into account that contamination of the ion source due to extended use may influence instrument tuning. The methodology may be used to characterize energy dependence on various instrument types, to optimize instrumentation, and to study the influence or correlation between various experimental parameters.
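As a rough illustration of the similarity index and the collision-energy "heat map" described above, the sketch below computes an SI matrix between two energy series. The normalization by total intensity (so that identical spectra score 1) is an assumption of this example, since the abstract gives only the verbal definition; the function names and toy spectra are likewise hypothetical.

```python
import math

def similarity_index(a, b):
    """Dot product of square-rooted intensities, normalized (assumption) so
    that identical spectra give 1 and spectra with no shared peaks give 0."""
    num = sum(math.sqrt(x * y) for x, y in zip(a, b))
    den = math.sqrt(sum(a) * sum(b))
    return num / den if den else 0.0

def si_matrix(series_a, series_b):
    """SI for every pair of spectra: rows follow instrument A's collision
    energies, columns follow instrument B's. This is the heat-map data."""
    return [[similarity_index(sa, sb) for sb in series_b] for sa in series_a]

def best_match(series_a, series_b):
    """For each spectrum from instrument A, the index of the most similar
    spectrum from instrument B."""
    matrix = si_matrix(series_a, series_b)
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]

# Hypothetical three-peak spectra at increasing collision energy.
instrument_a = [[90.0, 10.0, 0.0], [50.0, 40.0, 10.0], [10.0, 40.0, 50.0]]
instrument_b = [[80.0, 20.0, 0.0], [40.0, 45.0, 15.0], [5.0, 35.0, 60.0]]
matches = best_match(instrument_a, instrument_b)
```

Reading the matrix row by row shows which collision energy on one instrument best reproduces a reference spectrum from the other, which is the tuning strategy the abstract recommends over reusing a fixed tuning file.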
Affiliation(s)
- Fanni Laura Bazsó
- MS Proteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
- Oliver Ozohanics
- MTA-TTK NAP B MS Neuroproteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
- Gitta Schlosser
- MTA-ELTE Research Group of Peptide Chemistry, Hungarian Academy of Sciences, Eötvös Loránd University, 1117, Budapest, Hungary
- Krisztina Ludányi
- Department of Pharmaceutics, Semmelweis University, Hőgyes E. Street 7-9, H-1092, Budapest, Hungary
- Károly Vékey
- MS Proteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
- Core Technologies Center, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
- László Drahos
- MS Proteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
- MTA-TTK NAP B MS Neuroproteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
|
17
|
Griss J. Spectral library searching in proteomics. Proteomics 2016; 16:729-40. [PMID: 26616598 DOI: 10.1002/pmic.201500296] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 07/16/2015] [Revised: 10/15/2015] [Accepted: 10/29/2015] [Indexed: 12/12/2022]
Abstract
Spectral library searching has become a mature method to identify tandem mass spectra in proteomics data analysis. This review provides a comprehensive overview of available spectral library search engines and highlights their distinct features. Additionally, resources providing spectral libraries are summarized and tools presented that extend experimental spectral libraries by simulating spectra. Finally, spectrum clustering algorithms are discussed that utilize the same spectrum-to-spectrum matching algorithms as spectral library search engines and allow novel methods to analyse proteomics data.
Affiliation(s)
- Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
18
|
Egert B, Weinert CH, Kulling SE. A peaklet-based generic strategy for the untargeted analysis of comprehensive two-dimensional gas chromatography mass spectrometry data sets. J Chromatogr A 2015; 1405:168-77. [PMID: 26074098 DOI: 10.1016/j.chroma.2015.05.056] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 01/18/2015] [Revised: 05/26/2015] [Accepted: 05/27/2015] [Indexed: 12/17/2022]
Abstract
Comprehensive two-dimensional gas chromatography mass spectrometry (GC×GC-MS) is a well-established key technology in analytical chemistry and increasingly used in the field of untargeted metabolomics. However, automated processing of large GC×GC-MS data sets is still a major bottleneck in untargeted, large-scale metabolomics. For this reason we introduce a novel peaklet-based alignment strategy. The algorithm is capable of an untargeted deterministic alignment exploiting a density-based clustering procedure within a time-constrained similarity matrix. Exploiting minimal ¹D and ²D retention time shifts between peak modulations, the alignment is done without the need for peak merging, which also eliminates the need for linear or nonlinear retention time correction procedures. The approach is validated in detail using data of urine samples from a large human metabolomics study. The data was acquired by a Shimadzu GCMS-QP2010 Ultra GC×GC-qMS system and consists of 512 runs, including 312 study samples and 178 quality control sample injections, measured within a time period of 22 days. The final result table consisted of 313 analytes, each of these being detectable in at least 75% of the study samples. In summary, we present an automated, reliable and fully transparent workflow for the analysis of large GC×GC-qMS metabolomics data sets.
Affiliation(s)
- Björn Egert
- Max Rubner-Institut, Department of Safety and Quality of Fruit and Vegetables, Haid-und-Neu-Straße 9, 76131, Karlsruhe, Germany
- Christoph H Weinert
- Max Rubner-Institut, Department of Safety and Quality of Fruit and Vegetables, Haid-und-Neu-Straße 9, 76131, Karlsruhe, Germany
- Sabine E Kulling
- Max Rubner-Institut, Department of Safety and Quality of Fruit and Vegetables, Haid-und-Neu-Straße 9, 76131, Karlsruhe, Germany
|
19
|
Butt AQ, McArdle A, Gibson DS, FitzGerald O, Pennington SR. Psoriatic Arthritis Under a Proteomic Spotlight: Application of Novel Technologies to Advance Diagnosis and Management. Curr Rheumatol Rep 2015; 17:35. [DOI: 10.1007/s11926-015-0509-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/20/2022]
|
20
|
Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat Protoc 2015; 10:426-41. [PMID: 25675208 DOI: 10.1038/nprot.2015.015] [Citation(s) in RCA: 220] [Impact Index Per Article: 24.4] [Indexed: 12/18/2022]
Abstract
Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2-3 d to complete, depending on the extent of the library and the computational resources available.
|
21
|
Toprak UH, Gillet LC, Maiolica A, Navarro P, Leitner A, Aebersold R. Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol Cell Proteomics 2014; 13:2056-71. [PMID: 24623587 PMCID: PMC4125737 DOI: 10.1074/mcp.o113.036475] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 11/20/2013] [Revised: 02/26/2014] [Indexed: 12/21/2022] [Open Access]
Abstract
Quantifying the similarity of spectra is an important task in various areas of spectroscopy, for example, to identify a compound by comparing sample spectra to those of reference standards. In mass spectrometry based discovery proteomics, spectral comparisons are used to infer the amino acid sequence of peptides. In targeted proteomics by selected reaction monitoring (SRM) or SWATH MS, predetermined sets of fragment ion signals integrated over chromatographic time are used to identify target peptides in complex samples. In both cases, confidence in peptide identification is directly related to the quality of spectral matches. In this study, we used sets of simulated spectra of well-controlled dissimilarity to benchmark different spectral comparison measures and to develop a robust scoring scheme that quantifies the similarity of fragment ion spectra. We applied the normalized spectral contrast angle score to quantify the similarity of spectra to objectively assess fragment ion variability of tandem mass spectrometric datasets, to evaluate portability of peptide fragment ion spectra for targeted mass spectrometry across different types of mass spectrometers and to discriminate target assays from decoys in targeted proteomics. Altogether, this study validates the use of the normalized spectral contrast angle as a sensitive spectral similarity measure for targeted proteomics, and more generally provides a methodology to assess the performance of spectral comparisons and to support the rational selection of the most appropriate similarity measure. The algorithms used in this study are made publicly available as an open source toolset with a graphical user interface.
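The abstract names the normalized spectral contrast angle without stating its formula. One commonly used form, given here as an assumption rather than as the authors' exact definition, takes two non-negative intensity vectors $a$ and $b$ defined on the same fragment ions, computes the angle $\theta$ between them, and rescales it so that identical spectra score 1 and spectra with no shared signal score 0:

```latex
\cos\theta = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}},
\qquad
\mathrm{SCA}_{\mathrm{norm}} = 1 - \frac{2\theta}{\pi},
\qquad
0 \le \theta \le \frac{\pi}{2}.
```

Because the score is linear in the angle rather than in its cosine, it spreads out the values for highly similar spectra, where the plain dot product saturates near 1.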
Affiliation(s)
- Umut H Toprak
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Ludovic C Gillet
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Alessio Maiolica
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Pedro Navarro
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Alexander Leitner
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, 8093 Zurich, Switzerland
|
22
|
Buts K, Michielssens S, Hertog MLATM, Hayakawa E, Cordewener J, America AHP, Nicolai BM, Carpentier SC. Improving the identification rate of data independent label-free quantitative proteomics experiments on non-model crops: a case study on apple fruit. J Proteomics 2014; 105:31-45. [PMID: 24565695 DOI: 10.1016/j.jprot.2014.02.015] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 12/05/2013] [Revised: 01/23/2014] [Accepted: 02/14/2014] [Indexed: 11/28/2022]
Abstract
Complex peptide extracts from non-model crops are troublesome for proper identification and quantification. To increase the identification rate of label-free DIA experiments on Braeburn apple, a new workflow was developed in which a DDA database was constructed and linked to the DIA data. At a first level, parent masses found in DIA were searched in the DDA database based on their mass-to-charge ratio and retention time; at a second level, masses of fragmentation ions were compared for each of the linked spectra. Following this workflow, a tenfold increase of peptides was identified from a single DIA run. As proof of principle, the designed workflow was applied to determine the changes during a storage experiment, achieving a two-fold identification increase in the number of significant peptides. The corresponding protein families were divided into nine clusters, representing different time profiles of changes in abundances during storage. Up-regulated protein families already show a glimpse of important pathways affecting aging during long-term storage, such as ethylene synthesis, and responses to abiotic stresses and their influence on the central metabolism. BIOLOGICAL SIGNIFICANCE Proteomics research on non-model crops causes additional difficulties in identifying the peptides present in, often complex, samples. This work proposes a new workflow to retrieve more identifications from a set of quantitative data, based on linking DIA and DDA data at two consecutive levels. As proof of principle, a storage experiment on Braeburn apple resulted in twice as many identified storage-related peptides. Important proteins involved in central metabolism and stress are significantly up-regulated after long-term storage. This article is part of a Special Issue entitled: Proteomics of non-model organisms.
Affiliation(s)
- Kim Buts
- BIOSYST-MeBioS, KU Leuven, Belgium
- Servaas Michielssens
- Quantum Chemistry and Physical Chemistry Section, KU Leuven, Belgium; Computational Biomolecular Dynamics Group, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Eisuke Hayakawa
- Research Group of Functional Genomics and Proteomics, KU Leuven, Belgium
- Bart M Nicolai
- BIOSYST-MeBioS, KU Leuven, Belgium; Flanders Centre of Postharvest Technology, Leuven, Belgium
|
23
|
Kim S, Zhang X. Comparative analysis of mass spectral similarity measures on peak alignment for comprehensive two-dimensional gas chromatography mass spectrometry. Comput Math Methods Med 2013; 2013:509761. [PMID: 24151524 PMCID: PMC3787630 DOI: 10.1155/2013/509761] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 05/14/2013] [Revised: 07/25/2013] [Accepted: 08/07/2013] [Indexed: 12/14/2022]
Abstract
Peak alignment is a critical procedure in mass spectrometry-based biomarker discovery in metabolomics. One of the peak alignment approaches for comprehensive two-dimensional gas chromatography mass spectrometry (GC×GC-MS) data is peak matching-based alignment. A key to peak matching-based alignment is the calculation of mass spectral similarity scores. Various mass spectral similarity measures have been developed mainly for compound identification, but the effect of these spectral similarity measures on the performance of peak matching-based alignment remains unknown. Therefore, we selected five mass spectral similarity measures (cosine correlation, Pearson's correlation, Spearman's correlation, partial correlation, and part correlation) and examined their effects on peak alignment using two sets of experimental GC×GC-MS data. The results show that the spectral similarity measure does not affect the alignment accuracy significantly in the analysis of data from less complex samples, while the partial correlation performs much better than the other spectral similarity measures when analyzing experimental data acquired from complex biological samples.
Affiliation(s)
- Seongho Kim
- Biostatistics Core, Karmanos Cancer Institute, Wayne State University, Detroit, MI 48201, USA
- Xiang Zhang
- Department of Chemistry, University of Louisville, Louisville, KY 40292, USA
|
24
|
Smith R, Anthonymuthu TS, Ventura D, Prince JT. Statistical agglomeration: peak summarization for direct infusion lipidomics. Bioinformatics 2013; 29:2445-51. [PMID: 23825371 DOI: 10.1093/bioinformatics/btt376] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION Quantification of lipids is a primary goal in lipidomics. In direct infusion/injection (or shotgun) lipidomics, accurate downstream identification and quantitation requires accurate summarization of repetitive peak measurements. Imprecise peak summarization multiplies downstream error by propagating into species identification and intensity estimation. To our knowledge, this is the first analysis of direct infusion peak summarization in the literature. RESULTS We present two novel peak summarization algorithms for direct infusion samples and compare them with an off-machine ad hoc summarization algorithm as well as with the proprietary Xcalibur algorithm. Our statistical agglomeration algorithm reduces peakwise error by 38% (mass/charge, m/z) and 44% (intensity) compared with the ad hoc method over three datasets. Pointwise error is reduced by 23% (m/z). Compared with Xcalibur, our statistical agglomeration algorithm produces 68% less m/z error and 51% less intensity error on average on two comparable datasets. AVAILABILITY The source code for Statistical Agglomeration and the datasets used are freely available for non-commercial purposes at https://github.com/optimusmoose/statistical_agglomeration. Modified Bin Agglomeration is freely available in MSpire, an open source mass spectrometry package at https://github.com/princelab/mspire/.
Affiliation(s)
- Rob Smith
- Department of Computer Science and Department of Chemistry, Brigham Young University, Provo, UT 84602, USA
|
25
|
Cheng CY, Tsai CF, Chen YJ, Sung TY, Hsu WL. Spectrum-based Method to Generate Good Decoy Libraries for Spectral Library Searching in Peptide Identifications. J Proteome Res 2013; 12:2305-10. [DOI: 10.1021/pr301039b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Affiliation(s)
- Chia-Ying Cheng
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Chia-Feng Tsai
- Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan
- Department of Chemistry, National Taiwan University, Taipei 106, Taiwan
- Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei 115, Taiwan
- Department of Chemistry, National Taiwan University, Taipei 106, Taiwan
- Ting-Yi Sung
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
- Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei 115, Taiwan
|
26
|
Darville LNF, Merchant ME, Maccha V, Siddavarapu VR, Hasan A, Murray KK. Isolation and determination of the primary structure of a lectin protein from the serum of the American alligator (Alligator mississippiensis). Comp Biochem Physiol B Biochem Mol Biol 2011; 161:161-9. [PMID: 22085437 DOI: 10.1016/j.cbpb.2011.11.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/30/2011] [Revised: 11/01/2011] [Accepted: 11/02/2011] [Indexed: 10/15/2022]
Abstract
Mass spectrometry in conjunction with de novo sequencing was used to determine the amino acid sequence of a 35kDa lectin protein isolated from the serum of the American alligator that exhibits binding to mannose. The protein N-terminal sequence was determined using Edman degradation and enzymatic digestion with different proteases was used to generate peptide fragments for analysis by liquid chromatography tandem mass spectrometry (LC MS/MS). Separate analysis of the protein digests with multiple enzymes enhanced the protein sequence coverage. De novo sequencing was accomplished using MASCOT Distiller and PEAKS software and the sequences were searched against the NCBI database using MASCOT and BLAST to identify homologous peptides. MS analysis of the intact protein indicated that it is present primarily as monomer and dimer in vitro. The isolated 35kDa protein was ~98% sequenced and found to have 313 amino acids and nine cysteine residues and was identified as an alligator lectin. The alligator lectin sequence was aligned with other lectin sequences using DIALIGN and ClustalW software and was found to exhibit 58% and 59% similarity to both human and mouse intelectin-1. The alligator lectin exhibited strong binding affinities toward mannan and mannose as compared to other tested carbohydrates.
Affiliation(s)
- Lancia N F Darville
- Department of Chemistry, Louisiana State University, Baton Rouge, LA 70803, USA
|
27
|
Bao KD, Letellier A, Beaudry F. Analysis of Staphylococcus enterotoxin B using differential isotopic tags and liquid chromatography quadrupole ion trap mass spectrometry. Biomed Chromatogr 2011; 26:1049-57. [PMID: 22102423 DOI: 10.1002/bmc.1742] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 09/05/2011] [Accepted: 09/14/2011] [Indexed: 12/21/2022]
Abstract
Staphylococcus aureus produces enterotoxins, which are causative agents of foodborne intoxications. Enterotoxins are single-chain polypeptides and have a molecular weight of about 26-28 kDa. The consumption of food contaminated with Staphylococcus aureus enterotoxins results in the onset of acute gastroenteritis within 2-6 h. The objective of this study was the development of a new method for the quantification of Staphylococcal enterotoxin B (SEB) in food matrices. Tryptic peptide map was generated and nine proteolytic fragments were clearly identified (sequence coverage of 35%). Among these, three specific tryptic peptides were selected to be used as surrogate peptides and internal standards for quantitative analysis using an isotopic tagging strategy along with analysis by LC-MS/MS. The linearity of the measurement by LC-MS/MS was evaluated by combining mixtures of both isotopes at 0.1, 0.2, 0.5, 1.0 and 2.0 ¹H/²H molar ratios with a slope near to 1, values of R² above 0.98 and %CV obtained from six repeated measurement was below 8%. The precision and accuracy of the method were assessed using SEB spiked in chicken meat homogenate samples. SEB was fortified at 0.2, 1 and 2 pmol/g. The accuracy results indicated that the method can provide accuracy within a 84.9-91.1% range. Overall, the results presented in this manuscript show that proteomics-based methods can be effectively used to detect, confirm and quantify SEB in food matrices.
Affiliation(s)
- Khanh Dang Bao
- Département de Biomédecine Vétérinaire, Faculté de Médecine Vétérinaire, Université de Montréal, Saint-Hyacinthe, Québec, Canada
|
28
|
Ahrné E, Ohta Y, Nikitin F, Scherl A, Lisacek F, Müller M. An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates. Proteomics 2011; 11:4085-95. [DOI: 10.1002/pmic.201000665] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 10/19/2010] [Revised: 07/13/2011] [Accepted: 07/29/2011] [Indexed: 11/06/2022]
|
29
|
Kim S, Koo I, Fang A, Zhang X. Smith-Waterman peak alignment for comprehensive two-dimensional gas chromatography-mass spectrometry. BMC Bioinformatics 2011; 12:235. [PMID: 21676240 PMCID: PMC3133553 DOI: 10.1186/1471-2105-12-235] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 01/31/2011] [Accepted: 06/15/2011] [Indexed: 11/30/2022] Open
Abstract
Background Comprehensive two-dimensional gas chromatography coupled with mass spectrometry (GC × GC-MS) is a powerful technique which has gained increasing attention over the last two decades. The GC × GC-MS provides much increased separation capacity, chemical selectivity and sensitivity for complex sample analysis and brings more accurate information about compound retention times and mass spectra. Despite these advantages, the retention times of the resolved peaks on the two-dimensional gas chromatographic columns are always shifted due to experimental variations, introducing difficulty in the data processing for metabolomics analysis. Therefore, the retention time variation must be adjusted in order to compare multiple metabolic profiles obtained from different conditions. Results We developed novel peak alignment algorithms for both homogeneous (acquired under the identical experimental conditions) and heterogeneous (acquired under the different experimental conditions) GC × GC-MS data using modified Smith-Waterman local alignment algorithms along with mass spectral similarity. Compared with literature reported algorithms, the proposed algorithms eliminated the detection of landmark peaks and the usage of retention time transformation. Furthermore, an automated peak alignment software package was established by implementing a likelihood function for optimal peak alignment. Conclusions The proposed Smith-Waterman local alignment-based algorithms are capable of aligning both the homogeneous and heterogeneous data of multiple GC × GC-MS experiments without the transformation of retention times and the selection of landmark peaks. An optimal version of the SW-based algorithms was also established based on the associated likelihood function for the automatic peak alignment. The proposed alignment algorithms outperform the literature reported alignment method by analyzing the experiment data of a mixture of compound standards and a metabolite extract of mouse plasma with spiked-in compound standards.
Affiliation(s)
- Seongho Kim
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40292, USA.
|
30
|
Frank AM, Monroe ME, Shah AR, Carver JJ, Bandeira N, Moore RJ, Anderson GA, Smith RD, Pevzner PA. Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra. Nat Methods 2011; 8:587-91. [PMID: 21572408 PMCID: PMC3128193 DOI: 10.1038/nmeth.1609] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 04/05/2010] [Accepted: 04/13/2011] [Indexed: 11/09/2022]
Abstract
Tandem mass spectrometry (MS/MS) experiments yield multiple, nearly identical spectra of the same peptide in various laboratories, but proteomics researchers typically do not leverage the unidentified spectra produced in other labs to decode spectra they generate. We propose a spectral archives approach that clusters MS/MS datasets, representing similar spectra by a single consensus spectrum. Spectral archives extend spectral libraries by analyzing both identified and unidentified spectra in the same way and maintaining information about peptide spectra that are common across species and conditions. Thus archives offer both traditional library spectrum similarity-based search capabilities along with new ways to analyze the data. By developing a clustering tool, MS-Cluster, we generated a spectral archive from ∼1.18 billion spectra that greatly exceeds the size of existing spectral repositories. We advocate that publicly available data should be organized into spectral archives rather than be analyzed as disparate datasets, as is mostly the case today.
Affiliation(s)
- Ari M Frank
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, USA
|
31
|
Baumgardner LA, Shanmugam AK, Lam H, Eng JK, Martin DB. Fast parallel tandem mass spectral library searching using GPU hardware acceleration. J Proteome Res 2011; 10:2882-8. [PMID: 21545112 DOI: 10.1021/pr200074h] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Indexed: 11/30/2022]
Abstract
Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment.
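The "multiplication of two vectors" mentioned in this abstract can be illustrated with a small CPU-only sketch. It is not FastPaSS or SpectraST code: the 1 Th bin width, the 2000 m/z ceiling, and the function names `bin_spectrum`, `dot_score`, and `best_match` are assumptions chosen for illustration, and real library search engines apply additional preprocessing and scoring terms.

```python
import math

def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum (m/z, intensity) peaks into fixed-width bins and scale to unit length.

    bin_width and max_mz are illustrative defaults, not values from the paper.
    """
    n_bins = int(max_mz / bin_width) + 1
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def dot_score(query_vec, library_vec):
    """Dot product of two unit-length vectors (1.0 means identical direction)."""
    return sum(q * l for q, l in zip(query_vec, library_vec))

def best_match(query_peaks, library):
    """Score the query against every library entry and return the top hit."""
    q = bin_spectrum(query_peaks)
    scores = {name: dot_score(q, bin_spectrum(peaks))
              for name, peaks in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical two-entry library and query, for illustration only.
library = {
    "peptide_A": [(100.2, 10.0), (200.5, 5.0)],
    "peptide_B": [(150.1, 8.0), (300.7, 2.0)],
}
name, score = best_match([(100.3, 9.0), (200.4, 6.0)], library)
```

Each library comparison is independent of the others, which is why the abstract describes the search as highly parallelizable; in this sketch the dictionary comprehension in `best_match` is the loop that a GPU implementation would distribute.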
|
32
|
Kim S, Fang A, Wang B, Jeong J, Zhang X. An optimal peak alignment for comprehensive two-dimensional gas chromatography mass spectrometry using mixture similarity measure. Bioinformatics 2011; 27:1660-6. [PMID: 21493650 DOI: 10.1093/bioinformatics/btr188] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION Comprehensive two-dimensional gas chromatography mass spectrometry (GC × GC-MS) brings much increased separation capacity, chemical selectivity and sensitivity for metabolomics and provides more accurate information about metabolite retention times and mass spectra. However, there is always a shift of retention times in the two columns that makes it difficult to compare metabolic profiles obtained from multiple samples exposed to different experimental conditions. RESULTS The existing peak alignment algorithms for GC × GC-MS data use the peak distance and the spectra similarity sequentially and require predefined either distance-based window and/or spectral similarity-based window. To overcome the limitations of the current alignment methods, we developed an optimal peak alignment using a novel mixture similarity by employing the peak distance and the spectral similarity measures simultaneously without any variation windows. In addition, we examined the effect of the four different distance measures such as Euclidean, Maximum, Manhattan and Canberra distances on the peak alignment. The performance of our proposed peak alignment algorithm was compared with the existing alignment methods on the two sets of GC × GC-MS data. Our analysis showed that Canberra distance performed better than other distances and the proposed mixture similarity peak alignment algorithm prevailed against all literature reported methods. AVAILABILITY The data and software mSPA are available at http://stage.louisville.edu/faculty/x0zhan17/software/software-development.
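For readers unfamiliar with the four distance measures named in this abstract, the sketch below gives their standard textbook definitions applied to a pair of two-dimensional retention-time coordinates. It does not reproduce the mSPA mixture similarity, whose form is not given here; the function names and the example coordinates are hypothetical.

```python
import math

def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def maximum(p, q):
    """Largest absolute coordinate difference (Chebyshev distance)."""
    return max(abs(a - b) for a, b in zip(p, q))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def canberra(p, q):
    """Sum of |a - b| / (|a| + |b|); terms with a zero denominator are skipped."""
    total = 0.0
    for a, b in zip(p, q):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total

# Hypothetical (first-dimension, second-dimension) retention times of two peaks.
peak_1 = (10.0, 2.0)
peak_2 = (13.0, 6.0)
distances = {
    "euclidean": euclidean(peak_1, peak_2),
    "maximum": maximum(peak_1, peak_2),
    "manhattan": manhattan(peak_1, peak_2),
    "canberra": canberra(peak_1, peak_2),
}
```

Canberra distance scales each coordinate difference by the magnitude of the coordinates, so the short second-dimension axis is not swamped by the longer first-dimension axis. That property is one plausible reading of why it suits GC × GC data, though the abstract itself reports only the empirical result.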
Affiliation(s)
- Seongho Kim
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40292, USA.
|
33
|
Herzog R, Schwudke D, Schuhmann K, Sampaio JL, Bornstein SR, Schroeder M, Shevchenko A. A novel informatics concept for high-throughput shotgun lipidomics based on the molecular fragmentation query language. Genome Biol 2011; 12:R8. [PMID: 21247462 PMCID: PMC3091306 DOI: 10.1186/gb-2011-12-1-r8] [Citation(s) in RCA: 286] [Impact Index Per Article: 22.0] [Received: 07/26/2010] [Revised: 01/04/2011] [Accepted: 01/19/2011] [Indexed: 01/27/2023] Open
Abstract
Shotgun lipidome profiling relies on direct mass spectrometric analysis of total lipid extracts from cells, tissues or organisms and is a powerful tool to elucidate the molecular composition of lipidomes. We present a novel informatics concept of the molecular fragmentation query language implemented within the LipidXplorer open source software kit that supports accurate quantification of individual species of any ionizable lipid class in shotgun spectra acquired on any mass spectrometry platform.
Affiliation(s)
- Ronny Herzog
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
|
34
|
Abstract
The diverse fields of Omics research share a common logical structure combining a cataloging effort for a particular class of molecules or interactions, the underlying -ome, and a quantitative aspect attempting to record spatiotemporal patterns of concentration, expression, or variation. Consequently, these fields also share a common set of difficulties and limitations. In spite of the great success stories of Omics projects over the last decade, much remains to be understood not only at the technological, but also at the conceptual level. Here, we focus on the dark corners of Omics research, where the problems, limitations, conceptual difficulties, and lack of knowledge are hidden.
Affiliation(s)
- Sonja J Prohaska
- Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
|
35
|
Li S, Arnold RJ, Tang H, Radivojac P. On the accuracy and limits of peptide fragmentation spectrum prediction. Anal Chem 2010; 83:790-6. [PMID: 21175207 DOI: 10.1021/ac102272r] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 11/29/2022]
Abstract
We estimated the reproducibility of tandem mass spectra for the widely used collision-induced dissociation (CID) of peptide ions. Using the Pearson correlation coefficient as a measure of spectral similarity, we found that the within-experiment reproducibility of fragment ion intensities is very high (about 0.85). However, across different experiments and instrument types/setups, the correlation decreases by more than 15% (to about 0.70). We further investigated the accuracy of current predictors of peptide fragmentation spectra and found that they are more accurate than the ad-hoc models generally used by search engines (e.g., SEQUEST) and, surprisingly, approaching the empirical upper limit set by the average across-experiment spectral reproducibility (especially for charge +1 and charge +2 precursor ions). These results provide evidence that, in terms of accuracy of modeling, predicted peptide fragmentation spectra provide a viable alternative to spectral libraries for peptide identification, with a higher coverage of peptides and lower storage requirements. Furthermore, using five data sets of proteome digests by two different proteases, we find that PeptideART (a data-driven machine learning approach) is generally more accurate than MassAnalyzer (an approach based on a kinetic model for peptide fragmentation) in predicting fragmentation spectra but that both models are significantly more accurate than the ad-hoc models.
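The similarity measure used in this abstract, the Pearson correlation coefficient between fragment ion intensities, can be sketched as follows. The sketch assumes the two spectra have already been aligned so that position i in each list refers to the same fragment ion; the function name `pearson` and the intensity values are hypothetical and are not data from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant intensities")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical intensities of the same four fragment ions in two replicate spectra.
replicate_1 = [120.0, 45.0, 300.0, 80.0]
replicate_2 = [110.0, 50.0, 310.0, 70.0]
r = pearson(replicate_1, replicate_2)
```

Because the coefficient is invariant to a positive linear rescaling of either spectrum, it compares fragmentation patterns rather than absolute signal, which is what makes it usable across runs with different total ion counts.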
Affiliation(s)
- Sujun Li
- School of Informatics and Computing, Indiana University, Bloomington, Indiana 47408, USA
|
36
|
Ye D, Fu Y, Sun RX, Wang HP, Yuan ZF, Chi H, He SM. Open MS/MS spectral library search to identify unanticipated post-translational modifications and increase spectral identification rate. Bioinformatics 2010; 26:i399-406. [PMID: 20529934 PMCID: PMC2881370 DOI: 10.1093/bioinformatics/btq185] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Indexed: 11/30/2022]
Abstract
Motivation: Identification of post-translationally modified proteins has become one of the central issues of current proteomics. Spectral library search is a new and promising computational approach to mass spectrometry-based protein identification. However, its potential in identification of unanticipated post-translational modifications has rarely been explored. The existing spectral library search tools are designed to match the query spectrum to the reference library spectra with the same peptide mass. Thus, spectra of peptides with unanticipated modifications cannot be identified. Results: In this article, we present an open spectral library search tool, named pMatch. It extends the existing library search algorithms in at least three aspects to support the identification of unanticipated modifications. First, the spectra in the library are optimized with the full peptide sequence information to better tolerate the peptide fragmentation pattern variations caused by some modification(s). Second, a new scoring system is devised, which uses charge-dependent mass shifts for peak matching and combines a probability-based model with the general spectral dot-product for scoring. Third, a target-decoy strategy is used for false discovery rate control. To demonstrate the effectiveness of pMatch, a library search experiment was conducted on a public dataset with over 40 000 spectra in comparison with SpectraST, the most popular library search engine. Additional validations were done on four published datasets including over 150 000 spectra. The results showed that pMatch can effectively identify unanticipated modifications and significantly increase spectral identification rate. Availability: http://pfind.ict.ac.cn/pmatch/ Contact: yfu@ict.ac.cn; rxsun@ict.ac.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ding Ye
- Institute of Computing Technology and Key Lab of Intelligent Information Processing, Chinese Academy of Sciences, Beijing 100190, China
|
37
|
Darville LNF, Merchant ME, Hasan A, Murray KK. Proteome analysis of the leukocytes from the American alligator (Alligator mississippiensis) using mass spectrometry. Comp Biochem Physiol Part D Genomics Proteomics 2010; 5:308-16. [PMID: 20920849 DOI: 10.1016/j.cbd.2010.09.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/14/2010] [Revised: 09/08/2010] [Accepted: 09/08/2010] [Indexed: 01/03/2023]
Abstract
Mass spectrometry was used in conjunction with gel electrophoresis and liquid chromatography, to determine peptide sequences from American alligator (Alligator mississippiensis) leukocytes and to identify similar proteins based on homology. The goal of the study was to generate an initial database of proteins related to the alligator immune system. We have adopted a typical proteomics approach for this study. Proteins from leukocyte extracts were separated using two-dimensional gel electrophoresis and the major bands were excised, digested and analyzed by on-line nano-LC MS/MS to generate peptide sequences. The sequences generated were used to identify proteins and characterize their functions. The protein identity and characterization of the protein function were based on matching two or more peptides to the same protein by searching against the NCBI database using MASCOT and Basic Local Alignment Search Tool (BLAST). For those proteins with only one peptide matching, the phylum of the matched protein was considered. Forty-three proteins were identified that exhibit sequence similarities to proteins from other vertebrates. Proteins related to the cytoskeletal system were the most abundant proteins identified. These proteins are known to regulate cell mobility and phagocytosis. Several other peptides were matched to proteins that potentially have immune-related function.
Affiliation(s)
- Lancia N F Darville
- Department of Chemistry, Louisiana State University, Baton Rouge, Louisiana 70803, USA
|
38
|
Kashyap RS, Saha SM, Nagdev KJ, Kelkar SS, Purohit HJ, Taori GM, Daginawala HF. Diagnostic markers for tuberculosis ascites: a preliminary study. Biomark Insights 2010; 5:87-94. [PMID: 20838606 PMCID: PMC2935815 DOI: 10.4137/bmi.s5196] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022] Open
Abstract
Objective: The diagnosis of tuberculosis (TB) ascites is problematic. Delay in the diagnosis and treatment of TB ascites are considered to be major factors that contribute to the high mortality of TB. This study identifies specific protein markers in ascitic fluid which will be useful in diagnosis of TB ascites. Methods: We used Two-Dimensional Electrophoresis, liquid chromatography-mass spectrometry/mass spectrometry, immunoblot analysis and Enzyme Linked Immunosorbent assay (ELISA) as a comprehensive quantitative proteomic screening system for the diagnosis of TB ascites. Results: The screen identified several antigens of interest: a 30-kilodalton (kDa) protein that demonstrated significant homology to the antigen 85B and 85C (Ag 85) complex; a 65-kDa protein that corresponded to Mycobacterium tuberculosis (MTB) heat shock protein 65 (65-kDa HSP), Rv0440; a 14-kDa protein and 71-kDa protein that exhibits an amino acid sequence identical to that of MTB heat shock protein 14 (14-kDa HSP), GroES; and MTB heat shock protein 71 (71-kDa HSP), Rv0350 respectively. ELISA confirmed that TB ascites patients were consistently positive for these antigens at higher rates than non-TB ascites patients. Conclusion: The 65-kDa HSP, 71-kDa HSP, 14-kDa HSP and Ag 85 complex proteins may serve as very useful diagnostic markers for TB ascites.
Affiliation(s)
- Rajpal S Kashyap
- Biochemistry Research Laboratory, Central India Institute of Medical Sciences, Nagpur, Maharashtra, India
|
39
|
Rudnick PA, Clauser KR, Kilpatrick LE, Tchekhovskoi DV, Neta P, Blonder N, Billheimer DD, Blackman RK, Bunk DM, Cardasis HL, Ham AJL, Jaffe JD, Kinsinger CR, Mesri M, Neubert TA, Schilling B, Tabb DL, Tegeler TJ, Vega-Montoto L, Variyath AM, Wang M, Wang P, Whiteaker JR, Zimmerman LJ, Carr SA, Fisher SJ, Gibson BW, Paulovich AG, Regnier FE, Rodriguez H, Spiegelman C, Tempst P, Liebler DC, Stein SE. Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Mol Cell Proteomics 2009; 9:225-41. [PMID: 19837981 PMCID: PMC2830836 DOI: 10.1074/mcp.m900223-mcp200] [Citation(s) in RCA: 158] [Impact Index Per Article: 10.5] [Indexed: 11/30/2022] Open
Abstract
A major unmet need in LC-MS/MS-based proteomics analyses is a set of tools for quantitative assessment of system performance and evaluation of technical variability. Here we describe 46 system performance metrics for monitoring chromatographic performance, electrospray source stability, MS1 and MS2 signals, dynamic sampling of ions for MS/MS, and peptide identification. Applied to data sets from replicate LC-MS/MS analyses, these metrics displayed consistent, reasonable responses to controlled perturbations. The metrics typically displayed variations less than 10% and thus can reveal even subtle differences in performance of system components. Analyses of data from interlaboratory studies conducted under a common standard operating procedure identified outlier data and provided clues to specific causes. Moreover, interlaboratory variation reflected by the metrics indicates which system components vary the most between laboratories. Application of these metrics enables rational, quantitative quality assessment for proteomics and other LC-MS/MS analytical applications.
Affiliation(s)
- Paul A Rudnick
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
|
40
|
Kashyap RS, Nayak AR, Deshpande PS, Kabra D, Purohit HJ, Taori GM, Daginawala HF. Inter-α-trypsin inhibitor heavy chain 4 is a novel marker of acute ischemic stroke. Clin Chim Acta 2009; 402:160-3. [DOI: 10.1016/j.cca.2009.01.009] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Indexed: 01/17/2023]
|
41
|
Ding J, Shi J, Poirier GG, Wu FX. A novel approach to denoising ion trap tandem mass spectra. Proteome Sci 2009; 7:9. [PMID: 19292921 PMCID: PMC2670284 DOI: 10.1186/1477-5956-7-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 11/24/2008] [Accepted: 03/17/2009] [Indexed: 12/04/2022] Open
Abstract
Background Mass spectrometers can produce a large number of tandem mass spectra. They are unfortunately noise-contaminated. Noises can affect the quality of tandem mass spectra and thus increase the false positives and false negatives in the peptide identification. Therefore, it is appealing to develop an approach to denoising tandem mass spectra. Results We propose a novel approach to denoising tandem mass spectra. The proposed approach consists of two modules: spectral peak intensity adjustment and intensity local maximum extraction. In the spectral peak intensity adjustment module, we introduce five features to describe the quality of each peak. Based on these features, a score is calculated for each peak and is used to adjust its intensity. As a result, the intensity will be adjusted to a local maximum if a peak is a signal peak, and it will be decreased if the peak is a noisy one. The second module uses a morphological reconstruction filter to remove the peaks whose intensities are not the local maxima of the spectrum. Experiments have been conducted on two ion trap tandem mass spectral datasets: ISB and TOV. Experimental results show that our algorithm can remove about 69% of the peaks of a spectrum. At the same time, the number of spectra that can be identified by Mascot algorithm increases by 31.23% and 14.12% for the two tandem mass spectra datasets, respectively. Conclusion The proposed denoising algorithm can be integrated into current popular peptide identification algorithms such as Mascot to improve the reliability of assigning peptides to spectra. Availability of the software The software created from this work is available upon request.
Affiliation(s)
- Jiarui Ding
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada.
|
42
|
Critical Evaluation of Product Ion Selection and Spectral Correlation Analysis for Biomarker Screening Using Targeted Peptide Multiple Reaction Monitoring. Clin Proteomics 2009. [DOI: 10.1007/s12014-009-9023-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022] Open
Abstract
Abstract
Introduction
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomic screens aimed at discovering putative protein biomarkers of disease with potential clinical applications. Systematic validation of lead candidates in large numbers of samples from patient cohorts remains an important challenge. One particularly promising high throughout technique is multiple reaction monitoring (MRM), a targeted form of MS/MS by which precise peptide precursor–product ion combinations, or transitions, are selectively tracked as informative probes. Despite recent progress, however, many important computational and statistical issues remain unresolved. These include the selection of an optimal set of transitions so as to achieve sufficiently high specificity and sensitivity when profiling complex biological specimens, and the corresponding generation of a suitable scoring function to reliably confirm tentative molecular identities based on noisy spectra.
Methods
In this study, we investigate various empirical criteria that are helpful to consider when developing and interpreting MRM-style assays based on the similarity between experimental and annotated reference spectra. We also rigorously evaluate and compare the performance of conventional spectral similarity measures, based on only a few pre-selected representative transitions, with a generic scoring metric, termed T
corr, wherein a selected product ion profile is used to score spectral comparisons.
Conclusions
Our analyses demonstrate that T
corr is potentially more suitable and effective for detecting biomarkers in complex biological mixtures than more traditional spectral library searches.
|
43
|
Lu B, Xu T, Park SK, Yates JR. Shotgun protein identification and quantification by mass spectrometry. Methods Mol Biol 2009; 564:261-88. [PMID: 19544028 DOI: 10.1007/978-1-60761-157-8_15] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
Shotgun proteomics is based on identification and quantification of peptides from digested proteins using tandem mass spectrometry. In this chapter, we discuss computational methods to analyze tandem mass spectra of peptides, including database searching, de novo peptide sequencing, hybrid approaches, library searching, and unrestricted modification search. A special focus is given to database searching programs since they are most widely used. The process of inferring proteins from identified peptides is then discussed. We also provide description of key steps in the quantitative analysis of mass spectrometry proteomics data.
Affiliation(s)
- Bingwen Lu
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
|
44
|
Lu B, Xu T, Park SK, McClatchy DB, Liao L, Yates JR. Shotgun protein identification and quantification by mass spectrometry in neuroproteomics. Methods Mol Biol 2009; 566:229-259. [PMID: 20058176 DOI: 10.1007/978-1-59745-562-6_16] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Shotgun proteomics is based on identification and quantification of peptides from digested proteins using tandem mass spectrometry. In this chapter, we discuss computational methods to analyze tandem mass spectra of peptides, including database searching, de novo peptide sequencing, hybrid approaches, library searching, and unrestricted modification search. A special focus is given to database searching programs, since they are the most widely used. The process of inferring proteins from identified peptides is then discussed. We also provide description of key steps in the quantitative analysis of mass spectrometry proteomics data. These methods are valuable tools for discovery and hypothesis-driven analyses in neuroproteomics.
Affiliation(s)
- Bingwen Lu
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA, USA
|
45
|
Lam H, Deutsch EW, Eddes JS, Eng JK, Stein SE, Aebersold R. Building consensus spectral libraries for peptide identification in proteomics. Nat Methods 2008; 5:873-5. [PMID: 18806791 PMCID: PMC2637392 DOI: 10.1038/nmeth.1254] [Citation(s) in RCA: 209] [Impact Index Per Article: 13.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2008] [Accepted: 08/26/2008] [Indexed: 11/09/2022]
Abstract
Spectral searching has drawn increasing interest as an alternative to sequence-database searching in proteomics. We developed and validated an open-source software toolkit, SpectraST, to enable proteomics researchers to build spectral libraries and to integrate this promising approach in their data-analysis pipeline. It allows individual researchers to condense raw data into spectral libraries, summarizing information about observed proteomes into a concise and retrievable format for future data analyses.
Affiliation(s)
- Henry Lam
- Institute for Systems Biology, Seattle, Washington 98103, USA.
|
46
|
Toward high-throughput and reliable peptide identification via MS/MS spectra. Methods Mol Biol 2008. [PMID: 18592190 DOI: 10.1007/978-1-59745-398-1_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]
Abstract
One fundamental problem in proteomics study is to identify proteins and determine their expression levels in cells. Coupled with advanced liquid chromatography, tandem mass spectrometry has become the standard tool for peptide sequencing. In the past decade, many different algorithms and software packages have been developed to support high-throughput proteomics studies. This chapter reviews and compares the computational methods and software for the interpretation of tandem mass spectra. We also present techniques to assess the reliability of peptide identification. Finally, future directions and new research paradigms in tandem mass spectrometry are discussed.
|
47
|
Carpentier SC, Panis B, Vertommen A, Swennen R, Sergeant K, Renaut J, Laukens K, Witters E, Samyn B, Devreese B. Proteome analysis of non-model plants: a challenging but powerful approach. Mass Spectrom Rev 2008; 27:354-77. [PMID: 18381744 DOI: 10.1002/mas.20170] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/07/2023]
Abstract
Biological research has focused in the past on model organisms and most of the functional genomics studies in the field of plant sciences are still performed on model species or species that are characterized to a great extent. However, numerous non-model plants are essential as food, feed, or energy resource. Some features and processes are unique to these plant species or families and cannot be approached via a model plant. The power of all proteomic and transcriptomic methods, that is, high-throughput identification of candidate gene products, tends to be lost in non-model species due to the lack of genomic information or due to the sequence divergence to a related model organism. Nevertheless, a proteomics approach has a great potential to study non-model species. This work reviews non-model plants from a proteomic angle and provides an outline of the problems encountered when initiating the proteome analysis of a non-model organism. The review tackles problems associated with (i) sample preparation, (ii) the analysis and interpretation of a complex data set, (iii) the protein identification via MS, and (iv) data management and integration. We will illustrate the power of 2DE for non-model plants in combination with multivariate data analysis and MS/MS identification and will evaluate possible alternatives.
|
48
|
Frewen B, MacCoss MJ. Using BiblioSpec for creating and searching tandem MS peptide libraries. Curr Protoc Bioinformatics 2008; Chapter 13:13.7.1-13.7.12. [PMID: 18428681 DOI: 10.1002/0471250953.bi1307s20] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
BiblioSpec is a software package for creating and searching libraries of tandem MS peptide spectra. Library searching provides a quick method for making peptide-spectrum matches by comparing a query spectrum to a collection of reference spectra of known peptide sequence. Pre-assembled libraries for several model organisms can be used as the basis of a search or custom libraries can easily be assembled. The protocols in this unit describe installing BiblioSpec, searching libraries, and creating custom libraries.
|
49
|
Sandhu C, Hewel JA, Badis G, Talukder S, Liu J, Hughes TR, Emili A. Evaluation of Data-Dependent versus Targeted Shotgun Proteomic Approaches for Monitoring Transcription Factor Expression in Breast Cancer. J Proteome Res 2008; 7:1529-41. [DOI: 10.1021/pr700836q] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]
Affiliation(s)
- Charanjit Sandhu, Johannes A. Hewel, Gwenael Badis, Shaheynoor Talukder, Jian Liu, Timothy R. Hughes, Andrew Emili
- Program in Proteomics and Bioinformatics, Banting and Best Department of Medical Research, Terrence Donnelly Centre for Cellular and Biomolecular Research (CCBR), and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
|
50
|
Frank AM, Bandeira N, Shen Z, Tanner S, Briggs SP, Smith RD, Pevzner PA. Clustering millions of tandem mass spectra. J Proteome Res 2007; 7:113-22. [PMID: 18067247 DOI: 10.1021/pr070361e] [Citation(s) in RCA: 178] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Tandem mass spectrometry (MS/MS) experiments often generate redundant data sets containing multiple spectra of the same peptides. Clustering of MS/MS spectra takes advantage of this redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in significant speed-up of MS/MS database searches. We present an efficient clustering approach for analyzing large MS/MS data sets (over 10 million spectra) with a capability to reduce the number of spectra submitted to further analysis by an order of magnitude. The MS/MS database search of clustered spectra results in fewer spurious hits to the database and increases number of peptide identifications as compared to regular nonclustered searches. Our open source software MS-Clustering is available for download at http://peptide.ucsd.edu or can be run online at http://proteomics.bioprojects.org/MassSpec.
Affiliation(s)
- Ari M Frank
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093-0404, USA.
|