1
Adams C, Laukens K, Bittremieux W, Boonen K. Machine learning-based peptide-spectrum match rescoring opens up the immunopeptidome. Proteomics 2024; 24:e2300336. [PMID: 38009585 DOI: 10.1002/pmic.202300336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 11/29/2023]
Abstract
Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.
Affiliation(s)
- Charlotte Adams
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec BV, Niel, Belgium
2
Lenčo J, Jadeja S, Naplekov DK, Krokhin OV, Khalikova MA, Chocholouš P, Urban J, Broeckhoven K, Nováková L, Švec F. Reversed-Phase Liquid Chromatography of Peptides for Bottom-Up Proteomics: A Tutorial. J Proteome Res 2022; 21:2846-2892. [PMID: 36355445 DOI: 10.1021/acs.jproteome.2c00407] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 11/12/2022]
Abstract
The performance of the current bottom-up liquid chromatography hyphenated with mass spectrometry (LC-MS) analyses has undoubtedly been fueled by spectacular progress in mass spectrometry. It is thus not surprising that the MS instrument attracts the most attention during LC-MS method development, whereas optimizing conditions for peptide separation using reversed-phase liquid chromatography (RPLC) remains somewhat in its shadow. Consequently, the wisdom of the fundaments of chromatography is slowly vanishing from some laboratories. However, the full potential of advanced MS instruments cannot be achieved without highly efficient RPLC. This is impossible to attain without understanding fundamental processes in the chromatographic system and the properties of peptides important for their chromatographic behavior. We wrote this tutorial intending to give practitioners an overview of critical aspects of peptide separation using RPLC to facilitate setting the LC parameters so that they can leverage the full capabilities of their MS instruments. After briefly introducing the gradient separation of peptides, we discuss their properties that affect the quality of LC-MS chromatograms the most. Next, we address the in-column and extra-column broadening. The last section is devoted to key parameters of LC-MS methods. We also extracted trends in practice from recent bottom-up proteomics studies and correlated them with the current knowledge on peptide RPLC separation.
Affiliation(s)
- Juraj Lenčo
  - Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Siddharth Jadeja
  - Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Denis K Naplekov
  - Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Oleg V Krokhin
  - Department of Internal Medicine, Manitoba Centre for Proteomics and Systems Biology, University of Manitoba, 799 JBRC, 715 McDermot Avenue, Winnipeg R3E 3P4, Manitoba, Canada
- Maria A Khalikova
  - Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Petr Chocholouš
  - Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- Jiří Urban
  - Department of Chemistry, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Ken Broeckhoven
  - Department of Chemical Engineering (CHIS), Faculty of Engineering, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium
- Lucie Nováková
  - Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
- František Švec
  - Department of Analytical Chemistry, Faculty of Pharmacy in Hradec Králové, Charles University, Heyrovského 1203/8, 500 05 Hradec Králové, Czech Republic
3
Chen W, McCool EN, Sun L, Zang Y, Ning X, Liu X. Evaluation of Machine Learning Models for Proteoform Retention and Migration Time Prediction in Top-Down Mass Spectrometry. J Proteome Res 2022; 21:1736-1747. [PMID: 35616364 PMCID: PMC9250612 DOI: 10.1021/acs.jproteome.2c00124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
Reversed-phase liquid chromatography (RPLC) and capillary zone electrophoresis (CZE) are two primary proteoform separation methods in mass spectrometry (MS)-based top-down proteomics. Proteoform retention time (RT) prediction in RPLC and migration time (MT) prediction in CZE provide additional information for accurate proteoform identification and quantification. While existing methods are mainly focused on peptide RT and MT prediction in bottom-up MS, there is still a lack of methods for proteoform RT and MT prediction in top-down MS. We systematically evaluated eight machine learning models and a transfer learning method for proteoform RT prediction and five models and the transfer learning method for proteoform MT prediction. Experimental results showed that a gated recurrent unit (GRU)-based model with transfer learning achieved a high accuracy (R = 0.978) for proteoform RT prediction and that the GRU-based model and a fully connected neural network model obtained a high accuracy of R = 0.982 and 0.981 for proteoform MT prediction, respectively.
Affiliation(s)
- Wenrong Chen
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Elijah N McCool
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Liangliang Sun
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Yong Zang
  - Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Xia Ning
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, United States
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, United States
  - Translational Data Analytics Institute, The Ohio State University, Columbus, Ohio 43210, United States
- Xiaowen Liu
  - Tulane Center for Biomedical Informatics and Genomics, Tulane University, New Orleans, Louisiana 70112, United States
  - Deming Department of Medicine, Tulane University, New Orleans, Louisiana 70112, United States
4
Giese SH, Sinn LR, Wegner F, Rappsilber J. Retention time prediction using neural networks increases identifications in crosslinking mass spectrometry. Nat Commun 2021; 12:3237. [PMID: 34050149 PMCID: PMC8163845 DOI: 10.1038/s41467-021-23441-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/20/2020] [Accepted: 04/26/2021] [Indexed: 12/13/2022]
Abstract
Crosslinking mass spectrometry has developed into a robust technique that is increasingly used to investigate the interactomes of organelles and cells. However, the incomplete and noisy information in the mass spectra of crosslinked peptides limits the numbers of protein-protein interactions that can be confidently identified. Here, we leverage chromatographic retention time information to aid the identification of crosslinked peptides from mass spectra. Our Siamese machine learning model xiRT achieves highly accurate retention time predictions of crosslinked peptides in a multi-dimensional separation of crosslinked E. coli lysate. Importantly, supplementing the search engine score with retention time features leads to a substantial increase in protein-protein interactions without affecting confidence. This approach is not limited to cell lysates and multi-dimensional separation but also improves considerably the analysis of crosslinked multiprotein complexes with a single chromatographic dimension. Retention times are a powerful complement to mass spectrometric information to increase the sensitivity of crosslinking mass spectrometry analyses.
Affiliation(s)
- Sven H Giese
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
  - Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering, Potsdam, Germany
  - Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Ludwig R Sinn
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
- Fritz Wegner
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
- Juri Rappsilber
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, Berlin, Germany
  - Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
5
Chen AT, Franks A, Slavov N. DART-ID increases single-cell proteome coverage. PLoS Comput Biol 2019; 15:e1007082. [PMID: 31260443 PMCID: PMC6625733 DOI: 10.1371/journal.pcbi.1007082] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 08/29/2018] [Revised: 07/12/2019] [Accepted: 05/06/2019] [Indexed: 01/09/2023]
Abstract
Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those comprised of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because reduced protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates towards improved confidence estimates of peptide-spectrum-matches. When applied to bulk or to single-cell samples, DART-ID increased the number of data points by 30-50% at 1% FDR, and thus decreased missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type specific proteins. The additional datapoints provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net.
Affiliation(s)
- Albert Tian Chen
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts, United States of America
  - Barnett Institute, Northeastern University, Boston, Massachusetts, United States of America
- Alexander Franks
  - Department of Statistics and Applied Probability, University of California Santa Barbara, California, United States of America
- Nikolai Slavov
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts, United States of America
  - Barnett Institute, Northeastern University, Boston, Massachusetts, United States of America
  - Department of Biology, Northeastern University, Boston, Massachusetts, United States of America
6
Tarasova IA, Masselon CD, Gorshkov AV, Gorshkov MV. Predictive chromatography of peptides and proteins as a complementary tool for proteomics. Analyst 2018; 141:4816-4832. [PMID: 27419248 DOI: 10.1039/c6an00919k] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Indexed: 01/25/2023]
Abstract
In the last couple of decades, considerable effort has been focused on developing methods for quantitative and qualitative proteome characterization. The method of choice in this characterization is mass spectrometry used in combination with sample separation. One of the most widely used separation techniques at the front end of a mass spectrometer is high performance liquid chromatography (HPLC). A unique feature of HPLC is its specificity to the amino acid sequence of separated peptides and proteins. This specificity may provide additional information about the peptides or proteins under study which is complementary to the mass spectrometry data. The value of this information for proteomics has been recognized in the past few decades, which has stimulated significant effort in the development and implementation of computational and theoretical models for the prediction of peptide retention time for a given sequence. Here we review the advances in this area and the utility of predicted retention times for proteomic applications.
Affiliation(s)
- Irina A Tarasova
  - Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Christophe D Masselon
  - CEA, iRTSV-BGE, Laboratoire d'Etude de la Dynamique des Protéomes, Grenoble, F-38000, France
  - INSERM, U1038-BGE, F-38000, Grenoble, France
- Alexander V Gorshkov
  - N.N. Semenov Institute of Chemical Physics, Russian Academy of Sciences, Moscow 119991, Russia
- Mikhail V Gorshkov
  - Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
  - Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow region 141700, Russia
7
Giese SH, Ishihama Y, Rappsilber J. Peptide Retention in Hydrophilic Strong Anion Exchange Chromatography Is Driven by Charged and Aromatic Residues. Anal Chem 2018. [PMID: 29528219 PMCID: PMC5937359 DOI: 10.1021/acs.analchem.7b05157] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Hydrophilic strong anion exchange chromatography (hSAX) is becoming a popular method for the prefractionation of proteomic samples. However, the use and further development of this approach is affected by the limited understanding of its retention mechanism and the absence of elution time prediction. Using a set of 59 297 confidently identified peptides, we performed an explorative analysis and built a predictive deep learning model. As expected, charged residues are the major contributors to the retention time through electrostatic interactions. Aspartic acid and glutamic acid have a strong retaining effect and lysine and arginine have a strong repulsion effect. In addition, we also find the involvement of aromatic amino acids. This suggests a substantial contribution of cation-π interactions to the retention mechanism. The deep learning approach was validated using 5-fold cross-validation (CV) yielding a mean prediction accuracy of 70% during CV and 68% on a hold-out validation set. The results of this study emphasize that not only electrostatic interactions but rather diverse types of interactions must be integrated to build a reliable hSAX retention time predictor.
Affiliation(s)
- Sven H Giese
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Yasushi Ishihama
  - Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
- Juri Rappsilber
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
  - Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan
  - Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
8
Lobas AA, Levitsky LI, Fichtenbaum A, Surin AK, Pridatchenko ML, Mitulovic G, Gorshkov AV, Gorshkov MV. Predictive Liquid Chromatography of Peptides Based on Hydrophilic Interactions for Mass Spectrometry-Based Proteomics. Journal of Analytical Chemistry 2018. [DOI: 10.1134/s1061934817140076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
9
Moruz L, Käll L. Peptide retention time prediction. Mass Spectrometry Reviews 2017; 36:615-623. [PMID: 26799864 DOI: 10.1002/mas.21488] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 01/27/2015] [Accepted: 11/12/2015] [Indexed: 06/05/2023]
Abstract
Most methods for interpreting data from shotgun proteomics experiments are to a large degree dependent on being able to predict properties of peptide-ions. Often such predicted properties are limited to molecular mass and fragment spectra, but here we put focus on a perhaps underutilized property, a peptide's chromatographic retention time. We review a couple of different principles of retention time prediction, and their applications within computational proteomics. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:615-623, 2017.
Affiliation(s)
- Luminita Moruz
  - Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Stockholm, Sweden
- Lukas Käll
  - Science for Life Laboratory, School of Biotechnology, Royal Institute of Technology - KTH, Stockholm, Sweden
10
Locus-specific Retention Predictor (LsRP): A Peptide Retention Time Predictor Developed for Precision Proteomics. Sci Rep 2017; 7:43959. [PMID: 28303880 PMCID: PMC5356008 DOI: 10.1038/srep43959] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/01/2016] [Accepted: 01/31/2017] [Indexed: 11/08/2022]
Abstract
Precise prediction of peptide retention time (RT) plays an increasingly important role in liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomics. Owing to the high reproducibility of liquid chromatography, RT prediction provides promising information for the design of both identification and quantification experiments. In this work, we present a Locus-specific Retention Predictor (LsRP) for precise prediction of peptide RT, which is based on amino acid locus information and the Support Vector Regression (SVR) algorithm. According to its amino acid loci, each peptide sequence was converted to a featured locus vector consisting of zeros and ones. With locus vector information from LC-MS/MS data sets, an SVR model was trained and evaluated. LsRP achieved a prediction correlation coefficient of 0.95 to 0.99. We compared our method with two common predictors. Results showed that LsRP outperforms these methods and tracked up to 30% more peptides in an extraction RT window of 2 min. A new strategy combining LsRP with the calibration peptide approach was then proposed, which opens up new opportunities for precision proteomics.
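The abstract states only that each peptide is converted to a locus vector of zeros and ones. The sketch below is one plausible reading of that step, a position-by-residue one-hot encoding, and is not the published LsRP implementation. The function name `locus_vector`, the `max_len` padding length, and the 20-letter alphabet order are all assumptions made for illustration.

```python
# Hypothetical illustration: position-specific one-hot ("locus") encoding.
# The alphabet order and the fixed padding length are assumptions,
# not details taken from the LsRP paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def locus_vector(peptide: str, max_len: int = 30) -> list[int]:
    """Encode a peptide as a flat 0/1 vector of size max_len * 20.

    Slot (position * 20 + residue_index) is set to 1 when that residue
    occupies that position; positions past the peptide end stay 0.
    """
    if len(peptide) > max_len:
        raise ValueError("peptide longer than max_len")
    vector = [0] * (max_len * len(AMINO_ACIDS))
    for position, residue in enumerate(peptide):
        residue_index = AMINO_ACIDS.find(residue)
        if residue_index < 0:
            raise ValueError(f"unsupported residue: {residue!r}")
        vector[position * len(AMINO_ACIDS) + residue_index] = 1
    return vector


if __name__ == "__main__":
    encoded = locus_vector("ACD", max_len=5)
    print(len(encoded), sum(encoded))
```

Vectors of this kind could then be passed, together with measured retention times, to a support vector regressor such as scikit-learn's `sklearn.svm.SVR`. The kernel and hyperparameters would have to be chosen by the user, since the abstract does not state them.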
11
Aicheler F, Li J, Hoene M, Lehmann R, Xu G, Kohlbacher O. Retention Time Prediction Improves Identification in Nontargeted Lipidomics Approaches. Anal Chem 2015; 87:7698-704. [PMID: 26145158 DOI: 10.1021/acs.analchem.5b01139] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Indexed: 01/24/2023]
Abstract
Identification of lipids in nontargeted lipidomics based on liquid-chromatography coupled to mass spectrometry (LC-MS) is still a major issue. While both accurate mass and fragment spectra contain valuable information, retention time (tR) information can be used to augment this data. We present a retention time model based on machine learning approaches which enables an improved assignment of lipid structures and automated annotation of lipidomics data. In contrast to common approaches we used a complex mixture of 201 lipids originating from fat tissue instead of a standard mixture to train a support vector regression (SVR) model including molecular structural features. The cross-validated model achieves a correlation coefficient between predicted and experimental test sample retention times of r = 0.989. Combining our retention time model with identification via accurate mass search (AMS) of lipids against the comprehensive LIPID MAPS database, retention time filtering can significantly reduce the rate of false positives in complex data sets like adipose tissue extracts. In our case, filtering with retention time information removed more than half of the potential identifications, while retaining 95% of the correct identifications. Combination of high-precision retention time prediction and accurate mass can thus significantly narrow down the number of hypotheses to be assessed for lipid identification in complex lipid pattern like tissue profiles.
Affiliation(s)
- Fabian Aicheler
  - Applied Bioinformatics, Center for Bioinformatics, Quantitative Biology Center, and Department of Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Baden-Württemberg, Germany
- Jia Li
  - Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning 116023, China
- Miriam Hoene
  - Division of Clinical Chemistry and Pathobiochemistry, Department of Internal Medicine IV, University Hospital Tuebingen, 72076 Tuebingen, Baden-Württemberg, Germany
- Rainer Lehmann
  - Division of Clinical Chemistry and Pathobiochemistry, Department of Internal Medicine IV, University Hospital Tuebingen, 72076 Tuebingen, Baden-Württemberg, Germany
  - Department of Molecular Diabetology, Institute for Diabetes Research and Metabolic Diseases of the Helmholtz Centre Munich at the University of Tuebingen, 72076 Tuebingen, Baden-Württemberg, Germany
  - German Center for Diabetes Research (DZD), 72076 Tuebingen, Baden-Württemberg, Germany
- Guowang Xu
  - Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning 116023, China
- Oliver Kohlbacher
  - Applied Bioinformatics, Center for Bioinformatics, Quantitative Biology Center, and Department of Computer Science, University of Tuebingen, Sand 14, 72076 Tuebingen, Baden-Württemberg, Germany
  - Department of Molecular Diabetology, Institute for Diabetes Research and Metabolic Diseases of the Helmholtz Centre Munich at the University of Tuebingen, 72076 Tuebingen, Baden-Württemberg, Germany
  - German Center for Diabetes Research (DZD), 72076 Tuebingen, Baden-Württemberg, Germany
12
Parker SJ, Rost H, Rosenberger G, Collins BC, Malmström L, Amodei D, Venkatraman V, Raedschelders K, Van Eyk JE, Aebersold R. Identification of a Set of Conserved Eukaryotic Internal Retention Time Standards for Data-independent Acquisition Mass Spectrometry. Mol Cell Proteomics 2015. [PMID: 26199342 DOI: 10.1074/mcp.o114.042267] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 11/06/2022]
Abstract
Accurate knowledge of retention time (RT) in liquid chromatography-based mass spectrometry data facilitates peptide identification, quantification, and multiplexing in targeted and discovery-based workflows. Retention time prediction is particularly important for peptide analysis in emerging data-independent acquisition (DIA) experiments such as SWATH-MS. The indexed RT approach, iRT, uses synthetic spiked-in peptide standards (SiRT) to set RT to a unit-less scale, allowing for normalization of peptide RT between different samples and chromatographic set-ups. The obligatory use of SiRTs can be costly and complicates comparisons and data integration if standards are not included in every sample. Reliance on SiRTs also prevents the inclusion of archived mass spectrometry data for generation of the peptide assay libraries central to targeted DIA-MS data analysis. We have identified a set of peptide sequences that are conserved across most eukaryotic species, termed Common internal Retention Time standards (CiRT). In a series of tests to support the appropriateness of the CiRT-based method, we show: (1) the CiRT peptides normalized RT in human, yeast, and mouse cell lysate derived peptide assay libraries and enabled merging of archived libraries for expanded DIA-MS quantitative applications; (2) CiRTs predicted RT in SWATH-MS data within a 2-min margin of error for the majority of peptides; and (3) normalization of RT using the CiRT peptides enabled the accurate SWATH-MS-based quantification of 340 synthetic isotopically labeled peptides that were spiked into either human or yeast cell lysate. To automate and facilitate the use of these CiRT peptide lists or other custom user-defined internal RT reference peptides in DIA workflows, an algorithm was designed to automatically select a high-quality subset of datapoints for robust linear alignment of RT for use. Implementations of this algorithm are available for the OpenSWATH and Skyline platforms. 
Thus, CiRT peptides can be used alone or as a complement to SiRTs for RT normalization across peptide spectral libraries and in quantitative DIA-MS studies.
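The abstract mentions selecting a high-quality subset of reference peptides for robust linear RT alignment but gives no details. The sketch below is a stand-in and not the OpenSWATH or Skyline implementation. It takes a robust first estimate with the Theil-Sen estimator (median of pairwise slopes), keeps the points close to that line, and refits them by ordinary least squares. The function names and the `tolerance` parameter are hypothetical.

```python
from itertools import combinations
from statistics import median


def least_squares(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def align_rt(observed, reference, tolerance=2.0):
    """Map observed RTs onto a reference scale, ignoring outlier standards.

    Step 1: Theil-Sen estimate (an assumed substitute for the published
            selection algorithm) gives an outlier-resistant line.
    Step 2: keep reference peptides within `tolerance` of that line.
    Step 3: refit the kept points by least squares.
    """
    pairs = list(zip(observed, reference))
    slopes = [
        (r2 - r1) / (o2 - o1)
        for (o1, r1), (o2, r2) in combinations(pairs, 2)
        if o2 != o1
    ]
    if not slopes:
        raise ValueError("need at least two distinct observed RTs")
    slope = median(slopes)
    intercept = median(r - slope * o for o, r in pairs)
    inliers = [(o, r) for o, r in pairs
               if abs(slope * o + intercept - r) <= tolerance]
    return least_squares([o for o, _ in inliers], [r for _, r in inliers])


if __name__ == "__main__":
    # Five consistent standards plus one misassigned peptide (60 -> 200).
    slope, intercept = align_rt([10, 20, 30, 40, 50, 60],
                                [25, 45, 65, 85, 105, 200])
    print(slope, intercept)
```

Any other peptide's observed RT can then be converted to the reference scale as `slope * rt + intercept`. The right tolerance depends on the gradient length and the units of the reference scale.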
Affiliation(s)
- Sarah J Parker
  - Advanced Clinical Biosystems Research Institute, The Heart Institute, and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California
- Hannes Rost
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland
- George Rosenberger
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland
- Ben C Collins
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Vidya Venkatraman
  - Advanced Clinical Biosystems Research Institute, The Heart Institute, and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California
- Koen Raedschelders
  - Advanced Clinical Biosystems Research Institute, The Heart Institute, and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California
- Jennifer E Van Eyk
  - Department of Medicine, Johns Hopkins University, Baltimore, Maryland
  - Advanced Clinical Biosystems Research Institute, The Heart Institute, and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California
- Ruedi Aebersold
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
  - Faculty of Science, University of Zurich, Zurich, Switzerland
13
Applications of Peptide Retention Time in Proteomic Data Analysis. Advances in Experimental Medicine and Biology 2014; 845:67-75. [DOI: 10.1007/978-94-017-9523-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/18/2023]
14
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn solving a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever growing amounts, machine learning is fast becoming a very popular tool in the field. We here therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
  - Department of Medical Protein Research, VIB, Ghent, Belgium
  - Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
  - Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium
15
Moskovets E, Goloborodko AA, Gorshkov AV, Gorshkov MV. Limitation of predictive 2-D liquid chromatography in reducing the database search space in shotgun proteomics: in silico studies. J Sep Sci 2012; 35:1771-8. [PMID: 22807359 DOI: 10.1002/jssc.201100798] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/25/2022]
Abstract
A two-dimensional (2-D) liquid chromatography (LC) separation of complex peptide mixtures that combines a normal phase utilizing hydrophilic interactions and a reversed phase offers reportedly the highest level of 2-D LC orthogonality by providing an even spread of peptides across multiple LC fractions. Matching experimental peptide retention times to those predicted by empirical models describing chromatographic separation in each LC dimension leads to a significant reduction in a database search space. In this work, we calculated the retention times of tryptic peptides separated in the C18 reversed phase at different separation conditions (pH 2 and pH 10) and in TSK gel Amide-80 normal phase. We show that retention times calculated for different 2-D LC separation schemes utilizing these phases start to correlate once the mass range of peptides under analysis becomes progressively narrow. This effect is explained by high degree of correlation between retention coefficients in the considered phases.
16
Moruz L, Staes A, Foster JM, Hatzou M, Timmerman E, Martens L, Käll L. Chromatographic retention time prediction for posttranslationally modified peptides. Proteomics 2012; 12:1151-9. [DOI: 10.1002/pmic.201100386] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 11/11/2022]
Affiliation(s)
- Luminita Moruz
  - Science for Life Laboratory, Department of Biochemistry and Biophysics; Stockholm University; Solna Sweden
  - Stockholm Bioinformatics Center; Stockholm University; Solna Sweden
- An Staes
  - Department of Medical Protein Research; VIB; Ghent Belgium
  - Department of Biochemistry; Ghent University; Ghent Belgium
- Joseph M. Foster
  - EMBL Outstation, European Bioinformatics Institute; Wellcome Trust Genome Campus; Hinxton Cambridge UK
- Maria Hatzou
  - Science for Life Laboratory, Department of Biochemistry and Biophysics; Stockholm University; Solna Sweden
- Evy Timmerman
  - Department of Medical Protein Research; VIB; Ghent Belgium
  - Department of Biochemistry; Ghent University; Ghent Belgium
- Lennart Martens
  - Department of Medical Protein Research; VIB; Ghent Belgium
  - Department of Biochemistry; Ghent University; Ghent Belgium
- Lukas Käll
  - Stockholm Bioinformatics Center; Stockholm University; Solna Sweden
  - Science for Life Laboratory, School of Biotechnology; Royal Institute of Technology (KTH); Solna Sweden
17
Tyrkkö E, Pelander A, Ojanperä I. Prediction of liquid chromatographic retention for differentiation of structural isomers. Anal Chim Acta 2012; 720:142-8. [PMID: 22365132 DOI: 10.1016/j.aca.2012.01.024] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 11/07/2011] [Revised: 01/13/2012] [Accepted: 01/13/2012] [Indexed: 10/14/2022]
Abstract
A liquid chromatography (LC) retention time prediction software, ACD/ChromGenius, was employed to calculate retention times for structural isomers, which cannot be differentiated by accurate mass measurement techniques alone. For 486 drug compounds included in an in-house database for urine drug screening by liquid chromatography/quadrupole time-of-flight mass spectrometry (LC/Q-TOFMS), a retention time knowledge base was created with the software. ACD/ChromGenius calculated retention times for compounds based on the drawn molecular structure and given chromatographic parameters. The ability of the software for compound identification was evaluated by calculating the retention order of the 118 isomers, in 50 isomer groups of 2-5 compounds each, included in the database. ACD/ChromGenius predicted the correct elution order for 68% (34) of isomer groups. Of the 16 groups for which the isomer elution order was incorrectly calculated, two were diastereomer pairs and thus difficult to distinguish using the software. Correlation between the calculated and experimental retention times in the knowledge base tested was moderate, r² = 0.8533. The mean and median absolute errors were 1.12 min and 0.84 min, respectively, and the standard deviation was 1.04 min. The information generated by ACD/ChromGenius, together with other in silico methods employing accurate mass data, makes the identification of substances more reliable. This study demonstrates an approach for tentatively identifying compounds in a large target database without a need for primary reference standards.
Affiliation(s)
- Elli Tyrkkö
- Department of Forensic Medicine, Hjelt Institute, University of Helsinki, Finland.
18
Vaudel M, Burkhart JM, Sickmann A, Martens L, Zahedi RP. Peptide identification quality control. Proteomics 2011; 11:2105-14. [PMID: 21500347 DOI: 10.1002/pmic.201000704] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 11/02/2010] [Revised: 02/10/2011] [Accepted: 02/17/2011] [Indexed: 11/10/2022]
Abstract
Identification of large proteomics data sets is routinely performed using sophisticated software tools called search engines. Yet despite the importance of the identification process, its configuration and execution is often performed according to established lab habits, and is mostly unsupervised by detailed quality control. In order to establish easily obtainable quality control criteria that can be broadly applied to the identification process, we here introduce several simple quality control methods. An unbiased quality control of identification parameters will be conducted using target/decoy searches providing significant improvement over identification standards. MASCOT identifications were for instance increased by 13% at a constant level of confidence. The target/decoy approach can however not be universally applied. We therefore also quality control the application of this strategy itself, providing useful and intuitive metrics for evaluating the precision and robustness of the obtained false discovery rate.
Affiliation(s)
- Marc Vaudel
- ISAS-Leibniz Institut für Analytische Wissenschaften-ISAS-eV, Dortmund, Germany
19
Joyner K, Wang W, Yu YB. The Effect of Column and Eluent Fluorination on the Retention and Separation of non-Fluorinated Amino Acids and Proteins by HPLC. J Fluor Chem 2011; 132:114-122. [PMID: 21318121 DOI: 10.1016/j.jfluchem.2010.12.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Abstract
The effect of column and eluent fluorination on the retention and separation of non-fluorinated amino acids and proteins in HPLC is investigated. A side-by-side comparison of fluorocarbon column and eluents (F-column and F-eluents) with their hydrocarbon counterparts (H-column and H-eluents) in the separation of a group of 33 analytes, including 30 amino acids and 3 proteins, is conducted. The H-column and the F-column contain the n-C₈H₁₇ group and n-C₈F₁₇ group, respectively, in their stationary phases. The H-eluents include ethanol (EtOH) and isopropanol (ISP) while the F-eluents include trifluoroethanol (TFE) and hexafluoroisopropanol (HFIP). The 2 columns and 4 eluents generated 8 (column, eluent) pairs that produce 264 retention time data points for the 33 analytes. A statistical analysis of the retention time data reveals that although the H-column is better than the F-column in analyte separation and H-eluents are better than F-eluents in analyte retention, the more critical factor is the proper pairing of column with eluent. Among the conditions explored in this project, optimal retention and separation is achieved when the fluorocarbon column is paired with ethanol, even though TFE is the most polar one among the 4 eluents. This result shows fluorocarbon columns have much potential in chromatographic analysis and separation of non-fluorinated amino acids and proteins.
Affiliation(s)
- Katherine Joyner
- Department of Pharmaceutical Sciences, University of Maryland, Baltimore, MD 21201
20
Bochet P, Rügheimer F, Guina T, Brooks P, Goodlett D, Clote P, Schwikowski B. Fragmentation-free LC-MS can identify hundreds of proteins. Proteomics 2010; 11:22-32. [DOI: 10.1002/pmic.200900765] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/19/2009] [Revised: 09/02/2010] [Accepted: 09/20/2010] [Indexed: 11/09/2022]
21
Moruz L, Tomazela D, Käll L. Training, Selection, and Robust Calibration of Retention Time Models for Targeted Proteomics. J Proteome Res 2010; 9:5209-16. [DOI: 10.1021/pr1005058] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Indexed: 11/28/2022]
Affiliation(s)
- Luminita Moruz
  - Center for Biomembrane Research, Department of Biochemistry and Biophysics, and Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden, and Department of Genome Sciences, University of Washington, Seattle, Washington 98195
- Daniela Tomazela
  - Center for Biomembrane Research, Department of Biochemistry and Biophysics, and Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden, and Department of Genome Sciences, University of Washington, Seattle, Washington 98195
- Lukas Käll
  - Center for Biomembrane Research, Department of Biochemistry and Biophysics, and Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden, and Department of Genome Sciences, University of Washington, Seattle, Washington 98195
22
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics 2010; 73:2092-123. [PMID: 20816881 DOI: 10.1016/j.jprot.2010.08.009] [Citation(s) in RCA: 358] [Impact Index Per Article: 25.6] [Received: 07/02/2010] [Revised: 08/25/2010] [Accepted: 08/25/2010] [Indexed: 12/18/2022]
Abstract
This manuscript provides a comprehensive review of the peptide and protein identification process using tandem mass spectrometry (MS/MS) data generated in shotgun proteomic experiments. The commonly used methods for assigning peptide sequences to MS/MS spectra are critically discussed and compared, from basic strategies to advanced multi-stage approaches. Particular attention is paid to the problem of false-positive identifications. Existing statistical approaches for assessing the significance of peptide to spectrum matches are surveyed, ranging from single-spectrum approaches such as expectation values to global error rate estimation procedures such as false discovery rates and posterior probabilities. The importance of using auxiliary discriminant information (mass accuracy, peptide separation coordinates, digestion properties, etc.) is discussed, and advanced computational approaches for joint modeling of multiple sources of information are presented. This review also includes a detailed analysis of the issues affecting the interpretation of data at the protein level, including the amplification of error rates when going from peptide to protein level, and the ambiguities in inferring the identities of sample proteins in the presence of shared peptides. Commonly used methods for computing protein-level confidence scores are discussed in detail. The review concludes with a discussion of several outstanding computational issues.
23
Babushok VI, Zenkevich IG. Retention Characteristics of Peptides in RP-LC: Peptide Retention Prediction. Chromatographia 2010. [DOI: 10.1365/s10337-010-1721-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/05/2022]
24
Abstract
The current status of de novo sequencing of peptides by MS/MS is reviewed with focus on collision cell MS/MS spectra. The relation between peptide structure and observed fragment ion series is discussed and the exhaustive extraction of sequence information from CID spectra of protonated peptide ions is described. The partial redundancy of the extracted sequence information and a high mass accuracy are recognized as key parameters for dependable de novo sequencing by MS. In addition, the benefits of special techniques enhancing the generation of long uninterrupted fragment ion series for de novo peptide sequencing are highlighted. Among these are terminal ¹⁸O labeling, MSⁿ of sodiated peptide ions, N-terminal derivatization, the use of special proteases, and time-delayed fragmentation. The emerging electron transfer dissociation technique and the recent progress of MALDI techniques for intact protein sequencing are covered. Finally, the integration of bioinformatic tools into peptide de novo sequencing is demonstrated.
Affiliation(s)
- Joerg Seidler
- Molecular Structure Analysis, German Cancer Research Center, Heidelberg, Germany
25
[Application of peptide retention time in proteome research]. Se Pu 2010; 28:128-34. [PMID: 20556949 DOI: 10.3724/sp.j.1123.2012.00128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has been one of the most popular approaches in proteome analysis. As a parameter independent of mass spectrometry information, peptide retention time has been utilized to facilitate protein identification and quantification. In the field of peptide identification, combining retention time prediction with routine tandem mass spectrometry database searching methods could help improve the confidence of identification. The sensitivity of identification could also be improved by matching peaks on both accurate mass and retention time across multiple aligned LC-MS runs. Meanwhile, because small changes in liquid chromatography conditions unavoidably lead to variability in retention times, retention time alignment is crucial to label-free quantification. Additionally, post-translational modifications (PTMs) could be identified by combining retention time shifts and mass deviation information.
26
Goloborodko AA, Mayerhofer C, Zubarev AR, Tarasova IA, Gorshkov AV, Zubarev RA, Gorshkov MV. Empirical approach to false discovery rate estimation in shotgun proteomics. Rapid Commun Mass Spectrom 2010; 24:454-462. [PMID: 20069687 DOI: 10.1002/rcm.4417] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 05/28/2023]
Abstract
Estimation of false discovery rate (FDR) for identified peptides is an important step in large-scale proteomic studies. We introduced an empirical approach to the problem that is based on the FDR-like functions of sets of peptide spectral matches (PSMs). These functions have close values for equal-sized sets with the same FDR and depend monotonically on the FDR of a set. We have found three of them, based on three complementary sources of data: chromatography, mass spectrometry, and sequences of identified peptides. Using a calibration on a set of putative correct PSMs these functions were converted into the FDR scale. The approach was tested on a set of approximately 2800 PSMs obtained from rat kidney tissue. The estimates based on all three data sources were rather consistent with each other as well as with one made using the target-decoy strategy.
Affiliation(s)
- Anton A Goloborodko
- Institute of Energy Problems of Chemical Physics, Russian Academy of Sciences, Leninskii pr. 38, Bld.2, Moscow 119334, Russia