1
Siraj A, Bouwmeester R, Declercq A, Welp L, Chernev A, Wulf A, Urlaub H, Martens L, Degroeve S, Kohlbacher O, Sachsenberg T. Intensity and retention time prediction improves the rescoring of protein-nucleic acid cross-links. Proteomics 2024; 24:e2300144. [PMID: 38629965 DOI: 10.1002/pmic.202300144] [Received: 05/27/2023] [Revised: 12/29/2023] [Accepted: 01/05/2024] [Indexed: 04/19/2024]
Abstract
In protein-RNA cross-linking mass spectrometry, UV or chemical cross-linking introduces stable bonds between amino acids and nucleic acids in protein-RNA complexes that are then analyzed and detected in mass spectra. This analytical tool delivers valuable information about RNA-protein interactions and RNA docking sites in proteins, both in vitro and in vivo. The identification of cross-linked peptides with oligonucleotides of different length leads to a combinatorial increase in search space. We demonstrate that the peptide retention time prediction tasks can be transferred to the task of cross-linked peptide retention time prediction using a simple amino acid composition encoding, yielding improved identification rates when the prediction error is included in rescoring. For the more challenging task of including fragment intensity prediction of cross-linked peptides in the rescoring, we obtain, on average, a similar improvement. Further improvement in the encoding and fine-tuning of retention time and intensity prediction models might lead to further gains, and merit further research.
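The two ingredients named in this abstract, an amino acid composition encoding and a retention time prediction error used as a rescoring feature, can be illustrated with a minimal sketch. The function names, the 20-letter alphabet ordering, and the example values are assumptions made for illustration; this is not the authors' implementation, and the retention time predictor itself is left out.

```python
from collections import Counter
from typing import List

# Assumed ordering of the 20 standard amino acids (illustrative choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_encoding(peptide: str) -> List[int]:
    """Count how often each standard amino acid occurs in the peptide."""
    counts = Counter(peptide)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def rt_error_feature(observed_rt: float, predicted_rt: float) -> float:
    """Absolute retention time prediction error, usable as a rescoring feature."""
    return abs(observed_rt - predicted_rt)


# Hypothetical peptide and retention times (in minutes).
encoding = composition_encoding("PEPTIDEK")
error = rt_error_feature(observed_rt=31.5, predicted_rt=30.0)
```

A composition vector discards residue order, which is what makes it easy to reuse a peptide-level model for cross-linked species; the abstract notes that richer encodings might yield further gains.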
Affiliation(s)
- Arslan Siraj
  - Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
  - Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
- Robbin Bouwmeester
  - Department of Biomolecular Medicine, Ghent University, Gent, Belgium
  - VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
- Arthur Declercq
  - Department of Biomolecular Medicine, Ghent University, Gent, Belgium
  - VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
- Luisa Welp
  - Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
  - Bioanalytics, Institute of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Aleksandar Chernev
  - Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Alexander Wulf
  - Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Henning Urlaub
  - Bioanalytical Mass Spectrometry, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
  - Bioanalytics, Institute of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Lennart Martens
  - Department of Biomolecular Medicine, Ghent University, Gent, Belgium
  - VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
- Sven Degroeve
  - Department of Biomolecular Medicine, Ghent University, Gent, Belgium
  - VIB-UGent Center for Medical Biotechnology, VIB, Gent, Belgium
- Oliver Kohlbacher
  - Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
  - Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
- Timo Sachsenberg
  - Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
  - Institute for Biological and Medical Informatics, University of Tübingen, Tübingen, Germany
2
Lin A, See D, Fondrie WE, Keich U, Noble WS. Target-decoy false discovery rate estimation using Crema. Proteomics 2024; 24:e2300084. [PMID: 38380501 DOI: 10.1002/pmic.202300084] [Received: 06/13/2023] [Revised: 01/06/2024] [Accepted: 01/16/2024] [Indexed: 02/22/2024]
Abstract
Assigning statistical confidence estimates to discoveries produced by a tandem mass spectrometry proteomics experiment is critical to enabling principled interpretation of the results and assessing the cost/benefit ratio of experimental follow-up. The most common technique for computing such estimates is to use target-decoy competition (TDC), in which observed spectra are searched against a database of real (target) peptides and a database of shuffled or reversed (decoy) peptides. TDC procedures for estimating the false discovery rate (FDR) at a given score threshold have been developed for application at the level of spectra, peptides, or proteins. Although these techniques are relatively straightforward to implement, it is common in the literature to skip over the implementation details or even to make mistakes in how the TDC procedures are applied in practice. Here we present Crema, an open-source Python tool that implements several TDC methods of spectrum-, peptide- and protein-level FDR estimation. Crema is compatible with a variety of existing database search tools and provides a straightforward way to obtain robust FDR estimates.
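As a rough illustration of the target-decoy competition idea described in this abstract, the sketch below estimates FDR and q-values at the spectrum level. It assumes each spectrum has already been reduced to its single best match (target or decoy), that higher scores are better, and that FDR is estimated with the commonly used (decoys + 1) / targets formula. The function name `tdc_qvalues` and the toy scores are hypothetical; this is plain Python, not Crema's API or its exact procedure.

```python
def tdc_qvalues(psms):
    """Estimate q-values for target PSMs after target-decoy competition.

    psms: list of (score, is_decoy) tuples, one best match per spectrum.
    Returns (score, q_value) pairs for targets, sorted by decreasing score.
    """
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)

    # Walk down the ranked list and estimate FDR at every score threshold.
    targets = decoys = 0
    fdrs = []
    for score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = min(1.0, (decoys + 1) / max(targets, 1))
        fdrs.append((score, is_decoy, fdr))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    qvalues = []
    running_min = 1.0
    for score, is_decoy, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        if not is_decoy:
            qvalues.append((score, running_min))
    qvalues.reverse()
    return qvalues


# Toy input: four target matches and one decoy match.
example = [(9.0, False), (8.0, False), (7.0, False), (6.0, True), (5.0, False)]
result = tdc_qvalues(example)
```

The sketch ignores score ties and peptide- or protein-level aggregation, which are among the implementation details the abstract says are often skipped or handled incorrectly.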
Affiliation(s)
- Andy Lin
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Seattle, Washington, USA
- Donavan See
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA
- Uri Keich
  - School of Mathematics and Statistics, University of Sydney, Sydney, Australia
- William Stafford Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, USA
  - Department of Genome Sciences, University of Washington, Seattle, Washington, USA
3
Picciani M, Gabriel W, Giurcoiu VG, Shouman O, Hamood F, Lautenbacher L, Jensen CB, Müller J, Kalhor M, Soleymaniniya A, Kuster B, The M, Wilhelm M. Oktoberfest: Open-source spectral library generation and rescoring pipeline based on Prosit. Proteomics 2024; 24:e2300112. [PMID: 37672792 DOI: 10.1002/pmic.202300112] [Received: 06/14/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/08/2023]
Abstract
Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open source Python package of our spectral library generation and rescoring pipeline originally only available online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.
Affiliation(s)
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Wassim Gabriel
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Victor-George Giurcoiu
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Omar Shouman
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Firas Hamood
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Ludwig Lautenbacher
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Cecilia Bang Jensen
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Julian Müller
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mostafa Kalhor
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Armin Soleymaniniya
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Matthew The
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
4
Adams C, Laukens K, Bittremieux W, Boonen K. Machine learning-based peptide-spectrum match rescoring opens up the immunopeptidome. Proteomics 2024; 24:e2300336. [PMID: 38009585 DOI: 10.1002/pmic.202300336] [Received: 09/06/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 11/29/2023]
Abstract
Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.
Affiliation(s)
- Charlotte Adams
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec BV, Niel, Belgium
5
Buur LM, Declercq A, Strobl M, Bouwmeester R, Degroeve S, Martens L, Dorfer V, Gabriels R. MS2Rescore 3.0 Is a Modular, Flexible, and User-Friendly Platform to Boost Peptide Identifications, as Showcased with MS Amanda 3.0. J Proteome Res 2024. [PMID: 38491990 DOI: 10.1021/acs.jproteome.3c00785] [Indexed: 03/18/2024]
Abstract
Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations on automatic rescoring. Among new features, MS Amanda now contains additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when applied on challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, thereby demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect the MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.
Affiliation(s)
- Louise M Buur
  - Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Marina Strobl
  - Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Sven Degroeve
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Viktoria Dorfer
  - Bioinformatics Research Group, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
6
Gomez-Zepeda D, Arnold-Schild D, Beyrle J, Declercq A, Gabriels R, Kumm E, Preikschat A, Łącki MK, Hirschler A, Rijal JB, Carapito C, Martens L, Distler U, Schild H, Tenzer S. Thunder-DDA-PASEF enables high-coverage immunopeptidomics and is boosted by MS2Rescore with MS2PIP timsTOF fragmentation prediction model. Nat Commun 2024; 15:2288. [PMID: 38480730 PMCID: PMC10937930 DOI: 10.1038/s41467-024-46380-y] [Received: 02/24/2023] [Accepted: 02/26/2024] [Indexed: 03/17/2024] [Open Access]
Abstract
Human leukocyte antigen (HLA) class I peptide ligands (HLAIps) are key targets for developing vaccines and immunotherapies against infectious pathogens or cancer cells. Identifying HLAIps is challenging due to their high diversity, low abundance, and patient individuality. Here, we develop a highly sensitive method for identifying HLAIps using liquid chromatography-ion mobility-tandem mass spectrometry (LC-IMS-MS/MS). In addition, we train a timsTOF-specific peak intensity MS2PIP model for tryptic and non-tryptic peptides and implement it in MS2Rescore (v3) together with the CCS predictor from ionmob. The optimized method, Thunder-DDA-PASEF, semi-selectively fragments singly and multiply charged HLAIps based on their IMS and m/z. Moreover, the method employs the high sensitivity mode and extended IMS resolution with fewer MS/MS frames (300 ms TIMS ramp, 3 MS/MS frames), doubling the coverage of immunopeptidomics analyses, compared to the proteomics-tailored DDA-PASEF (100 ms TIMS ramp, 10 MS/MS frames). Additionally, rescoring boosts the HLAIps identification by 41.7% to 33%, resulting in 5738 HLAIps from as little as one million JY cell equivalents, and 14,516 HLAIps from 20 million. This enables in-depth profiling of HLAIps from diverse human cell lines and human plasma. Finally, profiling JY and Raji cells transfected to express the SARS-CoV-2 spike protein results in 16 spike HLAIps, thirteen of which have been reported to elicit immune responses in human patients.
Affiliation(s)
- David Gomez-Zepeda
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz, Germany
  - German Cancer Research Center (DKFZ) Heidelberg, Division 191, Heidelberg, Germany
- Danielle Arnold-Schild
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Julian Beyrle
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz, Germany
  - German Cancer Research Center (DKFZ) Heidelberg, Division 191, Heidelberg, Germany
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Elena Kumm
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Annica Preikschat
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Mateusz Krzysztof Łącki
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Aurélie Hirschler
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI - FR2048, Strasbourg, France
- Jeewan Babu Rijal
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI - FR2048, Strasbourg, France
- Christine Carapito
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI - FR2048, Strasbourg, France
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Ute Distler
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Hansjörg Schild
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Stefan Tenzer
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz, Germany
  - German Cancer Research Center (DKFZ) Heidelberg, Division 191, Heidelberg, Germany
  - Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
7
Strauss MT, Bludau I, Zeng WF, Voytik E, Ammar C, Schessner JP, Ilango R, Gill M, Meier F, Willems S, Mann M. AlphaPept: a modern and open framework for MS-based proteomics. Nat Commun 2024; 15:2168. [PMID: 38461149 PMCID: PMC10924963 DOI: 10.1038/s41467-024-46485-4] [Received: 12/20/2022] [Accepted: 02/20/2024] [Indexed: 03/11/2024] [Open Access]
Abstract
In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. A plethora of different computational tools can process the MS data to derive peptide and protein identification and quantification. However, during the last years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, we develop AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Numba for just-in-time compilation on CPU and GPU achieves hundred-fold speed improvements. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances. We provide an easy on-ramp for community contributions through the concept of literate programming, implemented in Jupyter Notebooks. Large datasets can rapidly be processed as shown by the analysis of hundreds of proteomes in minutes per file, many-fold faster than acquisition. AlphaPept can be used to build automated processing pipelines with web-serving functionality and compatibility with downstream analysis tools. It provides easy access via one-click installation, a modular Python library for advanced users, and via an open GitHub repository for developers.
Affiliation(s)
- Maximilian T Strauss
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
- Isabell Bludau
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Wen-Feng Zeng
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Eugenia Voytik
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Constantin Ammar
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Julia P Schessner
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Florian Meier
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - Functional Proteomics, Jena University Hospital, Jena, Germany
- Sander Willems
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Matthias Mann
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
8
Gabriel W, Picciani M, The M, Wilhelm M. Deep Learning-Assisted Analysis of Immunopeptidomics Data. Methods Mol Biol 2024; 2758:457-483. [PMID: 38549030 DOI: 10.1007/978-1-0716-3646-6_25] [Indexed: 04/02/2024]
Abstract
Liquid chromatography-coupled mass spectrometry (LC-MS/MS) is the primary method to obtain direct evidence for the presentation of disease- or patient-specific human leukocyte antigen (HLA). However, compared to the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores assessing the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly in use cases where non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures on how to benefit from state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or provide assistance in the assessment of the presence of cryptic peptides, such as spliced peptides.
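The intensity-based matching scores mentioned in this abstract compare predicted and observed fragment intensities. One similarity measure commonly used for this purpose is the normalized spectral angle, sketched below. The choice of this particular score, the function name, and the toy intensity vectors are assumptions for illustration; the sketch does not call Prosit or Oktoberfest and assumes both vectors are already aligned to the same fragment ions.

```python
import math


def spectral_angle(predicted, observed):
    """Normalized spectral angle in [0, 1] for non-negative intensities; 1 means identical direction."""
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0 or norm_o == 0:
        return 0.0  # no signal on one side, so treat as no similarity
    # Clamp to guard against floating-point drift outside acos's domain.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Toy intensity vectors over the same four fragment ions.
same = spectral_angle([0.2, 1.0, 0.5, 0.0], [0.2, 1.0, 0.5, 0.0])
disjoint = spectral_angle([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])
```

In a rescoring setting, a score like this would be computed per peptide-spectrum match and passed on as one feature among several.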
Affiliation(s)
- Wassim Gabriel
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Matthew The
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
9
Jha A, Bohaczuk SC, Mao Y, Ranchalis J, Mallory BJ, Min AT, Hamm MO, Swanson E, Dubocanin D, Finkbeiner C, Li T, Whittington D, Noble WS, Stergachis AB, Vollger MR. DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools. bioRxiv 2023:2023.04.20.537673. [PMID: 37131601 PMCID: PMC10153250 DOI: 10.1101/2023.04.20.537673] [Indexed: 05/04/2023]
Abstract
Long-read DNA sequencing has recently emerged as a powerful tool for studying both genetic and epigenetic architectures at single-molecule and single-nucleotide resolution. Long-read epigenetic studies encompass both the direct identification of native cytosine methylation as well as the identification of exogenously placed DNA N6-methyladenine (DNA-m6A). However, detecting DNA-m6A modifications using single-molecule sequencing, as well as co-processing single-molecule genetic and epigenetic architectures, is limited by computational demands and a lack of supporting tools. Here, we introduce fibertools, a state-of-the-art toolkit that features a semi-supervised convolutional neural network for fast and accurate identification of m6A-marked bases using PacBio single-molecule long-read sequencing, as well as the co-processing of long-read genetic and epigenetic data produced using either PacBio or Oxford Nanopore sequencing platforms. We demonstrate accurate DNA-m6A identification (>90% precision and recall) along >20 kilobase long DNA molecules with a ~1,000-fold improvement in speed. In addition, we demonstrate that fibertools can readily integrate genetic and epigenetic data at single-molecule resolution, including the seamless conversion between molecular and reference coordinate systems, allowing for accurate genetic and epigenetic analyses of long-read data within structurally and somatically variable genomic regions.
Affiliation(s)
- Anupama Jha
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Stephanie C. Bohaczuk
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
- Yizi Mao
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
- Jane Ranchalis
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
- Alan T. Min
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Morgan O. Hamm
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Elliott Swanson
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Danilo Dubocanin
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Connor Finkbeiner
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Tony Li
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Dale Whittington
  - Department of Medical Chemistry, University of Washington, Seattle, WA, USA
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Andrew B. Stergachis
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Mitchell R. Vollger
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
10
Abstract
Single-cell proteomics by mass spectrometry (MS) allows quantifying proteins with high specificity and sensitivity. To increase its throughput, we developed nPOP, a method for parallel preparation of thousands of single cells in nanoliter volume droplets deposited on glass slides. Here, we describe its protocol with emphasis on its flexibility to prepare samples for different multiplexed MS methods. An implementation with plexDIA demonstrates accurate quantification of about 3,000 - 3,700 proteins per human cell. The protocol is implemented on the CellenONE instrument and uses readily available consumables, which should facilitate broad adoption. nPOP can be applied to all samples that can be processed to a single-cell suspension. It takes 1 or 2 days to prepare over 3,000 single cells. We provide metrics and software for quality control that can support the robust scaling of nPOP to higher plex reagents for achieving reliable high-throughput single-cell protein analysis.
Affiliation(s)
- Andrew Leduc
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA 02115, USA
- Luke Koury
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA 02115, USA
- Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA 02115, USA
- Parallel Squared Technology Institute, Watertown, MA 02472, USA
11
Lazear MR. Sage: An Open-Source Tool for Fast Proteomics Searching and Quantification at Scale. J Proteome Res 2023; 22:3652-3659. [PMID: 37819886 DOI: 10.1021/acs.jproteome.3c00486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
The growing complexity and volume of proteomics data necessitate the development of efficient software tools for peptide identification and quantification from mass spectra. Given their central role in proteomics, it is imperative that these tools are auditable and extensible, requirements that are best fulfilled by open-source and permissively licensed software. This work presents Sage, a high-performance, open-source, and freely available proteomics pipeline. Scalable and cloud-ready, Sage matches the performance of state-of-the-art software tools while running an order of magnitude faster.
Affiliation(s)
- Michael R Lazear
- Belharra Therapeutics, 3985 Sorrento Valley Boulevard Suite C, San Diego, California 92121, United States
12
Sarnowski C, Götze M, Leitner A. RNxQuest: An Extension to the xQuest Pipeline Enabling Analysis of Protein-RNA Cross-Linking/Mass Spectrometry Data. J Proteome Res 2023; 22:3368-3382. [PMID: 37669508 PMCID: PMC10563164 DOI: 10.1021/acs.jproteome.3c00341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Indexed: 09/07/2023]
Abstract
Cross-linking and mass spectrometry (XL-MS) workflows are increasingly popular techniques for generating low-resolution structural information about interacting biomolecules. xQuest is an established software package for analysis of protein-protein XL-MS data, supporting stable isotope-labeled cross-linking reagents. Resultant paired peaks in mass spectra aid sensitivity and specificity of data analysis. The recently developed cross-linking of isotope-labeled RNA and mass spectrometry (CLIR-MS) approach extends the XL-MS concept to protein-RNA interactions, also employing isotope-labeled cross-link (XL) species to facilitate data analysis. Data from CLIR-MS experiments are broadly compatible with core xQuest functionality, but the required analysis approach for this novel data type presents several technical challenges not optimally served by the original xQuest package. Here we introduce RNxQuest, a Python package extension for xQuest, which automates the analysis approach required for CLIR-MS data, providing bespoke, state-of-the-art processing and visualization functionality for this novel data type. Using functions included with RNxQuest, we evaluate three false discovery rate control approaches for CLIR-MS data. We demonstrate the versatility of the RNxQuest-enabled data analysis pipeline by also reanalyzing published protein-RNA XL-MS data sets that lack isotope-labeled RNA. This study demonstrates that RNxQuest provides a sensitive and specific data analysis pipeline for detection of isotope-labeled XLs in protein-RNA XL-MS experiments.
Affiliation(s)
- Chris P. Sarnowski
- Institute of Molecular Systems Biology, Department of Biology, ETH Zürich, 8093 Zurich, Switzerland
- Systems Biology PhD Program, University of Zürich and ETH Zürich, 8093 Zurich, Switzerland
- Michael Götze
- Institute of Molecular Systems Biology, Department of Biology, ETH Zürich, 8093 Zurich, Switzerland
- Alexander Leitner
- Institute of Molecular Systems Biology, Department of Biology, ETH Zürich, 8093 Zurich, Switzerland
13
Zhao N, Kabotyanski EB, Saltzman AB, Malovannaya A, Yuan X, Reineke LC, Lieu N, Gao Y, Pedroza DA, Calderon SJ, Smith AJ, Hamor C, Safari K, Savage S, Zhang B, Zhou J, Solis LM, Hilsenbeck SG, Fan C, Perou CM, Rosen JM. Targeting EIF4A triggers an interferon response to synergize with chemotherapy and suppress triple-negative breast cancer. bioRxiv 2023:2023.09.28.559973. [PMID: 37808840 PMCID: PMC10557675 DOI: 10.1101/2023.09.28.559973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Protein synthesis is frequently dysregulated in cancer and selective inhibition of mRNA translation represents an attractive cancer therapy. Here, we show that therapeutically targeting the RNA helicase eIF4A by Zotatifin, the first-in-class eIF4A inhibitor, exerts pleiotropic effects on both tumor cells and the tumor immune microenvironment in a diverse cohort of syngeneic triple-negative breast cancer (TNBC) mouse models. Zotatifin not only suppresses tumor cell proliferation but also directly repolarizes macrophages towards an M1-like phenotype and inhibits neutrophil infiltration, which sensitizes tumors to immune checkpoint blockade. Mechanistic studies revealed that Zotatifin reprograms the tumor translational landscape, inhibits the translation of Sox4 and Fgfr1, and induces an interferon response uniformly across models. The induction of an interferon response is partially due to the inhibition of Sox4 translation by Zotatifin. A similar induction of interferon-stimulated genes was observed in breast cancer patient biopsies following Zotatifin treatment. Surprisingly, Zotatifin significantly synergizes with carboplatin to trigger DNA damage and an even heightened interferon response resulting in T cell-dependent tumor suppression. These studies identified a vulnerability of eIF4A in TNBC, potential pharmacodynamic biomarkers for Zotatifin, and provide a rationale for new combination regimens comprising Zotatifin and chemotherapy or immunotherapy as treatments for TNBC.
Affiliation(s)
- Na Zhao
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Elena B. Kabotyanski
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Anna Malovannaya
- Mass Spectrometry Proteomics Core, Baylor College of Medicine, Houston, Texas, USA
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
- Xueying Yuan
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Lucas C. Reineke
- Department of Neuroscience, Baylor College of Medicine, Houston, Texas, USA
- Nadia Lieu
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Yang Gao
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Diego A Pedroza
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Sebastian J Calderon
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Alex J Smith
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Clark Hamor
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
- Kazem Safari
- Texas A&M Health Science Center, Houston, Texas, USA
- Sara Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
- Jianling Zhou
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Luisa M. Solis
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
- Susan G. Hilsenbeck
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA
- Cheng Fan
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, USA
- Charles M. Perou
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, USA
- Jeffrey M. Rosen
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas, USA
14
Yang KL, Yu F, Teo GC, Li K, Demichev V, Ralser M, Nesvizhskii AI. MSBooster: improving peptide identification rates using deep learning-based features. Nat Commun 2023; 14:4539. [PMID: 37500632 PMCID: PMC10374903 DOI: 10.1038/s41467-023-40129-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/07/2022] [Accepted: 07/06/2023] [Indexed: 07/29/2023] Open
Abstract
Peptide identification in liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments relies on computational algorithms for matching acquired MS/MS spectra against sequences of candidate peptides using database search tools, such as MSFragger. Here, we present a new tool, MSBooster, for rescoring peptide-to-spectrum matches using additional features incorporating deep learning-based predictions of peptide properties, such as LC retention time, ion mobility, and MS/MS spectra. We demonstrate the utility of MSBooster, in tandem with MSFragger and Percolator, in several different workflows, including nonspecific searches (immunopeptidomics), direct identification of peptides from data independent acquisition data, single-cell proteomics, and data generated on an ion mobility separation-enabled timsTOF MS platform. MSBooster is fast, robust, and fully integrated into the widely used FragPipe computational platform.
Affiliation(s)
- Kevin L Yang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Fengchao Yu
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
- Guo Ci Teo
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Kai Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Vadim Demichev
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Markus Ralser
- Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
- Nuffield Department of Medicine, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Alexey I Nesvizhskii
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA.
15
Sutton C, Nozawa K, Kent K, Saltzman A, Leng M, Nagarajan S, Malovannaya A, Ikawa M, Garcia TX, Matzuk MM. Molecular dissection and testing of PRSS37 function through LC-MS/MS and the generation of a PRSS37 humanized mouse model. Sci Rep 2023; 13:11374. [PMID: 37452050 PMCID: PMC10349139 DOI: 10.1038/s41598-023-37700-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Accepted: 06/26/2023] [Indexed: 07/18/2023] Open
Abstract
The quest for a non-hormonal male contraceptive pill for men still exists. Serine protease 37 (PRSS37) is a sperm-specific protein that when ablated in mice renders them sterile. In this study we sought to examine the molecular sequelae of PRSS37 loss to better understand its molecular function, and to determine whether human PRSS37 could rescue the sterility phenotype of knockout (KO) mice, allowing for a more appropriate model for drug molecule testing. To this end, we used CRISPR-EZ to create mice lacking the entire coding region of Prss37, used pronuclear injection to create transgenic mice expressing human PRSS37, intercrossed these lines to generate humanized mice, and performed LC-MS/MS of KO and control tissues to identify proteomic perturbances that could attribute a molecular function to PRSS37. We found that our newly generated Prss37 KO mouse line is sterile, our human transgene rescues the sterility phenotype of KO mice, and our proteomics data not only yields novel insight into the proteome as it evolves along the male reproductive tract, but also demonstrates the proteins significantly influenced by PRSS37 loss. In summary, we report vast biological insight including insight into PRSS37 function and the generation of a novel tool for contraceptive evaluation.
Affiliation(s)
- Courtney Sutton
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, USA
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, USA
- Kaori Nozawa
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, USA
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, USA
- Katarzyna Kent
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, USA
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, USA
- Alexander Saltzman
- Mass Spectrometry Proteomics Core, Baylor College of Medicine, Houston, TX, USA
- Mei Leng
- Mass Spectrometry Proteomics Core, Baylor College of Medicine, Houston, TX, USA
- Sureshbabu Nagarajan
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, USA
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, USA
- Anna Malovannaya
- Mass Spectrometry Proteomics Core, Baylor College of Medicine, Houston, TX, USA
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX, USA
- Masahito Ikawa
- Department of Experimental Genome Research, Research Institute for Microbial Diseases, Osaka University, Suita, Osaka, Japan
- The Institute of Medical Science, The University of Tokyo, Minato-Ku, Tokyo, Japan
- Thomas X Garcia
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, USA
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, USA
- Scott Department of Urology, Baylor College of Medicine, Houston, TX, USA
- Martin M Matzuk
- Center for Drug Discovery, Baylor College of Medicine, Houston, TX, USA.
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, USA.
16
Gatto L, Aebersold R, Cox J, Demichev V, Derks J, Emmott E, Franks AM, Ivanov AR, Kelly RT, Khoury L, Leduc A, MacCoss MJ, Nemes P, Perlman DH, Petelski AA, Rose CM, Schoof EM, Van Eyk J, Vanderaa C, Yates JR, Slavov N. Initial recommendations for performing, benchmarking and reporting single-cell proteomics experiments. Nat Methods 2023; 20:375-386. [PMID: 36864200 PMCID: PMC10130941 DOI: 10.1038/s41592-023-01785-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 39.0] [Received: 06/06/2022] [Accepted: 01/24/2023] [Indexed: 03/04/2023]
Abstract
Analyzing proteins from single cells by tandem mass spectrometry (MS) has recently become technically feasible. While such analysis has the potential to accurately quantify thousands of proteins across thousands of single cells, the accuracy and reproducibility of the results may be undermined by numerous factors affecting experimental design, sample preparation, data acquisition and data analysis. We expect that broadly accepted community guidelines and standardized metrics will enhance rigor, data quality and alignment between laboratories. Here we propose best practices, quality controls and data-reporting recommendations to assist in the broad adoption of reliable quantitative workflows for single-cell proteomics. Resources and discussion forums are available at https://single-cell.net/guidelines .
Affiliation(s)
- Laurent Gatto
- Computational Biology and Bioinformatics Unit, de Duve Institute, Université Catholique de Louvain, Brussels, Belgium
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Juergen Cox
- Max Planck Institute of Biochemistry, Martinsried, Germany
- Jason Derks
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Edward Emmott
- Centre for Proteome Research, Department of Biochemistry and Systems Biology, University of Liverpool, Liverpool, UK
- Alexander M Franks
- Department of Statistics and Applied Probability, University of California Santa Barbara, Santa Barbara, CA, USA
- Alexander R Ivanov
- Department of Chemistry and Chemical Biology, Barnett Institute of Chemical and Biological Analysis, Northeastern University, Boston, MA, USA
- Ryan T Kelly
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT, USA
- Luke Khoury
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Andrew Leduc
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Peter Nemes
- Department of Chemistry and Biochemistry, University of Maryland, College Park, MD, USA
- David H Perlman
- Merck Exploratory Science Center, Merck Sharp & Dohme Corp., Cambridge, MA, USA
- Aleksandra A Petelski
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA
- Parallel Squared Technology Institute, Watertown, MA, USA
- Christopher M Rose
- Department of Microchemistry, Proteomics and Lipidomics, Genentech Inc., South San Francisco, CA, USA
- Erwin M Schoof
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark
- Christophe Vanderaa
- Computational Biology and Bioinformatics Unit, de Duve Institute, Université Catholique de Louvain, Brussels, Belgium
- John R Yates
- Departments of Molecular Medicine and Neurobiology, the Scripps Research Institute, La Jolla, CA, USA
- Nikolai Slavov
- Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single-Cell Proteomics Center and Barnett Institute, Northeastern University, Boston, MA, USA.
- Parallel Squared Technology Institute, Watertown, MA, USA.
17
Arab I, Fondrie WE, Laukens K, Bittremieux W. Semisupervised Machine Learning for Sensitive Open Modification Spectral Library Searching. J Proteome Res 2023; 22:585-593. [PMID: 36688569 DOI: 10.1021/acs.jproteome.2c00616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2023]
Abstract
A key analysis task in mass spectrometry proteomics is matching the acquired tandem mass spectra to their originating peptides by sequence database searching or spectral library searching. Machine learning is an increasingly popular postprocessing approach to maximize the number of confident spectrum identifications that can be obtained at a given false discovery rate threshold. Here, we have integrated semisupervised machine learning in the ANN-SoLo tool, an efficient spectral library search engine that is optimized for open modification searching to identify peptides with any type of post-translational modification. We show that machine learning rescoring boosts the number of spectra that can be identified for both standard searching and open searching, and we provide insights into relevant spectrum characteristics harnessed by the machine learning model. The semisupervised machine learning functionality has now been fully integrated into ANN-SoLo, which is available as open source under the permissive Apache 2.0 license on GitHub at https://github.com/bittremieux/ANN-SoLo.
Affiliation(s)
- Issar Arab
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Kris Laukens
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
- Wout Bittremieux
- Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
18
Barente AS, Villén J. A Python Package for the Localization of Protein Modifications in Mass Spectrometry Data. J Proteome Res 2023; 22:501-507. [PMID: 36315500 PMCID: PMC9898206 DOI: 10.1021/acs.jproteome.2c00194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
Determining the correct localization of post-translational modifications (PTMs) on peptides aids in interpreting their effect on protein function. While most algorithms for this task are available as standalone applications or incorporated into software suites, improving their versatility through access from popular scripting languages facilitates experimentation and incorporation into novel workflows. Here we describe pyAscore, an efficient and versatile implementation of the Ascore algorithm in Python for scoring the localization of user defined PTMs in data dependent mass spectrometry. pyAscore can be used from the command line or imported into Python scripts and accepts standard file formats from popular software tools used in bottom-up proteomics. Access to internal objects for scoring and working with modified peptides adds to the toolbox for working with PTMs in Python. pyAscore is available as an open source package for Python 3.6+ on all major operating systems and can be found at pyascore.readthedocs.io.
Affiliation(s)
- Anthony S. Barente
- Department of Genome Sciences, University of Washington Seattle, Washington 98195, USA
- Judit Villén
- Department of Genome Sciences, University of Washington Seattle, Washington 98195, USA
19
Vanderaa C, Gatto L. The Current State of Single-Cell Proteomics Data Analysis. Curr Protoc 2023; 3:e658. [PMID: 36633424 DOI: 10.1002/cpz1.658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
Abstract
Sound data analysis is essential to retrieve meaningful biological information from single-cell proteomics experiments. This analysis is carried out by computational methods that are assembled into workflows, and their implementations influence the conclusions that can be drawn from the data. In this work, we explore and compare the computational workflows that have been used over the last four years and identify a profound lack of consensus on how to analyze single-cell proteomics data. We highlight the need for benchmarking of computational workflows and standardization of computational tools and data, as well as carefully designed experiments. Finally, we cover the current standardization efforts that aim to fill the gap, list the remaining missing pieces, and conclude with lessons learned from the replication of published single-cell proteomics analyses. © 2023 Wiley Periodicals LLC.
Affiliation(s)
- Christophe Vanderaa
- Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, Université catholique de Louvain, Belgium
- Laurent Gatto
- Computational Biology and Bioinformatics Unit (CBIO), de Duve Institute, Université catholique de Louvain, Belgium
20
Smith IR, Eng JK, Barente AS, Hogrebe A, Llovet A, Rodriguez-Mias RA, Villén J. Coisolation of Peptide Pairs for Peptide Identification and MS/MS-Based Quantification. Anal Chem 2022; 94:15198-15206. [PMID: 36306373 PMCID: PMC9851627 DOI: 10.1021/acs.analchem.2c01711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
Stable-isotope labeling with amino acids in cell culture (SILAC)-based metabolic labeling is a widely adopted proteomics approach that enables quantitative comparisons among a variety of experimental conditions. Despite its quantitative capacity, SILAC experiments analyzed with data-dependent acquisition (DDA) do not fully leverage peptide pair information for identification and suffer from undersampling compared to label-free proteomic experiments. Herein, we developed a DDA strategy that coisolates and fragments SILAC peptide pairs and uses y-ions for their relative quantification. To facilitate the analysis of this type of data, we adapted the Comet sequence database search engine to make use of SILAC peptide paired fragments and developed a tool to annotate and quantify MS/MS spectra of coisolated SILAC pairs. This peptide pair coisolation approach generally improved expectation scores compared to the traditional DDA approach. Fragment ion quantification performed similarly well to precursor quantification in the MS1 and achieved more quantifications. Lastly, our method enables reliable MS/MS quantification of SILAC proteome mixtures with overlapping isotopic distributions. This study shows the feasibility of the coisolation approach. Coupling this approach with intelligent acquisition strategies has the potential to improve SILAC peptide sampling and quantification.
Affiliation(s)
- Ian R Smith
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Jimmy K Eng
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Anthony S Barente
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Alexander Hogrebe
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Ariadna Llovet
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Ricard A Rodriguez-Mias
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Judit Villén
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
21
Hogrebe A, Hess KN, Llovet A, Ramos YJ, Barente AS, Hernandez-Portugues D, Smith IR, Rodríguez-Mias RA, Villén J. IsobaricQuant enables cross-platform quantification, visualization, and filtering of isobarically-labeled peptides. Proteomics 2022; 22:e2100253. [PMID: 35776068 PMCID: PMC9894126 DOI: 10.1002/pmic.202100253] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2022] [Revised: 06/21/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023]
Abstract
In mass spectrometry (MS)-based quantitative proteomics, labeling with isobaric mass tags such as iTRAQ and TMT can substantially improve sample throughput and reduce peptide missing values. Nonetheless, the quantification of labeled peptides tends to suffer from reduced accuracy due to the co-isolation of co-eluting precursors of similar mass-to-charge. Acquisition approaches such as multistage MS3 or ion mobility separation address this problem, yet are difficult to audit and limited to expensive instrumentation. Here we introduce IsobaricQuant, an open-source software tool for quantification, visualization, and filtering of peptides labeled with isobaric mass tags, with specific focus on precursor interference. IsobaricQuant is compatible with MS2 and MS3 acquisition strategies, has a viewer that allows assessing interference, and provides several scores to aid the filtering of scans with compression. We demonstrate that IsobaricQuant quantifications are accurate by comparing it with commonly used software. We further show that its QC scores can successfully filter out scans with reduced quantitative accuracy at MS2 and MS3 levels, removing inaccurate peptide quantifications and decreasing protein CVs. Finally, we apply IsobaricQuant to a PISA dataset and show that QC scores improve the sensitivity of the identification of protein targets of a kinase inhibitor. IsobaricQuant is available at https://github.com/Villen-Lab/isobaricquant.
Affiliation(s)
- Ian R. Smith
- Department of Genome Sciences, University of Washington
- Judit Villén
- Department of Genome Sciences, University of Washington
22
Wang B, Wang Y, Chen Y, Gao M, Ren J, Guo Y, Situ C, Qi Y, Zhu H, Li Y, Guo X. DeepSCP: utilizing deep learning to boost single-cell proteome coverage. Brief Bioinform 2022; 23:6598882. [DOI: 10.1093/bib/bbac214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 04/20/2022] [Accepted: 05/06/2022] [Indexed: 11/12/2022] Open
Abstract
Multiplexed single-cell proteomes (SCPs) quantification by mass spectrometry greatly improves the SCP coverage. However, it still suffers from a low number of protein identifications and there is much room to boost proteins identification by computational methods. In this study, we present a novel framework DeepSCP, utilizing deep learning to boost SCP coverage. DeepSCP constructs a series of features of peptide-spectrum matches (PSMs) by predicting the retention time based on the multiple SCP sample sets and fragment ion intensities based on deep learning, and predicts PSM labels with an optimized-ensemble learning model. Evaluation of DeepSCP on public and in-house SCP datasets showed superior performances compared with other state-of-the-art methods. DeepSCP identified more confident peptides and proteins by controlling q-value at 0.01 using target–decoy competition method. As a convenient and low-cost computing framework, DeepSCP will help boost single-cell proteome identification and facilitate the future development and application of single-cell proteomics.
Affiliation(s)
- Bing Wang
- School of Medicine, Southeast University, Nanjing 210009, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
- Yue Wang
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
- Yu Chen
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
- Mengmeng Gao
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
- Jie Ren
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
- Yueshuai Guo
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
- Chenghao Situ
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
- Yaling Qi
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
- Hui Zhu
- Department of Clinical Laboratory, Sir Run Run Hospital, Nanjing Medical University, Nanjing 211166, China
- Yan Li
- School of Medicine, Southeast University, Nanjing 210009, China
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
- Xuejiang Guo
- Department of Histology and Embryology, State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing 211166, China
23
Miller RM, Jordan BT, Mehlferber MM, Jeffery ED, Chatzipantsiou C, Kaur S, Millikin RJ, Dai Y, Tiberi S, Castaldi PJ, Shortreed MR, Luckey CJ, Conesa A, Smith LM, Deslattes Mays A, Sheynkman GM. Enhanced protein isoform characterization through long-read proteogenomics. Genome Biol 2022; 23:69. [PMID: 35241129 PMCID: PMC8892804 DOI: 10.1186/s13059-022-02624-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 07/13/2021] [Accepted: 02/02/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms.
RESULTS: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis.
CONCLUSIONS: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.
Affiliation(s)
- Rachel M. Miller
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA (grid.14003.36; ISNI 0000 0001 2167 3675)
- Ben T. Jordan
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA (grid.27755.32; ISNI 0000 0000 9136 933X)
- Madison M. Mehlferber
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA (grid.27755.32; ISNI 0000 0000 9136 933X)
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA (grid.27755.32; ISNI 0000 0000 9136 933X)
- Erin D. Jeffery
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA (grid.27755.32; ISNI 0000 0000 9136 933X)
- Simi Kaur
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA (grid.14003.36; ISNI 0000 0001 2167 3675)
- Robert J. Millikin
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA (grid.14003.36; ISNI 0000 0001 2167 3675)
- Yunxiang Dai
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA (grid.14003.36; ISNI 0000 0001 2167 3675)
- Simone Tiberi
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland (grid.7400.3; ISNI 0000 0004 1937 0650)
- Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland (grid.7400.3; ISNI 0000 0004 1937 0650)
- Peter J. Castaldi
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA (grid.62560.37; ISNI 0000 0004 0378 8294)
- Division of General Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA, USA (grid.62560.37; ISNI 0000 0004 0378 8294)
- Michael R. Shortreed
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA (grid.14003.36; ISNI 0000 0001 2167 3675)
- Chance John Luckey
- Department of Pathology, University of Virginia, Charlottesville, VA, USA (grid.27755.32; ISNI 0000 0000 9136 933X)
- Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain (grid.4711.3; ISNI 0000 0001 2183 4846)
- Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, USA (grid.15276.37; ISNI 0000 0004 1936 8091)
- Lloyd M. Smith
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA (grid.14003.36; ISNI 0000 0001 2167 3675)
- Anne Deslattes Mays
- Office of Data Science and Sharing, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD, USA (grid.420089.7; ISNI 0000 0000 9635 8082)
- Gloria M. Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA (grid.27755.32; ISNI 0000 0000 9136 933X)
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA (grid.27755.32; ISNI 0000 0000 9136 933X)
- UVA Cancer Center, University of Virginia, Charlottesville, VA, USA (grid.27755.32; ISNI 0000 0000 9136 933X)
24
Petelski AA, Emmott E, Leduc A, Huffman RG, Specht H, Perlman DH, Slavov N. Multiplexed single-cell proteomics using SCoPE2. Nat Protoc 2021; 16:5398-5425. [PMID: 34716448 PMCID: PMC8643348 DOI: 10.1038/s41596-021-00616-z] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 03/16/2021] [Accepted: 08/12/2021] [Indexed: 11/09/2022]
Abstract
Many biological systems are composed of diverse single cells. This diversity necessitates functional and molecular single-cell analysis. Single-cell protein analysis has long relied on affinity reagents, but emerging mass-spectrometry methods (either label-free or multiplexed) have enabled quantifying >1,000 proteins per cell while simultaneously increasing the specificity of protein quantification. Here we describe the Single Cell ProtEomics (SCoPE2) protocol, which uses an isobaric carrier to enhance peptide sequence identification. Single cells are isolated by FACS or CellenONE into multiwell plates and lysed by Minimal ProteOmic sample Preparation (mPOP), and their peptides labeled by isobaric mass tags (TMT or TMTpro) for multiplexed analysis. SCoPE2 affords a cost-effective single-cell protein quantification that can be fully automated using widely available equipment and scaled to thousands of single cells. SCoPE2 uses inexpensive reagents and is applicable to any sample that can be processed to a single-cell suspension. The SCoPE2 workflow allows analyzing ~200 single cells per 24 h using only standard commercial equipment. We emphasize experimental steps and benchmarks required for achieving quantitative protein analysis.
Affiliation(s)
- Aleksandra A Petelski
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Barnett Institute, Northeastern University, Boston, MA, USA
- Edward Emmott
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Barnett Institute, Northeastern University, Boston, MA, USA
- Centre for Proteome Research, Department of Biochemistry & Systems Biology, University of Liverpool, Liverpool, UK
- Andrew Leduc
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Barnett Institute, Northeastern University, Boston, MA, USA
- R Gray Huffman
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Barnett Institute, Northeastern University, Boston, MA, USA
- Harrison Specht
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Barnett Institute, Northeastern University, Boston, MA, USA
- David H Perlman
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Merck Exploratory Sciences Center, Merck Sharp & Dohme Corp., Cambridge, MA, USA
- Nikolai Slavov
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Barnett Institute, Northeastern University, Boston, MA, USA
- Department of Biology, Northeastern University, Boston, MA, USA