1
Willems P, Thery F, Van Moortel L, De Meyer M, Staes A, Gul A, Kovalchuke L, Declercq A, Devreese R, Bouwmeester R, Gabriels R, Martens L, Impens F. Maximizing Immunopeptidomics-Based Bacterial Epitope Discovery by Multiple Search Engines and Rescoring. J Proteome Res 2025; 24:2141-2151. [PMID: 40080147 PMCID: PMC11976845 DOI: 10.1021/acs.jproteome.4c00864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2024] [Revised: 02/12/2025] [Accepted: 02/26/2025] [Indexed: 03/15/2025]
Abstract
Mass spectrometry-based discovery of bacterial immunopeptides presented by infected cells allows untargeted discovery of bacterial antigens that can serve as vaccine candidates. However, reliable identification of bacterial epitopes is challenged by their extremely low abundance. Here, we describe an optimized bioinformatic framework to enhance the confident identification of bacterial immunopeptides. Immunopeptidomics data of cell cultures infected with Listeria monocytogenes were searched by four different search engines, PEAKS, Comet, Sage and MSFragger, followed by data-driven rescoring with MS2Rescore. Compared with individual search engine results, this integrated workflow boosted immunopeptide identification by an average of 27% and led to the high-confidence detection of 18 additional bacterial peptides (+27%) matching 15 different Listeria proteins (+36%). Despite the strong agreement between the search engines, a small number of spectra (<1%) had ambiguous matches to multiple peptides and were excluded to ensure high-confidence identifications. Finally, we demonstrate our workflow with sensitive timsTOF SCP data acquisition and find that rescoring, now with inclusion of ion mobility features, identifies 76% more peptides compared to Q Exactive HF acquisition. Together, our results demonstrate how integration of multiple search engine results along with data-driven rescoring maximizes immunopeptide identification, boosting the detection of high-confidence bacterial epitopes for vaccine development.
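The exclusion of spectra with ambiguous matches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `consensus_psms`, the tuple layout, the scan identifiers, and the peptide sequences are all hypothetical, and the sketch assumes each search engine's peptide-spectrum matches (PSMs) have already passed an FDR filter.

```python
from collections import defaultdict


def consensus_psms(psms):
    """Merge PSMs from several search engines (hypothetical helper).

    psms: iterable of (engine, spectrum_id, peptide) tuples, assumed to be
    FDR-filtered already. Returns (accepted, ambiguous): accepted maps each
    spectrum to its single agreed peptide and the engines that reported it;
    ambiguous lists spectra matched to more than one distinct peptide.
    """
    by_spectrum = defaultdict(lambda: defaultdict(set))
    for engine, spectrum_id, peptide in psms:
        by_spectrum[spectrum_id][peptide].add(engine)

    accepted, ambiguous = {}, []
    for spectrum_id, peptides in by_spectrum.items():
        if len(peptides) == 1:
            # All reporting engines agree on one peptide for this spectrum.
            peptide, engines = next(iter(peptides.items()))
            accepted[spectrum_id] = (peptide, sorted(engines))
        else:
            # Conflicting peptide assignments: exclude the spectrum.
            ambiguous.append(spectrum_id)
    return accepted, sorted(ambiguous)


# Made-up example data (scan IDs and sequences are illustrative only).
example = [
    ("Comet", "scan_1", "KVAELVHFL"),
    ("Sage", "scan_1", "KVAELVHFL"),
    ("MSFragger", "scan_2", "GLYDGMEHL"),
    ("PEAKS", "scan_3", "SLLMWITQC"),
    ("Sage", "scan_3", "SLLMWITQV"),
]
accepted, ambiguous = consensus_psms(example)
```

In this toy input, `scan_1` and `scan_2` are kept, while `scan_3` is dropped because two engines disagree on the peptide. A real workflow would also need to decide how to treat isobaric leucine/isoleucine differences and modifications, which this sketch ignores.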
Affiliation(s)
- Patrick Willems
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
  - VIB-UGent Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Fabien Thery
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Laura Van Moortel
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Margaux De Meyer
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- An Staes
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
  - VIB Proteomics Core, VIB, 9052 Ghent, Belgium
- Adillah Gul
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lyudmila Kovalchuke
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Robbe Devreese
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg, France
- Francis Impens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
  - VIB-UGent Center for Plant Systems Biology, VIB, 9052 Ghent, Belgium
  - VIB Proteomics Core, VIB, 9052 Ghent, Belgium
2
Castaño JD, Beaudry F. Comparative Analysis of Data-Driven Rescoring Platforms for Improved Peptide Identification in HeLa Digest Samples. Proteomics 2025; 25:e202400225. [PMID: 39895169 PMCID: PMC11962579 DOI: 10.1002/pmic.202400225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Revised: 09/16/2024] [Accepted: 01/21/2025] [Indexed: 02/04/2025]
Abstract
Mass spectrometry is a critical tool to understand complex changes in biological processes. Despite significant advances in search engine technology, many spectra remain unassigned. This research evaluates the performance of three rescoring platforms, Oktoberfest, MS2Rescore, and inSPIRE, using MaxQuant output. The results indicated a substantial increase in identifications at the peptide level (40%-53%) and PSM level (64%-67%). However, some peptides were lost due to limitations in processing posttranslational modifications (PTMs)-with up to 75% of lost peptides exhibiting PTMs. Each platform displayed distinct strengths and weaknesses. For instance, inSPIRE performed best in terms of peptide identifications and unique peptides, while MS2Rescore performed better for PSMs at higher FDR values. Differences in platform performance stemmed from different sources: original search engine feature selection, type of ion series predicted, retention time predictor, and PTMs compatibility. Overall, inSPIRE showed a superior ability to harness original search engine results. Taken all together, rescoring platforms clearly outperformed original search results; however, they demanded additional computation time (up to 77%) and manual adjustments. The findings here underline the necessity of integrating rescoring platforms into current proteomics pipelines but also address some challenges in their implementation and optimization. Future integrated platforms may help enhance adoption.
Affiliation(s)
- Jesus D. Castaño
  - Département de Biomédecine Vétérinaire, Faculté de Médecine Vétérinaire, Université de Montréal, Saint‐Hyacinthe, Canada
  - Centre de recherche sur le cerveau et l'apprentissage (CIRCA), Université de Montréal, Saint‐Hyacinthe, Canada
- Francis Beaudry
  - Département de Biomédecine Vétérinaire, Faculté de Médecine Vétérinaire, Université de Montréal, Saint‐Hyacinthe, Canada
  - Centre de recherche sur le cerveau et l'apprentissage (CIRCA), Université de Montréal, Saint‐Hyacinthe, Canada
3
Declercq A, Devreese R, Scheid J, Jachmann C, Van Den Bossche T, Preikschat A, Gomez-Zepeda D, Rijal JB, Hirschler A, Krieger JR, Srikumar T, Rosenberger G, Martelli C, Trede D, Carapito C, Tenzer S, Walz JS, Degroeve S, Bouwmeester R, Martens L, Gabriels R. TIMS2Rescore: A Data Dependent Acquisition-Parallel Accumulation and Serial Fragmentation-Optimized Data-Driven Rescoring Pipeline Based on MS2Rescore. J Proteome Res 2025; 24:1067-1076. [PMID: 39915959 PMCID: PMC11894666 DOI: 10.1021/acs.jproteome.4c00609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2024] [Revised: 11/08/2024] [Accepted: 01/27/2025] [Indexed: 03/08/2025]
Abstract
The high throughput analysis of proteins with mass spectrometry (MS) is highly valuable for understanding human biology, discovering disease biomarkers, identifying therapeutic targets, and exploring pathogen interactions. To achieve these goals, specialized proteomics subfields, including plasma proteomics, immunopeptidomics, and metaproteomics, must tackle specific analytical challenges, such as an increased identification ambiguity compared to routine proteomics experiments. Technical advancements in MS instrumentation can mitigate these issues by acquiring more discerning information at higher sensitivity levels. This is exemplified by the incorporation of ion mobility and parallel accumulation and serial fragmentation (PASEF) technologies in timsTOF instruments. In addition, AI-based bioinformatics solutions can help overcome ambiguity issues by integrating more data into the identification workflow. Here, we introduce TIMS2Rescore, a data-driven rescoring workflow optimized for DDA-PASEF data from timsTOF instruments. This platform includes new timsTOF MS2PIP spectrum prediction models and IM2Deep, a new deep learning-based peptide ion mobility predictor. Furthermore, to fully streamline data throughput, TIMS2Rescore directly accepts Bruker raw mass spectrometry data and search results from ProteoScape and many other search engines, including Sage and PEAKS. We showcase TIMS2Rescore performance on plasma proteomics, immunopeptidomics (HLA class I and II), and metaproteomics data sets. TIMS2Rescore is open-source and freely available at https://github.com/compomics/tims2rescore.
Affiliation(s)
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Robbe Devreese
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Jonas Scheid
  - Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Tübingen 72076, Germany
  - Cluster of Excellence iFIT (ECX2180) Image-Guided and Functionally Instructed Tumor Therapies, University of Tuebingen, Tuebingen 72076, Germany
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
- Caroline Jachmann
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Tim Van Den Bossche
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Annica Preikschat
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz 55131, Germany
- David Gomez-Zepeda
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz 55131, Germany
  - German Cancer Research Center (DKFZ) Heidelberg, Division 191 & Immunopeptidomics Platform, Heidelberg 69120, Germany
- Jeewan Babu Rijal
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg 67087, France
- Aurélie Hirschler
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg 67087, France
- Dennis Trede
  - Bruker Daltonics GmbH & Co. KG, Bremen 28359, Germany
- Christine Carapito
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg 67087, France
- Stefan Tenzer
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz 55131, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz 55131, Germany
  - Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz 55131, Germany
- Juliane S Walz
  - Department of Peptide-based Immunotherapy, Institute of Immunology, University and University Hospital Tübingen, Tübingen 72076, Germany
  - Cluster of Excellence iFIT (ECX2180) Image-Guided and Functionally Instructed Tumor Therapies, University of Tuebingen, Tuebingen 72076, Germany
  - Clinical Collaboration Unit Translational Immunology, Department of Internal Medicine, University Hospital Tuebingen, Tuebingen 72076, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), partner site Tübingen, Tübingen 72076, Germany
- Sven Degroeve
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI FR2048, Strasbourg 67087, France
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
4
Klein J, Carvalho L, Zaia J. Expanding N-glycopeptide identifications by modeling fragmentation, elution, and glycome connectivity. Nat Commun 2024; 15:6168. [PMID: 39039063 PMCID: PMC11263600 DOI: 10.1038/s41467-024-50338-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2021] [Accepted: 07/08/2024] [Indexed: 07/24/2024] Open
Abstract
Accurate glycopeptide identification in mass spectrometry-based glycoproteomics is a challenging problem at scale. Recent innovation has been made in increasing the scope and accuracy of glycopeptide identifications, with more precise uncertainty estimates for each part of the structure. We present a dynamically adapting relative retention time model for detecting and correcting ambiguous glycan assignments that are difficult to detect from fragmentation alone, a layered approach to glycopeptide fragmentation modeling that improves N-glycopeptide identification in samples without compromising identification quality, and a site-specific method to increase the depth of the glycoproteome confidently identifiable even further. We demonstrate our techniques on a set of previously published datasets, showing the performance gains at each stage of optimization. These techniques are provided in the open-source glycomics and glycoproteomics platform GlycReSoft available at https://github.com/mobiusklein/glycresoft.
Affiliation(s)
- Joshua Klein
  - Program for Bioinformatics, Boston University, Boston, MA, US
- Luis Carvalho
  - Program for Bioinformatics, Boston University, Boston, MA, US
  - Department of Math and Statistics, Boston University, Boston, MA, US
- Joseph Zaia
  - Program for Bioinformatics, Boston University, Boston, MA, US
  - Department of Biochemistry and Cell Biology, Boston University, Boston, MA, US
5
Taurozzi AJ, Rüther PL, Patramanis I, Koenig C, Sinclair Paterson R, Madupe PP, Harking FS, Welker F, Mackie M, Ramos-Madrigal J, Olsen JV, Cappellini E. Deep-time phylogenetic inference by paleoproteomic analysis of dental enamel. Nat Protoc 2024; 19:2085-2116. [PMID: 38671208 DOI: 10.1038/s41596-024-00975-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 01/12/2024] [Indexed: 04/28/2024]
Abstract
In temperate and subtropical regions, ancient proteins are reported to survive up to about 2 million years, far beyond the known limits of ancient DNA preservation in the same areas. Accordingly, their amino acid sequences currently represent the only source of genetic information available to pursue phylogenetic inference involving species that went extinct too long ago to be amenable for ancient DNA analysis. Here we present a complete workflow, including sample preparation, mass spectrometric data acquisition and computational analysis, to recover and interpret million-year-old dental enamel protein sequences. During sample preparation, the proteolytic digestion step, usually an integral part of conventional bottom-up proteomics, is omitted to increase the recovery of the randomly degraded peptides spontaneously generated by extensive diagenetic hydrolysis of ancient proteins over geological time. Similarly, we describe other solutions we have adopted to (1) authenticate the endogenous origin of the protein traces we identify, (2) detect and validate amino acid variation in the ancient protein sequences and (3) attempt phylogenetic inference. Sample preparation and data acquisition can be completed in 3-4 working days, while subsequent data analysis usually takes 2-5 days. The workflow described requires basic expertise in ancient biomolecules analysis, mass spectrometry-based proteomics and molecular phylogeny. Finally, we describe the limits of this approach and its potential for the reconstruction of evolutionary relationships in paleontology and paleoanthropology.
Affiliation(s)
- Patrick L Rüther
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Claire Koenig
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Palesa P Madupe
  - Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Florian Simon Harking
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Frido Welker
  - Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Meaghan Mackie
  - Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Jesper V Olsen
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
6
Kalhor M, Lapin J, Picciani M, Wilhelm M. Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification. Mol Cell Proteomics 2024; 23:100798. [PMID: 38871251 PMCID: PMC11269915 DOI: 10.1016/j.mcpro.2024.100798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/26/2024] [Accepted: 06/09/2024] [Indexed: 06/15/2024] Open
Abstract
Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
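The idea of scoring a match by comparing observed and predicted peptide properties can be sketched in a few lines of Python. This is an illustrative assumption, not code from any of the reviewed pipelines: the function names, the choice of Pearson correlation as the intensity similarity measure, and the example numbers are all hypothetical. In practice, features of this kind are typically passed to a post-processing tool that learns to separate correct from incorrect matches.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rescoring_features(observed_intensities, predicted_intensities,
                       observed_rt, predicted_rt):
    """Build two illustrative rescoring features for one PSM.

    Intensities are assumed to be aligned fragment-ion intensities (same ion
    order in both lists); retention times share one unit, e.g. minutes.
    """
    return {
        # High correlation suggests the spectrum fits the predicted pattern.
        "intensity_correlation": pearson(observed_intensities,
                                         predicted_intensities),
        # Small error suggests the peptide eluted where the model expected.
        "abs_rt_error": abs(observed_rt - predicted_rt),
    }


# Made-up values for a single PSM.
features = rescoring_features(
    observed_intensities=[0.10, 0.80, 0.35, 0.05],
    predicted_intensities=[0.10, 0.80, 0.35, 0.05],
    observed_rt=30.5,
    predicted_rt=31.0,
)
```

Here the observed and predicted intensities are identical, so the correlation is 1 (up to floating-point rounding), and the retention time error is 0.5 in whatever unit was supplied.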
Affiliation(s)
- Mostafa Kalhor
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Joel Lapin
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Munich Data Science Institute, Technical University of Munich, Garching, Germany
7
Staes A, Mendes Maia T, Dufour S, Bouwmeester R, Gabriels R, Martens L, Gevaert K, Impens F, Devos S. Benefit of In Silico Predicted Spectral Libraries in Data-Independent Acquisition Data Analysis Workflows. J Proteome Res 2024; 23:2078-2089. [PMID: 38666436 DOI: 10.1021/acs.jproteome.4c00048] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/20/2025]
Abstract
Data-independent acquisition (DIA) has become a well-established method for MS-based proteomics. However, the list of options to analyze this type of data is quite extensive, and the use of spectral libraries has become an important factor in DIA data analysis. More specifically, the use of in silico predicted libraries is gaining more interest. By working with a differential spike-in of human standard proteins (UPS2) in a constant yeast tryptic digest background, we evaluated the sensitivity, precision, and accuracy of the use of in silico predicted libraries in DIA data analysis workflows compared to more established workflows. Three commonly used DIA software tools, DIA-NN, EncyclopeDIA, and Spectronaut, were each tested in spectral library mode and spectral library-free mode. In spectral library mode, we used independent spectral library prediction tools PROSIT and MS2PIP together with DeepLC, next to classical data-dependent acquisition (DDA)-based spectral libraries. In total, we benchmarked 12 computational workflows for DIA. Our comparison showed that DIA-NN reached the highest sensitivity while maintaining a good compromise on the reproducibility and accuracy levels in either library-free mode or using in silico predicted libraries, pointing to a general benefit in using in silico predicted libraries.
Affiliation(s)
- An Staes
  - VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - VIB Proteomics Core, B9052 Ghent, Belgium
- Teresa Mendes Maia
  - VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - VIB Proteomics Core, B9052 Ghent, Belgium
- Sara Dufour
  - VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - VIB Proteomics Core, B9052 Ghent, Belgium
- Robbin Bouwmeester
  - VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Ralf Gabriels
  - VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Lennart Martens
  - VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Kris Gevaert
  - VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
- Francis Impens
  - VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - VIB Proteomics Core, B9052 Ghent, Belgium
- Simon Devos
  - VIB Center for Medical Biotechnology, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, B9052 Ghent, Belgium
  - VIB Proteomics Core, B9052 Ghent, Belgium
8
Picciani M, Gabriel W, Giurcoiu VG, Shouman O, Hamood F, Lautenbacher L, Jensen CB, Müller J, Kalhor M, Soleymaniniya A, Kuster B, The M, Wilhelm M. Oktoberfest: Open-source spectral library generation and rescoring pipeline based on Prosit. Proteomics 2024; 24:e2300112. [PMID: 37672792 DOI: 10.1002/pmic.202300112] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 06/14/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/08/2023]
Abstract
Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open source Python package of our spectral library generation and rescoring pipeline originally only available online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.
Affiliation(s)
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Wassim Gabriel
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Victor-George Giurcoiu
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Omar Shouman
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Firas Hamood
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Ludwig Lautenbacher
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Cecilia Bang Jensen
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Julian Müller
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mostafa Kalhor
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Armin Soleymaniniya
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Matthew The
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
9
Gomez-Zepeda D, Arnold-Schild D, Beyrle J, Declercq A, Gabriels R, Kumm E, Preikschat A, Łącki MK, Hirschler A, Rijal JB, Carapito C, Martens L, Distler U, Schild H, Tenzer S. Thunder-DDA-PASEF enables high-coverage immunopeptidomics and is boosted by MS2Rescore with MS2PIP timsTOF fragmentation prediction model. Nat Commun 2024; 15:2288. [PMID: 38480730 PMCID: PMC10937930 DOI: 10.1038/s41467-024-46380-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 02/26/2024] [Indexed: 03/17/2024] Open
Abstract
Human leukocyte antigen (HLA) class I peptide ligands (HLAIps) are key targets for developing vaccines and immunotherapies against infectious pathogens or cancer cells. Identifying HLAIps is challenging due to their high diversity, low abundance, and patient individuality. Here, we develop a highly sensitive method for identifying HLAIps using liquid chromatography-ion mobility-tandem mass spectrometry (LC-IMS-MS/MS). In addition, we train a timsTOF-specific peak intensity MS2PIP model for tryptic and non-tryptic peptides and implement it in MS2Rescore (v3) together with the CCS predictor from ionmob. The optimized method, Thunder-DDA-PASEF, semi-selectively fragments singly and multiply charged HLAIps based on their IMS and m/z. Moreover, the method employs the high sensitivity mode and extended IMS resolution with fewer MS/MS frames (300 ms TIMS ramp, 3 MS/MS frames), doubling the coverage of immunopeptidomics analyses, compared to the proteomics-tailored DDA-PASEF (100 ms TIMS ramp, 10 MS/MS frames). Additionally, rescoring boosts the HLAIps identification by 41.7% to 33%, resulting in 5738 HLAIps from as little as one million JY cell equivalents, and 14,516 HLAIps from 20 million. This enables in-depth profiling of HLAIps from diverse human cell lines and human plasma. Finally, profiling JY and Raji cells transfected to express the SARS-CoV-2 spike protein results in 16 spike HLAIps, thirteen of which have been reported to elicit immune responses in human patients.
Affiliation(s)
- David Gomez-Zepeda
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz, Germany
  - German Cancer Research Center (DKFZ) Heidelberg, Division 191, Heidelberg, Germany
- Danielle Arnold-Schild
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Julian Beyrle
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz, Germany
  - German Cancer Research Center (DKFZ) Heidelberg, Division 191, Heidelberg, Germany
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Elena Kumm
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Annica Preikschat
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Mateusz Krzysztof Łącki
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Aurélie Hirschler
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI - FR2048, Strasbourg, France
- Jeewan Babu Rijal
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI - FR2048, Strasbourg, France
- Christine Carapito
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, ProFI - FR2048, Strasbourg, France
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Ute Distler
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Hansjörg Schild
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
- Stefan Tenzer
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University, Mainz, Germany
  - Helmholtz Institute for Translational Oncology Mainz (HI-TRON Mainz) - A Helmholtz Institute of the DKFZ, Mainz, Germany
- German Cancer Research Center (DKFZ) Heidelberg, Division 191, Heidelberg, Germany.
- Research Center for Immunotherapy (FZI), University Medical Center of the Johannes-Gutenberg University, Mainz, Germany.
| |
Collapse
|
10
|
Ye J, He X, Wang S, Dong MQ, Wu F, Lu S, Feng F. Test-Time Training for Deep MS/MS Spectrum Prediction Improves Peptide Identification. J Proteome Res 2024; 23:550-559. [PMID: 38153036 DOI: 10.1021/acs.jproteome.3c00229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
In bottom-up proteomics, peptide-spectrum matching is critical for peptide and protein identification. Recently, deep learning models have been used to predict tandem mass spectra of peptides, enabling the calculation of similarity scores between the predicted and experimental spectra for peptide-spectrum matching. These models follow the supervised learning paradigm, which trains a general model using paired peptides and spectra from standard data sets and directly employs the model on experimental data. However, this approach can lead to inaccurate predictions due to differences between the training data and the experimental data, such as sample types, enzyme specificity, and instrument calibration. To tackle this problem, we developed a test-time training paradigm that adapts the pretrained model to generate experimental data-specific models, namely, PepT3. PepT3 yields a 10-40% increase in peptide identification depending on the variability in training and experimental data. Intriguingly, when applied to a patient-derived immunopeptidomic sample, PepT3 increases the identification of tumor-specific immunopeptide candidates by 60%. Two-thirds of the newly identified candidates are predicted to bind to the patient's human leukocyte antigen isoforms. To facilitate access of the model and all the results, we have archived all the intermediate files in Zenodo.org with identifier 8231084.
Affiliation(s)
- Jianbai Ye
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
- Xiangnan He
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
- Shujuan Wang
  - National Institute of Biological Sciences, Beijing 102206, China
- Meng-Qiu Dong
  - National Institute of Biological Sciences, Beijing 102206, China
- Feng Wu
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
- Shan Lu
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, California 92093, United States
- Fuli Feng
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
11
Declercq A, Demeulemeester N, Gabriels R, Bouwmeester R, Degroeve S, Martens L. Bioinformatics Pipeline for Processing Single-Cell Data. Methods Mol Biol 2024; 2817:221-239. [PMID: 38907156 DOI: 10.1007/978-1-0716-3934-4_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]
Abstract
Single-cell proteomics can offer valuable insights into dynamic cellular interactions, but identifying proteins at this level is challenging due to their low abundance. In this chapter, we present a state-of-the-art bioinformatics pipeline for single-cell proteomics that combines the search engine Sage (via SearchGUI), identification rescoring with MS2Rescore, quantification through FlashLFQ, and differential expression analysis using MSqRob2. MS2Rescore leverages LC-MS/MS behavior predictors, such as MS2PIP and DeepLC, to recalibrate scores with Percolator or mokapot. Combining these tools into a unified pipeline, this approach improves the detection of low-abundance peptides, resulting in increased identifications while maintaining stringent FDR thresholds.
Affiliation(s)
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Nina Demeulemeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
  - StatOmics, Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Sven Degroeve
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
12
Tariq MU, Ebert S, Saeed F. Making MS Omics Data ML-Ready: SpeCollate Protocols. Methods Mol Biol 2024; 2836:135-155. [PMID: 38995540 DOI: 10.1007/978-1-0716-4007-4_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
The increasing complexity and volume of mass spectrometry (MS) data have presented new challenges and opportunities for proteomics data analysis and interpretation. In this chapter, we provide a comprehensive guide to transforming MS data for machine learning (ML) training, inference, and applications. The chapter is organized into three parts. The first part describes the data analysis needed for MS-based experiments and a general introduction to our deep learning model SpeCollate, which we will use throughout the chapter for illustration. The second part of the chapter explores the transformation of MS data for inference, providing a step-by-step guide for users to deduce peptides from their MS data. This section aims to bridge the gap between data acquisition and practical applications by detailing the necessary steps for data preparation and interpretation. In the final part, we present a demonstrative example of SpeCollate, a deep learning-based peptide database search engine that overcomes the problems of simplistic simulation of theoretical spectra and heuristic scoring functions for peptide-spectrum matches by generating joint embeddings for spectra and peptides. SpeCollate is a user-friendly tool with an intuitive command-line interface to perform the search, showcasing the effectiveness of the techniques and methodologies discussed in the earlier sections and highlighting the potential of machine learning in the context of mass spectrometry data analysis. By offering a comprehensive overview of data transformation, inference, and ML model applications for mass spectrometry, this chapter aims to empower researchers and practitioners in leveraging the power of machine learning to unlock novel insights and drive innovation in the field of mass spectrometry-based omics.
Affiliation(s)
- Muhammad Usman Tariq
  - Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), Miami, FL, USA
- Samuel Ebert
  - Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), Miami, FL, USA
- Fahad Saeed
  - Knight Foundation School of Computing and Information Sciences (KFSCIS), Florida International University (FIU), Miami, FL, USA
13
Gabriel W, Picciani M, The M, Wilhelm M. Deep Learning-Assisted Analysis of Immunopeptidomics Data. Methods Mol Biol 2024; 2758:457-483. [PMID: 38549030 DOI: 10.1007/978-1-0716-3646-6_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
Liquid chromatography-coupled mass spectrometry (LC-MS/MS) is the primary method to obtain direct evidence for the presentation of disease- or patient-specific human leukocyte antigen (HLA). However, compared to the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores assessing the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly in use cases where non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures on how to benefit from state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or provide assistance in the assessment of the presence of cryptic peptides, such as spliced peptides.
Affiliation(s)
- Wassim Gabriel
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Matthew The
  - Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
14
Van Bael S, Ludwig C, Baggerman G, Temmerman L. Identification and Targeted Quantification of Endogenous Neuropeptides in the Nematode Caenorhabditis elegans Using Mass Spectrometry. Methods Mol Biol 2024; 2758:341-373. [PMID: 38549024 DOI: 10.1007/978-1-0716-3646-6_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
The nematode Caenorhabditis elegans lends itself as an excellent model organism for peptidomics studies. Its ease of cultivation and quick generation time make it suitable for high-throughput studies. The nervous system, with its 302 neurons, is probably the best-known and studied endocrine tissue. Moreover, its neuropeptidergic signaling pathways display numerous similarities with those observed in other metazoans. Here, we describe two label-free approaches for neuropeptidomics in C. elegans: one for discovery purposes, and another for targeted quantification and comparisons of neuropeptide levels between different samples. Starting from a detailed peptide extraction procedure, we here outline the liquid chromatography tandem mass spectrometry (LC-MS/MS) setup and describe subsequent data analysis approaches.
Affiliation(s)
- Sven Van Bael
  - Department of Biology, Animal Physiology & Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
- Christina Ludwig
  - Bavarian Center for Biomolecular Mass Spectrometry (BayBioMS), Technical University of Munich (TUM), Freising, Germany
- Geert Baggerman
  - Center for Proteomics, University of Antwerp, Antwerp, Belgium
- Liesbet Temmerman
  - Department of Biology, Animal Physiology & Neurobiology, University of Leuven (KU Leuven), Leuven, Belgium
15
Bichmann L, Gupta S, Röst H. Data-Independent Acquisition Peptidomics. Methods Mol Biol 2024; 2758:77-88. [PMID: 38549009 DOI: 10.1007/978-1-0716-3646-6_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
In recent years, data-independent acquisition (DIA) has emerged as a powerful analysis method in biological mass spectrometry (MS). Compared to the previously predominant data-dependent acquisition (DDA), it offers a way to achieve greater reproducibility, sensitivity, and dynamic range in MS measurements. To make DIA accessible to non-expert users, a multifunctional, automated high-throughput pipeline DIAproteomics was implemented in the computational workflow framework "Nextflow" ( https://nextflow.io ). This allows high-throughput processing of proteomics and peptidomics DIA datasets on diverse computing infrastructures. This chapter provides a short summary and usage protocol guide for the most important modes of operation of this pipeline regarding the analysis of peptidomics datasets using the command line. In brief, DIAproteomics is a wrapper around the OpenSwathWorkflow and relies on either existing or ad-hoc generated spectral libraries from matching DDA runs. The OpenSwathWorkflow extracts chromatograms from the DIA runs and performs chromatographic peak-picking. Further downstream of the pipeline, these peaks are scored, aligned, and statistically evaluated for qualitative and quantitative differences across conditions depending on the user's interest. DIAproteomics is open-source and available under a permissive license. We encourage the scientific community to use or modify the pipeline to meet their specific requirements.
Affiliation(s)
- Leon Bichmann
  - Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen, Germany
- Shubham Gupta
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Hannes Röst
  - Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
16
Kirkpatrick J, Stemmer PM, Searle BC, Herring LE, Martin L, Midha MK, Phinney BS, Shan B, Palmblad M, Wang Y, Jagtap PD, Neely BA. 2019 Association of Biomolecular Resource Facilities Multi-Laboratory Data-Independent Acquisition Proteomics Study. J Biomol Tech 2023; 34:3fc1f5fe.9b78d780. [PMID: 37435391 PMCID: PMC10332336 DOI: 10.7171/3fc1f5fe.9b78d780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2023]
Abstract
Despite the advantages of fewer missing values by collecting fragment ion data on all analytes in the sample as well as the potential for deeper coverage, the adoption of data-independent acquisition (DIA) in proteomics core facility settings has been slow. The Association of Biomolecular Resource Facilities conducted a large interlaboratory study to evaluate DIA performance in proteomics laboratories with various instrumentation. Participants were supplied with generic methods and a uniform set of test samples. The resulting 49 DIA datasets act as benchmarks and have utility in education and tool development. The sample set consisted of a tryptic HeLa digest spiked with high or low levels of 4 exogenous proteins. Data are available in MassIVE MSV000086479. Additionally, we demonstrate how the data can be analyzed by focusing on 2 datasets using different library approaches and show the utility of select summary statistics. These data can be used by DIA newcomers, software developers, or DIA experts evaluating performance with different platforms, acquisition settings, and skill levels.
Affiliation(s)
- Joanna Kirkpatrick
  - Leibniz Institute on Aging - Fritz Lipmann Institute, 07745 Jena, Germany
  - The Francis Crick Institute, London NW1 1AT, United Kingdom
- Brian C. Searle
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, USA
- Laura E. Herring
  - UNC Proteomics Core Facility, Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514, USA
- Baozhen Shan
  - Bioinformatics Solutions Inc., Waterloo, ON N2L 3K8, Canada
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands
- Yan Wang
  - National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, Maryland 20892, USA
- Pratik D. Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Benjamin A. Neely
  - National Institute of Standards and Technology, Charleston, South Carolina 29412, USA
17
Declercq A, Bouwmeester R, Chiva C, Sabidó E, Hirschler A, Carapito C, Martens L, Degroeve S, Gabriels R. Updated MS²PIP web server supports cutting-edge proteomics applications. Nucleic Acids Res 2023:7151340. [PMID: 37140039 DOI: 10.1093/nar/gkad335] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/26/2023] [Revised: 04/04/2023] [Accepted: 04/25/2023] [Indexed: 05/05/2023]
Abstract
Interest in the use of machine learning for peptide fragmentation spectrum prediction has been strongly on the rise over the past years, especially for applications in challenging proteomics identification workflows such as immunopeptidomics and the full-proteome identification of data independent acquisition spectra. Since its inception, the MS²PIP peptide spectrum predictor has been widely used for various downstream applications, mostly thanks to its accuracy, ease-of-use, and broad applicability. We here present a thoroughly updated version of the MS²PIP web server, which includes new and more performant prediction models for both tryptic- and non-tryptic peptides, for immunopeptides, and for CID-fragmented TMT-labeled peptides. Additionally, we have also added new functionality to greatly facilitate the generation of proteome-wide predicted spectral libraries, requiring only a FASTA protein file as input. These libraries also include retention time predictions from DeepLC. Moreover, we now provide pre-built and ready-to-download spectral libraries for various model organisms in multiple DIA-compatible spectral library formats. Besides upgrading the back-end models, the user experience on the MS²PIP web server is thus also greatly enhanced, extending its applicability to new domains, including immunopeptidomics and MS3-based TMT quantification experiments. MS²PIP is freely available at https://iomics.ugent.be/ms2pip/.
Affiliation(s)
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium
  - Department of Biomolecular Medicine, Ghent University, Belgium
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium
  - Department of Biomolecular Medicine, Ghent University, Belgium
- Cristina Chiva
  - Proteomics Unit, Universitat Pompeu Fabra, 08003, Barcelona, Spain
  - Proteomics Unit, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08003, Barcelona, Spain
- Eduard Sabidó
  - Proteomics Unit, Universitat Pompeu Fabra, 08003, Barcelona, Spain
  - Proteomics Unit, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08003, Barcelona, Spain
- Aurélie Hirschler
  - Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), Université de Strasbourg, CNRS, France
- Christine Carapito
  - Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), Université de Strasbourg, CNRS, France
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium
  - Department of Biomolecular Medicine, Ghent University, Belgium
- Sven Degroeve
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium
  - Department of Biomolecular Medicine, Ghent University, Belgium
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium
  - Department of Biomolecular Medicine, Ghent University, Belgium
18
Franciosa G, Locard-Paulet M, Jensen LJ, Olsen JV. Recent advances in kinase signaling network profiling by mass spectrometry. Curr Opin Chem Biol 2023; 73:102260. [PMID: 36657259 DOI: 10.1016/j.cbpa.2022.102260] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/14/2022] [Revised: 12/13/2022] [Accepted: 12/14/2022] [Indexed: 01/19/2023]
Abstract
Mass spectrometry-based phosphoproteomics is currently the leading methodology for the study of global kinase signaling. The scientific community is continuously releasing technological improvements for sensitive and fast identification of phosphopeptides, and their accurate quantification. To interpret large-scale phosphoproteomics data, numerous bioinformatic resources are available that help understanding kinase network functional role in biological systems upon perturbation. Some of these resources are databases of phosphorylation sites, protein kinases and phosphatases; others are bioinformatic algorithms to infer kinase activity, predict phosphosite functional relevance and visualize kinase signaling networks. In this review, we present the latest experimental and bioinformatic tools to profile protein kinase signaling networks and provide examples of their application in biomedicine.
Affiliation(s)
- Giulia Franciosa
  - Proteomics Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Marie Locard-Paulet
  - Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Lars J Jensen
  - Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Jesper V Olsen
  - Proteomics Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
19
Prakash A, García-Seisdedos D, Wang S, Kundu DJ, Collins A, George N, Moreno P, Papatheodorou I, Jones AR, Vizcaíno JA. Integrated View of Baseline Protein Expression in Human Tissues. J Proteome Res 2023; 22:729-742. [PMID: 36577097 PMCID: PMC9990129 DOI: 10.1021/acs.jproteome.2c00406] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/29/2022]
Abstract
The availability of proteomics datasets in the public domain, and in the PRIDE database, in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity for combined analyses of datasets to get organism-wide protein abundance data in a consistent manner. We have reanalyzed 24 public proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs. We defined tissue as a distinct functional or structural region within an organ. Overall, the aggregated dataset contains 67 healthy tissues, corresponding to 3,119 mass spectrometry runs covering 498 samples from 489 individuals. We compared protein abundances between different organs and studied the distribution of proteins across these organs. We also compared the results with data generated in analogous studies. Additionally, we performed gene ontology and pathway-enrichment analyses to identify organ-specific enriched biological processes and pathways. As a key point, we have integrated the protein abundance results into the resource Expression Atlas, where they can be accessed and visualized either individually or together with gene expression data coming from transcriptomics datasets. We believe this is a good mechanism to make proteomics data more accessible for life scientists.
Affiliation(s)
- Ananth Prakash
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- David García-Seisdedos
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Shengbo Wang
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Deepti Jaiswal Kundu
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Andrew Collins
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Nancy George
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Pablo Moreno
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Irene Papatheodorou
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Andrew R Jones
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, United Kingdom
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
20
Abstract
Spectrum library searching is a powerful alternative to database searching for data dependent acquisition experiments, but has been historically limited to identifying previously observed peptides in libraries. Here we present Scribe, a new library search engine designed to leverage deep learning fragmentation prediction software such as Prosit. Rather than relying on highly curated DDA libraries, this approach predicts fragmentation and retention times for every peptide in a FASTA database. Scribe embeds Percolator for false discovery rate correction and an interference tolerant, label-free quantification integrator for an end-to-end proteomics workflow. By leveraging expected relative fragmentation and retention time values, we find that library searching with Scribe can outperform traditional database searching tools both in terms of sensitivity and quantitative precision. Scribe and its graphical interface are easy to use, freely accessible, and fully open source.
Affiliation(s)
- Brian C Searle
  - Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
  - Proteome Software Inc., Portland, Oregon 97219, United States
- Ariana E Shannon
  - Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Damien Beau Wilburn
  - Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
21
Rehfeldt T, Gabriels R, Bouwmeester R, Gessulat S, Neely BA, Palmblad M, Perez-Riverol Y, Schmidt T, Vizcaíno JA, Deutsch EW. ProteomicsML: An Online Platform for Community-Curated Data sets and Tutorials for Machine Learning in Proteomics. J Proteome Res 2023; 22:632-636. [PMID: 36693629 PMCID: PMC9903315 DOI: 10.1021/acs.jproteome.2c00629] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/04/2022] [Indexed: 01/26/2023]
Abstract
Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.
Affiliation(s)
- Tobias G. Rehfeldt
  - Institute for Mathematics and Computer Science, University of Southern Denmark, 5000 Odense, Denmark
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent 9052, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent 9052, Belgium
- Benjamin A. Neely
  - National Institute of Standards and Technology, Charleston, South Carolina 29412, United States
- Magnus Palmblad
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Eric W. Deutsch
  - Institute for Systems Biology, Seattle, Washington 98109, United States

22
Dorl S, Winkler S, Mechtler K, Dorfer V. MS Ana: Improving Sensitivity in Peptide Identification with Spectral Library Search. J Proteome Res 2023; 22:462-470. [PMID: 36688604 PMCID: PMC9903325 DOI: 10.1021/acs.jproteome.2c00658] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/24/2023]
Abstract
Spectral library search can enable more sensitive peptide identification in tandem mass spectrometry experiments. However, its drawbacks are the limited availability of high-quality libraries and the added difficulty of creating decoy spectra for result validation. We describe MS Ana, a new spectral library search engine that enables high sensitivity peptide identification using either curated or predicted spectral libraries as well as robust false discovery control through its own decoy library generation algorithm. MS Ana identifies on average 36% more spectrum matches and 4% more proteins than database search in a benchmark test on single-shot human cell-line data. Further, we demonstrate the quality of the result validation with tests on synthetic peptide pools and show the importance of library selection through a comparison of library search performance with different configurations of publicly available human spectral libraries.
Affiliation(s)
- Sebastian Dorl
  - University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
  - Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
  - Phone: +43 (0) 50804 27145
- Stephan Winkler
  - University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
  - Department of Computer Science, Johannes Kepler University Linz, Altenbergerstraße 69, 4040 Linz, Austria
- Karl Mechtler
  - Research Institute of Molecular Pathology (IMP), Protein Chemistry, Campus-Vienna-Biocenter 1, 1030 Vienna, Austria
  - Institute of Molecular Biotechnology (IMBA), Protein Chemistry, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
  - Gregor Mendel Institute of Molecular Plant Biology of the Austrian Academy of Sciences (GMI), Dr. Bohr Gasse 3, 1030 Vienna, Austria
- Viktoria Dorfer
  - University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
  - Phone: +43 (0) 50804 22740

23
Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 03/01/2022] [Accepted: 07/11/2022] [Indexed: 01/21/2023]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
Affiliation(s)
- Jürgen Cox
  - Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany.
  - Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway.

24
Álvarez-Urdiola R, Borràs E, Valverde F, Matus JT, Sabidó E, Riechmann JL. Peptidomics Methods Applied to the Study of Flower Development. Methods Mol Biol 2023; 2686:509-536. [PMID: 37540375 DOI: 10.1007/978-1-0716-3299-4_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Understanding the global and dynamic nature of plant developmental processes requires not only the study of the transcriptome, but also of the proteome, including its largely uncharacterized peptidome fraction. Recent advances in proteomics and high-throughput analyses of translating RNAs (ribosome profiling) have begun to address this issue, evidencing the existence of novel, uncharacterized, and possibly functional peptides. To validate the accumulation in tissues of sORF-encoded polypeptides (SEPs), the basic setup of proteomic analyses (i.e., LC-MS/MS) can be followed. However, the detection of peptides that are small (up to ~100 aa, 6-7 kDa) and novel (i.e., not annotated in reference databases) presents specific challenges that need to be addressed both experimentally and with computational biology resources. Several methods have been developed in recent years to isolate and identify peptides from plant tissues. In this chapter, we outline two different peptide extraction protocols and the subsequent peptide identification by mass spectrometry using the database search or the de novo identification methods.
Affiliation(s)
- Raquel Álvarez-Urdiola
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Eva Borràs
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Federico Valverde
  - Institute for Plant Biochemistry and Photosynthesis CSIC - University of Seville, Seville, Spain
- José Tomás Matus
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
  - Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, Valencia, Spain
- Eduard Sabidó
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- José Luis Riechmann
  - Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain.
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.

25
Jones AR, Deutsch EW, Vizcaíno JA. Is DIA proteomics data FAIR? Current data sharing practices, available bioinformatics infrastructure and recommendations for the future. Proteomics 2022; 23:e2200014. [PMID: 36074795 PMCID: PMC10155627 DOI: 10.1002/pmic.202200014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Revised: 08/27/2022] [Accepted: 08/29/2022] [Indexed: 11/06/2022]
Abstract
Data independent acquisition (DIA) proteomics techniques have matured enormously in recent years, thanks to multiple technical developments in e.g. instrumentation and data analysis approaches. However, there are many improvements that are still possible for DIA data in the area of the FAIR (Findability, Accessibility, Interoperability and Reusability) data principles. These include more tailored data sharing practices and open data standards, since public databases and data standards for proteomics were mostly designed with DDA data in mind. Here we first describe the current state of the art in the context of FAIR data for proteomics in general, and for DIA approaches in particular. For improving the current situation for DIA data, we make the following recommendations for the future: (i) development of an open data standard for spectral libraries; (ii) make mandatory the availability of the spectral libraries used in DIA experiments in ProteomeXchange resources; (iii) improve the support for DIA data in the data standards developed by the Proteomics Standards Initiative; and (iv) improve the support for DIA datasets in ProteomeXchange resources, including more tailored metadata requirements.
Affiliation(s)
- Andrew R Jones
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, L69 3BX, UK
- Eric W Deutsch
  - Institute for Systems Biology, Seattle, Washington, 98109, USA
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, CB10 1SD, UK

26
Pauwels J, Fijałkowska D, Eyckerman S, Gevaert K. Mass spectrometry and the cellular surfaceome. Mass Spectrom Rev 2022; 41:804-841. [PMID: 33655572 DOI: 10.1002/mas.21690] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 10/22/2020] [Revised: 02/05/2021] [Accepted: 02/09/2021] [Indexed: 06/12/2023]
Abstract
The collection of exposed plasma membrane proteins, collectively termed the surfaceome, is involved in multiple vital cellular processes, such as the communication of cells with their surroundings and the regulation of transport across the lipid bilayer. The surfaceome also plays key roles in the immune system by recognizing and presenting antigens, with its possible malfunctioning linked to disease. Surface proteins have long been explored as potential cell markers, disease biomarkers, and therapeutic drug targets. Despite its importance, a detailed study of the surfaceome continues to pose major challenges for mass spectrometry-driven proteomics due to the inherent biophysical characteristics of surface proteins. Their inefficient extraction from hydrophobic membranes to an aqueous medium and their lower abundance compared to intracellular proteins hamper the analysis of surface proteins, which are therefore usually underrepresented in proteomic datasets. To tackle such problems, several innovative analytical methodologies have been developed. This review aims at providing an extensive overview of the different methods for surfaceome analysis, with respective considerations for downstream mass spectrometry-based proteomics.
Affiliation(s)
- Jarne Pauwels
  - VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Sven Eyckerman
  - VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Kris Gevaert
  - VIB Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium

27
Declercq A, Bouwmeester R, Hirschler A, Carapito C, Degroeve S, Martens L, Gabriels R. MS2Rescore: Data-driven rescoring dramatically boosts immunopeptide identification rates. Mol Cell Proteomics 2022; 21:100266. [PMID: 35803561 PMCID: PMC9411678 DOI: 10.1016/j.mcpro.2022.100266] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 12/08/2021] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 12/03/2022] Open
Abstract
Immunopeptidomics aims to identify major histocompatibility complex (MHC)-presented peptides on almost all cells that can be used in anti-cancer vaccine development. However, existing immunopeptidomics data analysis pipelines suffer from the nontryptic nature of immunopeptides, complicating their identification. Previously, peak intensity predictions by MS2PIP and retention time predictions by DeepLC have been shown to improve tryptic peptide identifications when rescoring peptide-spectrum matches with Percolator. However, as MS2PIP was tailored toward tryptic peptides, we have here retrained MS2PIP to include nontryptic peptides. Interestingly, the new models not only greatly improve predictions for immunopeptides but also yield further improvements for tryptic peptides. We show that the integration of new MS2PIP models, DeepLC, and Percolator in one software package, MS2Rescore, increases spectrum identification rate and unique identified peptides by 46% and 36%, respectively, compared to standard Percolator rescoring at 1% FDR. Moreover, MS2Rescore also outperforms the current state-of-the-art in immunopeptide-specific identification approaches. Altogether, MS2Rescore thus allows substantially improved identification of novel epitopes from existing immunopeptidomics workflows.
Highlights
- MS2Rescore significantly boosts immunopeptide identification rates
- Data-driven post-processing allows for a ten-fold increase in specificity
- MS2PIP and DeepLC predictors are integrated with Percolator post-processing
- MS2Rescore accepts identification results from MaxQuant, PEAKS, MS-GF+ and X!Tandem
- MS2Rescore shows great promise to extend current neo- and xeno-epitope landscapes
Affiliation(s)
- Arthur Declercq
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium
- Aurélie Hirschler
  - Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), Université de Strasbourg, CNRS
- Christine Carapito
  - Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), Université de Strasbourg, CNRS
- Sven Degroeve
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium.
- Ralf Gabriels
  - VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biomolecular Medicine, Ghent University, Belgium

28
Shin H, Park Y, Ahn K, Kim S. Accurate Prediction of y Ions in Beam-Type Collision-Induced Dissociation Using Deep Learning. Anal Chem 2022; 94:7752-7758. [PMID: 35609248 PMCID: PMC9178553 DOI: 10.1021/acs.analchem.1c03184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Peptide fragmentation spectra contain critical information for the identification of peptides by mass spectrometry. In this study, we developed an algorithm that more accurately predicts the high-intensity peaks among the peptide spectra. The training data are composed of 180,833 peptides from the National Institute of Standards and Technology and Proteomics Identification database, which were fragmented by either quadrupole time-of-flight or triple-quadrupole collision-induced dissociation methods. Exploratory analysis of the peptide fragmentation pattern focused on the highest-intensity peaks and showed that proline, peptide length, and a sliding window of four-amino-acid combinations can be exploited as key features. The amino acid sequence of each peptide and each of the key features were allocated to different layers of the model, where recurrent neural network, convolutional neural network, and fully connected neural network were used. The trained model, PrAI-frag, predicts the fragmentation spectra more accurately than previous machine learning-based prediction algorithms. The model excels at high-intensity peak prediction, which is advantageous for selective/multiple reaction monitoring applications. PrAI-frag is provided via a Web server which can be used for peptides of length 6-15.
Affiliation(s)
- HyeonSeok Shin
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
- Youngmin Park
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
- Kyunggeun Ahn
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
- Sungsoo Kim
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea

29
Xin L, Qiao R, Chen X, Tran H, Pan S, Rabinoviz S, Bian H, He X, Morse B, Shan B, Li M. A streamlined platform for analyzing tera-scale DDA and DIA mass spectrometry data enables highly sensitive immunopeptidomics. Nat Commun 2022; 13:3108. [PMID: 35672356 PMCID: PMC9174175 DOI: 10.1038/s41467-022-30867-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 09/22/2021] [Accepted: 05/20/2022] [Indexed: 12/21/2022] Open
Abstract
Integrating data-dependent acquisition (DDA) and data-independent acquisition (DIA) approaches can enable highly sensitive mass spectrometry, especially for immunopeptidomics applications. Here we report a streamlined platform for both DDA and DIA data analysis. The platform integrates deep learning-based solutions of spectral library search, database search, and de novo sequencing under a unified framework, which not only boosts the sensitivity but also accurately controls the specificity of peptide identification. Our platform identifies 5-30% more peptide precursors than other state-of-the-art systems on multiple benchmark datasets. When evaluated on immunopeptidomics datasets, we identify 1.7-4.1 and 1.4-2.2 times more peptides from DDA and DIA data, respectively, than previously reported results. We also discover six T-cell epitopes from SARS-CoV-2 immunopeptidome that might represent potential targets for COVID-19 vaccine development. The platform supports data formats from all major instruments and is implemented with the distributed high-performance computing technology, allowing analysis of tera-scale datasets of thousands of samples for clinical applications.
Affiliation(s)
- Lei Xin
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Rui Qiao
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Xin Chen
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Hieu Tran
  - David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
- Shengying Pan
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Haibo Bian
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Xianliang He
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Brenton Morse
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
- Baozhen Shan
  - Bioinformatics Solutions Inc., Waterloo, Ontario, Canada.
- Ming Li
  - David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada.

30
Gabriel W, Giurcoiu V, Lautenbacher L, Wilhelm M. Predicting fragment intensities and retention time of iTRAQ- and TMTPro-labeled peptides with Prosit-TMT. Proteomics 2022; 22:e2100257. [PMID: 35578405 DOI: 10.1002/pmic.202100257] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/13/2022] [Revised: 04/22/2022] [Accepted: 05/05/2022] [Indexed: 11/08/2022]
Abstract
Isobaric labeling increases the throughput of proteomics by enabling the parallel identification and quantification of peptides and proteins. Over the past decades, a variety of isobaric tags have been developed allowing the multiplexed analysis of up to 18 samples. However, experiments utilizing such tags often exhibit reduced identification rates and thus show decreased analytical depth. Re-scoring has been shown to rescue otherwise missed identifications but was not yet systematically applied on isobarically labeled data. Because iTRAQ 4/8-plex and the recently released TMTpro 16/18-plex share similar characteristics with TMT 6/10/11-plex, we hypothesized that Prosit-TMT, trained exclusively on 6/10/11-plex labeled peptides, may be applicable to these isobaric labeling strategies as well. To investigate this, we re-analyzed nine publicly available datasets covering iTRAQ and TMTpro labeling for samples with human and mouse origin. We highlight that Prosit-TMT shows remarkably good performance when comparing experimentally acquired and predicted fragmentation spectra (R of 0.84 - 0.9) and retention times (ΔRT95% of 3 - 10% gradient time) of peptides. Furthermore, re-scoring substantially increases the number of confidently identified spectra, peptides, and proteins.
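The spectral agreement quoted in this abstract (R of 0.84 to 0.9) compares predicted with experimentally acquired fragment intensities. The abstract does not say which correlation measure R denotes; the minimal sketch below assumes a Pearson correlation over matched fragment ions. The function name `pearson_r` and the example intensities are illustrative assumptions, not part of Prosit-TMT.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length intensity vectors.

    Assumption: each position holds the intensity of the same fragment ion
    (e.g. y1, y2, ...) in the predicted and the observed spectrum.
    """
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical normalized intensities for five matched fragment ions.
predicted = [1.00, 0.40, 0.10, 0.65, 0.05]
observed = [0.90, 0.50, 0.05, 0.70, 0.10]
r = pearson_r(predicted, observed)
```

A value of `r` close to 1 indicates that the predicted intensity pattern tracks the measured one; rescoring tools typically feed such similarity scores to a postprocessor as features.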
Affiliation(s)
- Wassim Gabriel
  - Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
- Victor Giurcoiu
  - Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
- Ludwig Lautenbacher
  - Computational Mass Spectrometry, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich, Freising, Germany

31
Gabriel W, The M, Zolg DP, Bayer FP, Shouman O, Lautenbacher L, Schnatbaum K, Zerweck J, Knaute T, Delanghe B, Huhmer A, Wenschuh H, Reimer U, Médard G, Kuster B, Wilhelm M. Prosit-TMT: Deep Learning Boosts Identification of TMT-Labeled Peptides. Anal Chem 2022; 94:7181-7190. [PMID: 35549156 DOI: 10.1021/acs.analchem.1c05435] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/15/2022]
Abstract
The prediction of fragment ion intensities and retention time of peptides has gained significant attention over the past few years. However, the progress shown in the accurate prediction of such properties focused primarily on unlabeled peptides. Tandem mass tags (TMT) are chemical peptide labels that are coupled to free amine groups usually after protein digestion to enable the multiplexed analysis of multiple samples in bottom-up mass spectrometry. It is a standard workflow in proteomics ranging from single-cell to high-throughput proteomics. Particularly for TMT, increasing the number of confidently identified spectra is highly desirable as it provides identification and quantification information with every spectrum. Here, we report on the generation of an extensive resource of synthetic TMT-labeled peptides as part of the ProteomeTools project and present the extension of the deep learning model Prosit to predict the retention time and fragment ion intensities of TMT-labeled peptides with high accuracy. Prosit-TMT supports CID and HCD fragmentation and ion trap and Orbitrap mass analyzers in a single model. Reanalysis of published TMT data sets shows that this single model extracts substantial additional information. Applying Prosit-TMT, we discovered that the expression of many proteins in human breast milk follows a distinct daily cycle which may prime the newborn for nutritional or environmental cues.
Affiliation(s)
- Wassim Gabriel
  - Computational Mass Spectrometry, Technical University of Munich, 85354 Freising, Germany
- Matthew The
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, 85354 Freising, Germany
- Daniel P Zolg
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, 85354 Freising, Germany
- Florian P Bayer
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, 85354 Freising, Germany
- Omar Shouman
  - Computational Mass Spectrometry, Technical University of Munich, 85354 Freising, Germany
- Ludwig Lautenbacher
  - Computational Mass Spectrometry, Technical University of Munich, 85354 Freising, Germany
- Tobias Knaute
  - JPT Peptide Technologies GmbH, 12489 Berlin, Germany
- Andreas Huhmer
  - Thermo Fisher Scientific, San Jose, California 95134, United States
- Ulf Reimer
  - JPT Peptide Technologies GmbH, 12489 Berlin, Germany
- Guillaume Médard
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, 85354 Freising, Germany
- Bernhard Kuster
  - Chair of Proteomics and Bioanalytics, Technical University of Munich, 85354 Freising, Germany
  - Bavarian Center for Biomolecular Mass Spectrometry, 85354 Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, Technical University of Munich, 85354 Freising, Germany

32
Levitsky LI, Kuznetsova KG, Kliuchnikova AA, Ilina IY, Goncharov AO, Lobas AA, Ivanov MV, Lazarev VN, Ziganshin RH, Gorshkov MV, Moshkovskii SA. Validating Amino Acid Variants in Proteogenomics Using Sequence Coverage by Multiple Reads. J Proteome Res 2022; 21:1438-1448. [PMID: 35536917 DOI: 10.1021/acs.jproteome.2c00033] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/28/2022]
Abstract
Mass spectrometry-based proteome analysis implies matching the mass spectra of proteolytic peptides to amino acid sequences predicted from genomic sequences. Reliability of peptide variant identification in proteogenomic studies is often lacking. We propose a way to interpret shotgun proteomics results, specifically in the data-dependent acquisition mode, as protein sequence coverage by multiple reads as it is done in nucleic acid sequencing for calling of single nucleotide variants. Multiple reads for each sequence position could be provided by overlapping distinct peptides, thus confirming the presence of certain amino acid residues in the overlapping stretch with a lower false discovery rate. Overlapping distinct peptides originate from miscleaved tryptic peptides in combination with their properly cleaved counterparts and from peptides generated by multiple proteases after the same specimen is subjected to parallel digestion and analyzed separately. We illustrate this approach using publicly available multiprotease data sets and our own data generated for the HEK-293 cell line digests obtained using trypsin, LysC, and GluC proteases. In total, up to 30% of the whole proteome was covered by tryptic peptides with up to 7% covered twofold and more. The proteogenomic analysis of the HEK-293 cell line revealed 36 single amino acid variants, seven of which were supported by multiple reads.
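The "coverage by multiple reads" idea in this abstract can be illustrated with a per-residue depth count: each distinct identified peptide that maps to a protein adds one read to every position it spans. The sketch below is an illustration under simple assumptions (exact substring matching, one protein, no modifications); the function names, the toy protein sequence, and the peptides are hypothetical and do not come from the paper.

```python
def coverage_depth(protein, peptides):
    """Count, for each residue, how many distinct peptides cover it."""
    depth = [0] * len(protein)
    for peptide in set(peptides):  # repeated identifications count once
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                depth[i] += 1
            # Continue searching so repeated occurrences are also counted.
            start = protein.find(peptide, start + 1)
    return depth


def fraction_covered(depth, min_reads=1):
    """Fraction of residues covered by at least `min_reads` distinct peptides."""
    return sum(d >= min_reads for d in depth) / len(depth)


# Toy example: a miscleaved tryptic peptide overlaps its properly cleaved
# counterpart, giving twofold coverage of the shared stretch.
protein = "MKTAYIAKQRQISFVK"
peptides = ["TAYIAK", "TAYIAKQR", "QISFVK", "TAYIAK"]
depth = coverage_depth(protein, peptides)
single = fraction_covered(depth, min_reads=1)
double = fraction_covered(depth, min_reads=2)
```

Here the stretch `TAYIAK` is read twice, so a variant within it would be supported by two distinct peptides, whereas the remainder of the covered sequence has only single-read support.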
Collapse
Affiliation(s)
- Lev I Levitsky
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38, bld. 2, Leninsky Prospect, Moscow 119334, Russia
| | - Ksenia G Kuznetsova
- Federal Research and Clinical Center of Physical-Chemical Medicine, 1a, Malaya Pirogovskaya, Moscow 119435, Russia
| | - Anna A Kliuchnikova
- Federal Research and Clinical Center of Physical-Chemical Medicine, 1a, Malaya Pirogovskaya, Moscow 119435, Russia.,Pirogov Russian National Research Medical University, 1, Ostrovityanova, Moscow 117997, Russia
| | - Irina Y Ilina
- Federal Research and Clinical Center of Physical-Chemical Medicine, 1a, Malaya Pirogovskaya, Moscow 119435, Russia
| | - Anton O Goncharov
- Federal Research and Clinical Center of Physical-Chemical Medicine, 1a, Malaya Pirogovskaya, Moscow 119435, Russia.,Pirogov Russian National Research Medical University, 1, Ostrovityanova, Moscow 117997, Russia
| | - Anna A Lobas
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38, bld. 2, Leninsky Prospect, Moscow 119334, Russia
| | - Mark V Ivanov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38, bld. 2, Leninsky Prospect, Moscow 119334, Russia
| | - Vassili N Lazarev
- Federal Research and Clinical Center of Physical-Chemical Medicine, 1a, Malaya Pirogovskaya, Moscow 119435, Russia.,Moscow Institute of Physics and Technology (State University), 9, Institutskiy per., Dolgoprudny, Moscow Region 141701, Russia
| | - Rustam H Ziganshin
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, 16/10, Miklukho-Maklaya, Moscow 117997, Russia
| | - Mikhail V Gorshkov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, 38, bld. 2, Leninsky Prospect, Moscow 119334, Russia
| | - Sergei A Moshkovskii
- Federal Research and Clinical Center of Physical-Chemical Medicine, 1a, Malaya Pirogovskaya, Moscow 119435, Russia.,Pirogov Russian National Research Medical University, 1, Ostrovityanova, Moscow 117997, Russia
|
33
|
Shiferaw GA, Gabriels R, Bouwmeester R, Van Den Bossche T, Vandermarliere E, Martens L, Volders PJ. Sensitive and Specific Spectral Library Searching with CompOmics Spectral Library Searching Tool and Percolator. J Proteome Res 2022; 21:1365-1370. [PMID: 35446579 DOI: 10.1021/acs.jproteome.2c00075] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we investigate the effects of integrating the machine learning-based postprocessor Percolator into our spectral library searching tool COSS (CompOmics Spectral library Searching tool). To evaluate the effects of this postprocessing, we have used 40 data sets from 2 different projects and have searched these against the NIST and MassIVE spectral libraries. The searching is carried out using 2 spectral library search tools, COSS and MSPepSearch with and without Percolator postprocessing, and using sequence database search engine MS-GF+ as a baseline comparator. The addition of the Percolator rescoring step to COSS is effective and results in a substantial improvement in sensitivity and specificity of the identifications. COSS is freely available as open source under the permissive Apache2 license, and binaries and source code are found at https://github.com/compomics/COSS.
Affiliation(s)
- Genet Abay Shiferaw
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Elien Vandermarliere
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Pieter-Jan Volders
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium.,Cancer Research Institute Ghent, Ghent University, 9000 Ghent, Belgium
|
34
|
Na S, Choi H, Paek E. Deephos: Predicted spectral database search for TMT-labeled phosphopeptides and its false discovery rate estimation. Bioinformatics 2022; 38:2980-2987. [PMID: 35441674 DOI: 10.1093/bioinformatics/btac280] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/12/2021] [Revised: 03/26/2022] [Accepted: 04/14/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Tandem mass tag (TMT)-based tandem mass spectrometry (MS/MS) has become the method of choice for the quantification of post-translational modifications in complex mixtures. Many cancer proteogenomic studies have highlighted the importance of large-scale phosphopeptide quantification coupled with TMT labeling. Herein, we propose a predicted Spectral DataBase (pSDB) search strategy called Deephos that can improve both sensitivity and specificity in identifying MS/MS spectra of TMT-labeled phosphopeptides. RESULTS: With deep learning-based fragment ion prediction, we compiled a pSDB of TMT-labeled phosphopeptides generated from ∼8,000 human phosphoproteins annotated in UniProt. Deep learning could successfully recognize the fragmentation patterns altered by both TMT labeling and phosphorylation. In addition, we discuss the decoy spectra for false discovery rate (FDR) estimation in the pSDB search. We show that FDR could be inaccurately estimated by the existing decoy spectra generation methods and propose an innovative method to generate decoy spectra for more accurate FDR estimation. The utilities of Deephos were demonstrated in multi-stage analyses (coupled with database searches) of glioblastoma, acute myeloid leukemia, and breast cancer phosphoproteomes. AVAILABILITY: Deephos pSDB and the search software are available at https://github.com/seungjinna/deephos.
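For readers unfamiliar with decoy-based FDR estimation, the generic target-decoy estimate can be sketched as follows. This is the textbook ratio of decoy to target matches above a score threshold, not the decoy spectra generation method proposed in the Deephos paper; the function name and the scores are hypothetical.

```python
def fdr_at_threshold(psms: list[tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (#decoy PSMs) / (#target PSMs) among PSMs scoring >= threshold.

    Each PSM is a (score, is_decoy) pair; higher scores are assumed to be better.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical scored peptide-spectrum matches.
psms = [(9.1, False), (8.7, False), (8.2, True), (7.9, False),
        (7.5, False), (6.8, True), (6.1, False)]
print(fdr_at_threshold(psms, 7.0))
print(fdr_at_threshold(psms, 8.5))
```

The estimate is only as good as the decoys: if decoy matches do not behave like false target matches, the ratio is biased, which is the problem the abstract raises for predicted spectral libraries.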
Affiliation(s)
- Seungjin Na
- Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
| | - Hyunjin Choi
- Department of Automotive Engineering, Hanyang University, Seoul, 04763, Republic of Korea
| | - Eunok Paek
- Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea.,Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
|
35
|
Van Puyvelde B, Daled S, Willems S, Gabriels R, Gonzalez de Peredo A, Chaoui K, Mouton-Barbosa E, Bouyssié D, Boonen K, Hughes CJ, Gethings LA, Perez-Riverol Y, Bloomfield N, Tate S, Schiltz O, Martens L, Deforce D, Dhaenens M. A comprehensive LFQ benchmark dataset on modern day acquisition strategies in proteomics. Sci Data 2022; 9:126. [PMID: 35354825 PMCID: PMC8967878 DOI: 10.1038/s41597-022-01216-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 12/23/2021] [Accepted: 02/23/2022] [Indexed: 12/23/2022] Open
Abstract
In the last decade, a revolution in liquid chromatography-mass spectrometry (LC-MS) based proteomics has unfolded with the introduction of dozens of novel instruments that incorporate additional data dimensions through innovative acquisition methodologies, in turn inspiring specialized data analysis pipelines. Simultaneously, a growing number of proteomics datasets have been made publicly available through data repositories such as ProteomeXchange, Zenodo and Skyline Panorama. However, developing algorithms to mine this data and assessing the performance on different platforms is currently hampered by the lack of a single benchmark experimental design. Therefore, we acquired a hybrid proteome mixture on different instrument platforms and in all currently available families of data acquisition. Here, we present a comprehensive Data-Dependent and Data-Independent Acquisition (DDA/DIA) dataset acquired using several of the most commonly used current day instrumental platforms. The dataset consists of over 700 LC-MS runs, including adequate replicates allowing robust statistics and covering nearly 10 different data formats, including scanning quadrupole and ion mobility enabled acquisitions. Datasets are available via ProteomeXchange (PXD028735).
Affiliation(s)
- Bart Van Puyvelde
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, Belgium
| | - Simon Daled
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, Belgium
| | - Sander Willems
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
| | - Anne Gonzalez de Peredo
- Institut de Pharmacologie et de Biologie Structural (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Karima Chaoui
- Institut de Pharmacologie et de Biologie Structural (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Emmanuelle Mouton-Barbosa
- Institut de Pharmacologie et de Biologie Structural (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - David Bouyssié
- Institut de Pharmacologie et de Biologie Structural (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Kurt Boonen
- VITO Health, Mol, Belgium
- Centre for Proteomics, University of Antwerpen, Antwerp, Belgium
| | | | | | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | | | | | - Odile Schiltz
- Institut de Pharmacologie et de Biologie Structural (IPBS), Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000, Ghent, Belgium
| | - Dieter Deforce
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, Belgium
| | - Maarten Dhaenens
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, Ghent, Belgium.
|
36
|
Lou R, Liu W, Li R, Li S, He X, Shui W. DeepPhospho accelerates DIA phosphoproteome profiling through in silico library generation. Nat Commun 2021; 12:6685. [PMID: 34795227 PMCID: PMC8602247 DOI: 10.1038/s41467-021-26979-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 04/14/2021] [Accepted: 10/26/2021] [Indexed: 12/27/2022] Open
Abstract
Phosphoproteomics integrating data-independent acquisition (DIA) enables deep phosphoproteome profiling with improved quantification reproducibility and accuracy compared to data-dependent acquisition (DDA)-based phosphoproteomics. DIA data mining heavily relies on a spectral library that in most cases is built on DDA analysis of the same sample. Construction of this project-specific DDA library impairs the analytical throughput, limits the proteome coverage, and increases the sample size for DIA phosphoproteomics. Herein we introduce a deep neural network, DeepPhospho, which conceptually differs from previous deep learning models to achieve accurate predictions of LC-MS/MS data for phosphopeptides. By leveraging in silico libraries generated by DeepPhospho, we establish a DIA workflow for phosphoproteome profiling which involves DIA data acquisition and data mining with DeepPhospho predicted libraries, thus circumventing the need for DDA library construction. Our DeepPhospho-empowered workflow substantially expands the phosphoproteome coverage while maintaining high quantification performance, which leads to the discovery of more signaling pathways and regulated kinases in an EGF signaling study than the DDA library-based approach. DeepPhospho is provided as a web server as well as an offline app to facilitate user access to model training, predictions and library generation.
Affiliation(s)
- Ronghui Lou
- iHuman Institute, ShanghaiTech University, Shanghai, 201210, China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Weizhen Liu
- School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
| | - Rongjie Li
- School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
| | - Shanshan Li
- iHuman Institute, ShanghaiTech University, Shanghai, 201210, China
| | - Xuming He
- School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
- Shanghai Engineering Research Center of Intelligent Vision and Imaging, Shanghai, 201210, China.
| | - Wenqing Shui
- iHuman Institute, ShanghaiTech University, Shanghai, 201210, China.
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
|
37
|
Tariq MU, Saeed F. SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions. PLoS One 2021; 16:e0259349. [PMID: 34714871 PMCID: PMC8555789 DOI: 10.1371/journal.pone.0259349] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/10/2021] [Accepted: 10/18/2021] [Indexed: 11/19/2022] Open
Abstract
Historically, the database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. Database search algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra. Heuristic similarity-scoring functions are used to match an experimental spectrum to a theoretical spectrum. However, the heuristic nature of the scoring functions and the simple transformation of the peptides into theoretical spectra, along with noisy mass spectra for the less abundant peptides, can introduce a cascade of inaccuracies. In this paper, we design and implement a Deep Cross-Modal Similarity Network called SpeCollate, which overcomes these inaccuracies by learning the similarity function between experimental spectra and peptides directly from the labeled MS data. SpeCollate transforms spectra and peptides into a shared Euclidean subspace by learning fixed size embeddings for both. Our proposed deep-learning network trains on sextuplets of positive and negative examples coupled with our custom-designed SNAP-loss function. Online hardest negative mining is used to select the appropriate negative examples for optimal training performance. We use 4.8 million sextuplets obtained from the NIST and MassIVE peptide libraries to train the network and demonstrate that for closed search, SpeCollate is able to perform better than Crux and MSFragger in terms of the number of peptide-spectrum matches (PSMs) and unique peptides identified under 1% FDR for real-world data. SpeCollate also identifies a large number of peptides not reported by either Crux or MSFragger. To the best of our knowledge, our proposed SpeCollate is the first deep-learning network that can determine the cross-modal similarity between peptides and mass-spectra for MS-based proteomics. We believe SpeCollate is significant progress towards developing machine-learning solutions for MS-based omics data analysis. 
SpeCollate is available at https://deepspecs.github.io/.
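The conventional database search step that this abstract contrasts with SpeCollate (turning a candidate peptide into a theoretical spectrum and scoring it heuristically against the experimental one) can be sketched in Python. This is not SpeCollate's learned similarity: the shared-peak count, the function names, the tolerance, and the toy peak list are illustrative assumptions, and only singly charged b and y ions of unmodified peptides are considered.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def theoretical_fragments(peptide: str) -> list[float]:
    """Return sorted m/z values of singly charged b and y ions."""
    masses = [MONO[aa] for aa in peptide]
    fragments = []
    prefix = 0.0
    for m in masses[:-1]:            # b ions: N-terminal prefixes
        prefix += m
        fragments.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # y ions: C-terminal suffixes plus water
        suffix += m
        fragments.append(suffix + WATER + PROTON)
    return sorted(fragments)


def shared_peak_count(observed_mz: list[float], peptide: str, tol: float = 0.02) -> int:
    """Heuristic score: number of theoretical fragments matched within tol (Da)."""
    return sum(
        any(abs(obs - theo) <= tol for obs in observed_mz)
        for theo in theoretical_fragments(peptide)
    )


# Hypothetical observed peaks scored against two candidate peptides.
observed = [58.03, 177.09, 500.0]
for candidate in ("GAS", "GAT"):
    print(candidate, shared_peak_count(observed, candidate))
```

A score this simple ignores intensities and noise, which is the kind of limitation the abstract attributes to heuristic scoring functions.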
Affiliation(s)
- Muhammad Usman Tariq
- School of Computing & Information Sciences, Florida International University, Miami, FL, United States of America
| | - Fahad Saeed
- School of Computing & Information Sciences, Florida International University, Miami, FL, United States of America
|
38
|
van Bentum M, Selbach M. An Introduction to Advanced Targeted Acquisition Methods. Mol Cell Proteomics 2021; 20:100165. [PMID: 34673283 PMCID: PMC8600983 DOI: 10.1016/j.mcpro.2021.100165] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Received: 04/03/2021] [Revised: 10/11/2021] [Accepted: 10/13/2021] [Indexed: 01/13/2023] Open
Abstract
Targeted proteomics via selected reaction monitoring (SRM) or parallel reaction monitoring (PRM) enables fast and sensitive detection of a preselected set of target peptides. However, the number of peptides that can be monitored in conventional targeting methods is usually rather small. Recently, a series of methods has been described that employ intelligent acquisition strategies to increase the efficiency of mass spectrometers to detect target peptides. These methods are based on one of two strategies. First, retention time adjustment-based methods enable intelligent scheduling of target peptide retention times. These include Picky, iRT, as well as spike-in free real-time adjustment methods such as MaxQuant.Live. Second, in spike-in triggered acquisition methods such as SureQuant, Pseudo-PRM, TOMAHAQ, and Scout-MRM, targeted scans are initiated by abundant labeled synthetic peptides added to samples before the run. Both strategies enable the mass spectrometer to better focus data acquisition time on target peptides. This either enables more sensitive detection or a higher number of targets per run. Here, we provide an overview of available advanced targeting methods and highlight their intrinsic strengths and weaknesses and compatibility with specific experimental setups. Our goal is to provide a basic introduction to advanced targeting methods for people starting to work in this field.
Affiliation(s)
- Mirjam van Bentum
- Proteome Dynamics, Max Delbrück Center for Molecular Medicine, Berlin, Germany; Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Matthias Selbach
- Proteome Dynamics, Max Delbrück Center for Molecular Medicine, Berlin, Germany; Charité-Universitätsmedizin Berlin, Berlin, Germany.
|
39
|
Peeters MKR, Baggerman G, Gabriels R, Pepermans E, Menschaert G, Boonen K. Ion Mobility Coupled to a Time-of-Flight Mass Analyzer Combined With Fragment Intensity Predictions Improves Identification of Classical Bioactive Peptides and Small Open Reading Frame-Encoded Peptides. Front Cell Dev Biol 2021; 9:720570. [PMID: 34604223 PMCID: PMC8484717 DOI: 10.3389/fcell.2021.720570] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/04/2021] [Accepted: 08/25/2021] [Indexed: 12/29/2022] Open
Abstract
Bioactive peptides play key roles in a wide variety of complex processes, such as regulation of body weight, learning, aging, and innate immune response. Next to the classical bioactive peptides, emerging from larger precursor proteins by specific proteolytic processing, a new class of peptides originating from small open reading frames (sORFs) has been recognized as important biological regulators. But their intrinsic properties, specific expression pattern and location on presumed non-coding regions have hindered the full characterization of the repertoire of bioactive peptides, despite their predominant role in various pathways. Although the development of peptidomics has offered the opportunity to study these peptides in vivo, it remains challenging to identify the full peptidome, as the lack of cleavage enzyme specification and the large search space complicate conventional database search approaches. In this study, we introduce a proteogenomics methodology using a new type of mass spectrometry instrument and the implementation of machine learning tools toward improved identification of potential bioactive peptides in the mouse brain. The application of trapped ion mobility spectrometry (tims) coupled to a time-of-flight mass analyzer (TOF) offers improved sensitivity, an enhanced peptide coverage, reduction in chemical noise and the reduced occurrence of chimeric spectra. Subsequently, the machine learning tools MS2PIP, which predicts fragment ion intensities, and DeepLC, which predicts retention times, improve the database searching based on a large and comprehensive custom database containing both sORFs and alternative ORFs. Finally, the identification of peptides is further enhanced by applying the post-processing semi-supervised learning tool Percolator. 
Applying this workflow, the first peptidomics workflow combined with spectral intensity and retention time predictions, we identified a total of 167 predicted sORF-encoded peptides, 48 of which originate from presumed non-coding locations, next to 401 peptides from known neuropeptide precursors, linked to 66 annotated bioactive neuropeptides from within 22 different families. Additional PEAKS analysis expanded the pool of SEPs on presumed non-coding locations to 84, while an additional 204 peptides completed the list of peptides from neuropeptide precursors. Altogether, this study provides insights into a new robust pipeline that fuses technological advancements from different fields ensuring an improved coverage of the neuropeptidome in the mouse brain.
Affiliation(s)
- Marlies K. R. Peeters
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
| | - Geert Baggerman
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, Flemish Institute for Technological Research, Mol, Belgium
| | - Ralf Gabriels
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- VIB-UGent Center for Medical Biotechnology, Flanders Institute for Biotechnology, Ghent, Belgium
| | - Elise Pepermans
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, Flemish Institute for Technological Research, Mol, Belgium
| | - Gerben Menschaert
- BioBix, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- OHMX.bio, Ghent, Belgium
| | - Kurt Boonen
- Centre for Proteomics, University of Antwerp, Antwerp, Belgium
- Unit Environmental Risk and Health, Flemish Institute for Technological Research, Mol, Belgium
|
40
|
Lill JR, Mathews WR, Rose CM, Schirle M. Proteomics in the pharmaceutical and biotechnology industry: a look to the next decade. Expert Rev Proteomics 2021; 18:503-526. [PMID: 34320887 DOI: 10.1080/14789450.2021.1962300] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/31/2022]
Abstract
INTRODUCTION: Pioneering technologies such as proteomics have helped fuel the biotechnology and pharmaceutical industry with the discovery of novel targets and an intricate understanding of the various activities of therapeutics in vitro and in vivo. The field of proteomics is at an inflection point, where new sensitive technologies are allowing intricate biological pathways to be better understood, and novel biochemical tools are pivoting us into a new era of chemical proteomics and biomarker discovery. In this review, we describe these areas of innovation, discuss where the fields are headed in terms of fueling biotechnological and pharmacological research, and discuss current gaps in the proteomic technology landscape. AREAS COVERED: Single cell sequencing and single molecule sequencing. Chemoproteomics. Biological matrices and clinical samples including biomarkers. Computational tools including instrument control software, data analysis. EXPERT OPINION: Proteomics will likely remain a key technology in the coming decade, but will have to evolve with respect to type and granularity of data, cost and throughput of data generation as well as integration with other technologies to fulfill its promise in drug discovery.
Affiliation(s)
- Jennie R Lill
- Department of Microchemistry, Lipidomics and Next Generation Sequencing, Genentech Inc. 1 DNA Way, South San Francisco, CA, USA
| | - William R Mathews
- OMNI Department, Genentech Inc. 1 DNA Way, South San Francisco, CA, USA
| | - Christopher M Rose
- Department of Microchemistry, Lipidomics and Next Generation Sequencing, Genentech Inc. 1 DNA Way, South San Francisco, CA, USA
| | - Markus Schirle
- Chemical Biology and Therapeutics Department, Novartis Institutes for Biomedical Research, Cambridge, MA, USA
|
41
|
Van Puyvelde B, Van Uytfanghe K, Tytgat O, Van Oudenhove L, Gabriels R, Bouwmeester R, Daled S, Van Den Bossche T, Ramasamy P, Verhelst S, De Clerck L, Corveleyn L, Willems S, Debunne N, Wynendaele E, De Spiegeleer B, Judak P, Roels K, De Wilde L, Van Eenoo P, Reyns T, Cherlet M, Dumont E, Debyser G, t'Kindt R, Sandra K, Gupta S, Drouin N, Harms A, Hankemeier T, Jones DJL, Gupta P, Lane D, Lane CS, El Ouadi S, Vincendet JB, Morrice N, Oehrle S, Tanna N, Silvester S, Hannam S, Sigloch FC, Bhangu-Uhlmann A, Claereboudt J, Anderson NL, Razavi M, Degroeve S, Cuypers L, Stove C, Lagrou K, Martens GA, Deforce D, Martens L, Vissers JPC, Dhaenens M. Cov-MS: A Community-Based Template Assay for Mass-Spectrometry-Based Protein Detection in SARS-CoV-2 Patients. JACS AU 2021. [PMID: 34254058 DOI: 10.1101/2020.11.18.20231688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Rising population density and global mobility are among the reasons why pathogens such as SARS-CoV-2, the virus that causes COVID-19, spread so rapidly across the globe. The policy response to such pandemics will always have to include accurate monitoring of the spread, as this provides one of the few alternatives to total lockdown. However, COVID-19 diagnosis is currently performed almost exclusively by reverse transcription polymerase chain reaction (RT-PCR). Although this is efficient, automatable, and acceptably cheap, reliance on one type of technology comes with serious caveats, as illustrated by recurring reagent and test shortages. We therefore developed an alternative diagnostic test that detects proteolytically digested SARS-CoV-2 proteins using mass spectrometry (MS). We established the Cov-MS consortium, consisting of 15 academic laboratories and several industrial partners to increase applicability, accessibility, sensitivity, and robustness of this kind of SARS-CoV-2 detection. This, in turn, gave rise to the Cov-MS Digital Incubator that allows other laboratories to join the effort, navigate, and share their optimizations and translate the assay into their clinic. As this test relies on viral proteins instead of RNA, it provides an orthogonal and complementary approach to RT-PCR using other reagents that are relatively inexpensive and widely available, as well as orthogonally skilled personnel and different instruments. Data are available via ProteomeXchange with identifier PXD022550.
Affiliation(s)
- Bart Van Puyvelde
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Katleen Van Uytfanghe
- Laboratory of Toxicology, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
| | - Olivier Tytgat
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
- Department of Life Science Technologies, Imec, 3000 Leuven, Belgium
| | | | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Simon Daled
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Pathmanaban Ramasamy
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, 1050 Brussels, Belgium
| | - Sigrid Verhelst
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Laura De Clerck
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Laura Corveleyn
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Sander Willems
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Nathan Debunne
- Drug Quality and Registration Group, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
| | - Evelien Wynendaele
- Drug Quality and Registration Group, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
| | - Bart De Spiegeleer
- Drug Quality and Registration Group, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
| | - Peter Judak
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
| | - Kris Roels
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
| | - Laurie De Wilde
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
| | - Peter Van Eenoo
- Doping Control Laboratory, Department of Diagnostic Sciences, Ghent University, 9000 Ghent, Belgium
| | - Tim Reyns
- Department of Clinical Chemistry, Ghent University Hospital, 9000 Ghent, Belgium
| | - Marc Cherlet
- Department of Pharmacology, Toxicology, and Biochemistry, Faculty of Veterinary Medicine, Ghent University, 9000 Ghent, Belgium
| | - Emmie Dumont
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
| | - Griet Debyser
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
| | - Ruben t'Kindt
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
| | - Koen Sandra
- Research Institute for Chromatography (RIC), 8500 Kortrijk, Belgium
| | - Surya Gupta
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Nicolas Drouin
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, 2311 G Leiden, The Netherlands
| | - Amy Harms
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, 2311 G Leiden, The Netherlands
| | - Thomas Hankemeier
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, 2311 G Leiden, The Netherlands
| | - Donald J L Jones
- Leicester Cancer Research Centre, RKCSB, University of Leicester, U.K., and John and Lucille van Geest Biomarker Facility, Cardiovascular Research Centre, Glenfield Hospital, Leicester LE1 7RH, United Kingdom
| | - Pankaj Gupta
- The Department of Chemical Pathology and Metabolic Diseases, Level 4, Sandringham Building, Leicester Royal Infirmary, Leicester LE1 7RH, United Kingdom
| | - Dan Lane
- The Department of Chemical Pathology and Metabolic Diseases, Level 4, Sandringham Building, Leicester Royal Infirmary, Leicester LE1 7RH, United Kingdom
| | | | - Said El Ouadi
- AB Sciex, Alderley Park, Macclesfield SK10 4TG, United Kingdom
| | | | - Nick Morrice
- AB Sciex, Alderley Park, Macclesfield SK10 4TG, United Kingdom
| | - Stuart Oehrle
- Waters Corporation, Milford, Massachusetts 01757, United States
| | - Nikunj Tanna
- Waters Corporation, Milford, Massachusetts 01757, United States
| | - Steve Silvester
- Alderley Analytical, Alderley Park, Macclesfield SK10 4TG, United Kingdom
| | - Sally Hannam
- Alderley Analytical, Alderley Park, Macclesfield SK10 4TG, United Kingdom
| | | | | | | | - N Leigh Anderson
- SISCAPA Assay Technologies, Inc., Washington, D.C. 20009, United States
| | - Morteza Razavi
- SISCAPA Assay Technologies, Inc., Washington, D.C. 20009, United States
| | - Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Lize Cuypers
- Clinical Department of Laboratory Medicine, UZ Leuven, KU Leuven, 3000 Leuven, Belgium
| | - Christophe Stove
- Laboratory of Toxicology, Department of Bioanalysis, Faculty of Pharmaceutical Sciences, Ghent University, 9000 Ghent, Belgium
| | - Katrien Lagrou
- Clinical Department of Laboratory Medicine, UZ Leuven, KU Leuven, 3000 Leuven, Belgium
| | - Geert A Martens
- AZ Delta Medical Laboratories, AZ Delta General Hospital, 8800 Roeselare, Belgium
| | - Dieter Deforce
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, 9000 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | | | - Maarten Dhaenens
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
|
42
|
Bichmann L, Gupta S, Rosenberger G, Kuchenbecker L, Sachsenberg T, Ewels P, Alka O, Pfeuffer J, Kohlbacher O, Röst H. DIAproteomics: A Multifunctional Data Analysis Pipeline for Data-Independent Acquisition Proteomics and Peptidomics. J Proteome Res 2021; 20:3758-3766. [PMID: 34153189 DOI: 10.1021/acs.jproteome.1c00123] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/15/2022]
Abstract
Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. The main advantages include greater reproducibility and sensitivity and a greater dynamic range compared with data-dependent acquisition (DDA). However, the data analysis is complex and often requires expert knowledge when dealing with large-scale data sets. Here we present DIAproteomics, a multifunctional, automated, high-throughput pipeline implemented in the Nextflow workflow management system that allows one to easily process proteomics and peptidomics DIA data sets on diverse compute infrastructures. The central components are well-established tools such as the OpenSwathWorkflow for the DIA spectral library search and PyProphet for the false discovery rate assessment. In addition, it provides options to generate spectral libraries from existing DDA data and to carry out the retention time and chromatogram alignment. The output includes annotated tables and diagnostic visualizations from the statistical postprocessing and computation of fold-changes across pairwise conditions, predefined in an experimental design. DIAproteomics is well documented open-source software and is available under a permissive license to the scientific community at https://www.openms.de/diaproteomics/.
Affiliation(s)
- Leon Bichmann
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
- Institute for Cell Biology, Department of Immunology, University of Tübingen, Tübingen 72076, Germany
| | - Shubham Gupta
- Donnelly Center for Biomolecular Research, University of Toronto, Toronto, Ontario ON M5S 3E1, Canada
| | - George Rosenberger
- Department of Systems Biology, Columbia University, New York, New York 10032, United States
| | - Leon Kuchenbecker
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
| | - Timo Sachsenberg
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
| | - Phil Ewels
- Science for Life Laboratory (SciLifeLab), Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
| | - Oliver Alka
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
| | - Julianus Pfeuffer
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
- Institute for Informatics, Freie Universität Berlin, Berlin 14195, Germany
- Zuse Institute Berlin, Berlin 14195, Germany
| | - Oliver Kohlbacher
- Department of Computer Science, Applied Bioinformatics, University of Tübingen, Tübingen 72076, Germany
- Institute for Biological and Medical Informatics, University of Tübingen, Tübingen 72076, Germany
- Institute for Translational Bioinformatics, University Hospital Tübingen, Tübingen 72076, Germany
| | - Hannes Röst
- Donnelly Center for Biomolecular Research, University of Toronto, Toronto, Ontario ON M5S 3E1, Canada
|
43
|
Salz R, Bouwmeester R, Gabriels R, Degroeve S, Martens L, Volders PJ, 't Hoen PAC. Personalized Proteome: Comparing Proteogenomics and Open Variant Search Approaches for Single Amino Acid Variant Detection. J Proteome Res 2021; 20:3353-3364. [PMID: 33998808 PMCID: PMC8280751 DOI: 10.1021/acs.jproteome.1c00264] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/02/2021] [Indexed: 12/30/2022]
Abstract
Discovery of variant peptides such as a single amino acid variant (SAAV) in shotgun proteomics data is essential for personalized proteomics. Both the resolution of shotgun proteomics methods and the search engines have improved dramatically, allowing for confident identification of SAAV peptides. However, it is not yet known if these methods are truly successful in accurately identifying SAAV peptides without prior genomic information in the search database. We studied this in unprecedented detail by exploiting publicly available long-read RNA sequences and shotgun proteomics data from the gold standard reference cell line NA12878. Searching spectra from this cell line with the state-of-the-art open modification search engine ionbot against carefully curated search databases resulted in 96.7% false-positive SAAVs and an 85% lower true positive rate than searching with peptide search databases that incorporate prior genetic information. While adding genetic variants to the search database remains indispensable for correct peptide identification, inclusion of long-read RNA sequences in the search database contributes only 0.3% new peptide identifications. These findings reveal the differences in SAAV detection that result from various approaches, providing guidance to researchers studying SAAV peptides and developers of peptide spectrum identification tools.
Affiliation(s)
- Renee Salz
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
| | - Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
| | - Sven Degroeve
- VIB-UGent Center for Medical Biotechnology VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
| | - Pieter-Jan Volders
- VIB-UGent Center for Medical Biotechnology VIB, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Technologiepark-Zwijnaarde 75, 9052 Ghent, Belgium
| | - Peter A C 't Hoen
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, The Netherlands
|
44
|
Zolg DP, Gessulat S, Paschke C, Graber M, Rathke-Kuhnert M, Seefried F, Fitzemeier K, Berg F, Lopez-Ferrer D, Horn D, Henrich C, Huhmer A, Delanghe B, Frejno M. INFERYS rescoring: Boosting peptide identifications and scoring confidence of database search results. Rapid Commun Mass Spectrom 2021:e9128. [PMID: 34015160 DOI: 10.1002/rcm.9128] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 01/20/2021] [Revised: 04/14/2021] [Accepted: 05/17/2021] [Indexed: 06/12/2023]
Abstract
Database search engines for bottom-up proteomics largely ignore peptide fragment ion intensities during the automated scoring of tandem mass spectra against protein databases. Recent advances in deep learning allow the accurate prediction of peptide fragment ion intensities. Using these predictions to calculate additional intensity-based scores helps to overcome this drawback. Here, we describe a processing workflow termed INFERYS™ rescoring for the intensity-based rescoring of Sequest HT search engine results in Thermo Scientific™ Proteome Discoverer™ 2.5 software. The workflow is based on the deep learning platform INFERYS capable of predicting fragment ion intensities, which runs on personal computers without the need for graphics processing units. This workflow calculates intensity-based scores comparing peptide spectrum matches from Sequest HT and predicted spectra. Resulting scores are combined with classical search engine scores for input to the false discovery rate estimation tool Percolator. We demonstrate the merits of this approach by analyzing a classical HeLa standard sample and exemplify how this workflow leads to a better separation of target and decoy identifications, in turn resulting in increased peptide spectrum match, peptide and protein identification numbers. On an immunopeptidome dataset, this workflow leads to a 50% increase in identified peptides, emphasizing the advantage of intensity-based scores when analyzing low-intensity spectra or analytes with very similar physicochemical properties that require vast search spaces. Overall, the end-to-end integration of INFERYS rescoring enables simple and easy access to a powerful enhancement to classical database search engines, promising a deeper, more confident and more comprehensive analysis of proteomic data from any organism by unlocking the intensity dimension of tandem mass spectra for identification and more confident scoring.
Affiliation(s)
| | | | | | | | | | | | | | - Frank Berg
- Thermo Fisher Scientific (Bremen) GmbH, Bremen, Germany
| | | | - David Horn
- Thermo Fisher Scientific, San Jose, CA, USA
|
45
|
Schmidt T, Samaras P, Dorfer V, Panse C, Kockmann T, Bichmann L, van Puyvelde B, Perez-Riverol Y, Deutsch EW, Kuster B, Wilhelm M. Universal Spectrum Explorer: A Standalone (Web-)Application for Cross-Resource Spectrum Comparison. J Proteome Res 2021; 20:3388-3394. [PMID: 33970638 DOI: 10.1021/acs.jproteome.1c00096] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 01/22/2023]
Abstract
Here, we present the Universal Spectrum Explorer (USE), a web-based tool based on IPSA for cross-resource (peptide) spectrum visualization and comparison (https://www.proteomicsdb.org/use/). Mass spectra under investigation can be either provided manually by the user (table format) or automatically retrieved from online repositories supporting access to spectral data via the universal spectrum identifier (USI), or requested from other resources and services implementing a newly designed REST interface. As a proof of principle, we implemented such an interface in ProteomicsDB thereby allowing the retrieval of spectra acquired within the ProteomeTools project or real-time prediction of tandem mass spectra from the deep learning framework Prosit. Annotated mirror spectrum plots can be exported from the USE as editable scalable high-quality vector graphics. The USE was designed and implemented with minimal external dependencies allowing local usage and integration into other web sites (https://github.com/kusterlab/universal_spectrum_explorer).
Affiliation(s)
- Tobias Schmidt
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising 85354, Germany
| | - Patroklos Samaras
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising 85354, Germany
| | - Viktoria Dorfer
- Bioinformatics Research Group, University of Applied Sciences Upper Austria, 4232 Hagenberg im Mühlkreis, Austria
| | - Christian Panse
- Functional Genomics Center Zurich, Swiss Federal Institute of Technology in Zurich/University of Zurich, 8092 Zürich, Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge - Batiment Amphipole, 1015 Lausanne, Switzerland
| | - Tobias Kockmann
- Functional Genomics Center Zurich, Swiss Federal Institute of Technology in Zurich/University of Zurich, 8092 Zürich, Switzerland
| | - Leon Bichmann
- Applied Bioinformatics, Department of Computer Science, University Tübingen, 72074 Tübingen, Germany
| | - Bart van Puyvelde
- ProGenTomics, Laboratory of Pharmaceutical Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States of America
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising 85354, Germany
- Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich (TUM), Freising 85354, Germany
| | - Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising 85354, Germany
- Computational Mass Spectrometry, Technical University of Munich, Freising 85354, Germany
|
46
|
Daled S, Willems S, Van Puyvelde B, Corveleyn L, Verhelst S, De Clerck L, Deforce D, Dhaenens M. Histone Sample Preparation for Bottom-Up Mass Spectrometry: A Roadmap to Informed Decisions. Proteomes 2021; 9:17. [PMID: 33919160 PMCID: PMC8167631 DOI: 10.3390/proteomes9020017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/26/2021] [Revised: 04/08/2021] [Accepted: 04/19/2021] [Indexed: 11/16/2022]
Abstract
Histone-based chromatin organization enabled eukaryotic genome complexity. This epigenetic control mechanism allowed for the differentiation of stable gene-expression and thus the very existence of multicellular organisms. This existential role in biology makes histones one of the most complexly modified molecules in the biotic world, which makes these key regulators notoriously hard to analyze. We here provide a roadmap to enable fast and informed selection of a bottom-up mass spectrometry sample preparation protocol that matches a specific research question. We therefore propose a two-step assessment procedure: (i) visualization of the coverage that is attained for a given workflow and (ii) direct alignment between runs to assess potential pitfalls at the ion level. To illustrate the applicability, we compare four different sample preparation protocols while adding a new enzyme to the toolbox, i.e., RgpB (GingisREX®, Genovis, Lund, Sweden), an endoproteinase that selectively and efficiently cleaves at the C-terminal end of arginine residues. Raw data are available via ProteomeXchange with identifier PXD024423.
Affiliation(s)
- Simon Daled
- Laboratory of Pharmaceutical Biotechnology/ProGenTomics, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
| | - Sander Willems
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
| | - Bart Van Puyvelde
- Laboratory of Pharmaceutical Biotechnology/ProGenTomics, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
| | - Laura Corveleyn
- Laboratory of Pharmaceutical Biotechnology/ProGenTomics, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
| | - Sigrid Verhelst
- Laboratory of Pharmaceutical Biotechnology/ProGenTomics, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
| | - Laura De Clerck
- Laboratory of Pharmaceutical Biotechnology/ProGenTomics, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
| | - Dieter Deforce
- Laboratory of Pharmaceutical Biotechnology/ProGenTomics, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
| | - Maarten Dhaenens
- Laboratory of Pharmaceutical Biotechnology/ProGenTomics, Ghent University, Ottergemsesteenweg 460, 9000 Ghent, Belgium
|
47
|
Chen ZL, Mao PZ, Zeng WF, Chi H, He SM. pDeepXL: MS/MS Spectrum Prediction for Cross-Linked Peptide Pairs by Deep Learning. J Proteome Res 2021; 20:2570-2582. [PMID: 33821641 DOI: 10.1021/acs.jproteome.0c01004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
In cross-linking mass spectrometry, the identification of cross-linked peptide pairs heavily relies on the ability of a database search engine to measure the similarities between experimental and theoretical MS/MS spectra. However, the lack of accurate ion intensities in theoretical spectra impairs the performance of search engines, in particular, on proteome scales. Here we introduce pDeepXL, a deep neural network to predict MS/MS spectra of cross-linked peptide pairs. To train pDeepXL, we used the transfer-learning technique because it facilitated the training with limited benchmark data of cross-linked peptide pairs. Test results on more than ten data sets showed that pDeepXL accurately predicted the spectra of both noncleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson's r values higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have Pearson's r values higher than 0.9). pDeepXL also achieved the accurate prediction on unseen data sets using an online fine-tuning technique. Lastly, integrating pDeepXL into a database search engine increased the number of identified cross-link spectra by 18% on average.
Affiliation(s)
- Zhen-Lin Chen
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Peng-Zhi Mao
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Wen-Feng Zeng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Hao Chi
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Si-Min He
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100049, China
|
48
|
Verbruggen S, Gessulat S, Gabriels R, Matsaroki A, Van de Voorde H, Kuster B, Degroeve S, Martens L, Van Criekinge W, Wilhelm M, Menschaert G. Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics. Mol Cell Proteomics 2021; 20:100076. [PMID: 33823297 PMCID: PMC8214147 DOI: 10.1016/j.mcpro.2021.100076] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 11/27/2020] [Revised: 03/04/2021] [Accepted: 03/25/2021] [Indexed: 11/17/2022]
Abstract
Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.
- First proteogenomics with PSM rescoring using machine learning–predicted spectra
- Demonstrated on both ribosome profiling and nanopore RNA-Seq–derived databases
- Rescoring leads to elevated stringency and increased identification rates
- Rescoring compensates for the search space size issues in proteogenomics
Affiliation(s)
- Steven Verbruggen
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium
| | - Siegfried Gessulat
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Ralf Gabriels
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
| | | | | | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Sven Degroeve
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
| | - Lennart Martens
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
| | - Wim Van Criekinge
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
| | - Mathias Wilhelm
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Gerben Menschaert
- BioBix, Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium; OHMX.bio, Ghent, Belgium.
|
49
|
Wilburn DB, Richards AL, Swaney DL, Searle BC. CIDer: A Statistical Framework for Interpreting Differences in CID and HCD Fragmentation. J Proteome Res 2021; 20:1951-1965. [PMID: 33729787 DOI: 10.1021/acs.jproteome.0c00964] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Abstract
Library searching is a powerful technique for detecting peptides using either data independent or data dependent acquisition. While both large-scale spectrum library curators and deep learning prediction approaches have focused on beam-type CID fragmentation (HCD), resonance CID fragmentation remains a popular technique. Here we demonstrate an approach to model the differences between HCD and CID spectra, and present a software tool, CIDer, for converting libraries between the two fragmentation methods. We demonstrate that just using a combination of simple linear models and basic principles of peptide fragmentation, we can explain up to 43% of the variation between ions fragmented by HCD and CID across an array of collision energy settings. We further show that in some circumstances, searching converted CID libraries can detect more peptides than searching existing CID libraries or libraries of machine learning predictions from FASTA databases. These results suggest that leveraging information in existing libraries by converting between HCD and CID libraries may be an effective interim solution while large-scale CID libraries are being developed.
Affiliation(s)
- Damien B Wilburn
- Institute for Systems Biology, Seattle, Washington 98109, United States
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Alicia L Richards
- Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California 94158, United States
- J. David Gladstone Institutes, San Francisco, California 94158, United States
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California 94158, United States
| | - Danielle L Swaney
- Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California 94158, United States
- J. David Gladstone Institutes, San Francisco, California 94158, United States
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California 94158, United States
| | - Brian C Searle
- Institute for Systems Biology, Seattle, Washington 98109, United States
|
50
|
Willems P, Fels U, Staes A, Gevaert K, Van Damme P. Use of Hybrid Data-Dependent and -Independent Acquisition Spectral Libraries Empowers Dual-Proteome Profiling. J Proteome Res 2021; 20:1165-1177. [PMID: 33467856 PMCID: PMC7871992 DOI: 10.1021/acs.jproteome.0c00350] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/24/2020] [Indexed: 01/01/2023]
Abstract
In the context of bacterial infections, it is imperative that physiological responses can be studied in an integrated manner, meaning a simultaneous analysis of both the host and the pathogen responses. To improve the sensitivity of detection, data-independent acquisition (DIA)-based proteomics was found to outperform data-dependent acquisition (DDA) workflows in identifying and quantifying low-abundant proteins. Here, by making use of representative bacterial pathogen/host proteome samples, we report an optimized hybrid library generation workflow for DIA mass spectrometry relying on the use of data-dependent and in silico-predicted spectral libraries. When compared to searching DDA experiment-specific libraries only, the use of hybrid libraries significantly improved peptide detection to an extent suggesting that infection-relevant host-pathogen conditions could be profiled in sufficient depth without the need of a priori bacterial pathogen enrichment when studying the bacterial proteome. Proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD017904 and PXD017945.
Affiliation(s)
- Patrick Willems
- Department of Biochemistry and Microbiology, Ghent University, Ghent 9000, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9000, Belgium
- VIB-UGent Center for Plant Systems Biology, Ghent 9052, Belgium
| | - Ursula Fels
- Department of Biochemistry and Microbiology, Ghent University, Ghent 9000, Belgium
- VIB-UGent Center for Medical Biotechnology, Ghent 9052, Belgium
| | - An Staes
- VIB-UGent Center for Medical Biotechnology, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9000, Belgium
| | - Kris Gevaert
- VIB-UGent Center for Medical Biotechnology, Ghent 9052, Belgium
- Department of Biomolecular Medicine, Ghent University, Ghent 9000, Belgium
| | - Petra Van Damme
- Department of Biochemistry and Microbiology, Ghent University, Ghent 9000, Belgium
|