1
Wang K, Zhu M, Boulila W, Driss M, Gadekallu TR, Chen CM, Wang L, Kumari S, Yiu SM. SeqNovo: De Novo Peptide Sequencing Prediction in IoMT via Seq2Seq. IEEE J Biomed Health Inform 2025; 29:2377-2387. [PMID: 37792659 DOI: 10.1109/jbhi.2023.3321780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/06/2023]
Abstract
In the Internet of Medical Things (IoMT), de novo peptide sequencing prediction is one of the most important techniques for the fields of disease prediction, diagnosis, and treatment. Recently, deep-learning-based peptide sequencing prediction has become a new trend. However, most popular deep learning models for peptide sequencing prediction suffer from poor interpretability and a poor ability to capture long-range dependencies. To solve these issues, we propose a model named SeqNovo, which has the encoding-decoding structure of sequence to sequence (Seq2Seq), the highly nonlinear properties of the multilayer perceptron (MLP), and the ability of the attention mechanism to capture long-range dependencies. SeqNovo uses the MLP to improve feature extraction and utilizes the attention mechanism to discover key information. A series of experiments has been conducted to show that SeqNovo is superior to the Seq2Seq benchmark model, DeepNovo. SeqNovo improves both the accuracy and interpretability of the predictions, which is expected to support more related research.
2
Ranff T, Dennison M, Bédorf J, Schulze S, Zinn N, Bantscheff M, van Heugten JJRM, Fufezan C. PeptideForest: Semisupervised Machine Learning Integrating Multiple Search Engines for Peptide Identification. J Proteome Res 2025; 24:929-939. [PMID: 39840643 DOI: 10.1021/acs.jproteome.4c00686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2025]
Abstract
The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences, also known as peptide spectrum matches. In recent years novel algorithms have pushed the assignment to new heights; unfortunately, different algorithms come with different strengths and weaknesses and choosing the appropriate algorithm poses a challenge for the user. Here we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of multiple algorithms to train a random forest classifier to alleviate that issue. Additionally, PeptideForest increases the number of peptide-to-spectrum matches that exhibit a q-value lower than 1% by 25.2 ± 1.6% compared to MS-GF+ data on samples containing mixed HEK and Escherichia coli proteomes. However, an increase in quantity does not necessarily reflect an increase in quality and this is why we devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. Thereby, we could show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality and as such PeptideForest offers a possibility to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.
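As a reading aid for the q-value threshold mentioned above, here is a minimal sketch of the generic target-decoy q-value calculation. It is not PeptideForest's own procedure, which the abstract does not describe. The function name `q_values` and the `(score, is_decoy)` input layout are assumptions made for illustration, and ties between scores are not treated specially.

```python
def q_values(psms):
    """Generic target-decoy q-values (illustrative sketch, not PeptideForest code).

    psms: list of (score, is_decoy) tuples; a higher score means a better match.
    Returns a list of q-values in the same order as the input.
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # Estimated FDR at each rank: decoys accepted / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or at any more permissive rank.
    q = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


# Hypothetical toy input: three target PSMs and two decoy PSMs.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
accepted = sum(1 for (s, d), qv in zip(example, q_values(example))
               if not d and qv <= 0.01)
```

Counting target PSMs with a q-value at or below 0.01, as in the last line, corresponds to the "PSMs below 1% q-value" figure that the abstract compares between methods.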
Affiliation(s)
- Tristan Ranff
  - Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, 69120 Heidelberg, Germany
  - Cellzome, A GSK Company, Heidelberg 69117, Germany
  - GSK/RDDT/QEL/DE─Data Streams and Operation, Heidelberg 69117, Germany
- Jeroen Bédorf
  - Minds.ai, Santa Cruz, California 95060, United States
- Stefan Schulze
  - Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
  - Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, New York 14608, United States
- Nico Zinn
  - Cellzome, A GSK Company, Heidelberg 69117, Germany
- Christian Fufezan
  - Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, 69120 Heidelberg, Germany
  - Cellzome, A GSK Company, Heidelberg 69117, Germany
  - GSK/RDDT/QEL/DE─Data Streams and Operation, Heidelberg 69117, Germany
3
Seo J, Choi S, Paek E. NovoRank: Refinement for De Novo Peptide Sequencing Based on Spectral Clustering and Deep Learning. J Proteome Res 2025; 24:903-910. [PMID: 39739539 DOI: 10.1021/acs.jproteome.4c00300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2025]
Abstract
De novo peptide sequencing is a valuable technique in mass-spectrometry-based proteomics, as it deduces peptide sequences directly from tandem mass spectra without relying on sequence databases. This database-independent method, however, relies solely on imperfect scoring functions that often lead to erroneous peptide identifications. To boost correct identification, we present NovoRank, a postprocessing tool that employs spectral clustering and machine learning to assign more plausible peptide sequences to spectra. Prior to de novo peptide sequencing, spectral clustering is applied to group similar spectra under the assumption that they originated from the same peptide species. NovoRank then employs a deep learning model, incorporating both cluster-derived proteomic features and individual spectrum characteristics, to rerank the candidate peptides produced by de novo peptide sequencing. Our results show that NovoRank significantly enhances the performance of various de novo peptide sequencing tools, increasing both recall and precision by 0.020 to 0.080 at the peptide-spectrum match (PSM) level. Notably, NovoRank achieves a recall as high as 0.830 for Casanovo at the PSM level. The source code of NovoRank is freely available at https://github.com/HanyangBISLab/NovoRank and is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
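To make the PSM-level recall and precision figures above concrete, the following sketch uses one common convention: precision is the number of correct predictions divided by the number of spectra that received a prediction, and recall is the number of correct predictions divided by all spectra. The abstract does not spell out NovoRank's exact definitions, so the function name, the use of `None` for "no prediction", and the choice to treat the isobaric residues I and L as equal are all assumptions.

```python
def psm_precision_recall(predicted, truth):
    """PSM-level precision and recall under one common convention (sketch).

    predicted: list of predicted peptide strings, or None where no prediction was made.
    truth:     list of ground-truth peptide strings, same length as predicted.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    def normalize(seq):
        # Assumption: I and L are isobaric, so they are treated as the same residue.
        return seq.replace("I", "L")

    n_predicted = sum(p is not None for p in predicted)
    n_correct = sum(
        p is not None and normalize(p) == normalize(t)
        for p, t in zip(predicted, truth)
    )
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy example: four spectra, three predictions, two of them correct.
precision, recall = psm_precision_recall(
    ["PEPTLDE", None, "AAAK", "CCCR"],
    ["PEPTIDE", "GGGK", "AAAK", "CCCK"],
)
```

Under this convention, a reranker can raise both numbers at once only by replacing wrong top-ranked candidates with correct ones, which is the effect the abstract reports.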
Affiliation(s)
- Jangho Seo
  - Department of Artificial Intelligence, Hanyang University, Seoul 04763, Republic of Korea
- Seunghyuk Choi
  - Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Eunok Paek
  - Department of Artificial Intelligence, Hanyang University, Seoul 04763, Republic of Korea
  - Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
4
Van Den Bossche T, Beslic D, van Puyenbroeck S, Suomi T, Holstein T, Martens L, Elo LL, Muth T. Metaproteomics Beyond Databases: Addressing the Challenges and Potentials of De Novo Sequencing. Proteomics 2025:e202400321. [PMID: 39888246 DOI: 10.1002/pmic.202400321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 01/09/2025] [Accepted: 01/10/2025] [Indexed: 02/01/2025]
Abstract
Metaproteomics enables the large-scale characterization of microbial community proteins, offering crucial insights into their taxonomic composition, functional activities, and interactions within their environments. By directly analyzing proteins, metaproteomics offers insights into community phenotypes and the roles individual members play in diverse ecosystems. Although database-dependent search engines are commonly used for peptide identification, they rely on pre-existing protein databases, which can be limiting for complex, poorly characterized microbiomes. De novo sequencing presents a promising alternative, which derives peptide sequences directly from mass spectra without requiring a database. Over time, this approach has evolved from manual annotation to advanced graph-based, tag-based, and deep learning-based methods, significantly improving the accuracy of peptide identification. This Viewpoint explores the evolution, advantages, limitations, and future opportunities of de novo sequencing in metaproteomics. We highlight recent technological advancements that have improved its potential for detecting unsequenced species and for providing deeper functional insights into microbial communities.
Affiliation(s)
- Tim Van Den Bossche
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Denis Beslic
  - Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute, Berlin, Germany
- Sam van Puyenbroeck
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Tomi Suomi
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Tanja Holstein
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
  - Data Competence Center MF 2, Robert Koch Institute, Berlin, Germany
- Lennart Martens
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - Institute of Biomedicine, University of Turku, Turku, Finland
- Thilo Muth
  - Data Competence Center MF 2, Robert Koch Institute, Berlin, Germany
5
Zhang X, Ling T, Jin Z, Xu S, Gao Z, Sun B, Qiu Z, Wei J, Dong N, Wang G, Wang G, Li L, Abdul-Mageed M, Lakshmanan LVS, He F, Ouyang W, Chang C, Sun S. π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing. Nat Commun 2025; 16:267. [PMID: 39747823 PMCID: PMC11695716 DOI: 10.1038/s41467-024-55021-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 11/28/2024] [Indexed: 01/04/2025] Open
Abstract
Peptide sequencing via tandem mass spectrometry (MS/MS) is essential in proteomics. Unlike traditional database searches, deep learning excels at de novo peptide sequencing, even for peptides missing from existing databases. Current deep learning models often rely on autoregressive generation, which suffers from error accumulation and slow inference speeds. In this work, we introduce π-PrimeNovo, a non-autoregressive Transformer-based model for peptide sequencing. With our architecture design and a CUDA-enhanced decoding module for precise mass control, π-PrimeNovo achieves significantly higher accuracy and up to 89x faster inference than state-of-the-art methods, making it ideal for large-scale applications like metaproteomics. Additionally, it excels in phosphopeptide mining and detecting low-abundance post-translational modifications (PTMs), marking a substantial advance in peptide sequencing with broad potential in biological research.
Affiliation(s)
- Xiang Zhang
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
  - University of British Columbia, Vancouver, BC, Canada
- Tianze Ling
  - Tsinghua University, Beijing, China
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Zhi Jin
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Sheng Xu
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- Zhiqiang Gao
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Boyan Sun
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Zijie Qiu
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- Jiaqi Wei
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
  - Zhejiang University, Zhejiang, China
- Nanqing Dong
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Guangshuai Wang
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- Guibin Wang
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Leyuan Li
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Muhammad Abdul-Mageed
  - University of British Columbia, Vancouver, BC, Canada
  - MBZUAI, Abu Dhabi, United Arab Emirates
- Fuchu He
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
  - International Academy of Phronesis Medicine (Guangdong), Guangdong, Guangzhou, China
- Wanli Ouyang
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Cheng Chang
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Siqi Sun
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
6
Takan S, Allmer J. De Novo Sequencing of Peptides from Tandem Mass Spectra and Applications in Proteogenomics. Methods Mol Biol 2025; 2859:1-19. [PMID: 39436593 DOI: 10.1007/978-1-0716-4152-1_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
Changes in protein expression are hallmarks of development and disease. Protein expression can be established qualitatively and quantitatively using mass spectrometry (MS). Samples are prepared, and proteins are extracted and then analyzed using MS and MS/MS. The resulting spectra need to be processed computationally to assign peptide-spectrum matches. Database searches employ sequence databases or spectral libraries for matching possible peptides with the measured spectra. This route is well established but fails when peptides are not found in sequence repositories. In this case, de novo sequencing of MS/MS spectra can be employed. Many computational algorithms that establish the peptide sequence from the MS/MS spectrum alone are available. While de novo sequencing assigns a sequence to an MS/MS spectrum, this assignment can be used in further processes for genome annotation. For example, novel exons can be assigned, known exons can be extended, and splice sites can be validated at the protein level. We compiled an extensive list of such algorithms, grouped them, and discussed the selected approaches. We also provide a roadmap of how de novo sequencing can enter mainstream proteogenomic analysis. In the future, de novo predictions can be added to sample-specific protein databases, including RNA-seq translations. These enriched databases can then be used for proteogenomics studies with existing pipelines.
Affiliation(s)
- Savas Takan
  - Department of Artificial Intelligence and Data Engineering, Faculty of Engineering, Ankara University, Ankara, Turkey
- Jens Allmer
  - Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim adR., Germany
7
Wen B, Noble WS. A multi-species benchmark for training and validating mass spectrometry proteomics machine learning models. Sci Data 2024; 11:1207. [PMID: 39516479 PMCID: PMC11549408 DOI: 10.1038/s41597-024-04068-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 11/01/2024] [Indexed: 11/16/2024] Open
Abstract
Training machine learning models for tasks such as de novo sequencing or spectral clustering requires large collections of confidently identified spectra. Here we describe a dataset of 2.8 million high-confidence peptide-spectrum matches derived from nine different species. The dataset is based on a previously described benchmark but has been re-processed to ensure consistent data quality and enforce separation of training and test peptides.
Affiliation(s)
- Bo Wen
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
8
Melendez C, Sanders J, Yilmaz M, Bittremieux W, Fondrie WE, Oh S, Noble WS. Accounting for Digestion Enzyme Bias in Casanovo. J Proteome Res 2024; 23:4761-4769. [PMID: 39213590 DOI: 10.1021/acs.jproteome.4c00422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
A key parameter of any bottom-up proteomics mass spectrometry experiment is the identity of the enzyme that is used to digest proteins in the sample into peptides. The Casanovo de novo sequencing model was trained using data that was generated with trypsin digestion; consequently, the model prefers to predict peptides that end with the amino acids "K" or "R". This bias is desirable when Casanovo is used to analyze data that was also generated using trypsin but can be problematic if the data was generated using some other digestion enzyme. In this work, we modify Casanovo to take as input the identity of the digestion enzyme alongside each observed spectrum. We then train Casanovo with data generated by using several different enzymes, and we demonstrate that the resulting model successfully learns to capture enzyme-specific behavior. However, we find, surprisingly, that this new model does not yield a significant improvement in sequencing accuracy relative to a model trained without enzyme information but using the same training set. This observation may have important implications for future attempts to make use of experimental metadata in de novo sequencing models.
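The tryptic bias described above can be checked on any list of predicted peptides with a few lines of code. The sketch below simply measures the fraction of predictions whose C-terminal residue is K or R. The function name `tryptic_fraction` is hypothetical, the input is assumed to be plain one-letter amino acid strings without modification annotations, and nothing here reflects Casanovo's actual code or output format.

```python
def tryptic_fraction(peptides):
    """Fraction of peptides ending in K or R (illustrative sketch).

    peptides: iterable of plain amino acid strings (assumption: no
    modification annotations such as mass shifts are embedded in them).
    """
    peptides = [p for p in peptides if p]  # skip empty or missing predictions
    if not peptides:
        return 0.0
    return sum(p[-1] in "KR" for p in peptides) / len(peptides)


# Hypothetical predictions: two end in K/R, two do not.
fraction = tryptic_fraction(["PEPTIDEK", "ACDEFR", "LLLLE", "GGGW"])
```

A fraction close to 1 on data that was not digested with trypsin would be one symptom of the bias the authors set out to address.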
Affiliation(s)
- Carlo Melendez
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Justin Sanders
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Melih Yilmaz
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Wout Bittremieux
  - Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Sewoong Oh
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
9
Tran NH, Qiao R, Mao Z, Pan S, Zhang Q, Li W, Xin L, Li M, Shan B. NovoBoard: A Comprehensive Framework for Evaluating the False Discovery Rate and Accuracy of De Novo Peptide Sequencing. Mol Cell Proteomics 2024; 23:100849. [PMID: 39321875 PMCID: PMC11532909 DOI: 10.1016/j.mcpro.2024.100849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 08/27/2024] [Accepted: 09/18/2024] [Indexed: 09/27/2024] Open
Abstract
De novo peptide sequencing is one of the most fundamental research areas in mass spectrometry-based proteomics. Many methods have often been evaluated using a couple of simple metrics that do not fully reflect their overall performance. Moreover, there has not been an established method to estimate the false discovery rate (FDR) of de novo peptide-spectrum matches. Here we propose NovoBoard, a comprehensive framework to evaluate the performance of de novo peptide-sequencing methods. The framework consists of diverse benchmark datasets (including tryptic, nontryptic, immunopeptidomics, and different species) and a standard set of accuracy metrics to evaluate the fragment ions, amino acids, and peptides of the de novo results. More importantly, a new approach is designed to evaluate de novo peptide-sequencing methods on target-decoy spectra and to estimate and validate their FDRs. Our FDR estimation provides valuable information to assess the reliability of new peptides identified by de novo sequencing tools, especially when no ground-truth information is available to evaluate their accuracy. The FDR estimation can also be used to evaluate the capability of de novo peptide sequencing tools to distinguish between de novo peptide-spectrum matches and random matches. Our results thoroughly reveal the strengths and weaknesses of different de novo peptide-sequencing methods and how their performances depend on specific applications and the types of data.
Affiliation(s)
- Rui Qiao
  - Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
- Zeping Mao
  - Bioinformatics Solutions Inc, Waterloo, Ontario, Canada; David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada
- Shengying Pan
  - Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
- Qing Zhang
  - Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
- Wenting Li
  - Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
- Lei Xin
  - Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
- Ming Li
  - David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada
- Baozhen Shan
  - Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
10
Zeng WF, Yan G, Zhao HH, Liu C, Cao W. Uncovering missing glycans and unexpected fragments with pGlycoNovo for site-specific glycosylation analysis across species. Nat Commun 2024; 15:8055. [PMID: 39277585 PMCID: PMC11401942 DOI: 10.1038/s41467-024-52099-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 08/23/2024] [Indexed: 09/17/2024] Open
Abstract
Precision mapping of site-specific glycans using mass spectrometry is vital in glycoproteomics. However, the diversity of glycan compositions across species often exceeds database capacity, hindering the identification of rare glycans. Here, we introduce pGlycoNovo, a software within the pGlyco3 software environment, which employs a glycan-first, full-range Y-ion dynamic searching strategy. pGlycoNovo enables de novo identification of intact glycopeptides with rare glycans by considering all possible monosaccharide combinations, expanding the glycan search space 16- to 1000-fold compared to non-open search methods while maintaining accuracy, sensitivity, and speed. Reanalysis of SARS-CoV-2 spike protein glycosylation data revealed 230 additional site-specific N-glycans and 30 previously unreported O-glycans. pGlycoNovo demonstrated high complementarity to six other tools and superior search speed. It enables characterization of site-specific N-glycosylation across five evolutionarily distant species, contributing to a dataset of 32,549 site-specific glycans on 4602 proteins, including 2409 site-specific rare glycans, and uncovering unexpected glycan fragments.
Affiliation(s)
- Wen-Feng Zeng
  - Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
  - Center for Infectious Disease Research & School of Engineering, Westlake University, Hangzhou, China
- Guoquan Yan
  - Shanghai Fifth People's Hospital and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
  - NHC Key Laboratory of Glycoconjugates Research, Fudan University, Shanghai, China
- Huan-Huan Zhao
  - Shanghai Fifth People's Hospital and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
  - NHC Key Laboratory of Glycoconjugates Research, Fudan University, Shanghai, China
- Chao Liu
  - Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
  - School of Engineering Medicine & School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Weiqian Cao
  - Shanghai Fifth People's Hospital and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
  - NHC Key Laboratory of Glycoconjugates Research, Fudan University, Shanghai, China
11
Tariq U, Saeed F. Predicting peptide properties from mass spectrometry data using deep attention-based multitask network and uncertainty quantification. bioRxiv 2024:2024.08.21.609035. [PMID: 39229185 PMCID: PMC11370541 DOI: 10.1101/2024.08.21.609035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
Database search algorithms reduce the number of potential candidate peptides against which scoring needs to be performed by filtering on a single property (i.e., mass). While useful, filtering based on one property may lead to exclusion of non-abundant spectra and uncharacterized peptides, potentially exacerbating the streetlight effect. Here we present ProteoRift, a novel attention-based multitask deep network, which can predict multiple peptide properties (length, missed cleavages, and modification status) directly from spectra. We demonstrate that ProteoRift can predict these properties with up to 97% accuracy, resulting in search-space reduction by more than 90%. As a result, our end-to-end pipeline is shown to exhibit 8x to 12x speedups with peptide deduction accuracy comparable to algorithmic techniques. We also formulate two uncertainty estimation metrics, which can distinguish between in-distribution and out-of-distribution data (ROC-AUC 0.99) and predict high-scoring mass spectra against the correct peptide (ROC-AUC 0.94). These models and metrics are integrated in an end-to-end ML pipeline available at https://github.com/pcdslab/ProteoRift.
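The single-property mass filter that the abstract takes as its starting point can be sketched as follows. This illustrates conventional precursor-mass filtering only, not ProteoRift itself. The names `peptide_mass` and `filter_by_mass` and the 10 ppm default tolerance are assumptions, and the code handles unmodified peptides only, using standard monoisotopic residue masses.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added for the peptide termini


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def filter_by_mass(candidates, precursor_mass, tol_ppm=10.0):
    """Keep candidates whose mass lies within tol_ppm of the precursor mass.

    tol_ppm is a hypothetical default; real searches choose it per instrument.
    """
    tol = precursor_mass * tol_ppm * 1e-6
    return [p for p in candidates if abs(peptide_mass(p) - precursor_mass) <= tol]


# Toy example: only the two peptides with the composition {A, G} survive.
kept = filter_by_mass(["GG", "AG", "GA", "AA"], peptide_mass("AG"))
```

The example also shows why mass alone is a coarse filter: "AG" and "GA" are indistinguishable by precursor mass, which is the kind of ambiguity that additional predicted properties could help narrow.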
Affiliation(s)
- Usman Tariq
  - Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
- Fahad Saeed
  - Knight Foundation School of Computing and Information Sciences, Florida International University (FIU), Miami, FL, USA
  - Biomolecular Sciences Institute (BSI), Florida International University, Miami, FL, USA
  - Department of Human and Molecular Genetics, Herbert Wertheim School of Medicine, Florida International University, Miami, FL, USA
12
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. Mass Spectrom Rev 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role both in its functionality and in these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Affiliation(s)
- Daniel Flender
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - Health Unit, VITO, Mol, Belgium
- Frédérique Vilenne
  - Health Unit, VITO, Mol, Belgium
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Charlotte Adams
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
  - ImmuneSpec, Niel, Belgium
- Dirk Valkenborg
  - Data Science Institute, University of Hasselt, Hasselt, Belgium
- Geert Baggerman
  - Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec, Niel, Belgium
13
Yilmaz M, Fondrie WE, Bittremieux W, Melendez CF, Nelson R, Ananth V, Oh S, Noble WS. Sequence-to-sequence translation from mass spectra to peptides with a transformer model. Nat Commun 2024; 15:6427. [PMID: 39080256 PMCID: PMC11289372 DOI: 10.1038/s41467-024-49731-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 06/18/2024] [Indexed: 08/02/2024] Open
Abstract
A fundamental challenge in mass spectrometry-based proteomics is the identification of the peptide that generated each acquired tandem mass spectrum. Approaches that leverage known peptide sequence databases cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to tandem mass spectra without prior information (de novo peptide sequencing) is valuable for tasks including antibody sequencing, immunopeptidomics, and metaproteomics. Although many methods have been developed to address this problem, it remains an outstanding challenge in part due to the difficulty of modeling the irregular data structure of tandem mass spectra. Here, we describe Casanovo, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that comprise the generating peptide. We train a Casanovo model from 30 million labeled spectra and demonstrate that the model outperforms several state-of-the-art methods on a cross-species benchmark dataset. We also develop a version of Casanovo that is fine-tuned for non-enzymatic peptides. Finally, we demonstrate that Casanovo's superior performance improves the analysis of immunopeptidomics and metaproteomics experiments and allows us to delve deeper into the dark proteome.
Affiliation(s)
- Melih Yilmaz
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
| | | | - Wout Bittremieux
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
| | - Carlo F Melendez
- Department of Genome Sciences, University of Washington, Seattle, USA
| | - Rowan Nelson
- Department of Genome Sciences, University of Washington, Seattle, USA
| | - Varun Ananth
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
| | - Sewoong Oh
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
| | - William Stafford Noble
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
- Department of Genome Sciences, University of Washington, Seattle, USA
|
14
|
Kalhor M, Lapin J, Picciani M, Wilhelm M. Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification. Mol Cell Proteomics 2024; 23:100798. [PMID: 38871251 PMCID: PMC11269915 DOI: 10.1016/j.mcpro.2024.100798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/26/2024] [Accepted: 06/09/2024] [Indexed: 06/15/2024] Open
Abstract
Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
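As an illustration of "comparing observed and predicted peptide properties", the sketch below computes a normalized spectral angle between observed and predicted fragment ion intensities. This is one commonly used similarity feature, not necessarily the one used by any specific pipeline covered in the review; the function name and the input layout (two equal-length intensity lists aligned by fragment ion) are assumptions made for this example.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral angle between two aligned intensity vectors.

    Returns 1.0 for identical direction and 0.0 for orthogonal vectors.
    Hypothetical helper for illustration; inputs must be aligned by fragment ion.
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0  # no signal on one side: treat as no similarity
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Example: a candidate whose predicted intensities resemble the observed ones
# scores higher than one whose predicted intensities do not.
good = spectral_angle([0.9, 0.1, 0.5], [1.0, 0.2, 0.4])
bad = spectral_angle([0.9, 0.1, 0.5], [0.0, 1.0, 0.0])
```

A score of this kind would typically be one feature among several (for example, alongside a retention time error) passed to a downstream rescoring model.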
Affiliation(s)
- Mostafa Kalhor
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Joel Lapin
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Mario Picciani
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Mathias Wilhelm
- Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany; Munich Data Science Institute, Technical University of Munich, Garching, Germany
|
15
|
Liu W, Zhao M, Gan L, Sun B, He S, Liu Y, Liu L, Li W, Chen J, Liu Y, Zhang J, Xu J. PeposX-Exhaust: A lightweight and efficient tool for identification of short peptides. Food Chem X 2024; 22:101249. [PMID: 38440058 PMCID: PMC10910222 DOI: 10.1016/j.fochx.2024.101249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 02/16/2024] [Accepted: 02/18/2024] [Indexed: 03/06/2024] Open
Abstract
Short peptides have become the focus of recent research due to their variable bioactivities, good digestibility, and wide occurrence in food-derived protein hydrolysates. However, due to the high complexity of the samples, identifying short peptides remains a challenge. In this work, a tool named PeposX-Exhaust was developed for short peptide identification. In validation with known peptides, PeposX-Exhaust identified all the submitted spectra with an accuracy rate of 75.36%, and the adjusted accuracy rate reached 98.55% when the top 5 candidates were considered. Compared with other tools, the accuracy rate of PeposX-Exhaust was at least 70% higher than that of two database-search tools and 15% higher than that of two other de novo sequencing tools. In further application, the numbers of short peptides identified from soybean, walnut, collagen, and bonito protein hydrolysates reached 1145, 628, 746, and 681, respectively. This demonstrated the superiority of the tool in short peptide identification.
Affiliation(s)
- Wanshun Liu
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
| | - Mouming Zhao
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
- School of Food Science and Engineering, South China University of Technology, Guangzhou 510640, China
| | - Lishe Gan
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
| | - Baoguo Sun
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, Beijing Technology & Business University, Beijing 100048, China
| | - Shiqi He
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
| | - Yang Liu
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
- College of Food Science and Technology, Hunan Agricultural University, Changsha 410128, China
| | - Lei Liu
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
| | - Wu Li
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
| | - Jing Chen
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
| | - Yang Liu
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
| | - Jianan Zhang
- School of Food Science and Engineering, South China University of Technology, Guangzhou 510640, China
| | - Jucai Xu
- Guangdong Provincial Key Laboratory of Large Animal Models for Biomedicine, School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, China
|
16
|
Ananth V, Sanders J, Yilmaz M, Wen B, Oh S, Noble WS. A learned score function improves the power of mass spectrometry database search. Bioinformatics 2024; 40:i410-i417. [PMID: 38940129 PMCID: PMC11211853 DOI: 10.1093/bioinformatics/btae218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
MOTIVATION: One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing use machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesized that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools. RESULTS: To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides.
Affiliation(s)
- Varun Ananth
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
| | - Justin Sanders
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
| | - Melih Yilmaz
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
| | - Bo Wen
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Sewoong Oh
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
| | - William Stafford Noble
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
|
17
|
Ebrahimi S, Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. arXiv 2024:arXiv:2402.11363v3. [PMID: 38659639 PMCID: PMC11042412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce Transformer-DIA, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Transformer-DIA enhances precision by 15.14% to 34.8% and recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our Transformer-DIA model holds considerable promise to uncover novel peptides and more comprehensive profiling of biological samples. Transformer-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Transformer-DIA.
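The amino acid-level and peptide-level metrics quoted in this abstract can be illustrated with a simplified sketch. It assumes exact, position-wise residue matching from the N-terminus; published de novo benchmarks usually match residues by mass within a tolerance instead, so this is not the authors' evaluation code. The function names and the `(predicted, truth)` pair layout are hypothetical.

```python
def aa_matches(predicted, truth):
    """Count positions where predicted and true residues agree (simplified)."""
    return sum(1 for p, t in zip(predicted, truth) if p == t)


def evaluate(pairs):
    """Compute amino acid-level precision/recall and peptide-level precision.

    `pairs` is a list of (predicted_sequence, true_sequence) strings,
    one pair per spectrum for which a prediction was made.
    """
    matched = sum(aa_matches(p, t) for p, t in pairs)
    n_predicted = sum(len(p) for p, _ in pairs)  # residues predicted
    n_true = sum(len(t) for _, t in pairs)       # residues in ground truth
    n_exact = sum(1 for p, t in pairs if p == t)  # fully correct peptides
    return {
        "aa_precision": matched / n_predicted if n_predicted else 0.0,
        "aa_recall": matched / n_true if n_true else 0.0,
        "peptide_precision": n_exact / len(pairs) if pairs else 0.0,
    }


# Example with three spectra: one exact hit, one substitution, one truncation.
result = evaluate([
    ("PEPTIDE", "PEPTIDE"),
    ("PEPSIDE", "PEPTIDE"),
    ("PEPT", "PEPTIDE"),
])
```

In this example 17 of 18 predicted residues and 17 of 21 true residues match, and one of three peptides is fully correct, which shows why peptide-level precision is the stricter of the two measures.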
Affiliation(s)
- Shiva Ebrahimi
- Computer Science & Engineering, University of North Texas, Denton, USA
| | - Xuan Guo
- Computer Science & Engineering, University of North Texas, Denton, USA
|
18
|
Minegishi Y, Haga Y, Ueda K. Emerging potential of immunopeptidomics by mass spectrometry in cancer immunotherapy. Cancer Sci 2024; 115:1048-1059. [PMID: 38382459 PMCID: PMC11007014 DOI: 10.1111/cas.16118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 02/02/2024] [Accepted: 02/07/2024] [Indexed: 02/23/2024] Open
Abstract
With significant advances in analytical technologies, research in the field of cancer immunotherapy, such as adoptive T cell therapy, cancer vaccines, and immune checkpoint blockade (ICB), is currently gaining tremendous momentum. Since cancer immunotherapy is effective in only a minority of patients, more potent tumor-specific antigens (TSAs, also known as neoantigens) and predictive markers for treatment response are of great interest. In cancer immunity, immunopeptides presented by human leukocyte antigen (HLA) class I act as initiating mediators of immunogenicity. The latest advances in the interdisciplinary multiomics approach have rapidly shed light on the identity of the "dark matter" of cancer and the associated immunopeptides. In this field, mass spectrometry (MS) is a viable option because it detects naturally processed and actually presented TSA candidates, which helps to grasp the whole picture of the immunopeptidome. In the past few years the search space has been enlarged by the multiomics approach, the sensitivity of mass spectrometers has been improved, and deep/machine-learning-supported peptide search algorithms have taken immunopeptidomics to the next level. This review introduces key technical advancements in immunopeptidomics and discusses its potential and future directions from the perspective of cancer immunotherapy.
Affiliation(s)
- Yuriko Minegishi
- Cancer Proteomics Group, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
| | - Yoshimi Haga
- Cancer Proteomics Group, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
| | - Koji Ueda
- Cancer Proteomics Group, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
|
19
|
Liao H, Barra C, Zhou Z, Peng X, Woodhouse I, Tailor A, Parker R, Carré A, Borrow P, Hogan MJ, Paes W, Eisenlohr LC, Mallone R, Nielsen M, Ternette N. MARS an improved de novo peptide candidate selection method for non-canonical antigen target discovery in cancer. Nat Commun 2024; 15:661. [PMID: 38253617 PMCID: PMC10803737 DOI: 10.1038/s41467-023-44460-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Accepted: 12/14/2023] [Indexed: 01/24/2024] Open
Abstract
Understanding the nature and extent of non-canonical human leukocyte antigen (HLA) presentation in tumour cells is a priority for target antigen discovery for the development of next generation immunotherapies in cancer. We here employ a de novo mass spectrometric sequencing approach with a refined, MHC-centric analysis strategy to detect non-canonical MHC-associated peptides specific to cancer without any prior knowledge of the target sequence from genomic or RNA sequencing data. Our strategy integrates MHC binding rank, Average local confidence scores, and peptide Retention time prediction for improved de novo candidate Selection, culminating in the machine learning model MARS. We benchmark our model on a large synthetic peptide library dataset and on a reanalysis of a published dataset of high-quality non-canonical MHC-associated peptide identifications in human cancer. We achieve an almost 2-fold improvement in high-quality spectral assignments in comparison to de novo sequencing alone, with an estimated accuracy of above 85.7% when integrated with a stepwise peptide sequence mapping strategy. Finally, we utilize MARS to detect and validate lncRNA-derived peptides in human cervical tumour resections, demonstrating its suitability to discover novel, immunogenic, non-canonical peptide sequences in primary tumour tissue.
Affiliation(s)
- Hanqing Liao
- The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
- Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
| | | | - Zhicheng Zhou
- Université Paris Cité, Institut Cochin, CNRS, INSERM, 75014, Paris, France
| | - Xu Peng
- The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
| | - Isaac Woodhouse
- The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
- Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
| | - Arun Tailor
- The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
- Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
| | - Robert Parker
- The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
- Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
| | - Alexia Carré
- Université Paris Cité, Institut Cochin, CNRS, INSERM, 75014, Paris, France
| | - Persephone Borrow
- Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
| | - Michael J Hogan
- Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Wayne Paes
- The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
- Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
| | - Laurence C Eisenlohr
- Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Roberto Mallone
- Université Paris Cité, Institut Cochin, CNRS, INSERM, 75014, Paris, France
- Assistance Publique Hôpitaux de Paris, Service de Diabétologie et Immunologie Clinique, Cochin Hospital, 75014, Paris, France
| | | | - Nicola Ternette
- The Jenner Institute, University of Oxford, Oxford, OX3 7BN, UK
- Centre for Immuno-Oncology, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7DQ, UK
- University of Utrecht, Department of Pharmaceutical Sciences, 3584 CH, Utrecht, The Netherlands
|
20
|
Adalia R, Patel S, Paiva A, Kaufman T, Zamora I, Cai X, Sanjuan G, Shou WZ. Development of a Predictive Multiple Reaction Monitoring (MRM) Model for High-Throughput ADME Analyses Using Learning-to-Rank (LTR) Techniques. J Am Soc Mass Spectrom 2024; 35:131-139. [PMID: 38014625 DOI: 10.1021/jasms.3c00363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Multiple Reaction Monitoring (MRM) is an important MS/MS technique commonly used in drug discovery and development, allowing for the selective and sensitive quantification of compounds in complex matrices. However, compound optimization can be resource intensive and requires experimental determination of product ions for each compound. In this study, we developed a Learning-to-Rank (LTR) model to predict the product ions directly from compound structures, eliminating the requirement for MRM optimization experiments. Experimentally determined MRM conditions for 5757 compounds were used to develop the model. Using the MassChemSite software, theoretical fragments and their mass-to-charge ratios were generated, which were then matched to the experimental product ions to create a data set. Each possible fragment was ranked based on its intensity in the experimental data. Different LTR models were built on a training split. Hyperparameter selection was performed using 5-fold cross validation. The models were evaluated using the Normalized Discounted Cumulative Gain at top k (NDCG@k) and the Coverage at top k (Coverage@k) metrics. Finally, the model was applied to predict MRM conditions for a prospective set of 235 compounds in high-throughput Caco-2 permeability and metabolic stability assays, and quantification results were compared to those obtained with experimentally acquired MRM conditions. The LTR model achieved an NDCG@5 of 0.732 and a Coverage@5 of 0.841 on the validation split, and its predictions led to 97% of biologically equivalent results in the Caco-2 permeability and metabolic stability assays.
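For readers unfamiliar with NDCG@k, a minimal sketch of the standard definition follows. It assumes a linear gain and a log2 rank discount; the abstract does not state which gain or discount variant the authors used, and the function names and the use of intensity-derived relevance values are assumptions for illustration.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain)."""
    # rank is 0-based, so the discount for the top item is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances_in_predicted_order, k):
    """NDCG@k: DCG of the predicted ranking divided by DCG of the ideal ranking.

    `relevances_in_predicted_order` holds the true relevance of each candidate
    (for example, a fragment's experimental intensity rank score), listed in
    the order the model ranked the candidates.
    """
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant candidates at all
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg


# Example: a perfect ranking scores 1.0; a reversed ranking scores lower.
perfect = ndcg_at_k([3, 2, 1], k=3)
reversed_order = ndcg_at_k([1, 2, 3], k=3)
```

Because the discount grows with rank, placing the most intense fragment first matters more than ordering the lower-ranked candidates correctly.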
Affiliation(s)
- Ramon Adalia
- Lead Molecular Design S.L., 08172 Sant Cugat de Valles, Spain
- Universitat Autònoma de Barcelona, 08193 Barcelona, Spain
| | - Shivani Patel
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
| | - Anthony Paiva
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
| | - Tierni Kaufman
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
| | - Ismael Zamora
- Lead Molecular Design S.L., 08172 Sant Cugat de Valles, Spain
| | - Xianmei Cai
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
| | - Gemma Sanjuan
- Universitat Autònoma de Barcelona, 08193 Barcelona, Spain
| | - Wilson Z Shou
- Lead Discovery and Optimization, Bristol-Myers Squibb, Princeton, New Jersey 08648, United States
|
21
|
Klaproth-Andrade D, Hingerl J, Bruns Y, Smith NH, Träuble J, Wilhelm M, Gagneur J. Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing. Nat Commun 2024; 15:151. [PMID: 38167372 PMCID: PMC10762064 DOI: 10.1038/s41467-023-44323-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 01/16/2023] [Accepted: 12/08/2023] [Indexed: 01/05/2024] Open
Abstract
Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations including a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, proposing fragment ion series classification as a pivotal task for de novo peptide sequencing, and a peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.
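The idea of "connecting peaks in spectra spaced by amino acid masses" can be illustrated without a neural network. The sketch below only finds pairs of peaks whose m/z difference matches a residue mass; it is not the Spectralis convolutional layer. It assumes singly charged fragments, uses a deliberately small subset of monoisotopic residue masses, and the function name and tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da) for a small illustrative subset.
RESIDUE_MASSES = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
}


def peaks_linked_by_residue(mz_values, tolerance=0.02):
    """Return (low_mz, high_mz, residue) for peak pairs one residue apart.

    Assumes singly charged fragment ions, so an m/z gap equals a mass gap.
    """
    peaks = sorted(mz_values)
    links = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            delta = high - low
            for residue, mass in RESIDUE_MASSES.items():
                if abs(delta - mass) <= tolerance:
                    links.append((low, high, residue))
    return links


# Example: consecutive gaps of about 57.02 and 71.04 suggest G followed by A.
links = peaks_linked_by_residue([100.0, 157.0215, 228.0586, 300.0])
```

Chains of such links trace candidate fragment ion series, which is the kind of structure a layer connecting mass-spaced peaks can exploit.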
Affiliation(s)
- Daniela Klaproth-Andrade
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
| | - Johannes Hingerl
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Yanik Bruns
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Nicholas H Smith
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Jakob Träuble
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Mathias Wilhelm
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Julien Gagneur
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
|
22
|
Zancolli G, von Reumont BM, Anderluh G, Caliskan F, Chiusano ML, Fröhlich J, Hapeshi E, Hempel BF, Ikonomopoulou MP, Jungo F, Marchot P, de Farias TM, Modica MV, Moran Y, Nalbantsoy A, Procházka J, Tarallo A, Tonello F, Vitorino R, Zammit ML, Antunes A. Web of venom: exploration of big data resources in animal toxin research. Gigascience 2024; 13:giae054. [PMID: 39250076 PMCID: PMC11382406 DOI: 10.1093/gigascience/giae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/01/2024] [Accepted: 07/13/2024] [Indexed: 09/10/2024] Open
Abstract
Research on animal venoms and their components spans multiple disciplines, including biology, biochemistry, bioinformatics, pharmacology, medicine, and more. Manipulating and analyzing the diverse array of data required for venom research can be challenging, and relevant tools and resources are often dispersed across different online platforms, making them less accessible to nonexperts. In this article, we address the multifaceted needs of the scientific community involved in venom and toxin-related research by identifying and discussing web resources, databases, and tools commonly used in this field. We have compiled these resources into a comprehensive table available on the VenomZone website (https://venomzone.expasy.org/10897). Furthermore, we highlight the challenges currently faced by researchers in accessing and using these resources and emphasize the importance of community-driven interdisciplinary approaches. We conclude by underscoring the significance of enhancing standards, promoting interoperability, and encouraging data and method sharing within the venom research community.
Affiliation(s)
- Giulia Zancolli
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Björn Marcus von Reumont
- Goethe University Frankfurt, Faculty of Biological Sciences, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
| | - Gregor Anderluh
- Department of Molecular Biology and Nanobiotechnology, National Institute of Chemistry, 1000 Ljubljana, Slovenia
| | - Figen Caliskan
- Department of Biology, Faculty of Science, Eskisehir Osmangazi University, 26040 Eskişehir, Turkey
| | - Maria Luisa Chiusano
- Department of Agricultural Sciences, University Federico II of Naples, 80055 Portici, Naples, Italy
- Department of Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
| | - Jacob Fröhlich
- Veterinary Center for Resistance Research (TZR), Freie Universität Berlin, 14163 Berlin, Germany
| | - Evroula Hapeshi
- Department of Health Sciences, School of Life and Health Sciences, University of Nicosia, 1700 Nicosia, Cyprus
| | - Benjamin-Florian Hempel
- Veterinary Center for Resistance Research (TZR), Freie Universität Berlin, 14163 Berlin, Germany
| | - Maria P Ikonomopoulou
- Madrid Institute of Advanced Studies in Food, Precision Nutrition & Aging Program, 28049 Madrid, Spain
| | - Florence Jungo
- SIB Swiss Institute of Bioinformatics, Swiss-Prot Group, 1211 Geneva, Switzerland
| | - Pascale Marchot
- Laboratory Architecture et Fonction des Macromolécules Biologiques, Aix-Marseille University, Centre National de la Recherche Scientifique, Faculté des Sciences, Campus Luminy, 13288 Marseille, France
| | - Tarcisio Mendes de Farias
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Maria Vittoria Modica
- Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, 00198 Rome, Italy
| | - Yehu Moran
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, 9190401 Jerusalem, Israel
| | - Ayse Nalbantsoy
- Engineering Faculty, Bioengineering Department, Ege University, 35100 Bornova-Izmir, Turkey
| | - Jan Procházka
- Laboratory of Transgenic Models of Diseases, Institute of Molecular Genetics of the Czech Academy of Sciences, 252 50 Vestec, Czech Republic
| | - Andrea Tarallo
- Institute of Research on Terrestrial Ecosystems (IRET), National Research Council (CNR), 73100 Lecce, Italy
| | - Fiorella Tonello
- Neuroscience Institute, National Research Council (CNR), 35131 Padua, Italy
| | - Rui Vitorino
- Department of Medical Sciences, iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal
| | - Mark Lawrence Zammit
- Department of Clinical Pharmacology & Therapeutics, Faculty of Medicine & Surgery, University of Malta, 2090 Msida, Malta
- Malta National Poisons Centre, Malta Life Sciences Park, 3000 San Ġwann, Malta
| | - Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
|
23
|
Ebrahimi S, Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. Proceedings. IEEE International Symposium on Bioinformatics and Bioengineering 2023; 2023:28-35. [PMID: 38665266 PMCID: PMC11044815 DOI: 10.1109/bibe60311.2023.00013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce Casanovo-DIA, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8% and recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our Casanovo-DIA model holds considerable promise to uncover novel peptides and more comprehensive profiling of biological samples. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Casanovo-DIA.
Affiliation(s)
- Shiva Ebrahimi
- Computer Science & Engineering University of North Texas Denton, USA
| | - Xuan Guo
- Computer Science & Engineering University of North Texas Denton, USA
| |
|
24
|
Wang X, Zeng H, Lin L, Huang Y, Lin H, Que Y. Deep learning-empowered crop breeding: intelligent, efficient and promising. Frontiers in Plant Science 2023; 14:1260089. [PMID: 37860239 PMCID: PMC10583549 DOI: 10.3389/fpls.2023.1260089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/18/2023] [Accepted: 09/13/2023] [Indexed: 10/21/2023]
Abstract
Crop breeding is one of the main approaches to increase crop yield and improve crop quality. However, the breeding process faces challenges such as complex data, difficulties in data acquisition, and low prediction accuracy, resulting in low breeding efficiency and long breeding cycles. Deep learning-based crop breeding is a strategy that applies deep learning techniques to improve and optimize the breeding process, leading to accelerated crop improvement, enhanced breeding efficiency, and the development of higher-yielding, more adaptive, and disease-resistant varieties for agricultural production. This perspective briefly discusses the mechanisms, key applications, and impact of deep learning in crop breeding. We also highlight the current challenges associated with this topic and provide insights into its future application prospects.
Affiliation(s)
- Xiaoding Wang
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
| | - Haitao Zeng
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
| | - Limei Lin
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
| | - Yanze Huang
- School of Computer Science and Mathematics, Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
| | - Hui Lin
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
| | - Youxiong Que
- Key Laboratory of Sugarcane Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, Fujian Agriculture and Forestry University, Fuzhou, China
- National Key Laboratory for Tropical Crop Breeding, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Hainan, China
| |
|
25
|
Ng CCA, Zhou Y, Yao ZP. Algorithms for de-novo sequencing of peptides by tandem mass spectrometry: A review. Anal Chim Acta 2023; 1268:341330. [PMID: 37268337 DOI: 10.1016/j.aca.2023.341330] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/22/2022] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 06/04/2023]
Abstract
Peptide sequencing is of great significance to fundamental and applied research in fields such as the chemical, biological, medicinal and pharmaceutical sciences. With the rapid development of mass spectrometry and sequencing algorithms, de-novo peptide sequencing using tandem mass spectrometry (MS/MS) has become the main method for determining amino acid sequences of novel and unknown peptides. Advanced algorithms allow the amino acid sequence information to be accurately obtained from MS/MS spectra in a short time. In this review, algorithms ranging from exhaustive search to state-of-the-art machine learning and neural networks for high-throughput and automated de-novo sequencing are introduced and compared. Impacts of datasets on algorithm performance are highlighted. The current limitations and promising directions of de-novo peptide sequencing are also discussed in this review.
Affiliation(s)
- Cheuk Chi A Ng
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
| | - Yin Zhou
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
| | - Zhong-Ping Yao
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China.
| |
|
26
|
Potgieter MG, Nel AJM, Fortuin S, Garnett S, Wendoh JM, Tabb DL, Mulder NJ, Blackburn JM. MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets. PLoS Comput Biol 2023; 19:e1011163. [PMID: 37327214 PMCID: PMC10310047 DOI: 10.1371/journal.pcbi.1011163] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/01/2022] [Revised: 06/29/2023] [Accepted: 05/08/2023] [Indexed: 06/18/2023] Open
Abstract
BACKGROUND: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines.
RESULTS: We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation.
CONCLUSIONS: By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.
Affiliation(s)
- Matthys G. Potgieter
- Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
| | - Andrew J. M. Nel
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
| | - Suereta Fortuin
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
| | - Shaun Garnett
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
| | - Jerome M. Wendoh
- Division of Immunology, Department of Pathology, University of Cape Town, Cape Town, South Africa
| | - David L. Tabb
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences; African Microbiome Institute; South African Tuberculosis Bioinformatics Initiative; Stellenbosch University, Cape Town, South Africa
| | - Nicola J. Mulder
- Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Institute of Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| | - Jonathan M. Blackburn
- Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa
- Institute of Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
| |
|
27
|
Hellinger R, Sigurdsson A, Wu W, Romanova EV, Li L, Sweedler JV, Süssmuth RD, Gruber CW. Peptidomics. Nature Reviews. Methods Primers 2023; 3:25. [PMID: 37250919 PMCID: PMC7614574 DOI: 10.1038/s43586-023-00205-2] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Accepted: 02/09/2023] [Indexed: 05/31/2023]
Abstract
Peptides are biopolymers, typically consisting of 2-50 amino acids. They are biologically produced by the cellular ribosomal machinery or by non-ribosomal enzymes and, sometimes, other dedicated ligases. Peptides are arranged as linear chains or cycles, and include post-translational modifications, unusual amino acids and stabilizing motifs. Their structure and molecular size render them a unique chemical space, between small molecules and larger proteins. Peptides have important physiological functions as intrinsic signalling molecules, such as neuropeptides and peptide hormones, for cellular or interspecies communication, as toxins to catch prey or as defence molecules to fend off enemies and microorganisms. Clinically, they are gaining popularity as biomarkers or innovative therapeutics; to date there are more than 60 peptide drugs approved and more than 150 in clinical development. The emerging field of peptidomics comprises the comprehensive qualitative and quantitative analysis of the suite of peptides in a biological sample (endogenously produced, or exogenously administered as drugs). Peptidomics employs techniques of genomics, modern proteomics, state-of-the-art analytical chemistry and innovative computational biology, with a specialized set of tools. The complex biological matrices and often low abundance of analytes typically examined in peptidomics experiments require optimized sample preparation and isolation, including in silico analysis. This Primer covers the combination of techniques and workflows needed for peptide discovery and characterization and provides an overview of various biological and clinical applications of peptidomics.
Affiliation(s)
- Roland Hellinger
- Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
| | - Arnar Sigurdsson
- Institut für Chemie, Technische Universität Berlin, Berlin, Germany
| | - Wenxin Wu
- School of Pharmacy and Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Elena V Romanova
- Department of Chemistry, University of Illinois, Urbana, IL, USA
| | - Lingjun Li
- School of Pharmacy and Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | | | | | - Christian W Gruber
- Center for Physiology and Pharmacology, Medical University of Vienna, Vienna, Austria
| |
|
28
|
Beslic D, Tscheuschner G, Renard BY, Weller MG, Muth T. Comprehensive evaluation of peptide de novo sequencing tools for monoclonal antibody assembly. Brief Bioinform 2023; 24:bbac542. [PMID: 36545804 PMCID: PMC9851299 DOI: 10.1093/bib/bbac542] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 09/09/2022] [Revised: 10/25/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022] Open
Abstract
Monoclonal antibodies are biotechnologically produced proteins with various applications in research, therapeutics and diagnostics. Their ability to recognize and bind to specific molecule structures makes them essential research tools and therapeutic agents. Sequence information of antibodies is helpful for understanding antibody-antigen interactions and ensuring their affinity and specificity. De novo protein sequencing based on mass spectrometry is a valuable method to obtain the amino acid sequence of peptides and proteins without a priori knowledge. In this study, we evaluated six recently developed de novo peptide sequencing algorithms (Novor, pNovo 3, DeepNovo, SMSNet, PointNovo and Casanovo), which were not specifically designed for antibody data. We validated their ability to identify and assemble antibody sequences on three multi-enzymatic data sets. The deep learning-based tools Casanovo and PointNovo showed an increased peptide recall across different enzymes and data sets compared with spectrum-graph-based approaches. We evaluated different error types of de novo peptide sequencing tools and their performance for different numbers of missing cleavage sites, noisy spectra and peptides of various lengths. We achieved a sequence coverage of 97.69-99.53% on the light chains of three different antibody data sets using the de Bruijn assembler ALPS and the predictions from Casanovo. However, low sequence coverage and accuracy on the heavy chains demonstrate that complete de novo protein sequencing remains a challenging issue in proteomics that requires improved de novo error correction, alternative digestion strategies and hybrid approaches such as homology search to achieve high accuracy on long protein sequences.
Affiliation(s)
- Denis Beslic
- Robert Koch Institute, MF1, Nordufer 20, 13353 Berlin
| | - Georg Tscheuschner
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| | - Bernhard Y Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam
| | - Michael G Weller
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| | - Thilo Muth
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| |
|
29
|
Gueto-Tettay C, Tang D, Happonen L, Heusel M, Khakzad H, Malmström J, Malmström L. Multienzyme deep learning models improve peptide de novo sequencing by mass spectrometry proteomics. PLoS Comput Biol 2023; 19:e1010457. [PMID: 36668672 PMCID: PMC9891523 DOI: 10.1371/journal.pcbi.1010457] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/03/2022] [Revised: 02/01/2023] [Accepted: 01/04/2023] [Indexed: 01/21/2023] Open
Abstract
Generating and analyzing overlapping peptides through multienzymatic digestion is an efficient procedure for de novo protein sequencing from bottom-up mass spectrometry (MS). Despite improved instrumentation and software, de novo MS data analysis remains challenging. In recent years, deep learning models have represented a performance breakthrough. Incorporating that technology into de novo protein sequencing workflows requires machine-learning models capable of handling highly diverse MS data. In this study, we analyzed the requirements for assembling such generalizable deep learning models by systematically varying the composition and size of the training set. We assessed the generated models' performances using two test sets composed of peptides originating from the multienzyme digestion of samples from various species. The peptide recall values on the test sets showed that the deep learning models generated from a collection of highly N- and C-termini diverse peptides generalized 76% better than the termini-restricted ones. Moreover, expanding the training set's size by adding peptides from the multienzymatic digestion, with five proteases, of samples from several species led to a 2-3 fold generalizability gain. Furthermore, we tested the applicability of these multienzyme deep learning (MEM) models by fully de novo sequencing the heavy and light monomeric chains of five commercial antibodies (mAbs). MEMs extracted over 10000 matching and overlapping peptides across mAb samples digested with six different proteases, achieving 100% sequence coverage for 8 of the 10 polypeptide chains. We anticipate that the MEMs' proven improvements to de novo analysis will positively impact several applications, such as analyzing samples of high complexity or unknown nature, and the peptidomics field.
Affiliation(s)
- Carlos Gueto-Tettay
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Di Tang
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Lotta Happonen
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Moritz Heusel
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Hamed Khakzad
- Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
| | - Johan Malmström
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| | - Lars Malmström
- Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
| |
|
30
|
Zhang D, Lin Q, Xia T, Zhao J, Zhang W, Ouyang Z, Xia Y. LipidOA: A Machine-Learning and Prior-Knowledge-Based Tool for Structural Annotation of Glycerophospholipids. Anal Chem 2022; 94:16759-16767. [DOI: 10.1021/acs.analchem.2c03505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Affiliation(s)
- Donghui Zhang
- State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing 100084, China
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Department of Chemistry, Tsinghua University, Beijing 10084, China
| | - Qiaohong Lin
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Department of Chemistry, Tsinghua University, Beijing 10084, China
| | - Tian Xia
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Department of Chemistry, Tsinghua University, Beijing 10084, China
| | - Jing Zhao
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Department of Chemistry, Tsinghua University, Beijing 10084, China
| | - Wenpeng Zhang
- State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing 100084, China
| | - Zheng Ouyang
- State Key Laboratory of Precision Measurement Technology and Instruments, Department of Precision Instrument, Tsinghua University, Beijing 100084, China
| | - Yu Xia
- MOE Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Department of Chemistry, Tsinghua University, Beijing 10084, China
| |
|
31
|
Xiang H, Zhang L, Bu F, Guan X, Chen L, Zhang H, Zhao Y, Chen H, Zhang W, Li Y, Lee LJ, Mei Z, Rao Y, Gu Y, Hou Y, Mu F, Dong X. A Novel Proteogenomic Integration Strategy Expands the Breadth of Neo-Epitope Sources. Cancers (Basel) 2022; 14:cancers14123016. [PMID: 35740681 PMCID: PMC9220843 DOI: 10.3390/cancers14123016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/08/2022] [Revised: 06/09/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022] Open
Abstract
Tumor-specific antigens can activate T cell-based antitumor immune responses and are ideal targets for cancer immunotherapy. However, their identification is still challenging. Although mass spectrometry can directly identify human leukocyte antigen (HLA) binding peptides in tumor cells, it focuses on tumor-specific antigens derived from annotated protein-coding regions constituting only 1.5% of the genome. We developed a novel proteogenomic integration strategy to expand the breadth of tumor-specific epitopes derived from all genomic regions. Using the colorectal cancer cell line HCT116 as a model, we accurately identified 10,737 HLA-presented peptides, 1293 of which were non-canonical peptides that traditional database searches could not identify. Moreover, we found eight tumor neo-epitopes derived from somatic mutations, four of which were not previously reported. Our findings suggest that this new proteogenomic approach holds great promise for increasing the number of tumor-specific antigen candidates, potentially enlarging the tumor target pool and improving cancer immunotherapy.
Affiliation(s)
- Haitao Xiang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; (H.X.); (X.G.); (W.Z.); (Y.L.)
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
| | - Le Zhang
- BGI-GenoImmune, BGI-Shenzhen, Shenzhen 518083, China; (L.Z.); (L.J.L.)
| | - Fanyu Bu
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
| | - Xiangyu Guan
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; (H.X.); (X.G.); (W.Z.); (Y.L.)
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
| | - Lei Chen
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
| | - Haibo Zhang
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
| | - Yuntong Zhao
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
| | - Huanyi Chen
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
| | - Weicong Zhang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; (H.X.); (X.G.); (W.Z.); (Y.L.)
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
| | - Yijian Li
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; (H.X.); (X.G.); (W.Z.); (Y.L.)
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
- Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, Shenzhen 518083, China
| | - Leo Jingyu Lee
- BGI-GenoImmune, BGI-Shenzhen, Shenzhen 518083, China; (L.Z.); (L.J.L.)
| | - Zhanlong Mei
- BGI, Shenzhen 518083, China; (Z.M.); (Y.R.); (Y.H.)
| | - Yuan Rao
- BGI, Shenzhen 518083, China; (Z.M.); (Y.R.); (Y.H.)
| | - Ying Gu
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen 518120, China
| | - Yong Hou
- BGI, Shenzhen 518083, China; (Z.M.); (Y.R.); (Y.H.)
| | - Feng Mu
- BGI, Shenzhen 518083, China; (Z.M.); (Y.R.); (Y.H.)
- Correspondence: (F.M.); (X.D.)
| | - Xuan Dong
- BGI-Shenzhen, Shenzhen 518103, China; (F.B.); (L.C.); (H.Z.); (Y.Z.); (H.C.); (Y.G.)
- Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, Shenzhen 518083, China
- Correspondence: (F.M.); (X.D.)
| |
|
32
|
Zhang W, Yang C, Liu J, Liang Z, Shan Y, Zhang L, Zhang Y. Accurate discrimination of leucine and isoleucine residues by combining continuous digestion with multiple MS3 spectra integration in protein sequence. Talanta 2022; 249:123666. [PMID: 35717752 DOI: 10.1016/j.talanta.2022.123666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2022] [Revised: 06/07/2022] [Accepted: 06/08/2022] [Indexed: 12/26/2022]
Abstract
Protein de novo sequencing based on tandem mass spectrometry is a crucial technology that enables the identification of peptides without database searching and the assembly of proteins of unknown sequence, especially monoclonal antibodies (mAbs). However, the discrimination of leucine (Leu) and isoleucine (Ile) residues in the target protein sequence is still challenging. Herein, we developed an accurate method that combines continuous digestion with MS3-based fragmentation and multiple spectra integration (evaluated by a combined verification score, CVS) to distinguish Leu and Ile residues. Continuous digestion increases the diversity of peptides in order to expose more Leu and Ile residues at the N-terminus. CVS integrates multiple MS3 spectra to reduce the interference from noise and co-fragmented ions and improve accuracy. This method successfully resolved all 75 Leu/Ile residues in bovine serum albumin, including 3 consecutive Leu/Ile residues. We further applied the method to analyze trastuzumab, and 67 out of the 68 Leu/Ile residues from the light chain and heavy chain were accurately discriminated, demonstrating its great potential in mAb sequencing.
Affiliation(s)
- Weijie Zhang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China; University of Chinese Academy of Sciences, Beijing, 100039, China
| | - Chao Yang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China; University of Chinese Academy of Sciences, Beijing, 100039, China
| | - Jianhui Liu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China
| | - Zhen Liang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China
| | - Yichu Shan
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China.
| | - Lihua Zhang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China.
| | - Yukui Zhang
- CAS Key Laboratory of Separation Science for Analytical Chemistry, National Chromatographic R. & A. Center, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning, 116023, China
| |
|
33
|
von Reumont BM, Anderluh G, Antunes A, Ayvazyan N, Beis D, Caliskan F, Crnković A, Damm M, Dutertre S, Ellgaard L, Gajski G, German H, Halassy B, Hempel BF, Hucho T, Igci N, Ikonomopoulou MP, Karbat I, Klapa MI, Koludarov I, Kool J, Lüddecke T, Ben Mansour R, Vittoria Modica M, Moran Y, Nalbantsoy A, Ibáñez MEP, Panagiotopoulos A, Reuveny E, Céspedes JS, Sombke A, Surm JM, Undheim EAB, Verdes A, Zancolli G. Modern venomics-Current insights, novel methods, and future perspectives in biological and applied animal venom research. Gigascience 2022; 11:giac048. [PMID: 35640874 PMCID: PMC9155608 DOI: 10.1093/gigascience/giac048] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/30/2022] [Revised: 04/10/2022] [Accepted: 04/12/2022] [Indexed: 12/11/2022] Open
Abstract
Venoms have evolved >100 times in all major animal groups, and their components, known as toxins, have been fine-tuned over millions of years into highly effective biochemical weapons. There are many outstanding questions on the evolution of toxin arsenals, such as how venom genes originate, how venom contributes to the fitness of venomous species, and which modifications at the genomic, transcriptomic, and protein level drive their evolution. These questions have received particularly little attention outside of snakes, cone snails, spiders, and scorpions. Venom compounds have further become a source of inspiration for translational research using their diverse bioactivities for various applications. We highlight here recent advances and new strategies in modern venomics and discuss how recent technological innovations and multi-omic methods dramatically improve research on venomous animals. The study of genomes and their modifications through CRISPR and knockdown technologies will increase our understanding of how toxins evolve and which functions they have in the different ontogenetic stages during the development of venomous animals. Mass spectrometry imaging combined with spatial transcriptomics, in situ hybridization techniques, and modern computer tomography gives us further insights into the spatial distribution of toxins in the venom system and the function of the venom apparatus. All these evolutionary and biological insights contribute to the more efficient identification of venom compounds, which can then be synthesized or produced in adapted expression systems to test their bioactivity. Finally, we critically discuss recent agrochemical, pharmaceutical, therapeutic, and diagnostic (so-called translational) aspects of venoms from which humans benefit.
Affiliation(s)
- Bjoern M von Reumont
- Goethe University Frankfurt, Institute for Cell Biology and Neuroscience, Department for Applied Bioinformatics, 60438 Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Frankfurt, Senckenberganlage 25, 60235 Frankfurt, Germany
- Justus Liebig University Giessen, Institute for Insectbiotechnology, Heinrich Buff Ring 26-32, 35396 Giessen, Germany
| | - Gregor Anderluh
- Department of Molecular Biology and Nanobiotechnology, National Institute of Chemistry, 1000 Ljubljana, Slovenia
| | - Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450–208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
| | - Naira Ayvazyan
- Orbeli Institute of Physiology of NAS RA, Orbeli ave. 22, 0028 Yerevan, Armenia
| | - Dimitris Beis
- Developmental Biology, Centre for Clinical, Experimental Surgery and Translational Research, Biomedical Research Foundation Academy of Athens, Athens 11527, Greece
| | - Figen Caliskan
- Department of Biology, Faculty of Science and Letters, Eskisehir Osmangazi University, TR-26040 Eskisehir, Turkey
| | - Ana Crnković
- Department of Molecular Biology and Nanobiotechnology, National Institute of Chemistry, 1000 Ljubljana, Slovenia
| | - Maik Damm
- Technische Universität Berlin, Department of Chemistry, Straße des 17. Juni 135, 10623 Berlin, Germany
| | | | - Lars Ellgaard
- Department of Biology, University of Copenhagen, DK-2200 Copenhagen, Denmark
| | - Goran Gajski
- Institute for Medical Research and Occupational Health, Mutagenesis Unit, Ksaverska cesta 2, 10000 Zagreb, Croatia
| | - Hannah German
- Amsterdam Institute of Molecular and Life Sciences, Division of BioAnalytical Chemistry, Faculty of Science, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081HV Amsterdam, The Netherlands
| | - Beata Halassy
- University of Zagreb, Centre for Research and Knowledge Transfer in Biotechnology, Trg Republike Hrvatske 14, 10000 Zagreb, Croatia
| | - Benjamin-Florian Hempel
- BIH Center for Regenerative Therapies BCRT, Charité - Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
| | - Tim Hucho
- Translational Pain Research, Department of Anesthesiology and Intensive Care Medicine, Faculty of Medicine and University Hospital Cologne, University of Cologne, 50931 Cologne, Germany
| | - Nasit Igci
- Nevsehir Haci Bektas Veli University, Faculty of Arts and Sciences, Department of Molecular Biology and Genetics, 50300 Nevsehir, Turkey
| | - Maria P Ikonomopoulou
- Madrid Institute for Advanced Studies in Food, Madrid, E28049, Spain
- The University of Queensland, St Lucia, QLD 4072, Australia
| | - Izhar Karbat
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Maria I Klapa
- Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research & Technology Hellas (FORTH/ICE-HT), Patras GR-26504, Greece
| | - Ivan Koludarov
- Justus Liebig University Giessen, Institute for Insectbiotechnology, Heinrich Buff Ring 26-32, 35396 Giessen, Germany
| | - Jeroen Kool
- Amsterdam Institute of Molecular and Life Sciences, Division of BioAnalytical Chemistry, Faculty of Science, Vrije Universiteit Amsterdam, De Boelelaan 1085, 1081HV Amsterdam, The Netherlands
| | - Tim Lüddecke
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberg Frankfurt, Senckenberganlage 25, 60235 Frankfurt, Germany
- Department of Bioresources, Fraunhofer Institute for Molecular Biology and Applied Ecology, 35392 Gießen, Germany
| | - Riadh Ben Mansour
- Department of Life Sciences, Faculty of Sciences, Gafsa University, Campus Universitaire Sidi Ahmed Zarrouk, 2112 Gafsa, Tunisia
| | - Maria Vittoria Modica
- Dept. of Biology and Evolution of Marine Organisms (BEOM), Stazione Zoologica Anton Dohrn, Via Po 25c, I-00198 Roma, Italy
| | - Yehu Moran
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | - Ayse Nalbantsoy
- Department of Bioengineering, Faculty of Engineering, Ege University, 35100 Bornova, Izmir, Turkey
| | - María Eugenia Pachón Ibáñez
- Unit of Infectious Diseases, Microbiology, and Preventive Medicine, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville, 41013 Sevilla, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
| | - Alexios Panagiotopoulos
- Metabolic Engineering and Systems Biology Laboratory, Institute of Chemical Engineering Sciences, Foundation for Research & Technology Hellas (FORTH/ICE-HT), Patras GR-26504, Greece
- Animal Biology Division, Department of Biology, University of Patras, Patras, GR-26500, Greece
| | - Eitan Reuveny
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Javier Sánchez Céspedes
- Unit of Infectious Diseases, Microbiology, and Preventive Medicine, Virgen del Rocío University Hospital, Institute of Biomedicine of Seville, 41013 Sevilla, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
| | - Andy Sombke
- Department of Evolutionary Biology, University of Vienna, Djerassiplatz 1, 1030 Vienna, Austria
| | - Joachim M Surm
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, Jerusalem 9190401, Israel
| | - Eivind A B Undheim
- University of Oslo, Centre for Ecological and Evolutionary Synthesis, Postboks 1066 Blindern 0316 Oslo, Norway
| | - Aida Verdes
- Department of Biodiversity and Evolutionary Biology, Museo Nacional de Ciencias Naturales, José Gutiérrez Abascal 2, 28006 Madrid, Spain
| | - Giulia Zancolli
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
|
34
|
Na S, Choi H, Paek E. Deephos: Predicted spectral database search for TMT-labeled phosphopeptides and its false discovery rate estimation. Bioinformatics 2022; 38:2980-2987. [PMID: 35441674 DOI: 10.1093/bioinformatics/btac280] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/12/2021] [Revised: 03/26/2022] [Accepted: 04/14/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Tandem mass tag (TMT)-based tandem mass spectrometry (MS/MS) has become the method of choice for the quantification of post-translational modifications in complex mixtures. Many cancer proteogenomic studies have highlighted the importance of large-scale phosphopeptide quantification coupled with TMT labeling. Herein, we propose a predicted Spectral DataBase (pSDB) search strategy called Deephos that can improve both sensitivity and specificity in identifying MS/MS spectra of TMT-labeled phosphopeptides. RESULTS: With deep learning-based fragment ion prediction, we compiled a pSDB of TMT-labeled phosphopeptides generated from ∼8,000 human phosphoproteins annotated in UniProt. Deep learning could successfully recognize the fragmentation patterns altered by both TMT labeling and phosphorylation. In addition, we discuss the decoy spectra for false discovery rate (FDR) estimation in the pSDB search. We show that FDR could be inaccurately estimated by the existing decoy spectra generation methods and propose an innovative method to generate decoy spectra for more accurate FDR estimation. The utilities of Deephos were demonstrated in multi-stage analyses (coupled with database searches) of glioblastoma, acute myeloid leukemia, and breast cancer phosphoproteomes. AVAILABILITY: Deephos pSDB and the search software are available at https://github.com/seungjinna/deephos.
Affiliation(s)
- Seungjin Na
- Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
| | - Hyunjin Choi
- Department of Automotive Engineering, Hanyang University, Seoul, 04763, Republic of Korea
| | - Eunok Paek
- Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea; Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
|
35
|
Zhang Z, Li Y, Yuan W, Wang Z, Wan C. Proteomic-driven identification of short open reading frame-encoded peptides. Proteomics 2022; 22:e2100312. [PMID: 35384297 DOI: 10.1002/pmic.202100312] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/25/2022] [Revised: 03/29/2022] [Accepted: 03/30/2022] [Indexed: 11/10/2022]
Abstract
Accumulating evidence has shown that a large number of short open reading frames (sORFs) also have the ability to encode proteins. The discovery of sORFs opens up a new research area, leading to the identification and functional study of sORF encoded peptides (SEPs) at the omics level. Besides bioinformatics prediction and ribosomal profiling, mass spectrometry (MS) has become a significant tool as it directly detects the sequence of SEPs. Though MS-based proteomics methods have proved to be effective for qualitative and quantitative analysis of SEPs, the detection of SEPs is still a great challenge due to their low abundance and short sequence. To illustrate the progress in method development, we described and discussed the main steps of large-scale proteomics identification of SEPs, including SEP extraction and enrichment, MS detection, data processing and quality control, quantification, and function prediction and validation methods.
Affiliation(s)
- Zheng Zhang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Yujie Li
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Wenqian Yuan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Zhiwei Wang
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
| | - Cuihong Wan
- School of Life Sciences and Hubei Key Laboratory of Genetic Regulation and Integrative Biology, Central China Normal University, Wuhan, Hubei, 430079, People's Republic of China
|
36
|
Affinity Selection from Synthetic Peptide Libraries Enabled by De Novo MS/MS Sequencing. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10370-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
Recently, de novo MS/MS peptide sequencing has enabled the application of affinity selections to synthetic peptide mixtures that approach the diversity of phage libraries (> 10^8 random peptides). In conjunction with ‘split-mix’ solid phase synthesis to access equimolar peptide mixtures, this approach provides a straightforward means to examine synthetic peptide libraries of considerably higher diversity than has been feasible historically. Here, we offer a critical perspective on this work, report emerging data, and highlight opportunities for further methods refinement. With continued development, ‘affinity selection–mass spectrometry’ may become a complementary approach to phage display, in vitro selection, and DNA-encoded libraries for the discovery of synthetic ligands that modulate protein function.
|
37
|
Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022; 23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022] Open
Abstract
In this article, we review two challenging computational questions in protein science: neoantigen prediction and protein structure prediction. Both topics have seen significant leaps forward by deep learning within the past five years, which immediately unlocked new developments of drugs and immunotherapies. We show that deep learning models offer unique advantages, such as representation learning and multi-layer architecture, which make them an ideal choice to leverage a huge amount of protein sequence and structure data to address those two problems. We also discuss the impact and future possibilities enabled by those two applications, especially how the data-driven approach by deep learning shall accelerate the progress towards personalized biomedicine.
Affiliation(s)
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, USA
| | - Ming Li
- University of Waterloo, Canada
|
38
|
Gholamizoj S, Ma B. SPEQ: quality assessment of peptide tandem mass spectra with deep learning. Bioinformatics 2022; 38:1568-1574. [PMID: 34978568 PMCID: PMC8896601 DOI: 10.1093/bioinformatics/btab874] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/16/2021] [Revised: 12/25/2021] [Accepted: 12/30/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: In proteomics, database search programs are routinely used for peptide identification from tandem mass spectrometry data. However, many low-quality spectra cannot be interpreted by any programs. Meanwhile, certain high-quality spectra may not be identified due to incompleteness of the database or failure of the software. Thus, spectrum quality (SPEQ) assessment tools are helpful programs that can eliminate poor-quality spectra before the database search and highlight the high-quality spectra that are not identified in the initial search. These spectra may be valuable candidates for further analyses. RESULTS: We propose SPEQ: a spectrum quality assessment tool that uses a deep neural network to classify spectra into high-quality, which are worthy candidates for interpretation, and low-quality, which lack sufficient information for identification. SPEQ was compared with a few other prediction models and demonstrated improved prediction accuracy. AVAILABILITY AND IMPLEMENTATION: Source code and scripts are freely available at github.com/sor8sh/SPEQ, implemented in Python.
Affiliation(s)
- Soroosh Gholamizoj
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
| | - Bin Ma
- To whom correspondence should be addressed.
|
39
|
Chen L, Yang Y, Zhang Y, Li K, Cai H, Wang H, Zhao Q. The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies. Chembiochem 2021; 23:e202100534. [PMID: 34862721 DOI: 10.1002/cbic.202100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 11/15/2021] [Indexed: 11/07/2022]
Abstract
Small open reading frames (sORFs) are an important class of genes with less than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence suggests that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.
Affiliation(s)
- Lei Chen
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China; Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong Science and Technology Park, New Territories, Hong Kong SAR, 999077, P. R. China
| | - Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Kecheng Li
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
| | - Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510623, P. R. China
| | - Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510623, P. R. China
| | - Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
|
40
|
Hruska M, Holub D. Evaluation of an integrative Bayesian peptide detection approach on a combinatorial peptide library. EUROPEAN JOURNAL OF MASS SPECTROMETRY (CHICHESTER, ENGLAND) 2021; 27:217-234. [PMID: 34989269 DOI: 10.1177/14690667211066725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Detection of peptides lies at the core of bottom-up proteomics analyses. We examined a Bayesian approach to peptide detection, integrating match-based models (fragments, retention time, isotopic distribution, and precursor mass) and peptide prior probability models under a unified probabilistic framework. To assess the relevance of these models and their various combinations, we employed a complete- and a tail-complete search of a low-precursor-mass synthetic peptide library based on oncogenic KRAS peptides. The fragment match was by far the most informative match-based model, while the retention time match was the only remaining such model with an appreciable impact, increasing correct detections by around 8%. A peptide prior probability model built from a reference proteome greatly improved the detection over a uniform prior, essentially transforming de novo sequencing into a reference-guided search. Knowing a correct sequence tag in advance of peptide-spectrum matching had only a moderate impact on peptide detection unless the tag was long and of high certainty. The approach also derived more precise error rates on the analyzed combinatorial peptide library than those estimated using PeptideProphet and Percolator, showing its potential applicability for the detection of homologous peptides. Although the approach requires further computational developments for routine data analysis, it illustrates the value of peptide prior probabilities and presents a Bayesian approach for their incorporation into peptide detection.
Affiliation(s)
- Miroslav Hruska
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
- Department of Computer Science, Faculty of Science, Palacky University, Olomouc, Czech Republic
| | - Dusan Holub
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
|
41
|
Diving Deep into the Data: A Review of Deep Learning Approaches and Potential Applications in Foodomics. Foods 2021; 10:foods10081803. [PMID: 34441579 PMCID: PMC8392494 DOI: 10.3390/foods10081803] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/29/2021] [Revised: 07/30/2021] [Accepted: 08/02/2021] [Indexed: 01/18/2023] Open
Abstract
Deep learning is a trending field in bioinformatics. So far it has mostly been known for image processing and speech recognition, but it also shows promising possibilities for data processing in food analysis, especially foodomics. Thus, more and more deep learning approaches are being used. This review presents an introduction to deep learning in the context of metabolomics and proteomics, focusing on the prediction of shelf-life, food authenticity, and food quality. Apart from the direct food-related applications, this review summarizes deep learning for peptide sequencing and its context to food analysis. The review’s focus further lies on MS (mass spectrometry)-based approaches. As a result of the constant development and improvement of analytical devices, as well as more complex holistic research questions, especially with the diverse and complex food matrix, there is a need for more effective methods for data processing. Deep learning might meet this need and offers the prospect of dealing with the vast amount and complexity of data.
|
42
|
Zeng X, Ma B. MSTracer: A Machine Learning Software Tool for Peptide Feature Detection from Liquid Chromatography-Mass Spectrometry Data. J Proteome Res 2021; 20:3455-3462. [PMID: 34137255 DOI: 10.1021/acs.jproteome.0c01029] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
Liquid chromatography with tandem mass spectrometry (MS/MS) has been widely used in proteomics. Although a typical experiment includes both MS and MS/MS scans, existing bioinformatics research has focused far more on MS/MS data than on MS data. In MS data, each peptide produces a few trails of signal peaks, which are collectively called a peptide feature. Here, we introduce MSTracer, a new software tool for detecting peptide features from MS data. The software incorporates two scoring functions based on machine learning: one for detecting the peptide features and the other for assigning a quality score to each detected feature. The software was compared with several existing tools and demonstrated significantly better performance.
Affiliation(s)
- Xiangyuan Zeng
- Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
| | - Bin Ma
- Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
|
43
|
Aggarwal S, Tolani P, Gupta S, Yadav AK. Posttranslational modifications in systems biology. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2021; 127:93-126. [PMID: 34340775 DOI: 10.1016/bs.apcsb.2021.03.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]
Abstract
Biological complexity cannot be captured by genes or proteins alone. Protein posttranslational modifications (PTMs) impart functional diversity to the proteome and regulate protein structure, activity, localization and interactions. Their dynamics drive cellular signaling, growth and development, while their dysregulation causes many diseases. Mass spectrometry-based quantitative profiling of PTMs and bioinformatics analysis tools allow systems-level insights into their network architecture. High-resolution profiling of PTM networks will advance disease understanding and precision medicine. It can accelerate the discovery of biomarkers and drug targets. This requires better tools for unbiased, high-throughput and accurate PTM identification, site localization and automated annotation on a systems level.
Affiliation(s)
- Suruchi Aggarwal
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; Department of Molecular Biology and Biotechnology, Cotton University, Guwahati, Assam, India
| | - Priya Tolani
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India
| | - Srishti Gupta
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India; School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
| | - Amit Kumar Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster, Faridabad, Haryana, India.
|
44
|
Tarn C, Zeng WF. pDeep3: Toward More Accurate Spectrum Prediction with Fast Few-Shot Learning. Anal Chem 2021; 93:5815-5822. [PMID: 33797898 DOI: 10.1021/acs.analchem.0c05427] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/31/2022]
Abstract
Spectrum prediction using deep learning has attracted a lot of attention in recent years. Although existing deep learning methods have dramatically increased the prediction accuracy, there is still considerable space for improvement, which is presently limited by the difference of fragmentation types or instrument settings. In this work, we use the few-shot learning method to fit the data online to make up for the shortcoming. The method is evaluated using ten data sets, where the instruments include Velos, QE, Lumos, and Sciex, with differently set collision energies. Experimental results show that few-shot learning can achieve higher prediction accuracy with almost negligible computing resources. For example, on the data set from an untrained instrument, Sciex-6600, within about 10 s, the prediction accuracy is increased from 69.7% to 86.4%; on the CID (collision-induced dissociation) data set, the prediction accuracy of the model trained by HCD (higher energy collision dissociation) spectra is increased from 48.0% to 83.9%. It is also shown that the method is not critical to data quality and is sufficiently efficient to fill the accuracy gap. The source code of pDeep3 is available at http://pfind.ict.ac.cn/software/pdeep3.
Affiliation(s)
- Ching Tarn
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, 100190, Beijing, China; University of Chinese Academy of Sciences, 100049, Beijing, China
| | - Wen-Feng Zeng
- Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, 100190, Beijing, China; University of Chinese Academy of Sciences, 100049, Beijing, China
|
45
|
Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices. NAT MACH INTELL 2021. [DOI: 10.1038/s42256-021-00304-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/16/2022]
|
46
|
Chung HH, Kao CY, Wang TSA, Chu J, Pei J, Hsu CC. Reaction Tracking and High-Throughput Screening of Active Compounds in Combinatorial Chemistry by Tandem Mass Spectrometry Molecular Networking. Anal Chem 2021; 93:2456-2463. [PMID: 33416326 DOI: 10.1021/acs.analchem.0c04481] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Combinatorial synthesis has been widely used as an efficient strategy to screen for active compounds. Mass spectrometry is the method of choice in the identification of hits resulting from high-throughput screenings due to its high sensitivity, specificity, and speed. However, manual data processing of mass spectrometry data, especially for structurally diverse products in combinatorial chemistry, is extremely time-consuming and one of the bottlenecks in this process. In this study, we demonstrated the effectiveness of a tandem mass spectrometry molecular networking-based strategy for product identification, reaction dynamics monitoring, and active compound targeting in combinatorial synthesis. Molecular networking connects compounds with similar tandem mass spectra into a cluster and has been widely used in natural products analysis. We show that both the expected and side products can be readily characterized using molecular networking based on their mass spectrometry fragmentation patterns. Additionally, time-dependent molecular networking was integrated to track reaction dynamics to determine the optimal reaction time to maximize target product yields. We also present a proof-of-concept experiment that successfully identified and isolated active molecules from a dynamic combinatorial library. These results demonstrated the potential of using molecular networking for identifying, tracking, and high-throughput screening of active compounds in combinatorial synthesis.
Affiliation(s)
- Hsin-Hsiang Chung
- Department of Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan
| | - Chih-Yao Kao
- Department of Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan
| | - Tsung-Shing Andrew Wang
- Department of Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan
| | - John Chu
- Department of Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan
| | - Jiying Pei
- Department of Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan; School of Marine Sciences, Guangxi University, No.100, East Daxue Rd., Nanning City, Guangxi 530015, China
| | - Cheng-Chih Hsu
- Department of Chemistry, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan
|
47
|
Abstract
Mass spectrometry (MS)-based proteomics is currently the most successful approach to measure and compare peptides and proteins in a large variety of biological samples. Modern mass spectrometers, equipped with high-resolution analyzers, provide large amounts of data output. This is the case of shotgun/bottom-up proteomics, which consists of the enzymatic digestion of proteins into peptides that are then measured by MS instruments in data-dependent acquisition (DDA) mode. Dedicated bioinformatic tools and platforms have been developed to face the increasing size and complexity of raw MS data that need to be processed and interpreted for large-scale protein identification and quantification. This chapter illustrates the most popular bioinformatics solutions for the analysis of shotgun MS-proteomics data. A general description will be provided on the data preprocessing options and the different search engines available, including practical suggestions on how to optimize the parameters for peptide search, based on hands-on experience.
Affiliation(s)
- Avinash Yadav
  - Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Federica Marini
  - Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Alessandro Cuomo
  - Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
- Tiziana Bonaldi
  - Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
|
48
|
Yang C, Shan YC, Zhang WJ, Dai ZP, Zhang LH, Zhang YK. Full-length Protein Sequencing Based on Continuous Digestion Using Non-specific Proteases. ACTA CHIMICA SINICA 2021. [DOI: 10.6023/a21010025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
|
49
|
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen‐Feng Zeng
  - Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
- Yuxing Liao
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sara R. Savage
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen Jiang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
|
50
|
Comprehensive identification of native medium-sized and short bioactive peptides in sea bass muscle. Food Chem 2020; 343:128443. [PMID: 33129615 DOI: 10.1016/j.foodchem.2020.128443] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Revised: 10/16/2020] [Accepted: 10/17/2020] [Indexed: 02/06/2023]
Abstract
Native peptides from sea bass muscle were analyzed by two different approaches: medium-sized peptides by peptidomics analysis, and short peptides by suspect screening analysis employing an inclusion list of exact m/z values of all possible amino acid combinations (from 2 up to 4 residues). The method was also extended to common post-translational modifications potentially interesting in food analysis, as well as to non-proteolytic aminoacyl derivatives, which are well-known taste-active building blocks in pseudo-peptides. The medium-sized peptides were identified by de novo sequencing and by a combination of de novo sequencing and spectral matching to a protein sequence database, with up to 4077 peptides (2725 modified) identified by database search and 2665 peptides (223 modified) identified by de novo sequencing only; 102 short peptide sequences were identified (12 of them modified), and most of them had multiple reported bioactivities. The method can be extended to any peptide mixture, either endogenous or produced by protein hydrolysis, from other food matrices.
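As a rough illustration of how an inclusion list of this kind could be built, the sketch below enumerates every amino acid composition of length 2 to 4 and computes the singly protonated m/z from standard monoisotopic residue masses. The restriction to the 20 unmodified proteinogenic amino acids, the single charge state, the four-decimal rounding, and the names `peptide_mz` and `inclusion_list` are assumptions made here, not details taken from the paper.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the residue sum for a free peptide
PROTON = 1.007276   # mass of the charge-carrying proton


def peptide_mz(residues, charge=1):
    """Return the m/z of a peptide with the given residues at the given charge."""
    neutral = sum(RESIDUE_MASS[r] for r in residues) + WATER
    return (neutral + charge * PROTON) / charge


def inclusion_list(min_len=2, max_len=4, charge=1):
    """Map each rounded m/z value to the compositions that share it.

    Mass does not depend on residue order, so compositions (not ordered
    sequences) are enumerated; isobaric ones such as L/I collapse to one m/z.
    """
    table = {}
    for n in range(min_len, max_len + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            mz = round(peptide_mz(combo, charge), 4)
            table.setdefault(mz, []).append("".join(combo))
    return dict(sorted(table.items()))


table = inclusion_list()
n_compositions = sum(len(v) for v in table.values())
```

Because several compositions map to the same m/z, a list like this only flags candidate precursors; assigning an actual sequence still depends on the fragmentation spectrum.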
|