1
Pipart J, Holstein T, Martens L, Muth T. MultiStageSearch: An Iterative Workflow for Unbiased Taxonomic Analysis of Pathogens Using Proteogenomics. J Proteome Res 2025. [PMID: 40384001 DOI: 10.1021/acs.jproteome.4c00901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2025]
Abstract
The global SARS-CoV-2 pandemic emphasized the need for accurate pathogen diagnostics. While genomics is the gold standard, integrating mass spectrometry-based proteomics offers additional benefits. However, current proteomic and genomic reference databases are often biased toward specific taxa, such as pathogenic strains or model organisms, and proteomic databases are less comprehensive. These biases and gaps can lead to inaccurate identifications. To address these issues, we introduce MultiStageSearch, a multistep database search method that combines proteome and genome databases for taxonomic analysis. Initially, a generalist proteome database is used to infer potential species. Then, MultiStageSearch generates a specialized proteogenomic database for precise identification. This database is preprocessed to filter duplicates and cluster identical open reading frames to reduce genomic database biases. The workflow operates independently of strain-level NCBI taxonomy, enabling the identification of strains not represented in existing taxonomies. We benchmarked the workflow on viral and bacterial samples, demonstrating its superior performance in strain-level taxonomic inference compared to existing methods. MultiStageSearch offers a flexible and accurate approach for pathogen research and diagnostics, overcoming incomplete search spaces and biases inherent in reference databases.
Affiliation(s)
- Julian Pipart
  - Data Competence Center MF 2, Robert Koch Institute, Berlin 13353, Germany
- Tanja Holstein
  - Data Competence Center MF 2, Robert Koch Institute, Berlin 13353, Germany
  - CompOmics, VIB Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent 9000, Belgium
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, Strasbourg 67000, France
  - Infrastructure Nationale de Protéomique ProFIFR2048, Strasbourg 67087, France
- Lennart Martens
  - CompOmics, VIB Center for Medical Biotechnology, VIB, Ghent 9000, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent 9000, Belgium
  - BioOrganic Mass Spectrometry Laboratory (LSMBO), IPHC UMR 7178, University of Strasbourg, CNRS, Strasbourg 67000, France
  - Infrastructure Nationale de Protéomique ProFIFR2048, Strasbourg 67087, France
- Thilo Muth
  - Data Competence Center MF 2, Robert Koch Institute, Berlin 13353, Germany
2
Xie Y, Butler M. Compositional profiling of protein hydrolysates by high resolution liquid chromatography-mass spectrometry and chemometric analysis. Food Chem 2025; 487:144756. [PMID: 40398240 DOI: 10.1016/j.foodchem.2025.144756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Revised: 05/08/2025] [Accepted: 05/12/2025] [Indexed: 05/23/2025]
Abstract
Protein hydrolysates have attracted growing research and commercial attention due to their numerous nutritional, functional, and biological activities. However, only a limited range of proximate properties are determined routinely due to their substantial structural complexity and compositional variability. From both a manufacturing and functional perspective, it is of critical importance to monitor the compositional variations and identify potential similar or disparate features between different protein hydrolysates. In the current study, a single-approached method employing reverse phase ultra-high performance liquid chromatography coupled to high resolution electrospray ionization tandem mass spectrometry (RP-UHPLC-HR-ESI-MS/MS) was developed, optimized, and cross-validated for comprehensive structural and compositional profiling of a range of protein hydrolysates of varying raw materials, including soy, cotton, wheat, rice, and meat. Untargeted chemometric analysis and feature-based molecular network demonstrated potential for large-scale compositional assessment of protein hydrolysates without the need of prior component annotation. Signature features were identified to differentiate soy hydrolysates prepared from different batches of raw material and by different manufacturing processes. A hybrid approach combining de novo sequencing and target-decoy database homology search for peptide annotation is also described. Short peptides of 2 to 5 amino acids represented the most abundant components in soy protein hydrolysates (SPHs). A simple yet reliable integrated workflow for comprehensive structural and compositional profiling of protein hydrolysates was developed to enable an eventual correlation between their structure and function.
Affiliation(s)
- Yongjing Xie
  - National Institute for Bioprocessing Research and Training, Foster Avenue, Mount Merrion, Blackrock, Co. Dublin, A94 X099, Ireland
- Michael Butler
  - National Institute for Bioprocessing Research and Training, Foster Avenue, Mount Merrion, Blackrock, Co. Dublin, A94 X099, Ireland; School of Chemical and Bioprocess Engineering, University College Dublin (UCD), Belfield, Dublin 4, D04 V1W8, Ireland.
3
Zhang T, Celiker B, Shao Y, Gai J, Hill M, Wang C, Zheng L. Comparison of Shared Class I HLA-Bound Noncanonical Neoepitopes between Normal and Neoplastic Tissues of Pancreatic Adenocarcinoma. Clin Cancer Res 2025; 31:1956-1965. [PMID: 39699517 PMCID: PMC12079097 DOI: 10.1158/1078-0432.ccr-24-2251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 10/04/2024] [Accepted: 12/17/2024] [Indexed: 12/20/2024]
Abstract
PURPOSE: Developing T-cell or vaccine therapies for pancreatic ductal adenocarcinoma (PDAC) has been challenging because of a lack of knowledge regarding immunodominant, cancer-specific antigens as PDAC are characterized by a scarcity of genomic mutation-associated neoepitopes, and effective approaches to discover them are limited.
EXPERIMENTAL DESIGN: An advanced mass spectrometry approach was employed to compare the immunopeptidome of PDAC tissues and matched normal tissues from the same patients.
RESULTS: This study identified HLA class I-binding variant peptides derived from canonical proteins, which had single amino-acid substitutions not attributed to genetic mutations or RNA editing. These amino-acid substitutions appeared to result from translational errors. The variant peptides were predominantly found in tumor tissues, with certain peptides common among multiple patients. Importantly, several of these variant peptides were more immunogenic than their wild-type counterparts.
CONCLUSIONS: The shared noncanonical neoepitopes identified in this study offer promising candidates for vaccine and T-cell therapy development, potentially providing new avenues for immunotherapy in PDAC. See related commentary by Yuan et al., p. 1821.
Affiliation(s)
- Tengyi Zhang
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Pancreatic Cancer Precision Medicine Center of Excellence Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Bloomberg-Kimmel Institute for Cancer Immunotherapy, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Betul Celiker
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Pancreatic Cancer Precision Medicine Center of Excellence Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Bloomberg-Kimmel Institute for Cancer Immunotherapy, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Yingkuan Shao
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Breast Surgery and Oncology, Key Laboratory of Cancer Prevention and Intervention, Cancer Institute, Ministry of Education, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Jessica Gai
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Pancreatic Cancer Precision Medicine Center of Excellence Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Bloomberg-Kimmel Institute for Cancer Immunotherapy, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Mark Hill
  - Immuno-Oncology Discovery and Translational Medicine, Bristol Myers Squibb Company, Seattle, Washington
- Chunyu Wang
  - Department of Biological Sciences, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York
- Lei Zheng
  - Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Pancreatic Cancer Precision Medicine Center of Excellence Program, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Bloomberg-Kimmel Institute for Cancer Immunotherapy, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, Maryland
  - The Cancer Convergence Institute at Johns Hopkins, Johns Hopkins University School of Medicine, Baltimore, Maryland
4
Lopes DS, Almeida LGVC, Nardo AE, Añón MC, Dos Santos LD, Rossini BC, Pinilla CMB, Pacheco MTB, Galland F. Antioxidant bioactivity of sunflower protein hydrolysates in Caco-2 cells and in silico structural properties. Food Chem 2025; 487:144733. [PMID: 40373717 DOI: 10.1016/j.foodchem.2025.144733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2025] [Revised: 04/29/2025] [Accepted: 05/11/2025] [Indexed: 05/17/2025]
Abstract
Sunflower protein hydrolysate (SPH), with 95% reduced phenolic content, was studied for its protective effects against oxidative stress in intestinal cells (Caco-2). Produced via alcalase hydrolysis, SPH's molecular weight, amino acid composition, and hydrophobicity were characterized. The antioxidant activity of SPH, tested by ABTS, was maintained in Caco-2 cells under oxidative stress, modulating glutathione and catalase enzymes. LC/MS/MS identified 196 peptides, which were cross-referenced with a bioactive database and found to contain several di- and tripeptides with antioxidant activity. Higher hydrophobicity and molecular weight correlated with predicted antioxidant and anti-inflammatory activity scores, provided by tools that use machine learning methods. The study shows that SPH exhibits antioxidant properties in enterocyte cell models, even with reduced phenolic content, suggesting its potential use in functional foods.
Affiliation(s)
- Daniel S Lopes
  - Quality and Science Center of Food, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
- Lilian G V C Almeida
  - Quality and Science Center of Food, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
- Agustina E Nardo
  - Centro de Investigacion y Desarrollo en Criotecnología de Alimentos (CIDCA), Comision de Investigaciones Científicas (CIC-PBA) - Consejo Nacional de Investigaciones Científicas y Tecnicas (CONICET- CCT La Plata), Universidad Nacional de La Plata (UNLP), calle 47 y 116, 1900, La Plata, Argentina.
- María Cristina Añón
  - Centro de Investigacion y Desarrollo en Criotecnología de Alimentos (CIDCA), Comision de Investigaciones Científicas (CIC-PBA) - Consejo Nacional de Investigaciones Científicas y Tecnicas (CONICET- CCT La Plata), Universidad Nacional de La Plata (UNLP), calle 47 y 116, 1900, La Plata, Argentina.
- Lucilene D Dos Santos
  - Institute of Biotechnology, São Paulo State University (UNESP), Botucatu, SP 18607-440, Brazil.
- Bruno C Rossini
  - Institute of Biotechnology, São Paulo State University (UNESP), Botucatu, SP 18607-440, Brazil.
- Cristian M B Pinilla
  - Center of Technology and Lactic Acid Bacteria, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
- Maria T B Pacheco
  - Quality and Science Center of Food, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
- Fabiana Galland
  - Quality and Science Center of Food, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
5
He MT, Li N, Wang JH, Wei ZZ, Feng J, Li WT, Sui JH, Huang N, Dong MQ. Do-It-Yourself De Novo Antibody Sequencing Workflow that Achieves Complete Accuracy of the Variable Regions. J Proteome Res 2025. [PMID: 40323442 DOI: 10.1021/acs.jproteome.5c00210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2025]
Abstract
Antibodies are widely used as research tools or therapeutic agents. Knowing the sequences of the variable regions of an antibody (both the heavy chain and the light chain) is a prerequisite for the production of recombinant antibodies. Mass spectrometry-based de novo sequencing is a frequently used, and sometimes the only, approach to gaining this information. Here, we describe a workflow that enables accurate sequence determination of monoclonal antibodies based on mass spectrometry data and freely available software tools. This workflow, which we developed using a homemade anti-FLAG monoclonal antibody as a reference sample, achieved 100% accuracy of the variable regions with clear distinction between leucine (L) and isoleucine (I). Using this workflow, we successfully decoded a monoclonal anti-HA antibody, for which we had no prior knowledge of its sequence. Based on the de novo sequencing result, we generated a recombinant anti-HA antibody, and demonstrated that it has the same specificity, sensitivity, and affinity as the commercial antibody.
Affiliation(s)
- Meng-Ting He
  - College of Life Sciences, Beijing Normal University, 19 Xinjiekouwai Avenue, Beijing 100875, China
  - National Institute of Biological Sciences, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
- Ning Li
  - National Institute of Biological Sciences, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
- Zhi-Zhong Wei
  - National Institute of Biological Sciences, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
- Jie Feng
  - National Institute of Biological Sciences, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
- Wen-Ting Li
  - Bioinformatics Solutions Inc., Waterloo, ON N2L 6J2, Canada
- Jian-Hua Sui
  - National Institute of Biological Sciences, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
- Niu Huang
  - National Institute of Biological Sciences, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
- Meng-Qiu Dong
  - National Institute of Biological Sciences, Beijing 102206, China
  - Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
6
Chu F, Jenson SC, Barente AS, Heller NC, Merkley ED, Jarman KH. MARLOWE: An Untargeted Proteomics, Statistical Approach to Taxonomic Classification for Forensics. J Proteome Res 2025; 24:995-1007. [PMID: 39898467 DOI: 10.1021/acs.jproteome.3c00477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2025]
Abstract
General proteomics research for fundamental science typically addresses laboratory- or patient-derived samples of known origin and composition. However, in a few research areas, such as environmental proteomics, clinical identification of infectious organisms, archeology, art/cultural history, and forensics, attributing the origin of a protein-containing sample to the organisms that produced it is a central focus. A small number of groups have approached this problem and developed software tools for taxonomic characterization and/or identification using bottom-up proteomics. Most such tools identify peptides via database search, and many rely on organism-specific peptides as markers. Our group recently introduced MARLOWE, a software tool for taxonomic characterization of unknown samples based on de novo peptide identification and signal-erosion-resistant strong peptides, which are shared peptides distributed in a taxonomy-dependent manner. In the current work, we further characterize the utility of MARLOWE using publicly available proteomics data from forensically-relevant samples. MARLOWE characterizes samples based on their protein profile, and returns ranked organism lists of potential contributors and taxonomic scores based on shared strong peptides between organisms. Overall, the correct characterization rate ranges between 44 and 100%, depending on the sample type and data acquisition parameters (with lower numbers associated with lower-quality data sets). MARLOWE demonstrates successful characterization of true contributors and close relatives, and provides sufficient specificity to distinguish certain microbial species. MARLOWE demonstrates its ability to provide insight into potential taxonomic sources for a wide range of sample types without prior assumptions about sample contents. This approach can find utility in forensic science and also broadly in bioanalytical applications that utilize proteomics approaches for taxonomic characterization.
Affiliation(s)
- Fanny Chu
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Sarah C Jenson
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Anthony S Barente
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Natalie C Heller
  - Applied Statistics and Computational Modeling Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Eric D Merkley
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Kristin H Jarman
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
7
Luo S, Peng H, Shi Y, Cai J, Zhang S, Shao N, Li J. Integration of proteomics profiling data to facilitate discovery of cancer neoantigens: a survey. Brief Bioinform 2025; 26:bbaf087. [PMID: 40052441 PMCID: PMC11886573 DOI: 10.1093/bib/bbaf087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Revised: 12/29/2024] [Accepted: 02/19/2025] [Indexed: 03/10/2025] [Open Access]
Abstract
Cancer neoantigens are peptides that originate from alterations in the genome, transcriptome, or proteome. These peptides can elicit cancer-specific T-cell recognition, making them potential candidates for cancer vaccines. The rapid advancement of proteomics technology holds tremendous potential for identifying these neoantigens. Here, we provided an up-to-date survey about database-based search methods and de novo peptide sequencing approaches in proteomics, and we also compared these methods to recommend reliable analytical tools for neoantigen identification. Unlike previous surveys on mass spectrometry-based neoantigen discovery, this survey summarizes the key advancements in de novo peptide sequencing approaches that utilize artificial intelligence. From a comparative study on a dataset of the HepG2 cell line and nine mixed hepatocellular carcinoma proteomics samples, we demonstrated the potential of proteomics for the identification of cancer neoantigens and conducted comparisons of the existing methods to illustrate their limits. Understanding these limits, we suggested a novel workflow for neoantigen discovery as perspectives.
Affiliation(s)
- Shifu Luo
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518107, Guangdong, China
  - Faculty of Health Sciences, University of Macau, Taipa, Macao SAR 999078, China
- Hui Peng
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518107, Guangdong, China
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore
- Ying Shi
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518107, Guangdong, China
  - School of Computer and Information Technology, Shanxi University, Taiyuan, 030006, Shanxi, China
- Jiaxin Cai
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518107, Guangdong, China
- Songming Zhang
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518107, Guangdong, China
- Ningyi Shao
  - Faculty of Health Sciences, University of Macau, Taipa, Macao SAR 999078, China
- Jinyan Li
  - Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518107, Guangdong, China
8
Deshpande AS, Lin A, O'Bryon I, Aufrecht JA, Merkley ED. Emerging protein sequencing technologies: proteomics without mass spectrometry? Expert Rev Proteomics 2025; 22:89-106. [PMID: 40105028 DOI: 10.1080/14789450.2025.2476979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2024] [Revised: 02/12/2025] [Accepted: 03/03/2025] [Indexed: 03/20/2025]
Abstract
INTRODUCTION: Liquid chromatography-tandem mass spectrometry (LC-MS/MS) has been a leading method for proteomics for 30 years. Advantages provided by LC-MS/MS are offset by significant disadvantages, including cost. Recently, several non-mass spectrometric methods have emerged, but little information is available about their capacity to analyze the complex mixtures routine for mass spectrometry.
AREAS COVERED: We review recent non-mass-spectrometric methods for sequencing proteins and peptides, including those using nanopores, sequencing by degradation, reverse translation, and short-epitope mapping, with comments on bioinformatics challenges, fundamental limitations, and areas where new technologies will be more or less competitive with LC-MS/MS. In addition to conventional literature searches, instrument vendor websites, patents, webinars, and preprints were also consulted to give a more up-to-date picture.
EXPERT OPINION: Many new technologies are promising. However, demonstrations that they outperform mass spectrometry in terms of peptides and proteins identified have not yet been published, and astute observers note important disadvantages, especially relating to the dynamic range of single-molecule measurements of complex mixtures. Still, even if the performance of emerging methods proves inferior to LC-MS/MS, their low cost could create a different kind of revolution: a dramatic increase in the number of biology laboratories engaging in new forms of proteomics research.
Affiliation(s)
- A S Deshpande
  - Biogeochemical Transformations Group, Pacific Northwest National Laboratory, Richland, Washington, USA
- A Lin
  - Chemical and Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington, USA
- I O'Bryon
  - Chemical and Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington, USA
- J A Aufrecht
  - Biogeochemical Transformations Group, Pacific Northwest National Laboratory, Richland, Washington, USA
- E D Merkley
  - Chemical and Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington, USA
9
Zhao Y, Wang S, Huang J, Meng B, An D, Fang X, Wei Y, Dai X. A transformer-based semi-autoregressive framework for high-speed and accurate de novo peptide sequencing. Commun Biol 2025; 8:234. [PMID: 39948275 PMCID: PMC11825679 DOI: 10.1038/s42003-025-07584-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Accepted: 01/21/2025] [Indexed: 02/16/2025] [Open Access]
Abstract
De novo peptide sequencing directly identifies peptides from mass spectrometry data, playing a critical role in discovering novel proteins and analyzing complex biological samples without reliance on existing databases. To address challenges in both speed and accuracy, a transformer-based model, TSARseqNovo, incorporates two key innovations: a Semi-Autoregressive decoder for parallel prediction of multiple amino acids and a Masking Refinement decoder for refining low-confidence predictions. These features significantly enhance sequencing efficiency and accuracy. Evaluations on the Nine-Species, Aggregated, and Glycoproteomic datasets demonstrate that TSARseqNovo outperforms state-of-the-art models, including CasaNovo, NovoB, InstaNovo+, and π-HelixNovo. Specifically, TSARseqNovo achieves up to a 2-fold speed increase over CasaNovo and π-HelixNovo, and approximately 10-fold over NovoB and InstaNovo+, while also showing substantial improvements in peptide prediction precision, especially for long peptides. These advancements position TSARseqNovo as a powerful tool for accelerating high-throughput proteomics research and addressing increasingly complex biological questions.
Affiliation(s)
- Yang Zhao
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China.
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China.
- Shuo Wang
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
- Jinze Huang
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China
- Bo Meng
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China
- Dong An
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China
- Xiang Fang
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China.
- Yaoguang Wei
  - College of Information and Electrical Engineering, China Agricultural University, Beijing, 100083, China.
- Xinhua Dai
  - Technology Innovation Center of Mass Spectrometry for State Market Regulation, Center for Advanced Measurement Science, National Institute of Metrology, Beijing, 100029, China.
10
Ranff T, Dennison M, Bédorf J, Schulze S, Zinn N, Bantscheff M, van Heugten JJRM, Fufezan C. PeptideForest: Semisupervised Machine Learning Integrating Multiple Search Engines for Peptide Identification. J Proteome Res 2025; 24:929-939. [PMID: 39840643 DOI: 10.1021/acs.jproteome.4c00686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2025]
Abstract
The first step in bottom-up proteomics is the assignment of measured fragmentation mass spectra to peptide sequences, also known as peptide spectrum matches. In recent years novel algorithms have pushed the assignment to new heights; unfortunately, different algorithms come with different strengths and weaknesses and choosing the appropriate algorithm poses a challenge for the user. Here we introduce PeptideForest, a semisupervised machine learning approach that integrates the assignments of multiple algorithms to train a random forest classifier to alleviate that issue. Additionally, PeptideForest increases the number of peptide-to-spectrum matches that exhibit a q-value lower than 1% by 25.2 ± 1.6% compared to MS-GF+ data on samples containing mixed HEK and Escherichia coli proteomes. However, an increase in quantity does not necessarily reflect an increase in quality and this is why we devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. Thereby, we could show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality and as such PeptideForest offers a possibility to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.
Collapse
Affiliation(s)
- Tristan Ranff
- Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, 69120 Heidelberg, Germany
- Cellzome, A GSK Company, Heidelberg 69117, Germany
- GSK/RDDT/QEL/DE─Data Streams and Operation, Heidelberg 69117, Germany
| | | | - Jeroen Bédorf
- Minds.ai, Santa Cruz, California 95060, United States
| | - Stefan Schulze
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Thomas H. Gosnell School of Life Sciences, Rochester Institute of Technology, Rochester, New York 14608, United States
| | - Nico Zinn
- Cellzome, A GSK Company, Heidelberg 69117, Germany
| | | | | | - Christian Fufezan
- Institute of Pharmacy and Molecular Biotechnology, Heidelberg University, 69120 Heidelberg, Germany
- Cellzome, A GSK Company, Heidelberg 69117, Germany
- GSK/RDDT/QEL/DE─Data Streams and Operation, Heidelberg 69117, Germany
|
11
|
Souza‐Silva IM, Carregari VC, Steckelings UM, Verano‐Braga T. Phosphoproteomics for studying signaling pathways evoked by hormones of the renin-angiotensin system: A source of untapped potential. Acta Physiol (Oxf) 2025; 241:e14280. [PMID: 39821680 PMCID: PMC11737475 DOI: 10.1111/apha.14280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Revised: 12/20/2024] [Accepted: 12/31/2024] [Indexed: 01/19/2025]
Abstract
The Renin-Angiotensin System (RAS) is a complex neuroendocrine system consisting of a single precursor protein, angiotensinogen (AGT), which is processed into various peptide hormones, including the angiotensins [Ang I, Ang II, Ang III, Ang IV, Ang-(1-9), Ang-(1-7), Ang-(1-5), etc.] and Alamandine-related peptides [Ang A, Alamandine, Ala-(1-5)], through intricate enzymatic pathways. Functionally, the RAS is divided into two axes with opposing effects: the classical axis, primarily consisting of Ang II acting through the AT1 receptor (AT1R), and, in contrast, the protective axis, which includes the receptors Mas, AT2R and MrgD and their respective ligands. A key area of RAS research is to better understand how signaling cascades elicited by these receptors lead to either "classical" or "protective" effects, as imbalances between the two axes can contribute to disease. On the other hand, therapeutic benefits can be achieved by selectively activating protective receptors and their associated signaling pathways. Traditionally, robust "hypothesis-driven" methods like Western blotting have built a solid knowledge foundation on RAS signaling. In this review, we introduce untargeted mass spectrometry-based phosphoproteomics, a "hypothesis-generating approach", to explore RAS signaling pathways. This technology enables the unbiased discovery of phosphorylation events, offering insights into previously unknown signaling mechanisms. We review the existing studies which used phosphoproteomics to study RAS signaling and discuss potential future applications of phosphoproteomics in RAS research, including advantages and limitations. Ultimately, phosphoproteomics represents a so-far underused tool for deepening our understanding of RAS signaling and unveiling novel therapeutic targets.
Affiliation(s)
- Igor Maciel Souza‐Silva
- Max‐Delbrück‐Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Department of Molecular Medicine, Cardiovascular and Renal Research Unit, University of Southern Denmark, Odense M, Denmark
| | - Victor Corasolla Carregari
- Laboratório de Neuroproteômica, Instituto de Biologia, Universidade de Campinas, São Paulo, Brazil
- Department of Biochemistry and Molecular Biology, Protein Research Group, University of Southern Denmark, Odense M, Denmark
| | - U. Muscha Steckelings
- Department of Molecular Medicine, Cardiovascular and Renal Research Unit, University of Southern Denmark, Odense M, Denmark
| | - Thiago Verano‐Braga
- Department of Molecular Medicine, Cardiovascular and Renal Research Unit, University of Southern Denmark, Odense M, Denmark
- Departamento de Fisiologia e Biofísica, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Instituto Nacional de Ciência e Tecnologia Em Nanobiofarmacêutica (INCT‐Nanobiofar), Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
|
12
|
Van Den Bossche T, Beslic D, van Puyenbroeck S, Suomi T, Holstein T, Martens L, Elo LL, Muth T. Metaproteomics Beyond Databases: Addressing the Challenges and Potentials of De Novo Sequencing. Proteomics 2025:e202400321. [PMID: 39888246 DOI: 10.1002/pmic.202400321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 01/09/2025] [Accepted: 01/10/2025] [Indexed: 02/01/2025]
Abstract
Metaproteomics enables the large-scale characterization of microbial community proteins, offering crucial insights into their taxonomic composition, functional activities, and interactions within their environments. By directly analyzing proteins, metaproteomics offers insights into community phenotypes and the roles individual members play in diverse ecosystems. Although database-dependent search engines are commonly used for peptide identification, they rely on pre-existing protein databases, which can be limiting for complex, poorly characterized microbiomes. De novo sequencing presents a promising alternative, which derives peptide sequences directly from mass spectra without requiring a database. Over time, this approach has evolved from manual annotation to advanced graph-based, tag-based, and deep learning-based methods, significantly improving the accuracy of peptide identification. This Viewpoint explores the evolution, advantages, limitations, and future opportunities of de novo sequencing in metaproteomics. We highlight recent technological advancements that have improved its potential for detecting unsequenced species and for providing deeper functional insights into microbial communities.
Affiliation(s)
- Tim Van Den Bossche
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
| | - Denis Beslic
- Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute, Berlin, Germany
| | - Sam van Puyenbroeck
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
| | - Tomi Suomi
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
| | - Tanja Holstein
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Data Competence Center MF 2, Robert Koch Institute, Berlin, Germany
| | - Lennart Martens
- VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
| | - Laura L Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Institute of Biomedicine, University of Turku, Turku, Finland
| | - Thilo Muth
- Data Competence Center MF 2, Robert Koch Institute, Berlin, Germany
|
13
|
Zhu C, Liu LY, Ha A, Yamaguchi TN, Zhu H, Hugh-White R, Livingstone J, Patel Y, Kislinger T, Boutros PC. moPepGen: Rapid and Comprehensive Identification of Non-canonical Peptides. bioRxiv 2024:2024.03.28.587261. [PMID: 38585946 PMCID: PMC10996593 DOI: 10.1101/2024.03.28.587261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Gene expression is a multi-step transformation of biological information from its storage form (DNA) into functional forms (protein and some RNAs). Regulatory activities at each step of this transformation multiply a single gene into a myriad of proteoforms. Proteogenomics is the study of how genomic and transcriptomic variation creates this proteomic diversity, and is limited by the challenges of modeling the complexities of gene-expression. We therefore created moPepGen, a graph-based algorithm that comprehensively generates non-canonical peptides in linear time. moPepGen works with multiple technologies, in multiple species and on all types of genetic and transcriptomic data. In human cancer proteomes, it enumerates previously unobservable noncanonical peptides arising from germline and somatic genomic variants, noncoding open reading frames, RNA fusions and RNA circularization. By enabling efficient detection and quantitation of previously hidden proteins in both existing and new proteomic data, moPepGen facilitates all proteogenomics applications. It is available at: https://github.com/uclahs-cds/package-moPepGen.
Affiliation(s)
- Chenghao Zhu
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Department of Urology, University of California, Los Angeles, CA, USA
| | - Lydia Y. Liu
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
| | - Annie Ha
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
| | - Takafumi N. Yamaguchi
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
| | - Helen Zhu
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada
| | - Rupert Hugh-White
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
| | - Julie Livingstone
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
| | - Yash Patel
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
| | - Thomas Kislinger
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
| | - Paul C. Boutros
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Department of Urology, University of California, Los Angeles, CA, USA
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
|
14
|
Le Bihan T, McDonald Z, Celejewski KR, Liu Q, Ma B. Enhancing De Novo Protein Sequencing through the C-Terminal Labeling Strategy: Resolving Isobaric Ambiguities by Electron-Transfer/Higher Energy Collision Dissociation (EThcD). Anal Chem 2024; 96:16802-16810. [PMID: 39388386 DOI: 10.1021/acs.analchem.4c03459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
De novo protein sequencing via a bottom-up approach requires various proteases to produce overlapping peptides. However, peptides generated by proteases other than trypsin, LysC, and ArgC often yield C-terminal fragments with suboptimal ionization in positive mode mass spectrometry (MS). This study introduces a novel peptide labeling strategy that involves modifying peptides at the C-terminal and at the carboxyl groups of Aspartic and Glutamic acid with arginine methyl ester (R-met) to improve peptide fragmentation and resolve isobaric ambiguities encountered during sequencing. An amidation reaction is used with coupling reagents to conjugate R-met to the peptide's C-terminal end, introducing a functional group that enhances the detectability of C-terminal peptide fragment ions by mass spectrometry. Subsequently, selecting a charge state of +2 or higher can facilitate optimal fragmentation of the derivatized peptides using electron-transfer/higher energy collision dissociation (EThcD), thereby generating essential w-ions to resolve common isobaric ambiguities. Demonstrating this strategy across diverse protein types, including albumin and antibodies and using different proteases for digestion, highlights the unique characteristics of combining the proposed amidation reaction with the specific proteases tested.
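The "isobaric ambiguities" mentioned in this abstract can be illustrated with a small sketch: leucine and isoleucine have the same elemental composition, so peptides that differ only by a Leu/Ile swap have identical precursor masses and cannot be told apart by mass alone. The residue masses below are standard monoisotopic values for a subset of amino acids. The function names and example peptides are illustrative assumptions and are not taken from the paper.

```python
import math

# Monoisotopic residue masses in daltons (standard values, subset only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def is_isobaric(seq_a, seq_b, tol=1e-3):
    """True if two peptides have the same mass within tol daltons."""
    return math.isclose(peptide_mass(seq_a), peptide_mass(seq_b), abs_tol=tol)


print(is_isobaric("PEPTIDE", "PEPTLDE"))  # Leu/Ile swap
print(is_isobaric("AGGK", "ANK"))         # Gly-Gly has the same composition as Asn
print(is_isobaric("AQK", "AKK"))          # Gln and Lys differ by about 0.036 Da
```

Because mass cannot separate such candidates, additional evidence is needed; the abstract names w-ions generated by EThcD as that evidence for the derivatized peptides.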
Affiliation(s)
| | - Zac McDonald
- Rapid Novor, 137 Glasgow St, Kitchener N2G 4X8, Ontario, Canada
| | | | - Qixin Liu
- Rapid Novor, 137 Glasgow St, Kitchener N2G 4X8, Ontario, Canada
| | - Bin Ma
- Rapid Novor, 137 Glasgow St, Kitchener N2G 4X8, Ontario, Canada
|
15
|
Le Bihan T, Nunez de Villavicencio Diaz T, Reitzel C, Lange V, Park M, Beadle E, Wu L, Jovic M, Dubois RM, Couzens AL, Duan J, Han X, Liu Q, Ma B. De novo protein sequencing of antibodies for identification of neutralizing antibodies in human plasma post SARS-CoV-2 vaccination. Nat Commun 2024; 15:8790. [PMID: 39389968 PMCID: PMC11466954 DOI: 10.1038/s41467-024-53105-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 10/02/2024] [Indexed: 10/12/2024] Open
Abstract
The antibody response to vaccination and infection is a key component of the immune response to pathogens. Sequencing of peripheral B cells may not represent the complete B cell receptor repertoire. Here we present a method for sequencing human plasma-derived polyclonal IgG using a combination of mass spectrometry and B-cell sequencing. We investigate the IgG response to the Moderna Spikevax COVID-19 vaccine. From the sequencing data of the natural polyclonal response to vaccination, we generate 12 recombinant antibodies. Six derived recombinant antibodies, including four generated with de novo protein sequencing, exhibit similar or higher binding affinities than the original natural polyclonal antibody. Neutralization tests reveal that the six antibodies possess neutralizing capabilities against the target antigen. This research provides insights into sequencing polyclonal IgG antibodies and the potential of our approach in generating recombinant antibodies with robust binding affinity and neutralization capabilities. Directly examining the circulating IgG pool is crucial due to potential misrepresentations by B-cell analysis alone.
Affiliation(s)
| | | | | | | | | | | | - Lin Wu
- Rapid Novor, Kitchener, ON, Canada
| | | | | | | | - Jin Duan
- Rapid Novor, Kitchener, ON, Canada
| | | | | | - Bin Ma
- Rapid Novor, Kitchener, ON, Canada
|
16
|
Martins AMA, D M Santos M, C Camillo-Andrade A, Leite AL, Souza JS, Sánchez S, Muotri AR, Carvalho PC, Yates JR. Integrating DIA Single-Cell Proteomics Data with the DiagnoMass Proteomic Hub for Biological Insights. J Am Soc Mass Spectrom 2024; 35:2308-2314. [PMID: 39258941 DOI: 10.1021/jasms.4c00187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
Single-cell proteomics has emerged as a powerful technology for unraveling the complexities of cellular heterogeneity, enabling insights into individual cell functions and pathologies. One of the primary challenges in single-cell proteomics is data generation, where low mass spectral signals often preclude the triggering of MS2 events. This challenge is addressed by Data Independent Acquisition (DIA), a data acquisition strategy that does not depend on peptide ion isotopic signatures to generate an MS2 event. In this study, we present data generated from the integration of DIA single-cell proteomics with a version of the DiagnoMass Proteomic Hub that was adapted to handle DIA data. DiagnoMass employs a hierarchical clustering methodology that enables the identification of tandem mass spectral clusters that are discriminative of biological conditions, thereby reducing the reliance on search engine biases for identifications. Nevertheless, a search engine (in this work, DIA-NN) can be integrated with DiagnoMass for spectral annotation. We used single-cell proteomic data from iPSC-derived neuroprogenitor cell cultures as a test study of this integrated approach. We were able to differentiate between control and Rett Syndrome patient cells to discern the proteomic variances potentially contributing to the disease's pathology. Our research confirms that the DiagnoMass-DIA synergy significantly enhances the identification of discriminative proteomic signatures, highlighting critical biological variations such as the presence of unique spectra that could be related to Rett Syndrome pathology.
Affiliation(s)
- Aline M A Martins
- Departments of Molecular Medicine and Neurobiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR302, La Jolla, California 92037, United States
| | - Marlon D M Santos
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, Fiocruz - Paraná, R. Professor Algacyr Munhoz Mader, 3775 Curitiba, PR, Brazil
- Analytical Biochemistry and Proteomics Unit, Instituto de Investigaciones Biológicas Clemente Estable, Institut Pasteur de Montevideo, Mataojo 2020, 11400 Montevideo, Uruguay
| | - Amanda C Camillo-Andrade
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, Fiocruz - Paraná, R. Professor Algacyr Munhoz Mader, 3775 Curitiba, PR, Brazil
| | - Aline Lima Leite
- Bruker Daltonics Corporation, USA, 40 Manning Rd, Billerica, Massachusetts 01821, United States
| | - Janaina Sena Souza
- Department of Pediatrics, Sanford Consortium for Regenerative Medicine, UCSD, 2880 Torrey Pines Scenic Dr, La Jolla, California 92037, United States
| | - Sandra Sánchez
- Department of Pediatrics, Sanford Consortium for Regenerative Medicine, UCSD, 2880 Torrey Pines Scenic Dr, La Jolla, California 92037, United States
| | - Alysson R Muotri
- Department of Pediatrics, Sanford Consortium for Regenerative Medicine, UCSD, 2880 Torrey Pines Scenic Dr, La Jolla, California 92037, United States
| | - Paulo Costa Carvalho
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, Fiocruz - Paraná, R. Professor Algacyr Munhoz Mader, 3775 Curitiba, PR, Brazil
| | - John R Yates
- Departments of Molecular Medicine and Neurobiology, The Scripps Research Institute, 10550 North Torrey Pines Road, SR302, La Jolla, California 92037, United States
|
17
|
Tran NH, Qiao R, Mao Z, Pan S, Zhang Q, Li W, Xin L, Li M, Shan B. NovoBoard: A Comprehensive Framework for Evaluating the False Discovery Rate and Accuracy of De Novo Peptide Sequencing. Mol Cell Proteomics 2024; 23:100849. [PMID: 39321875 PMCID: PMC11532909 DOI: 10.1016/j.mcpro.2024.100849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 08/27/2024] [Accepted: 09/18/2024] [Indexed: 09/27/2024] Open
Abstract
De novo peptide sequencing is one of the most fundamental research areas in mass spectrometry-based proteomics. Many methods have often been evaluated using a couple of simple metrics that do not fully reflect their overall performance. Moreover, there has not been an established method to estimate the false discovery rate (FDR) of de novo peptide-spectrum matches. Here we propose NovoBoard, a comprehensive framework to evaluate the performance of de novo peptide-sequencing methods. The framework consists of diverse benchmark datasets (including tryptic, nontryptic, immunopeptidomics, and different species) and a standard set of accuracy metrics to evaluate the fragment ions, amino acids, and peptides of the de novo results. More importantly, a new approach is designed to evaluate de novo peptide-sequencing methods on target-decoy spectra and to estimate and validate their FDRs. Our FDR estimation provides valuable information to assess the reliability of new peptides identified by de novo sequencing tools, especially when no ground-truth information is available to evaluate their accuracy. The FDR estimation can also be used to evaluate the capability of de novo peptide sequencing tools to distinguish between de novo peptide-spectrum matches and random matches. Our results thoroughly reveal the strengths and weaknesses of different de novo peptide-sequencing methods and how their performances depend on specific applications and the types of data.
Affiliation(s)
| | - Rui Qiao
- Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
| | - Zeping Mao
- Bioinformatics Solutions Inc, Waterloo, Ontario, Canada; David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada
| | - Shengying Pan
- Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
| | - Qing Zhang
- Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
| | - Wenting Li
- Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
| | - Lei Xin
- Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
| | - Ming Li
- David R. Cheriton School of Computer Science, University of Waterloo, Ontario, Canada
| | - Baozhen Shan
- Bioinformatics Solutions Inc, Waterloo, Ontario, Canada
|
18
|
Jiang Y, Rex DA, Schuster D, Neely BA, Rosano GL, Volkmar N, Momenzadeh A, Peters-Clarke TM, Egbert SB, Kreimer S, Doud EH, Crook OM, Yadav AK, Vanuopadath M, Hegeman AD, Mayta M, Duboff AG, Riley NM, Moritz RL, Meyer JG. Comprehensive Overview of Bottom-Up Proteomics Using Mass Spectrometry. ACS Meas Sci Au 2024; 4:338-417. [PMID: 39193565 PMCID: PMC11348894 DOI: 10.1021/acsmeasuresciau.3c00068] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/15/2023] [Revised: 05/03/2024] [Accepted: 05/03/2024] [Indexed: 08/29/2024]
Abstract
Proteomics is the large-scale study of protein structure and function from biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging to understand for new practitioners. Here, we provide a comprehensive overview of different proteomics methods. Our coverage ranges from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this Review will serve as a handbook for researchers who are new to the field of bottom-up proteomics.
Affiliation(s)
- Yuming Jiang
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
| | - Devasahayam Arokia Balaya Rex
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Dina Schuster
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
- Department of Biology, Institute of Molecular Biology and Biophysics, ETH Zurich, Zurich 8093, Switzerland
- Laboratory of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, Villigen 5232, Switzerland
| | - Benjamin A. Neely
- Chemical Sciences Division, National Institute of Standards and Technology, NIST, Charleston, South Carolina 29412, United States
| | - Germán L. Rosano
- Mass Spectrometry Unit, Institute of Molecular and Cellular Biology of Rosario, Rosario, 2000 Argentina
| | - Norbert Volkmar
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
| | - Amanda Momenzadeh
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
| | - Trenton M. Peters-Clarke
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, 94158, United States
| | - Susan B. Egbert
- Department of Chemistry, University of Manitoba, Winnipeg, Manitoba, R3T 2N2 Canada
| | - Simion Kreimer
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
| | - Emma H. Doud
- Center for Proteome Analysis, Indiana University School of Medicine, Indianapolis, Indiana, 46202-3082, United States
| | - Oliver M. Crook
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Amit Kumar Yadav
- Translational Health Science and Technology Institute, NCR Biotech Science Cluster 3rd Milestone Faridabad-Gurgaon Expressway, Faridabad, Haryana 121001, India
| | | | - Adrian D. Hegeman
- Departments of Horticultural Science and Plant and Microbial Biology, University of Minnesota, Twin Cities, Minnesota 55108, United States
| | - Martín L. Mayta
- School of Medicine and Health Sciences, Center for Health Sciences Research, Universidad Adventista del Plata, Libertador San Martin 3103, Argentina
- Molecular Biology Department, School of Pharmacy and Biochemistry, Universidad Nacional de Rosario, Rosario 2000, Argentina
| | - Anna G. Duboff
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - Nicholas M. Riley
- Department of Chemistry, University of Washington, Seattle, Washington 98195, United States
| | - Robert L. Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Jesse G. Meyer
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California 90048, United States
|
19
|
Flender D, Vilenne F, Adams C, Boonen K, Valkenborg D, Baggerman G. Exploring the dynamic landscape of immunopeptidomics: Unravelling posttranslational modifications and navigating bioinformatics terrain. Mass Spectrom Rev 2024. [PMID: 39152539 DOI: 10.1002/mas.21905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 07/30/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
Immunopeptidomics is becoming an increasingly important field of study. The capability to identify immunopeptides with pivotal roles in the human immune system is essential to shift the current curative medicine towards personalized medicine. Throughout the years, the field has matured, giving insight into the current pitfalls. Nowadays, it is commonly accepted that generalizing shotgun proteomics workflows is malpractice because immunopeptidomics faces numerous challenges. While many of these difficulties have been addressed, the road towards the ideal workflow remains complicated. Although the presence of posttranslational modifications (PTMs) in the immunopeptidome has been demonstrated, their identification remains highly challenging despite their significance for immunotherapies. The large number of unpredictable modifications in the immunopeptidome plays a pivotal role in the functionality and these challenges. This review provides a comprehensive overview of the current advancements in immunopeptidomics. We delve into the challenges associated with identifying PTMs within the immunopeptidome, aiming to address the current state of the field.
Affiliation(s)
- Daniel Flender
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- Health Unit, VITO, Mol, Belgium
| | - Frédérique Vilenne
- Health Unit, VITO, Mol, Belgium
- Data Science Institute, University of Hasselt, Hasselt, Belgium
| | - Charlotte Adams
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
| | - Kurt Boonen
- Centre for Proteomics, University of Antwerp, Antwerpen, Belgium
- ImmuneSpec, Niel, Belgium
| | - Dirk Valkenborg
- Data Science Institute, University of Hasselt, Hasselt, Belgium
| | - Geert Baggerman
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
- ImmuneSpec, Niel, Belgium
|
20
|
Tan Y, Li M, Zhou Z, Tan P, Yu H, Fan G, Hong L. PETA: evaluating the impact of protein transfer learning with sub-word tokenization on downstream applications. J Cheminform 2024; 16:92. [PMID: 39095917 PMCID: PMC11297785 DOI: 10.1186/s13321-024-00884-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 07/13/2024] [Indexed: 08/04/2024] Open
Abstract
Protein language models (PLMs) play a dominant role in protein representation learning. Most existing PLMs regard proteins as sequences of 20 natural amino acids. The problem with this representation method is that it simply divides the protein sequence into sequences of individual amino acids, ignoring the fact that certain residues often occur together. Therefore, it is inappropriate to view amino acids as isolated tokens. Instead, the PLMs should recognize the frequently occurring combinations of amino acids as a single token. In this study, we use the byte-pair-encoding algorithm and unigram to construct advanced residue vocabularies for protein sequence tokenization, and we have shown that PLMs pre-trained using these advanced vocabularies exhibit superior performance on downstream tasks when compared to those trained with simple vocabularies. Furthermore, we introduce PETA, a comprehensive benchmark for systematically evaluating PLMs. We find that vocabularies comprising 50 and 200 elements achieve optimal performance. Our code, model weights, and datasets are available at https://github.com/ginnm/ProteinPretraining . SCIENTIFIC CONTRIBUTION: This study introduces advanced protein sequence tokenization analysis, leveraging the byte-pair-encoding algorithm and unigram. By recognizing frequently occurring combinations of amino acids as single tokens, our proposed method enhances the performance of PLMs on downstream tasks. Additionally, we present PETA, a new comprehensive benchmark for the systematic evaluation of PLMs, demonstrating that vocabularies of 50 and 200 elements offer optimal performance.
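To make the idea of sub-word tokenization of protein sequences concrete, the following is a toy sketch of byte-pair encoding applied to amino acid strings: the most frequent adjacent token pair is merged repeatedly, so that frequently co-occurring residues become single tokens. This is a generic, simplified BPE and is not the PETA implementation. The function names, the tie-breaking rule, and the example sequences are assumptions made for illustration.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of pair with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_merges(sequences, num_merges):
    """Learn an ordered list of merges from a corpus of protein sequences."""
    corpus = [list(seq) for seq in sequences]  # start from single residues
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # Most frequent pair; ties are broken lexicographically (a choice of this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = [merge_pair(tokens, best) for tokens in corpus]
    return merges


def tokenize(sequence, merges):
    """Tokenize a sequence by applying the learned merges in order."""
    tokens = list(sequence)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


toy_merges = learn_merges(["MKVLAAGMKV", "MKVLMKV"], num_merges=2)
print(toy_merges)
print(tokenize("MKVA", toy_merges))
```

The final vocabulary size is the 20 single residues plus the number of merges, which is the quantity the abstract refers to when it compares vocabularies of different sizes.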
Affiliation(s)
- Yang Tan
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China
- Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University, Chongqing, 200240, China
| | - Mingchen Li
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China
- Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University, Chongqing, 200240, China
| | - Ziyi Zhou
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Pan Tan
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China
| | - Huiqun Yu
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China.
| | - Guisheng Fan
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China.
| | - Liang Hong
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Science, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China.
- Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University, Chongqing, 200240, China.
|
21
|
Schulte D, Snijder J. A Handle on Mass Coincidence Errors in De Novo Sequencing of Antibodies by Bottom-up Proteomics. J Proteome Res 2024; 23:3552-3559. [PMID: 38932690 PMCID: PMC11301774 DOI: 10.1021/acs.jproteome.4c00188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 05/29/2024] [Accepted: 06/20/2024] [Indexed: 06/28/2024]
Abstract
Antibody sequences can be determined at 99% accuracy directly from the polypeptide product by using bottom-up proteomics techniques. Sequencing accuracy at the peptide level is limited by the isobaric residues leucine and isoleucine, incomplete fragmentation spectra in which the order of two or more residues remains ambiguous due to lacking fragment ions for the intermediate positions, and isobaric combinations of amino acids, of potentially different lengths, for example, GG = N and GA = Q. Here, we present several updates to Stitch (v1.5), which performs template-based assembly of de novo peptides to reconstruct antibody sequences. This version introduces a mass-based alignment algorithm that explicitly accounts for mass coincidence errors. In addition, it incorporates a postprocessing procedure to assign I/L residues based on secondary fragments (satellite ions, i.e., w-ions). Moreover, evidence for sequence assignments can now be directly evaluated with the addition of an integrated spectrum viewer. Lastly, input data from a wider selection of de novo peptide sequencing algorithms are allowed, now including Casanovo, PEAKS, Novor.Cloud, pNovo, and MaxNovo, in addition to flat text and FASTA. Combined, these changes make Stitch compatible with a larger range of data processing pipelines and improve its tolerance to peptide-level sequencing errors.
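The mass coincidences named in this abstract (GG = N, GA = Q) can be checked with simple arithmetic. The sketch below is an illustration only and is unrelated to Stitch's actual alignment code: the function name `mass_coincidences` and the 0.001 Da tolerance are assumptions, and the table holds standard monoisotopic residue masses rounded to five decimals.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def mass_coincidences(masses, tol=0.001):
    """Return (pair, single) tuples where two residues weigh the same as one.

    `tol` is an assumed absolute tolerance in Da; real instruments are
    usually characterised by a relative (ppm) tolerance instead.
    """
    hits = []
    for a, b in combinations_with_replacement(sorted(masses), 2):
        pair_mass = masses[a] + masses[b]
        for single, single_mass in masses.items():
            if abs(pair_mass - single_mass) <= tol:
                hits.append((a + b, single))
    return hits


for pair, single in mass_coincidences(MONO):
    print(f"{pair} ~ {single}")
```

Because the summed mass does not depend on residue order, the pair "AG" here covers both GA and AG. Leucine and isoleucine share one entry value in the table, which is the separate I/L ambiguity the abstract addresses with satellite ions.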
Affiliation(s)
- Douwe Schulte
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, Utrecht 3584 CH, The Netherlands
| | - Joost Snijder
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan 8, Utrecht 3584 CH, The Netherlands
|
22
|
Yilmaz M, Fondrie WE, Bittremieux W, Melendez CF, Nelson R, Ananth V, Oh S, Noble WS. Sequence-to-sequence translation from mass spectra to peptides with a transformer model. Nat Commun 2024; 15:6427. [PMID: 39080256 PMCID: PMC11289372 DOI: 10.1038/s41467-024-49731-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 06/18/2024] [Indexed: 08/02/2024] Open
Abstract
A fundamental challenge in mass spectrometry-based proteomics is the identification of the peptide that generated each acquired tandem mass spectrum. Approaches that leverage known peptide sequence databases cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to tandem mass spectra without prior information (de novo peptide sequencing) is valuable for tasks including antibody sequencing, immunopeptidomics, and metaproteomics. Although many methods have been developed to address this problem, it remains an outstanding challenge in part due to the difficulty of modeling the irregular data structure of tandem mass spectra. Here, we describe Casanovo, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that comprise the generating peptide. We train a Casanovo model from 30 million labeled spectra and demonstrate that the model outperforms several state-of-the-art methods on a cross-species benchmark dataset. We also develop a version of Casanovo that is fine-tuned for non-enzymatic peptides. Finally, we demonstrate that Casanovo's superior performance improves the analysis of immunopeptidomics and metaproteomics experiments and allows us to delve deeper into the dark proteome.
Affiliation(s)
- Melih Yilmaz
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
| | - Wout Bittremieux
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
| | - Carlo F Melendez
- Department of Genome Sciences, University of Washington, Seattle, USA
| | - Rowan Nelson
- Department of Genome Sciences, University of Washington, Seattle, USA
| | - Varun Ananth
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
| | - Sewoong Oh
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
| | - William Stafford Noble
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA.
- Department of Genome Sciences, University of Washington, Seattle, USA.
|
23
|
Lê Quý K, Chernigovskaya M, Stensland M, Singh S, Leem J, Revale S, Yadin DA, Nice FL, Povall C, Minns DH, Galson JD, Nyman TA, Snapkow I, Greiff V. Benchmarking and integrating human B-cell receptor genomic and antibody proteomic profiling. NPJ Syst Biol Appl 2024; 10:73. [PMID: 38997321 PMCID: PMC11245537 DOI: 10.1038/s41540-024-00402-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 07/01/2024] [Indexed: 07/14/2024] Open
Abstract
Immunoglobulins (Ig), which exist either as B-cell receptors (BCR) on the surface of B cells or as antibodies when secreted, play a key role in the recognition and response to antigenic threats. The capability to jointly characterize the BCR and antibody repertoire is crucial for understanding human adaptive immunity. From peripheral blood, bulk BCR sequencing (bulkBCR-seq) currently provides the highest sampling depth, single-cell BCR sequencing (scBCR-seq) allows for paired chain characterization, and antibody peptide sequencing by tandem mass spectrometry (Ab-seq) provides information on the composition of secreted antibodies in the serum. Yet, it has not been benchmarked to what extent the datasets generated by these three technologies overlap and complement each other. To address this question, we isolated peripheral blood B cells from healthy human donors and sequenced BCRs at bulk and single-cell levels, in addition to utilizing publicly available sequencing data. Integrated analysis was performed on these datasets, resolved by replicates and across individuals. Simultaneously, serum antibodies were isolated, digested with multiple proteases, and analyzed with Ab-seq. Systems immunology analysis showed high concordance in repertoire features between bulk and scBCR-seq within individuals, especially when replicates were utilized. In addition, Ab-seq identified clonotype-specific peptides using both bulk and scBCR-seq library references, demonstrating the feasibility of combining scBCR-seq and Ab-seq for reconstructing paired-chain Ig sequences from the serum antibody repertoire. Collectively, our work serves as a proof-of-principle for combining bulk sequencing, single-cell sequencing, and mass spectrometry as complementary methods towards capturing humoral immunity in its entirety.
Grants
- The Leona M. and Harry B. Helmsley Charitable Trust (#2019PG-T1D011, to VG), UiO World-Leading Research Community (to VG), UiO: LifeScience Convergence Environment Immunolingo (to VG), EU Horizon 2020 iReceptorplus (#825821) (to VG), a Norwegian Cancer Society Grant (#215817, to VG), Research Council of Norway projects (#300740, #311341, #331890 to VG), a Research Council of Norway IKTPLUSS project (#311341, to VG). This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 101007799 (Inno4Vac). This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA (to VG).
- Mass spectrometry-based proteomic analyses were performed by the Proteomics Core Facility, Department of Immunology, University of Oslo/Oslo University Hospital, which is supported by the Core Facilities program of the South-Eastern Norway Regional Health Authority. This core facility is also a member of the National Network of Advanced Proteomics Infrastructure (NAPI), which is funded by the Research Council of Norway INFRASTRUKTUR-program (project number: 295910).
Affiliation(s)
- Khang Lê Quý
- Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Maria Chernigovskaya
- Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Maria Stensland
- Proteomics Core Facility, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Sachin Singh
- Proteomics Core Facility, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Tuula A Nyman
- Proteomics Core Facility, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Igor Snapkow
- Department of Chemical Toxicology, Norwegian Institute of Public Health, Oslo, Norway
| | - Victor Greiff
- Department of Immunology, University of Oslo and Oslo University Hospital, Oslo, Norway.
|
24
|
Galland F, de Espindola JS, Sacilotto ES, Almeida LGVC, Morari J, Velloso LA, Dos Santos LD, Rossini BC, Bertoldo Pacheco MT. Digestion of whey peptide induces antioxidant and anti-inflammatory bioactivity on glial cells: Sequences identification and structural activity analysis. Food Res Int 2024; 188:114433. [PMID: 38823827 DOI: 10.1016/j.foodres.2024.114433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 04/22/2024] [Accepted: 04/24/2024] [Indexed: 06/03/2024]
Abstract
Whey-derived peptides have shown potential to improve brain function in pathological conditions. However, there is little information about their mechanism of action on glial cells, which have important immune functions in the brain. Astrocytes and microglia are essential to the inflammatory and oxidative defense that takes place in neurodegenerative disease. In this work we evaluate the antioxidant and anti-inflammatory bioactivity of whey peptides in glial cells. Peptides were formed during simulated gastrointestinal digestion (Infogest protocol), and low molecular weight (<5 kDa) peptides (WPHf) attenuated reactive oxygen species (ROS) production induced by a hydrogen peroxide stimulus in both cell types in a dose-dependent manner. WPHf induced an increase in the antioxidant glutathione (GSH) content and prevented the GSH reduction induced by a lipopolysaccharide (LPS) stimulus in astrocytes, in a cell-specific manner. The increase in cytokine mRNA expression (TNFα and IL6) and nitric oxide secretion induced by LPS was attenuated by WPHf pre-treatment in both cell types. The inflammatory pathway was dependent on NFκB activation. Bioactive peptide ranking analysis showed a positive correlation with hydrophobicity and a negative correlation with high molecular weight. Sequence identification revealed 19 peptides that were cross-referenced against a bioactive peptide database. Whey peptides were rich in leucine, valine and tyrosine in the C-terminal region and lysine in the N-terminal region. The anti-inflammatory and antioxidant potential of whey peptides was assessed in glial cells, and their mechanisms of action, such as modulation of antioxidant enzymes and anti-inflammatory pathways, were described. Features of the peptide structure, such as molecular size, hydrophobicity and the types of amino acids present in the terminal regions, are associated with bioactivity.
Affiliation(s)
- Fabiana Galland
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brazil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
| | - Juliana Santos de Espindola
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brazil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil
| | - Eduarda Spagnol Sacilotto
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brazil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil
| | - Lilian Gabriely V C Almeida
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brazil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil
| | - Joseane Morari
- Obesity and Comorbidities Research Center (OCRC), University of Campinas, São Paulo, Brazil
| | - Lício Augusto Velloso
- Obesity and Comorbidities Research Center (OCRC), University of Campinas, São Paulo, Brazil.
| | - Bruno Cesar Rossini
- Institute of Biotechnology, São Paulo State University (UNESP), Botucatu, SP 18607-440, Brazil.
| | - Maria Teresa Bertoldo Pacheco
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brazil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
|
25
|
Petrovskiy DV, Nikolsky KS, Kulikova LI, Rudnev VR, Butkova TV, Malsagova KA, Kopylov AT, Kaysheva AL. PowerNovo: de novo peptide sequencing via tandem mass spectrometry using an ensemble of transformer and BERT models. Sci Rep 2024; 14:15000. [PMID: 38951578 PMCID: PMC11217302 DOI: 10.1038/s41598-024-65861-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 06/25/2024] [Indexed: 07/03/2024] Open
Abstract
The primary objective of analyzing the data obtained in a mass spectrometry-based proteomic experiment is peptide and protein identification, that is, the correct assignment of each tandem mass spectrum to one amino acid sequence. Comparing empirical fragment spectra with theoretically predicted ones, or matching them against collected spectral libraries, are the commonly accepted strategies for identifying proteins and defining their amino acid sequences. Although these approaches are widely used and are appreciably efficient for well-characterized model organisms or measured proteins, they cannot detect novel peptide sequences that have not been previously annotated or are rare. This study presents PowerNovo, a tool for de novo sequencing of proteins using tandem mass spectra acquired on a variety of mass analyzers and with different fragmentation techniques. PowerNovo involves an ensemble of models for peptide sequencing: a model for detecting regularities in tandem mass spectra, precursors, and fragment ions, and a natural language processing model, which assesses peptide sequence quality and helps with the reconstruction of noisy sequences. Testing showed that the performance of PowerNovo is comparable to, or better than, that of the widely utilized PointNovo, DeepNovo, Casanovo, and Novor packages. PowerNovo also provides a complete processing pipeline for mass spectrometry data and, along with predicting the peptide sequence, includes peptide assembly and protein inference blocks.
|
26
|
Ebrahimi S, Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. ARXIV 2024:arXiv:2402.11363v3. [PMID: 38659639 PMCID: PMC11042412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherently high multiplexing. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce Transformer-DIA, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art methods, including DeepNovo-DIA and PepNet. Transformer-DIA enhances precision by 15.14% to 34.8% and recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our Transformer-DIA model holds considerable promise for uncovering novel peptides and enabling more comprehensive profiling of biological samples. Transformer-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Transformer-DIA.
Affiliation(s)
- Shiva Ebrahimi
- Computer Science & Engineering, University of North Texas, Denton, USA
| | - Xuan Guo
- Computer Science & Engineering, University of North Texas, Denton, USA
|
27
|
Wang X, Liu Y, Yong ZH, Yu XJ, Zhou FD, Zhao MH. Immunoglobulin repertoire sequencing and de novo sequencing - Powerful tools for identifying free light chains from patients with light chain cast nephropathy. Int Immunopharmacol 2024; 135:112302. [PMID: 38772298 DOI: 10.1016/j.intimp.2024.112302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 05/02/2024] [Accepted: 05/16/2024] [Indexed: 05/23/2024]
Abstract
In patients with light chain cast nephropathy (LCCN), abundantly produced monoclonal immunoglobulin free light chains (FLCs) play a vital role in pathogenesis. Determining the precise sequences of patient-derived FLCs is therefore highly desirable. Although immunoglobulin repertoire sequencing (5' RACE-seq) has been proven sensitive enough to provide the full-length V(D)J region (variable, diversity and joining genes) of FLCs from bone marrow samples, a noninvasive, bone marrow-independent method is still in demand. Here, a de novo sequencing workflow based on bottom-up proteomics was established for patient-derived FLCs. PEAKS software was used for the de novo sequencing of peptides, which were further assembled into full-length FLC sequences. This de novo protein sequencing method can obtain the full-length amino acid sequences of FLCs and was shown to be as reliable as 5' RACE-seq. The two LCCN sequences derived from the two methods above were identical, and they possessed more hydrophobic or nonpolar amino acids than the corresponding germline, which may be associated with the pathogenesis.
Affiliation(s)
- Xin Wang
- Renal Division, Department of Medicine, Peking University First Hospital, No. 8, Xishiku Street, Xicheng District, Beijing, China; Peking-Tsinghua Center for Life Sciences, Beijing, China; Institute of Nephrology, Peking University, Beijing, China; Renal Pathology Center, Institute of Nephrology, Peking University, Beijing, China; Key Laboratory of Renal Disease, Ministry of Health of China, Beijing, China; Key Laboratory of CKD Prevention and Treatment, Ministry of Education of China, Beijing, China.
| | - Yi Liu
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, School of Pharmaceutical Sciences, Wuhan University, Wuhan, China
| | - Zi-Hao Yong
- Department of Basic Medicine, Anhui Medical College, Hefei, Anhui, China
| | - Xiao-Juan Yu
- Renal Division, Department of Medicine, Peking University First Hospital, No. 8, Xishiku Street, Xicheng District, Beijing, China; Institute of Nephrology, Peking University, Beijing, China; Renal Pathology Center, Institute of Nephrology, Peking University, Beijing, China; Key Laboratory of Renal Disease, Ministry of Health of China, Beijing, China; Key Laboratory of CKD Prevention and Treatment, Ministry of Education of China, Beijing, China; Research Units of Diagnosis and Treatment of Immune-Mediated Kidney Diseases, Chinese Academy of Medical Sciences, Beijing, China.
| | - Fu-de Zhou
- Renal Division, Department of Medicine, Peking University First Hospital, No. 8, Xishiku Street, Xicheng District, Beijing, China; Institute of Nephrology, Peking University, Beijing, China; Renal Pathology Center, Institute of Nephrology, Peking University, Beijing, China; Key Laboratory of Renal Disease, Ministry of Health of China, Beijing, China; Key Laboratory of CKD Prevention and Treatment, Ministry of Education of China, Beijing, China; Research Units of Diagnosis and Treatment of Immune-Mediated Kidney Diseases, Chinese Academy of Medical Sciences, Beijing, China
| | - Ming-Hui Zhao
- Renal Division, Department of Medicine, Peking University First Hospital, No. 8, Xishiku Street, Xicheng District, Beijing, China; Peking-Tsinghua Center for Life Sciences, Beijing, China; Institute of Nephrology, Peking University, Beijing, China; Renal Pathology Center, Institute of Nephrology, Peking University, Beijing, China; Key Laboratory of Renal Disease, Ministry of Health of China, Beijing, China; Key Laboratory of CKD Prevention and Treatment, Ministry of Education of China, Beijing, China; Research Units of Diagnosis and Treatment of Immune-Mediated Kidney Diseases, Chinese Academy of Medical Sciences, Beijing, China
|
28
|
Minegishi Y, Haga Y, Ueda K. Emerging potential of immunopeptidomics by mass spectrometry in cancer immunotherapy. Cancer Sci 2024; 115:1048-1059. [PMID: 38382459 PMCID: PMC11007014 DOI: 10.1111/cas.16118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 02/02/2024] [Accepted: 02/07/2024] [Indexed: 02/23/2024] Open
Abstract
With significant advances in analytical technologies, research in the field of cancer immunotherapy, such as adoptive T cell therapy, cancer vaccines, and immune checkpoint blockade (ICB), is currently gaining tremendous momentum. Since cancer immunotherapy is effective in only a minority of patients, more potent tumor-specific antigens (TSAs, also known as neoantigens) and predictive markers for treatment response are of great interest. In cancer immunity, immunopeptides, presented by human leukocyte antigen (HLA) class I, play a role as initiating mediators of immunogenicity. The latest advances in the interdisciplinary multiomics approach have rapidly enlightened us about the identity of the "dark matter" of cancer and the associated immunopeptides. In this field, mass spectrometry (MS) is a viable option because it detects naturally processed and actually presented TSA candidates, which helps to grasp the whole picture of the immunopeptidome. In the past few years the search space has been enlarged by the multiomics approach, the sensitivity of mass spectrometers has been improved, and deep/machine-learning-supported peptide search algorithms have taken immunopeptidomics to the next level. In this review, along with the introduction of key technical advancements in immunopeptidomics, the potential and further directions of immunopeptidomics will be reviewed from the perspective of cancer immunotherapy.
Affiliation(s)
- Yuriko Minegishi
- Cancer Proteomics Group, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
| | - Yoshimi Haga
- Cancer Proteomics Group, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
| | - Koji Ueda
- Cancer Proteomics Group, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
|
29
|
Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024] Open
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
Affiliation(s)
- Ronghui Lou
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
| | - Wenqing Shui
- iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
|
30
|
Lee S, Kim H. Bidirectional de novo peptide sequencing using a transformer model. PLoS Comput Biol 2024; 20:e1011892. [PMID: 38416757 PMCID: PMC10901305 DOI: 10.1371/journal.pcbi.1011892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 02/02/2024] [Indexed: 03/01/2024] Open
Abstract
In proteomics, a crucial aspect is to identify peptide sequences. De novo sequencing methods have been widely employed to identify peptide sequences, and numerous tools have been proposed over the past two decades. Recently, deep learning approaches have been introduced for de novo sequencing. Previous methods focused on encoding tandem mass spectra and predicting peptide sequences from the first amino acid onwards. However, when predicting peptides using tandem mass spectra, the peptide sequence can be predicted not only from the first amino acid but also from the last amino acid due to the coexistence of b-ion (or a- or c-ion) and y-ion (or x- or z-ion) fragments in the tandem mass spectra. Therefore, it is essential to predict peptide sequences bidirectionally. Our approach, called NovoB, utilizes a Transformer model to predict peptide sequences bidirectionally, starting with both the first and last amino acids. In comparison to Casanovo, our method achieved an improvement of the average peptide-level accuracy rate of approximately 9.8% across all species.
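The reason a peptide can be read from either end, as this abstract argues, is that the b-ion and y-ion ladders are complementary: for a singly charged pair, b_i + y_(n-i) equals the neutral peptide mass plus two proton masses. The sketch below illustrates that relation only; it is not the NovoB model, and the function names and the example peptide "PEPTIDE" are assumptions for illustration.

```python
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da

# Monoisotopic residue masses (Da) for the residues used in the example.
MONO = {"P": 97.05276, "E": 129.04259, "T": 101.04768,
        "I": 113.08406, "D": 115.02694}


def precursor_mass(peptide):
    """Neutral monoisotopic mass: sum of residues plus one water."""
    return sum(MONO[r] for r in peptide) + WATER


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z ladders (b_1..b_{n-1}, y_1..y_{n-1})."""
    masses = [MONO[r] for r in peptide]
    n = len(masses)
    # b-ions grow from the N-terminus (the first amino acid onwards).
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y-ions grow from the C-terminus (the last amino acid backwards).
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


peptide = "PEPTIDE"
b, y = fragment_ions(peptide)
n = len(peptide)
total = precursor_mass(peptide) + 2 * PROTON
for i in range(1, n):
    # b_i pairs with y_(n-i); each pair sums to the same constant.
    print(f"b{i} + y{n - i} = {b[i - 1] + y[n - i - 1]:.5f} (expected {total:.5f})")
```

A forward decoder effectively follows the b ladder and a backward decoder the y ladder, so a peak missing from one series can still be supported by its complement in the other.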
Affiliation(s)
- Sangjeong Lee
- Center for Biomedical Computing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
| | - Hyunwoo Kim
- Center for Biomedical Computing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea
|
31
|
Klaproth-Andrade D, Hingerl J, Bruns Y, Smith NH, Träuble J, Wilhelm M, Gagneur J. Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing. Nat Commun 2024; 15:151. [PMID: 38167372 PMCID: PMC10762064 DOI: 10.1038/s41467-023-44323-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 01/16/2023] [Accepted: 12/08/2023] [Indexed: 01/05/2024] Open
Abstract
Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations including a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, proposing fragment ion series classification as a pivotal task for de novo peptide sequencing, and a peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.
Affiliation(s)
- Daniela Klaproth-Andrade
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Munich Data Science Institute, Technical University of Munich, Garching, Germany
| | - Johannes Hingerl
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Yanik Bruns
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Nicholas H Smith
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Jakob Träuble
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
| | - Mathias Wilhelm
- Munich Data Science Institute, Technical University of Munich, Garching, Germany.
- Computational Mass Spectrometry, School of Life Sciences, Technical University of Munich, Freising, Germany.
| | - Julien Gagneur
- Computational Molecular Medicine, School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- Munich Data Science Institute, Technical University of Munich, Garching, Germany.
- Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany.
- Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany.
| |
|
32
|
Zancolli G, von Reumont BM, Anderluh G, Caliskan F, Chiusano ML, Fröhlich J, Hapeshi E, Hempel BF, Ikonomopoulou MP, Jungo F, Marchot P, de Farias TM, Modica MV, Moran Y, Nalbantsoy A, Procházka J, Tarallo A, Tonello F, Vitorino R, Zammit ML, Antunes A. Web of venom: exploration of big data resources in animal toxin research. Gigascience 2024; 13:giae054. [PMID: 39250076 PMCID: PMC11382406 DOI: 10.1093/gigascience/giae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/01/2024] [Accepted: 07/13/2024] [Indexed: 09/10/2024] Open
Abstract
Research on animal venoms and their components spans multiple disciplines, including biology, biochemistry, bioinformatics, pharmacology, medicine, and more. Manipulating and analyzing the diverse array of data required for venom research can be challenging, and relevant tools and resources are often dispersed across different online platforms, making them less accessible to nonexperts. In this article, we address the multifaceted needs of the scientific community involved in venom and toxin-related research by identifying and discussing web resources, databases, and tools commonly used in this field. We have compiled these resources into a comprehensive table available on the VenomZone website (https://venomzone.expasy.org/10897). Furthermore, we highlight the challenges currently faced by researchers in accessing and using these resources and emphasize the importance of community-driven interdisciplinary approaches. We conclude by underscoring the significance of enhancing standards, promoting interoperability, and encouraging data and method sharing within the venom research community.
Affiliation(s)
- Giulia Zancolli
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Björn Marcus von Reumont
- Goethe University Frankfurt, Faculty of Biological Sciences, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
| | - Gregor Anderluh
- Department of Molecular Biology and Nanobiotechnology, National Institute of Chemistry, 1000 Ljubljana, Slovenia
| | - Figen Caliskan
- Department of Biology, Faculty of Science, Eskisehir Osmangazi University, 26040 Eskişehir, Turkey
| | - Maria Luisa Chiusano
- Department of Agricultural Sciences, University Federico II of Naples, 80055 Portici, Naples, Italy
- Department of Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Naples, Italy
| | - Jacob Fröhlich
- Veterinary Center for Resistance Research (TZR), Freie Universität Berlin, 14163 Berlin, Germany
| | - Evroula Hapeshi
- Department of Health Sciences, School of Life and Health Sciences, University of Nicosia, 1700 Nicosia, Cyprus
| | - Benjamin-Florian Hempel
- Veterinary Center for Resistance Research (TZR), Freie Universität Berlin, 14163 Berlin, Germany
| | - Maria P Ikonomopoulou
- Madrid Institute of Advanced Studies in Food, Precision Nutrition & Aging Program, 28049 Madrid, Spain
| | - Florence Jungo
- SIB Swiss Institute of Bioinformatics, Swiss-Prot Group, 1211 Geneva, Switzerland
| | - Pascale Marchot
- Laboratory Architecture et Fonction des Macromolécules Biologiques, Aix-Marseille University, Centre National de la Recherche Scientifique, Faculté des Sciences, Campus Luminy, 13288 Marseille, France
| | - Tarcisio Mendes de Farias
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Maria Vittoria Modica
- Department of Biology and Evolution of Marine Organisms, Stazione Zoologica Anton Dohrn, 00198 Rome, Italy
| | - Yehu Moran
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem, 9190401 Jerusalem, Israel
| | - Ayse Nalbantsoy
- Engineering Faculty, Bioengineering Department, Ege University, 35100 Bornova-Izmir, Turkey
| | - Jan Procházka
- Laboratory of Transgenic Models of Diseases, Institute of Molecular Genetics of the Czech Academy of Sciences, 252 50 Vestec, Czech Republic
| | - Andrea Tarallo
- Institute of Research on Terrestrial Ecosystems (IRET), National Research Council (CNR), 73100 Lecce, Italy
| | - Fiorella Tonello
- Neuroscience Institute, National Research Council (CNR), 35131 Padua, Italy
| | - Rui Vitorino
- Department of Medical Sciences, iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal
| | - Mark Lawrence Zammit
- Department of Clinical Pharmacology & Therapeutics, Faculty of Medicine & Surgery, University of Malta, 2090 Msida, Malta
- Malta National Poisons Centre, Malta Life Sciences Park, 3000 San Ġwann, Malta
| | - Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
| |
|
33
|
Holstein T, Muth T. Bioinformatic Workflows for Metaproteomics. Methods Mol Biol 2024; 2820:187-213. [PMID: 38941024 DOI: 10.1007/978-1-0716-3910-8_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
The strong influence of microbiomes on areas such as ecology and human health has become widely recognized in the past years. Accordingly, various techniques for the investigation of the composition and function of microbial community samples have been developed. Metaproteomics, the comprehensive analysis of the proteins from microbial communities, allows for the investigation of not only the taxonomy but also the functional and quantitative composition of microbiome samples. Due to the complexity of the investigated communities, methods developed for single organism proteomics cannot be readily applied to metaproteomic samples. For this purpose, methods specifically tailored to metaproteomics are required. In this work, a detailed overview of current bioinformatic solutions and protocols in metaproteomics is given. After an introduction to the proteomic database search, the metaproteomic post-processing steps are explained in detail. Ten specific bioinformatic software solutions are focused on, covering various steps including database-driven identification and quantification as well as taxonomic and functional assignment.
Affiliation(s)
- Tanja Holstein
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany
- VIB-UGent Center for Medical Biotechnology, VIB and Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Data Competence Center, Robert Koch Institute, Berlin, Germany
| | - Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing, Berlin, Germany.
- Data Competence Center, Robert Koch Institute, Berlin, Germany.
| |
|
34
|
Liu K, Ye Y, Li S, Tang H. Accurate de novo peptide sequencing using fully convolutional neural networks. Nat Commun 2023; 14:7974. [PMID: 38042873 PMCID: PMC10693636 DOI: 10.1038/s41467-023-43010-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 02/09/2022] [Accepted: 10/29/2023] [Indexed: 12/04/2023] Open
Abstract
De novo peptide sequencing, which does not rely on a comprehensive target sequence database, provides us with a way to identify novel peptides from tandem mass spectra. However, current de novo sequencing algorithms suffer from low accuracy and coverage, which hinders their application in proteomics. In this paper, we present PepNet, a fully convolutional neural network for high accuracy de novo peptide sequencing. PepNet takes an MS/MS spectrum (represented as a high-dimensional vector) as input, and outputs the optimal peptide sequence along with its confidence score. The PepNet model is trained using a total of 3 million high-energy collisional dissociation MS/MS spectra from multiple human peptide spectral libraries. Evaluation results show that PepNet significantly outperforms current best-performing de novo sequencing algorithms (e.g. PointNovo and DeepNovo) in both peptide-level accuracy and positional-level accuracy. PepNet can sequence a large fraction of spectra that were not identified by database search engines, and thus could be used as a complementary tool to database search engines for peptide identification in proteomics. In addition, PepNet runs around 3x and 7x faster than PointNovo and DeepNovo on GPUs, respectively, thus being more suitable for the analysis of large-scale proteomics data.
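Representing an MS/MS spectrum as a high-dimensional vector is commonly done by binning peaks along the m/z axis. The sketch below shows one such encoding; it is not PepNet's actual preprocessing, and the function name `spectrum_to_vector`, the 0.5 m/z bin width, the 2000 m/z upper limit, and the base-peak normalization are all assumptions chosen for illustration.

```python
def spectrum_to_vector(peaks, max_mz=2000.0, bin_width=0.5):
    """Encode (m/z, intensity) peaks as a fixed-length intensity vector.

    Intensities of peaks falling into the same bin are summed, and the
    vector is scaled so that the largest bin equals 1.0. Peaks outside
    [0, max_mz) are dropped.
    """
    n_bins = int(max_mz / bin_width)
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0.0 <= mz < max_mz:
            # Clamp guards against rounding at the upper edge.
            index = min(int(mz / bin_width), n_bins - 1)
            vector[index] += intensity
    top = max(vector) if vector else 0.0
    if top > 0:
        vector = [value / top for value in vector]
    return vector
```

A fixed-length vector of this kind is what allows a convolutional network to treat every spectrum as the same-shaped input, at the cost of losing mass resolution finer than the bin width.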
Affiliation(s)
- Kaiyuan Liu
- Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA
| | - Yuzhen Ye
- Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA
| | - Sujun Li
- Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA
- Dengding BioAI Co., Ltd., Bloomington, USA
| | - Haixu Tang
- Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, 47408, IN, USA.
| |
|
35
|
Ebrahimi S, Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. PROCEEDINGS. IEEE INTERNATIONAL SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING 2023; 2023:28-35. [PMID: 38665266 PMCID: PMC11044815 DOI: 10.1109/bibe60311.2023.00013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
Tandem mass spectrometry (MS/MS) stands as the predominant high-throughput technique for comprehensively analyzing protein content within biological samples. This methodology is a cornerstone driving the advancement of proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, facilitating impartial and non-targeted fragmentation of precursor ions. The DIA-generated MS/MS spectra present a formidable obstacle due to their inherent high multiplexing nature. Each spectrum encapsulates fragmented product ions originating from multiple precursor peptides. This intricacy poses a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to address the multiplexing conundrum. In this paper, we introduce Casanovo-DIA, a deep-learning model based on transformer architecture. It deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. Casanovo-DIA enhances precision by 15.14% to 34.8%, recall by 11.62% to 31.94% at the amino acid level, and boosts precision by 59% to 81.36% at the peptide level. Integrating DIA data and our Casanovo-DIA model holds considerable promise to uncover novel peptides and more comprehensive profiling of biological samples. Casanovo-DIA is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/Casanovo-DIA.
Affiliation(s)
- Shiva Ebrahimi
- Computer Science & Engineering University of North Texas Denton, USA
| | - Xuan Guo
- Computer Science & Engineering University of North Texas Denton, USA
| |
|
36
|
Jiang Y, Rex DAB, Schuster D, Neely BA, Rosano GL, Volkmar N, Momenzadeh A, Peters-Clarke TM, Egbert SB, Kreimer S, Doud EH, Crook OM, Yadav AK, Vanuopadath M, Mayta ML, Duboff AG, Riley NM, Moritz RL, Meyer JG. Comprehensive Overview of Bottom-Up Proteomics using Mass Spectrometry. ARXIV 2023:arXiv:2311.07791v1. [PMID: 38013887 PMCID: PMC10680866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Proteomics is the large scale study of protein structure and function from biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics studies can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging to understand for new practitioners. Here, we provide a comprehensive overview of different proteomics methods to aid the novice and experienced researcher. We cover from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this work to serve as a basic resource for new practitioners in the field of shotgun or bottom-up proteomics.
Affiliation(s)
- Yuming Jiang
- Department of Computational Biomedicine, Cedars Sinai Medical Center
| | - Devasahayam Arokia Balaya Rex
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
| | - Dina Schuster
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland; Department of Biology, Institute of Molecular Biology and Biophysics, ETH Zurich, Zurich 8093, Switzerland; Laboratory of Biomolecular Research, Division of Biology and Chemistry, Paul Scherrer Institute, Villigen 5232, Switzerland
| | - Benjamin A. Neely
- Chemical Sciences Division, National Institute of Standards and Technology, NIST Charleston · Funded by NIST
| | - Germán L. Rosano
- Mass Spectrometry Unit, Institute of Molecular and Cellular Biology of Rosario, Rosario, Argentina · Funded by Grant PICT 2019-02971 (Agencia I+D+i)
| | - Norbert Volkmar
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich 8093, Switzerland
| | - Amanda Momenzadeh
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California, USA
| | | | - Susan B. Egbert
- Department of Chemistry, University of Manitoba, Winnipeg, Canada
| | - Simion Kreimer
- Smidt Heart Institute, Cedars Sinai Medical Center; Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center
| | - Emma H. Doud
- Center for Proteome Analysis, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Oliver M. Crook
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
| | - Amit Kumar Yadav
- Translational Health Science and Technology Institute · Funded by Grant BT/PR16456/BID/7/624/2016 (Department of Biotechnology, India); Grant Translational Research Program (TRP) at THSTI funded by DBT
| | - Muralidharan Vanuopadath
- School of Biotechnology, Amrita Vishwa Vidyapeetham, Kollam-690 525, Kerala, India · Funded by Department of Health Research, Indian Council of Medical Research, Government of India (File No.R.12014/31/2022-HR)
| | - Martín L. Mayta
- School of Medicine and Health Sciences, Center for Health Sciences Research, Universidad Adventista del Plata, Libertador San Martín 3103, Argentina; Molecular Biology Department, School of Pharmacy and Biochemistry, Universidad Nacional de Rosario, Rosario 2000, Argentina
| | - Anna G. Duboff
- Department of Chemistry, University of Washington · Funded by Summer Research Acceleration Fellowship, Department of Chemistry, University of Washington
| | - Nicholas M. Riley
- Department of Chemistry, University of Washington · Funded by National Institutes of Health Grant R00 GM147304
| | - Robert L. Moritz
- Institute for Systems Biology, Seattle, WA, USA, 98109 · Funded by National Institutes of Health Grants R01GM087221, R24GM127667, U19AG023122, S10OD026936; National Science Foundation Award 1920268
| | - Jesse G. Meyer
- Department of Computational Biomedicine, Cedars Sinai Medical Center · Funded by National Institutes of Health Grant R21 AG074234; National Institutes of Health Grant R35 GM142502
| |
|
37
|
Ferreira R, Amado F, Vitorino R. Empowering peptidomics: utilizing computational tools and approaches. Bioanalysis 2023; 15:1315-1325. [PMID: 37737150 DOI: 10.4155/bio-2023-0102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/23/2023] Open
Abstract
Bioinformatics plays a critical role in the advancement of peptidomics by providing powerful tools for data analysis, interpretation and integration. Peptidomics is concerned with the study of peptides, short chains of amino acids with diverse biological functions. This area includes peptide identification and characterization, database construction, de novo sequencing, functional annotation, omics data integration and systems biology. Artificial intelligence techniques, such as machine learning and natural language processing, aid in the interpretation of peptide sequence data and the generation of biological insights. By using bioinformatics approaches, peptidomics researchers can accelerate peptide discovery, understand their functions and gain insights into complex molecular interactions.
Affiliation(s)
- Rita Ferreira
- LAQV-REQUIMTE, Department of Chemistry, University of Aveiro, 3810-193 Aveiro, Portugal
| | - Francisco Amado
- LAQV-REQUIMTE, Department of Chemistry, University of Aveiro, 3810-193 Aveiro, Portugal
| | - Rui Vitorino
- LAQV-REQUIMTE, Department of Chemistry, University of Aveiro, 3810-193 Aveiro, Portugal
- Unidade de Investigação Cardiovascular, Departamento de Cirurgia e Fisiologia, Universidade do Porto, Porto, Portugal
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
| |
|
38
|
de Espindola JS, Ferreira Taccóla M, da Silva VSN, Dos Santos LD, Rossini BC, Mendonça BC, Pacheco MTB, Galland F. Digestion-resistant whey peptides promote antioxidant effect on Caco-2 cells. Food Res Int 2023; 173:113291. [PMID: 37803604 DOI: 10.1016/j.foodres.2023.113291] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/28/2023] [Revised: 07/14/2023] [Accepted: 07/16/2023] [Indexed: 10/08/2023]
Abstract
Enteric endothelial cells are the first structure to come in contact with digested food and may suffer oxidative damage from numerous exogenous factors. Although peptides derived from whey digestion have shown antioxidant potential, little is known regarding the activation of antioxidant pathways in the Caco-2 cell line model. Hence, we evaluated the ability to form whey peptides resistant to simulated gastrointestinal digestive processes, with potential antioxidant activity on gastrointestinal cells and associated with sequence structure and activity. Using the INFOGEST method of simulated static digestion, we achieved 35.2% proteolysis, with formation of peptides of low molecular mass (<600 Da) evaluated by FPLC. The digestion-resistant peptides showed a high proportion of hydrophobic and acidic amino acids, but with average surface hydrophobicity. We identified 24 peptide sequences, mainly originating from β-lactoglobulin, that exhibit various bioactivities. Structurally, the sequenced peptides predominantly contained the amino acids lysine and valine in the N-terminal region, and tyrosine in the C-terminal region, which are known to exhibit antioxidant properties. The antioxidant activity of the peptide digests was on average twice as potent as that of the protein isolates for the same concentration, as evaluated by ABTS, DPPH and ORAC. Evaluation of biological activity in Caco-2 intestinal cells, stimulated with hydrogen peroxide, showed that they attenuated the production of reactive oxygen species and prevented GSH reduction and SOD activity increase. Caco-2 cells were not responsive to nitric oxide secretion. This study suggests that whey peptides formed during gastric digestion exhibit biological antioxidant activity, without the need for prior hydrolysis with exogenous enzymes for supplement application.
The study's primary contribution was demonstrating the antioxidant activity of whey peptides in maintaining the gastrointestinal epithelial cells, potentially preventing oxidative stress that affects the digestive system.
Affiliation(s)
- Juliana Santos de Espindola
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
| | - Milena Ferreira Taccóla
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
| | - Vera Sônia Nunes da Silva
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
| | | | - Bruno Cesar Rossini
- Institute of Biotechnology, São Paulo State University (UNESP), Botucatu, SP 18607-440, Brazil.
| | - Bruna Cavecci Mendonça
- Institute of Biotechnology, São Paulo State University (UNESP), Botucatu, SP 18607-440, Brazil.
| | - Maria Teresa Bertoldo Pacheco
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
| | - Fabiana Galland
- Quality and Science Center of Food, Institute of Food Technology (ITAL), Brasil Ave. 2880, P.O. Box 139, Campinas, SP 13070-178, Brazil.
| |
|
39
|
Samgina TY, Vasileva ID, Trebše P, Torkar G, Surin AK, Meng Z, Zubarev RA, Lebedev AT. Tandem Mass Spectrometry de novo Sequencing of the Skin Defense Peptides of the Central Slovenian Agile Frog Rana dalmatina. Molecules 2023; 28:7118. [PMID: 37894596 PMCID: PMC10608968 DOI: 10.3390/molecules28207118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 10/11/2023] [Accepted: 10/13/2023] [Indexed: 10/29/2023] Open
Abstract
Peptides released on frogs' skin in a stress situation represent their only weapon against micro-organisms and predators. Every frog species, and even every population, possesses its own peptidome suited to its habitat. Skin peptides are considered potential pharmaceuticals, while the whole peptidome may be treated as a taxonomic characteristic of each particular population. Continuing the studies on frog peptides, here we report the peptidome composition of the Central Slovenian agile frog Rana dalmatina population. The detection and top-down de novo sequencing of the corresponding peptides were conducted exclusively by tandem mass spectrometry without using any chemical derivatization procedures. Collision-induced dissociation (CID), higher energy collision-induced dissociation (HCD), electron transfer dissociation (ETD) and the combined MS3 method EThcD with a stepwise increase of HCD energy were used for that purpose. MS/MS revealed the whole sequence of the detected peptides, including differentiation between isomeric Leu/Ile, and the sequence portion hidden in the disulfide cycle. The array of the discovered peptide families (brevinins 1 and 2, melittin-related peptides (MRPs), temporins and bradykinin-related peptides (BRPs)) is quite similar to that of R. temporaria. Since the genome of this frog remains unknown, the obtained results were compared with the recently published transcriptome of R. dalmatina.
Affiliation(s)
- Tatiana Yu. Samgina
- Department of Materials Science, MSU-BIT University, Shenzhen 517182, China
- Department of Organic Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
| | - Irina D. Vasileva
- Department of Organic Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
| | - Polonca Trebše
- Faculty of Health Sciences, University of Ljubljana, Zdravstvena Pot 5, 1000 Ljubljana, Slovenia
| | - Gregor Torkar
- Department for Biology, Chemistry and Home Economics, University of Ljubljana Faculty of Education, Kardeljeva Ploščad 16, 1000 Ljubljana, Slovenia
| | - Alexey K. Surin
- Pushchino Branch, Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Prospekt Nauki 6, Pushchino, 142290 Moscow, Russia
| | - Zhaowei Meng
- Department of Medicinal Biochemistry and Biophysics, Division of Molecular Biometry, Karolinska Institutet, SE-171 77 Stockholm, Sweden
| | - Roman A. Zubarev
- Department of Medicinal Biochemistry and Biophysics, Division of Molecular Biometry, Karolinska Institutet, SE-171 77 Stockholm, Sweden
- The National Medical Research Center for Endocrinology, 115478 Moscow, Russia
- Department of Pharmacological & Technological Chemistry, I.M. Sechenov First Moscow State Medical University, 119146 Moscow, Russia
| | - Albert T. Lebedev
- Department of Materials Science, MSU-BIT University, Shenzhen 517182, China
- Department of Organic Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
| |
|
40
|
Ng CCA, Zhou Y, Yao ZP. Algorithms for de-novo sequencing of peptides by tandem mass spectrometry: A review. Anal Chim Acta 2023; 1268:341330. [PMID: 37268337 DOI: 10.1016/j.aca.2023.341330] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/22/2022] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 06/04/2023]
Abstract
Peptide sequencing is of great significance to fundamental and applied research in fields such as chemical, biological, medicinal and pharmaceutical sciences. With the rapid development of mass spectrometry and sequencing algorithms, de-novo peptide sequencing using tandem mass spectrometry (MS/MS) has become the main method for determining amino acid sequences of novel and unknown peptides. Advanced algorithms allow the amino acid sequence information to be accurately obtained from MS/MS spectra in a short time. In this review, algorithms from exhaustive search to state-of-the-art machine learning and neural networks for high-throughput and automated de-novo sequencing are introduced and compared. Impacts of datasets on algorithm performance are highlighted. The current limitations and promising directions of de-novo peptide sequencing are also discussed in this review.
Affiliation(s)
- Cheuk Chi A Ng
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
| | - Yin Zhou
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
| | - Zhong-Ping Yao
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China.
| |
|
41
|
Oreper D, Klaeger S, Jhunjhunwala S, Delamarre L. The peptide woods are lovely, dark and deep: Hunting for novel cancer antigens. Semin Immunol 2023; 67:101758. [PMID: 37027981 DOI: 10.1016/j.smim.2023.101758] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/31/2022] [Revised: 03/22/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023]
Abstract
Harnessing the patient's immune system to control a tumor is a proven avenue for cancer therapy. T cell therapies as well as therapeutic vaccines, which target specific antigens of interest, are being explored as treatments in conjunction with immune checkpoint blockade. For these therapies, selecting the best suited antigens is crucial. Most of the focus has thus far been on neoantigens that arise from tumor-specific somatic mutations. Although there is clear evidence that T-cell responses against mutated neoantigens are protective, the large majority of these mutations are not immunogenic. In addition, most somatic mutations are unique to each individual patient and their targeting requires the development of individualized approaches. Therefore, novel antigen types are needed to broaden the scope of such treatments. We review high throughput approaches for discovering novel tumor antigens and some of the key challenges associated with their detection, and discuss considerations when selecting tumor antigens to target in the clinic.
Affiliation(s)
- Daniel Oreper
- Genentech, 1 DNA way, South San Francisco, 94080 CA, USA.
| | - Susan Klaeger
- Genentech, 1 DNA way, South San Francisco, 94080 CA, USA.
|
42
|
DiagnoMass: A proteomics hub for pinpointing discriminative spectral clusters. J Proteomics 2023; 277:104853. [PMID: 36804625 DOI: 10.1016/j.jprot.2023.104853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/14/2023] [Indexed: 02/18/2023]
Abstract
MOTIVATION: There are several well-established paradigms for identifying and pinpointing discriminative peptides/proteins using shotgun proteomic data; examples are peptide-spectrum matching, de novo sequencing, open searches, and even hybrid approaches. Such an arsenal of complementary paradigms can provide deep data coverage, although some discriminative peptides remain unidentified. RESULTS: We present DiagnoMass, a software tool that groups similar spectra into spectral clusters and then shortlists those clusters that are discriminative for biological conditions. DiagnoMass then communicates with proteomic tools to attempt the identification of such clusters. We demonstrate the effectiveness of DiagnoMass by analyzing proteomic data from Escherichia coli, Salmonella, and Shigella, listing many high-quality discriminative spectral clusters that had thus far remained unidentified by widely adopted proteomic tools. DiagnoMass can also classify proteomic profiles. We anticipate the use of DiagnoMass as a vital tool for pinpointing biomarkers. AVAILABILITY: DiagnoMass and related documentation, including a usage protocol, are available at http://www.diagnomass.com.
|
43
|
Phetsanthad A, Vu NQ, Yu Q, Buchberger AR, Chen Z, Keller C, Li L. Recent advances in mass spectrometry analysis of neuropeptides. Mass Spectrometry Reviews 2023; 42:706-750. [PMID: 34558119 PMCID: PMC9067165 DOI: 10.1002/mas.21734] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/09/2021] [Revised: 08/22/2021] [Accepted: 08/28/2021] [Indexed: 05/08/2023]
Abstract
Due to their involvement in numerous biochemical pathways, neuropeptides have been the focus of many recent research studies. Unfortunately, classic analytical methods, such as western blots and enzyme-linked immunosorbent assays, are extremely limited in terms of global investigations, leading researchers to search for more advanced techniques capable of probing the entire neuropeptidome of an organism. With recent technological advances, mass spectrometry (MS) has provided methodology to gain global knowledge of a neuropeptidome on a spatial, temporal, and quantitative level. This review will cover key considerations for the analysis of neuropeptides by MS, including sample preparation strategies, instrumental advances for identification, structural characterization, and imaging; insightful functional studies; and newly developed absolute and relative quantitation strategies. While many discoveries have been made with MS, the methodology is still in its infancy. Many of the current challenges and areas that need development will also be highlighted in this review.
Affiliation(s)
- Ashley Phetsanthad
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
| | - Nhu Q. Vu
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
| | - Qing Yu
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
| | - Amanda R. Buchberger
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
| | - Zhengwei Chen
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
| | - Caitlin Keller
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
| | - Lingjun Li
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706, USA
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, WI 53705, USA
|
44
|
Deutsch EW, Mendoza L, Shteynberg DD, Hoopmann MR, Sun Z, Eng JK, Moritz RL. Trans-Proteomic Pipeline: Robust Mass Spectrometry-Based Proteomics Data Analysis Suite. J Proteome Res 2023; 22:615-624. [PMID: 36648445 PMCID: PMC10166710 DOI: 10.1021/acs.jproteome.2c00624] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Indexed: 01/18/2023]
Abstract
The Trans-Proteomic Pipeline (TPP) mass spectrometry data analysis suite has been in continual development and refinement since its first tools, PeptideProphet and ProteinProphet, were published 20 years ago. The current release provides a large complement of tools for spectrum processing, spectrum searching, search validation, abundance computation, protein inference, and more. Many of the tools include machine-learning modeling to extract the most information from data sets and build robust statistical models to compute the probabilities that derived information is correct. Here we present the latest information on the many TPP tools, and how TPP can be deployed on various platforms from personal Windows laptops to Linux clusters and expansive cloud computing environments. We describe tutorials on how to use TPP in a variety of ways and describe synergistic projects that leverage TPP. We conclude with plans for continued development of TPP.
Affiliation(s)
- Eric W Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Luis Mendoza
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | | | | | - Zhi Sun
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Jimmy K Eng
- Proteomics Resource, University of Washington, Seattle, Washington 98195, United States
| | - Robert L Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
|
45
|
Beslic D, Tscheuschner G, Renard BY, Weller MG, Muth T. Comprehensive evaluation of peptide de novo sequencing tools for monoclonal antibody assembly. Brief Bioinform 2023; 24:bbac542. [PMID: 36545804 PMCID: PMC9851299 DOI: 10.1093/bib/bbac542] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 09/09/2022] [Revised: 10/25/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022] Open
Abstract
Monoclonal antibodies are biotechnologically produced proteins with various applications in research, therapeutics and diagnostics. Their ability to recognize and bind to specific molecule structures makes them essential research tools and therapeutic agents. Sequence information of antibodies is helpful for understanding antibody-antigen interactions and ensuring their affinity and specificity. De novo protein sequencing based on mass spectrometry is a valuable method to obtain the amino acid sequence of peptides and proteins without a priori knowledge. In this study, we evaluated six recently developed de novo peptide sequencing algorithms (Novor, pNovo 3, DeepNovo, SMSNet, PointNovo and Casanovo), which were not specifically designed for antibody data. We validated their ability to identify and assemble antibody sequences on three multi-enzymatic data sets. The deep learning-based tools Casanovo and PointNovo showed an increased peptide recall across different enzymes and data sets compared with spectrum-graph-based approaches. We evaluated different error types of de novo peptide sequencing tools and their performance for different numbers of missing cleavage sites, noisy spectra and peptides of various lengths. We achieved a sequence coverage of 97.69-99.53% on the light chains of three different antibody data sets using the de Bruijn assembler ALPS and the predictions from Casanovo. However, low sequence coverage and accuracy on the heavy chains demonstrate that complete de novo protein sequencing remains a challenging issue in proteomics that requires improved de novo error correction, alternative digestion strategies and hybrid approaches such as homology search to achieve high accuracy on long protein sequences.
Affiliation(s)
- Denis Beslic
- Robert Koch Institute, MF1, Nordufer 20, 13353 Berlin
| | - Georg Tscheuschner
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| | - Bernhard Y Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam
| | - Michael G Weller
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| | - Thilo Muth
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
|
46
|
McDonnell K, Howley E, Abram F. Critical evaluation of the use of artificial data for machine learning based de novo peptide identification. Comput Struct Biotechnol J 2023; 21:2732-2743. [PMID: 37168871 PMCID: PMC10165132 DOI: 10.1016/j.csbj.2023.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/16/2023] [Accepted: 04/16/2023] [Indexed: 05/13/2023] Open
Abstract
Proteins are essential components of all living cells and so the study of their in situ expression, proteomics, has wide reaching applications. Peptide identification in proteomics typically relies on matching high resolution tandem mass spectra to a protein database but can also be performed de novo. While artificial spectra have been successfully incorporated into database search pipelines to increase peptide identification rates, little work has been done to investigate the utility of artificial spectra in the context of de novo peptide identification. Here, we perform a critical analysis of the use of artificial data for the training and evaluation of de novo peptide identification algorithms. First, we classify the different fragment ion types present in real spectra and then estimate the number of spurious matches using random peptides. We then categorise the different types of noise present in real spectra. Finally, we transfer this knowledge to artificial data and test the performance of a state-of-the-art de novo peptide identification algorithm trained using artificial spectra with and without relevant noise addition. Noise supplementation increased artificial training data performance from 30% to 77% of real training data peptide recall. While real data performance was not fully replicated, this work provides the first steps towards an artificial spectrum framework for the training and evaluation of de novo peptide identification algorithms. Further enhanced artificial spectra may allow for more in depth analysis of de novo algorithms as well as alleviating the reliance on database searches for training data.
Affiliation(s)
- Kevin McDonnell
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- School of Computer Science, University of Galway, Ireland
- Corresponding author at: Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland.
| | - Enda Howley
- School of Computer Science, University of Galway, Ireland
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
- Corresponding author.
|
47
|
Álvarez-Urdiola R, Borràs E, Valverde F, Matus JT, Sabidó E, Riechmann JL. Peptidomics Methods Applied to the Study of Flower Development. Methods Mol Biol 2023; 2686:509-536. [PMID: 37540375 DOI: 10.1007/978-1-0716-3299-4_24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Understanding the global and dynamic nature of plant developmental processes requires not only the study of the transcriptome, but also of the proteome, including its largely uncharacterized peptidome fraction. Recent advances in proteomics and high-throughput analyses of translating RNAs (ribosome profiling) have begun to address this issue, evidencing the existence of novel, uncharacterized, and possibly functional peptides. To validate the accumulation in tissues of sORF-encoded polypeptides (SEPs), the basic setup of proteomic analyses (i.e., LC-MS/MS) can be followed. However, the detection of peptides that are small (up to ~100 aa, 6-7 kDa) and novel (i.e., not annotated in reference databases) presents specific challenges that need to be addressed both experimentally and with computational biology resources. Several methods have been developed in recent years to isolate and identify peptides from plant tissues. In this chapter, we outline two different peptide extraction protocols and the subsequent peptide identification by mass spectrometry using the database search or the de novo identification methods.
Affiliation(s)
- Raquel Álvarez-Urdiola
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
| | - Eva Borràs
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Federico Valverde
- Institute for Plant Biochemistry and Photosynthesis CSIC - University of Seville, Seville, Spain
| | - José Tomás Matus
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Institute for Integrative Systems Biology (I2SysBio), Universitat de València-CSIC, Paterna, Valencia, Spain
| | - Eduard Sabidó
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - José Luis Riechmann
- Centre for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Edifici CRAG, Campus UAB, Cerdanyola del Vallès, Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
|
48
|
McDonnell K, Abram F, Howley E. Application of a Novel Hybrid CNN-GNN for Peptide Ion Encoding. J Proteome Res 2022; 22:323-333. [PMID: 36534699 PMCID: PMC9903319 DOI: 10.1021/acs.jproteome.2c00234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Almost all state-of-the-art de novo peptide sequencing algorithms now use machine learning models to encode fragment peaks and hence identify amino acids in mass spectrometry (MS) spectra. Previous work has highlighted how the inherent MS challenges of noise and missing peptide peaks detrimentally affect the performance of these models. In the present research we extracted and evaluated the encoding modules from 3 state-of-the-art de novo peptide sequencing algorithms. We also propose a convolutional neural network-graph neural network machine learning model for encoding peptide ions in tandem MS spectra. We compared the proposed encoding module to those used in the state-of-the-art de novo peptide sequencing algorithms by assessing their ability to identify b-ions and y-ions in MS spectra. This included a comprehensive evaluation in both real and artificial data across various levels of noise and missing peptide peaks. The proposed model performed best across all data sets using two different metrics (area under the receiver operating characteristic curve (AUC) and average precision). The work also highlighted the effect of including additional features such as intensity rank in these encoding modules as well as issues with using the AUC as a metric. This work is of significance to those designing future de novo peptide identification algorithms as it is the first step toward a new approach.
Affiliation(s)
- Kevin McDonnell
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
| | - Florence Abram
- Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Galway H91 TK33, Ireland
| | - Enda Howley
- Department of Information Technology, School of Computer Science, University of Galway, Galway H91 TK33, Ireland
|
49
|
Bonacina F, Moregola A, Svecla M, Coe D, Uboldi P, Fraire S, Beretta S, Beretta G, Pellegatta F, Catapano AL, Marelli-Berg FM, Norata GD. The low-density lipoprotein receptor-mTORC1 axis coordinates CD8+ T cell activation. J Cell Biol 2022; 221:213488. [PMID: 36129440 PMCID: PMC9499829 DOI: 10.1083/jcb.202202011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/05/2022] [Revised: 06/10/2022] [Accepted: 08/08/2022] [Indexed: 11/22/2022] Open
Abstract
Activation of T cells relies on the availability of intracellular cholesterol for an effective response after stimulation. We investigated the contribution of cholesterol derived from extracellular uptake by the low-density lipoprotein (LDL) receptor in the immunometabolic response of T cells. By combining proteomics, gene expression profiling, and immunophenotyping, we described a unique role for cholesterol provided by the LDLR pathway in CD8+ T cell activation. mRNA and protein expression of LDLR was significantly increased in activated CD8+ compared to CD4+ WT T cells, and this resulted in a significant reduction of proliferation and cytokine production (IFNγ, Granzyme B, and Perforin) of CD8+ but not CD4+ T cells from Ldlr -/- mice after in vitro and in vivo stimulation. This effect was the consequence of altered cholesterol routing to the lysosome resulting in a lower mTORC1 activation. Similarly, CD8+ T cells from humans affected by familial hypercholesterolemia (FH) carrying a mutation on the LDLR gene showed reduced activation after an immune challenge.
Affiliation(s)
- Fabrizia Bonacina
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
| | - Annalisa Moregola
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
| | - Monika Svecla
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
| | - David Coe
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Centre for Inflammation and Therapeutic Innovation, Queen Mary University of London, Charterhouse Square, London, UK
| | - Patrizia Uboldi
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
| | - Sara Fraire
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
| | - Simona Beretta
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
| | - Giangiacomo Beretta
- Department of Environmental Science and Policy, Università degli Studi di Milano, Milan, Italy
| | - Fabio Pellegatta
- Istituti di Ricovero e Cura a Carattere Scientifico Multimedica, Milan, Italy
| | - Alberico Luigi Catapano
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Istituti di Ricovero e Cura a Carattere Scientifico Multimedica, Milan, Italy
| | - Federica M Marelli-Berg
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Centre for Inflammation and Therapeutic Innovation, Queen Mary University of London, Charterhouse Square, London, UK
| | - Giuseppe Danilo Norata
- Department of Excellence of Pharmacological and Biomolecular Sciences, Università degli Studi di Milano, Milan, Italy
- Centro SISA per lo Studio dell'Aterosclerosi, Ospedale Bassini, Cinisello Balsamo, Italy
|
50
|
Caira S, Picariello G, Renzone G, Arena S, Troise AD, De Pascale S, Ciaravolo V, Pinto G, Addeo F, Scaloni A. Recent developments in peptidomics for the quali-quantitative analysis of food-derived peptides in human body fluids and tissues. Trends Food Sci Technol 2022. [DOI: 10.1016/j.tifs.2022.06.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|