1
Lundgren T, Clark PL, Champion MM. Fit for Purpose Approach To Evaluate Detection of Amino Acid Substitutions in Shotgun Proteomics. J Proteome Res 2024; 23:1263-1271. [PMID: 38478054 PMCID: PMC11003417 DOI: 10.1021/acs.jproteome.3c00730] [Received: 11/03/2023] [Revised: 02/04/2024] [Accepted: 02/27/2024] [Indexed: 04/06/2024]
Abstract
Amino acid substitutions (AASs) alter proteins from their genome-expected sequences. Accumulation of substitutions in proteins underlies numerous diseases and antibiotic mechanisms. Accurate global detection of AASs and their frequencies is crucial for understanding these mechanisms. Shotgun proteomics provides an untargeted method for measuring AASs but introduces biases when extrapolating from the genome to identify AASs. To characterize these biases, we created a "ground-truth" approach using the similarities between Escherichia coli and Salmonella typhimurium to model the complexity of AAS detection. Shotgun proteomics on mixed lysates generated libraries representing ∼100,000 peptide-spectra and 4161 peptide sequences with a single AAS and defined stoichiometry. Identifying S. typhimurium peptide-spectra with only the E. coli genome resulted in 64.1% correctly identified library peptides. Specific AASs exhibit variable identification efficiencies. There was no inherent bias from the stoichiometry of the substitutions. Short peptides and AASs localized near peptide termini had poor identification efficiency. We identify a new class of "scissor substitutions" that gain or lose protease cleavage sites. Scissor substitutions also had poor identification efficiency. This ground-truth AAS library reveals various sources of bias, which will guide the application of shotgun proteomics to validate AAS hypotheses.
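The "scissor substitution" idea can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes a simplified trypsin rule (cleave after K or R unless the next residue is P) and compares the cleavage sites of a reference sequence with those of a sequence carrying one substitution. The function names and the example sequence are hypothetical.

```python
def tryptic_sites(seq: str) -> set[int]:
    """0-based indices i where trypsin is assumed to cut between seq[i] and seq[i + 1].

    Simplified rule (assumption): cleave C-terminal to K or R unless followed by P.
    """
    return {i for i in range(len(seq) - 1) if seq[i] in "KR" and seq[i + 1] != "P"}


def scissor_effect(seq: str, pos: int, new_aa: str) -> tuple[set[int], set[int]]:
    """Return (gained, lost) cleavage sites caused by replacing seq[pos] with new_aa."""
    variant = seq[:pos] + new_aa + seq[pos + 1:]
    before, after = tryptic_sites(seq), tryptic_sites(variant)
    return after - before, before - after


def is_scissor(seq: str, pos: int, new_aa: str) -> bool:
    """True if the substitution gains or loses at least one cleavage site."""
    gained, lost = scissor_effect(seq, pos, new_aa)
    return bool(gained or lost)


if __name__ == "__main__":
    ref = "GASPEKLVDR"                  # hypothetical reference peptide stretch
    print(scissor_effect(ref, 4, "R"))  # E->R creates a new site after position 4
    print(scissor_effect(ref, 6, "P"))  # L->P blocks the existing site after K5
    print(is_scissor(ref, 1, "V"))      # A->V leaves cleavage unchanged
```

Under this rule, a scissor substitution changes the boundaries of the digested peptides rather than just one residue inside them, so the variant peptide no longer lines up with any genome-expected peptide.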
Affiliation(s)
- Taylor J. Lundgren
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Patricia L. Clark
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Matthew M. Champion
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States
2
Adams C, Laukens K, Bittremieux W, Boonen K. Machine learning-based peptide-spectrum match rescoring opens up the immunopeptidome. Proteomics 2024; 24:e2300336. [PMID: 38009585 DOI: 10.1002/pmic.202300336] [Received: 09/06/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 11/29/2023]
Abstract
Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.
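Rescoring pays off because identifications are reported at a fixed false discovery rate. As a hedged illustration (not the algorithm of any specific rescoring tool), the sketch below implements one common target-decoy estimate, FDR = decoys / targets above a score threshold, converts it to q-values, and counts target PSMs accepted at a threshold. The `PSM` class, function names and toy scores are hypothetical; other estimators, such as (decoys + 1) / targets, are also in use.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    score: float    # higher means a better match (assumption)
    is_decoy: bool  # True if the match is to a decoy (e.g. reversed) sequence


def q_values(psms: list[PSM]) -> list[tuple[PSM, float]]:
    """Rank PSMs by score and attach target-decoy q-values (FDR = decoys / targets)."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: smallest FDR over all score thresholds that still accept this PSM,
    # i.e. a running minimum taken from the bottom of the ranking upwards.
    qs, best = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        best = min(best, fdrs[i])
        qs[i] = best
    return list(zip(ranked, qs))


def accepted_targets(psms: list[PSM], fdr: float = 0.01) -> int:
    """Number of target PSMs with q-value <= fdr."""
    return sum(1 for psm, q in q_values(psms) if not psm.is_decoy and q <= fdr)


if __name__ == "__main__":
    # Toy data: identical targets, but "rescoring" pushes one decoy below a target.
    original = [PSM(s, False) for s in (10, 9, 8, 7, 5)] + [PSM(6, True), PSM(4, True)]
    rescored = [PSM(s, False) for s in (10, 9, 8, 7, 5)] + [PSM(4.5, True), PSM(4, True)]
    print(accepted_targets(original), accepted_targets(rescored))
```

With these toy scores the original ranking accepts 4 target PSMs at 1% FDR and the rescored ranking accepts 5: better separation of correct and incorrect matches yields more identifications at the same FDR.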
Affiliation(s)
- Charlotte Adams
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kris Laukens
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
- Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
- Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- ImmuneSpec BV, Niel, Belgium
3
Kelliher JM, Robinson AJ, Longley R, Johnson LYD, Hanson BT, Morales DP, Cailleau G, Junier P, Bonito G, Chain PSG. The endohyphal microbiome: current progress and challenges for scaling down integrative multi-omic microbiome research. Microbiome 2023; 11:192. [PMID: 37626434 PMCID: PMC10463477 DOI: 10.1186/s40168-023-01634-7] [Received: 03/01/2023] [Accepted: 07/29/2023] [Indexed: 08/27/2023]
Abstract
As microbiome research has progressed, it has become clear that most, if not all, eukaryotic organisms are hosts to microbiomes composed of prokaryotes, other eukaryotes, and viruses. Fungi have only recently been considered holobionts with their own microbiomes, as filamentous fungi have been found to harbor bacteria (including cyanobacteria), mycoviruses, other fungi, and whole algal cells within their hyphae. Constituents of this complex endohyphal microbiome have been interrogated using multi-omic approaches. However, a lack of tools, techniques, and standardization for integrative multi-omics for small-scale microbiomes (e.g., intracellular microbiomes) has limited progress towards investigating and understanding the total diversity of the endohyphal microbiome and its functional impacts on fungal hosts. Understanding microbiome impacts on fungal hosts will advance explorations of how "microbiomes within microbiomes" affect broader microbial community dynamics and ecological functions. Progress to date as well as ongoing challenges of performing integrative multi-omics on the endohyphal microbiome is discussed herein. Addressing the challenges associated with the sample extraction, sample preparation, multi-omic data generation, and multi-omic data analysis and integration will help advance current knowledge of the endohyphal microbiome and provide a road map for shrinking microbiome investigations to smaller scales. Video Abstract.
Affiliation(s)
- Reid Longley
- Los Alamos National Laboratory, Los Alamos, NM, USA
4
Ng CCA, Zhou Y, Yao ZP. Algorithms for de-novo sequencing of peptides by tandem mass spectrometry: A review. Anal Chim Acta 2023; 1268:341330. [PMID: 37268337 DOI: 10.1016/j.aca.2023.341330] [Received: 12/22/2022] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 06/04/2023]
Abstract
Peptide sequencing is of great significance to fundamental and applied research in fields such as the chemical, biological, medicinal and pharmaceutical sciences. With the rapid development of mass spectrometry and sequencing algorithms, de-novo peptide sequencing using tandem mass spectrometry (MS/MS) has become the main method for determining the amino acid sequences of novel and unknown peptides. Advanced algorithms allow amino acid sequence information to be obtained accurately from MS/MS spectra in a short time. In this review, algorithms for high-throughput and automated de-novo sequencing, from exhaustive search to state-of-the-art machine learning and neural networks, are introduced and compared. The impact of datasets on algorithm performance is highlighted. The current limitations and promising directions of de-novo peptide sequencing are also discussed.
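As a toy illustration of the principle all de-novo algorithms build on, the sketch below reads residues from the mass differences between consecutive b ions and takes the C-terminal residue from the precursor mass. It assumes a complete, noise-free ladder of singly charged b ions from an unmodified peptide; the exhaustive-search, graph-based and neural-network algorithms reviewed here exist precisely because real spectra rarely look like this. Function names and the 0.02 Da tolerance are illustrative assumptions.

```python
RESIDUE_MASS = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1) m/z values for an unmodified peptide."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def peptide_mh(peptide: str) -> float:
    """[M+H]+ of the unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def _residue_for(delta: float, tol: float) -> str:
    hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
    if not hits:
        return "?"                      # no single residue fits this mass gap
    if len(hits) == 1:
        return hits[0]
    return "[" + "/".join(hits) + "]"   # isobaric residues such as L/I


def read_ladder(b_peaks: list[float], precursor_mh: float, tol: float = 0.02) -> str:
    """Read a sequence from a complete b-ion ladder plus the precursor [M+H]+."""
    seq, prev = [], PROTON
    for mz in sorted(b_peaks):
        seq.append(_residue_for(mz - prev, tol))
        prev = mz
    # C-terminal residue: remainder between the last b ion and [M+H]+ minus water.
    seq.append(_residue_for(precursor_mh - WATER - prev, tol))
    return "".join(seq)


if __name__ == "__main__":
    print(read_ladder(b_ions("PEPTIDE"), peptide_mh("PEPTIDE")))
```

The `[L/I]` output reflects that leucine and isoleucine have identical residue masses. A missing b ion usually shows up as `?`, although some two-residue gaps are nearly isobaric with a single residue (G+G versus N, for example) and would be misread.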
Affiliation(s)
- Cheuk Chi A Ng
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Yin Zhou
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
- Zhong-Ping Yao
- State Key Laboratory of Chemical Biology and Drug Discovery, and Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; Research Institute for Future Food, and Research Center for Chinese Medicine Innovation, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong Special Administrative Region of China; State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), and Shenzhen Key Laboratory of Food Biological Safety Control, The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, 518057, China
5
Sun B, Liu Z, Liu J, Zhao S, Wang L, Wang F. The utility of proteases in proteomics, from sequence profiling to structure and function analysis. Proteomics 2023; 23:e2200132. [PMID: 36382392 DOI: 10.1002/pmic.202200132] [Received: 08/11/2022] [Revised: 11/08/2022] [Accepted: 11/08/2022] [Indexed: 11/18/2022]
Abstract
In mass spectrometry (MS)-based bottom-up proteomics, protease digestion plays an essential role in profiling both proteome sequences and post-translational modifications (PTMs). Trypsin is the gold standard for digesting intact proteins into smaller peptides, which are more suitable for high-performance liquid chromatography (HPLC) separation and tandem MS (MS/MS) characterization. However, protein sequences lacking Lys and Arg cannot be cleaved by trypsin and may be missed in conventional proteomic analysis. Proteases with cleavage sites complementary to those of trypsin are widely applied in proteomic analysis and greatly improve the coverage of proteome sequences and PTM sites. In this review, we survey the common and newly emerging proteases used in proteomics analysis, mainly in the last 5 years, focusing on their unique cleavage features and specific proteomics applications such as missing protein characterization, new PTM discovery, and de novo sequencing. In addition, we summarize recent applications of proteases in structural proteomics and protein function analysis. Finally, we discuss future directions for the development of new proteases and their applications in proteomics.
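The trypsin limitation described above can be seen with a simple in silico digest. The sketch below uses simplified rules as assumptions: trypsin cuts after K or R unless the next residue is P, and Glu-C is reduced to cutting after E (its cleavage after D under some buffer conditions is ignored). The sequence is a made-up segment without internal K or R, and `digest`, `TRYPSIN` and `GLU_C` are hypothetical names.

```python
def digest(seq: str, cleave_after: str, not_before: str = "") -> list[str]:
    """Cut C-terminal to any residue in `cleave_after`, unless the next residue is in `not_before`."""
    peptides, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i] in cleave_after and seq[i + 1] not in not_before:
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])  # C-terminal peptide
    return peptides


TRYPSIN = {"cleave_after": "KR", "not_before": "P"}  # simplified trypsin rule
GLU_C = {"cleave_after": "E"}                       # simplified: ignores cleavage after D

if __name__ == "__main__":
    region = "GVLNPAGSIDEWHNTFSQEAYVMDGLSEAR"  # hypothetical segment, no internal K/R
    print(digest(region, **TRYPSIN))
    print(digest(region, **GLU_C))
```

Trypsin leaves the whole 30-residue segment as a single peptide, while the Glu-C rule splits it into four shorter peptides; this is the sense in which a protease with complementary specificity can recover sequence coverage.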
Affiliation(s)
- Binwen Sun
- Engineering Research Center for New Materials and Precision Treatment Technology of Malignant Tumors Therapy, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- Engineering Technology Research Center for Translational Medicine, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Zheyi Liu
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- Jin Liu
- Engineering Research Center for New Materials and Precision Treatment Technology of Malignant Tumors Therapy, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Engineering Technology Research Center for Translational Medicine, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Division of Hepatobiliary and Pancreatic Surgery, Department of General Surgery, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Shan Zhao
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- Liming Wang
- Engineering Research Center for New Materials and Precision Treatment Technology of Malignant Tumors Therapy, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Engineering Technology Research Center for Translational Medicine, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Division of Hepatobiliary and Pancreatic Surgery, Department of General Surgery, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Fangjun Wang
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- University of Chinese Academy of Sciences, 19 Yuquan Road, Beijing, 100049, China
6
Beslic D, Tscheuschner G, Renard BY, Weller MG, Muth T. Comprehensive evaluation of peptide de novo sequencing tools for monoclonal antibody assembly. Brief Bioinform 2022; 24:6955273. [PMID: 36545804 PMCID: PMC9851299 DOI: 10.1093/bib/bbac542] [Received: 09/09/2022] [Revised: 10/25/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022]
Abstract
Monoclonal antibodies are biotechnologically produced proteins with various applications in research, therapeutics and diagnostics. Their ability to recognize and bind to specific molecule structures makes them essential research tools and therapeutic agents. Sequence information of antibodies is helpful for understanding antibody-antigen interactions and ensuring their affinity and specificity. De novo protein sequencing based on mass spectrometry is a valuable method to obtain the amino acid sequence of peptides and proteins without a priori knowledge. In this study, we evaluated six recently developed de novo peptide sequencing algorithms (Novor, pNovo 3, DeepNovo, SMSNet, PointNovo and Casanovo), which were not specifically designed for antibody data. We validated their ability to identify and assemble antibody sequences on three multi-enzymatic data sets. The deep learning-based tools Casanovo and PointNovo showed an increased peptide recall across different enzymes and data sets compared with spectrum-graph-based approaches. We evaluated different error types of de novo peptide sequencing tools and their performance for different numbers of missing cleavage sites, noisy spectra and peptides of various lengths. We achieved a sequence coverage of 97.69-99.53% on the light chains of three different antibody data sets using the de Bruijn assembler ALPS and the predictions from Casanovo. However, low sequence coverage and accuracy on the heavy chains demonstrate that complete de novo protein sequencing remains a challenging issue in proteomics that requires improved de novo error correction, alternative digestion strategies and hybrid approaches such as homology search to achieve high accuracy on long protein sequences.
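Sequence coverage, as reported for the light chains above, can be illustrated as the fraction of protein residues spanned by at least one exactly matching peptide. The sketch below is a hypothetical helper, not ALPS or the study's evaluation code, and its definition of coverage is an assumption. It optionally treats I and L as identical because the two residues have the same mass.

```python
def sequence_coverage(protein: str, peptides: list[str], merge_il: bool = True) -> float:
    """Fraction of protein residues covered by at least one exact peptide match."""
    def norm(s: str) -> str:
        return s.replace("I", "L") if merge_il else s

    target = norm(protein)
    covered = [False] * len(target)
    for pep in map(norm, peptides):
        if not pep:
            continue
        start = target.find(pep)
        while start != -1:  # mark every (possibly overlapping) occurrence
            for k in range(start, start + len(pep)):
                covered[k] = True
            start = target.find(pep, start + 1)
    return sum(covered) / len(target) if target else 0.0


if __name__ == "__main__":
    chain = "ACDEFGHIKLMNPQRSTVWY"  # toy sequence, not an antibody chain
    print(sequence_coverage(chain, ["CDEF", "EFGH", "STV"]))
```

Because only exact matches count, a single de novo error inside a peptide removes its contribution entirely, which is one way low accuracy translates into low coverage.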
Affiliation(s)
- Denis Beslic
- Robert Koch Institute, ZKI-PH 3, Nordufer 20, 13353 Berlin, Germany (corresponding author)
- Georg Tscheuschner
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin, Germany (corresponding author)
- Bernhard Y Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam, Germany (corresponding author)
- Michael G Weller
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin, Germany (corresponding author)
- Thilo Muth
- Federal Institute for Materials Research and Testing (BAM), Unter den Eichen 87, 12205 Berlin, Germany (corresponding author)
7
Abstract
Paleoproteomics, the study of ancient proteins, is a rapidly growing field at the intersection of molecular biology, paleontology, archaeology, paleoecology, and history. Paleoproteomics research leverages the longevity and diversity of proteins to explore fundamental questions about the past. While its origins predate the characterization of DNA, it was only with the advent of soft ionization mass spectrometry that the study of ancient proteins became truly feasible. Technological gains over the past 20 years have created increasing opportunities to better understand the preservation, degradation, and recovery of the rich bioarchive of ancient proteins found in the archaeological and paleontological records. Growing from a handful of studies in the 1990s on individual highly abundant ancient proteins, paleoproteomics today is an expanding field with diverse applications ranging from the taxonomic identification of highly fragmented bones and shells and the phylogenetic resolution of extinct species to the exploration of past cuisines from dental calculus and pottery food crusts and the characterization of past diseases. More broadly, these studies have opened new doors in understanding past human-animal interactions, the reconstruction of past environments and environmental changes, the expansion of the hominin fossil record through large-scale screening of nondiagnostic bone fragments, and the phylogenetic resolution of the vertebrate fossil record. Even with these advances, much of the ancient proteomic record remains unexplored. Here we provide an overview of the history of the field, a summary of the major methods and applications currently in use, and a critical evaluation of current challenges.
We conclude by looking to the future, for which innovative solutions and emerging technology will play an important role in enabling us to access the still unexplored "dark" proteome, allowing for a fuller understanding of the role ancient proteins can play in the interpretation of the past.
Affiliation(s)
- Christina Warinner
- Department of Anthropology, Harvard University, Cambridge, Massachusetts 02138, United States
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
- Kristine Korzow Richter
- Department of Anthropology, Harvard University, Cambridge, Massachusetts 02138, United States
- Matthew J. Collins
- Department of Archaeology, Cambridge University, Cambridge CB2 3DZ, United Kingdom
- Section for Evolutionary Genomics, Globe Institute, University of Copenhagen, Copenhagen 1350, Denmark
8
The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms. Comput Struct Biotechnol J 2022; 20:1402-1412. [PMID: 35386104 PMCID: PMC8956878 DOI: 10.1016/j.csbj.2022.03.008] [Received: 12/17/2021] [Revised: 03/09/2022] [Accepted: 03/09/2022] [Indexed: 01/24/2023]
Abstract
Most correct de novo peptides have ⩽1 missing fragmentation cleavage. DeepNovo outperforms Novor for peptide accuracy for both data types. Novor excels at amino acid recall when many fragmentation cleavages are missing. Deep learning allows DeepNovo to predict amino acids without adjacent peaks.
Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications, from clinical to environmental settings, and impacts virtually every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically, its performance lagged behind that of database search methods, but with the integration of machine learning, this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state-of-the-art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithm's correct peptide predictions emanate from data with more than one missing cleavage site, highlighting the issues that missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show how noise negatively impacts the performance of both algorithms when high-intensity peaks are considered. Finally, we provide recommendations for further algorithm improvements and offer potential avenues to overcome current inherent data limitations.
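To make "missing fragmentation cleavage" concrete, we assume here that a peptide bond counts as missing when neither its b ion nor its complementary y ion appears in the spectrum. The sketch below checks this for singly charged fragments of an unmodified peptide only; the function names, the toy peak list and the 0.02 Da tolerance are illustrative assumptions, not the study's exact definition or settings.

```python
RESIDUE_MASS = {  # monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide: str) -> dict[int, tuple[float, float]]:
    """Map each peptide bond i (1..n-1) to its singly charged (b_i, y_(n-i)) m/z pair."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    return {
        i: (sum(masses[:i]) + PROTON, sum(masses[i:]) + WATER + PROTON)
        for i in range(1, len(peptide))
    }


def missing_cleavages(peptide: str, peaks: list[float], tol: float = 0.02) -> list[int]:
    """Bonds for which neither the b ion nor the complementary y ion is observed."""
    def seen(mz: float) -> bool:
        return any(abs(mz - p) <= tol for p in peaks)

    return [i for i, (b, y) in fragment_ions(peptide).items() if not (seen(b) or seen(y))]


if __name__ == "__main__":
    ions = fragment_ions("PEPTIDE")
    # Toy spectrum: every b/y ion except the pair for bond 3 (b3 and y4).
    peaks = [mz for bond, pair in ions.items() if bond != 3 for mz in pair]
    print(missing_cleavages("PEPTIDE", peaks))
```

Noise peaks that happen to fall within the tolerance of an expected ion would hide a missing cleavage in this check, which hints at why noise and missing cleavages interact in the evaluation described above.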
9
Abstract
Accurate full-length sequencing of a purified unknown protein remains challenging because mass spectrometry (MS)-based methods are error-prone. De novo identified peptide sequences frequently contain errors, undermining the accuracy of assembly. Bias in peptide detectability also produces low-coverage regions, resulting in gaps. Although recent advances in multi-enzyme hydrolysis and algorithms have enabled complete assembly of full-length protein sequences in a few examples, their robustness in practical applications still needs improvement. Here, inspired by genome assembly strategies, we demonstrate a contig-scaffolding strategy to assemble protein sequences with high robustness and accuracy. This strategy integrates multiple unspecific hydrolysis methods to minimize bias in the hydrolysis process. After de novo identification of the peptides, our assembly algorithm, named Multiple Contigs & Scaffolding (MuCS), assembles the peptide sequences in a multistep (contig, then scaffold) manner, with error correction at each step. MS data from different hydrolysis experiments complement each other for robust contig extension and error correction. Applied to three proteins in three replicates each, our strategy reached 100% coverage in all cases except one (98.85%) and 98.69-100% accuracy. It also handled a membrane protein efficiently, although the transmembrane region was missed owing to the limitations of MS; the three replicates reached 88.85-92.57% coverage and 97.57-100% accuracy. In sum, we provide a practical, robust, and accurate solution for full-length protein sequencing. The MuCS software is available at http://chi-biotech.com/mucs/.
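The contig idea borrowed from genome assembly can be illustrated with a greedy suffix-prefix overlap merger. This is a deliberately simple stand-in, not MuCS: it performs no error correction and no scaffolding, and it assumes error-free peptide reads and a hypothetical minimum overlap of 3 residues. The function names are illustrative.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge error-free peptide reads into contigs by best suffix-prefix overlap."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [p for p in contigs if not any(p != q and p in q for q in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no remaining overlap: the contigs are final
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ACDEFG", "EFGHIK", "HIKLMN"]))
```

A single wrong residue breaks an exact overlap in this sketch, so de novo errors fragment the contigs; the toy version has no counterpart to the per-step error correction described in the abstract.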
Affiliation(s)
- Zhi-Biao Mai
- Big Data Decision Institute, Jinan University, Guangzhou 510632, China
- Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Zhong-Hua Zhou
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
- Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
- Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
10
Affinity Selection from Synthetic Peptide Libraries Enabled by De Novo MS/MS Sequencing. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10370-9] [Indexed: 11/26/2022]
Abstract
Recently, de novo MS/MS peptide sequencing has enabled the application of affinity selections to synthetic peptide mixtures that approach the diversity of phage libraries (>10⁸ random peptides). In conjunction with ‘split-mix’ solid-phase synthesis to access equimolar peptide mixtures, this approach provides a straightforward means to examine synthetic peptide libraries of considerably higher diversity than has historically been feasible. Here, we offer a critical perspective on this work, report emerging data, and highlight opportunities for further methods refinement. With continued development, ‘affinity selection–mass spectrometry’ may become a complementary approach to phage display, in vitro selection, and DNA-encoded libraries for the discovery of synthetic ligands that modulate protein function.
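For scale: in split-mix synthesis each coupling cycle splits the resin into k portions (one building block each) and then pools them, so n cycles give k^n distinct sequences, one per bead. The short calculation below finds the shortest randomized segment whose diversity exceeds 10⁸, assuming either 20 building blocks or 19 (for example, if Cys is omitted); these numbers are illustrative choices, not the library design reported in the article, and the function names are hypothetical.

```python
def library_diversity(n_positions: int, n_building_blocks: int = 20) -> int:
    """Distinct sequences from n split-mix coupling cycles with k building blocks: k ** n."""
    return n_building_blocks ** n_positions


def min_positions_to_exceed(target: int, n_building_blocks: int = 20) -> int:
    """Shortest randomized segment whose diversity is strictly greater than `target`."""
    n = 1
    while library_diversity(n, n_building_blocks) <= target:
        n += 1
    return n


if __name__ == "__main__":
    for k in (20, 19):
        n = min_positions_to_exceed(10**8, k)
        print(k, n, library_diversity(n, k))
```

Under these assumptions seven randomized positions already exceed 10⁸ sequences (20⁷ = 1.28 × 10⁹; 19⁷ ≈ 8.9 × 10⁸), which is why identifying hits by de novo sequencing rather than by a predefined list becomes necessary at this scale.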
11
Marchi FC, Mendes-Silva E, Rodrigues-Ribeiro L, Bolais-Ramos LG, Verano-Braga T. Toxinology in the proteomics era: a review on arachnid venom proteomics. J Venom Anim Toxins Incl Trop Dis 2022; 28:20210034. [PMID: 35291269 PMCID: PMC8893269 DOI: 10.1590/1678-9199-jvatitd-2021-0034] [Received: 03/29/2021] [Accepted: 06/01/2021] [Indexed: 11/22/2022]
12
Abstract
The goal of paleoproteomics is to characterize proteins from specimens that have been subjected to the degrading and obscuring effects of time, thus obtaining biological information about tissues or organisms both unobservable in the present and unobtainable through morphological study. Although the description of sequences from Tyrannosaurus rex and Brachylophosaurus canadensis suggested that proteins may persist over tens of millions of years, the majority of paleoproteomic analyses have focused on historical, archeological, or relatively young paleontological samples that rarely exceed 1 million years in age. However, recent advances in methodology and analyses of diverse tissue types (e.g., fossil eggshell, dental enamel) have begun closing the large window of time that remains unexplored in the fossil history of the Cenozoic. In this perspective, we discuss the history and current state of deep time paleoproteomics (DTPp), here defined as paleoproteomic study of samples ∼1 million years (1 Ma) or more in age. We then discuss the future of DTPp research, including what we see as critical ways the field can expand, advancements in technology that can be utilized, and the types of questions DTPp can address if such a future is realized.
Affiliation(s)
- Elena R Schroeter
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, United States
- Timothy P Cleland
- Museum Conservation Institute, Smithsonian Institution, Suitland, Maryland 20746, United States
- Mary H Schweitzer
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27695, United States; North Carolina Museum of Natural Sciences, Raleigh, North Carolina 27605, United States; Department of Geology, Lund University, Lund SE-221 00, Sweden
13
Chen L, Yang Y, Zhang Y, Li K, Cai H, Wang H, Zhao Q. The Small Open Reading Frame-Encoded Peptides: Advances in Methodologies and Functional Studies. Chembiochem 2021; 23:e202100534. [PMID: 34862721 DOI: 10.1002/cbic.202100534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 11/15/2021] [Indexed: 11/07/2022]
Abstract
Small open reading frames (sORFs) are an important class of genes with less than 100 codons. They were historically annotated as noncoding or even junk sequences. In recent years, accumulating evidence suggests that sORFs could encode a considerable number of polypeptides, many of which play important roles in both physiology and disease pathology. However, it has been technically challenging to directly detect sORF-encoded peptides (SEPs). Here, we discuss the latest advances in methodologies for identifying SEPs with mass spectrometry, as well as the progress on functional studies of SEPs.
Affiliation(s)
- Lei Chen
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China; Laboratory for Synthetic Chemistry and Chemical Biology Limited, Hong Kong Science and Technology Park, New Territories, Hong Kong SAR, 999077, P. R. China
- Ying Yang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Yuanliang Zhang
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Kecheng Li
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
- Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510623, P. R. China
- Hongwei Wang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangzhou, 510623, P. R. China
- Qian Zhao
- State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, 999077, P. R. China
14
Hruska M, Holub D. Evaluation of an integrative Bayesian peptide detection approach on a combinatorial peptide library. Eur J Mass Spectrom (Chichester) 2021; 27:217-234. [PMID: 34989269 DOI: 10.1177/14690667211066725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Detection of peptides lies at the core of bottom-up proteomics analyses. We examined a Bayesian approach to peptide detection that integrates match-based models (fragments, retention time, isotopic distribution, and precursor mass) and peptide prior probability models under a unified probabilistic framework. To assess the relevance of these models and their various combinations, we employed a complete and a tail-complete search of a low-precursor-mass synthetic peptide library based on oncogenic KRAS peptides. The fragment match was by far the most informative match-based model, while the retention time match was the only other such model with an appreciable impact, increasing correct detections by around 8%. A peptide prior probability model built from a reference proteome greatly improved detection over a uniform prior, essentially transforming de novo sequencing into a reference-guided search. Knowing a correct sequence tag in advance of peptide-spectrum matching had only a moderate impact on peptide detection unless the tag was long and of high certainty. The approach also derived more precise error rates on the analyzed combinatorial peptide library than those estimated using PeptideProphet and Percolator, showing its potential applicability for the detection of homologous peptides. Although the approach requires further computational developments for routine data analysis, it illustrates the value of peptide prior probabilities and presents a Bayesian approach for their incorporation into peptide detection.
Affiliation(s)
- Miroslav Hruska
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
- Department of Computer Science, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Dusan Holub
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
15
Čaval T, Hecht ES, Tang W, Uy‐Gomez M, Nichols A, Kil YJ, Sandoval W, Bern M, Heck AJR. The lysosomal endopeptidases Cathepsin D and L are selective and effective proteases for the middle-down characterization of antibodies. FEBS J 2021; 288:5389-5405. [PMID: 33713388 PMCID: PMC8518856 DOI: 10.1111/febs.15813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/08/2020] [Revised: 01/23/2021] [Accepted: 03/08/2021] [Indexed: 01/18/2023]
Abstract
Mass spectrometry is gaining momentum as a method of choice for de novo sequencing of antibodies (Abs). Adequate sequence coverage of the hypervariable regions remains one of the toughest identification challenges in both bottom-up and top-down workflows. Methods that efficiently generate mid-size Ab fragments would further facilitate top-down MS and decrease data complexity. Here, we explore the proteases Cathepsins L and D for forming protein fragments from three IgG1s, one IgG2, and one bispecific, knob-and-hole IgG1. We demonstrate that high-resolution native MS provides a sensitive method for the detection of clipping sites. Both Cathepsins produced multiple, albeit specific, cleavages. The Abs were cleaved immediately after the CDR3 region, yielding ~12 kDa fragments, an ideal size for sequencing. Cathepsin D, but not Cathepsin L, also cleaved directly below the Ab hinge, releasing the F(ab')2. When constrained by the different disulfide bonds found in the IgG2 subtype or by the tertiary structure of the hole-containing bispecific IgG1, the hinge region digest product was not produced. The Cathepsin L and Cathepsin D clipping motifs were related to sequences of neutral amino acids and to the tertiary structure of the Ab. A single-pot (L + D) digestion protocol was optimized to achieve 100% efficiency. Nine protein fragments, corresponding to the VL, VH, CL, CH1, CH2, CH3, CL + CH1, and F(ab')2, constituted ~70% of the summed intensities of all deconvolved proteolytic products. Cleavage sites were confirmed by Edman degradation and validated with top-down sequencing. The described work offers a complementary method for middle-down analysis that may be applied to top-down Ab sequencing. ENZYMES: Cathepsin L, EC 3.4.22.15; Cathepsin D, EC 3.4.23.5.
Affiliation(s)
- Tomislav Čaval
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, The Netherlands
- Netherlands Proteomics Centre, Utrecht, The Netherlands
- Elizabeth Sara Hecht
- Department of Microchemistry, Proteomics, and Lipidomics & Next Generation Sequencing, Genentech, Inc., South San Francisco, CA, USA
- Maelia Uy‐Gomez
- Department of Microchemistry, Proteomics, and Lipidomics & Next Generation Sequencing, Genentech, Inc., South San Francisco, CA, USA
- Wendy Sandoval
- Department of Microchemistry, Proteomics, and Lipidomics & Next Generation Sequencing, Genentech, Inc., South San Francisco, CA, USA
- Albert J. R. Heck
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, Utrecht University, The Netherlands
- Netherlands Proteomics Centre, Utrecht, The Netherlands
16
Ahlquist KD, Bañuelos MM, Funk A, Lai J, Rong S, Villanea FA, Witt KE. Our Tangled Family Tree: New Genomic Methods Offer Insight into the Legacy of Archaic Admixture. Genome Biol Evol 2021; 13:evab115. [PMID: 34028527 PMCID: PMC8480178 DOI: 10.1093/gbe/evab115] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/02/2021] [Revised: 05/07/2021] [Accepted: 05/22/2021] [Indexed: 11/30/2022]
Abstract
The archaic ancestry present in the human genome has captured the imagination of both scientists and the wider public in recent years. This excitement is the result of new studies pushing the envelope of what we can learn from the archaic genetic information that has survived for over 50,000 years in the human genome. Here, we review the most recent ten years of literature on the topic of archaic introgression, including the current state of knowledge on Neanderthal and Denisovan introgression, as well as introgression from other as-yet unidentified archaic populations. We focus this review on four topics: 1) a reimagining of human demographic history, including evidence for multiple admixture events between modern humans, Neanderthals, Denisovans, and other archaic populations; 2) state-of-the-art methods for detecting archaic ancestry in population-level genomic data; 3) how these novel methods can detect archaic introgression in modern African populations; and 4) the functional consequences of archaic gene variants, including how those variants were co-opted into novel function in modern human populations. The goal of this review is to provide a simple-to-access reference for the relevant methods and novel data, which have changed our understanding of the relationship between our species and its siblings. This body of literature reveals the large degree to which the genetic legacy of these extinct hominins has been integrated into the human populations of today.
Affiliation(s)
- K D Ahlquist
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
- Mayra M Bañuelos
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
- Alyssa Funk
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
- Jiaying Lai
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Brown Center for Biomedical Informatics, Brown University, Providence, Rhode Island, USA
- Stephen Rong
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
- Fernando A Villanea
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Anthropology, University of Colorado Boulder, Colorado, USA
- Kelsey E Witt
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, USA
- Department of Ecology and Evolutionary Biology, Brown University, Providence, Rhode Island, USA
17
Empirical Evaluation of the Use of Computational HLA Binding as an Early Filter to the Mass Spectrometry-Based Epitope Discovery Workflow. Cancers (Basel) 2021; 13:cancers13102307. [PMID: 34065814 PMCID: PMC8150281 DOI: 10.3390/cancers13102307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/22/2021] [Revised: 05/06/2021] [Accepted: 05/06/2021] [Indexed: 12/22/2022]
Abstract
Immunopeptidomics is used to identify novel epitopes for (therapeutic) vaccination strategies in cancer and infectious disease. Various false discovery rates (FDRs) are applied in the field when converting liquid chromatography-tandem mass spectrometry (LC-MS/MS) spectra to peptides. In addition, large efforts have recently been made to rescue peptides of lower confidence. However, it remains unclear what the overall relation is between the FDR threshold and the percentage of obtained HLA-binders. Here we directly evaluated the effect of varying FDR thresholds on the resulting immunopeptidomes of HLA-eluates from human cancer cell lines and primary hepatocyte isolates using HLA-binding algorithms. Additional peptides obtained using less stringent FDR thresholds, although generally derived from poorer spectra, still contained a high proportion of HLA-binders and lent support to recently developed tools that tap into this pool of otherwise ignored peptides. Most of these peptides were identified with improved confidence when cell input was increased, supporting the validity and potential of these identifications. Altogether, our data suggest that increasing the FDR threshold for peptide identification, in conjunction with data filtering by HLA-binding prediction, is a valid and highly potent method for more efficient exhaustion of immunopeptidome datasets for epitope discovery, and they reveal the extent of peptides that can be rescued by recently developed algorithms.
18
Fingleton E, Li Y, Roche KW. Advances in Proteomics Allow Insights Into Neuronal Proteomes. Front Mol Neurosci 2021; 14:647451. [PMID: 33935646 PMCID: PMC8084103 DOI: 10.3389/fnmol.2021.647451] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/29/2020] [Accepted: 03/25/2021] [Indexed: 11/29/2022]
Abstract
Protein–protein interaction networks and signaling complexes are essential for normal brain function and are often dysregulated in neurological disorders. Nevertheless, unraveling neuron- and synapse-specific protein interaction networks has remained a technical challenge. New techniques, however, have allowed for high-resolution and high-throughput analyses, enabling quantification and characterization of various neuronal protein populations. Over the last decade, mass spectrometry (MS) has surfaced as the primary method for analyzing multiple protein samples in tandem, allowing for the precise quantification of proteomic data. Moreover, the development of sophisticated protein-labeling techniques has given MS a high temporal and spatial resolution, facilitating the analysis of various neuronal substructures, cell types, and subcellular compartments. Recent studies have leveraged these novel techniques to reveal the proteomic underpinnings of well-characterized neuronal processes, such as axon guidance, long-term potentiation, and homeostatic plasticity. Translational MS studies have facilitated a better understanding of complex neurological disorders, such as Alzheimer’s disease (AD), schizophrenia (SCZ), and autism spectrum disorder (ASD). Proteomic investigation of these diseases has not only given researchers new insight into disease mechanisms but has also been used to validate disease models and identify new targets for research.
Affiliation(s)
- Erin Fingleton
- National Institute of Neurological Disorders and Stroke (NINDS), Bethesda, MD, United States
- Yan Li
- National Institute of Neurological Disorders and Stroke (NINDS), Bethesda, MD, United States
- Katherine W Roche
- National Institute of Neurological Disorders and Stroke (NINDS), Bethesda, MD, United States
19
Abstract
Proteomics, the large-scale study of all proteins of an organism or system, is a powerful tool for studying biological systems. It can provide a holistic view of the physiological and biochemical states of given samples through identification and quantification of large numbers of peptides and proteins. In forensic science, proteomics can be used as a confirmatory and orthogonal technique to well-established genomic analyses. Proteomics is highly valuable in cases where nucleic acids are absent or degraded, such as hair and bone samples. It can be used to identify body fluids, ethnic group, gender, and individual, and to estimate post-mortem interval using bone, muscle, and decomposition fluid samples. Compared to genomic analysis, proteomics can provide a better global picture of a sample. It has been used in forensic science for a wide range of sample types and applications. In this review, we briefly introduce proteomic methods, including sample preparation techniques, data acquisition using liquid chromatography-tandem mass spectrometry, and data analysis using database search, spectral library search, and de novo sequencing. We also summarize recent applications of proteomics in forensic science over the past decade, with a special focus on human samples, including hair, bone, body fluids, fingernail, muscle, brain, and fingermark, and address the challenges, considerations, and future developments of forensic proteomics.
20
Bugyi F, Szabó D, Szabó G, Révész Á, Pape VFS, Soltész-Katona E, Tóth E, Kovács O, Langó T, Vékey K, Drahos L. Influence of Post-Translational Modifications on Protein Identification in Database Searches. ACS Omega 2021; 6:7469-7477. [PMID: 33778259 PMCID: PMC7992065 DOI: 10.1021/acsomega.0c05997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2020] [Accepted: 03/02/2021] [Indexed: 06/12/2023]
Abstract
Comprehensive analysis of post-translational modifications (PTMs) is an important mission of proteomics. However, considering PTMs increases the search space and may therefore impair the efficiency of protein identification. Using thousands of proteomic searches, we investigated the practical aspects of considering multiple PTMs in Byonic searches to maximize protein and peptide hits. Including all PTMs that occur with at least 2% frequency in the sample has an advantageous effect on protein and peptide identification. A linear relationship was established between the number of considered PTMs and the number of reliably identified peptides and proteins. Even though they handle multiple modifications less efficiently, MASCOT (using the Percolator function) and Andromeda (the search engine included in MaxQuant) gave results comparable to those of Byonic when only a few PTMs were considered.
Affiliation(s)
- Fanni Bugyi
- Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Hevesy György PhD School of Chemistry, Eötvös Loránd University, Pázmány Péter sétány 1/A, H-1117 Budapest, Hungary
- Dániel Szabó
- Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Hevesy György PhD School of Chemistry, Eötvös Loránd University, Pázmány Péter sétány 1/A, H-1117 Budapest, Hungary
- Győző Szabó
- Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Faculty of Informatics, Eötvös Loránd University, Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary
- Ágnes Révész
- Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Veronika F. S. Pape
- Department of Physiology, Faculty of Medicine, Semmelweis University, Tűzoltó utca 37-47, H-1094 Budapest, Hungary
- Eszter Soltész-Katona
- Department of Physiology, Faculty of Medicine, Semmelweis University, Tűzoltó utca 37-47, H-1094 Budapest, Hungary
- ELKH Supported Research Groups, Gellérthegy u. 30-32, H-1016 Budapest, Hungary
- Eszter Tóth
- Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Institute of Enzymology, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Orsolya Kovács
- Department of Physiology, Faculty of Medicine, Semmelweis University, Tűzoltó utca 37-47, H-1094 Budapest, Hungary
- Department of Genetics, Cell- and Immunobiology, Semmelweis University, Nagyvárad tér 4, H-1089 Budapest, Hungary
- Tamás Langó
- Institute of Enzymology, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- Károly Vékey
- Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
- László Drahos
- Institute of Organic Chemistry, Research Centre for Natural Sciences, Magyar Tudósok krt 2, H-1117 Budapest, Hungary
21
Ho JJD, Man JHS, Schatz JH, Marsden PA. Translational remodeling by RNA-binding proteins and noncoding RNAs. Wiley Interdiscip Rev RNA 2021; 12:e1647. [PMID: 33694288 DOI: 10.1002/wrna.1647] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/21/2020] [Revised: 02/09/2021] [Accepted: 02/10/2021] [Indexed: 12/14/2022]
Abstract
Responsible for generating the proteome that controls phenotype, translation is the ultimate convergence point for myriad upstream signals that influence gene expression. System-wide adaptive translational reprogramming has recently emerged as a pillar of cellular adaptation. Foundational studies established the concept of collaboration and competition on individual mRNAs between RNA-binding proteins (RBPs) and noncoding RNAs (ncRNAs), the classic regulators of mRNA stability and translation efficiency. Fresh conceptual innovations now highlight stress-activated, evolutionarily conserved RBP networks and ncRNAs that increase the translation efficiency of populations of transcripts encoding proteins that participate in a common cellular process. The discovery of post-transcriptional functions for long noncoding RNAs (lncRNAs) was particularly intriguing given their cell-type specificity and their historical definition as nuclear-functioning epigenetic regulators. The convergence of RBPs, lncRNAs, and microRNAs on functionally related mRNAs to enable adaptive protein synthesis is a newer biological paradigm that highlights their role as "translatome (protein output) remodelers" and reinvigorates the paradigm of "RNA operons." Together, these concepts modernize our understanding of cellular stress adaptation and strategies for therapeutic development. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; Translation > Translation Regulation; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs.
Affiliation(s)
- J J David Ho
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, Florida, USA; Division of Hematology, Department of Medicine, Miller School of Medicine, University of Miami, Miami, Florida, USA
- Jeffrey H S Man
- Keenan Research Centre, Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada; Department of Medicine, University of Toronto, Toronto, Ontario, Canada; Department of Respirology, University Health Network, Latner Thoracic Research Laboratories, University of Toronto, Toronto, Ontario, Canada
- Jonathan H Schatz
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, Florida, USA; Division of Hematology, Department of Medicine, Miller School of Medicine, University of Miami, Miami, Florida, USA
- Philip A Marsden
- Keenan Research Centre, Li Ka Shing Knowledge Institute, St. Michael's Hospital, Toronto, Ontario, Canada; Department of Medicine, University of Toronto, Toronto, Ontario, Canada
22
Moyer TB, Parsley NC, Sadecki PW, Schug WJ, Hicks LM. Leveraging orthogonal mass spectrometry based strategies for comprehensive sequencing and characterization of ribosomal antimicrobial peptide natural products. Nat Prod Rep 2021; 38:489-509. [PMID: 32929442 PMCID: PMC7956910 DOI: 10.1039/d0np00046a] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/12/2022]
Abstract
Covering: up to July 2020. Ribosomal antimicrobial peptide (AMP) natural products, also known as ribosomally synthesized and post-translationally modified peptides (RiPPs) or host defense peptides, demonstrate potent bioactivities and impressive complexity that complicate molecular and biological characterization. Tandem mass spectrometry (MS) has rapidly accelerated bioactive peptide sequencing efforts, yet standard workflows do not adequately address intrinsic AMP diversity. Herein, orthogonal approaches to accelerate comprehensive and accurate molecular characterization without the need for prior isolation are reviewed. Chemical derivatization, proteolysis (enzymatic and chemical cleavage), multistage MS fragmentation, and separation (liquid chromatography and ion mobility) strategies can provide complementary amino acid composition and post-translational modification data to constrain sequence solutions. Examination of two complex case studies, gomesin and styelin D, highlights the practical implementation of the proposed approaches. Finally, we emphasize the importance of heterogeneous AMP peptidoforms that confer varying biological function, an area that warrants significant further development.
Affiliation(s)
- Tessa B Moyer
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
23
The challenge of detecting modifications on proteins. Essays Biochem 2020; 64:135-153. [PMID: 31957791 DOI: 10.1042/ebc20190055] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/15/2019] [Revised: 12/17/2019] [Accepted: 12/19/2019] [Indexed: 12/16/2022]
Abstract
Post-translational modifications (PTMs) are integral to the regulation of protein function; characterising their role in this process is vital to understanding how cells work in both healthy and diseased states. Mass spectrometry (MS) facilitates the mass determination and sequencing of peptides, and thereby also the detection of site-specific PTMs. However, numerous challenges in this field persist. The diverse chemical properties, low abundance, labile nature, and instability of many PTMs, in combination with the more practical issues of compatibility with MS and bioinformatics challenges, contribute to the arduous nature of their analysis. In this review, we present an overview of the established MS-based approaches for analysing PTMs and the common complications associated with their investigation, including examples of specific challenges focusing on phosphorylation, lysine acetylation, and redox modifications.
24
Vitorino R, Guedes S, Trindade F, Correia I, Moura G, Carvalho P, Santos MAS, Amado F. De novo sequencing of proteins by mass spectrometry. Expert Rev Proteomics 2020; 17:595-607. [PMID: 33016158 DOI: 10.1080/14789450.2020.1831387] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/24/2022]
Abstract
INTRODUCTION: Proteins are crucial for every cellular activity, and unraveling their sequence and structure is an essential step toward fully understanding their biology. Early methods of protein sequencing were mainly based on enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and the expansion of the information available for each protein, various databases containing this sequence information were formed. AREAS COVERED: De novo protein sequencing, shotgun proteomics, and other mass-spectrometric techniques are covered, along with the various software currently available for proteogenomic analysis. Emphasis is placed on the methods for de novo sequencing, together with the potential and shortcomings of using databases to interpret protein sequence data. EXPERT OPINION: As mass-spectrometry sequencing performance improves with better software and hardware optimizations, combined with user-friendly interfaces, de novo protein sequencing becomes imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, de novo sequencing should become integrated into standard proteomic workflows as an add-on to conventional database search engines, which would then be able to provide improved identification.
Affiliation(s)
- Rui Vitorino
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal; iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal; Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
- Sofia Guedes
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
- Fabio Trindade
- Unidade De Investigação Cardiovascular, Departamento De Cirurgia E Fisiologia, Faculdade De Medicina, Universidade Do Porto, Porto, Portugal
- Inês Correia
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Gabriela Moura
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Paulo Carvalho
- Laboratory for Structural and Computational Proteomics, Carlos Chagas Institute, FIOCRUZ, Laboratory for Proteomics and Protein Engineering, Brazil
- Manuel A S Santos
- iBiMED, Department of Medical Sciences, University of Aveiro, Aveiro, Portugal
- Francisco Amado
- QOPNA & LAQV-REQUIMTE, Departamento De Química, Institute of Biomedicine - iBiMED, Aveiro, Portugal
25
Villalobos Solis MI, Poudel S, Bonnot C, Shrestha HK, Hettich RL, Veneault-Fourrey C, Martin F, Abraham PE. A Viable New Strategy for the Discovery of Peptide Proteolytic Cleavage Products in Plant-Microbe Interactions. Mol Plant Microbe Interact 2020; 33:1177-1188. [PMID: 32597696 DOI: 10.1094/mpmi-04-20-0082-ta] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/14/2023]
Abstract
Small peptides of fewer than 100 amino acids that are proteolytic cleavage products (PCPs) are emerging as key signaling molecules that mediate cell-to-cell communication and biological processes occurring between and within plants, fungi, and bacteria. Yet the discovery and characterization of these molecules remain largely overlooked. Today, selective enrichment followed by mass spectrometry-based sequencing offers the greatest potential for their comprehensive characterization; however, qualitative and quantitative performance metrics are rarely captured. Herein, we addressed this need by benchmarking the performance of an enrichment strategy, optimized specifically for small PCPs, using state-of-the-art de novo-assisted peptide sequencing. As a case study, we implemented this approach to identify PCPs from different root and foliar tissues of the hybrid poplar Populus × canescens 717-1B4 in interaction with the ectomycorrhizal basidiomycete Laccaria bicolor. In total, we identified 1,660 unique Populus PCPs and 2,870 unique L. bicolor PCPs. Qualitative results supported the identification of well-known PCPs, such as the mature form of the photosystem II complex 5-kDa protein (approximately 3 kDa). A total of 157 PCPs were significantly more abundant in root tips with established ectomycorrhiza than in root tips without established ectomycorrhiza and in the extramatrical mycelium of L. bicolor. These PCPs mapped to 64 Populus proteins and 69 L. bicolor proteins in our database, several of which have previously been implicated in biologically relevant associations between plant and fungus.
Affiliation(s)
- Manuel I Villalobos Solis
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
- Department of Genome Science and Technology, University of Tennessee-Knoxville, Knoxville, TN 37996, U.S.A
- Suresh Poudel
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
- Clemence Bonnot
- UMR 1136 INRA-Université de Lorraine 'Interactions Arbres/Microorganismes', Laboratoire d'Excellence ARBRE, Centre INRA-Lorraine, 54280 Champenoux, France
- Him K Shrestha
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
- Department of Genome Science and Technology, University of Tennessee-Knoxville, Knoxville, TN 37996, U.S.A
- Robert L Hettich
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
- Claire Veneault-Fourrey
- UMR 1136 INRA-Université de Lorraine 'Interactions Arbres/Microorganismes', Laboratoire d'Excellence ARBRE, Centre INRA-Lorraine, 54280 Champenoux, France
- Francis Martin
- UMR 1136 INRA-Université de Lorraine 'Interactions Arbres/Microorganismes', Laboratoire d'Excellence ARBRE, Centre INRA-Lorraine, 54280 Champenoux, France
- Paul E Abraham
- Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, U.S.A
26
Schiebenhoefer H, Schallert K, Renard BY, Trappe K, Schmid E, Benndorf D, Riedel K, Muth T, Fuchs S. A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and Prophane. Nat Protoc 2020; 15:3212-3239. [PMID: 32859984 DOI: 10.1038/s41596-020-0368-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 12/02/2019] [Accepted: 05/29/2020] [Indexed: 12/14/2022]
Abstract
Metaproteomics, the study of the collective protein composition of multi-organism systems, provides deep insights into the biodiversity of microbial communities and the complex functional interplay between microbes and their hosts or environment. Thus, metaproteomics has become an indispensable tool in various fields such as microbiology and related medical applications. The computational challenges in analyzing such datasets differ from those of pure-culture proteomics, e.g., because of the higher complexity of the samples and the larger reference databases, which demand specific computing pipelines. These analyses usually consist of numerous manual steps that must be closely synchronized. With MetaProteomeAnalyzer and Prophane, we have established two open-source software solutions specifically developed and optimized for metaproteomics. Among other features, peptide-spectrum matching is improved by combining different search engines and, compared to similar tools, metaproteome annotation benefits from the most comprehensive set of available databases (such as NCBI, UniProt, EggNOG, PFAM, and CAZy). The workflow described in this protocol combines both tools and guides the user through the entire data analysis process, including protein database creation, database search, protein grouping and annotation, and results visualization. To the best of our knowledge, this protocol presents the most comprehensive, detailed, and flexible guide to metaproteomics data analysis to date. While beginners are provided with robust, easy-to-use, state-of-the-art data analysis in a reasonable time (a few hours, depending on factors such as the protein database size and the number of identified peptides and inferred proteins), advanced users benefit from the flexibility and adaptability of the workflow.
Affiliation(s)
- Henning Schiebenhoefer
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
- Kay Schallert
- Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Hasso Plattner Institute, Faculty for Digital Engineering, University of Potsdam, Potsdam, Germany
- Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Emanuel Schmid
- ID Computational & Data Science Support, Eidgenössische Technische Hochschule, Zurich, Switzerland
- Dirk Benndorf
- Bioprocess Engineering, Otto von Guericke University, Magdeburg, Germany
- Bioprocess Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
- Katharina Riedel
- Center for Functional Genomics of Microbes (CFGM), Institute of Microbiology, University of Greifswald, Greifswald, Germany
- Thilo Muth
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin, Germany
- Stephan Fuchs
- Department of Infectious Diseases, Robert Koch Institute, Wernigerode, Germany
27
O'Bryon I, Jenson SC, Merkley ED. Flying blind, or just flying under the radar? The underappreciated power of de novo methods of mass spectrometric peptide identification. Protein Sci 2020; 29:1864-1878. [PMID: 32713088 PMCID: PMC7454419 DOI: 10.1002/pro.3919] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/15/2020] [Revised: 07/21/2020] [Accepted: 07/23/2020] [Indexed: 12/15/2022]
Abstract
Mass spectrometry-based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is called database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several interesting application areas, such as forensics, archaeology, venomics, and others, a genome sequence may not be available, or the correct genome sequence to use is not known. In these cases, de novo peptide identification can play an important role. De novo methods infer peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph-based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications from venomics, metaproteomics, forensics, and characterization of antibody drugs.
Affiliation(s)
- Isabelle O'Bryon
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Sarah C. Jenson
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Eric D. Merkley
- Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
28
Peng J, Zhang H, Niu H, Wu R. Peptidomic analyses: The progress in enrichment and identification of endogenous peptides. Trends Analyt Chem 2020. [DOI: 10.1016/j.trac.2020.115835] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/17/2022]
29
Kote S, Pirog A, Bedran G, Alfaro J, Dapic I. Mass Spectrometry-Based Identification of MHC-Associated Peptides. Cancers (Basel) 2020; 12:cancers12030535. [PMID: 32110973 PMCID: PMC7139412 DOI: 10.3390/cancers12030535] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/30/2020] [Revised: 02/15/2020] [Accepted: 02/20/2020] [Indexed: 02/06/2023]
Abstract
Neoantigen-based immunotherapies promise to improve patient outcomes over the current standard of care. However, detecting these cancer-specific antigens is one of the significant challenges in mass spectrometry. Even though immunopeptides were first sequenced decades ago, the protocols used to isolate neoantigens from the cell surface still vary widely. This heterogeneity makes it difficult to compare results across laboratories and studies. Neoantigens are usually isolated from the cell surface by mild acid elution (MAE) or immunoprecipitation (IP). However, the limited amounts of neoantigens present on the cell surface pose a challenge and require instrumentation with sufficient sensitivity and accuracy for their detection. The difficulty of detecting these neopeptides in the small amounts of available patient tissue restricts most studies to cell cultures. Here, we summarize protocols for the extraction and identification of major histocompatibility complex (MHC) class I and II peptides. We aimed to evaluate existing methods in terms of the appropriateness of the isolation procedure, as well as the instrumental parameters used for neoantigen detection. We also focus on the amount of material used in the protocols as a critical factor to consider when analyzing neoantigens. Beyond experimental aspects, numerous readily available proteomics suites/tools are applicable to neoantigen discovery; however, experimental validation is still necessary for neoantigen characterization.
30
Ahsan N, Wilson RS, Rao RSP, Salvato F, Sabila M, Ullah H, Miernyk JA. Mass Spectrometry-Based Identification of Phospho-Tyr in Plant Proteomics. J Proteome Res 2020; 19:561-571. [PMID: 31967836 DOI: 10.1021/acs.jproteome.9b00550] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/02/2023]
Abstract
O-Phosphorylation (phosphorylation of the hydroxyl group of S, T, and Y residues) is among the first described and most thoroughly studied posttranslational modifications (PTMs). Y-Phosphorylation, catalyzed by Y-kinases, is a key step in both signal transduction and regulation of enzymatic activity in mammalian systems. Canonical Y-kinase sequences are absent from plant genomes/kinomes, which has often led to the assumption that plant cells lack O-phospho-l-tyrosine (pY). However, recent improvements in sample preparation, coupled with advances in instrument sensitivity and accessibility, have produced results that unequivocally disprove this assumption. The identification of hundreds of pY-peptides/proteins, followed by validation using genomic, molecular, and biochemical approaches, implies previously unappreciated roles for this "animal PTM" in plants. Herein, we review extant results from studies of pY in plants and propose a strategy for the preparation and analysis of pY-peptides that will allow a depth of coverage of the plant pY-proteome comparable to that achieved in mammalian systems.
Affiliation(s)
- Nagib Ahsan
- Division of Biology and Medicine, Brown University, Providence, Rhode Island 02903, United States
- Center for Cancer Research Development, Proteomics Core Facility, Rhode Island Hospital, Providence, Rhode Island 02903, United States
- Rashaun S Wilson
- Keck Mass Spectrometry & Proteomics Resource, Yale University, New Haven, Connecticut 06511, United States
- R Shyama Prasad Rao
- Biostatistics and Bioinformatics Division, Yenepoya Research Center, Yenepoya University, Mangalore 575018, India
- Fernanda Salvato
- Department of Plant and Microbial Biology, College of Agriculture and Life Sciences, North Carolina State University, Raleigh, North Carolina 27695, United States
- Mercy Sabila
- Department of Biology, Howard University, Washington, D.C. 20059, United States
- Hemayet Ullah
- Department of Biology, Howard University, Washington, D.C. 20059, United States
- Ján A Miernyk
- Division of Biochemistry, University of Missouri, Columbia, Missouri 65211, United States
31
Karunratanakul K, Tang HY, Speicher DW, Chuangsuwanich E, Sriswasdi S. Uncovering Thousands of New Peptides with Sequence-Mask-Search Hybrid De Novo Peptide Sequencing Framework. Mol Cell Proteomics 2019; 18:2478-2491. [PMID: 31591261 PMCID: PMC6885704 DOI: 10.1074/mcp.tir119.001656] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 07/02/2019] [Revised: 09/09/2019] [Indexed: 01/03/2023]
Abstract
Typical analyses of mass spectrometry data identify only amino acid sequences that exist in reference databases. This limits the discovery of new peptides, such as those that contain uncharacterized mutations or originate from unexpected processing of RNAs and proteins. De novo peptide sequencing approaches address this limitation but often suffer from low accuracy and require extensive validation by experts. Here, we develop SMSNet, a deep learning-based de novo peptide sequencing framework that achieves >95% amino acid accuracy while retaining good identification coverage. Applications of SMSNet to landmark proteomics and peptidomics studies reveal over 10,000 previously uncharacterized HLA antigens and phosphopeptides and, in conjunction with database-search methods, expand the coverage of peptide identification by almost 30%. SMSNet's ability to accurately identify new peptides would make it an invaluable tool for future proteomics and peptidomics studies, including tumor neoantigen discovery, antibody sequencing, and proteome characterization of non-model organisms.
Affiliation(s)
- Korrawe Karunratanakul
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand
- Hsin-Yao Tang
- Proteomics and Metabolomics Facility, The Wistar Institute, Philadelphia, PA 19104
- David W Speicher
- Center for Systems and Computational Biology, The Wistar Institute, Philadelphia, PA 19104
- Ekapol Chuangsuwanich
- Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok 10330, Thailand
- Computational Molecular Biology Group, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Sira Sriswasdi
- Computational Molecular Biology Group, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Research Affairs, Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Bangkok 10330, Thailand
32
Pino L, Lin A, Bittremieux W. 2018 YPIC Challenge: A Case Study in Characterizing an Unknown Protein Sample. J Proteome Res 2019; 18:3936-3943. [PMID: 31556620 PMCID: PMC6824964 DOI: 10.1021/acs.jproteome.9b00384] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
For the 2018 YPIC Challenge, contestants were invited to decipher two unknown English questions encoded by a synthetic protein expressed in Escherichia coli. In addition to deciphering the sentence, contestants were asked to determine the three-dimensional structure and detect any post-translational modifications left by the host organism. We present our experimental and computational strategy to characterize this sample by identifying the unknown protein sequence and detecting the presence of post-translational modifications. The sample was acquired with dynamic exclusion disabled to increase the signal-to-noise ratio of the measured molecules, after which spectral clustering was used to generate high-quality consensus spectra. De novo spectrum identification was used to determine the synthetic protein sequence, and any post-translational modifications introduced by E. coli on the synthetic protein were analyzed via spectral networking. This workflow resulted in a de novo sequence coverage of 70%, on par with sequence database searching performance. Additionally, the spectral networking analysis indicated that no systematic modifications were introduced on the synthetic protein by E. coli. The strategy presented here can be directly used to analyze samples for which no protein sequence information is available or when the identity of the sample is unknown. All software and code to perform the bioinformatics analysis are available as open source, and self-contained Jupyter notebooks are provided to fully recreate the analysis.
Affiliation(s)
- Lindsay Pino
- Department of Genome Sciences, University of Washington, Seattle WA 98195, USA
- Andy Lin
- Department of Genome Sciences, University of Washington, Seattle WA 98195, USA
- Wout Bittremieux
- Department of Genome Sciences, University of Washington, Seattle WA 98195, USA
- Department of Mathematics and Computer Science, University of Antwerp, 2020 Antwerp, Belgium
- Biomedical Informatics Network Antwerpen (biomina), 2020 Antwerp, Belgium
33
Callahan N, Tullman J, Kelman Z, Marino J. Strategies for Development of a Next-Generation Protein Sequencing Platform. Trends Biochem Sci 2019; 45:76-89. [PMID: 31676211 DOI: 10.1016/j.tibs.2019.09.005] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 04/17/2019] [Revised: 09/11/2019] [Accepted: 09/17/2019] [Indexed: 02/08/2023]
Abstract
Proteomic analysis can be a critical bottleneck in cellular characterization. The current paradigm relies primarily on mass spectrometry of peptides and on affinity reagents (i.e., antibodies), both of which require a priori knowledge of the sample. An unbiased protein sequencing method, with a dynamic range that covers the full range of protein concentrations in proteomes, would revolutionize the field of proteomics, allowing a more facile characterization of novel gene products and subcellular complexes. To this end, several new platforms based on single-molecule protein-sequencing approaches have been proposed. This review summarizes four of these approaches, highlighting the advantages, limitations, and challenges each method faces in advancing toward a core technology for next-generation protein sequencing.
Affiliation(s)
- Nicholas Callahan
- Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology, and University of Maryland, Rockville, MD 20850, USA
- Jennifer Tullman
- Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology, and University of Maryland, Rockville, MD 20850, USA
- Zvi Kelman
- Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology, and University of Maryland, Rockville, MD 20850, USA
- Biomolecular Labeling Laboratory, Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA
- John Marino
- Institute for Bioscience and Biotechnology Research, National Institute of Standards and Technology, and University of Maryland, Rockville, MD 20850, USA
34
Qu Z, Li Z, Ma L, Wei X, Zhang L, Liang R, Meng G, Zhang N, Xia C. Structure and Peptidome of the Bat MHC Class I Molecule Reveal a Novel Mechanism Leading to High-Affinity Peptide Binding. J Immunol 2019; 202:3493-3506. [PMID: 31076531 DOI: 10.4049/jimmunol.1900001] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 01/04/2019] [Accepted: 04/10/2019] [Indexed: 01/21/2023]
Abstract
Bats are natural reservoir hosts, harboring more than 100 viruses, some of which are lethal to humans. The asymptomatic coexistence with viruses is thought to be connected to the unique immune system of bats. MHC class I (MHC I) presentation is closely related to cytotoxic lymphocyte immunity, which plays an important role in viral resistance. To investigate the characteristics of MHC I presentation in bats, the crystal structures of peptide-MHC I complexes of Pteropus alecto, Ptal-N*01:01/HEV-1 (DFANTFLP) and Ptal-N*01:01/HEV-2 (DYINTNLVP), and two related mutants, Ptal-N*01:01/HEV-1PΩL (DFANTFLL) and Ptal-N*01:01ΔMDL/HEV-1, were determined. Through structural analysis, we found that Ptal-N*01:01 had a multi-Ala-assembled pocket B and a flexible hydrophobic pocket F, which could accommodate variable anchor residues and allow Ptal-N*01:01 to bind numerous peptides. Three sequential amino acids, Met, Asp, and Leu, absent from the α1 domain of the H chain in other mammals, were present in this domain in the bat. Upon deleting these amino acids and determining the structure in p/Ptal-N*01:01ΔMDL/HEV-1, we found they helped form an extra salt-bridge chain between the H chain and the N-terminal aspartic acid of the peptide. By introducing an MHC I random peptide library for de novo liquid chromatography-tandem mass spectrometry analysis, we found that this insertion module, present in all types of bats, can promote MHC I presentation of peptides with high affinity during the peptide exchange process. This study will help us better understand how bat MHC I presents high-affinity peptides from an extensive binding peptidome and provides a foundation to understand the cellular immunity of bats.
Affiliation(s)
- Zehui Qu
- Department of Microbiology and Immunology, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
- Zibin Li
- Department of Microbiology and Immunology, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
- Lizhen Ma
- Department of Microbiology and Immunology, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
- Xiaohui Wei
- Department of Microbiology and Immunology, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
- Lijie Zhang
- Department of Microbiology and Immunology, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
- Ruiying Liang
- Department of Microbiology and Immunology, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
- Geng Meng
- Department of Veterinary Biomedicine, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
- Nianzhi Zhang
- Department of Microbiology and Immunology, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
- Chun Xia
- Department of Microbiology and Immunology, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
- Key Laboratory of Animal Epidemiology, Ministry of Agriculture, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing 100094, China
35
Saito MA, Bertrand EM, Duffy ME, Gaylord DA, Held NA, Hervey WJ, Hettich RL, Jagtap PD, Janech MG, Kinkade DB, Leary DH, McIlvin MR, Moore EK, Morris RM, Neely BA, Nunn BL, Saunders JK, Shepherd AI, Symmonds NI, Walsh DA. Progress and Challenges in Ocean Metaproteomics and Proposed Best Practices for Data Sharing. J Proteome Res 2019; 18:1461-1476. [PMID: 30702898 DOI: 10.1021/acs.jproteome.8b00761] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Indexed: 11/30/2022]
Abstract
Ocean metaproteomics is an emerging field enabling discoveries about marine microbial communities and their impact on global biogeochemical processes. Recent ocean metaproteomic studies have provided insight into microbial nutrient transport, colimitation of carbon fixation, the metabolism of microbial biofilms, and dynamics of carbon flux in marine ecosystems. Future methodological developments could provide new capabilities such as characterizing long-term ecosystem changes, biogeochemical reaction rates, and in situ stoichiometries. Yet challenges remain for ocean metaproteomics due to the great biological diversity that produces highly complex mass spectra, as well as the difficulty in obtaining and working with environmental samples. This review summarizes the progress and challenges facing ocean metaproteomic scientists and proposes best practices for data sharing of ocean metaproteomic data sets, including the data types and metadata needed to enable intercomparisons of protein distributions and annotations that could foster global ocean metaproteomic capabilities.
Affiliation(s)
- Mak A Saito
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Erin M Bertrand
- Department of Biology, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada
- Megan E Duffy
- School of Oceanography, University of Washington, Seattle, Washington 98195-7940, United States
- David A Gaylord
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Noelle A Held
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Robert L Hettich
- Oak Ridge National Laboratory and Microbiology Department, University of Tennessee, Knoxville, Tennessee 37996, United States
- Pratik D Jagtap
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Saint Paul, Minnesota 55108, United States
- Michael G Janech
- College of Charleston, Charleston, South Carolina 29424, United States
- Danie B Kinkade
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Dagmar H Leary
- U.S. Naval Research Laboratory, Washington, D.C. 20375, United States
- Matthew R McIlvin
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Eli K Moore
- Department of Environmental Science, Rowan University, Glassboro, New Jersey 08028, United States
- Robert M Morris
- School of Oceanography, University of Washington, Seattle, Washington 98195-7940, United States
- Benjamin A Neely
- National Institute of Standards and Technology, Charleston, South Carolina 29412, United States
- Brook L Nunn
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Jaclyn K Saunders
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- School of Oceanography, University of Washington, Seattle, Washington 98195-7940, United States
- Adam I Shepherd
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- Nicholas I Symmonds
- Woods Hole Oceanographic Institution, Woods Hole, Massachusetts 02543, United States
- David A Walsh
- Department of Biology, Concordia University, Montreal, Quebec H4B 1R6, Canada