1
Angelo M, Zhang W, Vilseck J, Aoki S. In silico λ-dynamics predicts protein binding specificities to modified RNAs. Nucleic Acids Res 2025; 53:gkaf166. [PMID: 40066880 PMCID: PMC11894534 DOI: 10.1093/nar/gkaf166] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/27/2024] [Revised: 02/19/2025] [Accepted: 02/20/2025] [Indexed: 03/15/2025]
Abstract
RNA modifications shape gene expression through a variety of chemical changes to canonical RNA bases. Although numbering in the hundreds, only a few RNA modifications are well characterized, in part due to the absence of methods to identify modification sites. Antibodies remain a common tool to identify modified RNA and infer modification sites through straightforward applications. However, specificity issues can result in off-target binding and confound conclusions. This work utilizes in silico λ-dynamics to efficiently estimate binding free energy differences of modification-targeting antibodies between a variety of naturally occurring RNA modifications. Crystal structures of inosine and N6-methyladenosine (m6A) targeting antibodies bound to their modified ribonucleosides were determined and served as structural starting points. λ-Dynamics was utilized to predict RNA modifications that permit or inhibit binding to these antibodies. In vitro RNA-antibody binding assays supported the accuracy of these in silico results. High agreement between experimental and computed binding propensities demonstrated that λ-dynamics can serve as a predictive screen for antibody specificity against libraries of RNA modifications. More importantly, this strategy is an innovative way to elucidate how hundreds of known RNA modifications interact with biological molecules without the limitations imposed by in vitro or in vivo methodologies.
Affiliation(s)
- Murphy Angelo
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, Indianapolis, IN 46202, United States
- Wen Zhang
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, Indianapolis, IN 46202, United States
  - Melvin and Bren Simon Cancer Center, 535 Barnhill Drive, Indianapolis, IN 46202, United States
- Jonah Z Vilseck
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, Indianapolis, IN 46202, United States
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, United States
- Scott T Aoki
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, Indianapolis, IN 46202, United States
  - Melvin and Bren Simon Cancer Center, 535 Barnhill Drive, Indianapolis, IN 46202, United States
2
Takan S, Allmer J. De Novo Sequencing of Peptides from Tandem Mass Spectra and Applications in Proteogenomics. Methods Mol Biol 2025; 2859:1-19. [PMID: 39436593 DOI: 10.1007/978-1-0716-4152-1_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
Changes in protein expression are hallmarks of development and disease. Protein expression can be established qualitatively and quantitatively using mass spectrometry (MS). Samples are prepared, and proteins are extracted and then analyzed using MS and MS/MS. The resulting spectra need to be processed computationally to assign peptide-spectrum matches. Database searches employ sequence databases or spectral libraries to match candidate peptides with the measured spectra. This route is well established but fails when peptides are not found in sequence repositories. In this case, de novo sequencing of MS/MS spectra can be employed. Many computational algorithms that establish the peptide sequence from the MS/MS spectrum alone are available. Once de novo sequencing has assigned a sequence to an MS/MS spectrum, this assignment can be used in further processes for genome annotation. For example, novel exons can be assigned, known exons can be extended, and splice sites can be validated at the protein level. We compiled an extensive list of such algorithms, grouped them, and discussed the selected approaches. We also provide a roadmap of how de novo sequencing can enter mainstream proteogenomic analysis. In the future, de novo predictions can be added to sample-specific protein databases, including RNA-seq translations. These enriched databases can then be used for proteogenomics studies with existing pipelines.
Affiliation(s)
- Savas Takan
  - Department of Artificial Intelligence and Data Engineering, Faculty of Engineering, Ankara University, Ankara, Turkey
- Jens Allmer
  - Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim an der Ruhr, Germany
3
Angelo M, Zhang W, Vilseck JZ, Aoki ST. In silico λ-dynamics predicts protein binding specificities to modified RNAs. bioRxiv 2024:2024.01.26.577511. [PMID: 38328125 PMCID: PMC10849657 DOI: 10.1101/2024.01.26.577511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
RNA modifications shape gene expression through a smorgasbord of chemical changes to canonical RNA bases. Although numbering in the hundreds, only a few RNA modifications are well characterized, in part due to the absence of methods to identify modification sites. Antibodies remain a common tool to identify modified RNA and infer modification sites through straightforward applications. However, specificity issues can result in off-target binding and confound conclusions. This work utilizes in silico λ-dynamics to efficiently estimate binding free energy differences of modification-targeting antibodies between a variety of naturally occurring RNA modifications. Crystal structures of inosine and N6-methyladenosine (m6A) targeting antibodies bound to their modified ribonucleosides were determined and served as structural starting points. λ-Dynamics was utilized to predict RNA modifications that permit or inhibit binding to these antibodies. In vitro RNA-antibody binding assays supported the accuracy of these in silico results. High agreement between experimental and computed binding propensities demonstrated that λ-dynamics can serve as a predictive screen for antibody specificity against libraries of RNA modifications. More importantly, this strategy is an innovative way to elucidate how hundreds of known RNA modifications interact with biological molecules without the limitations imposed by in vitro or in vivo methodologies.
Affiliation(s)
- Murphy Angelo
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, Indianapolis, IN 46202, USA
- Wen Zhang
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, Indianapolis, IN 46202, USA
  - Melvin and Bren Simon Cancer Center, 535 Barnhill Drive, Indianapolis, IN 46202, USA
- Jonah Z. Vilseck
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, Indianapolis, IN 46202, USA
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Scott T. Aoki
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, 635 Barnhill Drive, Indianapolis, IN 46202, USA
  - Melvin and Bren Simon Cancer Center, 535 Barnhill Drive, Indianapolis, IN 46202, USA
4
Gadush MV, Sautto GA, Chandrasekaran H, Bensussan A, Ross TM, Ippolito GC, Person MD. Template-Assisted De Novo Sequencing of SARS-CoV-2 and Influenza Monoclonal Antibodies by Mass Spectrometry. J Proteome Res 2022; 21:1616-1627. [PMID: 35653804 DOI: 10.1021/acs.jproteome.1c00913] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
In this study, we used multiple enzyme digestions, coupled with higher-energy collisional dissociation (HCD) and electron-transfer/higher-energy collision dissociation (EThcD) fragmentation, to develop a mass-spectrometric (MS) method for determining the complete protein sequence of monoclonal antibodies (mAbs). The method was refined on an mAb of known sequence, a SARS-CoV-1 anti-receptor-binding domain (RBD) spike monoclonal antibody. The data were searched using Supernovo to generate a complete template-assisted de novo sequence for this and two SARS-CoV-2 mAbs of known sequence, resulting in correct sequences for the variable regions and correct distinction of Ile and Leu residues. We then used the method on a set of 25 antihemagglutinin (HA) influenza antibodies of unknown sequence and determined high-confidence sequences for >99% of the complementarity determining regions (CDRs). The heavy-chain and light-chain genes were cloned and transfected into cells for recombinant expression followed by affinity purification. The recombinant mAbs displayed binding curves matching the original mAbs with specificity to the HA influenza antigen. Our findings indicate that this methodology results in almost complete antibody sequence coverage with high-confidence results for CDRs on diverse mAb sequences.
Affiliation(s)
- Michelle V Gadush
  - Center for Biomedical Research Support, Biological Mass Spectrometry Facility, The University of Texas at Austin, Austin, Texas 78712, United States
- Giuseppe A Sautto
  - Center for Vaccines and Immunology, University of Georgia, Athens, Georgia 30602, United States
- Hamssika Chandrasekaran
  - Center for Biomedical Research Support, Biological Mass Spectrometry Facility, The University of Texas at Austin, Austin, Texas 78712, United States
- Alena Bensussan
  - Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
- Ted M Ross
  - Center for Vaccines and Immunology, University of Georgia, Athens, Georgia 30602, United States
  - Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, Georgia 30602, United States
- Gregory C Ippolito
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas 78712, United States
- Maria D Person
  - Center for Biomedical Research Support, Biological Mass Spectrometry Facility, The University of Texas at Austin, Austin, Texas 78712, United States
5
Aggarwal S, Raj A, Kumar D, Dash D, Yadav AK. False discovery rate: the Achilles' heel of proteogenomics. Brief Bioinform 2022; 23:6582880. [PMID: 35534181 DOI: 10.1093/bib/bbac163] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/30/2021] [Revised: 03/14/2022] [Accepted: 04/12/2022] [Indexed: 12/25/2022]
Abstract
Proteogenomics refers to the integrated analysis of the genome and proteome that leverages mass-spectrometry (MS)-based proteomics data to improve genome annotations, understand gene expression control through proteoforms and find sequence variants to develop novel insights for disease classification and therapeutic strategies. However, proteogenomic studies often suffer from reduced sensitivity and specificity due to inflated database size. To control the error rates, proteogenomics depends on the target-decoy search strategy, the de facto method for false discovery rate (FDR) estimation in proteomics. The proteogenomic databases constructed from three- or six-frame nucleotide database translation not only increase the search space and compute time but also violate the equivalence of target and decoy databases. These searches result in poorer separation between target and decoy scores, leading to stringent FDR thresholds. Understanding these factors and applying modified strategies such as two-pass database search or peptide-class-specific FDR can result in a better interpretation of MS data without introducing additional statistical biases. Based on these considerations, a user can interpret the proteogenomics results appropriately and control false positives and negatives in a more informed manner. In this review, we first briefly discuss the proteogenomic workflows and limitations in database construction, followed by various considerations that can influence potential novel discoveries in a proteogenomic study. We conclude with suggestions to counter these challenges for better proteogenomic data interpretation.
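The target-decoy strategy mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from the cited review: the function names, the example scores, and the simple decoys/targets estimator are assumptions chosen for illustration, and real pipelines typically add refinements such as q-values or class-specific FDR.

```python
from typing import List, Optional, Tuple

# Hypothetical peptide-spectrum matches: (search score, is_decoy).
EXAMPLE_PSMS: List[Tuple[float, bool]] = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, False), (6.1, True), (5.9, False),
]


def target_decoy_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits)."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No target hits: report 0 only if there are no decoy hits either.
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)


def score_cutoff_for_fdr(psms: List[Tuple[float, bool]], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is within max_fdr."""
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, score) <= max_fdr:
            best = score  # keep lowering the cutoff while the estimate passes
    return best


if __name__ == "__main__":
    print(target_decoy_fdr(EXAMPLE_PSMS, 7.0))      # 1 decoy / 4 targets
    print(score_cutoff_for_fdr(EXAMPLE_PSMS, 0.01))
```

Because the estimate depends on decoy hits above the cutoff, a larger search space (for example, a six-frame translation) tends to raise decoy scores and push the accepted cutoff upward, which is the sensitivity loss the abstract describes.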
Affiliation(s)
- Suruchi Aggarwal
  - Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd milestone, PO Box No. 04, Faridabad-Gurgaon Expressway, Faridabad-121001, Haryana, India
- Anurag Raj
  - GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
- Dhirendra Kumar
  - GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
- Debasis Dash
  - GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics & Integrative Biology, South Campus, Mathura Road, New Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad-201002, India
- Amit Kumar Yadav
  - Translational Health Science and Technology Institute, NCR Biotech Science Cluster, 3rd milestone, PO Box No. 04, Faridabad-Gurgaon Expressway, Faridabad-121001, Haryana, India
6
Hong JM, Gibbons M, Bashir A, Wu D, Shao S, Cutts Z, Chavarha M, Chen Y, Schiff L, Foster M, Church VA, Ching L, Ahadi S, Hieu-Thao Le A, Tran A, Dimon M, Coram M, Williams B, Jess P, Berndl M, Pawlosky A. ProtSeq: Toward high-throughput, single-molecule protein sequencing via amino acid conversion into DNA barcodes. iScience 2022; 25:103586. [PMID: 35005536 PMCID: PMC8717419 DOI: 10.1016/j.isci.2021.103586] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/05/2021] [Revised: 10/06/2021] [Accepted: 12/07/2021] [Indexed: 12/13/2022]
Abstract
We demonstrate early progress toward constructing a high-throughput, single-molecule protein sequencing technology utilizing barcoded DNA aptamers (binders) to recognize terminal amino acids of peptides (targets) tethered on a next-generation sequencing chip. DNA binders deposit unique, amino acid-identifying barcodes on the chip. The end goal is that, over multiple binding cycles, a sequential chain of DNA barcodes will identify the amino acid sequence of a peptide. Toward this, we demonstrate successful target identification with two sets of target-binder pairs: DNA-DNA and Peptide-Protein. For DNA-DNA binding, we show assembly and sequencing of DNA barcodes over six consecutive binding cycles. Intriguingly, our computational simulation predicts that a small set of semi-selective DNA binders offers significant coverage of the human proteome. Toward this end, we introduce a binder discovery pipeline that ultimately could merge with the chip assay into a technology called ProtSeq, for future high-throughput, single-molecule protein sequencing.
- Designed ProtSeq protein sequencing method compatible with widely used NGS technology
- Built Target-Switch SELEX to isolate aptamers specific to N-terminal amino acids (AAs)
- Showed binding, ligation, cleavage, and NGS of six DNA binders in ordered barcode chain
- Developed pipeline to deconvolve AAs from DNA barcodes to identify putative proteins
Affiliation(s)
- Ali Bashir
  - Google, LLC, Mountain View, CA 94043, USA
- Diana Wu
  - Google, LLC, Mountain View, CA 94043, USA
- Ye Chen
  - Google, LLC, Mountain View, CA 94043, USA
- Sara Ahadi
  - Google, LLC, Mountain View, CA 94043, USA
- Marc Coram
  - Google, LLC, Mountain View, CA 94043, USA
7
de Graaf SC, Hoek M, Tamara S, Heck AJR. A perspective toward mass spectrometry-based de novo sequencing of endogenous antibodies. MAbs 2022; 14:2079449. [PMID: 35699511 PMCID: PMC9225641 DOI: 10.1080/19420862.2022.2079449] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/25/2022]
Abstract
A key step in therapeutic and endogenous humoral antibody characterization is identifying the amino acid sequence. So far, this task has been mainly tackled through sequencing of B-cell receptor (BCR) repertoires at the nucleotide level. Mass spectrometry (MS) has emerged as an alternative tool for obtaining sequence information directly at the protein level, which is the most relevant one. Although several MS methods are now well established, analysis of recombinant and endogenous antibodies comes with a specific set of challenges, requiring approaches beyond the conventional proteomics workflows. Here, we review the challenges in MS-based sequencing of both recombinant and endogenous humoral antibodies and outline state-of-the-art methods attempting to overcome these obstacles. We highlight recent examples and discuss remaining challenges. We foresee a great future for these approaches, making de novo antibody sequencing and discovery by MS-based techniques feasible, even for complex clinical samples from endogenous sources such as serum and other liquid biopsies.
Affiliation(s)
- Sebastiaan C de Graaf
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands
  - Netherlands Proteomics Center, Utrecht, Netherlands
- Max Hoek
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands
  - Netherlands Proteomics Center, Utrecht, Netherlands
- Sem Tamara
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands
  - Netherlands Proteomics Center, Utrecht, Netherlands
- Albert J R Heck
  - Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands
  - Netherlands Proteomics Center, Utrecht, Netherlands
8
Safonova Y, Pevzner PA. De novo Inference of Diversity Genes and Analysis of Non-canonical V(DD)J Recombination in Immunoglobulins. Front Immunol 2019; 10:987. [PMID: 31134072 PMCID: PMC6516046 DOI: 10.3389/fimmu.2019.00987] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 01/17/2019] [Accepted: 04/16/2019] [Indexed: 12/03/2022]
Abstract
The V(D)J recombination forms the immunoglobulin genes by joining the variable (V), diversity (D), and joining (J) germline genes. Since variations in germline genes have been linked to various diseases, personalized immunogenomics aims at finding alleles of germline genes across various patients. Although recent studies described algorithms for de novo inference of V and J genes from immunosequencing data, they stopped short of solving a more difficult problem of reconstructing D genes that form the highly divergent CDR3 regions and provide the most important contribution to the antigen binding. We present the IgScout algorithm for de novo D gene reconstruction and apply it to reveal new alleles of human D genes and previously unknown D genes in camel, an important model organism in immunology. We further analyze non-canonical V(DD)J recombination that results in unusually long CDR3s with tandem fused IGHD genes and thus expands the diversity of the antibody repertoires. We demonstrate that tandem CDR3s represent a consistent and functional feature of all analyzed immunosequencing datasets, reveal ultra-long CDR3s, and shed light on the mechanism responsible for their formation.
Affiliation(s)
- Yana Safonova
  - Center for Information Theory and Applications, University of California, San Diego, San Diego, CA, United States
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, United States
9
Menschaert G, Fenyö D. Proteogenomics from a bioinformatics angle: A growing field. Mass Spectrom Rev 2017; 36:584-599. [PMID: 26670565 PMCID: PMC6101030 DOI: 10.1002/mas.21483] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 04/20/2015] [Accepted: 09/01/2015] [Indexed: 05/16/2023]
Abstract
Proteogenomics is a research area that combines fields such as proteomics and genomics in a multi-omics setup using both mass spectrometry and high-throughput sequencing technologies. Currently, the main goals of the field are to aid genome annotation or to unravel the proteome complexity. Mass spectrometry-based identifications of matching or homologous peptides can further refine gene models. The identification of novel proteoforms is also made possible based on detection of novel translation initiation sites (cognate or near-cognate), novel transcript isoforms, sequence variation or novel (small) open reading frames in intergenic or untranslated genic regions by analyzing high-throughput sequencing data from RNA-seq or ribosome profiling experiments. Other proteogenomics studies using a combination of proteomics and genomics techniques focus on antibody sequencing or the identification of immunogenic peptides or venom peptides. Over the years, a growing number of bioinformatics tools and databases have become available to help streamline these cross-omics studies. Some of these solutions only help in specific steps of the proteogenomics studies, e.g. building custom sequence databases (based on next-generation sequencing output) for mass spectrometry fragmentation spectrum matching. Over the last few years, a handful of integrative tools have also become available that can execute complete proteogenomics analyses. Some of these are presented as stand-alone solutions, whereas others are implemented in a web-based framework such as Galaxy. In this review we aimed at sketching a comprehensive overview of all the bioinformatics solutions that are available for this growing research area. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:584-599, 2017.
Affiliation(s)
- Gerben Menschaert
  - Lab of Bioinformatics and Computational Genomics, Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
  - To whom correspondence should be addressed. Tel: +32 9 264 99 22; Fax: +32 9 264 6220
- David Fenyö
  - Center for Health Informatics and Bioinformatics and Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York, USA
10
Hernandez-Valladares M, Vaudel M, Selheim F, Berven F, Bruserud Ø. Proteogenomics approaches for studying cancer biology and their potential in the identification of acute myeloid leukemia biomarkers. Expert Rev Proteomics 2017; 14:649-663. [DOI: 10.1080/14789450.2017.1352474] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/02/2023]
Affiliation(s)
- Maria Hernandez-Valladares
  - Department of Clinical Science, Faculty of Medicine and Dentistry, University of Bergen, Bergen, Norway
  - Proteomics Unit, Department of Biomedicine, Faculty of Medicine and Dentistry, University of Bergen, Bergen, Norway
- Marc Vaudel
  - KG Jebsen Center for Diabetes Research, Department of Clinical Science, Faculty of Medicine and Dentistry, University of Bergen, Bergen, Norway
  - Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway
- Frode Selheim
  - Proteomics Unit, Department of Biomedicine, Faculty of Medicine and Dentistry, University of Bergen, Bergen, Norway
- Frode Berven
  - Proteomics Unit, Department of Biomedicine, Faculty of Medicine and Dentistry, University of Bergen, Bergen, Norway
- Øystein Bruserud
  - Department of Clinical Science, Faculty of Medicine and Dentistry, University of Bergen, Bergen, Norway
11
Sen KI, Tang WH, Nayak S, Kil YJ, Bern M, Ozoglu B, Ueberheide B, Davis D, Becker C. Automated Antibody De Novo Sequencing and Its Utility in Biopharmaceutical Discovery. J Am Soc Mass Spectrom 2017; 28:803-810. [PMID: 28105549 PMCID: PMC5392168 DOI: 10.1007/s13361-016-1580-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 08/31/2016] [Revised: 12/02/2016] [Accepted: 12/04/2016] [Indexed: 05/12/2023]
Abstract
Applications of antibody de novo sequencing in the biopharmaceutical industry range from the discovery of new antibody drug candidates to identifying reagents for research and determining the primary structure of innovator products for biosimilar development. When murine, phage display, or patient-derived monoclonal antibodies against a target of interest are available, but the cDNA or the original cell line is not, de novo protein sequencing is required to humanize and recombinantly express these antibodies, followed by in vitro and in vivo testing for functional validation. Availability of fully automated software tools for monoclonal antibody de novo sequencing enables efficient and routine analysis. Here, we present a novel method to automatically de novo sequence antibodies using mass spectrometry and the Supernovo software. The robustness of the algorithm is demonstrated through a series of stress tests.
Affiliation(s)
- K Ilker Sen
  - Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA
- Wilfred H Tang
  - Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA
- Shruti Nayak
  - Langone Medical Center, New York University, 430 East 29th street, 8th floor room 860, New York, NY, 10016, USA
- Yong J Kil
  - Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA
- Marshall Bern
  - Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA
- Berk Ozoglu
  - Janssen Research and Development, LLC, 1400 McKean Road, Spring House, PA, 19477, USA
- Beatrix Ueberheide
  - Langone Medical Center, New York University, 430 East 29th street, 8th floor room 860, New York, NY, 10016, USA
- Darryl Davis
  - Janssen Research and Development, LLC, 1400 McKean Road, Spring House, PA, 19477, USA
- Christopher Becker
  - Protein Metrics Inc, 1622 San Carlos Ave, Suite C, San Carlos, CA, 94070, USA
12
13
Fu S, Liu X, Luo M, Xie K, Nice EC, Zhang H, Huang C. Proteogenomic studies on cancer drug resistance: towards biomarker discovery and target identification. Expert Rev Proteomics 2017; 14:351-362. [PMID: 28276747 DOI: 10.1080/14789450.2017.1299006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
Introduction: Chemoresistance is a major obstacle for current cancer treatment. Proteogenomics is a powerful multi-omics research field that uses customized protein sequence databases generated by genomic and transcriptomic information to identify novel genes (e.g. noncoding, mutation and fusion genes) from mass spectrometry-based proteomic data. By identifying aberrations that are differentially expressed between tumor and normal pairs, this approach can also be applied to validate protein variants in cancer, which may reveal the response to drug treatment. Areas covered: In this review, we will present recent advances in proteogenomic investigations of cancer drug resistance with an emphasis on integrative proteogenomic pipelines and the biomarker discovery which contributes to achieving the goal of using precision/personalized medicine for cancer treatment. Expert commentary: The discovery and comprehensive understanding of potential biomarkers help identify the cohort of patients who may benefit from particular treatments, and will assist real-time clinical decision-making to maximize therapeutic efficacy and minimize adverse effects. With the development of MS-based proteomics and NGS-based sequencing, a growing number of proteogenomic tools are being developed specifically to investigate cancer drug resistance.
Affiliation(s)
- Shuyue Fu
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China
- Xiang Liu
  - Department of Pathology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
- Maochao Luo
  - West China School of Public Health, Sichuan University, Chengdu, P.R. China
- Ke Xie
  - Department of Oncology, Sichuan Academy of Medical Sciences, Sichuan Provincial People's Hospital, Chengdu, P.R. China
- Edouard C Nice
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Australia
- Haiyuan Zhang
  - School of Medicine, Yangtze University, P.R. China
- Canhua Huang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, P.R. China
14
Vyatkina K. De Novo Sequencing of Top-Down Tandem Mass Spectra: A Next Step towards Retrieving a Complete Protein Sequence. Proteomes 2017; 5:E6. [PMID: 28248257 PMCID: PMC5372227 DOI: 10.3390/proteomes5010006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/10/2016] [Revised: 01/30/2017] [Accepted: 02/04/2017] [Indexed: 11/16/2022]
Abstract
De novo sequencing of tandem (MS/MS) mass spectra represents the only way to determine the sequence of proteins from organisms with unknown genomes, or of proteins not directly inscribed in a genome, such as antibodies or novel splice variants. Top-down mass spectrometry provides new opportunities for analyzing such proteins; however, retrieving a complete protein sequence from top-down MS/MS spectra still remains a distant goal. In this paper, we review the state of the art on this subject, and enhance our previously developed Twister algorithm for de novo sequencing of peptides from top-down MS/MS spectra to derive longer sequence fragments of a target protein.
Affiliation(s)
- Kira Vyatkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, 7-9 Universitetskaya nab., St. Petersburg 199034, Russia.
- Department of Mathematical and Information Technologies, Saint Petersburg Academic University, 8/3 Khlopina st., St. Petersburg 194021, Russia.
| |
|
15
|
Guan X, Brownstein NC, Young NL, Marshall AG. Ultrahigh-resolution Fourier transform ion cyclotron resonance mass spectrometry and tandem mass spectrometry for peptide de novo amino acid sequencing for a seven-protein mixture by paired single-residue transposed Lys-N and Lys-C digestion. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2017; 31:207-217. [PMID: 27813191 DOI: 10.1002/rcm.7783] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/29/2016] [Revised: 10/29/2016] [Accepted: 10/30/2016] [Indexed: 06/06/2023]
Abstract
RATIONALE: Bottom-up tandem mass spectrometry (MS/MS) is regularly used in proteomics to identify proteins from a sequence database. De novo sequencing is also available for sequencing peptides with relatively short sequence lengths. We recently showed that paired Lys-C and Lys-N proteases produce peptides of identical mass and similar retention time, but different tandem mass spectra. Such parallel experiments provide complementary information, and allow for up to 100% MS/MS sequence coverage. METHODS: Here, we report digestion by paired Lys-C and Lys-N proteases of a seven-protein mixture: human hemoglobin alpha, bovine carbonic anhydrase 2, horse skeletal muscle myoglobin, hen egg white lysozyme, bovine pancreatic ribonuclease, bovine rhodanese, and bovine serum albumin, followed by reversed-phase nanoflow liquid chromatography, collision-induced dissociation, and 14.5 T Fourier transform ion cyclotron resonance mass spectrometry. RESULTS: Matched pairs of product peptide ions of equal precursor mass and similar retention times from each digestion are compared, leveraging single-residue transposed information with independent interferences to confidently identify fragment ion types, residues, and peptides. Selected pairs of product ion mass spectra for de novo sequenced protein segments from each member of the mixture are presented. CONCLUSIONS: Pairs of the transposed product ions as well as complementary information from the parallel experiments allow for both high MS/MS coverage for long peptide sequences and high confidence in the amino acid identification. Moreover, the parallel experiments in the de novo sequencing reduce false-positive matches of product ions from the single-residue transposed peptides from the same segment, and thereby further improve the confidence in protein identification. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Xiaoyan Guan
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Florida State University, 1800 East Paul Dirac Drive, Tallahassee, FL, 32310, USA
| | - Naomi C Brownstein
- Department of Behavioral Sciences and Social Medicine, College of Medicine, Florida State University, 1115 W. Call St., Tallahassee, FL, 32306, USA
- Department of Statistics, Florida State University, 117 N. Woodward Ave., Tallahassee, FL, 32306, USA
| | - Nicolas L Young
- Verna & Marrs McLean Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, MS-125, Houston, TX, 77030-3411, USA
| | - Alan G Marshall
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Florida State University, 1800 East Paul Dirac Drive, Tallahassee, FL, 32310, USA
- Department of Chemistry and Biochemistry, Florida State University, 95 Chieftain Way, Tallahassee, FL, 32303, USA
| |
|
16
|
Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016. [PMID: 27975219 DOI: 10.1007/978-3-319-41448-5_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
Abstract
Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.
|
17
|
Rickert KW, Grinberg L, Woods RM, Wilson S, Bowen MA, Baca M. Combining phage display with de novo protein sequencing for reverse engineering of monoclonal antibodies. MAbs 2016; 8:501-12. [PMID: 26852694 DOI: 10.1080/19420862.2016.1145865] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 10/22/2022] Open
Abstract
The enormous diversity created by gene recombination and somatic hypermutation makes de novo protein sequencing of monoclonal antibodies a uniquely challenging problem. Modern mass spectrometry-based sequencing will rarely, if ever, provide a single unambiguous sequence for the variable domains. A more likely outcome is computation of an ensemble of highly similar sequences that can satisfy the experimental data. This outcome can result in the need for empirical testing of many candidate sequences, sometimes iteratively, to identify one that can replicate the activity of the parental antibody. Here we describe an improved approach to antibody protein sequencing by using phage display technology to generate a combinatorial library of sequences that satisfy the mass spectrometry data, and selecting for functional candidates that bind antigen. This approach was used to reverse engineer 2 commercially obtained monoclonal antibodies against murine CD137. Proteomic data enabled us to assign the majority of the variable domain sequences, with the exception of 3-5% of the sequence located within or adjacent to complementarity-determining regions. To efficiently resolve the sequence in these regions, small phage-displayed libraries were generated and subjected to antigen binding selection. Following enrichment of antigen-binding clones, 2 clones were selected for each antibody and recombinantly expressed as antigen-binding fragments (Fabs). In both cases, the reverse-engineered Fabs exhibited the same antigen binding affinity, within error, as Fabs produced from the commercial IgGs. This combination of proteomic and protein engineering techniques provides a useful approach to simplifying the technically challenging process of reverse engineering monoclonal antibodies from protein material.
Affiliation(s)
- Keith W Rickert
- a Department of Antibody Discovery and Protein Engineering, MedImmune, LLC, Gaithersburg, MD, USA
| | - Luba Grinberg
- a Department of Antibody Discovery and Protein Engineering, MedImmune, LLC, Gaithersburg, MD, USA
| | - Robert M Woods
- a Department of Antibody Discovery and Protein Engineering, MedImmune, LLC, Gaithersburg, MD, USA
| | - Susan Wilson
- a Department of Antibody Discovery and Protein Engineering, MedImmune, LLC, Gaithersburg, MD, USA
| | - Michael A Bowen
- a Department of Antibody Discovery and Protein Engineering, MedImmune, LLC, Gaithersburg, MD, USA
| | - Manuel Baca
- a Department of Antibody Discovery and Protein Engineering, MedImmune, LLC, Gaithersburg, MD, USA
| |
|
18
|
Guthals A, Gan Y, Murray L, Chen Y, Stinson J, Nakamura G, Lill JR, Sandoval W, Bandeira N. De Novo MS/MS Sequencing of Native Human Antibodies. J Proteome Res 2016; 16:45-54. [PMID: 27779884 DOI: 10.1021/acs.jproteome.6b00608] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 11/28/2022]
Abstract
One direct route for the discovery of therapeutic human monoclonal antibodies (mAbs) involves the isolation of peripheral B cells from survivors/sero-positive individuals after exposure to an infectious reagent or disease etiology, followed by single-cell sequencing or hybridoma generation. Peripheral B cells, however, are not always easy to obtain and represent only a small percentage of the total B-cell population across all bodily tissues. Although it has been demonstrated that tandem mass spectrometry (MS/MS) techniques can interrogate the full polyclonal antibody (pAb) response to an antigen in vivo, all current approaches identify MS/MS spectra against databases derived from genetic sequencing of B cells from the same patient. In this proof-of-concept study, we demonstrate the feasibility of a novel MS/MS antibody discovery approach in which only serum antibodies are required without the need for sequencing of genetic material. Peripheral pAbs from a cytomegalovirus-exposed individual were purified by glycoprotein B antigen affinity and de novo sequenced from MS/MS data. Purely MS-derived mAbs were then manufactured in mammalian cells to validate potency via antigen-binding ELISA. Interestingly, we found that these mAbs accounted for 1 to 2% of total donor IgG but were not detected in parallel sequencing of memory B cells from the same patient.
Affiliation(s)
- Adrian Guthals
- Mapp Biopharmaceutical, Inc., 6160 Lusk Boulevard #C105, San Diego, California 92121, United States
| | - Yutian Gan
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
| | - Laura Murray
- Department of Protein Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
| | - Yongmei Chen
- Department of Antibody Engineering, Genentech, Inc., South San Francisco, California 94080, United States
| | - Jeremy Stinson
- Department of Molecular Biology, Genentech, Inc., South San Francisco, California 94080, United States
| | - Gerald Nakamura
- Department of Antibody Engineering, Genentech, Inc., South San Francisco, California 94080, United States
| | - Jennie R Lill
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
| | - Wendy Sandoval
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0404, La Jolla, California 92093, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, Mail Code 0657, La Jolla, California 92093, United States
| |
|
19
|
Xiao Y, Vecchi MM, Wen D. Distinguishing between Leucine and Isoleucine by Integrated LC–MS Analysis Using an Orbitrap Fusion Mass Spectrometer. Anal Chem 2016; 88:10757-10766. [DOI: 10.1021/acs.analchem.6b03409] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Indexed: 11/28/2022]
Affiliation(s)
- Yongsheng Xiao
- Analytical Biochemistry, Department of Cell and Protein Sciences, Biogen, 250 Binney Street, Cambridge, Massachusetts 02142, United States
| | - Malgorzata M. Vecchi
- Analytical Biochemistry, Department of Cell and Protein Sciences, Biogen, 250 Binney Street, Cambridge, Massachusetts 02142, United States
| | - Dingyi Wen
- Analytical Biochemistry, Department of Cell and Protein Sciences, Biogen, 250 Binney Street, Cambridge, Massachusetts 02142, United States
| |
|
20
|
Vyatkina K, Wu S, Dekker LJM, VanDuijn MM, Liu X, Tolić N, Luider TM, Paša-Tolić L, Pevzner PA. Top-down analysis of protein samples by de novo sequencing techniques. Bioinformatics 2016; 32:2753-9. [PMID: 27187201 PMCID: PMC6280873 DOI: 10.1093/bioinformatics/btw307] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 11/09/2015] [Revised: 03/31/2016] [Accepted: 05/09/2016] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting rapid development of top-down mass spectrometry and creating a need for efficient methods for analyzing this kind of data. RESULTS: We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on post-translational modifications and fragmentation patterns. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://bioinf.spbau.ru/en/twister. CONTACT: vyatkina@spbau.ru or ppevzner@ucsd.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kira Vyatkina
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, St Petersburg, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, St Petersburg, Russia
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
| | - Lennard J M Dekker
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Martijn M VanDuijn
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Nikola Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Theo M Luider
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Pavel A Pevzner
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, St Petersburg, Russia
- Department of Computer Science and Engineering, University of California, San Diego, CA, USA
| |
|
21
|
Complete De Novo Assembly of Monoclonal Antibody Sequences. Sci Rep 2016; 6:31730. [PMID: 27562653 PMCID: PMC4999880 DOI: 10.1038/srep31730] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 04/18/2016] [Accepted: 07/20/2016] [Indexed: 11/25/2022] Open
Abstract
De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies for which genome information is often limited or not available. However, due to limitations in peptide fragmentation and coverage, as well as ambiguities in spectra interpretation, complete de novo assembly of unknown protein sequences still remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequenced peptides, their quality scores and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated ALPS performance on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216–441 AA, at 100% coverage, and 96.64–100% accuracy.
|
22
|
Sheynkman GM, Shortreed MR, Cesnik AJ, Smith LM. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation. ANNUAL REVIEW OF ANALYTICAL CHEMISTRY (PALO ALTO, CALIF.) 2016; 9:521-45. [PMID: 27049631 PMCID: PMC4991544 DOI: 10.1146/annurev-anchem-071015-041722] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Indexed: 05/09/2023]
Abstract
Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
Affiliation(s)
- Gloria M Sheynkman
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
| | - Michael R Shortreed
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
| | - Anthony J Cesnik
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
| | - Lloyd M Smith
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
- Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706
| |
|
23
|
Vyatkina K, Wu S, Dekker LJM, VanDuijn MM, Liu X, Tolić N, Dvorkin M, Alexandrova S, Luider TM, Paša-Tolić L, Pevzner PA. De Novo Sequencing of Peptides from Top-Down Tandem Mass Spectra. J Proteome Res 2015; 14:4450-62. [DOI: 10.1021/pr501244v] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/30/2022]
Affiliation(s)
- Kira Vyatkina
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, Saint Petersburg 194021, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, 7-9 Universitetskaya nab., Saint Petersburg 199034, Russia
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Pkwy, Norman, Oklahoma 73019, United States
| | - Lennard J. M. Dekker
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
| | - Martijn M. VanDuijn
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, Indiana 46202, United States
| | - Nikola Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Mikhail Dvorkin
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, Saint Petersburg 194021, Russia
| | - Sonya Alexandrova
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, Saint Petersburg 194021, Russia
| | - Theo M. Luider
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
| | - Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Pavel A. Pevzner
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, 7-9 Universitetskaya nab., Saint Petersburg 199034, Russia
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States
| |
|
24
|
Dasari S, Theis JD, Vrana JA, Meureta OM, Quint PS, Muppa P, Zenka RM, Tschumper RC, Jelinek DF, Davila JI, Sarangi V, Kurtin PJ, Dogan A. Proteomic detection of immunoglobulin light chain variable region peptides from amyloidosis patient biopsies. J Proteome Res 2015; 14:1957-67. [PMID: 25734799 DOI: 10.1021/acs.jproteome.5b00015] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 01/31/2023]
Abstract
Immunoglobulin light chain (LC) amyloidosis (AL) is caused by deposition of clonal LCs produced by an underlying plasma cell neoplasm. The clonotypic LC sequences are unique to each patient, and they cannot be reliably detected by either immunoassays or standard proteomic workflows that target the constant regions of LCs. We addressed this issue by developing a novel sequence template-based workflow to detect LC variable (LCV) region peptides directly from AL amyloid deposits. The workflow was implemented in a CAP/CLIA compliant clinical laboratory dedicated to proteomic subtyping of amyloid deposits extracted from either formalin-fixed paraffin-embedded tissues or subcutaneous fat aspirates. We evaluated the performance of the workflow on a validation cohort of 30 AL patients, whose amyloidogenic clone was identified using a novel proteogenomics method, and 30 controls. The recall and negative predictive values of the workflow, when identifying the gene family of the AL clone, were 93 and 98%, respectively. Application of the workflow on a clinical cohort of 500 AL amyloidosis samples highlighted a bias in the LCV gene families used by the AL clones. We also detected similarity between AL clones deposited in multiple organs of systemic AL patients. In summary, AL proteomic data sets are rich in LCV region peptides of potential clinical significance that are recoverable with advanced bioinformatics.
Affiliation(s)
- Surendra Dasari
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Jason D Theis
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Julie A Vrana
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Oana M Meureta
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Patrick S Quint
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Prasuna Muppa
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Roman M Zenka
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Renee C Tschumper
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Diane F Jelinek
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Jaime I Davila
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Vivekananda Sarangi
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Paul J Kurtin
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| | - Ahmet Dogan
- †Department of Health Sciences Research, ‡Department of Laboratory Medicine and Pathology, §Information Technology Administration, and ∥Department of Immunology and Division of Hematology, Mayo Clinic, 200 First Street SW, Rochester, Minnesota 55905, United States
| |
|
25
|
Chapman B, Bellgard M. High-throughput parallel proteogenomics: A bacterial case study. Proteomics 2014; 14:2780-9. [DOI: 10.1002/pmic.201400185] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 05/02/2014] [Revised: 10/11/2014] [Accepted: 10/22/2014] [Indexed: 11/08/2022]
Affiliation(s)
- Brett Chapman
- Centre for Comparative Genomics, Murdoch University, Western Australia, Australia
| | - Matthew Bellgard
- Centre for Comparative Genomics, Murdoch University, Western Australia, Australia
| |
|
26
|
Ghali F, Krishna R, Perkins S, Collins A, Xia D, Wastling J, Jones AR. ProteoAnnotator - Open source proteogenomics annotation software supporting PSI standards. Proteomics 2014; 14:2731-41. [DOI: 10.1002/pmic.201400265] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 06/10/2014] [Revised: 09/10/2014] [Accepted: 10/02/2014] [Indexed: 11/08/2022]
Affiliation(s)
- Fawaz Ghali
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
| | - Ritesh Krishna
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
| | - Simon Perkins
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
| | - Andrew Collins
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
| | - Dong Xia
- Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, Liverpool, UK
| | - Jonathan Wastling
- Department of Infection Biology, Institute of Infection and Global Health, University of Liverpool, Liverpool, UK
- Health Protection Research Unit in Emerging and Zoonotic Infections, The National Institute for Health Research, University of Liverpool, Liverpool, UK
| | - Andrew R. Jones
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
| |
|
27
|
Murray D, Barnidge D. Characterization of immunoglobulin by mass spectrometry with applications for the clinical laboratory. Crit Rev Clin Lab Sci 2014; 50:91-102. [PMID: 24156651 DOI: 10.3109/10408363.2013.838206] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/29/2023]
Abstract
Studies monitoring immunoglobulin (Ig) antigen specificity have brought to light key Ig biomarkers for immunity, autoimmunity, cancer detection, and immune system function evaluation. A fundamentally new approach to the detection of Igs based on the primary structure of the Ig is beginning to emerge in the literature. This approach has only become feasible in light of advances in proteomics and rapid improvements in mass spectrometry (MS). Driven primarily by the development of Ig pharmaceuticals, Ig MS-based proteomic methods are revealing structural features which were previously unavailable with other characterization techniques. The task of adapting these techniques to clinical chemistry is in its infancy, but these methods have the potential to dramatically alter testing for Ig biomarkers. The purpose of this article is to review the advances that have been made in proteomic characterization of Igs by MS and the early attempts to apply these methods to clinical samples.
Affiliation(s)
- David Murray
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| |
|
28
|
Agrawal GK, Sarkar A, Righetti PG, Pedreschi R, Carpentier S, Wang T, Barkla BJ, Kohli A, Ndimba BK, Bykova NV, Rampitsch C, Zolla L, Rafudeen MS, Cramer R, Bindschedler LV, Tsakirpaloglou N, Ndimba RJ, Farrant JM, Renaut J, Job D, Kikuchi S, Rakwal R. A decade of plant proteomics and mass spectrometry: translation of technical advancements to food security and safety issues. MASS SPECTROMETRY REVIEWS 2013; 32:335-65. [PMID: 23315723 DOI: 10.1002/mas.21365] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 04/20/2012] [Revised: 09/10/2012] [Accepted: 09/10/2012] [Indexed: 05/21/2023]
Abstract
Tremendous progress in plant proteomics driven by mass spectrometry (MS) techniques has been made since 2000, when few proteomics reports were published and plant proteomics was in its infancy. These achievements include the refinement of existing techniques and the search for new techniques to address food security, safety, and health issues. It is projected that in 2050 the world's population will reach 9-12 billion people, demanding a food production increase of 34-70% (FAO, 2009) over today's food production. Providing food for such a demand in a sustainable and environmentally committed manner, without threatening natural resources, requires that agricultural production increase significantly and that postharvest handling and food manufacturing systems become more efficient, with lower energy expenditure, fewer postharvest losses, less waste generation, and food with longer shelf life. There is also a need to look for alternatives to animal-based protein sources (i.e., plant-based sources) to meet the increase in protein demand by 2050. Thus, plant biology has a critical role to play as a science capable of addressing such challenges. In this review, we discuss proteomics, especially MS, as a platform that has been utilized in plant biology research for the past 10 years and that has the potential to expedite the process of understanding plant biology for human benefit. The increasing application of proteomics technologies in food security, analysis, and safety is emphasized in this review. However, we are aware that no single approach or technology is capable of addressing the global food issues. Proteomics-generated information and resources must be integrated and correlated with other omics-based approaches, information, and conventional programs to ensure sufficient food and resources for human development now and in the future.
Affiliation(s)
- Ganesh Kumar Agrawal
- Research Laboratory for Biotechnology and Biochemistry, PO Box 13265, Kathmandu, Nepal.
|
29
|
Armengaud J, Hartmann EM, Bland C. Proteogenomics for environmental microbiology. Proteomics 2013; 13:2731-42. [PMID: 23636904 DOI: 10.1002/pmic.201200576] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 12/19/2012] [Revised: 03/06/2013] [Accepted: 04/09/2013] [Indexed: 11/09/2022]
Abstract
Proteogenomics sensu stricto refers to the use of proteomic data to refine the annotation of genomes from model organisms. Because of the limitations of automatic annotation pipelines, a relatively high number of errors occur during the structural annotation of genes coding for proteins. Whether putative orphan sequences or short genes encoding low-molecular-weight proteins really exist is still frequently a mystery. Whether start codons are well defined is also an open debate. These problems are exacerbated for genomes of microorganisms belonging to poorly documented genera, as related sequences are not always available for homology-guided annotation. The functional annotation of a significant proportion of genes is also another well-known issue when annotating environmental microorganisms. High-throughput shotgun proteomics has recently greatly evolved, allowing the exploration of the proteome from any microorganism at an unprecedented depth. The structural and functional annotation process may be usefully complemented with experimental data. Indeed, proteogenomic mapping has been successfully performed for a wide variety of organisms. Specific approaches devoted to systematically establishing the N-termini of a large set of proteins are being developed. N-terminomics is giving rise to datasets of experimentally proven translational start codons as well as validated peptide signals for secreted proteins. By extension, combining genomic and proteomic data is becoming routine in many research projects. The proteomic analysis of organisms with unfinished genome sequences, the so-called composite proteomics, and the search for microbial biomarkers by bottom-up and top-down combined approaches are some examples of proteogenomic-flavored studies. They illustrate the advent of a new era of environmental microbiology where proteomics and genomics are intimately integrated to answer key biological questions.
Affiliation(s)
- Jean Armengaud
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze, France
|
30
|
Abstract
Proteogenomic searching is a useful method for identifying novel proteins, annotating genes and detecting peptides unique to an individual genome. The approach, however, can be laborious, as it often requires search segmentation and the use of several unintegrated tools. Furthermore, many proteogenomic efforts have been limited to small genomes, as large genomes can prove impractical due to the required amount of computer memory and computation time. We present Peppy, a software tool designed to perform every necessary task of proteogenomic searches quickly, accurately and automatically. The software generates a peptide database from a genome, tracks peptide loci, matches peptides to MS/MS spectra and assigns confidence values to those matches. Peppy automatically performs a decoy database generation, search and analysis to return identifications at the desired false discovery rate threshold. Written in Java for cross-platform execution, the software is fully multithreaded for enhanced speed. The program can run on regular desktop computers, opening the doors of proteogenomic searching to a wider audience of proteomics and genomics researchers. Peppy is available at http://geneffects.com/peppy.
Affiliation(s)
- Brian A Risk
- Department of Biochemistry & Biophysics, UNC School of Medicine, Chapel Hill, North Carolina 27599, United States.
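The abstract above mentions a decoy search that returns identifications at a desired false discovery rate (FDR) threshold. The sketch below illustrates the generic target-decoy idea only; it is not Peppy's implementation. The function name `filter_at_fdr`, the `(score, is_decoy)` tuple layout, and the decoys/targets FDR estimate are assumptions made for illustration.

```python
from typing import List, Tuple

# A peptide-spectrum match (PSM) is modelled as (score, is_decoy).
# This layout is hypothetical and chosen only for the example.
PSM = Tuple[float, bool]


def filter_at_fdr(psms: List[PSM], fdr_threshold: float) -> List[PSM]:
    """Return the target PSMs accepted at the given FDR threshold.

    PSMs are ranked by descending score. At each rank the FDR is
    estimated as (decoys so far) / (targets so far), and the deepest
    rank whose estimate is within the threshold defines the cutoff.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cut = 0  # number of top-ranked PSMs to keep
    for rank, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        estimated_fdr = decoys / targets if targets else 1.0
        if estimated_fdr <= fdr_threshold:
            best_cut = rank
    # Decoys above the cutoff are dropped from the reported list.
    return [p for p in ranked[:best_cut] if not p[1]]


if __name__ == "__main__":
    example = [(10, False), (9, False), (8, True), (7, False),
               (6, False), (5, True), (4, True)]
    for score, _ in filter_at_fdr(example, 0.34):
        print(score)
```

Taking the deepest qualifying rank, rather than stopping at the first rank that exceeds the threshold, is one common convention; real tools differ in how they estimate and smooth the FDR.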
|
31
|
Guthals A, Clauser KR, Frank AM, Bandeira N. Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides. J Proteome Res 2013; 12:2846-57. [PMID: 23679345 DOI: 10.1021/pr400173d] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 01/20/2023]
Abstract
Full-length de novo sequencing of unknown proteins remains a challenging open problem. Traditional methods that sequence spectra individually are limited by short peptide length, incomplete peptide fragmentation, and ambiguous de novo interpretations. We address these issues by determining consensus sequences for assembled tandem mass (MS/MS) spectra from overlapping peptides (e.g., by using multiple enzymatic digests). We have combined electron-transfer dissociation (ETD) with collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) fragmentation methods to boost interpretation of long, highly charged peptides and take advantage of corroborating b/y/c/z ions in CID/HCD/ETD. Using these strategies, we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides yield de novo sequences of average length 70 AA and as long as 200 AA at up to 99% sequencing accuracy.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, United States
|
32
|
Sheynkman GM, Shortreed MR, Frey BL, Smith LM. Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq. Mol Cell Proteomics 2013; 12:2341-53. [PMID: 23629695 DOI: 10.1074/mcp.o113.028142] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.7] [Indexed: 01/22/2023] Open
Abstract
Human proteomic databases required for MS peptide identification are frequently updated and carefully curated, yet are still incomplete because it has been challenging to acquire every protein sequence from the diverse assemblage of proteoforms expressed in every tissue and cell type. In particular, alternative splicing has been shown to be a major source of this cell-specific proteomic variation. Many new alternative splice forms have been detected at the transcript level using next generation sequencing methods, especially RNA-Seq, but it is not known how many of these transcripts are being translated. Leveraging the unprecedented capabilities of next generation sequencing methods, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Eighty million paired-end Illumina reads and ∼500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice junction sequences from the RNA data, translate these sequences into the analogous polypeptide sequence, and create a customized splice junction database for MS searching. Based on the RefSeq gene models, we detected 136,123 annotated and 144,818 unannotated transcript junctions. Of those, 24,834 unannotated junctions passed various quality filters (e.g. minimum read depth) and these entries were translated into 33,589 polypeptide sequences and used for database searching. We discovered 57 splice junction peptides not present in the Uniprot-Trembl proteomic database comprising an array of different splicing events, including skipped exons, alternative donors and acceptors, and noncanonical transcriptional start sites. 
To our knowledge this is the first example of using sample-specific RNA-Seq data to create a splice-junction database and discover new peptides resulting from alternative splicing.
Affiliation(s)
- Gloria M Sheynkman
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Ave., Madison, Wisconsin 53706, USA
|
33
|
Guthals A, Watrous JD, Dorrestein PC, Bandeira N. The spectral networks paradigm in high throughput mass spectrometry. Mol Biosyst 2012; 8:2535-44. [PMID: 22610447 DOI: 10.1039/c2mb25085c] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 12/20/2022]
Abstract
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. 
Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.
Affiliation(s)
- Adrian Guthals
- Dept. Computer Science and Engineering, University of California, San Diego, USA
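The abstract above notes that spectral networks match spectra against other spectra rather than against protein sequences. As a rough illustration of spectrum-to-spectrum comparison, the sketch below computes a cosine similarity between two binned peak lists. It is a simplified stand-in, not the spectral alignment used by spectral networks, which also accounts for mass offsets between related peptides. The function names, the `(m/z, intensity)` peak layout, and the 1.0 m/z bin width are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# A peak is modelled as (m/z, intensity); this layout is hypothetical.
Peak = Tuple[float, float]


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(peaks_a: List[Peak], peaks_b: List[Peak],
                      bin_width: float = 1.0) -> float:
    """Cosine of the angle between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    spectrum_1 = [(100.2, 5.0), (200.7, 3.0)]
    spectrum_2 = [(100.4, 4.0), (300.1, 2.0)]
    print(cosine_similarity(spectrum_1, spectrum_2))
```

Pairs of spectra scoring above some chosen cutoff could then be linked as edges of a network; the cutoff and the edge definition are left open here because the abstract does not specify them.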
|
34
|
Guthals A, Clauser KR, Bandeira N. Shotgun protein sequencing with meta-contig assembly. Mol Cell Proteomics 2012; 11:1084-96. [PMID: 22798278 DOI: 10.1074/mcp.m111.015768] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 01/18/2023] Open
Abstract
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA.
|
35
|
Mass spectrometry and animal science: Protein identification strategies and particularities of farm animal species. J Proteomics 2012; 75:4190-206. [DOI: 10.1016/j.jprot.2012.04.009] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 01/10/2012] [Revised: 03/17/2012] [Accepted: 04/08/2012] [Indexed: 12/30/2022]
|
36
|
Translational plant proteomics: a perspective. J Proteomics 2012; 75:4588-601. [PMID: 22516432 DOI: 10.1016/j.jprot.2012.03.055] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Received: 11/13/2011] [Revised: 02/25/2012] [Accepted: 03/25/2012] [Indexed: 11/21/2022]
Abstract
Translational proteomics is an emerging sub-discipline of the proteomics field in the biological sciences. Translational plant proteomics aims to integrate knowledge from the basic sciences and translate it into field applications that address issues related, but not limited, to the recreational and economic values of plants, food security and safety, and energy sustainability. In this review, we highlight the substantial progress reached in plant proteomics during the past decade, which has paved the way for translational plant proteomics. Increasing proteomics knowledge in plants is not limited to model and non-model plants, proteogenomics, crop improvement, and food analysis, safety, and nutrition, but extends to many more potential applications. Given the wealth of information generated and to some extent applied, there is a need for more efficient and broader channels to freely disseminate the information to the scientific community. This article is part of a Special Issue entitled: Translational Proteomics.
|
37
|
Agrawal GK, Rakwal R. Rice proteomics: A move toward expanded proteome coverage to comparative and functional proteomics uncovers the mysteries of rice and plant biology. Proteomics 2011; 11:1630-49. [DOI: 10.1002/pmic.201000696] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.8] [Received: 10/31/2010] [Revised: 01/05/2011] [Accepted: 01/24/2011] [Indexed: 12/13/2022]
|
38
|
Castellana NE, McCutcheon K, Pham VC, Harden K, Nguyen A, Young J, Adams C, Schroeder K, Arnott D, Bafna V, Grogan JL, Lill JR. Resurrection of a clinical antibody: template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin-α antibody. Proteomics 2011; 11:395-405. [PMID: 21268269 DOI: 10.1002/pmic.201000487] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 08/06/2010] [Revised: 10/12/2010] [Accepted: 11/03/2010] [Indexed: 11/06/2022]
Abstract
A mouse hybridoma antibody directed against a member of the tumour necrosis factor (TNF)-superfamily, lymphotoxin-alpha (LT-α), was isolated from stored mouse ascites and purified to homogeneity. After more than a decade of storage the genetic material was not available for cloning; however, biochemical assays with the ascites showed this antibody against LT-α (LT-3F12) to be a preclinical candidate for the treatment of several inflammatory pathologies. We have successfully rescued the LT-3F12 antibody by performing MS analysis, primary amino acid sequence determination by template proteogenomics, and synthesis of the corresponding recombinant DNA by reverse engineering. The resurrected antibody was expressed, purified and shown to demonstrate the desired specificity and binding properties in a panel of immuno-biochemical tests. The work described herein demonstrates the powerful combination of high-throughput informatic proteomic de novo sequencing with reverse engineering to reestablish monoclonal antibody-expressing cells from an archived protein sample, exemplifying the development of novel therapeutics from cryptic protein sources.
Affiliation(s)
- Natalie E Castellana
- Department of Computer Science, University of California-San Diego, San Diego, CA, USA
|
39
|
Castellana N, Bafna V. Proteogenomics to discover the full coding content of genomes: a computational perspective. J Proteomics 2010; 73:2124-35. [PMID: 20620248 DOI: 10.1016/j.jprot.2010.06.007] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.8] [Received: 03/24/2010] [Revised: 06/04/2010] [Accepted: 06/21/2010] [Indexed: 11/16/2022]
Abstract
Proteogenomics has emerged as a field at the junction of genomics and proteomics. It is a loose collection of technologies that allow the search of tandem mass spectra against genomic databases to identify and characterize protein-coding genes. Proteogenomic peptides provide invaluable information for gene annotation, which is difficult or impossible to ascertain using standard annotation methods. Examples include confirmation of translation, reading-frame determination, identification of gene and exon boundaries, evidence for post-translational processing, identification of splice-forms including alternative splicing, and also, prediction of completely novel genes. For proteogenomics to deliver on its promise, however, it must overcome a number of technological hurdles, including speed and accuracy of peptide identification, construction and search of specialized databases, correction of sampling bias, and others. This article reviews the state of the art of the field, focusing on the current successes, and the role of computation in overcoming these challenges. We describe how technological and algorithmic advances have already enabled large-scale proteogenomic studies in many model organisms, including arabidopsis, yeast, fly, and human. We also provide a preview of the field going forward, describing early efforts in tackling the problems of complex gene structures, searching against genomes of related species, and immunoglobulin gene reconstruction.
Affiliation(s)
- Natalie Castellana
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404, USA
|