1
|
He MT, Li N, Wang JH, Wei ZZ, Feng J, Li WT, Sui JH, Huang N, Dong MQ. Do-It-Yourself De Novo Antibody Sequencing Workflow that Achieves Complete Accuracy of the Variable Regions. J Proteome Res 2025. [PMID: 40323442 DOI: 10.1021/acs.jproteome.5c00210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/08/2025]
Abstract
Antibodies are widely used as research tools or therapeutic agents. Knowing the sequences of the variable regions of an antibody─both the heavy chain and the light chain─is a prerequisite for the production of recombinant antibodies. Mass spectrometry-based de novo sequencing is a frequently used, and sometimes the only approach to gaining this information. Here, we describe a workflow that enables accurate sequence determination of monoclonal antibodies based on mass spectrometry data and freely available software tools. This workflow, which we developed using a homemade anti-FLAG monoclonal antibody as a reference sample, achieved 100% accuracy of the variable regions with clear distinction between leucine (L) and isoleucine (I). Using this workflow, we successfully decoded a monoclonal anti-HA antibody, for which we had no prior knowledge of its sequence. Based on the de novo sequencing result, we generated a recombinant anti-HA antibody, and demonstrated that it has the same specificity, sensitivity, and affinity as the commercial antibody.
Collapse
Affiliation(s)
- Meng-Ting He
- College of Life Sciences, Beijing Normal University, 19 Xinjiekouwai Avenue, Beijing 100875, China
- National Institute of Biological Sciences, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
| | - Ning Li
- National Institute of Biological Sciences, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
| | | | - Zhi-Zhong Wei
- National Institute of Biological Sciences, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
| | - Jie Feng
- National Institute of Biological Sciences, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
| | - Wen-Ting Li
- Bioinformatics Solutions Inc., Waterloo, ON N2L 6J2, Canada
| | - Jian-Hua Sui
- National Institute of Biological Sciences, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
| | - Niu Huang
- National Institute of Biological Sciences, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
| | - Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 100084, China
| |
Collapse
|
2
|
Schulte D, Snijder J. A Handle on Mass Coincidence Errors in De Novo Sequencing of Antibodies by Bottom-up Proteomics. J Proteome Res 2024; 23:3552-3559. [PMID: 38932690 PMCID: PMC11301774 DOI: 10.1021/acs.jproteome.4c00188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2024] [Revised: 05/29/2024] [Accepted: 06/20/2024] [Indexed: 06/28/2024]
Abstract
Antibody sequences can be determined at 99% accuracy directly from the polypeptide product by using bottom-up proteomics techniques. Sequencing accuracy at the peptide level is limited by the isobaric residues leucine and isoleucine, incomplete fragmentation spectra in which the order of two or more residues remains ambiguous due to lacking fragment ions for the intermediate positions, and isobaric combinations of amino acids, of potentially different lengths, for example, GG = N and GA = Q. Here, we present several updates to Stitch (v1.5), which performs template-based assembly of de novo peptides to reconstruct antibody sequences. This version introduces a mass-based alignment algorithm that explicitly accounts for mass coincidence errors. In addition, it incorporates a postprocessing procedure to assign I/L residues based on secondary fragments (satellite ions, i.e., w-ions). Moreover, evidence for sequence assignments can now be directly evaluated with the addition of an integrated spectrum viewer. Lastly, input data from a wider selection of de novo peptide sequencing algorithms are allowed, now including Casanovo, PEAKS, Novor.Cloud, pNovo, and MaxNovo, in addition to flat text and FASTA. Combined, these changes make Stitch compatible with a larger range of data processing pipelines and improve its tolerance to peptide-level sequencing errors.
Collapse
Affiliation(s)
- Douwe Schulte
- Biomolecular Mass Spectrometry
and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht
Institute of Pharmaceutical Sciences, Utrecht
University, Padualaan 8, Utrecht 3584
CH, The Netherlands
| | - Joost Snijder
- Biomolecular Mass Spectrometry
and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht
Institute of Pharmaceutical Sciences, Utrecht
University, Padualaan 8, Utrecht 3584
CH, The Netherlands
| |
Collapse
|
3
|
Sinitcyn P, Richards AL, Weatheritt RJ, Brademan DR, Marx H, Shishkova E, Meyer JG, Hebert AS, Westphall MS, Blencowe BJ, Cox J, Coon JJ. Global detection of human variants and isoforms by deep proteome sequencing. Nat Biotechnol 2023; 41:1776-1786. [PMID: 36959352 PMCID: PMC10713452 DOI: 10.1038/s41587-023-01714-x] [Citation(s) in RCA: 94] [Impact Index Per Article: 47.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2022] [Accepted: 02/15/2023] [Indexed: 03/25/2023]
Abstract
An average shotgun proteomics experiment detects approximately 10,000 human proteins from a single sample. However, individual proteins are typically identified by peptide sequences representing a small fraction of their total amino acids. Hence, an average shotgun experiment fails to distinguish different protein variants and isoforms. Deeper proteome sequencing is therefore required for the global discovery of protein isoforms. Using six different human cell lines, six proteases, deep fractionation and three tandem mass spectrometry fragmentation methods, we identify a million unique peptides from 17,717 protein groups, with a median sequence coverage of approximately 80%. Direct comparison with RNA expression data provides evidence for the translation of most nonsynonymous variants. We have also hypothesized that undetected variants likely arise from mutation-induced protein instability. We further observe comparable detection rates for exon-exon junction peptides representing constitutive and alternative splicing events. Our dataset represents a resource for proteoform discovery and provides direct evidence that most frame-preserving alternatively spliced isoforms are translated.
Collapse
Affiliation(s)
- Pavel Sinitcyn
- Computational Systems Biochemistry Research Group, Max Planck Institute of Biochemistry, Martinsried, Germany
- Morgridge Institute for Research, Madison, WI, USA
| | - Alicia L Richards
- National Center for Quantitative Biology of Complex Systems, University of Wisconsin-Madison, Madison, WI, USA
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Robert J Weatheritt
- EMBL Australia and Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
| | - Dain R Brademan
- Morgridge Institute for Research, Madison, WI, USA
- Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Harald Marx
- National Center for Quantitative Biology of Complex Systems, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, USA
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | - Evgenia Shishkova
- National Center for Quantitative Biology of Complex Systems, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Jesse G Meyer
- National Center for Quantitative Biology of Complex Systems, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Alexander S Hebert
- National Center for Quantitative Biology of Complex Systems, University of Wisconsin-Madison, Madison, WI, USA
| | - Michael S Westphall
- National Center for Quantitative Biology of Complex Systems, University of Wisconsin-Madison, Madison, WI, USA
- Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, USA
| | - Benjamin J Blencowe
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Jürgen Cox
- Computational Systems Biochemistry Research Group, Max Planck Institute of Biochemistry, Martinsried, Germany.
| | - Joshua J Coon
- Morgridge Institute for Research, Madison, WI, USA.
- National Center for Quantitative Biology of Complex Systems, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, USA.
| |
Collapse
|
4
|
Beslic D, Tscheuschner G, Renard BY, Weller MG, Muth T. Comprehensive evaluation of peptide de novo sequencing tools for monoclonal antibody assembly. Brief Bioinform 2023; 24:bbac542. [PMID: 36545804 PMCID: PMC9851299 DOI: 10.1093/bib/bbac542] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2022] [Revised: 10/25/2022] [Accepted: 11/10/2022] [Indexed: 12/24/2022] Open
Abstract
Monoclonal antibodies are biotechnologically produced proteins with various applications in research, therapeutics and diagnostics. Their ability to recognize and bind to specific molecule structures makes them essential research tools and therapeutic agents. Sequence information of antibodies is helpful for understanding antibody-antigen interactions and ensuring their affinity and specificity. De novo protein sequencing based on mass spectrometry is a valuable method to obtain the amino acid sequence of peptides and proteins without a priori knowledge. In this study, we evaluated six recently developed de novo peptide sequencing algorithms (Novor, pNovo 3, DeepNovo, SMSNet, PointNovo and Casanovo), which were not specifically designed for antibody data. We validated their ability to identify and assemble antibody sequences on three multi-enzymatic data sets. The deep learning-based tools Casanovo and PointNovo showed an increased peptide recall across different enzymes and data sets compared with spectrum-graph-based approaches. We evaluated different error types of de novo peptide sequencing tools and their performance for different numbers of missing cleavage sites, noisy spectra and peptides of various lengths. We achieved a sequence coverage of 97.69-99.53% on the light chains of three different antibody data sets using the de Bruijn assembler ALPS and the predictions from Casanovo. However, low sequence coverage and accuracy on the heavy chains demonstrate that complete de novo protein sequencing remains a challenging issue in proteomics that requires improved de novo error correction, alternative digestion strategies and hybrid approaches such as homology search to achieve high accuracy on long protein sequences.
Collapse
Affiliation(s)
- Denis Beslic
- Robert Koch Institute, MF1, Nordufer 20, 13353 Berlin
| | - Georg Tscheuschner
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| | - Bernhard Y Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Prof.-Dr.-Helmert-Straße 2-3, 14482 Potsdam
| | - Michael G Weller
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| | - Thilo Muth
- Federal Institute for Materials Research and Testing (BAM), Richard-Willstätter-Straße 11, 12489 Berlin
| |
Collapse
|
5
|
Abstract
Accurate full-length sequencing of a purified unknown protein is still challenging nowadays due to the error-prone mass-spectrometry (MS)-based methods. De novo identified peptide sequence largely contain errors, undermining the accuracy of assembly. Bias on the detectability of the peptides also makes low-coverage regions, resulting in gaps. Although recent advances on multi-enzyme hydrolysis and algorithms showed complete assembly of full-length protein sequences in a few examples, the robustness in practical application is still to be improved. Here, inspired by genome assembly strategies, we demonstrate a contig-scaffolding strategy to assemble protein sequences with high robustness and accuracy. This strategy integrates multiple unspecific hydrolysis methods to minimize the bias in the hydrolysis process. After de novo identification of the peptides, our assembly algorithm, named Multiple Contigs & Scaffolding (MuCS), assembles the peptide sequences in a multistep, i.e., contig-scaffold manner, with error correction in each step. MS data from different hydrolysis experiments complement each other for robust contig extension and error correction. We demonstrated that our strategy on three proteins and three replications all reached 100% coverage (except one with 98.85%) and 98.69-100% accuracy. It can also efficiently deal with the membrane protein, although the transmembrane region was missing due to the limitation of the MS. The three replicates reached 88.85-92.57% coverage and 97.57-100% accuracy. In sum, we provided a practical, robust, and accurate solution for full-length protein sequencing. The MuCS software is available at http://chi-biotech.com/mucs/.
Collapse
Affiliation(s)
- Zhi-Biao Mai
- Big Data Decision Institute, Jinan University, Guangzhou 510632, China.,Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
| | - Zhong-Hua Zhou
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
| | - Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
| | - Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
| |
Collapse
|
6
|
de Graaf SC, Hoek M, Tamara S, Heck AJR. A perspective toward mass spectrometry-based de novo sequencing of endogenous antibodies. MAbs 2022; 14:2079449. [PMID: 35699511 PMCID: PMC9225641 DOI: 10.1080/19420862.2022.2079449] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
A key step in therapeutic and endogenous humoral antibody characterization is identifying the amino acid sequence. So far, this task has been mainly tackled through sequencing of B-cell receptor (BCR) repertoires at the nucleotide level. Mass spectrometry (MS) has emerged as an alternative tool for obtaining sequence information directly at the – most relevant – protein level. Although several MS methods are now well established, analysis of recombinant and endogenous antibodies comes with a specific set of challenges, requiring approaches beyond the conventional proteomics workflows. Here, we review the challenges in MS-based sequencing of both recombinant as well as endogenous humoral antibodies and outline state-of-the-art methods attempting to overcome these obstacles. We highlight recent examples and discuss remaining challenges. We foresee a great future for these approaches making de novo antibody sequencing and discovery by MS-based techniques feasible, even for complex clinical samples from endogenous sources such as serum and other liquid biopsies.
Collapse
Affiliation(s)
- Sebastiaan C de Graaf
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands.,Netherlands Proteomics Center, Utrecht, Netherlands
| | - Max Hoek
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands.,Netherlands Proteomics Center, Utrecht, Netherlands
| | - Sem Tamara
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands.,Netherlands Proteomics Center, Utrecht, Netherlands
| | - Albert J R Heck
- Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute for Pharmaceutical Sciences, University of Utrecht, Utrecht, Netherlands.,Netherlands Proteomics Center, Utrecht, Netherlands
| |
Collapse
|
7
|
Shaw JB, Liu W, Vasil′ev YV, Bracken CC, Malhan N, Guthals A, Beckman JS, Voinov VG. Direct Determination of Antibody Chain Pairing by Top-down and Middle-down Mass Spectrometry Using Electron Capture Dissociation and Ultraviolet Photodissociation. Anal Chem 2020; 92:766-773. [PMID: 31769659 PMCID: PMC7819135 DOI: 10.1021/acs.analchem.9b03129] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
One challenge associated with the discovery and development of monoclonal antibody (mAb) therapeutics is the determination of heavy chain and light chain pairing. Advances in MS instrumentation and MS/MS methods have greatly enhanced capabilities for the analysis of large intact proteins yielding much more detailed and accurate proteoform characterization. Consequently, direct interrogation of intact antibodies or F(ab')2 and Fab fragments has the potential to significantly streamline therapeutic mAb discovery processes. Here, we demonstrate for the first time the ability to efficiently cleave disulfide bonds linking heavy and light chains of mAbs using electron capture dissociation (ECD) and 157 nm ultraviolet photodissociation (UVPD). The combination of intact mAb, Fab, or F(ab')2 mass, intact LC and Fd masses, and CDR3 sequence coverage enabled determination of heavy chain and light chain pairing from a single experiment and experimental condition. These results demonstrate the potential of top-down and middle-down proteomics to significantly streamline therapeutic antibody discovery.
Collapse
Affiliation(s)
- Jared B. Shaw
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, 3335 Innovation Boulevard, Richland, Washington 99354, United States
| | - Weijing Liu
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, 3335 Innovation Boulevard, Richland, Washington 99354, United States
| | - Yury V. Vasil′ev
- e-MSion Inc., 2121 NE Jack London Drive, Corvallis, Oregon 97330, United States
- Linus Pauling Institute and the Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon 97331, United States
| | - Carter C. Bracken
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, 3335 Innovation Boulevard, Richland, Washington 99354, United States
| | - Neha Malhan
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, 3335 Innovation Boulevard, Richland, Washington 99354, United States
| | - Adrian Guthals
- Mapp Biopharmaceutical Inc., 6160 Lusk Boulevard #105, San Diego, California 92121, United States
| | - Joseph S. Beckman
- e-MSion Inc., 2121 NE Jack London Drive, Corvallis, Oregon 97330, United States
- Linus Pauling Institute and the Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon 97331, United States
| | - Valery G. Voinov
- e-MSion Inc., 2121 NE Jack London Drive, Corvallis, Oregon 97330, United States
- Linus Pauling Institute and the Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon 97331, United States
| |
Collapse
|
8
|
Muth T, Hartkopf F, Vaudel M, Renard BY. A Potential Golden Age to Come-Current Tools, Recent Use Cases, and Future Avenues for De Novo Sequencing in Proteomics. Proteomics 2018; 18:e1700150. [PMID: 29968278 DOI: 10.1002/pmic.201700150] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2018] [Revised: 05/23/2018] [Indexed: 01/15/2023]
Abstract
In shotgun proteomics, peptide and protein identification is most commonly conducted using database search engines, the method of choice when reference protein sequences are available. Despite its widespread use the database-driven approach is limited, mainly because of its static search space. In contrast, de novo sequencing derives peptide sequence information in an unbiased manner, using only the fragment ion information from the tandem mass spectra. In recent years, with the improvements in MS instrumentation, various new methods have been proposed for de novo sequencing. This review article provides an overview of existing de novo sequencing algorithms and software tools ranging from peptide sequencing to sequence-to-protein mapping. Various use cases are described for which de novo sequencing was successfully applied. Finally, limitations of current methods are highlighted and new directions are discussed for a wider acceptance of de novo sequencing in the community.
Collapse
Affiliation(s)
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
| | - Felix Hartkopf
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
| | - Marc Vaudel
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway.,Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5020, Bergen, Norway
| | - Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
| |
Collapse
|
9
|
Trevisan-Silva D, Bednaski AV, Fischer JSG, Veiga SS, Bandeira N, Guthals A, Marchini FK, Leprevost FV, Barbosa VC, Senff-Ribeiro A, Carvalho PC. A multi-protease, multi-dissociation, bottom-up-to-top-down proteomic view of the Loxosceles intermedia venom. Sci Data 2017; 4:170090. [PMID: 28696408 PMCID: PMC5505115 DOI: 10.1038/sdata.2017.90] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2016] [Accepted: 05/12/2017] [Indexed: 12/15/2022] Open
Abstract
Venoms are a rich source for the discovery of molecules with biotechnological applications, but their analysis is challenging even for state-of-the-art proteomics. Here we report on a large-scale proteomic assessment of the venom of Loxosceles intermedia, the so-called brown spider. Venom was extracted from 200 spiders and fractioned into two aliquots relative to a 10 kDa cutoff mass. Each of these was further fractioned and digested with trypsin (4 h), trypsin (18 h), pepsin (18 h), and chymotrypsin (18 h), then analyzed by MudPIT on an LTQ-Orbitrap XL ETD mass spectrometer fragmenting precursors by CID, HCD, and ETD. Aliquots of undigested samples were also analyzed. Our experimental design allowed us to apply spectral networks, thus enabling us to obtain meta-contig assemblies, and consequently de novo sequencing of practically complete proteins, culminating in a deep proteome assessment of the venom. Data are available via ProteomeXchange, with identifier PXD005523.
Collapse
Affiliation(s)
- Dilza Trevisan-Silva
- Department of Cell Biology, Federal University of Paraná, Curitiba 81531-980, Brazil
| | - Aline V Bednaski
- Department of Cell Biology, Federal University of Paraná, Curitiba 81531-980, Brazil
| | - Juliana S G Fischer
- Computational Mass Spectrometry &Proteomics Group, Carlos Chagas Institute, Fiocruz, Curitiba 81.350-010, Brazil
| | - Silvio S Veiga
- Department of Cell Biology, Federal University of Paraná, Curitiba 81531-980, Brazil
| | - Nuno Bandeira
- Center for Computational Mass Spectrometry, University of California, San Diego 92093-0404, USA
| | - Adrian Guthals
- Center for Computational Mass Spectrometry, University of California, San Diego 92093-0404, USA
| | - Fabricio K Marchini
- Functional Genomics Laboratory, Carlos Chagas Institute, Fiocruz, Curitiba 81.350-010, Brazil.,Mass Spectrometry Facility RPT02H, Carlos Chagas Institute, Fiocruz, Curitiba 81.350-010, Brazil
| | - Felipe V Leprevost
- Computational Mass Spectrometry &Proteomics Group, Carlos Chagas Institute, Fiocruz, Curitiba 81.350-010, Brazil
| | - Valmir C Barbosa
- Systems Engineering and Computer Science Program, COPPE, Federal University of Rio de Janeiro, Rio de Janeiro 21941-914, Brazil
| | - Andrea Senff-Ribeiro
- Department of Cell Biology, Federal University of Paraná, Curitiba 81531-980, Brazil
| | - Paulo C Carvalho
- Computational Mass Spectrometry &Proteomics Group, Carlos Chagas Institute, Fiocruz, Curitiba 81.350-010, Brazil.,Laboratory of Toxinology, Fiocruz, Rio de Janeiro 21040-900, Brazil
| |
Collapse
|
10
|
Ruggles KV, Krug K, Wang X, Clauser KR, Wang J, Payne SH, Fenyö D, Zhang B, Mani DR. Methods, Tools and Current Perspectives in Proteogenomics. Mol Cell Proteomics 2017; 16:959-981. [PMID: 28456751 DOI: 10.1074/mcp.mr117.000024] [Citation(s) in RCA: 95] [Impact Index Per Article: 11.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2017] [Indexed: 12/20/2022] Open
Abstract
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies have identified novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches and in this article, we systematically classify published methods and tools into four major categories, (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications.
Collapse
Affiliation(s)
- Kelly V Ruggles
- From the ‡Department of Medicine, New York University School of Medicine, New York, New York 10016
| | - Karsten Krug
- §The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
| | - Xiaojing Wang
- ¶Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030.,‖Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
| | - Karl R Clauser
- §The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
| | - Jing Wang
- ¶Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030.,‖Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
| | - Samuel H Payne
- **Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354
| | - David Fenyö
- ‡‡Department of Biochemistry and Molecular Pharmacology, New York University School of Medicine, New York, New York 10016; .,§§Institute for Systems Genetics, New York University School of Medicine, New York, New York 10016
| | - Bing Zhang
- ¶Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas 77030; .,‖Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030
| | - D R Mani
- §The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142;
| |
Collapse
|
11
|
Savidor A, Barzilay R, Elinger D, Yarden Y, Lindzen M, Gabashvili A, Adiv Tal O, Levin Y. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination. Mol Cell Proteomics 2017; 16:1151-1161. [PMID: 28348172 DOI: 10.1074/mcp.o116.065417] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2016] [Revised: 03/22/2017] [Indexed: 01/16/2023] Open
Abstract
Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence.
Collapse
Affiliation(s)
- Alon Savidor
- From ‡The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
| | - Rotem Barzilay
- From ‡The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
| | - Dalia Elinger
- From ‡The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
| | - Yosef Yarden
- the §Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel 76100
| | - Moshit Lindzen
- the §Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel 76100
| | - Alexandra Gabashvili
- From ‡The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
| | - Ophir Adiv Tal
- From ‡The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
| | - Yishai Levin
- From ‡The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot;
| |
Collapse
|
12
|
Guthals A, Gan Y, Murray L, Chen Y, Stinson J, Nakamura G, Lill JR, Sandoval W, Bandeira N. De Novo MS/MS Sequencing of Native Human Antibodies. J Proteome Res 2016; 16:45-54. [PMID: 27779884 DOI: 10.1021/acs.jproteome.6b00608] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
One direct route for the discovery of therapeutic human monoclonal antibodies (mAbs) involves the isolation of peripheral B cells from survivors/sero-positive individuals after exposure to an infectious reagent or disease etiology, followed by single-cell sequencing or hybridoma generation. Peripheral B cells, however, are not always easy to obtain and represent only a small percentage of the total B-cell population across all bodily tissues. Although it has been demonstrated that tandem mass spectrometry (MS/MS) techniques can interrogate the full polyclonal antibody (pAb) response to an antigen in vivo, all current approaches identify MS/MS spectra against databases derived from genetic sequencing of B cells from the same patient. In this proof-of-concept study, we demonstrate the feasibility of a novel MS/MS antibody discovery approach in which only serum antibodies are required without the need for sequencing of genetic material. Peripheral pAbs from a cytomegalovirus-exposed individual were purified by glycoprotein B antigen affinity and de novo sequenced from MS/MS data. Purely MS-derived mAbs were then manufactured in mammalian cells to validate potency via antigen-binding ELISA. Interestingly, we found that these mAbs accounted for 1 to 2% of total donor IgG but were not detected in parallel sequencing of memory B cells from the same patient.
Collapse
Affiliation(s)
- Adrian Guthals
- Mapp Biopharmaceutical, Inc. , 6160 Lusk Boulevard #C105, San Diego, California 92121, United States
| | - Yutian Gan
- Department of Proteomics & Biological Resources, Genentech, Inc. , South San Francisco, California 94080, United States
| | - Laura Murray
- Department of Protein Chemistry, Genentech, Inc. , South San Francisco, California 94080, United States
| | - Yongmei Chen
- Department of Antibody Engineering, Genentech, Inc. , South San Francisco, California 94080, United States
| | - Jeremy Stinson
- Department of Molecular Biology, Genentech, Inc. , South San Francisco, California 94080, United States
| | - Gerald Nakamura
- Department of Antibody Engineering, Genentech, Inc. , South San Francisco, California 94080, United States
| | - Jennie R Lill
- Department of Proteomics & Biological Resources, Genentech, Inc. , South San Francisco, California 94080, United States
| | - Wendy Sandoval
- Department of Proteomics & Biological Resources, Genentech, Inc. , South San Francisco, California 94080, United States
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego , 9500 Gilman Drive, Mail Code 0404, La Jolla, California 92093, United States.,Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego , 9500 Gilman Drive, Mail Code 0657, La Jolla, California 92093, United States
| |
Collapse
|
13
|
Na S, Payne SH, Bandeira N. Multi-species Identification of Polymorphic Peptide Variants via Propagation in Spectral Networks. Mol Cell Proteomics 2016; 15:3501-3512. [PMID: 27609420 PMCID: PMC5098046 DOI: 10.1074/mcp.o116.060913] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2016] [Indexed: 11/25/2022] Open
Abstract
Peptide and protein identification remains challenging in organisms with poorly annotated or rapidly evolving genomes, as are commonly encountered in environmental or biofuels research. Such limitations render tandem mass spectrometry (MS/MS) database search algorithms ineffective as they lack corresponding sequences required for peptide-spectrum matching. We address this challenge with the spectral networks approach to (1) match spectra of orthologous peptides across multiple related species and then (2) propagate peptide annotations from identified to unidentified spectra. We here present algorithms to assess the statistical significance of spectral alignments (Align-GF), reduce the impurity in spectral networks, and accurately estimate the error rate in propagated identifications. Analyzing three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides from highly divergent sequences from networks with dozens of variant peptides, including thousands of peptides in species lacking a sequenced genome. Our analysis further detected the presence of many novel putative peptides even in genomically characterized species, thus suggesting the possibility of gaps in our understanding of their proteomic and genomic expression. A web-based pipeline for spectral networks analysis is available at http://proteomics.ucsd.edu/software.
Collapse
Affiliation(s)
- Seungjin Na
- From the ‡Dept. of Computer Science and Engineering, University of California, San Diego, La Jolla, California, 92093.,§Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, California, 92093
| | - Samuel H Payne
- ¶Pacific Northwest National Laboratory, Richland, Washington 99354
| | - Nuno Bandeira
- From the ‡Dept. of Computer Science and Engineering, University of California, San Diego, La Jolla, California, 92093; .,§Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, California, 92093.,‖Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California, 92093
| |
Collapse
|
14
|
Complete De Novo Assembly of Monoclonal Antibody Sequences. Sci Rep 2016; 6:31730. [PMID: 27562653 PMCID: PMC4999880 DOI: 10.1038/srep31730] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2016] [Accepted: 07/20/2016] [Indexed: 11/25/2022] Open
Abstract
De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies for which genome information is often limited or not available. However, due to limitations in peptides fragmentation and coverage, as well as ambiguities in spectra interpretation, complete de novo assembly of unknown protein sequences still remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequencing peptides, their quality scores and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated ALPS performance on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216–441 AA, at 100% coverage, and 96.64–100% accuracy.
Collapse
|
15
|
Perez-Riverol Y, Alpi E, Wang R, Hermjakob H, Vizcaíno JA. Making proteomics data accessible and reusable: current state of proteomics databases and repositories. Proteomics 2015; 15:930-49. [PMID: 25158685 PMCID: PMC4409848 DOI: 10.1002/pmic.201400302] [Citation(s) in RCA: 141] [Impact Index Per Article: 14.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2014] [Revised: 08/06/2014] [Accepted: 08/22/2014] [Indexed: 01/10/2023]
Abstract
Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics, data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has been recently developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we will review each of the major proteomics resources independently and some tools that enable the integration, mining and reuse of the data. We will also discuss some of the major challenges and current pitfalls in the integration and sharing of the data.
Collapse
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | | | | | | | | |
Collapse
|
16
|
Sunagar K, Morgenstern D, Reitzel AM, Moran Y. Ecological venomics: How genomics, transcriptomics and proteomics can shed new light on the ecology and evolution of venom. J Proteomics 2015; 135:62-72. [PMID: 26385003 DOI: 10.1016/j.jprot.2015.09.015] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2015] [Revised: 09/02/2015] [Accepted: 09/09/2015] [Indexed: 01/18/2023]
Abstract
Animal venom is a complex cocktail of bioactive chemicals that traditionally drew interest mostly from biochemists and pharmacologists. However, in recent years the evolutionary and ecological importance of venom is realized as this trait has direct and strong influence on interactions between species. Moreover, venom content can be modulated by environmental factors. Like many other fields of biology, venom research has been revolutionized in recent years by the introduction of systems biology approaches, i.e., genomics, transcriptomics and proteomics. The employment of these methods in venom research is known as 'venomics'. In this review we describe the history and recent advancements of venomics and discuss how they are employed in studying venom in general and in particular in the context of evolutionary ecology. We also discuss the pitfalls and challenges of venomics and what the future may hold for this emerging scientific field.
Collapse
Affiliation(s)
- Kartik Sunagar
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Hebrew University of Jerusalem, Jerusalem 91904, Israel
| | - David Morgenstern
- Proteomics Resource Center, Langone Medical Center, New York University, New York, USA.
| | - Adam M Reitzel
- Department of Biological Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Yehu Moran
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, Hebrew University of Jerusalem, Jerusalem 91904, Israel.
| |
Collapse
|
17
|
Guthals A, Boucher C, Bandeira N. The generating function approach for Peptide identification in spectral networks. J Comput Biol 2015; 22:353-66. [PMID: 25423621 PMCID: PMC4425220 DOI: 10.1089/cmb.2014.0165] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Tandem mass (MS/MS) spectrometry has become the method of choice for protein identification and has launched a quest for the identification of every translated protein and peptide. However, computational developments have lagged behind the pace of modern data acquisition protocols and have become a major bottleneck in proteomics analysis of complex samples. As it stands today, attempts to identify MS/MS spectra against large databases (e.g., the human microbiome or 6-frame translation of the human genome) face a search space that is 10-100 times larger than the human proteome, where it becomes increasingly challenging to separate between true and false peptide matches. As a result, the sensitivity of current state-of-the-art database search methods drops by nearly 38% to such low identification rates that almost 90% of all MS/MS spectra are left as unidentified. We address this problem by extending the generating function approach to rigorously compute the joint spectral probability of multiple spectra being matched to peptides with overlapping sequences, thus enabling the confident assignment of higher significance to overlapping peptide-spectrum matches (PSMs). We find that these joint spectral probabilities can be several orders of magnitude more significant than individual PSMs, even in the ideal case when perfect separation between signal and noise peaks could be achieved per individual MS/MS spectrum. After benchmarking this approach on a typical lysate MS/MS dataset, we show that the proposed intersecting spectral probabilities for spectra from overlapping peptides improve peptide identification by 30-62%.
Collapse
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California–San Diego, La Jolla, California
| | - Christina Boucher
- Department of Computer Science, Colorado State University, Fort Collins, Colorado
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California–San Diego, La Jolla, California
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California–San Diego, La Jolla, California
| |
Collapse
|
18
|
Liu X, Dekker LJM, Wu S, Vanduijn MM, Luider TM, Tolić N, Kou Q, Dvorkin M, Alexandrova S, Vyatkina K, Paša-Tolić L, Pevzner PA. De Novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. J Proteome Res 2014; 13:3241-8. [DOI: 10.1021/pr401300m] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Affiliation(s)
- Xiaowen Liu
- Department
of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States
- Center
for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, Indiana 46202, United States
| | - Lennard J. M. Dekker
- Department
of Neurology, Erasmus University Medical Center, Postbus 2040, 3000
CA Rotterdam, The Netherlands
| | - Si Wu
- Environmental
Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Martijn M. Vanduijn
- Department
of Neurology, Erasmus University Medical Center, Postbus 2040, 3000
CA Rotterdam, The Netherlands
| | - Theo M. Luider
- Department
of Neurology, Erasmus University Medical Center, Postbus 2040, 3000
CA Rotterdam, The Netherlands
| | - Nikola Tolić
- Environmental
Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Qiang Kou
- Department
of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States
| | - Mikhail Dvorkin
- Algorithmic
Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina
Str, St. Petersburg 194021, Russia
| | - Sonya Alexandrova
- Algorithmic
Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina
Str, St. Petersburg 194021, Russia
| | - Kira Vyatkina
- Algorithmic
Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina
Str, St. Petersburg 194021, Russia
| | - Ljiljana Paša-Tolić
- Environmental
Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Pavel A. Pevzner
- Department
of Computer Science and Engineering, University of California, 9500 Gilman
Drive, San Diego, California 92093, United States
| |
Collapse
|
19
|
Leprevost FV, Valente RH, Lima DB, Perales J, Melani R, Yates JR, Barbosa VC, Junqueira M, Carvalho PC. PepExplorer: a similarity-driven tool for analyzing de novo sequencing results. Mol Cell Proteomics 2014; 13:2480-9. [PMID: 24878498 DOI: 10.1074/mcp.m113.037002] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
Peptide spectrum matching is the current gold standard for protein identification via mass-spectrometry-based proteomics. Peptide spectrum matching compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, can make it possible to infer a peptide sequence directly from a mass spectrum, but interpreting long lists of peptide sequences resulting from large-scale experiments is not trivial. With this as motivation, PepExplorer was developed to use rigorous pattern recognition to assemble a list of homologue proteins using de novo sequencing data coupled to sequence alignment to allow biological interpretation of the data. PepExplorer can read the output of various widely adopted de novo sequencing tools and converge to a list of proteins with a global false-discovery rate. To this end, it employs a radial basis function neural network that considers precursor charge states, de novo sequencing scores, peptide lengths, and alignment scores to select similar protein candidates, from a target-decoy database, usually obtained from phylogenetically related species. Alignments are performed using a modified Smith-Waterman algorithm tailored for the task at hand. We verified the effectiveness of our approach using a reference set of identifications generated by ProLuCID when searching for Pyrococcus furiosus mass spectra on the corresponding NCBI RefSeq database. We then modified the sequence database by swapping amino acids until ProLuCID was no longer capable of identifying any proteins. By searching the mass spectra using PepExplorer on the modified database, we were able to recover most of the identifications at a 1% false-discovery rate. Finally, we employed PepExplorer to disclose a comprehensive proteomic assessment of the Bothrops jararaca plasma, a known biological source of natural inhibitors of snake toxins. PepExplorer is integrated into the PatternLab for Proteomics environment, which makes available various tools for downstream data analysis, including resources for quantitative and differential proteomics.
Collapse
Affiliation(s)
- Felipe V Leprevost
- From the ‡Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil
| | - Richard H Valente
- §Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro, Brazil; ¶Instituto Nacional de Ciência e Tecnologia em Toxinas (INCTTox/CNPq), Brazil
| | - Diogo B Lima
- From the ‡Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil
| | - Jonas Perales
- §Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro, Brazil; ¶Instituto Nacional de Ciência e Tecnologia em Toxinas (INCTTox/CNPq), Brazil
| | - Rafael Melani
- ‖Proteomics Unit, Rio de Janeiro Proteomics Network, Department of Biochemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - John R Yates
- **Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California
| | - Valmir C Barbosa
- ‡‡Systems Engineering and Computer Science Program, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Magno Junqueira
- ‖Proteomics Unit, Rio de Janeiro Proteomics Network, Department of Biochemistry, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
| | - Paulo C Carvalho
- From the ‡Laboratory for Proteomics and Protein Engineering, Carlos Chagas Institute, Fiocruz, Paraná, Brazil
| |
Collapse
|
20
|
Richards AL, Vincent CE, Guthals A, Rose CM, Westphall MS, Bandeira N, Coon JJ. Neutron-encoded signatures enable product ion annotation from tandem mass spectra. Mol Cell Proteomics 2013; 12:3812-23. [PMID: 24043425 DOI: 10.1074/mcp.m113.028951] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open
Abstract
We report the use of neutron-encoded (NeuCode) stable isotope labeling of amino acids in cell culture for the purpose of C-terminal product ion annotation. Two NeuCode labeling isotopologues of lysine, (13)C6(15)N2 and (2)H8, which differ by 36 mDa, were metabolically embedded in a sample proteome, and the resultant labeled proteins were combined, digested, and analyzed via liquid chromatography and mass spectrometry. With MS/MS scan resolving powers of ~50,000 or higher, product ions containing the C terminus (i.e. lysine) appear as a doublet spaced by exactly 36 mDa, whereas N-terminal fragments exist as a single m/z peak. Through theory and experiment, we demonstrate that over 90% of all y-type product ions have detectable doublets. We report on an algorithm that can extract these neutron signatures with high sensitivity and specificity. In other words, of 15,503 y-type product ion peaks, the y-type ion identification algorithm correctly identified 14,552 (93.2%) based on detection of the NeuCode doublet; 6.8% were misclassified (i.e. other ion types that were assigned as y-type products). Searching NeuCode labeled yeast with PepNovo(+) resulted in a 34% increase in correct de novo identifications relative to searching through MS/MS only. We use this tool to simplify spectra prior to database searching, to sort unmatched tandem mass spectra for spectral richness, for correlation of co-fragmented ions to their parent precursor, and for de novo sequence identification.
Collapse
Affiliation(s)
- Alicia L Richards
- Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
| | | | | | | | | | | | | |
Collapse
|
21
|
Guthals A, Clauser KR, Frank AM, Bandeira N. Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides. J Proteome Res 2013; 12:2846-57. [PMID: 23679345 DOI: 10.1021/pr400173d] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
Abstract
Full-length de novo sequencing of unknown proteins remains a challenging open problem. Traditional methods that sequence spectra individually are limited by short peptide length, incomplete peptide fragmentation, and ambiguous de novo interpretations. We address these issues by determining consensus sequences for assembled tandem mass (MS/MS) spectra from overlapping peptides (e.g., by using multiple enzymatic digests). We have combined electron-transfer dissociation (ETD) with collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) fragmentation methods to boost interpretation of long, highly charged peptides and take advantage of corroborating b/y/c/z ions in CID/HCD/ETD. Using these strategies, we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides yield de novo sequences of average length 70 AA and as long as 200 AA at up to 99% sequencing accuracy.
Collapse
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, United States
| | | | | | | |
Collapse
|
22
|
Guthals A, Watrous JD, Dorrestein PC, Bandeira N. The spectral networks paradigm in high throughput mass spectrometry. MOLECULAR BIOSYSTEMS 2013; 8:2535-44. [PMID: 22610447 DOI: 10.1039/c2mb25085c] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.
Collapse
Affiliation(s)
- Adrian Guthals
- Dept. Computer Science and Engineering, University of California, San Diego, USA
| | | | | | | |
Collapse
|