1
Plouviez M, Dubreucq E. Key Proteomics Tools for Fundamental and Applied Microalgal Research. Proteomes 2024; 12:13. [PMID: 38651372 PMCID: PMC11036299 DOI: 10.3390/proteomes12020013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 03/28/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024] Open
Abstract
Microscopic, photosynthetic prokaryotes and eukaryotes, collectively referred to as microalgae, are widely studied to improve our understanding of key metabolic pathways (e.g., photosynthesis) and for the development of biotechnological applications. Omics technologies, which are now common tools in biological research, have been shown to be critical in microalgal research. In the past decade, significant technological advancements have allowed omics technologies to become more affordable and efficient, with huge datasets being generated. In particular, whereas studies decades ago focused on one or a few proteins, it is now possible to study the whole proteome of a microalga. The development of mass spectrometry-based methods has provided this leap forward with the high-throughput identification and quantification of proteins. This review specifically provides an overview of the use of proteomics in fundamental (e.g., photosynthesis) and applied (e.g., lipid production for biofuel) microalgal research, and presents future research directions in this field.
Affiliation(s)
- Maxence Plouviez
- School of Agriculture and Environment, Massey University, Palmerston North 4410, New Zealand
- The Cawthron Institute, Nelson 7010, New Zealand
- Eric Dubreucq
- Agropolymer Engineering and Emerging Technologies, L’Institut Agro Montpellier, 34060 Montpellier, France
2
Bobalova J, Strouhalova D, Bobal P. Common Post-translational Modifications (PTMs) of Proteins: Analysis by Up-to-Date Analytical Techniques with an Emphasis on Barley. Journal of Agricultural and Food Chemistry 2023; 71:14825-14837. [PMID: 37792446 PMCID: PMC10591476 DOI: 10.1021/acs.jafc.3c00886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 09/07/2023] [Accepted: 09/12/2023] [Indexed: 10/05/2023]
Abstract
Post-translational modifications (PTMs) of biomacromolecules can be useful for understanding the processes by which a relatively small number of individual genes in a particular genome can generate enormous biological complexity in different organisms. The proteomes of barley and the brewing process were investigated by different techniques. However, their diverse and complex PTMs remain understudied. As standard analytical approaches have limitations, innovative analytical approaches need to be developed and applied in PTM studies. To make further progress in this field, it is necessary to specify the sites of modification, as well as to characterize individual isoforms with increased selectivity and sensitivity. This review summarizes advances in the PTM analysis of barley proteins, particularly those involving mass spectrometric detection. Our focus is on monitoring phosphorylation, glycation, and glycosylation, which critically influence functional behavior in metabolism and regulation in organisms.
Affiliation(s)
- Janette Bobalova
- Institute of Analytical Chemistry of the CAS, v. v. i., Veveri 97, Brno 602 00, Czech Republic
- Dana Strouhalova
- Institute of Analytical Chemistry of the CAS, v. v. i., Veveri 97, Brno 602 00, Czech Republic
- Pavel Bobal
- Masaryk University, Department of Chemical Drugs, Faculty of Pharmacy, Palackeho 1946/1, Brno 612 00, Czech Republic
3
Sun B, Liu Z, Liu J, Zhao S, Wang L, Wang F. The utility of proteases in proteomics, from sequence profiling to structure and function analysis. Proteomics 2023; 23:e2200132. [PMID: 36382392 DOI: 10.1002/pmic.202200132] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/11/2022] [Revised: 11/08/2022] [Accepted: 11/08/2022] [Indexed: 11/18/2022]
Abstract
In mass spectrometry (MS)-based bottom-up proteomics, protease digestion plays an essential role in profiling both proteome sequences and post-translational modifications (PTMs). Trypsin is the gold standard in digesting intact proteins into small-size peptides, which are more suitable for high-performance liquid chromatography (HPLC) separation and tandem MS (MS/MS) characterization. However, protein sequences lacking Lys and Arg cannot be cleaved by trypsin and may be missed in conventional proteomic analysis. Proteases with cleavage sites complementary to trypsin are widely applied in proteomic analysis to greatly improve the coverage of proteome sequences and PTM sites. In this review, we survey the common and newly emerging proteases used in proteomics analysis mainly in the last 5 years, focusing on their unique cleavage features and specific proteomics applications such as missing protein characterization, new PTM discovery, and de novo sequencing. In addition, we summarize the applications of proteases in structural proteomics and protein function analysis in recent years. Finally, we discuss the future development directions of new proteases and applications in proteomics.
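The cleavage behaviour described in this abstract can be illustrated with a minimal in silico digestion sketch. It assumes the commonly cited simplified rule that trypsin cleaves C-terminal to Lys (K) or Arg (R) except when the next residue is Pro (P). The function name `digest`, its parameters, and the toy sequences are hypothetical and are not taken from the review.

```python
def digest(sequence, cleave_after="KR", no_cleave_before="P"):
    """Split a protein sequence after every residue in `cleave_after`,
    unless the following residue is in `no_cleave_before`.

    Simplified model: no missed cleavages, no modifications.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if i == len(sequence) - 1:
            break  # nothing to cleave after the last residue
        if residue in cleave_after and sequence[i + 1] not in no_cleave_before:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


# Toy sequence with K and R sites; the R-P bond is left intact.
print(digest("AAKGGRPLLKEE"))   # ['AAK', 'GGRPLLK', 'EE']

# A stretch without K or R is not cut at all by the trypsin rule...
print(digest("AGGELLVV"))       # ['AGGELLVV']

# ...but a hypothetical protease cleaving after E (assumed rule) splits it.
print(digest("AGGELLVV", cleave_after="E", no_cleave_before=""))  # ['AGGE', 'LLVV']
```

The last two calls show why a protease with a different specificity can recover sequence regions that a trypsin-only digest leaves as a single long, hard-to-analyze peptide.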
Affiliation(s)
- Binwen Sun
- Engineering Research Center for New Materials and Precision Treatment Technology of Malignant Tumors Therapy, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- Engineering Technology Research Center for Translational Medicine, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Zheyi Liu
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- Jin Liu
- Engineering Research Center for New Materials and Precision Treatment Technology of Malignant Tumors Therapy, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Engineering Technology Research Center for Translational Medicine, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Division of Hepatobiliary and Pancreatic Surgery, Department of General Surgery, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Shan Zhao
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- Liming Wang
- Engineering Research Center for New Materials and Precision Treatment Technology of Malignant Tumors Therapy, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Engineering Technology Research Center for Translational Medicine, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Division of Hepatobiliary and Pancreatic Surgery, Department of General Surgery, Second Affiliated Hospital, Dalian Medical University, 467 Zhongshan Road, Dalian, 116027, China
- Fangjun Wang
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 463 Zhongshan Road, Dalian, 116023, China
- University of Chinese Academy of Sciences, 19 Yuquan Road, Beijing, 100049, China
4
Lu Y, Ge C, Cai B, Xu Q, Kong R, Chang S. Antibody sequences assembly method based on weighted de Bruijn graph. Mathematical Biosciences and Engineering 2023; 20:6174-6190. [PMID: 37161102 DOI: 10.3934/mbe.2023266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
With the development of next-generation protein sequencing technologies, sequence assembly algorithms have become a key technology for the de novo sequencing process. Existing methods can assemble an unknown single protein chain. However, for monoclonal antibodies with light and heavy chains, assembly is still an unsolved problem. To address this problem, we propose a new assembly method, DBAS, which integrates the quality scores and sequence alignment scores from de novo sequencing peptides into a weighted de Bruijn graph to assemble the final protein sequences. The established method is used to assemble sequences from two datasets with mixed light and heavy chains from antibodies. The results show that DBAS can assemble long antibody sequences for both mixed light and heavy chains and single chains. In addition, DBAS is able to distinguish the light and heavy chains by using BLAST sequence alignment. The results show that the algorithm has good performance for both target sequence coverage and contig assembly accuracy.
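The general idea of a weighted de Bruijn graph can be sketched with a toy example. This is not the DBAS implementation: it assumes each de novo peptide carries a single confidence score, sums those scores onto k-mer edges, and then follows the heaviest edge greedily. The names `build_graph` and `greedy_assemble`, the value of `k`, and the peptides and scores are all hypothetical.

```python
from collections import defaultdict


def build_graph(peptides, k=4):
    """Build a weighted de Bruijn graph from (sequence, score) pairs.

    Nodes are (k-1)-mers; each k-mer adds its peptide's score to the
    edge from its prefix (k-1)-mer to its suffix (k-1)-mer.
    """
    edges = defaultdict(lambda: defaultdict(float))
    for seq, score in peptides:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[kmer[:-1]][kmer[1:]] += score
    return edges


def greedy_assemble(edges, start):
    """Extend a contig from `start`, always taking the heaviest outgoing edge.

    Stops at a dead end or when a node would be revisited (assumed
    simplification to avoid cycles).
    """
    contig, node, seen = start, start, {start}
    while node in edges:
        nxt = max(edges[node], key=edges[node].get)
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical peptides: the third is a low-scoring read with one wrong residue.
peptides = [("EVQLVESGG", 0.9), ("VESGGGLVQ", 0.8), ("VESGGALVQ", 0.2)]
graph = build_graph(peptides, k=4)
print(greedy_assemble(graph, "EVQ"))  # EVQLVESGGGLVQ
```

At the branch after `SGG`, the edge supported by the higher-scoring peptide wins, which is how edge weights can suppress an erroneous de novo read in this simplified setting.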
Affiliation(s)
- Yi Lu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Cheng Ge
- Key Laboratory of Marine Drugs, Chinese Ministry of Education, School of Medicine and Pharmacy, Ocean University of China, Qingdao 266003, China
- Biao Cai
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Qing Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
5
Abstract
Accurate full-length sequencing of a purified unknown protein remains challenging because mass-spectrometry (MS)-based methods are error-prone. De novo identified peptide sequences frequently contain errors, undermining the accuracy of assembly. Bias in the detectability of peptides also creates low-coverage regions, resulting in gaps. Although recent advances in multi-enzyme hydrolysis and algorithms have shown complete assembly of full-length protein sequences in a few examples, robustness in practical applications still needs to be improved. Here, inspired by genome assembly strategies, we demonstrate a contig-scaffolding strategy to assemble protein sequences with high robustness and accuracy. This strategy integrates multiple unspecific hydrolysis methods to minimize the bias in the hydrolysis process. After de novo identification of the peptides, our assembly algorithm, named Multiple Contigs & Scaffolding (MuCS), assembles the peptide sequences in a multistep (contig, then scaffold) manner, with error correction in each step. MS data from different hydrolysis experiments complement each other for robust contig extension and error correction. We applied our strategy to three proteins with three replicates each; all reached 100% coverage (except one with 98.85%) and 98.69-100% accuracy. The strategy can also efficiently handle a membrane protein, although the transmembrane region was missing due to the limitations of MS: the three replicates reached 88.85-92.57% coverage and 97.57-100% accuracy. In sum, we provide a practical, robust, and accurate solution for full-length protein sequencing. The MuCS software is available at http://chi-biotech.com/mucs/.
Affiliation(s)
- Zhi-Biao Mai
- Big Data Decision Institute, Jinan University, Guangzhou 510632, China; Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Zhong-Hua Zhou
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
- Qing-Yu He
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
- Gong Zhang
- Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou 510632, China
6
Abd El-Aziz TM, Soares AG, Stockand JD. Advances in venomics: Modern separation techniques and mass spectrometry. J Chromatogr B Analyt Technol Biomed Life Sci 2020; 1160:122352. [PMID: 32971366 PMCID: PMC8174749 DOI: 10.1016/j.jchromb.2020.122352] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/15/2020] [Revised: 08/25/2020] [Accepted: 08/26/2020] [Indexed: 12/31/2022]
Abstract
Snake venoms are complex chemical mixtures of biologically active proteins and non-protein components. Toxins have a wide range of targets, including ion channels and membrane receptors, and a wide range of effects, including effects on platelet aggregation and platelet plug formation. They act on these targets with high affinity and selectivity. From a pharmacological perspective, snake venom compounds are a valuable resource for drug discovery and development. However, a major challenge to drug discovery using snake venoms is isolating and analyzing the bioactive proteins and peptides in these complex mixtures. Obtaining molecular information from complex mixtures such as snake venoms requires proteomic analyses, generally combined with transcriptomic analyses of venom glands. The present review summarizes current knowledge and highlights important recent advances in venomics, with special emphasis on contemporary separation techniques and bioinformatics that have begun to elaborate the complexity of snake venoms. Several analytical techniques such as two-dimensional gel electrophoresis, RP-HPLC, size exclusion chromatography, ion exchange chromatography, MALDI-TOF-MS, and LC-ESI-QTOF-MS have been employed in this regard. The improvement of separation approaches such as multidimensional HPLC and 2D electrophoresis coupled to soft-ionization (MALDI and ESI) mass spectrometry has been critical to obtaining an accurate picture of the startling complexity of venoms. In the case of bioinformatics, a variety of software tools such as PEAKS have also been used successfully. Information gleaned from venomics is important for both predicting and resolving the biological activity of the active components of venoms, which in turn is key for the development of new drugs based on these venom components.
Affiliation(s)
- Tarek Mohamed Abd El-Aziz
- Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229-3900, USA; Zoology Department, Faculty of Science, Minia University, El-Minia 61519, Egypt
- Antonio G Soares
- Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229-3900, USA
- James D Stockand
- Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, San Antonio, Texas 78229-3900, USA
7
Mao Y, Daly TJ, Li N. Lys-Sequencer: An algorithm for de novo sequencing of peptides by paired single residue transposed Lys-C and Lys-N digestion coupled with high-resolution mass spectrometry. Rapid Communications in Mass Spectrometry 2020; 34:e8574. [PMID: 31499586 DOI: 10.1002/rcm.8574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2019] [Revised: 08/27/2019] [Accepted: 09/02/2019] [Indexed: 06/10/2023]
Abstract
RATIONALE: Database-dependent identification of proteins by mass spectrometry is well established, but has limitations when there are novel proteins, mutations, splice variants, and post-translational modifications (PTMs) not available in the established reference database. De novo sequencing as a database-independent approach could address these limitations by deducing peptide sequences directly from experimental tandem mass spectrometry spectra, while concomitantly yielding residue-by-residue confidence metrics. METHODS: Equal amounts of bovine serum albumin (BSA) sample aliquots were digested separately with Lys-C and Lys-N complementary peptidases, separated by reversed-phase ultra-high-performance liquid chromatography (UPLC), and analyzed by collision-induced dissociation (CID)-based mass spectrometry on an Orbitrap mass spectrometer. In the Lys-Sequencer algorithm, matched tandem mass spectra with equal precursor ion mass from complementary digestions were paired, and fragment ion types were identified based on the unique mass relationship between fragment ions extracted from a spectrum pair followed by de novo sequencing of peptides with identification confidence assigned at the residue level. RESULTS: In all the matched spectrum pairs, 34 top-ranked BSA peptides were identified, from which 391 amino acid residues were identified correctly, covering ~67% of the full sequence of BSA (583 residues) with only ~6% (35 residues) exhibiting ambiguity in the sequence order (although amino acid compositions were still correctly assigned). Of note, this approach identified peptide sequences up to 17 amino acids in length without ambiguity, with the exception of the N-terminal or C-terminal peptides containing lysine (18-mer). CONCLUSIONS: The algorithm ("Lys-Sequencer") developed in this work achieves high precision for de novo sequencing of peptides. This method facilitates the identification of point mutation and new PTMs in the protein characterization and discovery of new peptides and proteins with varying levels of confidence.
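One way to picture the mass relationship between paired Lys-C and Lys-N spectra is the sketch below. It is an assumption derived from peptide structure, not the published Lys-Sequencer code: if a Lys-C peptide is `X1..Xn-K` and its Lys-N partner is `K-X1..Xn`, then each Lys-N b-ion is one lysine residue heavier than a Lys-C b-ion, and each Lys-C y-ion is one lysine residue heavier than a Lys-N y-ion. The helper names, the tolerance, and the toy peptide are hypothetical; the residue masses are standard monoisotopic values.

```python
PROTON, WATER = 1.00728, 18.01056
# Monoisotopic residue masses (Da) for the residues used in this toy example.
MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "K": 128.09496}


def b_ions(seq):
    """Singly charged b-ion m/z values (N-terminal prefixes)."""
    total, ions = PROTON, []
    for aa in seq[:-1]:
        total += MASS[aa]
        ions.append(total)
    return ions


def y_ions(seq):
    """Singly charged y-ion m/z values (C-terminal suffixes)."""
    total, ions = PROTON + WATER, []
    for aa in reversed(seq[1:]):
        total += MASS[aa]
        ions.append(total)
    return ions


def classify_pairs(lys_c_peaks, lys_n_peaks, tol=0.005):
    """Label peak pairs that differ by exactly one lysine residue mass."""
    labels = []
    for c in lys_c_peaks:
        for n in lys_n_peaks:
            if abs((n - c) - MASS["K"]) <= tol:
                labels.append(("b", c, n))   # Lys-N peak is heavier: b-type
            elif abs((c - n) - MASS["K"]) <= tol:
                labels.append(("y", c, n))   # Lys-C peak is heavier: y-type
    return labels


# Toy transposed pair: Lys-C peptide AGSK and Lys-N peptide KAGS.
lys_c = b_ions("AGSK") + y_ions("AGSK")
lys_n = b_ions("KAGS") + y_ions("KAGS")
for ion_type, c, n in classify_pairs(lys_c, lys_n):
    print(ion_type, round(c, 4), round(n, 4))
```

In this toy case, two peak pairs are labelled b-type and two y-type without using any sequence database, which is the kind of ion-type assignment that paired digestion makes possible.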
Affiliation(s)
- Yuan Mao
- Department of Analytical Chemistry, Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Thomas J Daly
- Department of Analytical Chemistry, Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
- Ning Li
- Department of Analytical Chemistry, Regeneron Pharmaceuticals, Inc., 777 Old Saw Mill River Road, Tarrytown, NY, 10591, USA
8
Muth T, Hartkopf F, Vaudel M, Renard BY. A Potential Golden Age to Come-Current Tools, Recent Use Cases, and Future Avenues for De Novo Sequencing in Proteomics. Proteomics 2018; 18:e1700150. [PMID: 29968278 DOI: 10.1002/pmic.201700150] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 03/23/2018] [Revised: 05/23/2018] [Indexed: 01/15/2023]
Abstract
In shotgun proteomics, peptide and protein identification is most commonly conducted using database search engines, the method of choice when reference protein sequences are available. Despite its widespread use the database-driven approach is limited, mainly because of its static search space. In contrast, de novo sequencing derives peptide sequence information in an unbiased manner, using only the fragment ion information from the tandem mass spectra. In recent years, with the improvements in MS instrumentation, various new methods have been proposed for de novo sequencing. This review article provides an overview of existing de novo sequencing algorithms and software tools ranging from peptide sequencing to sequence-to-protein mapping. Various use cases are described for which de novo sequencing was successfully applied. Finally, limitations of current methods are highlighted and new directions are discussed for a wider acceptance of de novo sequencing in the community.
Affiliation(s)
- Thilo Muth
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Felix Hartkopf
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
- Marc Vaudel
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, 5020, Bergen, Norway; Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, 5020, Bergen, Norway
- Bernhard Y Renard
- Bioinformatics Unit (MF 1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353, Berlin, Germany
9
Robinson SD, Undheim EAB, Ueberheide B, King GF. Venom peptides as therapeutics: advances, challenges and the future of venom-peptide discovery. Expert Rev Proteomics 2017; 14:931-939. [DOI: 10.1080/14789450.2017.1377613] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Indexed: 02/06/2023]
Affiliation(s)
- Samuel D. Robinson
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Australia
- Centre for Advanced Imaging, University of Queensland, St Lucia, Australia
- Glenn F. King
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Australia
10
Savidor A, Barzilay R, Elinger D, Yarden Y, Lindzen M, Gabashvili A, Adiv Tal O, Levin Y. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination. Mol Cell Proteomics 2017; 16:1151-1161. [PMID: 28348172 DOI: 10.1074/mcp.o116.065417] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 11/10/2016] [Revised: 03/22/2017] [Indexed: 01/16/2023] Open
Abstract
Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence.
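Assembling overlapping peptide tags into a longer consensus can be sketched with a generic greedy suffix-prefix merge. This is not the "Peptide Tag Assembler" described in the abstract, whose details are not given here; the function names, the `min_len` threshold, and the example tags are hypothetical, and the sketch assumes error-free tags.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_merge(tags, min_len=3):
    """Repeatedly merge the pair of tags with the longest overlap."""
    tags = list(dict.fromkeys(tags))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(tags):
            for j, b in enumerate(tags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return tags  # no remaining overlaps: these are the contigs
        merged = tags[best_i] + tags[best_j][best_len:]
        tags = [t for k, t in enumerate(tags) if k not in (best_i, best_j)]
        tags.append(merged)


# Hypothetical overlapping tags from semi-random cleavage products.
print(greedy_merge(["DIQMTQ", "MTQSPSS", "PSSLSA"]))  # ['DIQMTQSPSSLSA']
```

Because semi-random cleavage yields many peptides with staggered ends, overlaps of this kind are plentiful, which is what makes database-independent assembly feasible in principle; a practical assembler would also need to tolerate sequencing errors and ambiguities such as leucine/isoleucine.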
Affiliation(s)
- Alon Savidor
- The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
- Rotem Barzilay
- The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
- Dalia Elinger
- The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
- Yosef Yarden
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel 76100
- Moshit Lindzen
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot, Israel 76100
- Alexandra Gabashvili
- The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
- Ophir Adiv Tal
- The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
- Yishai Levin
- The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot
11
Vyatkina K. De Novo Sequencing of Top-Down Tandem Mass Spectra: A Next Step towards Retrieving a Complete Protein Sequence. Proteomes 2017; 5:E6. [PMID: 28248257 PMCID: PMC5372227 DOI: 10.3390/proteomes5010006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/10/2016] [Revised: 01/30/2017] [Accepted: 02/04/2017] [Indexed: 11/16/2022] Open
Abstract
De novo sequencing of tandem (MS/MS) mass spectra represents the only way to determine the sequence of proteins from organisms with unknown genomes, or of proteins not directly inscribed in a genome, such as antibodies or novel splice variants. Top-down mass spectrometry provides new opportunities for analyzing such proteins; however, retrieving a complete protein sequence from top-down MS/MS spectra remains a distant goal. In this paper, we review the state of the art on this subject, and enhance our previously developed Twister algorithm for de novo sequencing of peptides from top-down MS/MS spectra to derive longer sequence fragments of a target protein.
Affiliation(s)
- Kira Vyatkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, 7-9 Universitetskaya nab., St. Petersburg 199034, Russia.
- Department of Mathematical and Information Technologies, Saint Petersburg Academic University, 8/3 Khlopina st., St. Petersburg 194021, Russia.
12
Guan X, Brownstein NC, Young NL, Marshall AG. Ultrahigh-resolution Fourier transform ion cyclotron resonance mass spectrometry and tandem mass spectrometry for peptide de novo amino acid sequencing for a seven-protein mixture by paired single-residue transposed Lys-N and Lys-C digestion. Rapid Communications in Mass Spectrometry 2017; 31:207-217. [PMID: 27813191 DOI: 10.1002/rcm.7783] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/29/2016] [Revised: 10/29/2016] [Accepted: 10/30/2016] [Indexed: 06/06/2023]
Abstract
RATIONALE: Bottom-up tandem mass spectrometry (MS/MS) is regularly used in proteomics to identify proteins from a sequence database. De novo sequencing is also available for sequencing peptides with relatively short sequence lengths. We recently showed that paired Lys-C and Lys-N proteases produce peptides of identical mass and similar retention time, but different tandem mass spectra. Such parallel experiments provide complementary information, and allow for up to 100% MS/MS sequence coverage. METHODS: Here, we report digestion by paired Lys-C and Lys-N proteases of a seven-protein mixture: human hemoglobin alpha, bovine carbonic anhydrase 2, horse skeletal muscle myoglobin, hen egg white lysozyme, bovine pancreatic ribonuclease, bovine rhodanese, and bovine serum albumin, followed by reversed-phase nanoflow liquid chromatography, collision-induced dissociation, and 14.5 T Fourier transform ion cyclotron resonance mass spectrometry. RESULTS: Matched pairs of product peptide ions of equal precursor mass and similar retention times from each digestion are compared, leveraging single-residue transposed information with independent interferences to confidently identify fragment ion types, residues, and peptides. Selected pairs of product ion mass spectra for de novo sequenced protein segments from each member of the mixture are presented. CONCLUSIONS: Pairs of the transposed product ions as well as complementary information from the parallel experiments allow for both high MS/MS coverage for long peptide sequences and high confidence in the amino acid identification. Moreover, the parallel experiments in the de novo sequencing reduce false-positive matches of product ions from the single-residue transposed peptides from the same segment, and thereby further improve the confidence in protein identification.
Affiliation(s)
- Xiaoyan Guan
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Florida State University, 1800 East Paul Dirac Drive, Tallahassee, FL, 32310, USA
- Naomi C Brownstein
- Department of Behavioral Sciences and Social Medicine, College of Medicine, Florida State University, 1115 W. Call St., Tallahassee, FL, 32306, USA
- Department of Statistics, Florida State University, 117 N. Woodward Ave., Tallahassee, FL, 32306, USA
- Nicolas L Young
- Verna & Marrs McLean Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, MS-125, Houston, TX, 77030-3411, USA
- Alan G Marshall
- Ion Cyclotron Resonance Program, National High Magnetic Field Laboratory, Florida State University, 1800 East Paul Dirac Drive, Tallahassee, FL, 32310, USA
- Department of Chemistry and Biochemistry, Florida State University, 95 Chieftain Way, Tallahassee, FL, 32303, USA
13
Abstract
Through advances in molecular biology, comparative analysis of DNA sequences is currently the cornerstone in the study of molecular evolution and phylogenetics. Nevertheless, protein mass spectrometry offers some unique opportunities to enable phylogenetic analyses in organisms where DNA may be difficult or costly to obtain. To date, the methods of phylogenetic analysis using protein mass spectrometry can be classified into three categories: (1) de novo protein sequencing followed by classical phylogenetic reconstruction, (2) direct phylogenetic reconstruction using proteolytic peptide mass maps, and (3) mapping of mass spectral data onto classical phylogenetic trees. In this chapter, we provide a brief description of the three methods and the protocol for each method along with relevant tools and algorithms.
Affiliation(s)
- Shiyong Ma
- Prince of Wales Clinical School, UNSW Australia, Sydney, NSW, 2052, Australia
- Lowy Cancer Research Centre, UNSW, Corner of High and Botany St, Kensington, NSW, 2033, Australia
- Kevin M Downard
- Prince of Wales Clinical School, UNSW Australia, Sydney, NSW, 2052, Australia
- Lowy Cancer Research Centre, UNSW, Corner of High and Botany St, Kensington, NSW, 2033, Australia
- Jason W H Wong
- Prince of Wales Clinical School, UNSW Australia, Sydney, NSW, 2052, Australia.
- Lowy Cancer Research Centre, UNSW, Corner of High and Botany St, Kensington, NSW, 2033, Australia.
| |
|
14
|
Abstract
The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.
|
15
|
Ma B. De novo Peptide Sequencing. PROTEOME INFORMATICS 2016:15-38. [DOI: 10.1039/9781782626732-00015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
De novo peptide sequencing refers to the process of determining a peptide’s amino acid sequence from its MS/MS spectrum alone. The principle of this process is fairly straightforward: a high-quality spectrum may present a ladder of fragment ion peaks, and the mass difference between each pair of adjacent peaks in the ladder identifies one residue of the peptide. However, most practical spectra do not have sufficient quality to support this straightforward process. Therefore, research in de novo sequencing has largely been a battle against the errors in the data. This chapter reviews some of the major developments in this field. The chapter starts with a quick review of the history in Section 1. Then manual de novo sequencing is examined in Section 2. Section 3 introduces a few commonly used de novo sequencing algorithms. An important aspect of automated de novo sequencing software is a good scoring function that serves as the optimization goal of the algorithm. Thus, Section 4 is devoted to methods for defining good scoring functions. Section 5 reviews a list of relevant software. The chapter concludes with a discussion of the applications and limitations of de novo sequencing in Section 6.
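The ladder-reading principle described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function names, the 0.01 Da tolerance, and the idealized noise-free b-ion ladder are assumptions made for illustration, and real spectra would need the error handling and scoring functions that the chapter reviews.

```python
# Monoisotopic residue masses in Da. Leucine and isoleucine have the same
# mass, so "L" stands for either in this sketch.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of the charge-carrying proton


def ladder_from_sequence(seq):
    """Build an idealized singly charged b-ion ladder (hypothetical helper)."""
    masses = [PROTON]
    for aa in seq:
        masses.append(masses[-1] + RESIDUE_MASS[aa])
    return masses


def read_ladder(peaks, tol=0.01):
    """Turn each gap between adjacent peaks into a residue, or '?' if the
    gap does not match exactly one residue mass within the tolerance."""
    peaks = sorted(peaks)
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
        residues.append(hits[0] if len(hits) == 1 else "?")
    return "".join(residues)


ladder = ladder_from_sequence("PEPTLDEK")
print(read_ladder(ladder))  # PEPTLDEK

# Dropping one peak merges two gaps into one that matches no single residue.
incomplete = ladder[:2] + ladder[3:]
print(read_ladder(incomplete))  # P?TLDEK
```

The second call shows why incomplete fragmentation is the central difficulty: a single missing peak leaves a gap that this naive reader cannot interpret.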
Affiliation(s)
- Bin Ma
- School of Computer Science, University of Waterloo Canada
| |
|
16
|
Guthals A, Gan Y, Murray L, Chen Y, Stinson J, Nakamura G, Lill JR, Sandoval W, Bandeira N. De Novo MS/MS Sequencing of Native Human Antibodies. J Proteome Res 2016; 16:45-54. [PMID: 27779884 DOI: 10.1021/acs.jproteome.6b00608] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 11/28/2022]
Abstract
One direct route for the discovery of therapeutic human monoclonal antibodies (mAbs) involves the isolation of peripheral B cells from survivors/sero-positive individuals after exposure to an infectious reagent or disease etiology, followed by single-cell sequencing or hybridoma generation. Peripheral B cells, however, are not always easy to obtain and represent only a small percentage of the total B-cell population across all bodily tissues. Although it has been demonstrated that tandem mass spectrometry (MS/MS) techniques can interrogate the full polyclonal antibody (pAb) response to an antigen in vivo, all current approaches identify MS/MS spectra against databases derived from genetic sequencing of B cells from the same patient. In this proof-of-concept study, we demonstrate the feasibility of a novel MS/MS antibody discovery approach in which only serum antibodies are required without the need for sequencing of genetic material. Peripheral pAbs from a cytomegalovirus-exposed individual were purified by glycoprotein B antigen affinity and de novo sequenced from MS/MS data. Purely MS-derived mAbs were then manufactured in mammalian cells to validate potency via antigen-binding ELISA. Interestingly, we found that these mAbs accounted for 1 to 2% of total donor IgG but were not detected in parallel sequencing of memory B cells from the same patient.
Affiliation(s)
- Adrian Guthals
- Mapp Biopharmaceutical, Inc., 6160 Lusk Boulevard #C105, San Diego, California 92121, United States
| | - Yutian Gan
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
| | - Laura Murray
- Department of Protein Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
| | - Yongmei Chen
- Department of Antibody Engineering, Genentech, Inc., South San Francisco, California 94080, United States
| | - Jeremy Stinson
- Department of Molecular Biology, Genentech, Inc., South San Francisco, California 94080, United States
| | - Gerald Nakamura
- Department of Antibody Engineering, Genentech, Inc., South San Francisco, California 94080, United States
| | - Jennie R Lill
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
| | - Wendy Sandoval
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0404, La Jolla, California 92093, United States
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, Mail Code 0657, La Jolla, California 92093, United States
| |
|
17
|
Vyatkina K, Wu S, Dekker LJM, VanDuijn MM, Liu X, Tolić N, Luider TM, Paša-Tolić L, Pevzner PA. Top-down analysis of protein samples by de novo sequencing techniques. Bioinformatics 2016; 32:2753-9. [PMID: 27187201 PMCID: PMC6280873 DOI: 10.1093/bioinformatics/btw307] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 11/09/2015] [Revised: 03/31/2016] [Accepted: 05/09/2016] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Recent technological advances have made high-resolution mass spectrometers affordable to many laboratories, thus boosting the rapid development of top-down mass spectrometry and creating a need for efficient methods for analyzing this kind of data. RESULTS: We describe a method for analysis of protein samples from top-down tandem mass spectrometry data, which capitalizes on de novo sequencing of fragments of the proteins present in the sample. Our algorithm takes as input a set of de novo amino acid strings derived from the given mass spectra using the recently proposed Twister approach, and combines them into aggregated strings endowed with offsets. The former typically constitute accurate sequence fragments of sufficiently well-represented proteins from the sample being analyzed, while the latter indicate their location in the protein sequence, and also bear information on post-translational modifications and fragmentation patterns. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://bioinf.spbau.ru/en/twister. CONTACT: vyatkina@spbau.ru or ppevzner@ucsd.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kira Vyatkina
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, St Petersburg, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, St Petersburg, Russia
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK, USA
| | - Lennard J M Dekker
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Martijn M VanDuijn
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Nikola Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Theo M Luider
- Department of Neurology, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Pavel A Pevzner
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, St Petersburg, Russia
- Department of Computer Science and Engineering, University of California, San Diego, CA, USA
| |
|
18
|
Complete De Novo Assembly of Monoclonal Antibody Sequences. Sci Rep 2016; 6:31730. [PMID: 27562653 PMCID: PMC4999880 DOI: 10.1038/srep31730] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 04/18/2016] [Accepted: 07/20/2016] [Indexed: 11/25/2022] Open
Abstract
De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies for which genome information is often limited or not available. However, due to limitations in peptide fragmentation and coverage, as well as ambiguities in spectral interpretation, complete de novo assembly of unknown protein sequences remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequenced peptides, their quality scores, and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated ALPS performance on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216–441 AA, at 100% coverage, and 96.64–100% accuracy.
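The idea of assembling peptides through a weighted de Bruijn graph can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the ALPS implementation: the function names, the peptide strings, their scores, and the greedy highest-weight walk are all hypothetical, and the sketch omits the database-driven error correction mentioned in the abstract.

```python
from collections import defaultdict


def build_graph(peptides, k=4):
    """Nodes are (k-1)-mers; every k-mer in a peptide adds that peptide's
    score to the edge between its prefix and suffix (k-1)-mers."""
    edges = defaultdict(lambda: defaultdict(float))
    for seq, score in peptides:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            edges[kmer[:-1]][kmer[1:]] += score
    return edges


def assemble(edges):
    """Greedy walk: start at a node with no incoming edge and always follow
    the heaviest outgoing edge until a dead end or a revisited node."""
    targets = {v for outs in edges.values() for v in outs}
    starts = [u for u in edges if u not in targets]
    if not starts:
        return ""
    node = starts[0]
    contig = node
    seen = set()
    while node in edges and node not in seen:
        seen.add(node)
        node = max(edges[node], key=edges[node].get)
        contig += node[-1]
    return contig


# Hypothetical de novo peptides with confidence scores; the last one carries
# a sequencing error (A instead of G) and a low score.
reads = [
    ("EVQLVESGG", 0.9),
    ("VESGGGLVQ", 0.8),
    ("GGLVQPGG", 0.95),
    ("ESGGALVQ", 0.2),
]
print(assemble(build_graph(reads)))  # EVQLVESGGGLVQPGG
```

At the branch point after `SGG`, the erroneous path has weight 0.2 and the correct one 0.8, so the walk follows the better-supported edge. Weighting edges by quality scores is what lets low-confidence reads be outvoted in this toy setting.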
|
19
|
Griss J. Spectral library searching in proteomics. Proteomics 2016; 16:729-40. [PMID: 26616598 DOI: 10.1002/pmic.201500296] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 07/16/2015] [Revised: 10/15/2015] [Accepted: 10/29/2015] [Indexed: 12/12/2022]
Abstract
Spectral library searching has become a mature method to identify tandem mass spectra in proteomics data analysis. This review provides a comprehensive overview of available spectral library search engines and highlights their distinct features. Additionally, resources providing spectral libraries are summarized and tools presented that extend experimental spectral libraries by simulating spectra. Finally, spectrum clustering algorithms are discussed that utilize the same spectrum-to-spectrum matching algorithms as spectral library search engines and allow novel methods to analyse proteomics data.
Affiliation(s)
- Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| |
|
20
|
Carregari VC, Dai J, Verano-Braga T, Rocha T, Ponce-Soto LA, Marangoni S, Roepstorff P. Revealing the functional structure of a new PLA2 K49 from Bothriopsis taeniata snake venom employing automatic “de novo” sequencing using CID/HCD/ETD MS/MS analyses. J Proteomics 2016; 131:131-139. [DOI: 10.1016/j.jprot.2015.10.020] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/09/2015] [Revised: 10/14/2015] [Accepted: 10/15/2015] [Indexed: 11/24/2022]
|
21
|
Vyatkina K, Wu S, Dekker LJM, VanDuijn MM, Liu X, Tolić N, Dvorkin M, Alexandrova S, Luider TM, Paša-Tolić L, Pevzner PA. De Novo Sequencing of Peptides from Top-Down Tandem Mass Spectra. J Proteome Res 2015; 14:4450-62. [DOI: 10.1021/pr501244v] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/30/2022]
Affiliation(s)
- Kira Vyatkina
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, Saint Petersburg 194021, Russia
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, 7-9 Universitetskaya nab., Saint Petersburg 199034, Russia
| | - Si Wu
- Department of Chemistry and Biochemistry, University of Oklahoma, 101 Stephenson Pkwy, Norman, Oklahoma 73019, United States
| | - Lennard J. M. Dekker
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
| | - Martijn M. VanDuijn
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
| | - Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, Indiana 46202, United States
| | - Nikola Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Mikhail Dvorkin
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, Saint Petersburg 194021, Russia
| | - Sonya Alexandrova
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, Saint Petersburg 194021, Russia
| | - Theo M. Luider
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
| | - Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Pavel A. Pevzner
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, 7-9 Universitetskaya nab., Saint Petersburg 199034, Russia
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, United States
| |
|
22
|
Melani RD, Araujo GD, Carvalho PC, Goto L, Nogueira FC, Junqueira M, Domont GB. Seeing beyond the tip of the iceberg: A deep analysis of the venome of the Brazilian Rattlesnake, Crotalus durissus terrificus. EUPA OPEN PROTEOMICS 2015. [DOI: 10.1016/j.euprot.2015.05.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
|
23
|
Abstract
Over a period of more than 300 million years, spiders have evolved complex venoms containing an extraordinary array of toxins for prey capture and defense against predators. The major components of most spider venoms are small disulfide-bridged peptides that are highly stable and resistant to proteolytic degradation. Moreover, many of these peptides have high specificity and potency toward molecular targets of therapeutic importance. This unique combination of bioactivity and stability has made spider-venom peptides valuable both as pharmacological tools and as leads for drug development. This review describes recent advances in spider-venom-based drug discovery pipelines. We discuss spider-venom-derived peptides that are currently under investigation for treatment of a diverse range of pathologies including pain, stroke and cancer.
|
24
|
Szabo Z, Janaky T. Challenges and developments in protein identification using mass spectrometry. Trends Analyt Chem 2015. [DOI: 10.1016/j.trac.2015.03.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 01/13/2023]
|
25
|
Guthals A, Boucher C, Bandeira N. The generating function approach for Peptide identification in spectral networks. J Comput Biol 2015; 22:353-66. [PMID: 25423621 PMCID: PMC4425220 DOI: 10.1089/cmb.2014.0165] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022] Open
Abstract
Tandem mass (MS/MS) spectrometry has become the method of choice for protein identification and has launched a quest for the identification of every translated protein and peptide. However, computational developments have lagged behind the pace of modern data acquisition protocols and have become a major bottleneck in proteomics analysis of complex samples. As it stands today, attempts to identify MS/MS spectra against large databases (e.g., the human microbiome or 6-frame translation of the human genome) face a search space that is 10-100 times larger than the human proteome, where it becomes increasingly challenging to separate between true and false peptide matches. As a result, the sensitivity of current state-of-the-art database search methods drops by nearly 38% to such low identification rates that almost 90% of all MS/MS spectra are left as unidentified. We address this problem by extending the generating function approach to rigorously compute the joint spectral probability of multiple spectra being matched to peptides with overlapping sequences, thus enabling the confident assignment of higher significance to overlapping peptide-spectrum matches (PSMs). We find that these joint spectral probabilities can be several orders of magnitude more significant than individual PSMs, even in the ideal case when perfect separation between signal and noise peaks could be achieved per individual MS/MS spectrum. After benchmarking this approach on a typical lysate MS/MS dataset, we show that the proposed intersecting spectral probabilities for spectra from overlapping peptides improve peptide identification by 30-62%.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California–San Diego, La Jolla, California
| | - Christina Boucher
- Department of Computer Science, Colorado State University, Fort Collins, Colorado
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California–San Diego, La Jolla, California
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California–San Diego, La Jolla, California
| |
|
26
|
Petras D, Heiss P, Süssmuth RD, Calvete JJ. Venom Proteomics of Indonesian King Cobra, Ophiophagus hannah: Integrating Top-Down and Bottom-Up Approaches. J Proteome Res 2015; 14:2539-56. [DOI: 10.1021/acs.jproteome.5b00305] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Indexed: 01/12/2023]
Affiliation(s)
- Daniel Petras
- Institut für Chemie, Technische Universität Berlin, Müller-Breslau-Straße 10, 10623 Berlin, Germany
| | - Paul Heiss
- Institut für Chemie, Technische Universität Berlin, Müller-Breslau-Straße 10, 10623 Berlin, Germany
| | - Roderich D. Süssmuth
- Institut für Chemie, Technische Universität Berlin, Müller-Breslau-Straße 10, 10623 Berlin, Germany
| | - Juan J. Calvete
- Laboratorio de Venómica Estructural y Funcional, Instituto de Biomedicina de Valencia, CSIC, 46010 Valencia, Spain
| |
|
27
|
Liu X, Dekker LJM, Wu S, Vanduijn MM, Luider TM, Tolić N, Kou Q, Dvorkin M, Alexandrova S, Vyatkina K, Paša-Tolić L, Pevzner PA. De Novo Protein Sequencing by Combining Top-Down and Bottom-Up Tandem Mass Spectra. J Proteome Res 2014; 13:3241-8. [DOI: 10.1021/pr401300m] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 01/05/2023]
Affiliation(s)
- Xiaowen Liu
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, Suite 5000, Indianapolis, Indiana 46202, United States
| | - Lennard J. M. Dekker
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
| | - Si Wu
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Martijn M. Vanduijn
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
| | - Theo M. Luider
- Department of Neurology, Erasmus University Medical Center, Postbus 2040, 3000 CA Rotterdam, The Netherlands
| | - Nikola Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Qiang Kou
- Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, 535 West Michigan Street, IT 475, Indianapolis, Indiana 46202, United States
| | - Mikhail Dvorkin
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, St. Petersburg 194021, Russia
| | - Sonya Alexandrova
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, St. Petersburg 194021, Russia
| | - Kira Vyatkina
- Algorithmic Biology Laboratory, Saint Petersburg Academic University, 8/3 Khlopina Str, St. Petersburg 194021, Russia
| | - Ljiljana Paša-Tolić
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, 9500 Gilman Drive, San Diego, California 92093, United States
| |
|
28
|
Killian CE, Wilt FH. Investigating protein function in biomineralized tissues using molecular biology techniques. Methods Enzymol 2014; 532:367-88. [PMID: 24188776 DOI: 10.1016/b978-0-12-416617-2.00017-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
We describe modern molecular biology methods currently used in the study of biomineralization. We focus our descriptions on two areas of biomineralization research in which these methods have been particularly powerful. The first area is the use of modern molecular methods to identify and characterize the so-called occluded matrix proteins present in mineralized tissues. More specifically, we describe the use of RNA-seq and the next generation of DNA sequencers and the use of direct protein sequencing and mass spectrometers as ways of identifying proteins present in mineralized tissues. The second area is the use of molecular methods to examine the function of proteins in biomineralization. RNA interference (RNAi), morpholino antisense, and other methods are described and discussed as ways of elucidating protein function.
|
29
|
Calvete JJ. Proteomic tools against the neglected pathology of snake bite envenoming. Expert Rev Proteomics 2014; 8:739-58. [DOI: 10.1586/epr.11.61] [Citation(s) in RCA: 140] [Impact Index Per Article: 12.7] [Indexed: 12/12/2022]
|
30
|
Snake venomics: From the inventory of toxins to biology. Toxicon 2013; 75:44-62. [DOI: 10.1016/j.toxicon.2013.03.020] [Citation(s) in RCA: 148] [Impact Index Per Article: 12.3] [Received: 01/29/2013] [Revised: 03/06/2013] [Accepted: 03/13/2013] [Indexed: 01/05/2023]
|
31
|
Guthals A, Clauser KR, Frank AM, Bandeira N. Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides. J Proteome Res 2013; 12:2846-57. [PMID: 23679345 DOI: 10.1021/pr400173d] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 01/20/2023]
Abstract
Full-length de novo sequencing of unknown proteins remains a challenging open problem. Traditional methods that sequence spectra individually are limited by short peptide length, incomplete peptide fragmentation, and ambiguous de novo interpretations. We address these issues by determining consensus sequences for assembled tandem mass (MS/MS) spectra from overlapping peptides (e.g., by using multiple enzymatic digests). We have combined electron-transfer dissociation (ETD) with collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) fragmentation methods to boost interpretation of long, highly charged peptides and take advantage of corroborating b/y/c/z ions in CID/HCD/ETD. Using these strategies, we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides yield de novo sequences of average length 70 AA and as long as 200 AA at up to 99% sequencing accuracy.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, United States
|
32
|
Zhang Y, Fonslow BR, Shan B, Baek MC, Yates JR. Protein analysis by shotgun/bottom-up proteomics. Chem Rev 2013; 113:2343-94. [PMID: 23438204 PMCID: PMC3751594 DOI: 10.1021/cr3003533] [Citation(s) in RCA: 1025] [Impact Index Per Article: 85.4] [Indexed: 02/07/2023]
Affiliation(s)
- Yaoyang Zhang
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Bryan R. Fonslow
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Bing Shan
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Moon-Chang Baek
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Department of Molecular Medicine, Cell and Matrix Biology Research Institute, School of Medicine, Kyungpook National University, Daegu 700-422, Republic of Korea
| | - John R. Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| |
|
33
|
Clowers BH, Wunschel DS, Kreuzer HW, Engelmann HE, Valentine N, Wahl KL. Characterization of Residual Medium Peptides from Yersinia pestis Cultures. Anal Chem 2013; 85:3933-9. [DOI: 10.1021/ac3034272] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Affiliation(s)
- Brian H. Clowers
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - David S. Wunschel
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Helen W. Kreuzer
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Heather E. Engelmann
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Nancy Valentine
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| | - Karen L. Wahl
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
| |
|
34
|
Guthals A, Watrous JD, Dorrestein PC, Bandeira N. The spectral networks paradigm in high throughput mass spectrometry. MOLECULAR BIOSYSTEMS 2013; 8:2535-44. [PMID: 22610447 DOI: 10.1039/c2mb25085c] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 12/20/2022]
Abstract
High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.
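The first point of the abstract, matching spectra against other spectra rather than against protein sequences, can be illustrated with a minimal Python sketch of a binned cosine similarity. This is only the simplest form of spectrum-to-spectrum comparison and is an assumption made for illustration: the function names and the 1.0 Da bin width are hypothetical, and actual spectral network algorithms rely on spectral alignment that tolerates modification mass shifts, which is not shown here.

```python
import math


def binned(spectrum, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.
    A spectrum is a list of (m/z, intensity) pairs."""
    vec = {}
    for mz, intensity in spectrum:
        index = int(mz / bin_width)
        vec[index] = vec.get(index, 0.0) + intensity
    return vec


def cosine(spec_a, spec_b, bin_width=1.0):
    """Cosine similarity between two binned spectra (0 = no shared bins)."""
    va, vb = binned(spec_a, bin_width), binned(spec_b, bin_width)
    dot = sum(va[i] * vb[i] for i in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(x * x for x in va.values()))
    norm_b = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy spectra: the second shares two of three peaks with the first.
s1 = [(175.1, 40.0), (262.2, 100.0), (375.3, 60.0)]
s2 = [(175.1, 35.0), (262.2, 90.0), (401.4, 50.0)]
print(round(cosine(s1, s2), 3))
```

Pairs of spectra scoring above a chosen threshold could then be linked as edges in a network, which is the sense in which related peptides are found before any identification is attempted.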
Affiliation(s)
- Adrian Guthals
- Dept. Computer Science and Engineering, University of California, San Diego, USA
|
35
|
Chi H, Chen H, He K, Wu L, Yang B, Sun RX, Liu J, Zeng WF, Song CQ, He SM, Dong MQ. pNovo+: De Novo Peptide Sequencing Using Complementary HCD and ETD Tandem Mass Spectra. J Proteome Res 2012; 12:615-25. [DOI: 10.1021/pr3006843] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Indexed: 11/29/2022]
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
| | - Haifeng Chen
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
| | - Kun He
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
| | - Long Wu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
| | - Bing Yang
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
| | - Rui-Xiang Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Jianyun Liu
- Laboratory of Intelligent Recognition and Image Processing, Beijing Key Laboratory of Digital Media, Beihang University, Beijing, 100191, China
| | - Wen-Feng Zeng
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
- Graduate University of Chinese Academy of Sciences, Beijing 100049, China
| | - Chun-Qing Song
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
| | - Si-Min He
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
| | - Meng-Qiu Dong
- National Institute of Biological Sciences, Beijing, Beijing 102206, China
|
36
|
Abstract
Discovery or shotgun proteomics has emerged as the most powerful technique to comprehensively map out a proteome. Reconstruction of protein identities from the raw mass spectrometric data constitutes a cornerstone of any shotgun proteomics workflow. The inherent uncertainty of mass spectrometric data and the complexity of a proteome render protein inference and the statistical validation of protein identifications a non-trivial task, still being a subject of ongoing research. This review aims to survey the different conceptual approaches to the different tasks of inferring and statistically validating protein identifications and to discuss their implications on the scope of proteome exploration.
Affiliation(s)
- Manfred Claassen
- Computer Science Department, Stanford University, Stanford, CA 94305-9010, USA.
|
37
|
Guthals A, Clauser KR, Bandeira N. Shotgun protein sequencing with meta-contig assembly. Mol Cell Proteomics 2012; 11:1084-96. [PMID: 22798278 DOI: 10.1074/mcp.m111.015768] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 01/18/2023] Open
Abstract
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
Affiliation(s)
- Adrian Guthals
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, California 92093, USA.
|
38
|
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012; 19:455-77. [PMID: 22506599 DOI: 10.1089/cmb.2012.0021] [Citation(s) in RCA: 17763] [Impact Index Per Article: 1366.4] [Indexed: 01/25/2023] Open
Abstract
The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software.
Affiliation(s)
- Anton Bankevich
- Algorithmic Biology Laboratory, St. Petersburg Academic University, Russian Academy of Sciences, St. Petersburg, Russia
|
39
|
Ma B, Johnson R. De novo sequencing and homology searching. Mol Cell Proteomics 2012; 11:O111.014902. [PMID: 22090170 PMCID: PMC3277775 DOI: 10.1074/mcp.o111.014902] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.2] [Received: 10/05/2011] [Revised: 11/08/2011] [Indexed: 11/06/2022] Open
Abstract
In proteomics, de novo sequencing is the process of deriving peptide sequences from tandem mass spectra without the assistance of a sequence database. Such analyses have traditionally been performed manually by human experts, and more recently by computer programs that have been developed because of the need for higher throughput. Although powerful, de novo sequencing often can only determine partially correct sequence tags because of imperfect tandem mass spectra. However, these sequence tags can then be searched in a sequence database to identify the exact or a homologous peptide. Homology searches are particularly useful for the study of organisms whose genomes have not been sequenced. This tutorial will present background important to understanding de novo sequencing, suggestions on how to do this manually, plus descriptions of computer algorithms used to automate this process and to subsequently carry out homology-based database searches. This Tutorial is part of the International Proteomics Tutorial Programme (IPTP 1).
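As a rough illustration of the manual procedure this tutorial abstract refers to, the sketch below builds singly charged b- and y-ion ladders for a peptide and then reads residues back from the gaps between consecutive ions. It is not taken from the tutorial: the function names, the reduced residue table, and the 0.01 Da tolerance are assumptions chosen for the example, and isobaric residues such as leucine and isoleucine are not distinguished.

```python
# Monoisotopic residue masses in Da for a small subset of amino acids.
# Isoleucine is omitted because it is isobaric with leucine.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
PROTON = 1.00728   # mass of a proton, Da
WATER = 18.01056   # mass of H2O, Da


def b_ions(peptide):
    """Singly charged b-ion m/z values (N-terminal prefixes)."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total + PROTON)
    return masses


def y_ions(peptide):
    """Singly charged y-ion m/z values (C-terminal suffixes)."""
    masses, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total + WATER + PROTON)
    return masses


def read_ladder(mz_values, tol=0.01):
    """Translate gaps between consecutive ions of one series into residues.

    A gap that matches no residue mass within `tol` is reported as '?'.
    """
    sequence = []
    for low, high in zip(mz_values, mz_values[1:]):
        gap = high - low
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        sequence.append(matches[0] if matches else "?")
    return "".join(sequence)


if __name__ == "__main__":
    peptide = "SAVEK"  # hypothetical peptide used only for illustration
    print(read_ladder(b_ions(peptide)))  # residues 2..4 read N- to C-terminal
    print(read_ladder(y_ions(peptide)))  # same residues read C- to N-terminal
```

Real spectra contain missing and spurious peaks, which is why the abstract notes that only partial sequence tags can often be recovered; this sketch assumes a complete, noise-free ladder.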
Affiliation(s)
- Bin Ma
- From the ‡School of Computer Science, University of Waterloo, 200 University Ave. W, Waterloo, ON, Canada N2L 3G1
|
40
|
Yang YL, Xu Y, Kersten RD, Liu WT, Meehan MJ, Moore BS, Bandeira N, Dorrestein PC. Connecting chemotypes and phenotypes of cultured marine microbial assemblages by imaging mass spectrometry. Angew Chem Int Ed Engl 2011; 50:5839-42. [PMID: 21574228 DOI: 10.1002/anie.201101225] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Received: 02/18/2011] [Revised: 04/05/2011] [Indexed: 01/14/2023]
Affiliation(s)
- Yu-Liang Yang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA, USA
|
41
|
Yang YL, Xu Y, Kersten RD, Liu WT, Meehan MJ, Moore BS, Bandeira N, Dorrestein PC. Connecting Chemotypes and Phenotypes of Cultured Marine Microbial Assemblages by Imaging Mass Spectrometry. Angew Chem Int Ed Engl 2011. [DOI: 10.1002/ange.201101225] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/05/2023]
|
42
|
Godzik A. Metagenomics and the protein universe. Curr Opin Struct Biol 2011; 21:398-403. [PMID: 21497084 DOI: 10.1016/j.sbi.2011.03.010] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 01/25/2011] [Revised: 03/07/2011] [Accepted: 03/24/2011] [Indexed: 02/07/2023]
Abstract
Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones.
Affiliation(s)
- Adam Godzik
- Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA.
|
43
|
Dasari S, Chambers MC, Codreanu SG, Liebler DC, Collins BC, Pennington SR, Gallagher WM, Tabb DL. Sequence tagging reveals unexpected modifications in toxicoproteomics. Chem Res Toxicol 2011; 24:204-16. [PMID: 21214251 DOI: 10.1021/tx100275t] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 12/17/2022]
Abstract
Toxicoproteomic samples are rich in posttranslational modifications (PTMs) of proteins. Identifying these modifications via standard database searching can incur significant performance penalties. Here, we describe the latest developments in TagRecon, an algorithm that leverages inferred sequence tags to identify modified peptides in toxicoproteomic data sets. TagRecon identifies known modifications more effectively than the MyriMatch database search engine. TagRecon outperformed state of the art software in recognizing unanticipated modifications from LTQ, Orbitrap, and QTOF data sets. We developed user-friendly software for detecting persistent mass shifts from samples. We follow a three-step strategy for detecting unanticipated PTMs in samples. First, we identify the proteins present in the sample with a standard database search. Next, identified proteins are interrogated for unexpected PTMs with a sequence tag-based search. Finally, additional evidence is gathered for the detected mass shifts with a refinement search. Application of this technology on toxicoproteomic data sets revealed unintended cross-reactions between proteins and sample processing reagents. Twenty-five proteins in rat liver showed signs of oxidative stress when exposed to potentially toxic drugs. These results demonstrate the value of mining toxicoproteomic data sets for modifications.
Affiliation(s)
- Surendra Dasari
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232-0006, United States
|
44
|
Abstract
While advances in tandem mass spectrometry (MS/MS) steadily increase the rate of generation of MS/MS spectra, standard algorithmic approaches for peptide identification recently seemed to be reaching the limit on the amount of information that could be extracted from MS/MS spectra. However, a closer look reveals that a common limiting procedure is to analyze each spectrum in isolation, even though high throughput mass spectrometry regularly generates many spectra from related peptides. By capitalizing on this redundancy we show that, similarly to the alignment of protein sequences, unidentified MS/MS spectra can also be aligned for the identification of modified and unmodified variants of the same peptide. Moreover, this alignment procedure can be iterated for the accurate grouping of multiple modification variants of the same peptides. Furthermore, the combination of shotgun proteomics with the alignment of spectra from overlapping peptides led to the development of Shotgun Protein Sequencing - similarly to the assembly of DNA reads into whole genomic sequences, we show that assembly of MS/MS spectra enables the highest ever de novo sequencing accuracy, while recovering nearly complete protein sequences. We further show that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.
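The idea of aligning a spectrum against another spectrum, rather than against a sequence, can be illustrated with a deliberately simplified shared-peak count. This is a sketch under stated assumptions, not the alignment procedure of the cited work: the function name `shared_peaks`, the tolerance, and the peak values are hypothetical, intensities are ignored, and real spectral alignment typically uses dynamic programming over both ion series.

```python
def shared_peaks(spec_a, spec_b, shift=0.0, tol=0.02):
    """Count peaks of spec_a that have a partner in spec_b.

    A peak counts as matched if spec_b contains a peak at the same m/z
    or at m/z + shift, within `tol`. `shift` stands for the precursor
    mass difference between the two spectra, e.g. a modification mass.
    """
    count = 0
    for mz in spec_a:
        if any(
            abs(mz - other) <= tol or abs(mz + shift - other) <= tol
            for other in spec_b
        ):
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical prefix-ion peaks of an unmodified peptide, and of a
    # variant carrying a +16.0 Da modification on its third residue.
    unmodified = [100.0, 200.0, 300.0, 400.0]
    modified = [100.0, 200.0, 316.0, 416.0]

    print(shared_peaks(unmodified, modified))              # unshifted only
    print(shared_peaks(unmodified, modified, shift=16.0))  # allow the shift
```

Peaks before the modified residue match directly and peaks after it match only once the shift is allowed, which is the signal that lets two spectra be paired as modified and unmodified variants before either one is identified.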
Affiliation(s)
- Nuno Bandeira
- Center for Computational Mass Spectrometry, University of California, San Diego, La Jolla, CA, USA.
|
45
|
Chi H, Sun RX, Yang B, Song CQ, Wang LH, Liu C, Fu Y, Yuan ZF, Wang HP, He SM, Dong MQ. pNovo: de novo peptide sequencing and identification using HCD spectra. J Proteome Res 2010; 9:2713-24. [PMID: 20329752 DOI: 10.1021/pr100182k] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.2] [Indexed: 11/28/2022]
Abstract
De novo peptide sequencing has improved remarkably in the past decade as a result of better instruments and computational algorithms. However, de novo sequencing can correctly interpret only approximately 30% of high- and medium-quality spectra generated by collision-induced dissociation (CID), which is much less than database search. This is mainly due to incomplete fragmentation and overlap of different ion series in CID spectra. In this study, we show that higher-energy collisional dissociation (HCD) is of great help to de novo sequencing because it produces high mass accuracy tandem mass spectrometry (MS/MS) spectra without the low-mass cutoff associated with CID in ion trap instruments. Besides, abundant internal and immonium ions in the HCD spectra can help differentiate similar peptide sequences. Taking advantage of these characteristics, we developed an algorithm called pNovo for efficient de novo sequencing of peptides from HCD spectra. pNovo gave correct identifications to 80% or more of the HCD spectra identified by database search. The number of correct full-length peptides sequenced by pNovo is comparable with that obtained by database search. A distinct advantage of de novo sequencing is that deamidated peptides and peptides with amino acid mutations can be identified efficiently without extra cost in computation. In summary, implementation of the HCD characteristics makes pNovo an excellent tool for de novo peptide sequencing from HCD spectra.
Affiliation(s)
- Hao Chi
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
|
46
|
Castellana NE, Pham V, Arnott D, Lill JR, Bafna V. Template proteogenomics: sequencing whole proteins using an imperfect database. Mol Cell Proteomics 2010; 9:1260-70. [PMID: 20164058 DOI: 10.1074/mcp.m900504-mcp200] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 11/06/2022] Open
Abstract
Database search algorithms are the primary workhorses for the identification of tandem mass spectra. However, these methods are limited to the identification of spectra for which peptides are present in the database, preventing the identification of peptides from mutated or alternatively spliced sequences. A variety of methods has been developed to search a spectrum against a sequence allowing for variations. Some tools determine the sequence of the homologous protein in the related species but do not report the peptide in the target organism. Other tools consider variations, including modifications and mutations, in reconstructing the target sequence. However, these tools will not work if the template (homologous peptide) is missing in the database, and they do not attempt to reconstruct the entire protein target sequence. De novo identification of peptide sequences is another possibility, because it does not require a protein database. However, the lack of database reduces the accuracy. We present a novel proteogenomic approach, GenoMS, that draws on the strengths of database and de novo peptide identification methods. Protein sequence templates (i.e. proteins or genomic sequences that are similar to the target protein) are identified using the database search tool InsPecT. The templates are then used to recruit, align, and de novo sequence regions of the target protein that have diverged from the database or are missing. We used GenoMS to reconstruct the full sequence of an antibody by using spectra acquired from multiple digests using different proteases. Antibodies are a prime example of proteins that confound standard database identification techniques. The mature antibody genes result from large-scale genome rearrangements with flexible fusion boundaries and somatic hypermutation. Using GenoMS we automatically reconstruct the complete sequences of two immunoglobulin chains with accuracy greater than 98% using a diverged protein database. 
Using the genome as the template, we achieve accuracy exceeding 97%.
Affiliation(s)
- Natalie E Castellana
- Department of Computer Science, University of California, San Diego, San Diego, California 92093, USA
|
47
|
Ma B. Challenges in Computational Analysis of Mass Spectrometry Data for Proteomics. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 2010; 25:107-123. [DOI: 10.1007/s11390-010-9309-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 09/01/2023]
|
48
|
Datta R, Bern M. Spectrum fusion: using multiple mass spectra for de novo Peptide sequencing. J Comput Biol 2009; 16:1169-82. [PMID: 19645594 DOI: 10.1089/cmb.2009.0122] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 10/20/2022] Open
Abstract
We report on a new algorithm for combining the information from several mass spectra of the same peptide. The algorithm automatically learns peptide fragmentation patterns, so that it can handle spectra from any instrument and fragmentation technique. We demonstrate the utility of the algorithm, and the power of multiple spectra, by showing that combining pairs of spectra (one CID and one ETD) greatly improves de novo sequencing success rates.
Affiliation(s)
- Ritendra Datta
- Computing Science Lab, Palo Alto Research Center (PARC) , Palo Alto, California, USA.
|
49
|
Bramham K, Mistry HD, Poston L, Chappell LC, Thompson AJ. The non-invasive biopsy--will urinary proteomics make the renal tissue biopsy redundant? QJM 2009; 102:523-38. [PMID: 19553250 DOI: 10.1093/qjmed/hcp071] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023] Open
Abstract
Proteomics is a rapidly advancing technique which gives functional insight into gene expression in living organisms. Urine is an ideal medium for study as it is readily available, easily obtained and less complex than other bodily fluids. Considerable progress has been made over the last 5 years in the study of urinary proteomics as a diagnostic tool for renal disease. Advantages over the traditional renal biopsy include accessibility, safety, the possibility of serial sampling and the potential for non-invasive prognostic and diagnostic monitoring of disease and an individual's response to treatment. Urinary proteomics is now moving from a discovery phase in small studies to a validation phase in much larger numbers of patients with renal disease. Whilst there are still some limitations in methodology, which are assessed in this review, the possibility of urinary proteomics replacing the invasive tissue biopsy for diagnosis of renal disease is becoming an increasingly realistic option.
Affiliation(s)
- K Bramham
- Maternal and Fetal Research Unit, KCL Division of Reproduction and Endocrinology, St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH, UK.
|
50
|
Liu X, Han Y, Yuen D, Ma B. Automated protein (re)sequencing with MS/MS and a homologous database yields almost full coverage and accuracy. Bioinformatics 2009; 25:2174-80. [PMID: 19535534 DOI: 10.1093/bioinformatics/btp366] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Bottom-up tandem mass spectrometry (MS/MS) is regularly used in proteomics nowadays for identifying proteins from a sequence database. De novo sequencing software is also available for sequencing novel peptides with relatively short sequence lengths. However, automated sequencing of novel proteins from MS/MS remains a challenging problem. RESULTS: Very often, although the target protein is novel, it has a homologous protein included in a known database. When this happens, we propose a novel algorithm and automated software tool, named Champs, for sequencing the complete protein from MS/MS data of a few enzymatic digestions of the purified protein. Validation with two standard proteins showed that our automated method yields >99% sequence coverage and 100% sequence accuracy on these two proteins. Our method is useful to sequence novel proteins or 're-sequence' a protein that has mutations compared with the database protein sequence.
Affiliation(s)
- Xiaowen Liu
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
|