1
Klein J, Carvalho L, Zaia J. Expanding N-glycopeptide identifications by modeling fragmentation, elution, and glycome connectivity. Nat Commun 2024; 15:6168. [PMID: 39039063 PMCID: PMC11263600 DOI: 10.1038/s41467-024-50338-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2021] [Accepted: 07/08/2024] [Indexed: 07/24/2024]
Abstract
Accurate glycopeptide identification in mass spectrometry-based glycoproteomics is a challenging problem at scale. Recent innovation has been made in increasing the scope and accuracy of glycopeptide identifications, with more precise uncertainty estimates for each part of the structure. We present a dynamically adapting relative retention time model for detecting and correcting ambiguous glycan assignments that are difficult to detect from fragmentation alone, a layered approach to glycopeptide fragmentation modeling that improves N-glycopeptide identification in samples without compromising identification quality, and a site-specific method to increase the depth of the glycoproteome confidently identifiable even further. We demonstrate our techniques on a set of previously published datasets, showing the performance gains at each stage of optimization. These techniques are provided in the open-source glycomics and glycoproteomics platform GlycReSoft available at https://github.com/mobiusklein/glycresoft .
Affiliation(s)
- Joshua Klein
  - Program for Bioinformatics, Boston University, Boston, MA, US
- Luis Carvalho
  - Program for Bioinformatics, Boston University, Boston, MA, US
  - Department of Math and Statistics, Boston University, Boston, MA, US
- Joseph Zaia
  - Program for Bioinformatics, Boston University, Boston, MA, US
  - Department of Biochemistry and Cell Biology, Boston University, Boston, MA, US
2
Ye J, He X, Wang S, Dong MQ, Wu F, Lu S, Feng F. Test-Time Training for Deep MS/MS Spectrum Prediction Improves Peptide Identification. J Proteome Res 2024; 23:550-559. [PMID: 38153036 DOI: 10.1021/acs.jproteome.3c00229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2023]
Abstract
In bottom-up proteomics, peptide-spectrum matching is critical for peptide and protein identification. Recently, deep learning models have been used to predict tandem mass spectra of peptides, enabling the calculation of similarity scores between the predicted and experimental spectra for peptide-spectrum matching. These models follow the supervised learning paradigm, which trains a general model using paired peptides and spectra from standard data sets and directly employs the model on experimental data. However, this approach can lead to inaccurate predictions due to differences between the training data and the experimental data, such as sample types, enzyme specificity, and instrument calibration. To tackle this problem, we developed a test-time training paradigm that adapts the pretrained model to generate experimental data-specific models, namely, PepT3. PepT3 yields a 10-40% increase in peptide identification depending on the variability in training and experimental data. Intriguingly, when applied to a patient-derived immunopeptidomic sample, PepT3 increases the identification of tumor-specific immunopeptide candidates by 60%. Two-thirds of the newly identified candidates are predicted to bind to the patient's human leukocyte antigen isoforms. To facilitate access of the model and all the results, we have archived all the intermediate files in Zenodo.org with identifier 8231084.
Affiliation(s)
- Jianbai Ye
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
- Xiangnan He
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
- Shujuan Wang
  - National Institute of Biological Sciences, Beijing 102206, China
- Meng-Qiu Dong
  - National Institute of Biological Sciences, Beijing 102206, China
- Feng Wu
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
- Shan Lu
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, California 92093, United States
- Fuli Feng
  - MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
3
Wu HT, Riggs DL, Lyon YA, Julian RR. Statistical Framework for Identifying Differences in Similar Mass Spectra: Expanding Possibilities for Isomer Identification. Anal Chem 2023; 95:6996-7005. [PMID: 37128750 PMCID: PMC10157605 DOI: 10.1021/acs.analchem.3c00495] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/01/2023] [Accepted: 04/04/2023] [Indexed: 05/03/2023]
Abstract
Isomeric molecules are important analytes in many biological and chemical arenas, yet their similarity poses challenges for many analytical methods, including mass spectrometry (MS). Tandem-MS provides significantly more information about isomers than intact mass analysis, but highly similar fragmentation patterns are common and include cases where no unique m/z peaks are generated between isomeric pairs. However, even in such situations, differences in peak intensity can exist and potentially contain additional information. Herein, we present a framework for comparing mass spectra that differ only in terms of peak intensity and include calculation of a statistical probability that the spectra derive from different analytes. This framework allows for confident identification of peptide isomers by collision-induced dissociation, higher-energy collisional dissociation, electron-transfer dissociation, and radical-directed dissociation. The method successfully identified many types of isomers including various d/l amino acid substitutions, Leu/Ile, and Asp/IsoAsp. The method can accommodate a wide range of changes in instrumental settings including source voltages, isolation widths, and resolution without influencing the analysis. It is shown that quantification of the composition of isomeric mixtures can be enabled with calibration curves, which were found to be highly linear and reproducible. The analysis can be implemented with data collected by either direct infusion or liquid-chromatography MS. Although this framework is presented in the context of isomer characterization, it should also prove useful in many other contexts where similar mass spectra are generated.
Affiliation(s)
- Hoi-Ting Wu
  - Department of Chemistry, University of California, Riverside, California 92521, United States
- Dylan L. Riggs
  - Department of Chemistry, University of California, Riverside, California 92521, United States
- Yana A. Lyon
  - Department of Chemistry, University of California, Riverside, California 92521, United States
- Ryan R. Julian
  - Department of Chemistry, University of California, Riverside, California 92521, United States
4
Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 03/01/2022] [Accepted: 07/11/2022] [Indexed: 01/21/2023]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
Affiliation(s)
- Jürgen Cox
  - Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany
  - Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway
5
Shin H, Park Y, Ahn K, Kim S. Accurate Prediction of y Ions in Beam-Type Collision-Induced Dissociation Using Deep Learning. Anal Chem 2022; 94:7752-7758. [PMID: 35609248 PMCID: PMC9178553 DOI: 10.1021/acs.analchem.1c03184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Peptide fragmentation spectra contain critical information for the identification of peptides by mass spectrometry. In this study, we developed an algorithm that more accurately predicts the high-intensity peaks among the peptide spectra. The training data are composed of 180,833 peptides from the National Institute of Standards and Technology and Proteomics Identification database, which were fragmented by either quadrupole time-of-flight or triple-quadrupole collision-induced dissociation methods. Exploratory analysis of the peptide fragmentation pattern focused on the highest-intensity peaks and showed that proline, peptide length, and amino acid combinations within a sliding window of four residues can be exploited as key features. The amino acid sequence of each peptide and each of the key features were allocated to different layers of the model, where a recurrent neural network, a convolutional neural network, and a fully connected neural network were used. The trained model, PrAI-frag, predicts the fragmentation spectra more accurately than previous machine learning-based prediction algorithms. The model excels at high-intensity peak prediction, which is advantageous for selective/multiple reaction monitoring applications. PrAI-frag is provided via a Web server, which can be used for peptides of length 6-15.
Affiliation(s)
- HyeonSeok Shin
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
- Youngmin Park
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
- Kyunggeun Ahn
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
- Sungsoo Kim
  - Bio Convergence Research Institute, Bertis Inc., Heungdeok 1-ro, Giheung-gu, Yongin-si, 16954 Gyeonggi-do, Republic of Korea
6
Abstract
Peptides play a crucial role in many vitally important functions of living organisms. The goal of peptidomics is the identification of the "peptidome," the whole peptide content of a cell, organ, tissue, body fluid, or organism. In peptidomic or proteomic studies, capillary electrophoresis (CE) is an alternative technique for liquid chromatography. It is a highly efficient and fast separation method requiring extremely low amounts of sample. In peptidomic approaches, CE is commonly combined with mass spectrometric (MS) detection. Most often, CE is coupled with electrospray ionization MS and less frequently with matrix-assisted laser desorption/ionization MS. CE-MS has been employed in numerous studies dealing with determination of peptide biomarkers in different body fluids for various diseases, or in food peptidomic research for the analysis and identification of peptides with special biological activities. In addition to the above topics, sample preparation techniques commonly applied in peptidomics before CE separation and possibilities for peptide identification and quantification by CE-MS or CE-MS/MS methods are discussed in this chapter.
7
Johnson J, Harman VM, Franco C, Emmott E, Rockliffe N, Sun Y, Liu LN, Takemori A, Takemori N, Beynon RJ. Construction of à la carte QconCAT protein standards for multiplexed quantification of user-specified target proteins. BMC Biol 2021; 19:195. [PMID: 34496840 PMCID: PMC8425055 DOI: 10.1186/s12915-021-01135-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/03/2021] [Accepted: 08/28/2021] [Indexed: 11/17/2022]
Abstract
BACKGROUND QconCATs are quantitative concatamers for proteomic applications that yield stoichiometric quantities of sets of stable isotope-labelled internal standards. However, changing a QconCAT design, for example, to replace poorly performing peptide standards has been a protracted process. RESULTS We report a new approach to the assembly and construction of QconCATs, based on synthetic biology precepts of biobricks, making use of loop assembly to construct larger entities from individual biobricks. The basic building block (a Qbrick) is a segment of DNA that encodes two or more quantification peptides for a single protein, readily held in a repository as a library resource. These Qbricks are then assembled in a one tube ligation reaction that enforces the order of assembly, to yield short QconCATs that are useable for small quantification products. However, the DNA context of the short construct also allows a second cycle of loop assembly such that five different short QconCATs can be assembled into a longer QconCAT in a second, single tube ligation. From a library of Qbricks, a bespoke QconCAT can be assembled quickly and efficiently in a form suitable for expression and labelling in vivo or in vitro. CONCLUSIONS We refer to this approach as the ALACAT strategy as it permits à la carte design of quantification standards. ALACAT methodology is a major gain in flexibility of QconCAT implementation as it supports rapid editing and improvement of QconCATs and permits, for example, substitution of one peptide by another.
Affiliation(s)
- James Johnson
  - GeneMill, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
- Victoria M Harman
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
- Catarina Franco
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
- Edward Emmott
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
- Nichola Rockliffe
  - GeneMill, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
- Yaqi Sun
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
- Lu-Ning Liu
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
- Ayako Takemori
  - Division of Analytical Bio-Medicine, Advanced Research Support Center, Ehime University, Shitsukawa, Toon, Ehime, Japan
- Nobuaki Takemori
  - Division of Analytical Bio-Medicine, Advanced Research Support Center, Ehime University, Shitsukawa, Toon, Ehime, Japan
- Robert J Beynon
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool, L69 7ZB, UK
8
Guan S, Bythell BJ. Size Dependent Fragmentation Chemistry of Short Doubly Protonated Tryptic Peptides. J Am Soc Mass Spectrom 2021; 32:1020-1032. [PMID: 33779179 DOI: 10.1021/jasms.1c00009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Tandem mass spectrometry of electrospray ionized multiply charged peptide ions is commonly used to identify the sequence of peptide(s) and infer the identity of source protein(s). Doubly protonated peptide ions are consistently the most efficiently sequenced ions following collision-induced dissociation of peptides generated by tryptic digestion. While the broad characteristics of longer (N ≥ 8 residue) doubly protonated peptides have been investigated, there is comparatively little data on shorter systems where charge repulsion should exhibit the greatest influence on the dissociation chemistry. To address this gap and further understand the chemistry underlying collisional-dissociation of doubly charged tryptic peptides, two series of analytes ([GxR+2H]2+ and [AxR+2H]2+, x = 2-5) were investigated experimentally and with theory. We find distinct differences in the preference of bond cleavage sites for these peptides as a function of size and to a lesser extent composition. Density functional calculations at two levels of theory predict that the threshold relative energies required for bond cleavages at the same site for peptides of different size are quite similar (for example, b2-yN-2). In isolation, this finding is inconsistent with experiment. However, the predicted extent of entropy change of these reactions is size dependent. Subsequent RRKM rate constant calculations provide a far clearer picture of the kinetics of the competing bond cleavage reactions enabling rationalization of experimental findings. The M06-2X data were substantially more consistent with experiment than were the B3LYP data.
Affiliation(s)
- Shanshan Guan
  - Department of Chemistry and Biochemistry, Ohio University, 307 Chemistry Building, Athens, Ohio 45701, United States
  - Department of Chemistry and Biochemistry, University of Missouri-St. Louis, 1 University Boulevard, St. Louis, Missouri 63121, United States
- Benjamin J Bythell
  - Department of Chemistry and Biochemistry, Ohio University, 307 Chemistry Building, Athens, Ohio 45701, United States
  - Department of Chemistry and Biochemistry, University of Missouri-St. Louis, 1 University Boulevard, St. Louis, Missouri 63121, United States
9
Wilburn DB, Richards AL, Swaney DL, Searle BC. CIDer: A Statistical Framework for Interpreting Differences in CID and HCD Fragmentation. J Proteome Res 2021; 20:1951-1965. [PMID: 33729787 DOI: 10.1021/acs.jproteome.0c00964] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Abstract
Library searching is a powerful technique for detecting peptides using either data independent or data dependent acquisition. While both large-scale spectrum library curators and deep learning prediction approaches have focused on beam-type CID fragmentation (HCD), resonance CID fragmentation remains a popular technique. Here we demonstrate an approach to model the differences between HCD and CID spectra, and present a software tool, CIDer, for converting libraries between the two fragmentation methods. We demonstrate that just using a combination of simple linear models and basic principles of peptide fragmentation, we can explain up to 43% of the variation between ions fragmented by HCD and CID across an array of collision energy settings. We further show that in some circumstances, searching converted CID libraries can detect more peptides than searching existing CID libraries or libraries of machine learning predictions from FASTA databases. These results suggest that leveraging information in existing libraries by converting between HCD and CID libraries may be an effective interim solution while large-scale CID libraries are being developed.
Affiliation(s)
- Damien B Wilburn
  - Institute for Systems Biology, Seattle, Washington 98109, United States
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Alicia L Richards
  - Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California 94158, United States
  - J. David Gladstone Institutes, San Francisco, California 94158, United States
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California 94158, United States
- Danielle L Swaney
  - Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California 94158, United States
  - J. David Gladstone Institutes, San Francisco, California 94158, United States
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, California 94158, United States
- Brian C Searle
  - Institute for Systems Biology, Seattle, Washington 98109, United States
10
Qin C, Luo X, Deng C, Shu K, Zhu W, Griss J, Hermjakob H, Bai M, Perez-Riverol Y. Deep learning embedder method and tool for mass spectra similarity search. J Proteomics 2020; 232:104070. [PMID: 33307250 DOI: 10.1016/j.jprot.2020.104070] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/29/2020] [Revised: 11/25/2020] [Accepted: 12/01/2020] [Indexed: 12/31/2022]
Abstract
Spectral similarity calculation is widely used in protein identification tools and mass spectra clustering algorithms when comparing theoretical or experimental spectra. The performance of the spectral similarity calculation plays an important role in these tools and algorithms, especially in the analysis of large-scale datasets. Recently, deep learning methods have been proposed to improve the performance of clustering algorithms and protein identification by training the algorithms with existing data and the use of multiple spectra and identified peptide features. While the efficiency of these algorithms is still under study in comparison with traditional approaches, their application in proteomics data analysis is becoming more common. Here, we propose the use of deep learning to improve spectral similarity comparison. We assessed the performance of deep learning for spectral similarity, with GLEAMS and a newly trained embedder model (DLEAMSE), which uses high-quality spectra from PRIDE Cluster. Also, we developed a new bioinformatics tool (mslookup - https://github.com/bigbio/DLEAMSE/) that allows users to quickly search for spectra among previously identified mass spectra published in public repositories and spectral libraries. Finally, we released a human database to enable bioinformaticians and biologists to search for identified spectra on their own machines.
SIGNIFICANCE STATEMENT: Spectral similarity calculation plays an important role in proteomics data analysis. With deep learning's ability to learn implicit and effective features from large-scale training datasets, deep learning-based MS/MS spectra embedding models have emerged as a solution to improve mass spectral clustering and similarity calculation algorithms. We compare multiple similarity scoring and deep learning methods in terms of accuracy (computing the similarity for a pair of mass spectra) and computing-time performance. The benchmark results showed no major differences in accuracy between DLEAMSE and the normalized dot product (NDP) for spectrum similarity calculations. The DLEAMSE GPU implementation is faster than NDP in preprocessing on the GPU server, and the similarity calculation of DLEAMSE (Euclidean distance on 32-D vectors) takes about 1/3 of the time of the dot product calculation. The DLEAMSE encoding and embedding steps need to run only once for each spectrum, and the embedded 32-D points can be persisted in the repository, which speeds up future comparisons and large-scale analyses. Based on these results, we proposed a new tool, mslookup, that enables researchers to find spectra previously identified in public data. The tool can also be used to generate in-house databases of previously identified spectra to share with other laboratories and consortiums.
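The two similarity measures compared in this abstract can be illustrated with a minimal Python sketch. It is not the DLEAMSE or mslookup code: the binning scheme, the function names, and the toy peak lists are assumptions made for illustration, and the learned 32-D embedding itself is not reproduced here (any fixed-length vectors stand in for it).

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Sum peak intensities into fixed-width m/z bins (hypothetical preprocessing)."""
    n_bins = int(max_mz / bin_width) + 1
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0.0 <= mz <= max_mz:
            vec[int(mz / bin_width)] += intensity
    return vec


def normalized_dot_product(a, b):
    """Cosine-style similarity between two binned spectra; 0.0 if either is empty."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def euclidean_distance(u, v):
    """Distance between two fixed-length embedding vectors (e.g. 32-D points)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Toy peak lists as (m/z, intensity) pairs; the values are made up.
spectrum_a = [(175.1, 50.0), (300.2, 100.0), (415.3, 20.0)]
spectrum_b = [(175.1, 25.0), (300.2, 50.0), (415.3, 10.0)]  # same shape, half the intensity
spectrum_c = [(500.5, 80.0)]                                 # no shared bins with spectrum_a

ndp_same_shape = normalized_dot_product(bin_spectrum(spectrum_a), bin_spectrum(spectrum_b))
ndp_disjoint = normalized_dot_product(bin_spectrum(spectrum_a), bin_spectrum(spectrum_c))
```

The normalized dot product walks over every bin of both spectra, whereas a Euclidean distance on short precomputed vectors touches only a few dozen values per comparison, which is consistent with the speed difference the abstract reports.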
Affiliation(s)
- Chunyuan Qin
  - Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Xiyang Luo
  - Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Chuan Deng
  - Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Kunxian Shu
  - Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
- Weimin Zhu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Johannes Griss
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  - Department of Dermatology, Medical University of Vienna, 1090 Vienna, Austria
- Henning Hermjakob
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Mingze Bai
  - Chongqing Key Laboratory on Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, China
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics, Beijing 102206, China
- Yasset Perez-Riverol
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
11
Wen B, Zeng W, Liao Y, Shi Z, Savage SR, Jiang W, Zhang B. Deep Learning in Proteomics. Proteomics 2020; 20:e1900335. [PMID: 32939979 PMCID: PMC7757195 DOI: 10.1002/pmic.201900335] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 05/27/2020] [Revised: 09/14/2020] [Indexed: 12/17/2022]
Abstract
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Affiliation(s)
- Bo Wen
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen-Feng Zeng
  - Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Chinese Academy of Sciences, Institute of Computing Technology, Beijing 100190, China
- Yuxing Liao
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Zhiao Shi
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sara R. Savage
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen Jiang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
12
Xu R, Sheng J, Bai M, Shu K, Zhu Y, Chang C. A Comprehensive Evaluation of MS/MS Spectrum Prediction Tools for Shotgun Proteomics. Proteomics 2020; 20:e1900345. [DOI: 10.1002/pmic.201900345] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/31/2019] [Revised: 04/29/2020] [Indexed: 01/27/2023]
Affiliation(s)
- Rui Xu
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Beijing Institute of Lifeomics Beijing 102206 China
- Chongqing Key Laboratory on Big Data for Bio Intelligence Chongqing University of Posts and Telecommunications Chongqing 400065 China
| | - Jie Sheng
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Beijing Institute of Lifeomics Beijing 102206 China
- Chongqing Key Laboratory on Big Data for Bio Intelligence Chongqing University of Posts and Telecommunications Chongqing 400065 China
| | - Mingze Bai
- Chongqing Key Laboratory on Big Data for Bio Intelligence Chongqing University of Posts and Telecommunications Chongqing 400065 China
| | - Kunxian Shu
- Chongqing Key Laboratory on Big Data for Bio Intelligence Chongqing University of Posts and Telecommunications Chongqing 400065 China
| | - Yunping Zhu
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Beijing Institute of Lifeomics Beijing 102206 China
| | - Cheng Chang
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Beijing Institute of Lifeomics Beijing 102206 China
| |
|
13
|
Ramachandran S, Thomas T. A Frequency-Based Approach to Predict the Low-Energy Collision-Induced Dissociation Fragmentation Spectra. ACS Omega 2020; 5:12615-12622. [PMID: 32548445 PMCID: PMC7288360 DOI: 10.1021/acsomega.9b03935] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/18/2019] [Accepted: 05/12/2020] [Indexed: 06/11/2023]
Abstract
Peptide identification algorithms rely on the comparison between the experimental tandem mass spectrometry spectrum and the theoretical spectrum to identify a peptide from the tandem mass spectra. Hence, it is important to understand the fragmentation process and predict the tandem mass spectra for high-throughput proteomics research. In this study, a novel method was developed to predict the theoretical ion trap collision-induced dissociation (CID) tandem mass spectra of the singly, doubly, and triply charged tryptic peptides. The fragmentation statistics of the ion trap CID spectra were used to predict the theoretical tandem mass spectra of the peptide sequence. The study estimated the relative cleavage frequency for each pair of adjacent amino acids along the peptide length. The study showed that the cleavage frequency can be directly used to predict the tandem mass spectra. The predicted spectra show a high correlation with the experimental spectra used in this study; 99.73% of the high-quality reference spectra have correlation scores greater than 0.8. The new method predicts the theoretical spectrum and correlates significantly better with the experimental spectrum as compared to the existing spectrum prediction tools OpenMS_Simulator, MS2PIP, and MS2PBPI, where only 80, 85.76, and 85.80% of the spectral count, respectively, has a correlation score greater than 0.8.
|
14
|
Software-aided detection and structural characterization of cyclic peptide metabolites in biological matrix by high-resolution mass spectrometry. J Pharm Anal 2020; 10:240-246. [PMID: 32612870 PMCID: PMC7322757 DOI: 10.1016/j.jpha.2020.05.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/28/2020] [Revised: 05/25/2020] [Accepted: 05/25/2020] [Indexed: 11/21/2022] Open
Abstract
Compared to their linear counterparts, cyclic peptides show better biological activities, such as antibacterial, immunosuppressive, and anti-tumor activities, and better pharmaceutical properties due to their conformational rigidity. However, cyclic peptides can form numerous putative metabolites from potential hydrolytic cleavages, and their fragments are very difficult to interpret. These characteristics pose a great challenge when analyzing metabolites of cyclic peptides by mass spectrometry. This study aimed to assess and apply a software-aided analytical workflow for the detection and structural characterization of cyclic peptide metabolites. Insulin and atrial natriuretic peptide (ANP) as model cyclic peptides were incubated with trypsin/chymotrypsin and/or rat liver S9, followed by data acquisition using a TripleTOF® 5600. The resulting full-scan MS and MS/MS datasets were automatically processed through a combination of targeted and untargeted peak finding strategies. MS/MS spectra of predicted metabolites were interrogated against putative metabolite sequences, in light of a, b, y and internal fragment series. The resulting fragment assignments led to the confirmation and ranking of the metabolite sequences and the identification of metabolic modifications. As a result, 29 metabolites with linear or cyclic structures were detected in the insulin incubation with the hydrolytic enzymes. Sequences of twenty insulin metabolites were further determined, which were consistent with the hydrolytic sites of these enzymes. In the same manner, multiple metabolites of insulin and ANP formed in rat liver S9 incubation were detected and structurally characterized, some of which have not been previously reported. The results demonstrated the utility of a software-aided data processing tool in the detection and identification of cyclic peptide metabolites.
A software-aided workflow enabling detection and characterization of cyclic peptide metabolites by LC/HRMS. Automatic data processing through a combination of targeted and untargeted peak finding strategies. MS/MS spectra of predicted metabolites interrogated against putative metabolite sequences. Rapid determination of metabolite profiles of insulin and atrial natriuretic peptide in rat liver S9. Potentially applicable to metabolic soft spot analysis and in vitro metabolism across species in drug discovery.
|
15
|
Liu K, Li S, Wang L, Ye Y, Tang H. Full-Spectrum Prediction of Peptides Tandem Mass Spectra using Deep Neural Network. Anal Chem 2020; 92:4275-4283. [PMID: 32053352 DOI: 10.1021/acs.analchem.9b04867] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 01/03/2023]
Abstract
The ability to predict tandem mass (MS/MS) spectra from peptide sequences can significantly enhance our understanding of the peptide fragmentation process and could improve peptide identification in proteomics. However, current approaches for predicting high-energy collisional dissociation (HCD) spectra are limited to predicting the intensities of expected ion types, that is, the a/b/c/x/y/z ions and their neutral loss derivatives (referred to as backbone ions). In practice, backbone ions only account for <70% of total ion intensities in HCD spectra, indicating that many intense ions are ignored by current predictors. In this paper, we present a deep learning approach that can predict the complete spectra (both backbone and nonbackbone ions) directly from peptide sequences. We make no assumptions about which kinds of ions to predict but instead predict the intensities for all possible m/z. Training this model needs neither fragment ion annotations nor any prior knowledge of the fragmentation rules. Our analyses show that the predicted 2+ and 3+ HCD spectra are highly similar to the experimental spectra, with average full-spectrum cosine similarities of 0.820 (±0.088) and 0.786 (±0.085), respectively, very close to the similarities between experimental replicate spectra. In contrast, the best-performing backbone-only models achieve average similarities below 0.75 and 0.70 for 2+ and 3+ spectra, respectively. Furthermore, we developed a multitask learning (MTL) approach for predicting spectra with insufficient training samples, which allows our model to make accurate predictions for electron transfer dissociation (ETD) spectra and HCD spectra of less abundant charges (1+ and 4+).
Affiliation(s)
- Kaiyuan Liu
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
| | - Sujun Li
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
| | - Lei Wang
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
| | - Yuzhen Ye
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
| | - Haixu Tang
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana 47405, United States
| |
|
16
|
Gromova OA, Torshin IY, Zgoda VG, Tikhonova OV. [An analysis of the peptide composition of a 'light' peptide fraction of cerebrolysin]. Zh Nevrol Psikhiatr Im S S Korsakova 2020; 119:75-83. [PMID: 31626174 DOI: 10.17116/jnevro201911908175] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
Abstract
AIM To analyze the peptide composition of a light peptide fraction of cerebrolysin. MATERIAL AND METHODS Mass spectrometry (MS) with orbital ion traps and modern de novo MS-sequencing algorithms was performed. RESULTS The amino acid sequences of 14 635 peptides corresponding to the 1643 porcine proteome neuronal proteins are identified. An analysis of the human proteome annotation shows that these peptides can mimic the corresponding human peptides. In particular, 405 peptide fragments correspond to 300 known biologically active peptides, including fragments of antibacterial peptides (defensins, histatins), immunomodulatory (granulin, manserin) and vasoactive (endothelin, VIP) peptides. At the same time, 8953 of 14 635 peptides can modulate the activity of 275 human signaling proteins, including kinases CDK1, CDK2, TGFBR2, GSK3, MTOR, pro-apoptotic caspases CASP1, CASP3 and CASP6 etc. The results confirm the presence of Leu- and Met-enkephalins, fragments of neuropeptide orexin, neuropeptide VF, galanin and nerve growth factor that have a neurotrophic effect. CONCLUSION The results of a proteomic study of the peptide composition of cerebrolysin indicate the widest range of molecular mechanisms responsible for the clinical efficacy of this drug.
Affiliation(s)
- O A Gromova
- Federal Research Center 'Computer Science and Control' of the Russian Academy of Sciences, Moscow, Russia; Big Data Storage and Analysis Center, Lomonosov Moscow State University, Moscow, Russia
| | - I Yu Torshin
- Federal Research Center 'Computer Science and Control' of the Russian Academy of Sciences, Moscow, Russia; Big Data Storage and Analysis Center, Lomonosov Moscow State University, Moscow, Russia
| | - V G Zgoda
- Orekhovich Research Institute of Biomedical Chemistry, Moscow, Russia
| | - O V Tikhonova
- Orekhovich Research Institute of Biomedical Chemistry, Moscow, Russia
| |
|
17
|
Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat Methods 2019; 16:509-518. [DOI: 10.1038/s41592-019-0426-7] [Citation(s) in RCA: 340] [Impact Index Per Article: 56.7] [Received: 08/20/2018] [Accepted: 04/18/2019] [Indexed: 11/08/2022]
|
18
|
Peptide Sequencing Directly on Solid Surfaces Using MALDI Mass Spectrometry. Sci Rep 2017; 7:17811. [PMID: 29259225 PMCID: PMC5736625 DOI: 10.1038/s41598-017-18105-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/11/2017] [Accepted: 12/05/2017] [Indexed: 11/08/2022] Open
Abstract
There are an increasing variety of applications in which peptides are both synthesized and used attached to solid surfaces. This has created a need for high throughput sequence analysis directly on surfaces. However, common sequencing approaches that can be adapted to surface bound peptides lack the throughput often needed in library-based applications. Here we describe a simple approach for sequence analysis directly on solid surfaces that is both high speed and high throughput, utilizing equipment available in most protein analysis facilities. In this approach, surface bound peptides, selectively labeled at their N-termini with a positive charge-bearing group, are subjected to controlled degradation in ammonia gas, resulting in a set of fragments differing by a single amino acid that remain spatially confined on the surface they were bound to. These fragments can then be analyzed by MALDI mass spectrometry, and the peptide sequences read directly from the resulting spectra.
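The idea of reading a sequence from a set of fragments that differ by a single amino acid can be sketched as follows. This is a minimal illustration, not the authors' software: the function name `read_ladder`, the tolerance, and the example peak list are assumptions, and the residue masses are standard monoisotopic values. The result is ordered by increasing fragment mass; how that maps onto N-to-C orientation depends on which terminus carries the label.

```python
# Monoisotopic residue masses (Da). Leucine and isoleucine are isobaric,
# so a mass ladder alone cannot tell them apart.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.01):
    """Name the residue implied by each gap between neighbouring ladder peaks.

    `peaks` holds singly charged fragment masses in any order. Gaps that
    match no residue within `tol` are reported as "?"; ambiguous gaps list
    every candidate, e.g. "L/I".
    """
    ordered = sorted(peaks)
    residues = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tol]
        residues.append("/".join(matches) if matches else "?")
    return residues


# Hypothetical ladder: a 500.0 Da labelled core extended by G, A, then S.
ladder = [500.0, 557.02146, 628.05857, 715.0906]
print(read_ladder(ladder))
```

With high-resolution data a tighter tolerance separates Q from K (0.036 Da apart); at lower resolution those gaps would be reported as ambiguous as well.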
|
19
|
Zhou XX, Zeng WF, Chi H, Luo C, Liu C, Zhan J, He SM, Zhang Z. pDeep: Predicting MS/MS Spectra of Peptides with Deep Learning. Anal Chem 2017; 89:12690-12697. [DOI: 10.1021/acs.analchem.7b02566] [Citation(s) in RCA: 128] [Impact Index Per Article: 16.0] [Indexed: 12/14/2022]
Affiliation(s)
- Xie-Xuan Zhou
- State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Wen-Feng Zeng
- University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing of CAS, ICT, Chinese Academy of Sciences, Beijing 100190, China
| | - Hao Chi
- University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing of CAS, ICT, Chinese Academy of Sciences, Beijing 100190, China
| | - Chunjie Luo
- State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Chao Liu
- University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing of CAS, ICT, Chinese Academy of Sciences, Beijing 100190, China
| | - Jianfeng Zhan
- State Key Laboratory of Computer Architecture, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Si-Min He
- University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing of CAS, ICT, Chinese Academy of Sciences, Beijing 100190, China
| | - Zhifei Zhang
- Capital Medical University, Beijing 100069, China
| |
|
20
|
Hu H, Khatri K, Zaia J. Algorithms and design strategies towards automated glycoproteomics analysis. Mass Spectrometry Reviews 2017; 36:475-498. [PMID: 26728195 PMCID: PMC4931994 DOI: 10.1002/mas.21487] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 07/10/2015] [Accepted: 11/30/2015] [Indexed: 05/09/2023]
Abstract
Glycoproteomics involves the study of glycosylation events on protein sequences ranging from purified proteins to whole proteome scales. Understanding these complex post-translational modification (PTM) events requires elucidation of the glycan moieties (monosaccharide sequences and glycosidic linkages between residues), protein sequences, as well as site-specific attachment of glycan moieties onto protein sequences, in a spatial and temporal manner in a variety of biological contexts. Compared with proteomics, bioinformatics for glycoproteomics is immature and many researchers still rely on tedious manual interpretation of glycoproteomics data. As sample preparation protocols and analysis techniques have matured, the number of publications on glycoproteomics and bioinformatics has increased substantially; however, the lack of consensus on tool development and code reuse limits the dissemination of bioinformatics tools because it requires significant effort to migrate a computational tool tailored for one method design to alternative methods. This review discusses algorithms and methods in glycoproteomics, and refers to the general proteomics field for potential solutions. It also introduces general strategies for tool integration and pipeline construction in order to better serve the glycoproteomics community. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:475-498, 2017.
Affiliation(s)
- Han Hu
- Bioinformatics Program, Boston University, Boston, Massachusetts 02215, USA
- Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA
| | - Kshitij Khatri
- Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA
| | - Joseph Zaia
- Center for Biomedical Mass Spectrometry, Department of Biochemistry, Boston University School of Medicine, Boston University, Boston, Massachusetts 02118, USA
| |
|
21
|
Abstract
Scoring functions that assess spectrum similarity play a crucial role in many computational mass spectrometry algorithms. These functions are used to compare an experimentally acquired fragmentation (MS/MS) spectrum against two different types of target MS/MS spectra: either against a theoretical MS/MS spectrum derived from a peptide from a sequence database, or against another, previously acquired MS/MS spectrum. The former is typically encountered in database searching, while the latter is used in spectrum clustering and spectral library searching. The comparison between acquired versus theoretical MS/MS spectra is most commonly performed using cross-correlations or probability derived scoring functions, while the comparison of two acquired MS/MS spectra typically makes use of a normalized dot product, especially in spectrum library search algorithms. In addition to these scoring functions, Pearson's or Spearman's correlation coefficients, mean squared error, or median absolute deviation scores can also be used for the same purpose. Here, we describe and evaluate these scoring functions with regards to their ability to assess spectrum similarity for theoretical versus acquired, and acquired versus acquired spectra.
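Three of the scoring functions named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the evaluation code from the cited work: both spectra are assumed to be already binned onto the same m/z grid so that they arrive as equal-length intensity vectors, and the function names are chosen here for illustration.

```python
import math


def normalized_dot(a, b):
    """Normalized dot product (cosine) of two aligned intensity vectors."""
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = (math.sqrt(sum(x * x for x in a))
                   * math.sqrt(sum(y * y for y in b)))
    return numerator / denominator if denominator else 0.0


def pearson(a, b):
    """Pearson correlation: the normalized dot product of mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return normalized_dot([x - mean_a for x in a], [y - mean_b for y in b])


def mean_squared_error(a, b):
    """Mean squared difference between corresponding bins (lower is better)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical binned spectra: `scaled` is `acquired` at twice the intensity.
acquired = [1.0, 0.0, 2.0]
scaled = [2.0, 0.0, 4.0]
print(normalized_dot(acquired, scaled), pearson(acquired, scaled),
      mean_squared_error(acquired, scaled))
```

The example shows why the choice matters: the two scale-invariant scores treat the spectra as identical, while the mean squared error penalizes the intensity difference unless the vectors are normalized first.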
Affiliation(s)
- Şule Yilmaz
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
| | - Elien Vandermarliere
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium
| | - Lennart Martens
- Medical Biotechnology Center, VIB, Albert Baertsoenkaai 3, Ghent, 9000, Belgium.
- Department of Biochemistry, Ghent University, Ghent, 9000, Belgium.
- Bioinformatics Institute Ghent, Ghent University, Ghent, 9000, Belgium.
| |
|
22
|
Abstract
Through advances in molecular biology, comparative analysis of DNA sequences is currently the cornerstone in the study of molecular evolution and phylogenetics. Nevertheless, protein mass spectrometry offers some unique opportunities to enable phylogenetic analyses in organisms where DNA may be difficult or costly to obtain. To date, the methods of phylogenetic analysis using protein mass spectrometry can be classified into three categories: (1) de novo protein sequencing followed by classical phylogenetic reconstruction, (2) direct phylogenetic reconstruction using proteolytic peptide mass maps, and (3) mapping of mass spectral data onto classical phylogenetic trees. In this chapter, we provide a brief description of the three methods and the protocol for each method along with relevant tools and algorithms.
Affiliation(s)
- Shiyong Ma
- Prince of Wales Clinical School, UNSW Australia, Sydney, NSW, 2052, Australia
- Lowy Cancer Research Centre, UNSW, Corner of High and Botany St, Kensington, NSW, 2033, Australia
| | - Kevin M Downard
- Prince of Wales Clinical School, UNSW Australia, Sydney, NSW, 2052, Australia
- Lowy Cancer Research Centre, UNSW, Corner of High and Botany St, Kensington, NSW, 2033, Australia
| | - Jason W H Wong
- Prince of Wales Clinical School, UNSW Australia, Sydney, NSW, 2052, Australia.
- Lowy Cancer Research Centre, UNSW, Corner of High and Botany St, Kensington, NSW, 2033, Australia.
| |
|
23
|
Guthals A, Gan Y, Murray L, Chen Y, Stinson J, Nakamura G, Lill JR, Sandoval W, Bandeira N. De Novo MS/MS Sequencing of Native Human Antibodies. J Proteome Res 2016; 16:45-54. [PMID: 27779884 DOI: 10.1021/acs.jproteome.6b00608] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 11/28/2022]
Abstract
One direct route for the discovery of therapeutic human monoclonal antibodies (mAbs) involves the isolation of peripheral B cells from survivors/sero-positive individuals after exposure to an infectious reagent or disease etiology, followed by single-cell sequencing or hybridoma generation. Peripheral B cells, however, are not always easy to obtain and represent only a small percentage of the total B-cell population across all bodily tissues. Although it has been demonstrated that tandem mass spectrometry (MS/MS) techniques can interrogate the full polyclonal antibody (pAb) response to an antigen in vivo, all current approaches identify MS/MS spectra against databases derived from genetic sequencing of B cells from the same patient. In this proof-of-concept study, we demonstrate the feasibility of a novel MS/MS antibody discovery approach in which only serum antibodies are required without the need for sequencing of genetic material. Peripheral pAbs from a cytomegalovirus-exposed individual were purified by glycoprotein B antigen affinity and de novo sequenced from MS/MS data. Purely MS-derived mAbs were then manufactured in mammalian cells to validate potency via antigen-binding ELISA. Interestingly, we found that these mAbs accounted for 1 to 2% of total donor IgG but were not detected in parallel sequencing of memory B cells from the same patient.
Affiliation(s)
- Adrian Guthals
- Mapp Biopharmaceutical, Inc., 6160 Lusk Boulevard #C105, San Diego, California 92121, United States
| | - Yutian Gan
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
| | - Laura Murray
- Department of Protein Chemistry, Genentech, Inc., South San Francisco, California 94080, United States
| | - Yongmei Chen
- Department of Antibody Engineering, Genentech, Inc., South San Francisco, California 94080, United States
| | - Jeremy Stinson
- Department of Molecular Biology, Genentech, Inc., South San Francisco, California 94080, United States
| | - Gerald Nakamura
- Department of Antibody Engineering, Genentech, Inc., South San Francisco, California 94080, United States
| | - Jennie R Lill
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
| | - Wendy Sandoval
- Department of Proteomics & Biological Resources, Genentech, Inc., South San Francisco, California 94080, United States
| | - Nuno Bandeira
- Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0404, La Jolla, California 92093, United States; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, Mail Code 0657, La Jolla, California 92093, United States
| |
|
24
|
Dereplication of peptidic natural products through database search of mass spectra. Nat Chem Biol 2016; 13:30-37. [PMID: 27820803 PMCID: PMC5409158 DOI: 10.1038/nchembio.2219] [Citation(s) in RCA: 169] [Impact Index Per Article: 18.8] [Received: 12/03/2015] [Accepted: 08/17/2016] [Indexed: 11/08/2022]
Abstract
Peptidic Natural Products (PNPs) are widely used compounds that include many antibiotics and a variety of other bioactive peptides. While recent breakthroughs in PNP discovery raised the challenge of developing new algorithms for their analysis, identification of PNPs via database search of tandem mass spectra remains an open problem. To address this problem, natural product researchers utilize dereplication strategies that identify known PNPs and lead to the discovery of new ones even in cases when the reference spectra are not present in existing spectral libraries. DEREPLICATOR is a new dereplication algorithm that enabled high-throughput PNP identification and that is compatible with large-scale mass spectrometry-based screening platforms for natural product discovery. After searching nearly one hundred million tandem mass spectra in the Global Natural Products Social (GNPS) molecular networking infrastructure, DEREPLICATOR identified an order of magnitude more PNPs (and their new variants) than any previous dereplication efforts.
|
25
|
Gorshkov V, Hotta SYK, Verano-Braga T, Kjeldsen F. Peptide de novo sequencing of mixture tandem mass spectra. Proteomics 2016; 16:2470-9. [PMID: 27329701 PMCID: PMC5297990 DOI: 10.1002/pmic.201500549] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/26/2015] [Revised: 04/27/2016] [Accepted: 06/17/2016] [Indexed: 02/02/2023]
Abstract
The impact of mixture spectra deconvolution on the performance of four popular de novo sequencing programs was tested using artificially constructed mixture spectra as well as experimental proteomics data. Mixture fragmentation spectra are recognized as a limitation in proteomics because they decrease the identification performance using database search engines. De novo sequencing approaches are expected to be even more sensitive to the reduction in mass spectrum quality resulting from peptide precursor co‐isolation and thus prone to false identifications. The deconvolution approach matched complementary b‐, y‐ions to each precursor peptide mass, which allowed the creation of virtual spectra containing sequence specific fragment ions of each co‐isolated peptide. Deconvolution processing resulted in equally efficient identification rates but increased the absolute number of correctly sequenced peptides. The improvement was in the range of 20–35% additional peptide identifications for a HeLa lysate sample. Some correct sequences were identified only using unprocessed spectra; however, the number of these was lower than those where improvement was obtained by mass spectral deconvolution. Tight candidate peptide score distribution and high sensitivity to small changes in the mass spectrum introduced by the employed deconvolution method could explain some of the missing peptide identifications.
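Matching complementary b- and y-ions to a precursor mass rests on a simple identity: for singly charged fragments from the same backbone cleavage, m/z(b) + m/z(y) equals the neutral peptide mass plus two proton masses. The sketch below illustrates that check only. It is not the deconvolution method of the cited study; the function name, tolerance, and peak list are hypothetical, and all fragments are assumed to be singly charged.

```python
PROTON = 1.007276  # mass of a proton in Da


def complementary_pairs(peaks, neutral_mass, tol=0.02):
    """Find peak pairs whose m/z values sum to neutral_mass + 2 protons.

    Uses a two-pointer sweep over the sorted peak list, so each peak is
    assigned to at most one pair.
    """
    target = neutral_mass + 2 * PROTON
    ordered = sorted(peaks)
    pairs = []
    low, high = 0, len(ordered) - 1
    while low < high:
        total = ordered[low] + ordered[high]
        if abs(total - target) <= tol:
            pairs.append((ordered[low], ordered[high]))
            low += 1
            high -= 1
        elif total < target:
            low += 1
        else:
            high -= 1
    return pairs


# Hypothetical mixture spectrum; the precursor of interest weighs 1000.0 Da.
spectrum = [300.1, 701.914552, 450.0, 552.014552, 620.3]
print(complementary_pairs(spectrum, neutral_mass=1000.0))
```

Running the same search once per co-isolated precursor mass yields one set of paired peaks per peptide, which is the kind of information needed to build a separate virtual spectrum for each; the unpaired 620.3 peak is assigned to neither pair.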
Affiliation(s)
- Vladimir Gorshkov
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark.
| | | | - Thiago Verano-Braga
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark; Department of Physiology and Biophysics, Federal University of Minas Gerais Belo Horizonte - MG, Belo Horizonte, Brazil
| | - Frank Kjeldsen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark Odense M, Odense, Denmark
| |
|
26
|
Wang Y, Yang F, Wu P, Bu D, Sun S. OpenMS-Simulator: an open-source software for theoretical tandem mass spectrum prediction. BMC Bioinformatics 2015; 16:110. [PMID: 25887925 PMCID: PMC4415337 DOI: 10.1186/s12859-015-0540-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 12/02/2014] [Accepted: 02/16/2015] [Indexed: 12/18/2022] Open
Abstract
Background Tandem mass spectrometry (MS/MS) is a key technique for peptide identification. MS/MS-based peptide identification approaches can be categorized into two families, namely, de novo and database search. Both types of approach can benefit from an accurate prediction of the theoretical spectrum. A theoretical spectrum consists of the m/z and intensity of possibly occurring ions, which are estimated by simulating the spectrum-generating process. Extensive research has been conducted on theoretical spectrum prediction; however, the prediction methods suffer from low prediction accuracy due to oversimplifications in the spectrum simulation process. Results In this study, we present an open-source software package, called OpenMS-Simulator, to predict the theoretical spectrum for a given peptide sequence. Based on the mobile-proton hypothesis for peptide fragmentation, OpenMS-Simulator trains a closed-form model for the intensity ratio of adjacent y ions, from which the whole theoretical spectrum can be constructed. On a collection of representative spectra datasets with annotated peptide sequences, experimental results suggest that OpenMS-Simulator can predict theoretical spectra with considerable accuracy. The study also presents an application of OpenMS-Simulator: the similarity between theoretical spectra and query spectra can be used to re-rank the peptide sequences reported by SEQUEST/X!Tandem. Conclusions OpenMS-Simulator implements a novel model to predict the theoretical spectrum for a given peptide sequence. Compared with existing theoretical spectrum prediction tools, such as MassAnalyzer and MSSimulator, our method not only simplifies the computation process, but also improves the prediction accuracy. Currently, OpenMS-Simulator supports the prediction of CID and HCD spectra for doubly charged peptides. Extending it to cover more fragmentation models and to support other charge states remains future work.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0540-1) contains supplementary material, which is available to authorized users.
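The m/z half of a theoretical spectrum follows directly from the peptide sequence. The sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses. It is an illustration only: the function name `by_ions` is hypothetical, modifications and higher charge states are ignored, and the intensity model that OpenMS-Simulator trains is not reproduced here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # Da
WATER = 18.010565   # Da


def by_ions(sequence):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[i] covers the first i+1 residues; y[i] covers the last i+1 residues
    plus water. The full-length fragments are omitted.
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    b_ions, y_ions = [], []
    running = 0.0
    for mass in masses[:-1]:            # N-terminal prefixes
        running += mass
        b_ions.append(running + PROTON)
    running = 0.0
    for mass in reversed(masses[1:]):   # C-terminal suffixes
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


b, y = by_ions("GAS")
print([round(v, 4) for v in b], [round(v, 4) for v in y])
```

A predicted spectrum would pair each of these m/z values with an intensity; comparing such a spectrum with an acquired one is what allows candidate peptides from a search engine to be re-ranked.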
Affiliation(s)
- Yaojun Wang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190, China; University of Chinese Academy of Sciences, 19A, Yuquan Road, Beijing, 100049, China.
| | - Fei Yang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190, China; University of Chinese Academy of Sciences, 19A, Yuquan Road, Beijing, 100049, China.
| | - Peng Wu
- Institute of Biophysics, Chinese Academy of Sciences, 15, Datun Road, Chaoyang District, Beijing, 100101, China.
| | - Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190, China.
| | - Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 6, Kexueyuan South Road, Zhongguancun, Beijing, 100190, China.
| |
|
27
|
Götze M, Pettelkau J, Fritzsche R, Ihling CH, Schäfer M, Sinz A. Automated assignment of MS/MS cleavable cross-links in protein 3D-structure analysis. Journal of the American Society for Mass Spectrometry 2015; 26:83-97. [PMID: 25261217 DOI: 10.1007/s13361-014-1001-1] [Citation(s) in RCA: 171] [Impact Index Per Article: 17.1] [Received: 01/14/2014] [Revised: 09/08/2014] [Accepted: 09/09/2014] [Indexed: 05/03/2023]
Abstract
CID-MS/MS cleavable cross-linkers hold an enormous potential for an automated analysis of cross-linked products, which is essential for conducting structural proteomics studies. The created characteristic fragment ion patterns can easily be used for an automated assignment and discrimination of cross-linked products. To date, there are only a few software solutions available that make use of these properties, but none allows for an automated analysis of cleavable cross-linked products. The MeroX software fills this gap and presents a powerful tool for protein 3D-structure analysis in combination with MS/MS cleavable cross-linkers. We show that MeroX allows an automatic screening of characteristic fragment ions, considering static and variable peptide modifications, and effectively scores different types of cross-links. No manual input is required for a correct assignment of cross-links and false discovery rates are calculated. The self-explanatory graphical user interface of MeroX provides easy access for an automated cross-link search platform that is compatible with commonly used data file formats, enabling analysis of data originating from different instruments. The combination of an MS/MS cleavable cross-linker with a dedicated software tool for data analysis provides an automated workflow for 3D-structure analysis of proteins. MeroX is available at www.StavroX.com .
Affiliation(s)
- Michael Götze
- Institute for Biochemistry and Biotechnology, Martin-Luther University Halle-Wittenberg, 06120, Halle (Saale), Germany,
|
28
|
Dong NP, Liang YZ, Xu QS, Mok DKW, Yi LZ, Lu HM, He M, Fan W. Prediction of Peptide Fragment Ion Mass Spectra by Data Mining Techniques. Anal Chem 2014; 86:7446-54. [DOI: 10.1021/ac501094m] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 12/19/2022]
Affiliation(s)
- Daniel K. W. Mok
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), Shenzhen, 518000, P. R. China
| | - Lun-zhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, 650500, P. R. China
| | - Min He
- Department of Pharmaceutical Engineering, School of Chemical Engineering, Xiangtan University, Xiangtan, 411105, P. R. China
| | - Wei Fan
- College of Bioscience and Biotechnology, Hunan Agricultural University, Changsha, 410083, P. R. China
|
29
|
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, provided that enough data are available to train and subsequently evaluate an algorithm. Since MS-based proteomics has no shortage of complex problems, and since public data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
- Department of Medical Protein Research, VIB, Ghent, Belgium; Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium; Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium
|
30
|
Diedrich JK, Pinto AFM, Yates JR. Energy dependence of HCD on peptide fragmentation: stepped collisional energy finds the sweet spot. J Am Soc Mass Spectrom 2013; 24:1690-9. [PMID: 23963813 PMCID: PMC3815594 DOI: 10.1007/s13361-013-0709-7] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 12/15/2012] [Revised: 06/30/2013] [Accepted: 07/06/2013] [Indexed: 05/10/2023]
Abstract
An understanding of the process of peptide fragmentation and what parameters are best to obtain the most useful information is important. This is especially true for large-scale proteomics where data collection and data analysis are most often automated, and manual interpretation of spectra is rare because of the vast amounts of data generated. We show herein that collisional cell peptide fragmentation, in this case higher-energy collisional dissociation (HCD) in the Q Exactive, is significantly affected by the normalized energy applied. Both peptide sequence and energy applied determine what ion fragments are observed. However, by applying a stepped normalized collisional energy scheme and combining ions from low, medium, and high collision energies, we are able to increase the diversity of fragmentation ions generated. Application of stepped collision energy to HEK293T lysate demonstrated a minimal effect on peptide and protein identification in a large-scale proteomics dataset, but improved phosphosite localization through increased sequence coverage. Stepped HCD is also beneficial for tandem mass tagged (TMT) experiments, increasing intensity of TMT reporters used for quantitation without adversely affecting peptide identification.
Affiliation(s)
- Jolene K Diedrich
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA, 92037, USA
|
31
|
Bruce C, Stone K, Gulcicek E, Williams K. Proteomics and the analysis of proteomic data: 2013 overview of current protein-profiling technologies. Curr Protoc Bioinformatics 2013; Chapter 13:13.21.1-13.21.17. [PMID: 23504934 DOI: 10.1002/0471250953.bi1321s41] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/20/2023]
Abstract
Mass spectrometry has become a major tool in the study of proteomes. The analysis of proteolytic peptides and their fragment ions by this technique enables the identification and quantitation of the precursor proteins in a mixture. However, deducing chemical structures and then protein sequences from mass-to-charge ratios is a challenging computational task. Software tools incorporating powerful algorithms and statistical methods improved our ability to process the large quantities of proteomics data. Repositories of spectral data make both data analysis and experimental design more efficient. New approaches in quantitative and statistical proteomics make possible a greater coverage of the proteome, the identification of more post-translational modifications, and a greater sensitivity in the quantitation of targeted proteins.
Affiliation(s)
- Can Bruce
- W.M. Keck Foundation Biotechnology Resource Laboratory and Molecular Biochemistry and Biophysics Department, Yale University, New Haven, Connecticut, USA
|
32
|
Dong NP, Liang YZ, Yi LZ, Lu HM. Investigation of scrambled ions in tandem mass spectra, part 2. On the influence of the ions on peptide identification. J Am Soc Mass Spectrom 2013; 24:857-867. [PMID: 23504644 DOI: 10.1007/s13361-013-0591-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2012] [Revised: 01/19/2013] [Accepted: 01/20/2013] [Indexed: 06/01/2023]
Abstract
A comprehensive investigation was performed to understand the influence of sequence scrambling in peptide ions on peptide identification results. To achieve this, four tandem mass spectrometry datasets with scrambled ions included and with them excluded were analyzed by Crux, X!Tandem, SpectraST, Lutefisk, and PepNovo. While the different algorithms differed in their performance, an increase in the number of correctly identified peptides was generally observed when removing scrambled ions, with the exception of the SpectraST algorithm. However, the variation of the match scores upon removal was unpredictable. Following these investigations, an interpretation was given on how the scrambled ions affect peptide identification. Lastly, a simulated theoretical mass spectral library derived from the NIST peptide libraries was constructed and searched by SpectraST to study whether scrambled ions in predicted mass spectra could affect peptide identification. Consistent with the peptide library search results, no significant variations in dot product scores or in peptide identification results were observed when these ions were included in the theoretical MS/MS spectra. Of the five algorithms used, SpectraST and Crux provided the most robust results, whereas X!Tandem, PepNovo, and Lutefisk were sensitive to the existence of the scrambled ions, especially the latter two de novo sequencing algorithms.
Affiliation(s)
- Nai-ping Dong
- College of Chemistry and Chemical Engineering, Central South University, Changsha, People's Republic of China
|
33
|
Colangelo CM, Chung L, Bruce C, Cheung KH. Review of software tools for design and analysis of large scale MRM proteomic datasets. Methods 2013; 61:287-98. [PMID: 23702368 DOI: 10.1016/j.ymeth.2013.05.004] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Received: 02/21/2013] [Revised: 05/06/2013] [Accepted: 05/11/2013] [Indexed: 12/13/2022] Open
Abstract
Selected or multiple reaction monitoring (SRM/MRM) is a liquid-chromatography (LC)/tandem-mass spectrometry (MS/MS) method that enables the quantitation of specific proteins in a sample by analyzing precursor ions and the fragment ions of their selected tryptic peptides. Instrumentation software has advanced to the point that thousands of transitions (pairs of primary and secondary m/z values) can be measured in a triple quadrupole instrument coupled to an LC, by a well-designed scheduling and selection of m/z windows. The design of a good MRM assay relies on the availability of peptide spectra from previous discovery-phase LC-MS/MS studies. The tedious aspect of manually developing and processing MRM assays involving thousands of transitions has spurred the development of software tools to automate this process. Software packages have been developed for project management, assay development, assay validation, data export, peak integration, quality assessment, and biostatistical analysis. No single tool provides a complete end-to-end solution; this article therefore reviews the current state and discusses future directions of these software tools in order to enable researchers to combine these tools for a comprehensive targeted proteomics workflow.
Affiliation(s)
- Christopher M Colangelo
- W.M. Keck Foundation Biotechnology Resource Laboratory, School of Medicine, Yale University, New Haven, CT 06510, USA.
|
34
|
Robidart J, Callister SJ, Song P, Nicora CD, Wheat CG, Girguis PR. Characterizing microbial community and geochemical dynamics at hydrothermal vents using osmotically driven continuous fluid samplers. Environ Sci Technol 2013; 47:4399-4407. [PMID: 23495803 DOI: 10.1021/es3037302] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 06/01/2023]
Abstract
Microbes play a key role in mediating aquatic biogeochemical cycles. However, our understanding of the relationships between microbial phylogenetic/physiological diversity and habitat physicochemical characteristics is restrained by our limited capacity to concurrently collect microbial and geochemical samples at appropriate spatial and temporal scales. Accordingly, we have developed a low-cost, continuous fluid sampling system (the Biological OsmoSampling System, or BOSS) to address this limitation. The BOSS does not use electricity, can be deployed in harsh/remote environments, and collects/preserves samples with daily resolution for >1 year. Here, we present data on the efficacy of DNA and protein preservation during a 1.5 year laboratory study as well as the results of two field deployments at deep-sea hydrothermal vents, wherein we examined changes in microbial diversity, protein expression, and geochemistry over time. Our data reveal marked changes in microbial composition co-occurring with changes in hydrothermal fluid composition as well as the temporal dynamics of an enigmatic sulfide-oxidizing symbiont in its free-living state. We also present the first data on in situ protein preservation and expression dynamics highlighting the BOSS's potential utility in meta-proteomic studies. These data illustrate the value of using BOSS to study relationships among microbial and geochemical phenomena and environmental conditions.
Affiliation(s)
- Julie Robidart
- Harvard University, Department of Organismic and Evolutionary Biology, 16 Divinity Avenue, Cambridge, Massachusetts 02138, USA
|
35
|
Wang D, Dasari S, Chambers MC, Holman JD, Chen K, Liebler DC, Orton DJ, Purvine SO, Monroe ME, Chung CY, Rose KL, Tabb DL. Basophile: accurate fragment charge state prediction improves peptide identification rates. Genomics Proteomics Bioinformatics 2013; 11:86-95. [PMID: 23499924 PMCID: PMC3737598 DOI: 10.1016/j.gpb.2012.11.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/12/2012] [Revised: 11/03/2012] [Accepted: 11/22/2012] [Indexed: 01/14/2023]
Abstract
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
Affiliation(s)
- Dong Wang
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
|
36
|
Van Riper SK, de Jong EP, Carlis JV, Griffin TJ. Mass Spectrometry-Based Proteomics: Basic Principles and Emerging Technologies and Directions. Adv Exp Med Biol 2013; 990:1-35. [DOI: 10.1007/978-94-007-5896-4_1] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
|
37
|
Thalassinos K, Vissers JPC, Tenzer S, Levin Y, Thompson JW, Daniel D, Mann D, DeLong MR, Moseley MA, America AH, Ottens AK, Cavey GS, Efstathiou G, Scrivens JH, Langridge JI, Geromanos SJ. Design and application of a data-independent precursor and product ion repository. J Am Soc Mass Spectrom 2012; 23:1808-1820. [PMID: 22847389 DOI: 10.1007/s13361-012-0416-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 12/29/2011] [Revised: 05/09/2012] [Accepted: 05/13/2012] [Indexed: 06/01/2023]
Abstract
The functional design and application of a data-independent LC-MS precursor and product ion repository for protein identification, quantification, and validation is conceptually described. The ion repository was constructed from the sequence search results of a broad range of discovery experiments investigating various tissue types of two closely related mammalian species. The relatively high degree of similarity in protein complement, ion detection, and peptide and protein identification allows for the analysis of normalized precursor and product ion intensity values, as well as standardized retention times, creating a multidimensional/orthogonal queryable, qualitative, and quantitative space. Peptide ion map selection for identification and quantification is primarily based on replication and limited variation. The information is stored in a relational database and is used to create peptide- and protein-specific fragment ion maps that can be queried in a targeted fashion against the raw or time aligned ion detections. These queries can be conducted either individually or as groups, where the latter affords pathway and molecular machinery analysis of the protein complement. The presented results also suggest that peptide ionization and fragmentation efficiencies are highly conserved between experiments and practically independent of the analyzed biological sample when using similar instrumentation. Moreover, the data illustrate only minor variation in ionization efficiency with amino acid sequence substitutions occurring between species. Finally, the data and the presented results illustrate how LC-MS performance metrics can be extracted and utilized to ensure optimal performance of the employed analytical workflows.
|
38
|
Niedermeyer THJ, Strohalm M. mMass as a software tool for the annotation of cyclic peptide tandem mass spectra. PLoS One 2012; 7:e44913. [PMID: 23028676 PMCID: PMC3441486 DOI: 10.1371/journal.pone.0044913] [Citation(s) in RCA: 219] [Impact Index Per Article: 16.8] [Received: 05/15/2012] [Accepted: 08/09/2012] [Indexed: 11/19/2022] Open
Abstract
Natural or synthetic cyclic peptides often possess pronounced bioactivity. Their mass spectrometric characterization is difficult due to the predominant occurrence of non-proteinogenic monomers and the complex fragmentation patterns observed. Even though several software tools for cyclic peptide tandem mass spectra annotation have been published, these tools are still unable to annotate a majority of the signals observed in experimentally obtained mass spectra. They are thus not suitable for extensive mass spectrometric characterization of these compounds. This lack of advanced and user-friendly software tools has motivated us to extend the fragmentation module of a freely available open-source software, mMass (http://www.mmass.org), to allow for cyclic peptide tandem mass spectra annotation and interpretation. The resulting software has been tested on several cyanobacterial and other naturally occurring peptides. It has been found to be superior to other currently available tools concerning both usability and annotation extensiveness. Thus it is highly useful for accelerating the structure confirmation and elucidation of cyclic as well as linear peptides and depsipeptides.
|
39
|
Pechan T, Gwaltney SR. Calculations of relative intensities of fragment ions in the MSMS spectra of a doubly charged penta-peptide. BMC Bioinformatics 2012; 13 Suppl 15:S13. [PMID: 23046347 PMCID: PMC3439735 DOI: 10.1186/1471-2105-13-s15-s13] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Currently, the tandem mass spectrometry (MSMS) of peptides is a dominant technique used to identify peptides and consequently proteins. The peptide fragmentation inside the mass analyzer typically offers a spectrum containing several different groups of ions. The mass to charge (m/z) values of these ions can be exactly calculated following simple rules based on the possible peptide fragmentation reactions. But the (relative) intensities of the particular ions cannot be simply predicted from the amino-acid sequence of the peptide. This study presents initial work towards developing a theoretical fundamental approach to ion intensity elucidation by utilizing quantum mechanical computations. METHODS: MSMS spectra of the doubly charged GAVLK peptide were collected on electrospray ion trap mass spectrometers using low energy modes of fragmentation. Density functional theory (DFT) calculations were performed on the population of ion precursors to determine the fragment ion intensities corresponding to a Boltzmann distribution of the protonation of nitrogens in the peptide backbone amide bonds. RESULTS: We were able to a) predict the intensity order of the y and b ions in agreement with the experimental observation; b) predict relative intensities of y ions with errors not exceeding the experimental variation. CONCLUSIONS: These results suggest that the GAVLK peptide fragmentation process in the ion trap mass spectrometer is predominantly driven by the thermodynamic stability of the precursor ions formed upon ionization of the sample. The computational approach presented in this manuscript successfully calculated ion intensities in the mass spectra of this doubly charged tryptic peptide, based solely on its amino acid sequence. As such, this work indicates a potential of incorporating quantum mechanical calculations into mass spectrometry based algorithms for molecular identification.
Affiliation(s)
- Tibor Pechan
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi Agricultural and Forestry Experiment Station, High Performance Computing Collaboratory, Mississippi State University, Mississippi State, MS 39762, USA.
|
40
|
Sun S, Yang F, Yang Q, Zhang H, Wang Y, Bu D, Ma B. MS-Simulator: Predicting Y-Ion Intensities for Peptides with Two Charges Based on the Intensity Ratio of Neighboring Ions. J Proteome Res 2012; 11:4509-16. [DOI: 10.1021/pr300235v] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 12/17/2022]
Affiliation(s)
- Shiwei Sun
- Advanced Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences, Beijing, China
| | - Fuquan Yang
- Proteomics Platform, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
| | - Qing Yang
- Advanced Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- School of Computer Science, University of Science and Technology, Beijing, China
| | - Hong Zhang
- College of Food Science and Biological Engineering, Zhejiang Gongshang University, Hangzhou, China
| | - Yaojun Wang
- Advanced Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
| | - Dongbo Bu
- Advanced Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences, Beijing, China
| | - Bin Ma
- School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada
|
41
|
Key issues in the acquisition and analysis of qualitative and quantitative mass spectrometry data for peptide-centric proteomic experiments. Amino Acids 2012; 43:1075-85. [PMID: 22821266 DOI: 10.1007/s00726-012-1287-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 12/09/2010] [Accepted: 04/03/2012] [Indexed: 01/05/2023]
Abstract
Proteomic technologies have matured to a level enabling accurate and reproducible quantitation of peptides and proteins from complex biological matrices. Analysis of samples as diverse as assembled protein complexes, whole cell lysates or sub-cellular proteomes from cell cultures, and direct analysis of animal and human tissues and fluids demonstrate the incredible versatility of the fundamental nature of the technique that forms the basis of most proteomic applications today (mass spectrometry). Determining the mass of biomolecules and their fragments or related products with high accuracy can convey a highly specific assay for detection and identification. Importantly, ion currents representative of these specifically identified analytes can be accurately quantified with the correct application of smart isobaric tagging chemistries, heavy and light isotopically derivatised samples or standards, or by careful application of workflows to compare unlabelled samples in so-called 'label-free' and targeted selected reaction monitoring experiments. In terms of exploring biology, a myriad of protein changes and modifications are being increasingly probed and quantified, including diverse chemical changes from relatively decisive modifications such as protein splicing and truncation, to more transient dynamic modifications such as phosphorylation, acetylation and ubiquitination. Proteomic workflows can be complex beasts and several key considerations to ensure effective applications have been outlined in the recent literature. The past year has witnessed the publication of several excellent reviews that thoroughly describe the fundamental principles underlying the state of the art. 
This review further elaborates on specific critical issues introduced by these publications and raises other important unaddressed considerations and new developments that directly impact on the effectiveness of proteomic technologies, in particular for, but not necessarily exclusive to peptide-centric experiments. These factors are discussed both in terms of qualitative analyses, including dynamic range and sampling issues, and developments to improve the translation of peptide fragmentation data into peptide and protein identities, as well as quantitative analyses, including data normalisation and the utility of ontology or functional annotation, the effects of modified peptides, and considered experimental design to facilitate the use of robust statistical methods.
|
42
|
Gandhi T, Puri P, Fusetti F, Breitling R, Poolman B, Permentier HP. Effect of iTRAQ Labeling on the Relative Abundance of Peptide Fragment Ions Produced by MALDI-MS/MS. J Proteome Res 2012; 11:4044-51. [DOI: 10.1021/pr300083x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Affiliation(s)
- Tejas Gandhi
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, Netherlands Proteomics Centre & Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 4, 9747 AG, Groningen, The Netherlands
| | - Pranav Puri
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, Netherlands Proteomics Centre & Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 4, 9747 AG, Groningen, The Netherlands
| | - Fabrizia Fusetti
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, Netherlands Proteomics Centre & Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 4, 9747 AG, Groningen, The Netherlands
| | - Rainer Breitling
- Groningen Bioinformatics Centre, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
- Institute of Molecular, Cell and Systems Biology, College of Medical, Veterinary and Life Sciences, Joseph Black Building, University of Glasgow, Glasgow, United Kingdom
| | - Bert Poolman
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, Netherlands Proteomics Centre & Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 4, 9747 AG, Groningen, The Netherlands
| | - Hjalmar P. Permentier
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, Netherlands Proteomics Centre & Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 4, 9747 AG, Groningen, The Netherlands
- Mass Spectrometry Core Facility, University of Groningen, A Deusinglaan 1, 9713 AV, Groningen, The Netherlands
|
43
|
Wu C, Wei W, Li C, Li Q, Sheng Q, Zeng R. Delicate Analysis of Post-Translational Modifications on Dishevelled 3. J Proteome Res 2012; 11:3829-37. [DOI: 10.1021/pr300314d] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Affiliation(s)
- Chaochao Wu
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Wei Wei
- Laboratory of Molecular Cell Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Chen Li
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Qingrun Li
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Quanhu Sheng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Rong Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
44
|
Helsens K, Mueller M, Hulstaert N, Martens L. Sigpep: Calculating unique peptide signature transition sets in a complete proteome background. Proteomics 2012; 12:1142-6. [DOI: 10.1002/pmic.201100566] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 02/03/2023]
Affiliation(s)
- Kenny Helsens
- Department of Medical Protein Research; VIB; Ghent Belgium
- Department of Biochemistry; Ghent University; Ghent Belgium
| | - Michael Mueller
- EMBL Outstation, European Bioinformatics Institute; Wellcome Trust Genome Campus; Cambridge UK
| | - Niels Hulstaert
- Department of Medical Protein Research; VIB; Ghent Belgium
- Department of Biochemistry; Ghent University; Ghent Belgium
| | - Lennart Martens
- Department of Medical Protein Research; VIB; Ghent Belgium
- Department of Biochemistry; Ghent University; Ghent Belgium
|
45
|
Chen YT, Chen HW, Domanski D, Smith DS, Liang KH, Wu CC, Chen CL, Chung T, Chen MC, Chang YS, Parker CE, Borchers CH, Yu JS. Multiplexed quantification of 63 proteins in human urine by multiple reaction monitoring-based mass spectrometry for discovery of potential bladder cancer biomarkers. J Proteomics 2012; 75:3529-45. [PMID: 22236518 DOI: 10.1016/j.jprot.2011.12.031] [Citation(s) in RCA: 118] [Impact Index Per Article: 9.1] [Received: 09/23/2011] [Revised: 12/17/2011] [Accepted: 12/20/2011] [Indexed: 12/11/2022]
Abstract
Three common urological diseases are bladder cancer, urinary tract infection, and hematuria. Seventeen bladder cancer biomarkers were previously discovered using iTRAQ - these findings were verified by MRM-MS in this current study. Urine samples from 156 patients with hernia (n=57, control), bladder cancer (n=76), or urinary tract infection/hematuria (n=23) were collected and subjected to multiplexed LC-MRM/MS to determine the concentrations of 63 proteins that are normally considered to be plasma proteins, but which include proteins found in our earlier iTRAQ study. Sixty-five stable isotope-labeled standard proteotypic peptides were used as internal standards for 63 targeted proteins. Twelve proteins showed higher concentrations in the bladder cancer group than in the hernia and the urinary tract infection/hematuria groups, and thus represent potential urinary biomarkers for detection of bladder cancer. Prothrombin had the highest AUC (0.796), with 71.1% sensitivity and 75.0% specificity for differentiating bladder cancer (n=76) from non-cancerous (n=80) patients. The multiplexed MRM-MS data was used to generate a six-peptide marker panel. This six-peptide panel (afamin, adiponectin, complement C4 gamma chain, apolipoprotein A-II precursor, ceruloplasmin, and prothrombin) can discriminate bladder cancer subjects from non-cancerous subjects with an AUC of 0.814, with a 76.3% positive predictive value, and a 77.5% negative predictive value. This article is part of a Special Section entitled: Understanding genome regulation and genetic diversity by mass spectrometry.
Affiliation(s)
- Yi-Ting Chen
- Molecular Medicine Research Center, Chang Gung University, Taoyuan 333, Taiwan
|
46
|
Niedermeyer THJ, Strohalm M. mMass as a software tool for the annotation of cyclic peptide tandem mass spectra. PLoS One 2012. [PMID: 23028676 DOI: 10.1055/s-0032-1321299] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/09/2023] Open
Abstract
Natural or synthetic cyclic peptides often possess pronounced bioactivity. Their mass spectrometric characterization is difficult due to the predominant occurrence of non-proteinogenic monomers and the complex fragmentation patterns observed. Even though several software tools for cyclic peptide tandem mass spectra annotation have been published, these tools are still unable to annotate a majority of the signals observed in experimentally obtained mass spectra. They are thus not suitable for extensive mass spectrometric characterization of these compounds. This lack of advanced and user-friendly software tools has motivated us to extend the fragmentation module of a freely available open-source software, mMass (http://www.mmass.org), to allow for cyclic peptide tandem mass spectra annotation and interpretation. The resulting software has been tested on several cyanobacterial and other naturally occurring peptides. It has been found to be superior to other currently available tools concerning both usability and annotation extensiveness. Thus it is highly useful for accelerating the structure confirmation and elucidation of cyclic as well as linear peptides and depsipeptides.
|
47
|
A semi-empirical approach for predicting unobserved peptide MS/MS spectra from spectral libraries. Proteomics 2011; 11:4702-11. [DOI: 10.1002/pmic.201100316] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 06/15/2011] [Revised: 08/30/2011] [Accepted: 09/30/2011] [Indexed: 01/07/2023]
|
48
|
Valentine SJ, Ewing MA, Dilger JM, Glover MS, Geromanos S, Hughes C, Clemmer DE. Using ion mobility data to improve peptide identification: intrinsic amino acid size parameters. J Proteome Res 2011; 10:2318-29. [PMID: 21417239 PMCID: PMC3138335 DOI: 10.1021/pr1011312] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Indexed: 01/16/2023]
Abstract
A new method for enhancing peptide ion identification in proteomics analyses using ion mobility data is presented. Ideally, direct comparisons of experimental drift times (t(D)) with a standard mobility database could be used to rank candidate peptide sequence assignments. Such a database would represent only a fraction of sequences in protein databases and significant difficulties associated with the verification of data for constituent peptide ions would exist. A method that employs intrinsic amino acid size parameters to obtain ion mobility predictions that can be used to rank candidate peptide ion assignments is proposed. Intrinsic amino acid size parameters have been determined for doubly charged peptide ions from an annotated yeast proteome. Predictions of ion mobilities using the intrinsic size parameters are more accurate than those obtained from a polynomial fit to t(D) versus molecular weight data. More than a 2-fold improvement in prediction accuracy has been observed for a group of arginine-terminated peptide ions 12 residues in length. The use of this predictive enhancement as a means to aid peptide ion identification is discussed, and a simple peptide ion scoring scheme is presented.
Affiliation(s)
- Stephen J Valentine
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
|
49
|
Bielow C, Aiche S, Andreotti S, Reinert K. MSSimulator: Simulation of Mass Spectrometry Data. J Proteome Res 2011; 10:2922-9. [DOI: 10.1021/pr200155f] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Affiliation(s)
- Chris Bielow
- Institute of Computer Science, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany
- Stephan Aiche
- Institute of Computer Science, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany
- Sandro Andreotti
- Institute of Computer Science, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- International Max Planck Research School for Computational Biology and Scientific Computing, Berlin, Germany
- Knut Reinert
- Institute of Computer Science, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
|
50
|
Yadav AK, Kumar D, Dash D. MassWiz: A Novel Scoring Algorithm with Target-Decoy Based Analysis Pipeline for Tandem Mass Spectrometry. J Proteome Res 2011; 10:2154-60. [DOI: 10.1021/pr200031z] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 12/19/2022]
Affiliation(s)
- Amit Kumar Yadav
- Institute of Genomics and Integrative Biology (CSIR), Mall Road, Delhi, India
- Dhirendra Kumar
- Institute of Genomics and Integrative Biology (CSIR), Mall Road, Delhi, India
- Debasis Dash
- Institute of Genomics and Integrative Biology (CSIR), Mall Road, Delhi, India
|