1
Carrà A, Spezia R. In Silico Tandem Mass Spectrometer: an Analytical and Fundamental Tool. Chemistry-Methods 2021. [DOI: 10.1002/cmtd.202000071] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
Affiliation(s)
- Andrea Carrà
  - Agilent Technologies Italia, Via Piero Gobetti 2/C, 20063 Cernusco SN, Milano, Italy
- Riccardo Spezia
  - Laboratoire de Chimie Théorique, Sorbonne Université, UMR 7616 CNRS, 4 Place Jussieu, 75005 Paris, France
2
Lin YM, Chen CT, Chang JM. MS2CNN: predicting MS/MS spectrum based on protein sequence using deep convolutional neural networks. BMC Genomics 2019; 20:906. [PMID: 31874640 PMCID: PMC6929458 DOI: 10.1186/s12864-019-6297-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/24/2019] [Accepted: 11/15/2019] [Indexed: 01/22/2023]
Abstract
Background: Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial in expanding the peptide space and increasing the coverage of spectral library search.
Results: We propose MS2CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS2CNN was trained with five-fold cross validation on a three-way data split on the large-scale human HCD MS2 dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS2PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS2 spectra of 3+ peptides, MS2CNN is significantly better than both MS2PIP and pDeep.
Conclusions: We showed that MS2CNN outperforms MS2PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS2CNN, the proposed convolutional neural network model, generates highly accurate MS2 spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identifications. The results suggest that incorporating more data for the deep learning model may improve performance.
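
The two agreement scores quoted in this abstract, cosine similarity and Pearson correlation coefficient, both compare a predicted spectrum with an observed one. A minimal sketch in Python, assuming the two spectra have already been reduced to equal-length intensity vectors aligned on the same fragment ions; the function names and example values are illustrative and are not taken from MS2CNN:

```python
import math

def cosine_similarity(pred, obs):
    """Cosine of the angle between two aligned intensity vectors."""
    dot = sum(p * o for p, o in zip(pred, obs))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_obs = math.sqrt(sum(o * o for o in obs))
    if norm_pred == 0.0 or norm_obs == 0.0:
        return 0.0  # convention chosen here for an empty spectrum
    return dot / (norm_pred * norm_obs)

def pearson_correlation(pred, obs):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    n = len(pred)
    mean_pred = sum(pred) / n
    mean_obs = sum(obs) / n
    centred_pred = [p - mean_pred for p in pred]
    centred_obs = [o - mean_obs for o in obs]
    return cosine_similarity(centred_pred, centred_obs)

# Hypothetical intensities for five fragment ions of one peptide.
predicted = [0.10, 0.80, 0.30, 0.00, 0.55]
observed = [0.15, 0.70, 0.35, 0.05, 0.60]
cos_score = cosine_similarity(predicted, observed)
pcc_score = pearson_correlation(predicted, observed)
```

Cosine similarity ignores overall intensity scale, while the Pearson coefficient additionally removes a constant offset, which is why the two scores reported for one model can differ.
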
Affiliation(s)
- Yang-Ming Lin
  - Department of Computer Science, National Chengchi University, 11605, Taipei City, Taiwan
- Ching-Tai Chen
  - Institute of Information Science, Academia Sinica, 115, Taipei City, Taiwan
- Jia-Ming Chang
  - Department of Computer Science, National Chengchi University, 11605, Taipei City, Taiwan
3
Shao W, Lam H. Tandem mass spectral libraries of peptides and their roles in proteomics research. Mass Spectrometry Reviews 2017; 36:634-648. [PMID: 27403644 DOI: 10.1002/mas.21512] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 11/30/2015] [Accepted: 05/21/2016] [Indexed: 05/15/2023]
Abstract
Proteomics is a rapidly maturing field aimed at the high-throughput identification and quantification of all proteins in a biological system. The cornerstone of proteomic technology is tandem mass spectrometry of peptides resulting from the digestion of protein mixtures. The fragmentation pattern of each peptide ion is captured in its tandem mass spectrum, which enables its identification and acts as a fingerprint for the peptide. Spectral libraries are simply searchable collections of these fingerprints, which have taken on an increasingly prominent role in proteomic data analysis. This review describes the historical development of spectral libraries in proteomics, details the computational procedures behind library building and searching, surveys the current applications of spectral libraries, and discusses the outstanding challenges. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:634-648, 2017.
Affiliation(s)
- Wenguang Shao
  - Department of Biology, Institute of Molecular Systems Biology, Eidgenössische Technische Hochschule (ETH) Zurich, Zurich, Switzerland
  - Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Henry Lam
  - Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
  - Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
4
Li S, Dabir A, Misal SA, Tang H, Radivojac P, Reilly JP. Impact of Amidination on Peptide Fragmentation and Identification in Shotgun Proteomics. J Proteome Res 2016; 15:3656-3665. [PMID: 27615690 DOI: 10.1021/acs.jproteome.6b00468] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/18/2023]
Abstract
Peptide amidination labeling using S-methyl thioacetimidate (SMTA) is investigated in an attempt to increase the number and types of peptides that can be detected in a bottom-up proteomics experiment. This derivatization method affects the basicity of lysine residues and is shown here to significantly impact the idiosyncrasies of peptide fragmentation and peptide detectability. The unique and highly reproducible fragmentation properties of SMTA-labeled peptides, such as the strong propensity for forming b1 fragment ions, can be further exploited to modify the scoring of peptide-spectrum pairs and improve peptide identification. To this end, we have developed a supervised postprocessing algorithm to exploit these characteristics of peptides labeled by SMTA. Our experiments show that although the overall number of identifications is similar, the SMTA modification enabled the detection of 16-26% of peptides not previously observed in comparable CID/HCD tandem mass spectrometry experiments without SMTA labeling.
Affiliation(s)
- Sujun Li
  - School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405, United States
- Aditi Dabir
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Santosh A Misal
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
- Haixu Tang
  - School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405, United States
- Predrag Radivojac
  - School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405, United States
- James P Reilly
  - Department of Chemistry, Indiana University, Bloomington, Indiana 47405, United States
5
Pai PJ, Hu Y, Lam H. Direct glycan structure determination of intact N-linked glycopeptides by low-energy collision-induced dissociation tandem mass spectrometry and predicted spectral library searching. Anal Chim Acta 2016; 934:152-62. [DOI: 10.1016/j.aca.2016.05.049] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/08/2016] [Revised: 05/24/2016] [Accepted: 05/30/2016] [Indexed: 11/24/2022]
6
Bazsó FL, Ozohanics O, Schlosser G, Ludányi K, Vékey K, Drahos L. Quantitative Comparison of Tandem Mass Spectra Obtained on Various Instruments. Journal of the American Society for Mass Spectrometry 2016; 27:1357-1365. [PMID: 27206510 DOI: 10.1007/s13361-016-1408-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/22/2015] [Revised: 04/08/2016] [Accepted: 04/13/2016] [Indexed: 06/05/2023]
Abstract
The similarity between two tandem mass spectra, which were measured on different instruments, was compared quantitatively using the similarity index (SI), defined as the dot product of the square root of peak intensities in the respective spectra. This function was found to be useful for comparing energy-dependent tandem mass spectra obtained on various instruments. Spectral comparisons show the similarity index in a 2D "heat map", indicating which collision energy combinations result in similar spectra, and how good this agreement is. The results and methodology can be used in the pharma industry to design experiments and equipment well suited for good reproducibility. We suggest that to get good long-term reproducibility, it is best to adjust the collision energy to yield a spectrum very similar to a reference spectrum. It is likely to yield better results than using the same tuning file, which, for example, does not take into account that contamination of the ion source due to extended use may influence instrument tuning. The methodology may be used to characterize energy dependence on various instrument types, to optimize instrumentation, and to study the influence or correlation between various experimental parameters.
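
The similarity index described in this abstract can be sketched as follows. The abstract states only that SI is the dot product of the square roots of the peak intensities; representing each spectrum as a dictionary keyed by a shared m/z label and normalizing the result to the range 0 to 1 are assumptions made here for illustration, as are the function name and the example peaks:

```python
import math

def similarity_index(spectrum_a, spectrum_b):
    """Dot product of square-rooted intensities, normalized to [0, 1].

    Each spectrum is a dict mapping an m/z label to a non-negative
    intensity. Peaks missing from one spectrum count as intensity 0.
    The normalization is an assumption, not taken from the abstract.
    """
    peaks = set(spectrum_a) | set(spectrum_b)
    root_a = {mz: math.sqrt(spectrum_a.get(mz, 0.0)) for mz in peaks}
    root_b = {mz: math.sqrt(spectrum_b.get(mz, 0.0)) for mz in peaks}
    dot = sum(root_a[mz] * root_b[mz] for mz in peaks)
    norm = math.sqrt(
        sum(v * v for v in root_a.values())
        * sum(v * v for v in root_b.values())
    )
    return dot / norm if norm > 0.0 else 0.0

# Hypothetical spectra of one precursor on two instruments.
instrument_1 = {"175.1": 40.0, "288.2": 100.0, "401.3": 25.0}
instrument_2 = {"175.1": 55.0, "288.2": 90.0, "514.4": 10.0}
si = similarity_index(instrument_1, instrument_2)
```

Evaluating this function for every pair of collision energies on two instruments gives the grid of values that a 2D heat map of the kind mentioned in the abstract would display. Taking square roots before the dot product reduces the weight of the most intense peaks relative to a plain dot product.
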
Affiliation(s)
- Fanni Laura Bazsó
  - MS Proteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
- Oliver Ozohanics
  - MTA-TTK NAP B MS Neuroproteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
- Gitta Schlosser
  - MTA-ELTE Research Group of Peptide Chemistry, Hungarian Academy of Sciences, Eötvös Loránd University, 1117, Budapest, Hungary
- Krisztina Ludányi
  - Department of Pharmaceutics, Semmelweis University, Hőgyes E. Street 7-9, H-1092, Budapest, Hungary
- Károly Vékey
  - MS Proteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
  - Core Technologies Center, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
- László Drahos
  - MS Proteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
  - MTA-TTK NAP B MS Neuroproteomics Research Group, Research Center for Natural Sciences, Hungarian Academy of Sciences, H-1117, Magyar tudósok krt. 2, Budapest, Hungary
7
Griss J. Spectral library searching in proteomics. Proteomics 2016; 16:729-40. [PMID: 26616598 DOI: 10.1002/pmic.201500296] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 07/16/2015] [Revised: 10/15/2015] [Accepted: 10/29/2015] [Indexed: 12/12/2022]
Abstract
Spectral library searching has become a mature method to identify tandem mass spectra in proteomics data analysis. This review provides a comprehensive overview of available spectral library search engines and highlights their distinct features. Additionally, resources providing spectral libraries are summarized and tools presented that extend experimental spectral libraries by simulating spectra. Finally, spectrum clustering algorithms are discussed that utilize the same spectrum-to-spectrum matching algorithms as spectral library search engines and allow novel methods to analyse proteomics data.
Affiliation(s)
- Johannes Griss
  - Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
8
Cho JY, Lee HJ, Jeong SK, Kim KY, Kwon KH, Yoo JS, Omenn GS, Baker MS, Hancock WS, Paik YK. Combination of Multiple Spectral Libraries Improves the Current Search Methods Used to Identify Missing Proteins in the Chromosome-Centric Human Proteome Project. J Proteome Res 2015; 14:4959-66. [DOI: 10.1021/acs.jproteome.5b00578] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Affiliation(s)
- Jin-Young Cho
  - Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
- Hyoung-Joo Lee
  - Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
- Seul-Ki Jeong
  - Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
- Kwang-Youl Kim
  - Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
- Gilbert S. Omenn
  - Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, Michigan 48109, United States
- Mark S. Baker
  - Department of Biomedical Science, Faculty of Medicine and Health Science, Macquarie University, New South Wales 2109, Australia
- Young-Ki Paik
  - Yonsei Proteome Research Center, Department of Integrated OMICS for Biomedical Science and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-Ro, Seodaemoon-gu, Seoul 120-749, Korea
9
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever-growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present here an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
  - Department of Medical Protein Research, VIB, Ghent, Belgium
  - Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium
  - Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium