1
Adams C, Laukens K, Bittremieux W, Boonen K. Machine learning-based peptide-spectrum match rescoring opens up the immunopeptidome. Proteomics 2024; 24:e2300336. [PMID: 38009585 DOI: 10.1002/pmic.202300336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 11/29/2023]
Abstract
Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.
Affiliation(s)
- Charlotte Adams
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Kris Laukens
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Wout Bittremieux
  - Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
- Kurt Boonen
  - Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
  - ImmuneSpec BV, Niel, Belgium
2
Skiadopoulou D, Vašíček J, Kuznetsova K, Bouyssié D, Käll L, Vaudel M. Retention Time and Fragmentation Predictors Increase Confidence in Identification of Common Variant Peptides. J Proteome Res 2023; 22:3190-3199. [PMID: 37656829 PMCID: PMC10563157 DOI: 10.1021/acs.jproteome.3c00243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Indexed: 09/03/2023]
Abstract
Precision medicine focuses on adapting care to the individual profile of patients, for example, accounting for their unique genetic makeup. Being able to account for the effect of genetic variation on the proteome holds great promise toward this goal. However, identifying the protein products of genetic variation using mass spectrometry has proven very challenging. Here we show that the identification of variant peptides can be improved by the integration of retention time and fragmentation predictors into a unified proteogenomic pipeline. By combining these intrinsic peptide characteristics using the search-engine post-processor Percolator, we demonstrate improved discrimination power between correct and incorrect peptide-spectrum matches. Our results demonstrate that the drop in performance that is induced when expanding a protein sequence database can be compensated, hence enabling efficient identification of genetic variation products in proteomics data. We anticipate that this enhancement of proteogenomic pipelines can provide a more refined picture of the unique proteome of patients and thereby contribute to improving patient care.
Affiliation(s)
- Dafni Skiadopoulou
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Jakub Vašíček
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- Ksenia Kuznetsova
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
- David Bouyssié
  - Institut de Pharmacologie et de Biologie Structurale (IPBS), Université de Toulouse, CNRS, Université Toulouse III - Paul Sabatier (UT3), 31000 Toulouse, France
- Lukas Käll
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden
- Marc Vaudel
  - Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, NO-5020 Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, NO-5020 Bergen, Norway
  - Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, N-0213 Oslo, Norway
3
Yang KL, Yu F, Teo GC, Li K, Demichev V, Ralser M, Nesvizhskii AI. MSBooster: improving peptide identification rates using deep learning-based features. Nat Commun 2023; 14:4539. [PMID: 37500632 PMCID: PMC10374903 DOI: 10.1038/s41467-023-40129-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/07/2022] [Accepted: 07/06/2023] [Indexed: 07/29/2023]
Abstract
Peptide identification in liquid chromatography-tandem mass spectrometry (LC-MS/MS) experiments relies on computational algorithms for matching acquired MS/MS spectra against sequences of candidate peptides using database search tools, such as MSFragger. Here, we present a new tool, MSBooster, for rescoring peptide-to-spectrum matches using additional features incorporating deep learning-based predictions of peptide properties, such as LC retention time, ion mobility, and MS/MS spectra. We demonstrate the utility of MSBooster, in tandem with MSFragger and Percolator, in several different workflows, including nonspecific searches (immunopeptidomics), direct identification of peptides from data independent acquisition data, single-cell proteomics, and data generated on an ion mobility separation-enabled timsTOF MS platform. MSBooster is fast, robust, and fully integrated into the widely used FragPipe computational platform.
Affiliation(s)
- Kevin L Yang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Fengchao Yu
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Guo Ci Teo
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
- Kai Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Vadim Demichev
  - Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Markus Ralser
  - Department of Biochemistry, Charité Universitätsmedizin, Berlin, Germany
  - Nuffield Department of Medicine, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
- Alexey I Nesvizhskii
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Pathology, University of Michigan, Ann Arbor, MI, USA
4
Geer LY, Lapin J, Slotta DJ, Mak TD, Stein SE. AIomics: Exploring More of the Proteome Using Mass Spectral Libraries Extended by Artificial Intelligence. J Proteome Res 2023; 22:2246-2255. [PMID: 37232537 PMCID: PMC10542943 DOI: 10.1021/acs.jproteome.2c00807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.
Affiliation(s)
- Lewis Y. Geer
  - Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Joel Lapin
  - Department of Physics, Georgetown University, Washington, DC 20057, United States
  - Associate, Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Douglas J. Slotta
  - Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Tytus D. Mak
  - Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
- Stephen E. Stein
  - Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States
5
Qin J, Guo J, Tang G, Li L, Yao SQ. Multiplex Identification of Post-Translational Modifications at Point-of-Care by Deep Learning-Assisted Hydrogel Sensors. Angew Chem Int Ed Engl 2023; 62:e202218412. [PMID: 36815677 DOI: 10.1002/anie.202218412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 02/02/2023] [Accepted: 02/23/2023] [Indexed: 02/24/2023]
Abstract
Multiplex detection of protein post-translational modifications (PTMs), especially at the point of care (POC), is of great significance in cancer diagnosis. Herein, we report a machine learning-assisted photonic crystal hydrogel (PCH) sensor for multiplex detection of PTMs. With closely related PCH sensors microfabricated on a single chip, our design achieved not only rapid screening of PTMs at specific protein sites using only the naked eye or a cellphone, but also real-time monitoring of phosphorylation reactions. By taking advantage of multiplex sensor chips and a neural network algorithm, accurate prediction of PTMs by both their types and concentrations was enabled. This approach was ultimately used to detect and differentiate up/down regulation of different phosphorylation sites within the same protein in live mammalian cells. Our method thus holds potential for POC identification of various PTMs in early-stage diagnosis of protein-related diseases.
Affiliation(s)
- Junjie Qin
  - Department of Chemistry, National University of Singapore, Singapore, 117543, Singapore
- Jia Guo
  - School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Guanghui Tang
  - Department of Chemistry, National University of Singapore, Singapore, 117543, Singapore
- Lin Li
  - The Institute of Flexible Electronics (IFE, Future Technologies), Xiamen University, Xiamen, 361005, Fujian, China
- Shao Q Yao
  - Department of Chemistry, National University of Singapore, Singapore, 117543, Singapore
6
Wińska P, Sobiepanek A, Pawlak K, Staniszewska M, Cieśla J. Phosphorylation of Thymidylate Synthase and Dihydrofolate Reductase in Cancer Cells and the Effect of CK2α Silencing. Int J Mol Sci 2023; 24:3023. [PMID: 36769342 PMCID: PMC9917831 DOI: 10.3390/ijms24033023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 01/30/2023] [Accepted: 02/01/2023] [Indexed: 02/08/2023]
Abstract
Our previous research suggests an important regulatory role of CK2-mediated phosphorylation of enzymes involved in the thymidylate biosynthesis cycle, i.e., thymidylate synthase (TS), dihydrofolate reductase (DHFR), and serine hydroxymethyltransferase (SHMT). The aim of this study was to show whether silencing of the CK2α gene affects TS and DHFR expression in A-549 cells. Additionally, we attempted to identify the endogenous kinases that phosphorylate TS and DHFR in CCRF-CEM and A-549 cells. We used immunodetection, immunofluorescence/confocal analyses, reverse transcription-quantitative polymerase chain reaction (RT-qPCR), in-gel kinase assay, and mass spectrometry analysis. Our results demonstrate that silencing of the CK2α gene in lung adenocarcinoma cells significantly increases both TS and DHFR expression and affects their cellular distribution. Additionally, we show for the first time that both TS and DHFR are very likely phosphorylated by endogenous CK2 in two types of cancer cells, i.e., acute lymphoblastic leukaemia and lung adenocarcinoma. Moreover, our studies indicate that DHFR is phosphorylated intracellularly by CK2 to a greater extent in leukaemia cells than in lung adenocarcinoma cells. Interestingly, in-gel kinase assay results indicate that the CK2α' isoform was more active than the CK2α subunit. Our results confirm the previous studies concerning the physiological relevance of CK2-mediated phosphorylation of TS and DHFR.
Affiliation(s)
- Patrycja Wińska
  - Chair of Drug and Cosmetics Biotechnology, Faculty of Chemistry, Warsaw University of Technology, 00-664 Warsaw, Poland
  - Correspondence: (P.W.); (M.S.); Tel.: +48-222-345-573 (P.W.); +48-606-438-241 (M.S.)
- Anna Sobiepanek
  - Chair of Drug and Cosmetics Biotechnology, Faculty of Chemistry, Warsaw University of Technology, 00-664 Warsaw, Poland
- Katarzyna Pawlak
  - Chair of Analytical Chemistry, Faculty of Chemistry, Warsaw University of Technology, 00-664 Warsaw, Poland
- Monika Staniszewska
  - Centre for Advanced Materials and Technologies, Warsaw University of Technology, Poleczki 19, 02-822 Warsaw, Poland
  - Correspondence: (P.W.); (M.S.); Tel.: +48-222-345-573 (P.W.); +48-606-438-241 (M.S.)
- Joanna Cieśla
  - Chair of Drug and Cosmetics Biotechnology, Faculty of Chemistry, Warsaw University of Technology, 00-664 Warsaw, Poland
7
Searle BC, Shannon AE, Wilburn DB. Scribe: Next Generation Library Searching for DDA Experiments. J Proteome Res 2023; 22:482-490. [PMID: 36695531 DOI: 10.1021/acs.jproteome.2c00672] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 01/26/2023]
Abstract
Spectrum library searching is a powerful alternative to database searching for data dependent acquisition experiments, but has been historically limited to identifying previously observed peptides in libraries. Here we present Scribe, a new library search engine designed to leverage deep learning fragmentation prediction software such as Prosit. Rather than relying on highly curated DDA libraries, this approach predicts fragmentation and retention times for every peptide in a FASTA database. Scribe embeds Percolator for false discovery rate correction and an interference tolerant, label-free quantification integrator for an end-to-end proteomics workflow. By leveraging expected relative fragmentation and retention time values, we find that library searching with Scribe can outperform traditional database searching tools both in terms of sensitivity and quantitative precision. Scribe and its graphical interface are easy to use, freely accessible, and fully open source.
Affiliation(s)
- Brian C Searle
  - Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
  - Proteome Software Inc., Portland, Oregon 97219, United States
- Ariana E Shannon
  - Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
- Damien Beau Wilburn
  - Department of Biomedical Informatics, The Ohio State University Medical Center, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio 43210, United States
8
Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 03/01/2022] [Accepted: 07/11/2022] [Indexed: 01/21/2023]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
Affiliation(s)
- Jürgen Cox
  - Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany
  - Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway
9
McDonnell K, Howley E, Abram F. Critical evaluation of the use of artificial data for machine learning based de novo peptide identification. Comput Struct Biotechnol J 2023; 21:2732-2743. [PMID: 37168871 PMCID: PMC10165132 DOI: 10.1016/j.csbj.2023.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/16/2023] [Accepted: 04/16/2023] [Indexed: 05/13/2023]
Abstract
Proteins are essential components of all living cells and so the study of their in situ expression, proteomics, has wide reaching applications. Peptide identification in proteomics typically relies on matching high resolution tandem mass spectra to a protein database but can also be performed de novo. While artificial spectra have been successfully incorporated into database search pipelines to increase peptide identification rates, little work has been done to investigate the utility of artificial spectra in the context of de novo peptide identification. Here, we perform a critical analysis of the use of artificial data for the training and evaluation of de novo peptide identification algorithms. First, we classify the different fragment ion types present in real spectra and then estimate the number of spurious matches using random peptides. We then categorise the different types of noise present in real spectra. Finally, we transfer this knowledge to artificial data and test the performance of a state-of-the-art de novo peptide identification algorithm trained using artificial spectra with and without relevant noise addition. Noise supplementation increased artificial training data performance from 30% to 77% of real training data peptide recall. While real data performance was not fully replicated, this work provides the first steps towards an artificial spectrum framework for the training and evaluation of de novo peptide identification algorithms. Further enhanced artificial spectra may allow for more in depth analysis of de novo algorithms as well as alleviating the reliance on database searches for training data.
Affiliation(s)
- Kevin McDonnell
  - Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
  - School of Computer Science, University of Galway, Ireland
  - Corresponding author at: Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland.
- Enda Howley
  - School of Computer Science, University of Galway, Ireland
- Florence Abram
  - Functional Environmental Microbiology, School of Natural Sciences, Ryan Institute, University of Galway, Ireland
  - Corresponding author.
10
Cormican JA, Horokhovskyi Y, Soh WT, Mishto M, Liepe J. inSPIRE: An Open-Source Tool for Increased Mass Spectrometry Identification Rates Using Prosit Spectral Prediction. Mol Cell Proteomics 2022; 21:100432. [PMID: 36280141 PMCID: PMC9720494 DOI: 10.1016/j.mcpro.2022.100432] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/26/2022] [Revised: 10/17/2022] [Accepted: 10/19/2022] [Indexed: 11/05/2022]
Abstract
Rescoring of mass spectrometry (MS) search results using spectral predictors can strongly increase peptide spectrum match (PSM) identification rates. This approach is particularly effective when aiming to search MS data against large databases, for example, when dealing with nonspecific cleavage in immunopeptidomics or inflation of the reference database for noncanonical peptide identification. Here, we present inSPIRE (in silico Spectral Predictor Informed REscoring), a flexible and performant open-source rescoring pipeline built on Prosit MS spectral prediction, which is compatible with common database search engines. inSPIRE allows large-scale rescoring with data from multiple MS search files, increases sensitivity to minor differences in amino acid residue position, and can be applied to various MS sample types, including tryptic proteome digestions and immunopeptidomes. inSPIRE boosts PSM identification rates in immunopeptidomics, leading to better performance than the original Prosit rescoring pipeline, as confirmed by benchmarking of inSPIRE performance on ground truth datasets. The integration of various features in the inSPIRE backbone further boosts the PSM identification in immunopeptidomics, with a potential benefit for the identification of noncanonical peptides.
Affiliation(s)
- John A Cormican
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
- Yehor Horokhovskyi
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
- Wai Tuck Soh
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
- Michele Mishto
  - Centre for Inflammation Biology and Cancer Immunology (CIBCI) & Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom
  - The Francis Crick Institute, London, United Kingdom
- Juliane Liepe
  - Max-Planck-Institute for Multidisciplinary Sciences (MPI-NAT), Göttingen, Germany
11
Abstract
Artificial intelligence (AI) methods are increasingly being integrated into prediction software implemented in bioinformatics and its glycoscience branch known as glycoinformatics. AI techniques have evolved in the past decades, and their applications in glycoscience are not yet widespread. This limited use is partly explained by the peculiarities of glyco-data that are notoriously hard to produce and analyze. Nonetheless, over time, the accumulation of glycomics, glycoproteomics, and glycan-binding data has reached a point where even the most recent deep learning methods can provide predictors with good performance. We discuss the historical development of the application of various AI methods in the broader field of glycoinformatics. A particular focus is placed on shining a light on challenges in glyco-data handling, contextualized by lessons learnt from related disciplines. Ending on the discussion of state-of-the-art deep learning approaches in glycoinformatics, we also envision the future of glycoinformatics, including developments that need to occur in order to truly unleash the capabilities of glycoscience in the systems biology era.
Affiliation(s)
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
- Frederique Lisacek
  - Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
  - Computer Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
12
Na S, Choi H, Paek E. Deephos: Predicted spectral database search for TMT-labeled phosphopeptides and its false discovery rate estimation. Bioinformatics 2022; 38:2980-2987. [PMID: 35441674 DOI: 10.1093/bioinformatics/btac280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2021] [Revised: 03/26/2022] [Accepted: 04/14/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION Tandem mass tag (TMT)-based tandem mass spectrometry (MS/MS) has become the method of choice for the quantification of post-translational modifications in complex mixtures. Many cancer proteogenomic studies have highlighted the importance of large-scale phosphopeptide quantification coupled with TMT labeling. Herein, we propose a predicted Spectral DataBase (pSDB) search strategy called Deephos that can improve both sensitivity and specificity in identifying MS/MS spectra of TMT-labeled phosphopeptides. RESULTS With deep learning-based fragment ion prediction, we compiled a pSDB of TMT-labeled phosphopeptides generated from ∼8,000 human phosphoproteins annotated in UniProt. Deep learning could successfully recognize the fragmentation patterns altered by both TMT labeling and phosphorylation. In addition, we discuss the decoy spectra for false discovery rate (FDR) estimation in the pSDB search. We show that FDR could be inaccurately estimated by the existing decoy spectra generation methods and propose an innovative method to generate decoy spectra for more accurate FDR estimation. The utilities of Deephos were demonstrated in multi-stage analyses (coupled with database searches) of glioblastoma, acute myeloid leukemia, and breast cancer phosphoproteomes. AVAILABILITY Deephos pSDB and the search software are available at https://github.com/seungjinna/deephos.
Affiliation(s)
- Seungjin Na
  - Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
- Hyunjin Choi
  - Department of Automotive Engineering, Hanyang University, Seoul, 04763, Republic of Korea
- Eunok Paek
  - Institute for Artificial Intelligence Research, Hanyang University, Seoul, 04763, Republic of Korea
  - Department of Computer Science, Hanyang University, Seoul, 04763, Republic of Korea
13
Liu Y, Wang H, Gui S, Zeng B, Pu J, Zheng P, Zeng L, Luo Y, Wu Y, Zhou C, Song J, Ji P, Wei H, Xie P. Proteomics analysis of the gut-brain axis in a gut microbiota-dysbiosis model of depression. Transl Psychiatry 2021; 11:568. [PMID: 34744165 PMCID: PMC8572885 DOI: 10.1038/s41398-021-01689-w] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 07/27/2021] [Revised: 10/17/2021] [Accepted: 10/20/2021] [Indexed: 12/21/2022]
Abstract
Major depressive disorder (MDD) is a serious mental illness. Increasing evidence from both animal and human studies suggested that the gut microbiota might be involved in the onset of depression via the gut-brain axis. However, the mechanism in depression remains unclear. To explore the protein changes of the gut-brain axis modulated by gut microbiota, germ-free mice were transplanted with gut microbiota from MDD patients to induce depression-like behaviors. Behavioral tests were performed following fecal microbiota transplantation. A quantitative proteomics approach was used to examine changes in protein expression in the prefrontal cortex (PFC), liver, cecum, and serum. Then differential protein analysis and weighted gene coexpression network analysis were used to identify microbiota-related protein modules. Our results suggested that gut microbiota induced the alteration of protein expression levels in multiple tissues of the gut-brain axis in mice with depression-like phenotype, and these changes of the PFC and liver were model specific compared to chronic stress models. Gene ontology enrichment analysis revealed that the protein changes of the gut-brain axis were involved in a variety of biological functions, including metabolic process and inflammatory response, in which energy metabolism is the core change of the protein network. Our data provide clues for future studies in the gut-brain axis on protein level and deepen the understanding of how gut microbiota cause depression-like behaviors.
Affiliation(s)
- Yiyun Liu
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China (GRID grid.452206.7; ISNI 0000 0004 1758 417X)
- Haiyang Wang
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China (GRID grid.452206.7; ISNI 0000 0004 1758 417X)
- Siwen Gui
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China (GRID grid.452206.7; ISNI 0000 0004 1758 417X)
- Benhua Zeng
  - Department of Laboratory Animal Science, College of Basic Medical Sciences, Third Military Medical University, Chongqing, China (GRID grid.410570.7; ISNI 0000 0004 1760 6682)
- Juncai Pu
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China (GRID grid.452206.7; ISNI 0000 0004 1758 417X)
- Peng Zheng
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China (GRID grid.452206.7; ISNI 0000 0004 1758 417X)
- Li Zeng
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China (GRID grid.452206.7; ISNI 0000 0004 1758 417X)
- Yuanyuan Luo
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China (GRID grid.452206.7; ISNI 0000 0004 1758 417X)
- You Wu
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China (GRID grid.452206.7; ISNI 0000 0004 1758 417X)
- Chanjuan Zhou
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China (GRID grid.452206.7; ISNI 0000 0004 1758 417X)
- Jinlin Song
  - College of Stomatology, Chongqing Medical University, Chongqing, China (GRID grid.203458.8; ISNI 0000 0000 8653 0555)
- Ping Ji
  - College of Stomatology, Chongqing Medical University, Chongqing, China (GRID grid.203458.8; ISNI 0000 0000 8653 0555)
- Hong Wei
  - Department of Laboratory Animal Science, College of Basic Medical Sciences, Third Military Medical University, Chongqing, China.
- Peng Xie
  - NHC Key Laboratory of Diagnosis and Treatment on Brain Functional Diseases, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
|
14
|
Britt HM, Cragnolini T, Thalassinos K. Integration of Mass Spectrometry Data for Structural Biology. Chem Rev 2021; 122:7952-7986. [PMID: 34506113 DOI: 10.1021/acs.chemrev.1c00356] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 01/05/2023]
Abstract
Mass spectrometry (MS) is increasingly being used to probe the structure and dynamics of proteins and the complexes they form with other macromolecules. There are now several specialized MS methods, each with unique sample preparation, data acquisition, and data processing protocols. Collectively, these methods are referred to as structural MS and include cross-linking, hydrogen-deuterium exchange, hydroxyl radical footprinting, native, ion mobility, and top-down MS. Each of these provides a unique type of structural information, ranging from composition and stoichiometry through to residue level proximity and solvent accessibility. Structural MS has proved particularly beneficial in studying protein classes for which analysis by classic structural biology techniques proves challenging such as glycosylated or intrinsically disordered proteins. To capture the structural details for a particular system, especially larger multiprotein complexes, more than one structural MS method with other structural and biophysical techniques is often required. Key to integrating these diverse data are computational strategies and software solutions to facilitate this process. We provide a background to the structural MS methods and briefly summarize other structural methods and how these are combined with MS. We then describe current state of the art approaches for the integration of structural MS data for structural biology. We quantify how often these methods are used together and provide examples where such combinations have been fruitful. To illustrate the power of integrative approaches, we discuss progress in solving the structures of the proteasome and the nuclear pore complex. We also discuss how information from structural MS, particularly pertaining to protein dynamics, is not currently utilized in integrative workflows and how such information can provide a more accurate picture of the systems studied. 
We conclude by discussing new developments in the MS and computational fields that will further enable in-cell structural studies.
Affiliation(s)
- Hannah M Britt
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, United Kingdom
- Tristan Cragnolini
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, United Kingdom
  - Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, United Kingdom
- Konstantinos Thalassinos
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London WC1E 6BT, United Kingdom
  - Institute of Structural and Molecular Biology, Birkbeck College, University of London, London WC1E 7HX, United Kingdom
|
15
|
Mann M, Kumar C, Zeng WF, Strauss MT. Artificial intelligence for proteomics and biomarker discovery. Cell Syst 2021; 12:759-770. [PMID: 34411543 DOI: 10.1016/j.cels.2021.06.006] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Received: 03/04/2021] [Revised: 05/07/2021] [Accepted: 06/28/2021] [Indexed: 12/14/2022]
Abstract
There is an avalanche of biomedical data generation and a parallel expansion in computational capabilities to analyze and make sense of these data. Starting with genome sequencing and widely employed deep sequencing technologies, these trends have now taken hold in all omics disciplines and increasingly call for multi-omics integration as well as data interpretation by artificial intelligence technologies. Here, we focus on mass spectrometry (MS)-based proteomics and describe how machine learning and, in particular, deep learning now predicts experimental peptide measurements from amino acid sequences alone. This will dramatically improve the quality and reliability of analytical workflows because experimental results should agree with predictions in a multi-dimensional data landscape. Machine learning has also become central to biomarker discovery from proteomics data, which now starts to outperform existing best-in-class assays. Finally, we discuss model transparency and explainability and data privacy that are required to deploy MS-based biomarkers in clinical settings.
Affiliation(s)
- Matthias Mann
  - Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
- Chanchal Kumar
  - Translational Science & Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden.
- Wen-Feng Zeng
  - Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
|
16
|
Feng S, Sterzenbach R, Guo X. Deep learning for peptide identification from metaproteomics datasets. J Proteomics 2021; 247:104316. [PMID: 34246788 DOI: 10.1016/j.jprot.2021.104316] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/14/2021] [Revised: 06/02/2021] [Accepted: 06/18/2021] [Indexed: 10/20/2022]
Abstract
Metaproteomics is becoming widely used in microbiome research for gaining insights into the functional state of the microbial community. Current metaproteomics studies are generally based on high-throughput tandem mass spectrometry (MS/MS) coupled with liquid chromatography. In this paper, we proposed a deep-learning-based algorithm, named DeepFilter, for improving peptide identifications from a collection of tandem mass spectra. The key advantage of the DeepFilter is that it does not need ad hoc training or fine-tuning as in existing filtering tools. DeepFilter is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DeepFilter. SIGNIFICANCE: The identification of peptides and proteins from MS data involves the computational procedure of searching MS/MS spectra against a predefined protein sequence database and assigning top-scored peptides to spectra. Existing computational tools are still far from being able to extract all the information out of MS/MS data sets acquired from metaproteome samples. Systematical experiment results demonstrate that the DeepFilter identified up to 12% and 9% more peptide-spectrum-matches and proteins, respectively, compared with existing filtering algorithms, including Percolator, Q-ranker, PeptideProphet, and iProphet, on marine and soil microbial metaproteome samples with false discovery rate at 1%. The taxonomic analysis shows that DeepFilter found up to 7%, 10%, and 14% more species from marine, soil, and human gut samples compared with existing filtering algorithms. Therefore, DeepFilter was believed to generalize properly to new, previously unseen peptide-spectrum-matches and can be readily applied in peptide identification from metaproteomics data.
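The abstract above reports identifications "with false discovery rate at 1%". As a rough illustration of what that filtering step usually means, here is a minimal sketch of target-decoy q-value estimation, the generic approach behind tools such as Percolator. It is not DeepFilter's algorithm; the function names `qvalues` and `accepted_targets`, the `(score, is_decoy)` input shape, and the simple decoys/targets FDR estimate are all assumptions made for illustration.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy); hypothetical input shape


def qvalues(psms: List[PSM]) -> List[Tuple[float, bool, float]]:
    """Return PSMs sorted by descending score, each with an estimated q-value."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each score threshold, estimated as (#decoys / #targets) so far.
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at this threshold or any more permissive one.
    out = []
    running_min = float("inf")
    for (score, is_decoy), fdr in zip(reversed(ranked), reversed(fdrs)):
        running_min = min(running_min, fdr)
        out.append((score, is_decoy, running_min))
    out.reverse()
    return out


def accepted_targets(psms: List[PSM], threshold: float = 0.01) -> List[float]:
    """Scores of target PSMs whose q-value is at or below the threshold."""
    return [s for s, is_decoy, q in qvalues(psms) if not is_decoy and q <= threshold]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    print(accepted_targets(example))
```

In this toy input, the decoy at score 8.0 raises the estimated FDR for everything ranked at or below it, so only the two top-scoring targets pass the 1% threshold. A rescoring method improves identifications by producing scores that separate targets from decoys better, which pushes more targets above that cutoff.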
Affiliation(s)
- Shichao Feng
  - Department of Computer Science and Engineering, University of North Texas, TX, USA
- Ryan Sterzenbach
  - Department of Biomedical Engineering, University of North Texas, TX, USA
- Xuan Guo
  - Department of Computer Science and Engineering, University of North Texas, TX, USA.
|
17
|
Taking the leap between analytical chemistry and artificial intelligence: A tutorial review. Anal Chim Acta 2021; 1161:338403. [DOI: 10.1016/j.aca.2021.338403] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 12/04/2020] [Revised: 03/02/2021] [Accepted: 03/03/2021] [Indexed: 01/01/2023]
|
18
|
Guan S, Bythell BJ. Size Dependent Fragmentation Chemistry of Short Doubly Protonated Tryptic Peptides. J Am Soc Mass Spectrom 2021; 32:1020-1032. [PMID: 33779179 DOI: 10.1021/jasms.1c00009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Tandem mass spectrometry of electrospray ionized multiply charged peptide ions is commonly used to identify the sequence of peptide(s) and infer the identity of source protein(s). Doubly protonated peptide ions are consistently the most efficiently sequenced ions following collision-induced dissociation of peptides generated by tryptic digestion. While the broad characteristics of longer (N ≥ 8 residue) doubly protonated peptides have been investigated, there is comparatively little data on shorter systems where charge repulsion should exhibit the greatest influence on the dissociation chemistry. To address this gap and further understand the chemistry underlying collisional-dissociation of doubly charged tryptic peptides, two series of analytes ([GxR+2H]2+ and [AxR+2H]2+, x = 2-5) were investigated experimentally and with theory. We find distinct differences in the preference of bond cleavage sites for these peptides as a function of size and to a lesser extent composition. Density functional calculations at two levels of theory predict that the threshold relative energies required for bond cleavages at the same site for peptides of different size are quite similar (for example, b2-yN-2). In isolation, this finding is inconsistent with experiment. However, the predicted extent of entropy change of these reactions is size dependent. Subsequent RRKM rate constant calculations provide a far clearer picture of the kinetics of the competing bond cleavage reactions enabling rationalization of experimental findings. The M06-2X data were substantially more consistent with experiment than were the B3LYP data.
Affiliation(s)
- Shanshan Guan
  - Department of Chemistry and Biochemistry, Ohio University, 307 Chemistry Building, Athens, Ohio 45701, United States
  - Department of Chemistry and Biochemistry, University of Missouri-St. Louis, 1 University Boulevard, St. Louis, Missouri 63121, United States
- Benjamin J Bythell
  - Department of Chemistry and Biochemistry, Ohio University, 307 Chemistry Building, Athens, Ohio 45701, United States
  - Department of Chemistry and Biochemistry, University of Missouri-St. Louis, 1 University Boulevard, St. Louis, Missouri 63121, United States
|
19
|
Łącki MK, Startek MP, Brehmer S, Distler U, Tenzer S. OpenTIMS, TimsPy, and TimsR: Open and Easy Access to timsTOF Raw Data. J Proteome Res 2021; 20:2122-2129. [PMID: 33724840 DOI: 10.1021/acs.jproteome.0c00962] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/18/2022]
Abstract
The Bruker timsTOF Pro is an instrument that couples trapped ion mobility spectrometry (TIMS) to high-resolution time-of-flight (TOF) mass spectrometry (MS). For proteomics, lipidomics, and metabolomics applications, the instrument is typically interfaced with a liquid chromatography (LC) system. The resulting LC-TIMS-MS data sets are, in general, several gigabytes in size and are stored in the proprietary Bruker Tims data format (TDF). The raw data can be accessed using proprietary binaries in C, C++, and Python on Windows and Linux operating systems. Here we introduce a suite of computer programs for data accession, including OpenTIMS, TimsR, and TimsPy. OpenTIMS is a C++ library capable of reading Bruker TDF files. It opens up Bruker's proprietary codebase. TimsPy and TimsR build on top of OpenTIMS, enabling swift and user-friendly data access to the raw data with Python and R. Both programs are available under a GPL3 license on all major platforms, extending the possibility to interact with timsTOF data to macOS. Additionally, OpenTIMS is capable of translating Bruker data into HDF5 files that can be easily analyzed from Python with the vaex module. OpenTIMS and TimsPy therefore provide easy and quick access to Bruker timsTOF raw data.
Affiliation(s)
- Mateusz K Łącki
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
- Michał P Startek
  - Department of Mathematics, Informatics, and Mechanics, University of Warsaw, 02-097 Warsaw, Poland
- Ute Distler
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
- Stefan Tenzer
  - Institute of Immunology, University Medical Center of the Johannes-Gutenberg University Mainz, 55131 Mainz, Germany
|
20
|
Hwang H, Szucs MJ, Ding LJ, Allen A, Ren X, Haensgen H, Gao F, Rhim H, Andrade A, Pan JQ, Carr SA, Ahmad R, Xu W. Neurogranin, Encoded by the Schizophrenia Risk Gene NRGN, Bidirectionally Modulates Synaptic Plasticity via Calmodulin-Dependent Regulation of the Neuronal Phosphoproteome. Biol Psychiatry 2021; 89:256-269. [PMID: 33032807 PMCID: PMC9258036 DOI: 10.1016/j.biopsych.2020.07.014] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/14/2020] [Revised: 07/22/2020] [Accepted: 07/22/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND: Neurogranin (Ng), encoded by the schizophrenia risk gene NRGN, is a calmodulin-binding protein enriched in the postsynaptic compartments, and its expression is reduced in the postmortem brains of patients with schizophrenia. Experience-dependent translation of Ng is critical for encoding contextual memory, and Ng regulates developmental plasticity in the primary visual cortex during the critical period. However, the overall impact of Ng on the neuronal signaling that regulates synaptic plasticity is unknown. METHODS: Altered Ng expression was achieved via virus-mediated gene manipulation in mice. The effect on long-term potentiation (LTP) was assessed using spike timing-dependent plasticity protocols. Quantitative phosphoproteomics analyses led to discoveries in significant phosphorylated targets. An identified candidate was examined with high-throughput planar patch clamp and was validated with pharmacological manipulation. RESULTS: Ng bidirectionally modulated LTP in the hippocampus. Decreasing Ng levels significantly affected the phosphorylation pattern of postsynaptic density proteins, including glutamate receptors, GTPases, kinases, RNA binding proteins, selective ion channels, and ionic transporters, some of which highlighted clusters of schizophrenia- and autism-related genes. Hypophosphorylation of NMDA receptor subunit Grin2A, one significant phosphorylated target, resulted in accelerated decay of NMDA receptor currents. Blocking protein phosphatase PP2B activity rescued the accelerated NMDA receptor current decay and the impairment of LTP mediated by Ng knockdown, implicating the requirement of synaptic PP2B activity for the deficits. CONCLUSIONS: Altered Ng levels affect the phosphorylation landscape of neuronal proteins. PP2B activity is required for mediating the deficit in synaptic plasticity caused by decreasing Ng levels, revealing a novel mechanistic link of a schizophrenia risk gene to cognitive deficits.
Affiliation(s)
- Hongik Hwang
  - Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts
  - Center for Neuroscience, Brain Science Institute, Korea Institute of Science and Technology, Seoul, Republic of Korea.
- Lei J. Ding
  - Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Andrew Allen
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Xiaobai Ren
  - Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Henny Haensgen
  - Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Fan Gao
  - Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Hyewhon Rhim
  - Center for Neuroscience, Brain Science Institute, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea
  - Division of Bio-Medical Science & Technology, KIST School, Korea University of Science and Technology, Seoul 02792, Republic of Korea
- Arturo Andrade
  - Department of Biological Sciences, University of New Hampshire, Durham, NH 03824, USA
- Jen Q. Pan
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Steven A. Carr
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Rushdy Ahmad
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Weifeng Xu
  - Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, Massachusetts
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts.
|
21
|
Wei Y, Varanasi RS, Schwarz T, Gomell L, Zhao H, Larson DJ, Sun B, Liu G, Chen H, Raabe D, Gault B. Machine-learning-enhanced time-of-flight mass spectrometry analysis. Patterns 2021; 2:100192. [PMID: 33659909 PMCID: PMC7892357 DOI: 10.1016/j.patter.2020.100192] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/29/2020] [Revised: 11/13/2020] [Accepted: 12/17/2020] [Indexed: 01/06/2023]
Abstract
Mass spectrometry is a widespread approach used to work out what the constituents of a material are. Atoms and molecules are removed from the material and collected, and subsequently, a critical step is to infer their correct identities based on patterns formed in their mass-to-charge ratios and relative isotopic abundances. However, this identification step still mainly relies on individual users' expertise, making its standardization challenging, and hindering efficient data processing. Here, we introduce an approach that leverages modern machine learning techniques to identify peak patterns in time-of-flight mass spectra within microseconds, outperforming human users without loss of accuracy. Our approach is cross-validated on mass spectra generated from different time-of-flight mass spectrometry (ToF-MS) techniques, offering the ToF-MS community an open-source, intelligent mass spectra analysis.
- A machine-learning method provides reliable atomic/molecular labels for ToF-MS
- No human labeling or prior information required
- The training dataset is artificially generated based on isotopic abundances
- Method validated on a variety of materials and two ToF-MS-based techniques
Time-of-flight mass spectrometry (ToF-MS) is a mainstream analytical technique widely used in biology, chemistry, and materials science. ToF-MS provides quantitative compositional analysis with high sensitivity across a wide dynamic range of mass-to-charge ratios. A critical step in ToF-MS is to infer the identity of the detected ions. Here, we introduce a machine-learning-enhanced algorithm to provide a user-independent approach to performing this identification using patterns from the natural isotopic abundances of individual atomic and molecular ions, without human labeling or prior knowledge of composition. Results from several materials and techniques are compared with those obtained by field experts. Our open-source, easy-to-implement, reliable analytic method accelerates this identification process. A wide range of ToF-MS-based applications can benefit from our approach, e.g., hunting for patterns of biomarkers or for contamination on solid surfaces in high-throughput data.
Affiliation(s)
- Ye Wei
  - Max-Planck-Institut für Eisenforschung, Max-Planck-Strasse 1, 40237 Düsseldorf, Germany
- Torsten Schwarz
  - Max-Planck-Institut für Eisenforschung, Max-Planck-Strasse 1, 40237 Düsseldorf, Germany
- Leonie Gomell
  - Max-Planck-Institut für Eisenforschung, Max-Planck-Strasse 1, 40237 Düsseldorf, Germany
- Huan Zhao
  - Max-Planck-Institut für Eisenforschung, Max-Planck-Strasse 1, 40237 Düsseldorf, Germany
- David J Larson
  - CAMECA Instruments, 5470 Nobel Drive, Madison, WI 53711, USA
- Binhan Sun
  - Max-Planck-Institut für Eisenforschung, Max-Planck-Strasse 1, 40237 Düsseldorf, Germany
- Geng Liu
  - Key Laboratory for Advanced Materials of Ministry of Education, School of Materials Science and Engineering, Tsinghua University, Beijing 100084, China
- Hao Chen
  - Key Laboratory for Advanced Materials of Ministry of Education, School of Materials Science and Engineering, Tsinghua University, Beijing 100084, China
- Dierk Raabe
  - Max-Planck-Institut für Eisenforschung, Max-Planck-Strasse 1, 40237 Düsseldorf, Germany
- Baptiste Gault
  - Max-Planck-Institut für Eisenforschung, Max-Planck-Strasse 1, 40237 Düsseldorf, Germany
  - Department of Materials, Royal School of Mines, Imperial College, London SW7 2AZ, UK
|
22
|
Ye Z, Vakhrushev SY. The Role of Data-Independent Acquisition for Glycoproteomics. Mol Cell Proteomics 2021; 20:100042. [PMID: 33372048 PMCID: PMC8724878 DOI: 10.1074/mcp.r120.002204] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 06/30/2020] [Revised: 12/26/2020] [Accepted: 12/28/2020] [Indexed: 12/13/2022]
Abstract
Data-independent acquisition (DIA) is an emerging method in bottom-up proteomics, capable of achieving deep proteome coverage and accurate label-free quantification. However, for post-translational modifications, such as glycosylation, DIA methodology is still in the early stage of development. The full characterization of glycoproteins requires site-specific glycan identification as well as subsequent quantification of glycan structures at each site. The tremendous complexity of glycosylation represents a significant analytical challenge in glycoproteomics. This review focuses on the development and perspectives of DIA methodology for N- and O-linked glycoproteomics and posits that DIA-based glycoproteomics could be a method of choice to address some of the challenging aspects of glycoproteomics. First, the current challenges in glycoproteomics and the basic principles of DIA are briefly introduced. DIA-based glycoproteomics is then summarized and described in four aspects based on the actual samples. Finally, we discuss the important challenges and future perspectives in the field. We believe that DIA can significantly facilitate glycoproteomic studies and contribute to the development of future advanced tools and approaches in the field of glycoproteomics.
- Protein glycosylation and challenges in glycoproteomics.
- Data-independent acquisition for deglycosylated and intact N-linked glycopeptides.
- Unbiased screening of oxonium ions from all glycopeptide precursors.
- Glyco-data-independent acquisition on mucin-type O-glycopeptides.
Affiliation(s)
- Zilu Ye
  - Departments of Cellular and Molecular Medicine, Faculty of Health Sciences, Copenhagen Center for Glycomics, University of Copenhagen, Copenhagen N, Denmark
- Sergey Y Vakhrushev
  - Departments of Cellular and Molecular Medicine, Faculty of Health Sciences, Copenhagen Center for Glycomics, University of Copenhagen, Copenhagen N, Denmark.
|
23
|
Hendrickx JO, van Gastel J, Leysen H, Martin B, Maudsley S. High-dimensionality Data Analysis of Pharmacological Systems Associated with Complex Diseases. Pharmacol Rev 2020; 72:191-217. [PMID: 31843941 DOI: 10.1124/pr.119.017921] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
Abstract
It is widely accepted that molecular reductionist views of highly complex human physiologic activity, e.g., the aging process, as well as therapeutic drug efficacy are largely oversimplifications. Currently some of the most effective appreciation of biologic disease and drug response complexity is achieved using high-dimensionality (H-D) data streams from transcriptomic, proteomic, metabolomics, or epigenomic pipelines. Multiple H-D data sets are now common and freely accessible for complex diseases such as metabolic syndrome, cardiovascular disease, and neurodegenerative conditions such as Alzheimer's disease. Over the last decade our ability to interrogate these high-dimensionality data streams has been profoundly enhanced through the development and implementation of highly effective bioinformatic platforms. Employing these computational approaches to understand the complexity of age-related diseases provides a facile mechanism to then synergize this pathologic appreciation with a similar level of understanding of therapeutic-mediated signaling. For informative pathology and drug-based analytics that are able to generate meaningful therapeutic insight across diverse data streams, novel informatics processes such as latent semantic indexing and topological data analyses will likely be important. Elucidation of H-D molecular disease signatures from diverse data streams will likely generate and refine new therapeutic strategies that will be designed with a cognizance of a realistic appreciation of the complexity of human age-related disease and drug effects. We contend that informatic platforms should be synergistic with more advanced chemical/drug and phenotypic cellular/tissue-based analytical predictive models to assist in either de novo drug prioritization or effective repurposing for the intervention of aging-related diseases. SIGNIFICANCE STATEMENT: All diseases, as well as pharmacological mechanisms, are far more complex than previously thought a decade ago. 
With the advent of commonplace access to technologies that produce large volumes of high-dimensionality data (e.g., transcriptomics, proteomics, metabolomics), it is now imperative that effective tools to appreciate this highly nuanced data are developed. Being able to appreciate the subtleties of high-dimensionality data will allow molecular pharmacologists to develop the most effective multidimensional therapeutics with effectively engineered efficacy profiles.
Affiliation(s)
- Jhana O Hendrickx
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Jaana van Gastel
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Hanne Leysen
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Bronwen Martin
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
- Stuart Maudsley
  - Receptor Biology Laboratory, Department of Biomedical Research (J.O.H., J.v.G., H.L., S.M.) and Faculty of Pharmacy, Biomedical and Veterinary Sciences (J.O.H., J.v.G., H.L., B.M., S.M.), University of Antwerp, Antwerp, Belgium
|
24
|
Xu R, Sheng J, Bai M, Shu K, Zhu Y, Chang C. A Comprehensive Evaluation of MS/MS Spectrum Prediction Tools for Shotgun Proteomics. Proteomics 2020; 20:e1900345. [DOI: 10.1002/pmic.201900345] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/31/2019] [Revised: 04/29/2020] [Indexed: 01/27/2023]
Affiliation(s)
- Rui Xu
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Beijing Institute of Lifeomics Beijing 102206 China
- Chongqing Key Laboratory on Big Data for Bio Intelligence Chongqing University of Posts and Telecommunications Chongqing 400065 China
| | - Jie Sheng
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Beijing Institute of Lifeomics Beijing 102206 China
- Chongqing Key Laboratory on Big Data for Bio Intelligence Chongqing University of Posts and Telecommunications Chongqing 400065 China
| | - Mingze Bai
- Chongqing Key Laboratory on Big Data for Bio Intelligence Chongqing University of Posts and Telecommunications Chongqing 400065 China
| | - Kunxian Shu
- Chongqing Key Laboratory on Big Data for Bio Intelligence Chongqing University of Posts and Telecommunications Chongqing 400065 China
| | - Yunping Zhu
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Beijing Institute of Lifeomics Beijing 102206 China
| | - Cheng Chang
- State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing) Beijing Institute of Lifeomics Beijing 102206 China
|
25
|
Bouwmeester R, Gabriels R, Van Den Bossche T, Martens L, Degroeve S. The Age of Data-Driven Proteomics: How Machine Learning Enables Novel Workflows. Proteomics 2020; 20:e1900351. [PMID: 32267083 DOI: 10.1002/pmic.201900351] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/26/2020] [Revised: 03/21/2020] [Indexed: 12/30/2022]
Abstract
A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. Therefore, highly promising recent machine learning developments in proteomics are pointed out in this viewpoint, alongside some of the remaining challenges.
Affiliation(s)
- Robbin Bouwmeester
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Ralf Gabriels
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Tim Van Den Bossche
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
| | - Sven Degroeve
- VIB-UGent Center for Medical Biotechnology, VIB, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium.,Department of Biomolecular Medicine, Ghent University, Albert Baertsoenkaai 3, B-9000, Ghent, Belgium
|
26
|
Cerqueira FR, Vasconcelos ATR. OCCAM: prediction of small ORFs in bacterial genomes by means of a target-decoy database approach and machine learning techniques. Database (Oxford) 2020; 2020:5989499. [PMID: 33206960 PMCID: PMC7673341 DOI: 10.1093/database/baaa067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Revised: 07/11/2020] [Accepted: 07/27/2020] [Indexed: 11/14/2022]
Abstract
Small open reading frames (ORFs) have been systematically disregarded by automatic genome annotation. The difficulty in finding patterns in tiny sequences is the main reason that makes small ORFs to be overlooked by computational procedures. However, advances in experimental methods show that small proteins can play vital roles in cellular activities. Hence, it is urgent to make progress in the development of computational approaches to speed up the identification of potential small ORFs. In this work, our focus is on bacterial genomes. We improve a previous approach to identify small ORFs in bacteria. Our method uses machine learning techniques and decoy subject sequences to filter out spurious ORF alignments. We show that an advanced multivariate analysis can be more effective in terms of sensitivity than applying the simplistic and widely used e-value cutoff. This is particularly important in the case of small ORFs for which alignments present higher e-values than usual. Experiments with control datasets show that the machine learning algorithms used in our method to curate significant alignments can achieve average sensitivity and specificity of 97.06% and 99.61%, respectively. Therefore, an important step is provided here toward the construction of more accurate computational tools for the identification of small ORFs in bacteria.
Affiliation(s)
- Fabio R Cerqueira
- Department of Production Engineering, Universidade Federal Fluminense, Rua Domingos Silvério s/n, Petrópolis, 25 650-050, Rio de Janeiro, Brazil.,Graduate Program in Computer Science, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
|
27
|
Affiliation(s)
- Hannes L Röst
- Donnelly Centre, University of Toronto, Toronto, Ontario, Canada.
|
28
|
Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat Methods 2019; 16:509-518. [DOI: 10.1038/s41592-019-0426-7] [Citation(s) in RCA: 340] [Impact Index Per Article: 68.0] [Received: 08/20/2018] [Accepted: 04/18/2019] [Indexed: 11/08/2022]
|
29
|
Kirik U, Refsgaard JC, Jensen LJ. Improving Peptide-Spectrum Matching by Fragmentation Prediction Using Hidden Markov Models. J Proteome Res 2019; 18:2385-2396. [PMID: 31074280 DOI: 10.1021/acs.jproteome.8b00499] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
Abstract
Tandem mass spectrometry has become the method of choice for high-throughput, quantitative analysis in proteomics. Peptide spectrum matching algorithms score the concordance between the experimental and the theoretical spectra of candidate peptides by evaluating the number (or proportion) of theoretically possible fragment ions observed in the experimental spectra without any discrimination. However, the assumption that each theoretical fragment is just as likely to be observed is inaccurate. On the contrary, MS2 spectra often have few dominant fragments. Using millions of MS/MS spectra we show that there is high reproducibility across different fragmentation spectra given the precursor peptide and charge state, implying that there is a pattern to fragmentation. To capture this pattern we propose a novel prediction algorithm based on hidden Markov models with an efficient training process. We investigated the performance of our interpolated-HMM model, trained on millions of MS2 spectra, and found that our model picks up meaningful patterns in peptide fragmentation. Second, looking at the variability of the prediction performance by varying the train/test data split, we observed that our model performs well independent of the specific peptides that are present in the training data. Furthermore, we propose that the real value of this model is as a preprocessing step in the peptide identification process. The model can discern fragment ions that are unlikely to be intense for a given candidate peptide rather than using the actual predicted intensities. As such, probabilistic measures of concordance between experimental and theoretical spectra will leverage better statistics.
Affiliation(s)
- Ufuk Kirik
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Science , University of Copenhagen , Blegdamsvej 3B , DK-2200 Copenhagen , Denmark
| | - Jan C Refsgaard
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Science , University of Copenhagen , Blegdamsvej 3B , DK-2200 Copenhagen , Denmark.,Intomics A/S , Lottenborgvej 26 , DK-2800 Kongens Lyngby , Denmark
| | - Lars J Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Science , University of Copenhagen , Blegdamsvej 3B , DK-2200 Copenhagen , Denmark
|
30
|
Solovyeva EM, Kopysov VN, Pereverzev AY, Lobas AA, Moshkovskii SA, Gorshkov MV, Boyarkin OV. Method for Identification of Threonine Isoforms in Peptides by Ultraviolet Photofragmentation of Cold Ions. Anal Chem 2019; 91:6709-6715. [DOI: 10.1021/acs.analchem.9b00770] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Affiliation(s)
- Elizaveta M. Solovyeva
- Laboratoire de Chimie Physique Moléculaire, École Polytechnique Fédérale de Lausanne, Station-6, 1015 Lausanne, Switzerland
- Moscow Institute of Physics and Technology (State University), 9 Institutskiy per., Dolgoprudny, Moscow Region, 141701, Russia
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld.2 Moscow, 119334, Russia
| | - Vladimir N. Kopysov
- Laboratoire de Chimie Physique Moléculaire, École Polytechnique Fédérale de Lausanne, Station-6, 1015 Lausanne, Switzerland
| | - Aleksandr Y. Pereverzev
- Laboratoire de Chimie Physique Moléculaire, École Polytechnique Fédérale de Lausanne, Station-6, 1015 Lausanne, Switzerland
| | - Anna A. Lobas
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld.2 Moscow, 119334, Russia
| | | | - Mikhail V. Gorshkov
- V.L. Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 38 Leninsky Pr., Bld.2 Moscow, 119334, Russia
| | - Oleg V. Boyarkin
- Laboratoire de Chimie Physique Moléculaire, École Polytechnique Fédérale de Lausanne, Station-6, 1015 Lausanne, Switzerland
|
31
|
Muth T, Renard BY. Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification? Brief Bioinform 2019; 19:954-970. [PMID: 28369237 DOI: 10.1093/bib/bbx033] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 12/08/2016] [Indexed: 01/24/2023] Open
Abstract
While peptide identifications in mass spectrometry (MS)-based shotgun proteomics are mostly obtained using database search methods, high-resolution spectrum data from modern MS instruments nowadays offer the prospect of improving the performance of computational de novo peptide sequencing. The major benefit of de novo sequencing is that it does not require a reference database to deduce full-length or partial tag-based peptide sequences directly from experimental tandem mass spectrometry spectra. Although various algorithms have been developed for automated de novo sequencing, the prediction accuracy of proposed solutions has been rarely evaluated in independent benchmarking studies. The main objective of this work is to provide a detailed evaluation on the performance of de novo sequencing algorithms on high-resolution data. For this purpose, we processed four experimental data sets acquired from different instrument types from collision-induced dissociation and higher energy collisional dissociation (HCD) fragmentation mode using the software packages Novor, PEAKS and PepNovo. Moreover, the accuracy of these algorithms is also tested on ground truth data based on simulated spectra generated from peak intensity prediction software. We found that Novor shows the overall best performance compared with PEAKS and PepNovo with respect to the accuracy of correct full peptide, tag-based and single-residue predictions. In addition, the same tool outpaced the commercial competitor PEAKS in terms of running time speedup by factors of around 12-17. Despite around 35% prediction accuracy for complete peptide sequences on HCD data sets, taken as a whole, the evaluated algorithms perform moderately on experimental data but show a significantly better performance on simulated data (up to 84% accuracy). Further, we describe the most frequently occurring de novo sequencing errors and evaluate the influence of missing fragment ion peaks and spectral noise on the accuracy. 
Finally, we discuss the potential of de novo sequencing for now becoming more widely used in the field.
Affiliation(s)
- Thilo Muth
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
| | - Bernhard Y Renard
- Research Group Bioinformatics, Robert Koch Institute, Berlin, Germany
|
32
|
Hutchins PD, Russell JD, Coon JJ. Mapping Lipid Fragmentation for Tailored Mass Spectral Libraries. J Am Soc Mass Spectrom 2019; 30:659-668. [PMID: 30756325 PMCID: PMC6447430 DOI: 10.1007/s13361-018-02125-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/28/2018] [Revised: 12/17/2018] [Accepted: 12/17/2018] [Indexed: 05/17/2023]
Abstract
Libraries of simulated lipid fragmentation spectra enable the identification of hundreds of unique lipids from complex lipid extracts, even when the corresponding lipid reference standards do not exist. Often, these in silico libraries are generated through expert annotation of spectra to extract and model fragmentation rules common to a given lipid class. Although useful for a given sample source or instrumental platform, the time-consuming nature of this approach renders it impractical for the growing array of dissociation techniques and instrument platforms. Here, we introduce Library Forge, a unique algorithm capable of deriving lipid fragment mass-to-charge (m/z) and intensity patterns directly from high-resolution experimental spectra with minimal user input. Library Forge exploits the modular construction of lipids to generate m/z transformed spectra in silico which reveal the underlying fragmentation pathways common to a given lipid class. By learning these fragmentation patterns directly from observed spectra, the algorithm increases lipid spectral matching confidence while reducing spectral library development time from days to minutes. We embed the algorithm within the preexisting lipid analysis architecture of LipiDex to integrate automated and robust library generation within a comprehensive LC-MS/MS lipidomics workflow.
Affiliation(s)
- Paul D Hutchins
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Genome Center of Wisconsin, Madison, WI, 53706, USA
| | - Jason D Russell
- Genome Center of Wisconsin, Madison, WI, 53706, USA
- Morgridge Institute for Research, Madison, WI, 53715, USA
| | - Joshua J Coon
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Genome Center of Wisconsin, Madison, WI, 53706, USA.
- Morgridge Institute for Research, Madison, WI, 53715, USA.
- Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, 53706, USA.
|
33
|
Maudsley S, Devanarayan V, Martin B, Geerts H. Intelligent and effective informatic deconvolution of “Big Data” and its future impact on the quantitative nature of neurodegenerative disease therapy. Alzheimers Dement 2018; 14:961-975. [DOI: 10.1016/j.jalz.2018.01.014] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 05/19/2017] [Revised: 10/03/2017] [Accepted: 01/18/2018] [Indexed: 12/31/2022]
Affiliation(s)
- Stuart Maudsley
- Department of Biomedical Research, University of Antwerp, Antwerp, Belgium
- VIB Center for Molecular Neurology, Antwerp, Belgium
| | | | - Bronwen Martin
- Department of Biomedical Research, University of Antwerp, Antwerp, Belgium
|
34
|
Ciach MA, Łącki MK, Miasojedow B, Lermyte F, Valkenborg D, Sobott F, Gambin A. Estimation of Rates of Reactions Triggered by Electron Transfer in Top-Down Mass Spectrometry. J Comput Biol 2017; 25:282-301. [PMID: 28945460 DOI: 10.1089/cmb.2017.0156] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023] Open
Abstract
Electron transfer dissociation (ETD) is a versatile technique used in mass spectrometry for the high-throughput characterization of proteins. It consists of several concurrent reactions triggered by the transfer of an electron from its anion source to sample cations. Transferring an electron causes peptide backbone cleavage while leaving labile post-translational modifications intact. The obtained fragmentation spectra provide valuable information for sequence and structure analyses. In this study, we propose a formal mathematical model of the ETD fragmentation process in the form of a system of stochastic differential equations describing its joint dynamics. Parameters of the model correspond to the rates of occurring reactions. Their estimates for various experimental settings give insight into the dynamics of the ETD process. We estimate the model parameters from the relative quantities of fragmentation products in a given mass spectrum by solving a nonlinear optimization problem. The cost function penalizes for the differences between the analytically derived average number of reaction products and their experimental counterparts. The presented method proves highly robust to noise in silico. Moreover, the model can explain a considerable amount of experimental results for a wide range of instrumentation settings. The implementation of the presented workflow, code-named ETDetective, is freely available under the two-clause BSD license.
Affiliation(s)
| | | | - Błażej Miasojedow
- 1 Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw , Warsaw, Poland
| | - Frederik Lermyte
- 2 Biomolecular and Analytical Mass Spectrometry Group, Department of Chemistry, University of Antwerp , Antwerp, Belgium .,3 Centre for Proteomics, University of Antwerp , Antwerp, Belgium
| | - Dirk Valkenborg
- 3 Centre for Proteomics, University of Antwerp , Antwerp, Belgium .,4 Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University , Hasselt, Belgium
| | - Frank Sobott
- 2 Biomolecular and Analytical Mass Spectrometry Group, Department of Chemistry, University of Antwerp , Antwerp, Belgium
| | - Anna Gambin
- 1 Faculty of Mathematics, Informatics, and Mechanics, University of Warsaw , Warsaw, Poland
|
35
|
Shao W, Lam H. Tandem mass spectral libraries of peptides and their roles in proteomics research. Mass Spectrom Rev 2017; 36:634-648. [PMID: 27403644 DOI: 10.1002/mas.21512] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 11/30/2015] [Accepted: 05/21/2016] [Indexed: 05/15/2023]
Abstract
Proteomics is a rapidly maturing field aimed at the high-throughput identification and quantification of all proteins in a biological system. The cornerstone of proteomic technology is tandem mass spectrometry of peptides resulting from the digestion of protein mixtures. The fragmentation pattern of each peptide ion is captured in its tandem mass spectrum, which enables its identification and acts as a fingerprint for the peptide. Spectral libraries are simply searchable collections of these fingerprints, which have taken on an increasingly prominent role in proteomic data analysis. This review describes the historical development of spectral libraries in proteomics, details the computational procedures behind library building and searching, surveys the current applications of spectral libraries, and discusses the outstanding challenges. © 2016 Wiley Periodicals, Inc. Mass Spec Rev 36:634-648, 2017.
Affiliation(s)
- Wenguang Shao
- Department of Biology, Institute of Molecular Systems Biology, Eidgenössische Technische Hochschule (ETH) Zurich, Zurich, Switzerland
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
| | - Henry Lam
- Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
|
36
|
Tschager T, Rösch S, Gillet L, Widmayer P. A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses. Algorithms Mol Biol 2017; 12:12. [PMID: 28603547 PMCID: PMC5464308 DOI: 10.1186/s13015-017-0104-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/22/2016] [Accepted: 04/19/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. The (real, noisy) de novo peptide sequencing problem therefore has a noisy input: a few of the prefix and suffix masses of the peptide are missing and a few other masses are given in addition. For this setting, we ask for an amino acid string that explains the given masses as accurately as possible. RESULTS Past approaches interpreted accuracy by searching for a string that explains as many masses as possible. We feel, however, that it is not only bad to not explain a mass that appears, but also to explain a mass that does not appear. We propose to minimize the symmetric difference between the set of given masses and the set of masses that the string explains. For this new optimization problem, we propose an efficient algorithm that computes both the best and the k best solutions. Proof-of-concept experiments on measurements of synthesized peptides show that our approach leads to better results compared to finding a string that explains as many given masses as possible. CONCLUSIONS We conclude that considering the symmetric difference as optimization goal can improve the identification rates for de novo peptide sequencing. A preliminary version of this work has been presented at WABI 2016.
|
37
|
Cerqueira FR, Ricardo AM, de Paiva Oliveira A, Graber A, Baumgartner C. MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm. BMC Bioinformatics 2016; 17:472. [PMID: 28105913 PMCID: PMC5249030 DOI: 10.1186/s12859-016-1341-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools since manual evaluation is not practicable. The target-decoy database strategy is largely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. RESULTS In a previous study, we proposed the method MUMAL that applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra in the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached around 15% of improvement in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. 
Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. CONCLUSIONS The inclusion of a cost matrix and a probability threshold selector algorithm to the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics.
Affiliation(s)
| | - Adilson Mendes Ricardo
- Department of Informatics, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil.,Department of Computing and Construction, Centro Federal de Educação Tecnológica de Minas Gerais, Rua 19 de Novembro, 121, Timóteo, 35180-008, Brazil
| | - Alcione de Paiva Oliveira
- Department of Informatics, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil.,Department of Computer Science, University of Sheffield, Western Bank, S10 2TN, Sheffield, UK
| | - Armin Graber
- Research and Product Development of Genoptix, a Novartis company, 2110 Rutherford Rd, Carlsbad, 92008, USA
| | - Christian Baumgartner
- Institute of Health Care Engineering with European Notified Body of Medical Devices, Graz University of Technology, Stremayrgasse 16/II, Graz, A-8010, Austria
|
38
|
Williams EG, Wu Y, Jha P, Dubuis S, Blattmann P, Argmann CA, Houten SM, Amariuta T, Wolski W, Zamboni N, Aebersold R, Auwerx J. Systems proteomics of liver mitochondria function. Science 2016; 352:aad0189. [PMID: 27284200 PMCID: PMC10859670 DOI: 10.1126/science.aad0189] [Citation(s) in RCA: 208] [Impact Index Per Article: 26.0] [Received: 07/15/2015] [Accepted: 04/15/2016] [Indexed: 12/14/2022]
Abstract
Recent improvements in quantitative proteomics approaches, including Sequential Window Acquisition of all Theoretical Mass Spectra (SWATH-MS), permit reproducible large-scale protein measurements across diverse cohorts. Together with genomics, transcriptomics, and other technologies, transomic data sets can be generated that permit detailed analyses across broad molecular interaction networks. Here, we examine mitochondrial links to liver metabolism through the genome, transcriptome, proteome, and metabolome of 386 individuals in the BXD mouse reference population. Several links were validated between genetic variants toward transcripts, proteins, metabolites, and phenotypes. Among these, sequence variants in Cox7a2l alter its protein's activity, which in turn leads to downstream differences in mitochondrial supercomplex formation. This data set demonstrates that the proteome can now be quantified comprehensively, serving as a key complement to transcriptomics, genomics, and metabolomics--a combination moving us forward in complex trait analysis.
Affiliation(s)
- Evan G Williams
- Laboratory of Integrative and Systems Physiology, Interfaculty Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, CH-1015, Switzerland. These authors contributed equally to this work
| | - Yibo Wu
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, CH-8093, Switzerland. These authors contributed equally to this work
| | - Pooja Jha
- Laboratory of Integrative and Systems Physiology, Interfaculty Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, CH-1015, Switzerland
| | - Sébastien Dubuis
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, CH-8093, Switzerland
| | - Peter Blattmann
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, CH-8093, Switzerland
| | - Carmen A Argmann
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, Box 1498, New York, NY 10029, USA
| | - Sander M Houten
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, Box 1498, New York, NY 10029, USA
| | - Tiffany Amariuta
- Laboratory of Integrative and Systems Physiology, Interfaculty Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, CH-1015, Switzerland
| | - Witold Wolski
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, CH-8093, Switzerland
| | - Nicola Zamboni
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, CH-8093, Switzerland
| | - Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, CH-8093, Switzerland. Faculty of Science, University of Zurich, CH-8057, Switzerland.
| | - Johan Auwerx
- Laboratory of Integrative and Systems Physiology, Interfaculty Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, CH-1015, Switzerland.
|
39
|
Du YM, Hu Y, Xia Y, Ouyang Z. Power Normalization for Mass Spectrometry Data Analysis and Analytical Method Assessment. Anal Chem 2016; 88:3156-63. [PMID: 26882462 PMCID: PMC8135100 DOI: 10.1021/acs.analchem.5b04418] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]
Abstract
Biomarker profiling using mass spectrometry plays an essential role in biological studies and is highly dependent on the data analysis for sample classification. In this study, we introduced power normalization of the mass spectra as a method for systematically altering the weights of peaks at different intensity levels. In combination with the use of support vector machine method (SVM), the impact on the sample classification has been characterized using data in four studies previously reported, including the distinctions of anomeric configurations of sugars, types of bacteria, stages of melanoma, and the types of breast cancer. Comprehensive analysis of the data with normalization at different power normalization index (PNI) was developed and analysis tools, including error-PNI plots, reference profiles, and error source profiles, were used to assess the potential of the analytical methods as well as to find the proper approaches to classify the samples.
Affiliation(s)
- Y. Melodie Du
- Weldon School of Biomedical Engineering, Purdue University, 206 South Martin Jischke Drive, West Lafayette, Indiana 47907, United States
- Ye Hu
- Department of Nanomedicine, Houston Methodist Research Institute, 6565 Fannin Street, Houston, Texas 77030, United States
- Yu Xia
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
- Zheng Ouyang
- Weldon School of Biomedical Engineering, Purdue University, 206 South Martin Jischke Drive, West Lafayette, Indiana 47907, United States
- Department of Chemistry, Purdue University, 560 Oval Drive, West Lafayette, Indiana 47907, United States
|
40
|
Bischoff R, Permentier H, Guryev V, Horvatovich P. Genomic variability and protein species - Improving sequence coverage for proteogenomics. J Proteomics 2016; 134:25-36. [DOI: 10.1016/j.jprot.2015.09.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/04/2015] [Revised: 09/06/2015] [Accepted: 09/14/2015] [Indexed: 12/30/2022]
|
41
|
Lund RR, Leth-Larsen R, Caterino TD, Terp MG, Nissen J, Lænkholm AV, Jensen ON, Ditzel HJ. NADH-Cytochrome b5 Reductase 3 Promotes Colonization and Metastasis Formation and Is a Prognostic Marker of Disease-Free and Overall Survival in Estrogen Receptor-Negative Breast Cancer. Mol Cell Proteomics 2015; 14:2988-99. [PMID: 26351264 DOI: 10.1074/mcp.m115.050385] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 04/07/2015] [Indexed: 01/11/2023] Open
Abstract
Metastasis is the main cause of cancer-related deaths and remains the most significant challenge to management of the disease. Metastases are established through a complex multistep process involving intracellular signaling pathways. To gain insight into proteins central to specific steps in metastasis formation, we used a metastasis cell line model that allows investigation of extravasation and colonization of circulating cancer cells to lungs in mice. Using stable isotopic labeling by amino acids in cell culture and subcellular fractionation, the nuclear, cytosol, and mitochondria proteomes were analyzed by LC-MS/MS, identifying a number of proteins that exhibited altered expression in isogenic metastatic versus nonmetastatic cancer cell lines, including NADH-cytochrome b5 reductase 3 (CYB5R3), L-lactate dehydrogenase A (LDHA), Niemann-Pick C1 protein (NPC1), and nucleolar RNA helicase 2 (NRH2). The altered expression levels were validated at the protein and transcriptional levels, and analysis of breast cancer biopsies from two cohorts of patients demonstrated a significant correlation between high CYB5R3 expression and poor disease-free and overall survival in patients with estrogen receptor-negative tumors (DFS: p = .02, OS: p = .04). CYB5R3 gene knock-down using siRNA in metastasizing cells led to significantly decreased tumor burden in lungs when injected intravenously in immunodeficient mice. The cellular effects of CYB5R3 knock-down showed signaling alterations associated with extravasation, TGFβ and HIFα pathways, and apoptosis. The decreased apoptosis of CYB5R3 knock-down metastatic cancer cell lines was confirmed in functional assays. Our study reveals a central role of CYB5R3 in extravasation/colonization of cancer cells and demonstrates the ability of our quantitative, comparative proteomic approach to identify key proteins of specific important biological processes that may also prove useful as potential biomarkers of clinical relevance.
MS data are available via ProteomeXchange with identifier PXD001391.
Affiliation(s)
- Rikke R Lund
- Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, J. B. Winsløws Vej 25.3, DK-5000 Odense C, Denmark
- Rikke Leth-Larsen
- Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, J. B. Winsløws Vej 25.3, DK-5000 Odense C, Denmark
- Tina Di Caterino
- Clinic of Pathological Anatomy and Cytology, Sydvestjysk Hospital, Finsensgade 35, DK-6700 Esbjerg, Denmark
- Mikkel G Terp
- Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, J. B. Winsløws Vej 25.3, DK-5000 Odense C, Denmark
- Jeanette Nissen
- Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, J. B. Winsløws Vej 25.3, DK-5000 Odense C, Denmark
- Anne-Vibeke Lænkholm
- Department of Pathology, Slagelse Hospital, Ingemannsvej 18, DK-4200 Slagelse, Denmark
- Ole N Jensen
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
- Henrik J Ditzel
- Department of Cancer and Inflammation Research, Institute of Molecular Medicine, University of Southern Denmark, J. B. Winsløws Vej 25.3, DK-5000 Odense C, Denmark
- Department of Oncology, Odense University Hospital, Søndre Boulevard 29, DK-5000 Odense C, Denmark
|
42
|
Goto R, Nakamura Y, Takami T, Sanke T, Tozuka Z. Quantitative LC-MS/MS Analysis of Proteins Involved in Metastasis of Breast Cancer. PLoS One 2015; 10:e0130760. [PMID: 26176947 PMCID: PMC4503764 DOI: 10.1371/journal.pone.0130760] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 03/02/2015] [Accepted: 05/22/2015] [Indexed: 12/29/2022] Open
Abstract
The purpose of this study was to develop quantitative liquid chromatography-tandem mass spectrometry (LC-MS/MS) methods for the analysis of proteins involved in metastasis of breast cancer for diagnosis and determining disease prognosis, as well as to further our understanding of metastatic mechanisms. We have previously demonstrated that the protein type XIV collagen may be specifically expressed in metastatic tissues by two dimensional LC-MS/MS. In this study, we developed quantitative LC-MS/MS methods for type XIV collagen. Type XIV collagen was quantified by analyzing 2 peptides generated by digesting type XIV collagen using stable isotope-labeled peptides. The individual concentrations were equivalent between 2 different peptides of type XIV collagen by evaluation of imprecise transitions and using the best transition for the peptide concentration. The results indicated that type XIV collagen is highly expressed in metastatic tissues of patients with massive lymph node involvement compared to non-metastatic tissues. These findings were validated by quantitative real-time RT-PCR. Further studies on type XIV collagen are desired to verify its role as a prognostic factor and diagnostic marker for metastasis.
Affiliation(s)
- Rieko Goto
- Department of Clinical Laboratory Medicine, Wakayama Medical University, Wakayama, Japan
- JCL Bioassay Corporation, Nishiwaki, Hyogo, Japan
- Yasushi Nakamura
- Department of Clinical Laboratory Medicine, Wakayama Medical University, Wakayama, Japan
- Tokio Sanke
- Department of Clinical Laboratory Medicine, Wakayama Medical University, Wakayama, Japan
- Zenzaburo Tozuka
- Graduate School of Pharmaceutical Science, Osaka University, Suita, Osaka, Japan
|
43
|
Raulfs MDM, Breci L, Bernier M, Hamdy OM, Janiga A, Wysocki V, Poutsma JC. Investigations of the mechanism of the "proline effect" in tandem mass spectrometry experiments: the "pipecolic acid effect". J Am Soc Mass Spectrom 2014; 25:1705-1715. [PMID: 25078156 DOI: 10.1007/s13361-014-0953-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 10/08/2013] [Revised: 06/12/2014] [Accepted: 06/14/2014] [Indexed: 06/03/2023]
Abstract
The fragmentation behavior of a set of model peptides containing proline, its four-membered ring analog azetidine-2-carboxylic acid (Aze), its six-membered ring analog pipecolic acid (Pip), an acyclic secondary amine residue N-methyl-alanine (NMeA), and the D stereoisomers of Pro and Pip has been determined using collision-induced dissociation in ESI-tandem mass spectrometers. Experimental results for AAXAA, AVXLG, AAAXA, AGXGA, and AXPAA peptides are presented, where X represents Pro, Aze, Pip, or NMeA. Aze- and Pro-containing peptides fragment according to the well-established "proline effect" through selective cleavage of the amide bond N-terminal to the Aze/Pro residue to give y_n^+ ions. In contrast, Pip- and NMeA-containing peptides fragment through a different mechanism, the "pipecolic acid effect," selectively at the amide bond C-terminal to the Pip/NMeA residue to give b_n^+ ions. Calculations of the relative basicities of various sites in model peptide molecules containing Aze, Pro, Pip, or NMeA indicate that whereas the "proline effect" can in part be rationalized by the increased basicity of the prolyl-amide site, the "pipecolic acid effect" cannot be justified through the basicity of the residue. Rather, the increased flexibility of the Pip and NMeA residues allows for conformations of the peptide for which transfer of the mobile proton to the amide site C-terminal to the Pip/NMeA becomes energetically favorable. This argument is supported by the differing results obtained for AAPAA versus AA(D-Pro)AA, a result that can best be explained by steric effects. Fragmentation of pentapeptides containing both Pro and Pip indicates that the "pipecolic acid effect" is stronger than the "proline effect."
Affiliation(s)
- Mary Disa M Raulfs
- Department of Chemistry, The College of William and Mary, Williamsburg, VA, 23187, USA
|
44
|
Dong NP, Liang YZ, Xu QS, Mok DKW, Yi LZ, Lu HM, He M, Fan W. Prediction of Peptide Fragment Ion Mass Spectra by Data Mining Techniques. Anal Chem 2014; 86:7446-54. [DOI: 10.1021/ac501094m] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/19/2022]
Affiliation(s)
- Daniel K. W. Mok
- Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- State Key Laboratory of Chinese Medicine and Molecular Pharmacology (Incubation), Shenzhen, 518000, P. R. China
- Lun-zhao Yi
- Yunnan Food Safety Research Institute, Kunming University of Science and Technology, Kunming, 650500, P. R. China
- Min He
- Department of Pharmaceutical Engineering, School of Chemical Engineering, Xiangtan University, Xiangtan, 411105, P. R. China
- Wei Fan
- College of Bioscience and Biotechnology, Hunan Agricultural University, Changsha, 410083, P. R. China
|
45
|
Pathway and network analysis in proteomics. J Theor Biol 2014; 362:44-52. [PMID: 24911777 DOI: 10.1016/j.jtbi.2014.05.031] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.8] [Received: 01/01/2014] [Revised: 05/15/2014] [Accepted: 05/21/2014] [Indexed: 12/14/2022]
Abstract
Proteomics is inherently a systems science that studies not only measured proteins and their expression in a cell, but also the interplay of proteins, protein complexes, signaling pathways, and network modules. Proteomics data have accumulated rapidly in recent years. However, proteomics data are highly variable, with results sensitive to data preparation methods, sample condition, instrument types, and analytical methods. To address the challenges in proteomics data analysis, we review current tools being developed to incorporate biological function and network topological information. We categorize these tools into four types: tools with basic functional information and few topological features (e.g., GO category analysis), tools with rich functional information and few topological features (e.g., GSEA), tools with basic functional information and rich topological features (e.g., Cytoscape), and tools with rich functional information and rich topological features (e.g., PathwayExpress). We first review the potential application of these tools to proteomics; then we review tools that can achieve automated learning of pathway modules and features, and tools that help perform integrated network visual analytics.
|
46
|
Smith R, Mathis AD, Ventura D, Prince JT. Proteomics, lipidomics, metabolomics: a mass spectrometry tutorial from a computer scientist's point of view. BMC Bioinformatics 2014; 15 Suppl 7:S9. [PMID: 25078324 PMCID: PMC4110734 DOI: 10.1186/1471-2105-15-s7-s9] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Indexed: 11/17/2022] Open
Abstract
Background: For decades, mass spectrometry data has been analyzed to investigate a wide array of research interests, including disease diagnostics, biological and chemical theory, genomics, and drug development. Progress towards solving any of these disparate problems depends upon overcoming the common challenge of interpreting the large data sets generated. Despite interim successes, many data interpretation problems in mass spectrometry are still challenging. Further, though these challenges are inherently interdisciplinary in nature, the significant domain-specific knowledge gap between disciplines makes interdisciplinary contributions difficult. Results: This paper provides an introduction to the burgeoning field of computational mass spectrometry. We illustrate key concepts, vocabulary, and open problems in MS-omics, as well as provide invaluable resources such as open data sets and key search terms and references. Conclusions: This paper will facilitate contributions from mathematicians, computer scientists, and statisticians to MS-omics that will fundamentally improve results over existing approaches and inform novel algorithmic solutions to open problems.
|
47
|
Liang SY, Wu SW, Pu TH, Chang FY, Khoo KH. An adaptive workflow coupled with Random Forest algorithm to identify intact N-glycopeptides detected from mass spectrometry. Bioinformatics 2014; 30:1908-16. [DOI: 10.1093/bioinformatics/btu139] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/07/2023] Open
|
48
|
Kelchtermans P, Bittremieux W, De Grave K, Degroeve S, Ramon J, Laukens K, Valkenborg D, Barsnes H, Martens L. Machine learning applications in proteomics research: how the past can boost the future. Proteomics 2014; 14:353-66. [PMID: 24323524 DOI: 10.1002/pmic.201300289] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 07/12/2013] [Revised: 09/24/2013] [Accepted: 10/14/2013] [Indexed: 01/22/2023]
Abstract
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever growing amounts, machine learning is fast becoming a very popular tool in the field. We here therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Affiliation(s)
- Pieter Kelchtermans
- Department of Medical Protein Research, VIB, Ghent, Belgium; Faculty of Medicine and Health Sciences, Department of Biochemistry, Ghent University, Ghent, Belgium; Flemish Institute for Technological Research (VITO), Boeretang, Mol, Belgium
|
49
|
Meyer JG, Kim S, Maltby DA, Ghassemian M, Bandeira N, Komives EA. Expanding proteome coverage with orthogonal-specificity α-lytic proteases. Mol Cell Proteomics 2014; 13:823-35. [PMID: 24425750 DOI: 10.1074/mcp.m113.034710] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 12/19/2022] Open
Abstract
Bottom-up proteomics studies traditionally involve proteome digestion with a single protease, trypsin. However, trypsin alone does not generate peptides that encompass the entire proteome. Alternative proteases have been explored, but most have specificity for charged amino acid side chains. Therefore, additional proteases that cleave at sequences complementary to trypsin's may increase proteome coverage. We demonstrate the novel application of two proteases for bottom-up proteomics: wild type α-lytic protease (WaLP) and an active site mutant of WaLP, M190A α-lytic protease (MaLP). We assess several relevant factors, including MS/MS fragmentation, peptide length, peptide yield, and protease specificity. When data from separate digestions with trypsin, LysC, WaLP, and MaLP were combined, proteome coverage was increased by 101% relative to that achieved with trypsin digestion alone. To demonstrate how the gained sequence coverage can yield additional post-translational modification information, we show the identification of a number of novel phosphorylation sites in the Schizosaccharomyces pombe proteome and include an illustrative example from the protein MPD2 wherein two novel sites are identified, one in a tryptic peptide too short to identify and the other in a sequence devoid of tryptic sites. The specificity of WaLP and MaLP for aliphatic amino acid side chains was particularly valuable for coverage of membrane protein sequences, which increased 350% when the data from trypsin, LysC, WaLP, and MaLP were combined.
Affiliation(s)
- Jesse G Meyer
- Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Dr., La Jolla, California 92093-0378
|
50
|
|