1. Yang Y, Fang Q. Prediction of glycopeptide fragment mass spectra by deep learning. Nat Commun 2024; 15:2448. [PMID: 38503734 PMCID: PMC10951270 DOI: 10.1038/s41467-024-46771-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 03/11/2024] [Indexed: 03/21/2024] Open
Abstract
Deep learning has achieved notable success in mass spectrometry-based proteomics and is now emerging in glycoproteomics. While various deep learning models can predict fragment mass spectra of peptides with good accuracy, they cannot cope with the non-linear glycan structure in an intact glycopeptide. Herein, we present DeepGlyco, a deep learning-based approach for the prediction of fragment spectra of intact glycopeptides. Our model adopts tree-structured long short-term memory networks to process the glycan moiety and a graph neural network architecture to incorporate potential fragmentation pathways of a specific glycan structure. This feature benefits model explainability and the ability to differentiate glycan structural isomers. We further demonstrate that predicted spectral libraries can be used for data-independent acquisition glycoproteomics as a supplement for library completeness. We expect that this work will provide a valuable deep learning resource for glycoproteomics.
Affiliation(s)
- Yi Yang
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311200, China.
- Qun Fang
  - ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311200, China.
  - Department of Chemistry, Zhejiang University, Hangzhou, 310058, China.
2. Yi X, Wen B, Ji S, Saltzman AB, Jaehnig EJ, Lei JT, Gao Q, Zhang B. Deep Learning Prediction Boosts Phosphoproteomics-Based Discoveries Through Improved Phosphopeptide Identification. Mol Cell Proteomics 2024; 23:100707. [PMID: 38154692 PMCID: PMC10831110 DOI: 10.1016/j.mcpro.2023.100707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 11/06/2023] [Accepted: 12/23/2023] [Indexed: 12/30/2023] Open
Abstract
Shotgun phosphoproteomics enables high-throughput analysis of phosphopeptides in biological samples. One of the primary challenges associated with this technology is the relatively low rate of phosphopeptide identification during data analysis. This limitation hampers the full realization of the potential offered by shotgun phosphoproteomics. Here we present DeepRescore2, a computational workflow that leverages deep learning-based retention time and fragment ion intensity predictions to improve phosphopeptide identification and phosphosite localization. Using a state-of-the-art computational workflow as a benchmark, DeepRescore2 increases the number of correctly identified peptide-spectrum matches by 17% in a synthetic dataset and identifies 19% to 46% more phosphopeptides in biological datasets. In a liver cancer dataset, 30% of the significantly altered phosphosites between tumor and normal tissues and 60% of the prognosis-associated phosphosites identified from DeepRescore2-processed data could not be identified based on the state-of-the-art workflow. Notably, DeepRescore2-processed data uniquely identifies EGFR hyperactivation as a new target in poor-prognosis liver cancer, which is validated experimentally. Integration of deep learning prediction in DeepRescore2 improves phosphopeptide identification and facilitates biological discoveries.
Affiliation(s)
- Xinpei Yi
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Bo Wen
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Shuyi Ji
  - Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital and Key Laboratory of Carcinogenesis and Cancer Invasion of the Ministry of China, Fudan University, Shanghai, China
- Alexander B Saltzman
  - Mass Spectrometry Proteomics Core, Advanced Technology Cores, Baylor College of Medicine, Houston, Texas, USA
- Eric J Jaehnig
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Jonathan T Lei
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Qiang Gao
  - Department of Liver Surgery and Transplantation, Liver Cancer Institute, Zhongshan Hospital and Key Laboratory of Carcinogenesis and Cancer Invasion of the Ministry of China, Fudan University, Shanghai, China
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.
3. Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024] Open
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
Affiliation(s)
- Ronghui Lou
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
- Wenqing Shui
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
4. Higgins L, Gerdes H, Cutillas PR. Principles of phosphoproteomics and applications in cancer research. Biochem J 2023; 480:403-420. [PMID: 36961757 PMCID: PMC10212522 DOI: 10.1042/bcj20220220] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/13/2022] [Revised: 02/24/2023] [Accepted: 02/28/2023] [Indexed: 03/25/2023]
Abstract
Phosphorylation constitutes the most common and best-studied regulatory post-translational modification in biological systems and archetypal signalling pathways driven by protein and lipid kinases are disrupted in essentially all cancer types. Thus, the study of the phosphoproteome stands to provide unique biological information on signalling pathway activity and on kinase network circuitry that is not captured by genetic or transcriptomic technologies. Here, we discuss the methods and tools used in phosphoproteomics and highlight how this technique has been used, and can be used in the future, for cancer research. Challenges still exist in mass spectrometry phosphoproteomics and in the software required to provide biological information from these datasets. Nevertheless, improvements in mass spectrometers with enhanced scan rates, separation capabilities and sensitivity, in biochemical methods for sample preparation and in computational pipelines are enabling an increasingly deep analysis of the phosphoproteome, where previous bottlenecks in data acquisition, processing and interpretation are being relieved. These powerful hardware and algorithmic innovations are not only providing exciting new mechanistic insights into tumour biology, from where new drug targets may be derived, but are also leading to the discovery of phosphoproteins as mediators of drug sensitivity and resistance and as classifiers of disease subtypes. These studies are, therefore, uncovering phosphoproteins as a new generation of disruptive biomarkers to improve personalised anti-cancer therapies.
Affiliation(s)
- Luke Higgins
  - Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K
- Henry Gerdes
  - Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K
- Pedro R. Cutillas
  - Cell Signaling and Proteomics Group, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, U.K
  - Alan Turing Institute, The British Library, London, U.K
  - Digital Environment Research Institute, Queen Mary University of London, London, U.K
5. Yi X, Wen B, Ji S, Saltzman A, Jaehnig EJ, Lei JT, Gao Q, Zhang B. Deep learning prediction boosts phosphoproteomics-based discoveries through improved phosphopeptide identification. bioRxiv 2023:2023.01.11.523329. [PMID: 36711982 PMCID: PMC9882090 DOI: 10.1101/2023.01.11.523329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Abstract
Shotgun phosphoproteomics enables high-throughput analysis of phosphopeptides in biological samples, but low phosphopeptide identification rate in data analysis limits the potential of this technology. Here we present DeepRescore2, a computational workflow that leverages deep learning-based retention time and fragment ion intensity predictions to improve phosphopeptide identification and phosphosite localization. Using a state-of-the-art computational workflow as a benchmark, DeepRescore2 increases the number of correctly identified peptide-spectrum matches by 17% in a synthetic dataset and identifies 19%-46% more phosphopeptides in biological datasets. In a liver cancer dataset, 30% of the significantly altered phosphosites between tumor and normal tissues and 60% of the prognosis-associated phosphosites identified from DeepRescore2-processed data could not be identified based on the state-of-the-art workflow. Notably, DeepRescore2-processed data uniquely identifies EGFR hyperactivation as a new target in poor-prognosis liver cancer, which is validated experimentally. Integration of deep learning prediction in DeepRescore2 improves phosphopeptide identification and facilitates biological discoveries.
6. Cox J. Prediction of peptide mass spectral libraries with machine learning. Nat Biotechnol 2023; 41:33-43. [PMID: 36008611 DOI: 10.1038/s41587-022-01424-w] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 03/01/2022] [Accepted: 07/11/2022] [Indexed: 01/21/2023]
Abstract
The recent development of machine learning methods to identify peptides in complex mass spectrometric data constitutes a major breakthrough in proteomics. Longstanding methods for peptide identification, such as search engines and experimental spectral libraries, are being superseded by deep learning models that allow the fragmentation spectra of peptides to be predicted from their amino acid sequence. These new approaches, including recurrent neural networks and convolutional neural networks, use predicted in silico spectral libraries rather than experimental libraries to achieve higher sensitivity and/or specificity in the analysis of proteomics data. Machine learning is galvanizing applications that involve large search spaces, such as immunopeptidomics and proteogenomics. Current challenges in the field include the prediction of spectra for peptides with post-translational modifications and for cross-linked pairs of peptides. Permeation of machine-learning-based spectral prediction into search engines and spectrum-centric data-independent acquisition workflows for diverse peptide classes and measurement conditions will continue to push sensitivity and dynamic range in proteomics applications in the coming years.
Affiliation(s)
- Jürgen Cox
  - Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany.
  - Department of Biological and Medical Psychology, University of Bergen, Bergen, Norway.
7. Zhang Y, Dreyer B, Govorukhina N, Heberle AM, Končarević S, Krisp C, Opitz CA, Pfänder P, Bischoff R, Schlüter H, Kwiatkowski M, Thedieck K, Horvatovich PL. Comparative Assessment of Quantification Methods for Tumor Tissue Phosphoproteomics. Anal Chem 2022; 94:10893-10906. [PMID: 35880733 PMCID: PMC9366746 DOI: 10.1021/acs.analchem.2c01036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
With increasing sensitivity and accuracy in mass spectrometry, the tumor phosphoproteome is getting into reach. However, the selection of quantitation techniques best-suited to the biomedical question and diagnostic requirements remains a trial and error decision as no study has directly compared their performance for tumor tissue phosphoproteomics. We compared label-free quantification (LFQ), spike-in-SILAC (stable isotope labeling by amino acids in cell culture), and tandem mass tag (TMT) isobaric tandem mass tags technology for quantitative phosphosite profiling in tumor tissue. Compared to the classic SILAC method, spike-in-SILAC is not limited to cell culture analysis, making it suitable for quantitative analysis of tumor tissue samples. TMT offered the lowest accuracy and the highest precision and robustness toward different phosphosite abundances and matrices. Spike-in-SILAC offered the best compromise between these features but suffered from a low phosphosite coverage. LFQ offered the lowest precision but the highest number of identifications. Both spike-in-SILAC and LFQ presented susceptibility to matrix effects. Match between run (MBR)-based analysis enhanced the phosphosite coverage across technical replicates in LFQ and spike-in-SILAC but further reduced the precision and robustness of quantification. The choice of quantitative methodology is critical for both study design such as sample size in sample groups and quantified phosphosites and comparison of published cancer phosphoproteomes. Using ovarian cancer tissue as an example, our study builds a resource for the design and analysis of quantitative phosphoproteomic studies in cancer research and diagnostics.
Affiliation(s)
- Yang Zhang
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
  - Institute of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, 6020 Innsbruck, Austria
  - Laboratory of Pediatrics, Section Systems Medicine of Metabolism and Signaling, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
- Benjamin Dreyer
  - Section/Core Facility Mass Spectrometry and Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, Martinistraße 52, 20246 Hamburg, Germany
- Natalia Govorukhina
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
- Alexander M Heberle
  - Institute of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, 6020 Innsbruck, Austria
  - Laboratory of Pediatrics, Section Systems Medicine of Metabolism and Signaling, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
- Saša Končarević
  - Proteome Sciences R&D GmbH & Co. KG, Altenhöferallee 3, 60438 Frankfurt/Main, Germany
- Christoph Krisp
  - Section/Core Facility Mass Spectrometry and Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, Martinistraße 52, 20246 Hamburg, Germany
- Christiane A Opitz
  - Metabolic Crosstalk in Cancer, German Consortium of Translational Cancer Research (DKTK), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Department of Neurology, National Center for Tumor Diseases, University Hospital Heidelberg, 69120 Heidelberg, Germany
- Pauline Pfänder
  - Metabolic Crosstalk in Cancer, German Consortium of Translational Cancer Research (DKTK), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Faculty of Bioscience, Heidelberg University, 69117 Heidelberg, Germany
- Rainer Bischoff
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
- Hartmut Schlüter
  - Section/Core Facility Mass Spectrometry and Proteomics, Institute of Clinical Chemistry and Laboratory Medicine, University Medical Center Hamburg-Eppendorf, Martinistraße 52, 20246 Hamburg, Germany
- Marcel Kwiatkowski
  - Institute of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, 6020 Innsbruck, Austria
  - Department of Molecular Pharmacology, Groningen Research Institute for Pharmacy, University of Groningen, Groningen 9700 AD, The Netherlands
  - Groningen Research Institute for Asthma and COPD, University Medical Center Groningen, University of Groningen, Groningen 9700 AD, The Netherlands
- Kathrin Thedieck
  - Institute of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, 6020 Innsbruck, Austria
  - Laboratory of Pediatrics, Section Systems Medicine of Metabolism and Signaling, University of Groningen, University Medical Center Groningen, 9713 AV Groningen, The Netherlands
  - Department of Neuroscience, School of Medicine and Health Sciences, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany
- Peter L Horvatovich
  - Department of Analytical Biochemistry, Groningen Research Institute of Pharmacy, University of Groningen, 9713 AV Groningen, The Netherlands
8.
Abstract
There are probably no biological samples that did more to spur interest in proteomics than serum and plasma. The belief was that comparing the proteomes of these samples obtained from healthy and disease-affected individuals would lead to biomarkers that could be used to diagnose conditions such as cancer. While the continuing development of mass spectrometers with greater sensitivity and resolution has been invaluable, the invention of strategies to separate circulatory proteins has been just as critical. Novel and creative separation techniques were required because serum and plasma probably have the greatest dynamic range of protein concentration of any biological sample. The concentrations of circulating proteins can range over twelve orders of magnitude, making it a challenge to identify low-abundance proteins, where the bulk of the useful biomarkers are believed to exist. The major goals of this article are to (i) provide a historical perspective on the rapid development of serum and plasma proteomics; (ii) describe various separation techniques that have made obtaining an in-depth view of the proteome of these biological samples possible; and (iii) describe applications where serum and plasma proteomics have been employed to discover potential biomarkers for pathological conditions.
9. Urban J. A review on recent trends in the phosphoproteomics workflow. From sample preparation to data analysis. Anal Chim Acta 2022; 1199:338857. [PMID: 35227377 DOI: 10.1016/j.aca.2021.338857] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 03/07/2021] [Revised: 07/14/2021] [Accepted: 07/15/2021] [Indexed: 12/12/2022]
10. Yang Y, Lin L, Qiao L. Deep learning approaches for data-independent acquisition proteomics. Expert Rev Proteomics 2021; 18:1031-1043. [PMID: 34918987 DOI: 10.1080/14789450.2021.2020654] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/30/2023]
Abstract
INTRODUCTION: Data-independent acquisition (DIA) is an emerging technology for large-scale proteomic studies. DIA data analysis methods are evolving rapidly, and deep learning has cut a conspicuous figure in this field.
AREAS COVERED: This review discusses and provides an overview of the deep learning methods that are used for DIA data analysis, including spectral library prediction, feature scoring, and statistical control in peptide-centric analysis, as well as de novo peptide sequencing. Literature searches were performed for articles, including preprints, up to December 2021 from PubMed, Scopus, and Web of Science databases.
EXPERT OPINION: While spectral library prediction has broken through the limitation on proteome coverage of experimental libraries, the statistical burden due to the large query space is the remaining challenge of utilizing proteome-wide predicted libraries. Analysis of post-translational modifications is another promising direction of deep learning-based DIA methods.
Affiliation(s)
- Yi Yang
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
- Ling Lin
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
- Liang Qiao
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
11. Lou R, Liu W, Li R, Li S, He X, Shui W. DeepPhospho accelerates DIA phosphoproteome profiling through in silico library generation. Nat Commun 2021; 12:6685. [PMID: 34795227 PMCID: PMC8602247 DOI: 10.1038/s41467-021-26979-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 04/14/2021] [Accepted: 10/26/2021] [Indexed: 12/27/2022] Open
Abstract
Phosphoproteomics integrating data-independent acquisition (DIA) enables deep phosphoproteome profiling with improved quantification reproducibility and accuracy compared to data-dependent acquisition (DDA)-based phosphoproteomics. DIA data mining heavily relies on a spectral library that in most cases is built on DDA analysis of the same sample. Construction of this project-specific DDA library impairs the analytical throughput, limits the proteome coverage, and increases the sample size for DIA phosphoproteomics. Herein we introduce a deep neural network, DeepPhospho, which conceptually differs from previous deep learning models to achieve accurate predictions of LC-MS/MS data for phosphopeptides. By leveraging in silico libraries generated by DeepPhospho, we establish a DIA workflow for phosphoproteome profiling which involves DIA data acquisition and data mining with DeepPhospho predicted libraries, thus circumventing the need of DDA library construction. Our DeepPhospho-empowered workflow substantially expands the phosphoproteome coverage while maintaining high quantification performance, which leads to the discovery of more signaling pathways and regulated kinases in an EGF signaling study than the DDA library-based approach. DeepPhospho is provided as a web server as well as an offline app to facilitate user access to model training, predictions and library generation.
Affiliation(s)
- Ronghui Lou
  - iHuman Institute, ShanghaiTech University, Shanghai, 201210, China
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Weizhen Liu
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Rongjie Li
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Shanshan Li
  - iHuman Institute, ShanghaiTech University, Shanghai, 201210, China
- Xuming He
  - School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
  - Shanghai Engineering Research Center of Intelligent Vision and Imaging, Shanghai, 201210, China.
- Wenqing Shui
  - iHuman Institute, ShanghaiTech University, Shanghai, 201210, China.
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
12. Mann M, Kumar C, Zeng WF, Strauss MT. Artificial intelligence for proteomics and biomarker discovery. Cell Syst 2021; 12:759-770. [PMID: 34411543 DOI: 10.1016/j.cels.2021.06.006] [Citation(s) in RCA: 150] [Impact Index Per Article: 37.5] [Received: 03/04/2021] [Revised: 05/07/2021] [Accepted: 06/28/2021] [Indexed: 12/14/2022]
Abstract
There is an avalanche of biomedical data generation and a parallel expansion in computational capabilities to analyze and make sense of these data. Starting with genome sequencing and widely employed deep sequencing technologies, these trends have now taken hold in all omics disciplines and increasingly call for multi-omics integration as well as data interpretation by artificial intelligence technologies. Here, we focus on mass spectrometry (MS)-based proteomics and describe how machine learning and, in particular, deep learning now predicts experimental peptide measurements from amino acid sequences alone. This will dramatically improve the quality and reliability of analytical workflows because experimental results should agree with predictions in a multi-dimensional data landscape. Machine learning has also become central to biomarker discovery from proteomics data, which now starts to outperform existing best-in-class assays. Finally, we discuss model transparency and explainability and data privacy that are required to deploy MS-based biomarkers in clinical settings.
Affiliation(s)
- Matthias Mann
  - Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
- Chanchal Kumar
  - Translational Science & Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden.
- Wen-Feng Zeng
  - Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
13. Chen ZL, Mao PZ, Zeng WF, Chi H, He SM. pDeepXL: MS/MS Spectrum Prediction for Cross-Linked Peptide Pairs by Deep Learning. J Proteome Res 2021; 20:2570-2582. [PMID: 33821641 DOI: 10.1021/acs.jproteome.0c01004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
In cross-linking mass spectrometry, the identification of cross-linked peptide pairs heavily relies on the ability of a database search engine to measure the similarities between experimental and theoretical MS/MS spectra. However, the lack of accurate ion intensities in theoretical spectra impairs the performance of search engines, in particular, on proteome scales. Here we introduce pDeepXL, a deep neural network to predict MS/MS spectra of cross-linked peptide pairs. To train pDeepXL, we used the transfer-learning technique because it facilitated the training with limited benchmark data of cross-linked peptide pairs. Test results on more than ten data sets showed that pDeepXL accurately predicted the spectra of both noncleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson's r values higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have Pearson's r values higher than 0.9). pDeepXL also achieved the accurate prediction on unseen data sets using an online fine-tuning technique. Lastly, integrating pDeepXL into a database search engine increased the number of identified cross-link spectra by 18% on average.
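The accuracy figures quoted in this abstract are fractions of spectra whose predicted and experimental fragment intensities reach a Pearson's r above 0.9. The sketch below illustrates that kind of summary only; it is not pDeepXL code. The helper names `pearson_r` and `fraction_above` and the intensity values are made up for illustration, and it assumes the two intensity vectors are already aligned to the same fragment ions.

```python
import math


def pearson_r(predicted, observed):
    """Pearson correlation between two aligned fragment-intensity vectors."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_p * var_o)


def fraction_above(spectrum_pairs, threshold=0.9):
    """Fraction of (predicted, observed) pairs with Pearson's r above threshold."""
    scores = [pearson_r(pred, obs) for pred, obs in spectrum_pairs]
    return sum(r > threshold for r in scores) / len(scores)


# Hypothetical normalized intensities, invented for this example.
example_pairs = [
    ([1.0, 0.5, 0.2, 0.0], [0.9, 0.6, 0.1, 0.0]),  # close match
    ([0.1, 0.8, 0.3, 0.4], [0.7, 0.1, 0.5, 0.2]),  # poor match
]

print(fraction_above(example_pairs))
```

In practice the pairs would come from a held-out test set, and the threshold would follow whatever cutoff a given study reports.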
Affiliation(s)
- Zhen-Lin Chen
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Peng-Zhi Mao
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wen-Feng Zeng
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Chi
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Si-Min He
  - Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
14. Hansen FM, Tanzer MC, Brüning F, Bludau I, Stafford C, Schulman BA, Robles MS, Karayel O, Mann M. Data-independent acquisition method for ubiquitinome analysis reveals regulation of circadian biology. Nat Commun 2021; 12:254. [PMID: 33431886 PMCID: PMC7801436 DOI: 10.1038/s41467-020-20509-1] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 07/20/2020] [Accepted: 11/27/2020] [Indexed: 12/11/2022] Open
Abstract
Protein ubiquitination is involved in virtually all cellular processes. Enrichment strategies employing antibodies targeting ubiquitin-derived diGly remnants combined with mass spectrometry (MS) have enabled investigations of ubiquitin signaling at a large scale. However, the power of data-independent acquisition (DIA) with regard to sensitivity in single-run analysis and data completeness has not yet been explored. Here, we develop a sensitive workflow combining diGly antibody-based enrichment and optimized Orbitrap-based DIA with comprehensive spectral libraries together containing more than 90,000 diGly peptides. This approach identifies 35,000 diGly peptides in single measurements of proteasome inhibitor-treated cells - double the number and quantitative accuracy of data-dependent acquisition. Applied to TNF signaling, the workflow comprehensively captures known sites while adding many novel ones. An in-depth, systems-wide investigation of ubiquitination across the circadian cycle uncovers hundreds of cycling ubiquitination sites and dozens of cycling ubiquitin clusters within individual membrane protein receptors and transporters, highlighting new connections between metabolism and circadian regulation.
Affiliation(s)
- Fynn M Hansen
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Maria C Tanzer
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Franziska Brüning
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
  - Institute of Medical Psychology, Faculty of Medicine, LMU, Munich, Germany
- Isabell Bludau
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Che Stafford
  - Gene Center and Department of Biochemistry, Ludwig-Maximilians-Universität München, Munich, Germany
- Brenda A Schulman
  - Department of Molecular Machines and Signaling, Max Planck Institute of Biochemistry, Martinsried, Germany
- Maria S Robles
  - Institute of Medical Psychology, Faculty of Medicine, LMU, Munich, Germany.
- Ozge Karayel
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
- Matthias Mann
  - Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.