1
Liu Z, Liu P, Sun Y, Nie Z, Zhang X, Zhang Y, Chen Y, Guo T. DIA-BERT: pre-trained end-to-end transformer models for enhanced DIA proteomics data analysis. Nat Commun 2025; 16:3530. [PMID: 40229248 PMCID: PMC11997033 DOI: 10.1038/s41467-025-58866-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2024] [Accepted: 04/04/2025] [Indexed: 04/16/2025] Open
Abstract
Data-independent acquisition mass spectrometry (DIA-MS) has become increasingly pivotal in quantitative proteomics. In this study, we present DIA-BERT, a software tool that harnesses a transformer-based pre-trained artificial intelligence (AI) model for analyzing DIA proteomics data. The identification model was trained using over 276 million high-quality peptide precursors extracted from existing DIA-MS files, while the quantification model was trained on 34 million peptide precursors from synthetic DIA-MS files. When compared to DIA-NN, DIA-BERT demonstrated a 51% increase in protein identifications and 22% more peptide precursors on average across five human cancer sample sets (cervical cancer, pancreatic adenocarcinoma, myosarcoma, gallbladder cancer, and gastric carcinoma), achieving high quantitative accuracy. This study underscores the potential of leveraging pre-trained models and synthetic datasets to enhance the analysis of DIA proteomics.
Affiliation(s)
- Zhiwei Liu
- Affiliated Hangzhou First People's Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University, Hangzhou, Zhejiang Province, China
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China
- Research Center for Industries of the Future, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China
- Pu Liu
- Westlake Omics (Hangzhou) Biotechnology Co., Ltd., Hangzhou, China
- Yingying Sun
- Affiliated Hangzhou First People's Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University, Hangzhou, Zhejiang Province, China
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China
- Research Center for Industries of the Future, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China
- Zongxiang Nie
- Affiliated Hangzhou First People's Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University, Hangzhou, Zhejiang Province, China
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China
- Research Center for Industries of the Future, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China
- Xiaofan Zhang
- Affiliated Hangzhou First People's Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University, Hangzhou, Zhejiang Province, China
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China
- Research Center for Industries of the Future, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China
- Yuqi Zhang
- Affiliated Hangzhou First People's Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University, Hangzhou, Zhejiang Province, China
- Research Center for Industries of the Future, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China
- Yi Chen
- Affiliated Hangzhou First People's Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University, Hangzhou, Zhejiang Province, China.
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China.
- Research Center for Industries of the Future, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China.
- Tiannan Guo
- Affiliated Hangzhou First People's Hospital, State Key Laboratory of Medical Proteomics, School of Medicine, Westlake University, Hangzhou, Zhejiang Province, China.
- Westlake Center for Intelligent Proteomics, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang Province, China.
- Research Center for Industries of the Future, School of Life Sciences, Westlake University, Hangzhou, Zhejiang Province, China.
- Westlake Omics (Hangzhou) Biotechnology Co., Ltd., Hangzhou, China.
2
Ha A, Woolman M, Waas M, Govindarajan M, Kislinger T. Recent implementations of data-independent acquisition for cancer biomarker discovery in biological fluids. Expert Rev Proteomics 2025; 22:163-176. [PMID: 40227112 DOI: 10.1080/14789450.2025.2491355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2025] [Revised: 03/26/2025] [Accepted: 04/06/2025] [Indexed: 04/15/2025]
Abstract
INTRODUCTION: Cancer is the second-leading cause of death worldwide and accurate biomarkers for early detection and disease monitoring are needed to improve outcomes. Biological fluids, such as blood and urine, are ideal samples for biomarker measurements as they can be routinely collected with relatively minimally invasive methods. However, proteomics analysis of fluids has been a challenge due to the high dynamic range of its protein content. Advances in data-independent acquisition (DIA) mass spectrometry-based proteomics can address some of the technical challenges in the analysis of biofluids, thus enabling the ability for mass spectrometry to propel large-scale biomarker discovery.
AREAS COVERED: We reviewed principles of DIA and its recent applications in cancer biomarker discovery using biofluids. We summarized DIA proteomics studies using biological fluids in the context of cancer research over the past decade, and provided a comprehensive overview of the benefits and challenges of DIA-MS.
EXPERT OPINION: Various studies showed the potential of DIA-MS in identifying putative cancer biomarkers in a high-throughput manner. However, the lack of proper study design and standardization of methods across platforms still needs to be addressed to fully utilize the benefits of DIA-MS to accelerate the biomarker discovery and verification processes.
Affiliation(s)
- Annie Ha
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Michael Woolman
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Matthew Waas
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Meinusha Govindarajan
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Thomas Kislinger
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
3
Sun C, Zhang W, Zhou M, Myu M, Xu W. Full Window Data-Independent Acquisition Method for Deeper Top-Down Proteomics. Anal Chem 2025; 97:6620-6628. [PMID: 40119838 DOI: 10.1021/acs.analchem.4c06471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/24/2025]
Abstract
Top-down proteomics (TDP) is emerging as a vital tool for the comprehensive characterization of proteoforms. However, as its core technology, top-down mass spectrometry (TDMS) still faces significant analytical challenges. While data-independent acquisition (DIA) has revolutionized bottom-up proteomics and metabolomics, it is rarely employed in TDP. The unique features of protein ions in an electrospray mass spectrum, as well as the data complexity, require the development of new DIA strategies. This study introduces a machine learning-assisted Full Window DIA (FW-DIA) method that eliminates precursor ion isolation, making it compatible with a wide range of commercial mass spectrometers. Moreover, FW-DIA leverages all precursor protein ions to generate high-quality tandem mass spectra, enhancing signal intensities by ∼50-fold and protein sequence coverage by 3-fold in a modular protein analysis. The method was successfully applied to the analysis of a five-protein mixture under native conditions and Escherichia coli ribosomal proteoform characterization.
Affiliation(s)
- Chen Sun
- School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Wenjing Zhang
- School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
- Mowei Zhou
- Department of Chemistry, Zhejiang University, Hangzhou 310058, China
- Martin Myu
- Institute of Food Safety, Chinese Academy of Inspection and Quarantine, Beijing 100176, China
- Wei Xu
- School of Medical Technology, Beijing Institute of Technology, Beijing 100081, China
4
Liu Y, Mei L, Liang C, Zhong CQ, Tong M, Yu R. Cross-Run Hybrid Features Improve the Identification of Data-Independent Acquisition Proteomics. ACS Omega 2024; 9:46362-46372. [PMID: 39583733 PMCID: PMC11579728 DOI: 10.1021/acsomega.4c07398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Revised: 09/25/2024] [Accepted: 10/02/2024] [Indexed: 11/26/2024]
Abstract
The analysis of data-independent acquisition (DIA) mass spectrometry data is crucial for comprehensive proteomics studies. However, traditional single-run methods often fall short in terms of identification depth and consistency. We present HFDiscrim, a specialized multirun DIA analysis tool aimed at enhancing the depth and consistency of reliable peptide identifications of DIA analysis tools. HFDiscrim was extensively benchmarked on multiple data sets, including the MCB data set, the ccRCC data set, and a three-species benchmark mixture. Compared to PyProphet, HFDiscrim identified 22.04% more precursors, 19.1% more peptides, and 13.2% more proteins while maintaining a controllable false discovery rate. Furthermore, HFDiscrim demonstrated higher identification rates and improved reproducibility across multiple runs. HFDiscrim is publicly available at https://github.com/yachliu/HFDiscrim.
Affiliation(s)
- Yachen Liu
- School of Informatics, Xiamen University, Xiamen, Fujian 361000, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Longfei Mei
- School of Informatics, Xiamen University, Xiamen, Fujian 361000, China
- Chenyu Liang
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Chuan-Qi Zhong
- School of Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
- Mengsha Tong
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- School of Life Sciences, Xiamen University, Xiamen, Fujian 361102, China
- Rongshan Yu
- School of Informatics, Xiamen University, Xiamen, Fujian 361000, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Aginome Scientific, Xiamen, Fujian 361005, China
5
He Q, Guo H, Li Y, He G, Li X, Shuai J. SeFilter-DIA: Squeeze-and-Excitation Network for Filtering High-Confidence Peptides of Data-Independent Acquisition Proteomics. Interdiscip Sci 2024; 16:579-592. [PMID: 38472692 DOI: 10.1007/s12539-024-00611-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 01/12/2024] [Accepted: 01/21/2024] [Indexed: 03/14/2024]
Abstract
Mass spectrometry is crucial in proteomics analysis, particularly using Data Independent Acquisition (DIA) for reliable and reproducible mass spectrometry data acquisition, enabling broad mass-to-charge ratio coverage and high throughput. DIA-NN, a prominent deep learning software in DIA proteome analysis, generates peptide results but may include low-confidence peptides. Conventionally, biologists have to manually screen peptide fragment ion chromatogram peaks (XIC) for identifying high-confidence peptides, a time-consuming and subjective process prone to variability. In this study, we introduce SeFilter-DIA, a deep learning algorithm, aiming at automating the identification of high-confidence peptides. Leveraging compressed excitation neural network and residual network models, SeFilter-DIA extracts XIC features and effectively discerns between high and low-confidence peptides. Evaluation of the benchmark datasets demonstrates SeFilter-DIA achieving 99.6% AUC on the test set and 97% for other performance indicators. Furthermore, SeFilter-DIA is applicable for screening peptides with phosphorylation modifications. These results demonstrate the potential of SeFilter-DIA to replace manual screening, providing an efficient and objective approach for high-confidence peptide identification while mitigating associated limitations.
Affiliation(s)
- Qingzu He
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
- Huan Guo
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
- Yulin Li
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China
- Guoqiang He
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China
- Xiang Li
- Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361005, China.
- Jianwei Shuai
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, 325001, China.
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou, 325001, China.
6
Fröhlich K, Fahrner M, Brombacher E, Seredynska A, Maldacker M, Kreutz C, Schmidt A, Schilling O. Data-Independent Acquisition: A Milestone and Prospect in Clinical Mass Spectrometry-Based Proteomics. Mol Cell Proteomics 2024; 23:100800. [PMID: 38880244 PMCID: PMC11380018 DOI: 10.1016/j.mcpro.2024.100800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 06/08/2024] [Accepted: 06/13/2024] [Indexed: 06/18/2024] Open
Abstract
Data-independent acquisition (DIA) has revolutionized the field of mass spectrometry (MS)-based proteomics over the past few years. DIA stands out for its ability to systematically sample all peptides in a given m/z range, allowing an unbiased acquisition of proteomics data. This greatly mitigates the issue of missing values and significantly enhances quantitative accuracy, precision, and reproducibility compared to many traditional methods. This review focuses on the critical role of DIA analysis software tools, primarily focusing on their capabilities and the challenges they address in proteomic research. Advances in MS technology, such as trapped ion mobility spectrometry, or high field asymmetric waveform ion mobility spectrometry require sophisticated analysis software capable of handling the increased data complexity and exploiting the full potential of DIA. We identify and critically evaluate leading software tools in the DIA landscape, discussing their unique features, and the reliability of their quantitative and qualitative outputs. We present the biological and clinical relevance of DIA-MS and discuss crucial publications that paved the way for in-depth proteomic characterization in patient-derived specimens. Furthermore, we provide a perspective on emerging trends in clinical applications and present upcoming challenges including standardization and certification of MS-based acquisition strategies in molecular diagnostics. While we emphasize the need for continuous development of software tools to keep pace with evolving technologies, we advise researchers against uncritically accepting the results from DIA software tools. Each tool may have its own biases, and some may not be as sensitive or reliable as others. Our overarching recommendation for both researchers and clinicians is to employ multiple DIA analysis tools, utilizing orthogonal analysis approaches to enhance the robustness and reliability of their findings.
Affiliation(s)
- Klemens Fröhlich
- Proteomics Core Facility, Biozentrum Basel, University of Basel, Basel, Switzerland
- Matthias Fahrner
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Freiburg, Germany
- Eva Brombacher
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, Freiburg, Germany; Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg, Germany; Spemann Graduate School of Biology and Medicine (SGBM), University of Freiburg, Freiburg, Germany; Faculty of Biology, University of Freiburg, Freiburg, Germany
- Adrianna Seredynska
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Freiburg, Germany; Faculty of Biology, University of Freiburg, Freiburg, Germany
- Maximilian Maldacker
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; Faculty of Biology, University of Freiburg, Freiburg, Germany
- Clemens Kreutz
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg, Freiburg, Germany; Centre for Integrative Biological Signaling Studies (CIBSS), University of Freiburg, Freiburg, Germany
- Alexander Schmidt
- Proteomics Core Facility, Biozentrum Basel, University of Basel, Basel, Switzerland
- Oliver Schilling
- Institute for Surgical Pathology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; German Cancer Consortium (DKTK) and Cancer Research Center (DKFZ), Freiburg, Germany.
7
Wang Q, Ding X, Xu Z, Wang B, Wang A, Wang L, Ding Y, Song S, Chen Y, Zhang S, Jiang L, Ding X. The mouse multi-organ proteome from infancy to adulthood. Nat Commun 2024; 15:5752. [PMID: 38982135 PMCID: PMC11233712 DOI: 10.1038/s41467-024-50183-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 07/03/2024] [Indexed: 07/11/2024] Open
Abstract
The early-life organ development and maturation shape the fundamental blueprint for later-life phenotype. However, a multi-organ proteome atlas from infancy to adulthood is currently not available. Herein, we present a comprehensive proteomic analysis of ten mouse organs (brain, heart, lung, liver, kidney, spleen, stomach, intestine, muscle and skin) at three crucial developmental stages (1-, 4- and 8-weeks after birth) acquired using data-independent acquisition mass spectrometry. We detect and quantify 11,533 protein groups across the ten organs and obtain 115 age-related differentially expressed protein groups that are co-expressed in all organs from infancy to adulthood. We find that spliceosome proteins prevalently play crucial regulatory roles in the early-life development of multiple organs, and detect organ-specific expression patterns and sexual dimorphism. This multi-organ proteome atlas provides a fundamental resource for understanding the molecular mechanisms underlying early-life organ development and maturation.
Affiliation(s)
- Qingwen Wang
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China
- Xinwen Ding
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China
- Zhixiao Xu
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China
- Boqian Wang
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China
- Aiting Wang
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China
- Liping Wang
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Yi Ding
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China
- Sunfengda Song
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China
- Youming Chen
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China
- Shuang Zhang
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China
- Lai Jiang
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Xianting Ding
- Department of Anesthesiology and Surgical Intensive Care Unit, Xinhua Hospital, School of Medicine and School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China.
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, Shanghai Jiao Tong University, Shanghai, China.
8
Sing JC, Charkow J, AlHigaylan M, Horecka I, Xu L, Röst HL. MassDash: A Web-Based Dashboard for Data-Independent Acquisition Mass Spectrometry Visualization. J Proteome Res 2024. [PMID: 38684072 DOI: 10.1021/acs.jproteome.4c00026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2024]
Abstract
With the increased usage and diversity of methods and instruments being applied to analyze Data-Independent Acquisition (DIA) data, visualization is becoming increasingly important to validate automated software results. Here we present MassDash, a cross-platform DIA mass spectrometry visualization and validation software for comparing features and results across popular tools. MassDash provides a web-based interface and Python package for interactive feature visualizations and summary report plots across multiple automated DIA feature detection tools, including OpenSwath, DIA-NN, and dreamDIA. Furthermore, MassDash processes peptides on the fly, enabling interactive visualization of peptides across dozens of runs simultaneously on a personal computer. MassDash supports various multidimensional visualizations across retention time, ion mobility, m/z, and intensity, providing additional insights into the data. The modular framework is easily extendable, enabling rapid algorithm development of novel peak-picker techniques, such as deep-learning-based approaches and refinement of existing tools. MassDash is open-source under a BSD 3-Clause license and freely available at https://github.com/Roestlab/massdash, and a demo version can be accessed at https://massdash.streamlit.app.
Affiliation(s)
- Justin C Sing
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5G 1A8, Canada
- Joshua Charkow
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5G 1A8, Canada
- Mohammed AlHigaylan
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5G 1A8, Canada
- Ira Horecka
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5G 1A8, Canada
- Leon Xu
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5G 1A8, Canada
- Hannes L Röst
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5G 1A8, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario M5G 1A8, Canada
9
Li Y, He Q, Guo H, Shuai SC, Cheng J, Liu L, Shuai J. AttnPep: A Self-Attention-Based Deep Learning Method for Peptide Identification in Shotgun Proteomics. J Proteome Res 2024; 23:834-843. [PMID: 38252705 DOI: 10.1021/acs.jproteome.3c00729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
In shotgun proteomics, the proteome search engine analyzes mass spectra obtained by experiments, and then a peptide-spectrum match (PSM) is reported for each spectrum. However, most of the PSMs identified are incorrect, and therefore various postprocessing tools have been developed for reranking the peptide identifications. Yet these methods suffer from issues such as dependency on distribution, reliance on shallow models, and limited effectiveness. In this work, we propose AttnPep, a deep learning model for rescoring PSM scores that utilizes the Self-Attention module. This module helps the neural network focus on features relevant to the classification of PSMs and ignore irrelevant features. This allows AttnPep to analyze the output of different search engines and improve PSM discrimination accuracy. We considered a PSM to be correct if it achieves a q-value <0.01 and compared AttnPep with existing mainstream software PeptideProphet, Percolator, and proteoTorch. The results indicated that AttnPep found an average increase in correct PSMs of 9.29% relative to the other methods. Additionally, AttnPep was able to better distinguish between correct and incorrect PSMs and found more synthetic peptides in the complex SWATH data set.
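For readers unfamiliar with the q-value <0.01 criterion, the following is a minimal sketch of a common target-decoy q-value estimate. It is not taken from the AttnPep paper: the abstract does not say how q-values were computed, and the function names, the (score, is_decoy) input layout, and the decoys/targets FDR estimate are all assumptions made for illustration.

```python
from typing import List, Tuple

def q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """Return one q-value per PSM, in input order.

    Each PSM is (score, is_decoy); higher scores are assumed to be better.
    The FDR at each rank is estimated as decoys / targets (an assumption).
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of the FDR taken from the worst rank upward.
    qvals = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = running_min
    return qvals

def count_accepted_targets(psms: List[Tuple[float, bool]],
                           threshold: float = 0.01) -> int:
    """Count target (non-decoy) PSMs whose q-value is below the threshold."""
    qvals = q_values(psms)
    return sum(1 for (_, is_decoy), q in zip(psms, qvals)
               if not is_decoy and q < threshold)

# Hypothetical toy data: three target PSMs and one decoy.
example = [(9.0, False), (8.0, False), (7.0, True), (6.0, False)]
accepted = count_accepted_targets(example)
```

Comparing rescoring tools under this convention amounts to counting accepted target PSMs per tool at the same threshold. Tied scores are not handled specially in this sketch.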
Affiliation(s)
- Yulin Li
- Department of Physics, Xiamen University, Xiamen 361005, China
- Qingzu He
- Department of Physics, Xiamen University, Xiamen 361005, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
- Huan Guo
- Department of Physics, Xiamen University, Xiamen 361005, China
- Stella C Shuai
- Biological Science, Northwestern University, Evanston, Illinois 60208, United States
- Jinyan Cheng
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
- Liyu Liu
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
- Jianwei Shuai
- Department of Physics, Xiamen University, Xiamen 361005, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
10
Lou R, Shui W. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024] Open
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.
Affiliation(s)
- Ronghui Lou
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.
- Wenqing Shui
  - iHuman Institute, ShanghaiTech University, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China.

11
He Q, Zhong CQ, Li X, Guo H, Li Y, Gao M, Yu R, Liu X, Zhang F, Guo D, Ye F, Guo T, Shuai J, Han J. Dear-DIA XMBD: Deep Autoencoder Enables Deconvolution of Data-Independent Acquisition Proteomics. Research (Washington, D.C.) 2023; 6:0179. [PMID: 37377457 PMCID: PMC10292580 DOI: 10.34133/research.0179] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/08/2022] [Accepted: 06/01/2023] [Indexed: 06/29/2023]
Abstract
Data-independent acquisition (DIA) technology for protein identification by mass spectrometry, along with its related algorithms, is developing rapidly. Spectrum-centric analysis of DIA data without a spectral library built from data-dependent acquisition data is a promising direction. In this paper, we propose an untargeted analysis method, Dear-DIA XMBD, for direct analysis of DIA data. Dear-DIA XMBD first combines a deep variational autoencoder with triplet loss to learn representations of the extracted fragment ion chromatograms, then uses the k-means clustering algorithm to group fragments with similar representations into the same classes, and finally builds inverted index tables, between precursors and peptides and between fragments and peptides, to determine the precursors of the fragment clusters. We show that Dear-DIA XMBD achieves superior performance on highly complex DIA data from different species acquired on different instrument platforms. Dear-DIA XMBD is publicly available at https://github.com/jianweishuai/Dear-DIA-XMBD.
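The inverted-index step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `bin_width` parameter, the peptide labels, and the fragment m/z values below are all hypothetical and chosen only to show how a fragment-to-peptide lookup table lets a cluster of co-eluting fragments vote for candidate peptides.

```python
from collections import defaultdict


def build_inverted_index(peptide_fragments, bin_width=0.02):
    """Map each binned fragment m/z to the set of peptides that can produce it."""
    index = defaultdict(set)
    for peptide, mz_values in peptide_fragments.items():
        for mz in mz_values:
            index[round(mz / bin_width)].add(peptide)
    return index


def rank_candidates(index, cluster_mz, bin_width=0.02):
    """Count, for each peptide, how many fragments of one cluster hit the index."""
    votes = defaultdict(int)
    for mz in cluster_mz:
        for peptide in index.get(round(mz / bin_width), ()):
            votes[peptide] += 1
    # Most matched fragments first; ties broken alphabetically.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy, made-up fragment lists (assumption: not real spectra).
library = {
    "PEPTIDEA": [175.12, 304.16, 417.24],
    "PEPTIDEB": [175.12, 290.14, 403.22],
}
index = build_inverted_index(library)
ranking = rank_candidates(index, [175.12, 304.16, 417.24])
print(ranking)
```

In this toy case the cluster shares three fragments with the first hypothetical peptide and one with the second, so the first is ranked higher. A real pipeline would also need mass tolerances, charge states, and scoring, none of which are modelled here.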
Affiliation(s)
- Qingzu He
  - Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health) and Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
- Chuan-Qi Zhong
  - School of Life Sciences, Xiamen University, Xiamen 361102, China
  - State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen 361102, China
- Xiang Li
  - Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
  - State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen 361102, China
- Huan Guo
  - Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Yiming Li
  - Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
- Mingxuan Gao
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
- Rongshan Yu
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
- Xianming Liu
  - Bruker (Beijing) Scientific Technology Co. Ltd., Beijing, China
- Fangfei Zhang
  - Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, China
  - Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, China
- Donghui Guo
  - Department of Electronic Engineering, Xiamen University, Xiamen 361005, China
- Fangfu Ye
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health) and Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
- Tiannan Guo
  - Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou 310024, China
  - Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou 310024, China
  - Westlake Omics Ltd., Yunmeng Road 1, Hangzhou, China
- Jianwei Shuai
  - Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen 361005, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health) and Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325001, China
  - State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen 361102, China
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
- Jiahuai Han
  - School of Life Sciences, Xiamen University, Xiamen 361102, China
  - State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen 361102, China
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China

12
Liu D, Liu B, Lin T, Liu G, Yang G, Qi D, Qiu Y, Lu Y, Yuan Q, Shuai SC, Li X, Liu O, Tang X, Shuai J, Cao Y, Lin H. Measuring depression severity based on facial expression and body movement using deep convolutional neural network. Front Psychiatry 2022; 13:1017064. [PMID: 36620657 PMCID: PMC9810804 DOI: 10.3389/fpsyt.2022.1017064] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/22/2022] [Accepted: 12/02/2022] [Indexed: 12/24/2022] Open
Abstract
Introduction: Real-time evaluations of the severity of depressive symptoms are of great significance for the diagnosis and treatment of patients with major depressive disorder (MDD). In clinical practice, evaluation is mainly based on psychological scales and doctor-patient interviews, which are time-consuming and labor-intensive. In addition, the accuracy of the results depends mainly on the subjective judgment of the clinician. With the development of artificial intelligence (AI) technology, more and more machine learning methods are used to diagnose depression from appearance characteristics. Most previous research focused on single-modal data; however, in recent years, many studies have shown that multi-modal data yields better prediction performance than single-modal data. This study aimed to develop a measurement of depression severity from expression and action features and to assess its validity among patients with MDD.
Methods: We proposed a multi-modal deep convolutional neural network (CNN) to evaluate the severity of depressive symptoms in real time, based on the detection of patients' facial expressions and body movements from videos captured by ordinary cameras. We established behavioral depression degree (BDD) metrics, which combine expression entropy and action entropy to measure the depression severity of MDD patients.
Results: We found that the information extracted from different modes, when integrated in appropriate proportions, can significantly improve the accuracy of the evaluation, which has not been reported in previous studies. This method presented an over 74% Pearson similarity between BDD and the self-rating depression scale (SDS), self-rating anxiety scale (SAS), and Hamilton depression scale (HAMD). In addition, we tracked and evaluated the changes of BDD in patients at different stages of a course of treatment, and the results agreed with the evaluation from the scales.
Discussion: The BDD can effectively measure the current state of patients' depression and its changing trend according to the patient's expression and action features. Our model may provide an automatic auxiliary tool for the diagnosis and treatment of MDD.
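To make the two quantities named in this abstract concrete, the sketch below computes a Shannon entropy over per-frame labels and a Pearson correlation between two score lists. The abstract does not give the BDD formula, so the weighted sum in `combined_score`, its `weight` parameter, and the toy labels are assumptions for illustration only and should not be read as the authors' metric.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def combined_score(expression_labels, action_labels, weight=0.5):
    """Hypothetical weighted mix of the two entropies (assumption, not the paper's BDD)."""
    return (weight * shannon_entropy(expression_labels)
            + (1 - weight) * shannon_entropy(action_labels))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy per-frame labels (made up): two equally frequent expressions, one action.
expressions = ["neutral", "smile", "neutral", "smile"]
actions = ["still", "still", "still", "still"]
print(shannon_entropy(expressions), combined_score(expressions, actions))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

How such a score would be scaled against SDS, SAS, or HAMD, and how the modalities are weighted, is specified only in the full paper; the default `weight=0.5` here is arbitrary.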
Affiliation(s)
- Dongdong Liu
  - Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Bowen Liu
  - Department of Psychiatry, National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, Changsha, China
  - Department of Psychiatry, Baoan Mental Health Center, Shenzhen Baoan Center for Chronic Disease Control, Shenzhen, China
- Tao Lin
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Guangya Liu
  - Integrated Chinese and Western Therapy of Depression Ward, Hunan Brain Hospital, Changsha, China
- Guoyu Yang
  - Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Dezhen Qi
  - Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Ye Qiu
  - Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Yuer Lu
  - Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Qinmei Yuan
  - Department of Psychiatry, National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, Changsha, China
- Stella C. Shuai
  - Department of Biological Sciences, Northwestern University, Evanston, IL, United States
- Xiang Li
  - Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
- Ou Liu
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Xiangdong Tang
  - Sleep Medicine Center, Mental Health Center, Department of Respiratory and Critical Care Medicine, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Jianwei Shuai
  - Department of Physics, Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, China
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
  - State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
- Yuping Cao
  - Department of Psychiatry, National Clinical Research Center for Mental Disorders, The Second Xiangya Hospital of Central South University, Changsha, China
- Hai Lin
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China

13
Yang Y, Lin L, Qiao L. Deep learning approaches for data-independent acquisition proteomics. Expert Rev Proteomics 2021; 18:1031-1043. [PMID: 34918987 DOI: 10.1080/14789450.2021.2020654] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 01/30/2023]
Abstract
INTRODUCTION: Data-independent acquisition (DIA) is an emerging technology for large-scale proteomic studies. DIA data analysis methods are evolving rapidly, and deep learning has cut a conspicuous figure in this field.
AREAS COVERED: This review discusses and provides an overview of the deep learning methods that are used for DIA data analysis, including spectral library prediction, feature scoring, and statistical control in peptide-centric analysis, as well as de novo peptide sequencing. Literature searches were performed for articles, including preprints, up to December 2021 from PubMed, Scopus, and Web of Science databases.
EXPERT OPINION: While spectral library prediction has broken through the limitation on proteome coverage of experimental libraries, the statistical burden due to the large query space is the remaining challenge of utilizing proteome-wide predicted libraries. Analysis of post-translational modifications is another promising direction of deep learning-based DIA methods.
Affiliation(s)
- Yi Yang
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
- Ling Lin
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China
- Liang Qiao
  - Department of Chemistry, Shanghai Stomatological Hospital, and Minhang Hospital, Fudan University, Shanghai, China