1. iSanXoT: A standalone application for the integrative analysis of mass spectrometry-based quantitative proteomics data. Comput Struct Biotechnol J 2024; 23:452-459. [PMID: 38235360 PMCID: PMC10792623 DOI: 10.1016/j.csbj.2023.12.034] [Received: 10/11/2023] [Revised: 12/22/2023] [Accepted: 12/22/2023] [Indexed: 01/19/2024]
Abstract
Many bioinformatics tools are available for the quantitative analysis of proteomics experiments. Most of these tools use a dedicated statistical model to derive absolute quantitative protein values from mass spectrometry (MS) data. Here, we present iSanXoT, a standalone application that processes relative abundances between MS signals and then integrates them sequentially into higher levels using the previously published Generic Integration Algorithm (GIA). iSanXoT offers unique capabilities that complement conventional quantitative software applications, including statistical weighting and independent modeling of error distributions in each integration, aggregation of technical or biological replicates, quantification of posttranslational modifications, and analysis of coordinated protein behavior. iSanXoT is a user-friendly application that accepts output from popular proteomics pipelines and enables unrestricted creation of quantification workflows and fully customizable reports that can be reused across projects or shared among users. Numerous publications attest to the successful application of diverse integrative workflows constructed using the GIA for the analysis of high-throughput quantitative proteomics experiments. iSanXoT has been tested on the major operating systems. Download links for the corresponding distributions are available at https://github.com/CNIC-Proteomics/iSanXoT/releases.
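The sketch below is a rough illustration of weighted integration from a lower level (e.g., peptides) to a higher level (e.g., proteins). It computes a weighted mean of log2 ratios in which each raw weight w is shrunk by an extra variance term sigma2, as 1/(1/w + sigma2). The function name `integrate` and this weight formula are assumptions made for illustration. This is not the iSanXoT or GIA implementation, and it does not estimate sigma2.

```python
from typing import List, Tuple


def integrate(values: List[float], weights: List[float], sigma2: float) -> Tuple[float, float]:
    """Hypothetical weighted integration of lower-level log2 ratios.

    Each raw weight w is combined with an extra variance term sigma2
    (assumed to be estimated elsewhere) as 1 / (1/w + sigma2), so that a
    single very high-weight element cannot dominate once sigma2 > 0.
    Returns the integrated value and the summed adjusted weight.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    if any(w <= 0 for w in weights) or sigma2 < 0:
        raise ValueError("weights must be positive and sigma2 non-negative")
    adjusted = [1.0 / (1.0 / w + sigma2) for w in weights]
    total = sum(adjusted)
    mean = sum(a * x for a, x in zip(adjusted, values)) / total
    return mean, total


# Two peptides: one noisy (weight 1), one very precise (weight 1e9).
print(integrate([0.0, 1.0], [1.0, 1e9], sigma2=0.0))  # dominated by the precise peptide
print(integrate([0.0, 1.0], [1.0, 1e9], sigma2=1.0))  # pulled toward 0 by the variance term
```

The design choice illustrated here is that a positive sigma2 caps every adjusted weight at 1/sigma2. This keeps one extremely precise measurement from overriding all others in the upper-level estimate.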

2. MassSpecPreppy: An end-to-end solution for automated protein concentration determination and flexible sample digestion for proteomics applications. Proteomics 2024; 24:e2300294. [PMID: 37772677 DOI: 10.1002/pmic.202300294] [Received: 07/31/2023] [Revised: 09/13/2023] [Accepted: 09/14/2023] [Indexed: 09/30/2023]
Abstract
In proteomics, fast, efficient, and highly reproducible sample preparation is of utmost importance, particularly in view of fast-scanning mass spectrometers that enable analyses of large sample series. To address this need, we have developed the web application MassSpecPreppy, which operates on the open-science OT-2 liquid handling robot from Opentrons. This platform can prepare up to 96 samples at once. It performs tasks such as BCA protein concentration determination, sample digestion with normalization, reduction/alkylation, and peptide elution into vials or loading of specified peptide amounts onto Evotips, in an automated and flexible manner. The performance of the workflows developed with MassSpecPreppy was compared with standard manual sample preparation workflows. The BCA assay experiments revealed an average recovery of 101.3% (SD: ± 7.82%) for the MassSpecPreppy workflow, while the manual workflow had a recovery of 96.3% (SD: ± 9.73%). In the species mix used in the evaluation experiments, 94.5% of protein groups for OT-2 digestion and 95% for manual digestion passed the significance thresholds, with comparable peptide-level coefficients of variation. These results demonstrate that MassSpecPreppy is a versatile and scalable platform for automated sample preparation, producing injection-ready samples for proteomics research.
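The recovery figures above summarize per-sample recoveries. Recovery is the measured concentration expressed as a percentage of the expected one. The Python sketch below shows that arithmetic using made-up concentrations, not data from the study. It assumes the reported SD is the sample standard deviation.

```python
import statistics
from typing import Sequence, Tuple


def recovery_stats(measured: Sequence[float], expected: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample SD) of percent recovery = 100 * measured / expected."""
    if len(measured) != len(expected) or len(measured) < 2:
        raise ValueError("need at least two paired measurements")
    recoveries = [100.0 * m / e for m, e in zip(measured, expected)]
    return statistics.mean(recoveries), statistics.stdev(recoveries)


# Hypothetical BCA readings (mg/mL) for three standards of known concentration.
mean_rec, sd_rec = recovery_stats([0.98, 1.05, 1.01], [1.0, 1.0, 1.0])
print(f"recovery {mean_rec:.1f}% (SD: ± {sd_rec:.2f}%)")
```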

3. Tracking the effects of PLGA-based nanoparticles on protein expression in living cells through quantitative proteomics. J Mater Chem B 2024; 12:4262-4269. [PMID: 38602378 DOI: 10.1039/d3tb01881d] [Indexed: 04/12/2024]
Abstract
Mass spectrometry (MS)-based proteomics can identify and quantify the differential abundance of expressed proteins in parallel, and bottom-up proteomic approaches are even approaching comprehensive coverage of the complex eukaryotic proteome. Protein-nanoparticle (NP) interactions have been extensively studied owing to their importance in biological applications and nanotoxicology. However, the proteome-level effects of NPs on cells have received little attention, although changes in protein abundance can reflect the direct effects of nanocarriers on protein expression. Herein, we investigated the effect of PLGA-based NPs on protein expression in HepG2 cells using a label-free quantitative proteomics approach with data-independent acquisition (DIA). Less than 10.15% of the proteins in cells treated with PLGA-based NPs showed a two-fold change in expression during a 6-hour observation period. Among the changed proteins, we found that dynamic proteins involved in cell division, localization, and transport are more likely to be susceptible to PLGA-based NPs.
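A "two-fold change" criterion is commonly applied as |log2(treated/control)| >= 1. The Python sketch below counts the percentage of proteins crossing that threshold. It assumes positive, already-normalized intensities for each protein. It is not the study's actual analysis pipeline, and the values are invented.

```python
import math
from typing import Sequence


def percent_changed(control: Sequence[float], treated: Sequence[float], fold: float = 2.0) -> float:
    """Percentage of proteins whose |log2(treated / control)| >= log2(fold)."""
    if not control or len(control) != len(treated):
        raise ValueError("inputs must be non-empty and of equal length")
    cutoff = math.log2(fold)
    changed = sum(
        1 for c, t in zip(control, treated) if abs(math.log2(t / c)) >= cutoff
    )
    return 100.0 * changed / len(control)


# Four hypothetical proteins: two change by more than two-fold (one up, one down).
print(percent_changed([100, 100, 100, 100], [250, 40, 110, 90]))
```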

4. Screening of novel biomarkers for acute kidney transplant rejection using DIA-MS based proteomics. Proteomics Clin Appl 2024; 18:e2300047. [PMID: 38215274 DOI: 10.1002/prca.202300047] [Received: 04/27/2023] [Revised: 11/03/2023] [Accepted: 11/22/2023] [Indexed: 01/14/2024]
Abstract
BACKGROUND Kidney transplantation is the preferred treatment for patients with end-stage renal disease. However, acute rejection poses a threat to long-term graft survival. The aim of this study was to identify novel biomarkers to detect acute kidney transplant rejection. METHODS The serum proteomic profiles of kidney transplant patients with T cell-mediated acute rejection (TCMR) and stable allograft function (STA) were analyzed using data-independent acquisition mass spectrometry (DIA-MS). The differentially expressed proteins (DEPs) of interest were further verified by enzyme-linked immunosorbent assay (ELISA). RESULTS A total of 131 DEPs were identified between STA and TCMR patients, and 114 DEPs between mild and severe TCMR patients. Verification showed markedly higher concentrations of serum amyloid A protein 1 (SAA1) and insulin-like growth factor binding protein 2 (IGFBP2), and a lower fetuin-A (AHSG) concentration, in TCMR patients compared with STA patients. We also found a higher SAA1 concentration in the severe TCMR group than in the mild TCMR group. Receiver operating characteristic (ROC) analysis further confirmed that the combination of SAA1, AHSG, and IGFBP2 performed excellently in diagnosing acute rejection. CONCLUSIONS Our data demonstrate that serum SAA1, AHSG, and IGFBP2 could be effective biomarkers for diagnosing acute rejection after kidney transplantation. DIA-MS has great potential for biomarker screening in kidney transplantation.
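ROC analysis summarizes how well a marker separates cases (here TCMR) from controls (here STA). The area under the curve (AUC) equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The Python sketch below implements that rank-based definition using invented values. For a marker that is lower in cases, as AHSG is here, the values are negated first. Combining several markers into one score would additionally need a model such as logistic regression, which this sketch does not include.

```python
from typing import Sequence


def roc_auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability P(case > control) + 0.5 * P(tie)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented serum values for a marker that rises in rejection ...
print(roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))
# ... and for one that falls in rejection, negated so that higher means "more likely a case".
cases_low, controls_high = [0.4, 0.5], [0.7, 0.9]
print(roc_auc([-v for v in cases_low], [-v for v in controls_high]))
```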

5. MassDash: A Web-Based Dashboard for Data-Independent Acquisition Mass Spectrometry Visualization. J Proteome Res 2024. [PMID: 38684072 DOI: 10.1021/acs.jproteome.4c00026] [Indexed: 05/02/2024]
Abstract
With the increased usage and diversity of methods and instruments being applied to analyze Data-Independent Acquisition (DIA) data, visualization is becoming increasingly important to validate automated software results. Here we present MassDash, a cross-platform DIA mass spectrometry visualization and validation software for comparing features and results across popular tools. MassDash provides a web-based interface and Python package for interactive feature visualizations and summary report plots across multiple automated DIA feature detection tools, including OpenSwath, DIA-NN, and dreamDIA. Furthermore, MassDash processes peptides on the fly, enabling interactive visualization of peptides across dozens of runs simultaneously on a personal computer. MassDash supports various multidimensional visualizations across retention time, ion mobility, m/z, and intensity, providing additional insights into the data. The modular framework is easily extendable, enabling rapid algorithm development of novel peak-picker techniques, such as deep-learning-based approaches and refinement of existing tools. MassDash is open-source under a BSD 3-Clause license and freely available at https://github.com/Roestlab/massdash, and a demo version can be accessed at https://massdash.streamlit.app.

6. TopDIA: A Software Tool for Top-Down Data-Independent Acquisition Proteomics. bioRxiv: The Preprint Server for Biology 2024:2024.04.05.588302. [PMID: 38645171 PMCID: PMC11030422 DOI: 10.1101/2024.04.05.588302] [Indexed: 04/23/2024]
Abstract
Top-down mass spectrometry is widely used for proteoform identification, characterization, and quantification owing to its ability to analyze intact proteoforms. In the last decade, top-down proteomics has been dominated by top-down data-dependent acquisition mass spectrometry (TD-DDA-MS), and top-down data-independent acquisition mass spectrometry (TD-DIA-MS) has not been well studied. While TD-DIA-MS produces complex multiplexed tandem mass spectrometry (MS/MS) spectra, which are challenging to confidently identify, it selects more precursor ions for MS/MS analysis and has the potential to increase proteoform identifications compared with TD-DDA-MS. Here we present TopDIA, the first software tool for proteoform identification by TD-DIA-MS. It generates demultiplexed pseudo MS/MS spectra from TD-DIA-MS data and then searches the pseudo MS/MS spectra against a protein sequence database for proteoform identification. We compared the performance of TD-DDA-MS and TD-DIA-MS using Escherichia coli K-12 MG1655 cells and demonstrated that TD-DIA-MS with TopDIA increased proteoform and protein identifications compared with TD-DDA-MS.

7. SeFilter-DIA: Squeeze-and-Excitation Network for Filtering High-Confidence Peptides of Data-Independent Acquisition Proteomics. Interdiscip Sci 2024:10.1007/s12539-024-00611-4. [PMID: 38472692 DOI: 10.1007/s12539-024-00611-4] [Received: 07/17/2023] [Revised: 01/12/2024] [Accepted: 01/21/2024] [Indexed: 03/14/2024]
Abstract
Mass spectrometry is crucial in proteomics analysis, particularly Data-Independent Acquisition (DIA), which provides reliable and reproducible data acquisition with broad mass-to-charge ratio coverage and high throughput. DIA-NN, a prominent deep-learning-based software tool for DIA proteome analysis, generates peptide results but may include low-confidence peptides. Conventionally, biologists have had to manually screen peptide fragment ion chromatogram (XIC) peaks to identify high-confidence peptides, a time-consuming and subjective process prone to variability. In this study, we introduce SeFilter-DIA, a deep learning algorithm aimed at automating the identification of high-confidence peptides. Leveraging squeeze-and-excitation neural network and residual network models, SeFilter-DIA extracts XIC features and effectively discriminates between high- and low-confidence peptides. Evaluation on the benchmark datasets shows that SeFilter-DIA achieves an AUC of 99.6% on the test set and 97% on other performance indicators. Furthermore, SeFilter-DIA is applicable to screening peptides with phosphorylation modifications. These results demonstrate the potential of SeFilter-DIA to replace manual screening, providing an efficient and objective approach to high-confidence peptide identification while mitigating the associated limitations.

8. VPBrowse: Genome-based representation of MS/MS spectra to quantify 10,000 bovine proteins. Proteomics 2024:e2300431. [PMID: 38468111 DOI: 10.1002/pmic.202300431] [Received: 10/21/2023] [Revised: 02/11/2024] [Accepted: 02/26/2024] [Indexed: 03/13/2024]
Abstract
SWATH is a data acquisition strategy acclaimed for generating quantitatively accurate and consistent measurements of proteins across multiple samples. Its utility for proteomics studies in nonlaboratory animals, however, is currently compromised by two gaps. There is a lack of sufficiently comprehensive and reliable public libraries, either experimental or predicted, and of relevant platforms that support their sharing and utilization in an intuitive manner. Here we describe the development of the Veterinary Proteome Browser, VPBrowse (http://browser.proteo.cloud/). It is an online platform for genome-based representation of the Bos taurus proteome, equipped with an interactive database and tools for searching, visualization, and building quantitative mass spectrometry assays. In its current version (VPBrowse 1.0), it contains high-quality fragmentation spectra acquired on a QToF instrument for over 36,000 proteotypic peptides, providing experimental evidence for over 10,000 proteins. Data can be downloaded in different formats to enable analysis using popular software packages for SWATH data processing, while normalization to the iRT scale ensures compatibility with diverse chromatography systems. The resource was applied to a published blood plasma dataset from a biomarker discovery study. There, it supported label-free quantification of additional proteins not previously reported by the authors, including PSMA4, a tissue leakage protein and a promising candidate biomarker of an animal's response to dehorning-related injury.
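Normalization to an iRT scale typically means fitting a line between the known iRT values of reference peptides and their observed retention times in a run. Other retention times are then mapped through that line. The Python sketch below does this with ordinary least squares, using hypothetical reference values. The abstract does not describe VPBrowse's own procedure, which may differ (for example, by using non-linear fits).

```python
from typing import Sequence, Tuple


def fit_irt(irt: Sequence[float], rt: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of observed RT = intercept + slope * iRT; returns (slope, intercept)."""
    n = len(irt)
    if n < 2 or n != len(rt):
        raise ValueError("need at least two paired reference peptides")
    mean_x = sum(irt) / n
    mean_y = sum(rt) / n
    sxx = sum((x - mean_x) ** 2 for x in irt)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(irt, rt))
    if sxx == 0:
        raise ValueError("reference iRT values must not all be equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rt_to_irt(rt: float, slope: float, intercept: float) -> float:
    """Map an observed retention time (minutes) back onto the iRT scale."""
    return (rt - intercept) / slope


# Hypothetical reference peptides: iRT values and their observed RTs in one run.
slope, intercept = fit_irt([0.0, 25.0, 50.0, 100.0], [10.0, 20.0, 30.0, 50.0])
print(slope, intercept, rt_to_irt(40.0, slope, intercept))
```

Because every run gets its own fitted line, library coordinates stored in iRT units can be transferred to a different gradient or column by inverting that run's fit.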

9. Substantial downregulation of mitochondrial and peroxisomal proteins during acute kidney injury revealed by data-independent acquisition proteomics. Proteomics 2024; 24:e2300162. [PMID: 37775337 DOI: 10.1002/pmic.202300162] [Received: 03/26/2023] [Revised: 08/17/2023] [Accepted: 08/22/2023] [Indexed: 10/01/2023]
Abstract
Acute kidney injury (AKI) is a major health concern, particularly for the elderly. Understanding AKI-related proteome changes is critical for prevention and for developing novel therapeutics that recover kidney function and mitigate susceptibility to recurrent AKI or the development of chronic kidney disease. In this study, mouse kidneys were subjected to ischemia-reperfusion injury, and the contralateral kidneys remained uninjured to enable comparison and assess injury-induced changes in the kidney proteome. A ZenoTOF 7600 mass spectrometer was optimized for data-independent acquisition (DIA) to achieve comprehensive protein identification and quantification. Short microflow gradients and the generation of a deep kidney-specific spectral library allowed for high-throughput, comprehensive protein quantification. Upon AKI, the kidney proteome was completely remodeled, and over half of the 3945 quantified protein groups changed significantly. Downregulated proteins in the injured kidney were involved in energy production. They included numerous peroxisomal matrix proteins that function in fatty acid oxidation, such as ACOX1, CAT, EHHADH, ACOT4, ACOT8, and Scp2. Injured kidneys exhibited severely damaged tissues and injury markers. The comprehensive and sensitive kidney-specific DIA-MS assays feature high-throughput analytical capabilities to achieve deep coverage of the kidney proteome, and will serve as useful tools for developing novel therapeutics to remediate kidney function.

10. Multispecies Benchmark Analysis for LC-MS/MS Validation and Performance Evaluation in Bottom-Up Proteomics. J Proteome Res 2024; 23:684-691. [PMID: 38243904 PMCID: PMC10845134 DOI: 10.1021/acs.jproteome.3c00531] [Received: 08/23/2023] [Revised: 12/04/2023] [Accepted: 01/04/2024] [Indexed: 01/22/2024]
Abstract
We present an instrument-independent benchmark procedure and software (LFQ_bout) for the validation and comparative evaluation of the performance of LC-MS/MS and data processing workflows in bottom-up proteomics. The procedure enables a back-to-back comparison of common and emerging workflows, e.g., diaPASEF or ScanningSWATH, and evaluates the impact of arbitrary and inadequately documented settings or black-box data processing algorithms. It enhances the overall performance and quantification accuracy by recognizing and reporting common quantification errors.

11. Acquisition and Analysis of DIA-Based Proteomic Data: A Comprehensive Survey in 2023. Mol Cell Proteomics 2024; 23:100712. [PMID: 38182042 PMCID: PMC10847697 DOI: 10.1016/j.mcpro.2024.100712] [Received: 10/31/2023] [Revised: 12/27/2023] [Accepted: 01/02/2024] [Indexed: 01/07/2024]
Abstract
Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a powerful technology for high-throughput, accurate, and reproducible quantitative proteomics. This review provides a comprehensive overview of recent advances in both the experimental and computational methods for DIA proteomics, from data acquisition schemes to analysis strategies and software tools. DIA acquisition schemes are categorized based on the design of precursor isolation windows, highlighting wide-window, overlapping-window, narrow-window, scanning quadrupole-based, and parallel accumulation-serial fragmentation-enhanced DIA methods. For DIA data analysis, major strategies are classified into spectrum reconstruction, sequence-based search, library-based search, de novo sequencing, and sequencing-independent approaches. A wide array of software tools implementing these strategies are reviewed, with details on their overall workflows and scoring approaches at different steps. The generation and optimization of spectral libraries, which are critical resources for DIA analysis, are also discussed. Publicly available benchmark datasets covering global proteomics and phosphoproteomics are summarized to facilitate performance evaluation of various software tools and analysis workflows. Continued advances and synergistic developments of versatile components in DIA workflows are expected to further enhance the power of DIA-based proteomics.

12. On the excessive use of coefficient of variation as a metric of quantitation quality in proteomics. Proteomics 2024; 24:e2300090. [PMID: 37496303 DOI: 10.1002/pmic.202300090] [Received: 02/15/2023] [Revised: 05/05/2023] [Accepted: 07/18/2023] [Indexed: 07/28/2023]
Abstract
The coefficient of variation (CV) is often used in proteomics as a proxy to characterize the performance of a quantitation method and/or the related software. In this note, we question the excessive reliance on this metric in quantitative proteomics, which may result in erroneous conclusions. We support this note using a ground-truth Human-Yeast-E. coli dataset, demonstrating in a number of cases that erroneous data processing methods may lead to a low CV that has nothing to do with these methods' quantitation performance.
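The coefficient of variation is the standard deviation divided by the mean, usually reported in percent. The contrived Python example below is not taken from the paper. It shows how a hypothetical processing step that flattens every value to the same number yields a perfect CV of 0%, while destroying a true two-fold difference between conditions.

```python
import math
import statistics
from typing import Sequence


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def log2_ratio(a: Sequence[float], b: Sequence[float]) -> float:
    """log2 of the ratio of condition means."""
    return math.log2(statistics.mean(a) / statistics.mean(b))


# Replicates of one protein spiked at a true 2:1 ratio between conditions A and B.
a, b = [200.0, 210.0, 190.0], [100.0, 95.0, 105.0]
print(cv_percent(a), cv_percent(b), log2_ratio(a, b))  # 5% CVs, log2 ratio 1 (correct)

# A hypothetical faulty step that replaces every value with the grand mean (150).
flat = [150.0] * 3
print(cv_percent(flat), log2_ratio(flat, flat))  # CV 0%, log2 ratio 0 (wrong)
```

The point is that CV only measures spread around each group's own mean. Accuracy has to be checked separately against a known ground truth, such as the expected log2 ratio in a spike-in design.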

13. Fast proteomics with dia-PASEF and analytical flow-rate chromatography. Proteomics 2024; 24:e2300100. [PMID: 37287406 DOI: 10.1002/pmic.202300100] [Received: 02/17/2023] [Revised: 05/22/2023] [Accepted: 05/22/2023] [Indexed: 06/09/2023]
Abstract
Increased throughput in proteomic experiments can improve accessibility of proteomic platforms, reduce costs, and facilitate new approaches in systems biology and biomedical research. Here we propose combining analytical flow rate chromatography with ion mobility separation of peptide ions, data-independent acquisition, and data analysis with the DIA-NN software suite. This combination achieves high-quality proteomic experiments from limited sample amounts, at a throughput of up to 400 samples per day. For instance, when benchmarking our workflow using a 500-μL/min flow rate and 3-min chromatographic gradients, we report the quantification of 5211 proteins from 2 μg of a mammalian cell-line standard at high quantitative accuracy and precision. We further used this platform to analyze blood plasma samples from a cohort of COVID-19 inpatients, using a 3-min chromatographic gradient and alternating column regeneration on a dual pump system. The method delivered a comprehensive view of the COVID-19 plasma proteome, allowing classification of the patients according to disease severity and revealing plasma biomarker candidates.

14. Baldur: Bayesian Hierarchical Modeling for Label-Free Proteomics with Gamma Regressing Mean-Variance Trends. Mol Cell Proteomics 2023; 22:100658. [PMID: 37806340 PMCID: PMC10687340 DOI: 10.1016/j.mcpro.2023.100658] [Received: 06/01/2023] [Revised: 09/20/2023] [Accepted: 10/04/2023] [Indexed: 10/10/2023]
Abstract
Label-free proteomics is a fast-growing methodology to infer abundances in mass spectrometry proteomics. Extensive research has focused on spectral quantification and peptide identification. However, research toward modeling and understanding quantitative proteomics data is scarce. Here we propose a Bayesian hierarchical decision model (Baldur) to test for differences in means between conditions for proteins, peptides, and post-translational modifications. We developed a Bayesian regression model to characterize local mean-variance trends in data, to estimate measurement uncertainty and hyperparameters for the decision model. A key contribution is the development of a new gamma regression model that describes the mean-variance dependency as a mixture of a common and a latent trend, allowing for localized trend estimates. We then evaluate the performance of Baldur, limma-trend, and t test on six benchmark datasets: five total proteomics and one post-translational modification dataset. We find that Baldur drastically improves the decision in noisier post-translational modification data over limma-trend and t test. In addition, we see significant improvements using Baldur over the other methods in the total proteomics datasets. Finally, we analyzed Baldur's performance when increasing the number of replicates and found that the method always increases precision with sample size, while showing robust control of the false positive rate. We conclude that our model vastly improves over popular data analysis methods (limma-trend and t test) in several spike-in datasets by achieving a high true positive detection rate, while greatly reducing the false-positive rate.

15. Guidelines for mouse and human DC functional assays. Eur J Immunol 2023; 53:e2249925. [PMID: 36563126 DOI: 10.1002/eji.202249925] [Received: 03/28/2022] [Revised: 10/25/2022] [Accepted: 10/26/2022] [Indexed: 12/24/2022]
Abstract
This article is part of the Dendritic Cell Guidelines article series, which provides a collection of state-of-the-art protocols for the preparation, phenotype analysis by flow cytometry, generation, fluorescence microscopy, and functional characterization of mouse and human dendritic cells (DC) from lymphoid organs and various non-lymphoid tissues. Recent studies have provided evidence for an increasing number of phenotypically distinct conventional DC (cDC) subsets that on one hand exhibit a certain functional plasticity, but on the other hand are characterized by their tissue- and context-dependent functional specialization. Here, we describe a selection of assays for the functional characterization of mouse and human cDC. The first two protocols illustrate analysis of cDC endocytosis and metabolism, followed by guidelines for transcriptomic and proteomic characterization of cDC populations. Then, a larger group of assays describes the characterization of cDC migration in vitro, ex vivo, and in vivo. The final guidelines measure cDC inflammasome and antigen (cross)-presentation activity. While all protocols were written by experienced scientists who routinely use them in their work, this article was also peer-reviewed by leading experts and approved by all co-authors, making it an essential resource for basic and clinical DC immunologists.

16. Data-Driven Tool for Cross-Run Ion Selection and Peak-Picking in Quantitative Proteomics with Data-Independent Acquisition LC-MS/MS. Anal Chem 2023; 95:16558-16566. [PMID: 37906674 DOI: 10.1021/acs.analchem.3c02689] [Indexed: 11/02/2023]
Abstract
Proteomics provides molecular bases of biology and disease, and liquid chromatography-tandem mass spectrometry (LC-MS/MS) is a platform widely used for bottom-up proteomics. Data-independent acquisition (DIA) improves the run-to-run reproducibility of LC-MS/MS in proteomics research. However, the existing DIA data processing tools sometimes produce large deviations from true values for the peptides and proteins in quantification. Peak-picking error and incorrect ion selection are the two main causes of the deviations. We present a cross-run ion selection and peak-picking (CRISP) tool that utilizes the important advantage of run-to-run consistency of DIA and simultaneously examines the DIA data from the whole set of runs to filter out the interfering signals, instead of only looking at a single run at a time. Eight datasets acquired by mass spectrometers from different vendors with different types of mass analyzers were used to benchmark our CRISP-DIA against other currently available DIA tools. In the benchmark datasets, for analytes with large content variation among samples, CRISP-DIA generally resulted in 20 to 50% relative decrease in error rates compared to other DIA tools, at both the peptide precursor level and the protein level. CRISP-DIA detected differentially expressed proteins more efficiently, with 3.3 to 90.3% increases in the numbers of true positives and 12.3 to 35.3% decreases in the false positive rates, in some cases. In the real biological datasets, CRISP-DIA showed better consistencies of the quantification results. The advantages of assimilating DIA data in multiple runs for quantitative proteomics were demonstrated, which can significantly improve the quantification accuracy.

17. A review of the current state of single-cell proteomics and future perspective. Anal Bioanal Chem 2023; 415:6889-6899. [PMID: 37285026 PMCID: PMC10632274 DOI: 10.1007/s00216-023-04759-8] [Received: 04/06/2023] [Revised: 05/12/2023] [Accepted: 05/16/2023] [Indexed: 06/08/2023]
Abstract
Single-cell methodologies and technologies have started a revolution in biology which until recently has primarily been limited to deep sequencing and imaging modalities. With the advent and subsequent torrid development of single-cell proteomics over the last 5 years, despite the fact that proteins cannot be amplified like transcripts, it has now become abundantly clear that it is a worthy complement to single-cell transcriptomics. In this review, we engage in an assessment of the current state of the art of single-cell proteomics including workflow, sample preparation techniques, instrumentation, and biological applications. We investigate the challenges associated with working with very small sample volumes and the acute need for robust statistical methods for data interpretation. We delve into what we believe is a promising future for biological research at single-cell resolution and highlight some of the exciting discoveries that already have been made using single-cell proteomics, including the identification of rare cell types, characterization of cellular heterogeneity, and investigation of signaling pathways and disease mechanisms. Finally, we acknowledge that there are a number of outstanding and pressing problems that the scientific community vested in advancing this technology needs to resolve. Of prime importance is the need to set standards so that this technology becomes widely accessible allowing novel discoveries to be easily verifiable. We conclude with a plea to solve these problems rapidly so that single-cell proteomics can be part of a robust, high-throughput, and scalable single-cell multi-omics platform that can be ubiquitously applied to elucidating deep biological insights into the diagnosis and treatment of all diseases that afflict us.

18. Achieving quantitative reproducibility in label-free multisite DIA experiments through multirun alignment. Commun Biol 2023; 6:1101. [PMID: 37903988 PMCID: PMC10616189 DOI: 10.1038/s42003-023-05437-2] [Received: 02/28/2023] [Accepted: 10/10/2023] [Indexed: 11/01/2023]
Abstract
DIA is a mainstream method for quantitative proteomics, but consistent quantification across multiple LC-MS/MS instruments remains a bottleneck in parallelizing data acquisition. One reason for this inconsistency and for missing quantification is retention time shift, which current software does not adequately address for runs from multiple sites. We present multirun chromatogram alignment strategies to map peaks across columns. These include the traditional reference-based Star method and two novel approaches: MST and Progressive alignment. These reference-free strategies produce a quantitatively accurate data matrix, even from heterogeneous multi-column studies. Progressive alignment also generates merged chromatograms from all runs, which has not previously been achieved for LC-MS/MS data. First, we demonstrate the effectiveness of multirun alignment strategies on a gold-standard annotated dataset, resulting in a threefold reduction in quantitation error rate compared to non-aligned DIA results. Subsequently, on a multi-species dataset, we show that DIAlignR effectively controls the quantitative error rate, improves precision in protein measurements, and exhibits conservative peak alignment. We next show that MST alignment reduces cross-site CV by 50% for highly abundant proteins when applied to a dataset from 11 different LC-MS/MS setups. Finally, we reanalyzed 949 plasma runs with multirun alignment and compared the result with a prior analysis without it. The reanalysis revealed a more than 50% increase in insulin resistance (IR) and respiratory viral infection (RVI) proteins, identifying 11 and 13 proteins, respectively. The three strategies are implemented in our DIAlignR workflow (version >2.3) and can be combined with linear, non-linear, or hybrid pairwise alignment.
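One way to picture an MST-based strategy is to build a minimum spanning tree over runs from a pairwise distance matrix. Runs are then aligned pairwise along the tree's edges, so no single reference run is needed. The Python sketch below only builds the tree, using Prim's algorithm. The distance values are invented. The abstract does not specify how DIAlignR actually defines run-to-run distances or traverses the tree, so treat this as an assumption-laden illustration.

```python
import heapq
from typing import List, Sequence, Tuple


def minimum_spanning_tree(dist: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on a symmetric run-by-run distance matrix.

    Returns (parent_run, child_run, distance) edges in the order runs join the tree.
    """
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    edges: List[Tuple[int, int, float]] = []
    heap: List[Tuple[float, int, int]] = [(0.0, 0, -1)]  # start at run 0; -1 = no parent
    while heap:
        weight, node, parent = heapq.heappop(heap)
        if in_tree[node]:
            continue  # stale entry: run already connected more cheaply
        in_tree[node] = True
        if parent >= 0:
            edges.append((parent, node, weight))
        for nxt in range(n):
            if nxt != node and not in_tree[nxt]:
                heapq.heappush(heap, (dist[node][nxt], nxt, node))
    return edges


# Invented distances between four runs (smaller = more similar chromatography).
dist = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]
for parent, child, w in minimum_spanning_tree(dist):
    print(f"align run {child} to run {parent} (distance {w})")
```

Each printed edge stands for one pairwise alignment between similar runs. This is the intuition behind avoiding a single global reference when runs come from many different LC-MS/MS setups.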
19
Cannflavins A and B with Anti-Ferroptosis, Anti-Glycation, and Antioxidant Activities Protect Human Keratinocytes in a Cell Death Model with Erastin and Reactive Carbonyl Species. Nutrients 2023; 15:4565. [PMID: 37960218 PMCID: PMC10650133 DOI: 10.3390/nu15214565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 10/12/2023] [Accepted: 10/19/2023] [Indexed: 11/15/2023] Open
Abstract
Precursors of advanced glycation end products, namely reactive carbonyl species (RCSs), are aging biomarkers that contribute to cell death. However, the impact of RCSs on ferroptosis, an iron-dependent form of cell death, in skin cells remains unknown. Herein, we constructed a cellular model (human keratinocyte HaCaT cells) to evaluate the cytotoxicity of combinations of RCSs (including glyoxal, GO, and methylglyoxal, MGO) and erastin (a ferroptosis inducer) using bioassays (measuring cellular lipid peroxidation and iron content) and proteomics with sequential window acquisition of all theoretical mass spectra. Additionally, a data-independent acquisition approach was used to characterize the molecular network of RCSs and erastin, including genes, canonical pathways, and upstream regulators. Using this model, we evaluated the cytoprotective effects of two dietary flavonoids, cannflavins A and B, against RCS- and erastin-induced cytotoxicity in HaCaT cells. Cannflavins A and B (at 0.625 to 20 µM) inhibited ferroptosis by restoring cell viability (by 56.6-78.6% and 63.8-81.1%, respectively) and suppressing cellular lipid peroxidation (by 42.3-70.2% and 28.8-63.6%, respectively). They also alleviated GO + erastin- or MGO + erastin-induced cytotoxicity by 62.2-67.6% and 56.1-69.3%, and 35.6-54.5% and 33.8-62.0%, respectively. Mechanistic studies indicated that the cytoprotective effects of cannflavins A and B are associated with their antioxidant activities, including free radical scavenging capacity and an inhibitory effect on glycation. This is the first study showing that cannflavins A and B protect human keratinocytes from RCS + erastin-induced cytotoxicity, which supports their potential application as dietary interventions for aging-related skin conditions.
20
Current perspectives on mass spectrometry-based immunopeptidomics: the computational angle to tumor antigen discovery. J Immunother Cancer 2023; 11:e007073. [PMID: 37899131 PMCID: PMC10619091 DOI: 10.1136/jitc-2023-007073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/21/2023] [Indexed: 10/31/2023] Open
Abstract
Identification of tumor antigens presented by the human leucocyte antigen (HLA) molecules is essential for the design of effective and safe cancer immunotherapies that rely on T cell recognition and killing of tumor cells. Mass spectrometry (MS)-based immunopeptidomics enables high-throughput, direct identification of HLA-bound peptides from a variety of cell lines, tumor tissues, and healthy tissues. It involves immunoaffinity purification of HLA complexes followed by MS profiling of the extracted peptides using data-dependent acquisition, data-independent acquisition, or targeted approaches. By incorporating DNA, RNA, and ribosome sequencing data into immunopeptidomics data analysis, the proteogenomic approach provides a powerful means of identifying both tumor antigens encoded within the canonical open reading frames of annotated coding genes and non-canonical tumor antigens derived from presumably non-coding regions of our genome. We discuss emerging computational challenges in immunopeptidomics data analysis and tumor antigen identification, highlighting key considerations in the proteogenomics-based approach, including accurate DNA, RNA, and ribosome sequencing data analysis, careful incorporation of predicted novel protein sequences into the reference protein database, special quality control in MS data analysis due to the expanded and heterogeneous search space, cancer-specificity determination, and immunogenicity prediction. Advances in technology and computation are continually enabling us to identify tumor antigens with higher sensitivity and accuracy, paving the way toward the development of more effective cancer immunotherapies.
21
A Comparative Analysis of Data Analysis Tools for Data-Independent Acquisition Mass Spectrometry. Mol Cell Proteomics 2023; 22:100623. [PMID: 37481071 PMCID: PMC10458344 DOI: 10.1016/j.mcpro.2023.100623] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/20/2022] [Revised: 06/12/2023] [Accepted: 07/18/2023] [Indexed: 07/24/2023] Open
Abstract
Data-independent acquisition (DIA) mass spectrometry-based proteomics generates reproducible proteome data. The complex processing of DIA data has led to the development of multiple data analysis tools. In this study, we assessed the performance of five tools (OpenSWATH, EncyclopeDIA, Skyline, DIA-NN, and Spectronaut) using six DIA datasets obtained from TripleTOF, Orbitrap, and TimsTOF Pro instruments. By comparing identification and quantification metrics and examining shared and unique cross-tool identifications, we evaluated both library-based and library-free approaches. Our findings indicate that library-free approaches outperformed library-based methods when the spectral library had limited comprehensiveness. However, our results also suggest that constructing a comprehensive library still offers benefits for most DIA analyses. This study provides comprehensive guidance on DIA data analysis tools, benefiting both experienced and novice users of DIA mass spectrometry technology.
22
Protein Coronas on Functionalized Nanoparticles Enable Quantitative and Precise Large-Scale Deep Plasma Proteomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.28.555225. [PMID: 37693476 PMCID: PMC10491250 DOI: 10.1101/2023.08.28.555225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Background: The wide dynamic range of circulating proteins, coupled with the diversity of proteoforms present in plasma, has historically impeded comprehensive and quantitative characterization of the plasma proteome at scale. Automated nanoparticle (NP) protein corona-based proteomics workflows can efficiently compress the dynamic range of protein abundances into a mass spectrometry (MS)-accessible detection range. This enhances the depth and scalability of quantitative MS-based methods, which can elucidate the molecular mechanisms of biological processes, discover new protein biomarkers, and improve the comprehensiveness of MS-based diagnostics. Methods: Using multi-species spike-in experiments and a cohort, we investigated the fold-change accuracy, linearity, precision, and statistical power of the Proteograph™ Product Suite, a deep plasma proteomics workflow, in conjunction with multiple MS instruments. Results: We show that NP-based workflows enable accurate identification (false discovery rate of 1%) of more than 6,000 proteins from plasma (Orbitrap Astral) and, compared to a gold-standard neat plasma workflow that is limited to the detection of hundreds of plasma proteins, facilitate quantification of more proteins with accurate fold changes, high linearity, and precision. Furthermore, we demonstrate high statistical power for the discovery of biomarkers in small- and large-scale cohorts. Conclusions: The automated NP workflow enables high-throughput, deep, and quantitative plasma proteomics investigation with sufficient power to discover new biomarker signatures at peptide-level resolution.
23
nf-encyclopedia: A Cloud-Ready Pipeline for Chromatogram Library Data-Independent Acquisition Proteomics Workflows. J Proteome Res 2023; 22:2743-2749. [PMID: 37417926 DOI: 10.1021/acs.jproteome.2c00613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/08/2023]
Abstract
Data-independent acquisition (DIA) mass spectrometry methods provide systematic and comprehensive quantification of the proteome; yet, relatively few open-source tools are available to analyze DIA proteomics experiments. Fewer still are tools that can leverage gas phase fractionated (GPF) chromatogram libraries to enhance the detection and quantification of peptides in these experiments. Here, we present nf-encyclopedia, an open-source Nextflow pipeline that connects three open-source tools, MSConvert, EncyclopeDIA, and MSstats, to analyze DIA proteomics experiments with or without chromatogram libraries. We demonstrate that nf-encyclopedia is reproducible when run on either a cloud platform or a local workstation and provides robust peptide and protein quantification. Additionally, we found that MSstats enhances protein-level quantitative performance over EncyclopeDIA alone. Finally, we benchmarked the ability of nf-encyclopedia to scale to large experiments in the cloud by leveraging the parallelization of compute resources. The nf-encyclopedia pipeline is available under a permissive Apache 2.0 license; run it on your desktop, cluster, or in the cloud: https://github.com/TalusBio/nf-encyclopedia.
24
MsImpute: Estimation of Missing Peptide Intensity Data in Label-Free Quantitative Mass Spectrometry. Mol Cell Proteomics 2023; 22:100558. [PMID: 37105364 PMCID: PMC10368900 DOI: 10.1016/j.mcpro.2023.100558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 04/18/2023] [Accepted: 04/21/2023] [Indexed: 04/29/2023] Open
Abstract
Mass spectrometry (MS) enables high-throughput identification and quantification of proteins in complex biological samples and can provide insights into the global function of biological systems. Label-free quantification is cost-effective and suitable for the analysis of human samples. Despite rapid developments in label-free data acquisition workflows, the number of proteins quantified across samples can be limited by technical and biological variability. This variation can result in missing values, which can in turn challenge downstream data analysis tasks. General-purpose or gene expression-specific imputation algorithms are widely used to improve data completeness. Here, we propose an imputation algorithm designed for label-free MS data that is aware of the type of missingness affecting the data. On published datasets acquired by data-dependent and data-independent acquisition workflows with variable degrees of biological complexity, we demonstrate that the proposed missing-value estimation procedure, based on barycenter computation, competes closely with state-of-the-art imputation algorithms in differential abundance tasks while outperforming them in the accuracy of variance estimates of peptide abundance measurements, and better controls the false discovery rate in label-free MS experiments. The barycenter estimation procedure is implemented in the msImpute software package and is available from the Bioconductor repository.
25
Proteome Landscapes of Human Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma. Mol Cell Proteomics 2023; 22:100604. [PMID: 37353004 PMCID: PMC10413158 DOI: 10.1016/j.mcpro.2023.100604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 04/12/2023] [Accepted: 06/20/2023] [Indexed: 06/25/2023] Open
Abstract
Liver cancer is among the leading causes of cancer mortality worldwide. In particular, hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (CCA) have been extensively investigated from the perspective of tumor biology. However, a comprehensive and systematic understanding of the molecular characteristics of HCC and CCA is still lacking. Here, we characterized the proteome landscapes of HCC and CCA using data-independent acquisition (DIA) mass spectrometry (MS). By comparing the quantitative proteomes of HCC and CCA, we found several differences between the two cancer types. In particular, we found abnormal lipid metabolism in HCC and activated extracellular matrix-related pathways in CCA. We next developed a three-protein classifier to distinguish CCA from HCC, achieving an area under the curve (AUC) of 0.92 and an accuracy of 90% in an independent validation cohort of 51 patients. The distinct molecular characteristics of HCC and CCA presented in this study provide new insights into the tumor biology of these two major primary liver cancers. Our findings may help develop more efficient diagnostic approaches and new targeted drug treatments.
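The AUC reported for the three-protein classifier can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The sketch below computes AUC directly from that definition (ties counted as one half). The scores and labels are made up for illustration; the study's classifier and data are not reproduced here.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 for the positive class (e.g. CCA), 0 for the negative class (e.g. HCC).
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for five patients.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(roc_auc(scores, labels))  # 0.8333... (5 of 6 positive/negative pairs ranked correctly)
```

The pairwise loop is O(P × N); for large cohorts, the same value can be obtained from the rank-sum (Mann-Whitney U) statistic.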
26
Analysis of DIA proteomics data using MSFragger-DIA and FragPipe computational platform. Nat Commun 2023; 14:4154. [PMID: 37438352 PMCID: PMC10338508 DOI: 10.1038/s41467-023-39869-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 11/07/2022] [Accepted: 06/28/2023] [Indexed: 07/14/2023] Open
Abstract
Liquid chromatography (LC) coupled with data-independent acquisition (DIA) mass spectrometry (MS) has been increasingly used in quantitative proteomics studies. Here, we present MSFragger-DIA, a fast and sensitive approach for direct peptide identification from DIA data that leverages the unmatched speed of the fragment ion indexing-based search engine MSFragger. Unlike most existing methods, MSFragger-DIA conducts a database search of the DIA tandem mass (MS/MS) spectra prior to spectral feature detection and peak tracing across the LC dimension. To streamline the analysis of DIA data and enable easy reproducibility, we integrate MSFragger-DIA into the FragPipe computational platform for seamless support of peptide identification and spectral library building from DIA, data-dependent acquisition (DDA), or both data types combined. We compare MSFragger-DIA with other DIA tools, such as the DIA-Umpire-based workflow in FragPipe, Spectronaut, DIA-NN library-free, and MaxDIA. We demonstrate the fast, sensitive, and accurate performance of MSFragger-DIA across a variety of sample types and data acquisition schemes, including single-cell proteomics, phosphoproteomics, and large-scale tumor proteome profiling studies.
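MSFragger's speed comes from fragment ion indexing. The sketch below illustrates only the general idea of such an index, under simplifying assumptions: fragment m/z values are grouped into fixed-width bins (the 0.02 bin width is an arbitrary choice here), each bin lists the candidate peptides with a fragment in it, and an observed spectrum is scored by counting shared bins. The peptide names and m/z values are hypothetical, and MSFragger's actual index layout and scoring are considerably more elaborate.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_fragment_index(peptide_fragments: Dict[str, List[float]],
                         bin_width: float = 0.02) -> Dict[int, List[str]]:
    """Inverted index: binned fragment m/z -> peptides with a fragment in that bin."""
    index: Dict[int, List[str]] = defaultdict(list)
    for peptide, fragments in peptide_fragments.items():
        # list each peptide at most once per bin
        for b in {int(mz / bin_width) for mz in fragments}:
            index[b].append(peptide)
    return dict(index)


def score_spectrum(index: Dict[int, List[str]], observed_mz: Iterable[float],
                   bin_width: float = 0.02) -> Dict[str, int]:
    """Count, per candidate peptide, how many observed peak bins it shares."""
    counts: Dict[str, int] = defaultdict(int)
    for b in {int(mz / bin_width) for mz in observed_mz}:
        # one dictionary lookup per peak instead of comparing against every peptide
        for peptide in index.get(b, []):
            counts[peptide] += 1
    return dict(counts)


# Hypothetical fragment m/z values, not real peptide masses.
index = build_fragment_index({
    "PEPTIDEA": [100.05, 200.13, 300.21],
    "PEPTIDEB": [100.05, 250.17, 350.25],
})
observed = [100.051, 200.129, 300.209, 350.251]
print(score_spectrum(index, observed))
```

Here PEPTIDEA shares three peak bins with the observed spectrum and PEPTIDEB shares two. A real search would also check neighboring bins, because a peak lying near a bin edge can fall on the wrong side of it.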
27
2019 Association of Biomolecular Resource Facilities Multi-Laboratory Data-Independent Acquisition Proteomics Study. J Biomol Tech 2023; 34:3fc1f5fe.9b78d780. [PMID: 37435391 PMCID: PMC10332336 DOI: 10.7171/3fc1f5fe.9b78d780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2023]
Abstract
Despite the advantages of collecting fragment ion data on all analytes in the sample, namely fewer missing values and the potential for deeper coverage, the adoption of data-independent acquisition (DIA) in proteomics core facility settings has been slow. The Association of Biomolecular Resource Facilities conducted a large interlaboratory study to evaluate DIA performance in proteomics laboratories with various instrumentation. Participants were supplied with generic methods and a uniform set of test samples. The resulting 49 DIA datasets serve as benchmarks and have utility in education and tool development. The sample set consisted of a tryptic HeLa digest spiked with high or low levels of 4 exogenous proteins. Data are available in MassIVE MSV000086479. Additionally, we demonstrate how the data can be analyzed by focusing on 2 datasets using different library approaches and show the utility of select summary statistics. These data can be used by DIA newcomers, software developers, or DIA experts evaluating performance with different platforms, acquisition settings, and skill levels.
28
Dear-DIA XMBD: Deep Autoencoder Enables Deconvolution of Data-Independent Acquisition Proteomics. RESEARCH (WASHINGTON, D.C.) 2023; 6:0179. [PMID: 37377457 PMCID: PMC10292580 DOI: 10.34133/research.0179] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/08/2022] [Accepted: 06/01/2023] [Indexed: 06/29/2023]
Abstract
Data-independent acquisition (DIA) technology for protein identification from mass spectrometry, and its related algorithms, are developing rapidly. Spectrum-centric analysis of DIA data without a spectral library built from data-dependent acquisition data represents a promising direction. In this paper, we propose an untargeted analysis method, Dear-DIA XMBD, for direct analysis of DIA data. Dear-DIA XMBD first integrates a deep variational autoencoder and triplet loss to learn representations of the extracted fragment ion chromatograms, then uses the k-means clustering algorithm to aggregate fragments with similar representations into the same classes, and finally establishes inverted index tables between precursors and peptides and between fragments and peptides to determine the precursors of fragment clusters. We show that Dear-DIA XMBD achieves superior performance on highly complex DIA data from different species acquired on different instrument platforms. Dear-DIA XMBD is publicly available at https://github.com/jianweishuai/Dear-DIA-XMBD.
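The clustering step described above groups fragments whose learned representations lie close together. As a minimal sketch of that step alone, here is plain Lloyd's k-means with caller-supplied initial centroids, applied to hypothetical 2-D embedding vectors. The autoencoder, the triplet loss, the choice of k, and the initialization used by Dear-DIA XMBD are not reproduced.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], init: Sequence[Point], max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns a cluster label for each point."""
    centroids = [tuple(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(pt, centroids[k])))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


# Hypothetical 2-D embeddings of five fragment ion chromatograms.
embeddings = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
print(kmeans(embeddings, init=[(0.0, 0.0), (5.0, 5.0)]))  # [0, 0, 1, 1, 0]
```

Fragments that end up with the same label would then be treated as candidates that co-elute from the same precursor.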
29
Generalized precursor prediction boosts identification rates and accuracy in mass spectrometry based proteomics. Commun Biol 2023; 6:628. [PMID: 37301900 PMCID: PMC10257694 DOI: 10.1038/s42003-023-04977-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 05/24/2023] [Indexed: 06/12/2023] Open
Abstract
Data-independent acquisition mass spectrometry (DIA-MS) has recently emerged as an important method for the identification of blood-based biomarkers. However, the large search space required to identify novel biomarkers from the plasma proteome can introduce a high rate of false positives that compromise the accuracy of false discovery rate (FDR) estimates from existing validation methods. We developed a generalized precursor scoring (GPS) method, trained on 2.75 million precursors, that can confidently control FDR while increasing the number of identified proteins in DIA-MS independent of the search space. We demonstrate how GPS can generalize to new data, increase protein identification rates, and increase overall quantitative accuracy. Finally, we apply GPS to the identification of blood-based biomarkers and identify a panel of proteins that discriminates with high accuracy between subphenotypes of septic acute kidney injury from undepleted plasma, showcasing the utility of GPS in discovery DIA-MS proteomics.
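The abstract does not say how GPS estimates FDR. A widely used scheme in DIA tools is target-decoy competition, sketched below under that assumption: decoys are scored alongside targets, the FDR at a score threshold is estimated as the ratio of decoys to targets at or above it, and each hit's q-value is the smallest estimated FDR at which it would still be accepted. The scores are made up, and the sketch does not reproduce GPS's learned scoring model.

```python
from typing import List, Sequence


def target_decoy_qvalues(scores: Sequence[float], is_decoy: Sequence[bool]) -> List[float]:
    """Estimate q-values by target-decoy competition (higher score = better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    n_target = n_decoy = 0
    for i in order:
        if is_decoy[i]:
            n_decoy += 1
        else:
            n_target += 1
        # estimated FDR if the threshold were placed at this hit's score
        fdr[i] = n_decoy / max(n_target, 1)
    # q-value: minimum estimated FDR over this threshold and all more lenient ones
    q = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


# Hypothetical precursor scores; True marks a decoy.
scores = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
is_decoy = [False, False, True, False, False, True]
print(target_decoy_qvalues(scores, is_decoy))  # [0.0, 0.0, 0.25, 0.25, 0.25, 0.5]
```

Accepting target hits with q ≤ 0.01 would keep only the two top-scoring targets in this toy example. The q-value smoothing ensures that accepted sets remain nested as the FDR cutoff is relaxed.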
30
Mix24X, a Lab-Assembled Reference to Evaluate Interpretation Procedures for Tandem Mass Spectrometry Proteotyping of Complex Samples. Int J Mol Sci 2023; 24:ijms24108634. [PMID: 37239979 DOI: 10.3390/ijms24108634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 05/09/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023] Open
Abstract
Correct identification of the microorganisms present in a complex sample is a crucial issue. Proteotyping based on tandem mass spectrometry can help establish an inventory of organisms present in a sample. Evaluation of bioinformatics strategies and tools for mining the recorded datasets is essential to establish confidence in the results obtained and to improve these pipelines in terms of sensitivity and accuracy. Here, we propose several tandem mass spectrometry datasets recorded on an artificial reference consortium comprising 24 bacterial species. This assemblage of environmental and pathogenic bacteria covers 20 different genera and 5 bacterial phyla. The dataset comprises difficult cases, such as the Shigella flexneri species, which is closely related to Escherichia coli, and several highly sequenced clades. Different acquisition strategies simulate real-life scenarios: from rapid survey sampling to exhaustive analysis. We provide access to individual proteomes of each bacterium separately to provide a rational basis for evaluating the assignment strategy of MS/MS spectra when recorded from complex mixtures. This resource should provide an interesting common reference for developers who wish to compare their proteotyping tools and for those interested in evaluating protein assignment when dealing with complex samples, such as microbiomes.
31
MSstats Version 4.0: Statistical Analyses of Quantitative Mass Spectrometry-Based Proteomic Experiments with Chromatography-Based Quantification at Scale. J Proteome Res 2023; 22:1466-1482. [PMID: 37018319 PMCID: PMC10629259 DOI: 10.1021/acs.jproteome.2c00834] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 12/21/2022] [Indexed: 04/06/2023]
Abstract
The MSstats R-Bioconductor family of packages is widely used for statistical analyses of quantitative bottom-up mass spectrometry-based proteomic experiments to detect differentially abundant proteins. It is applicable to a variety of experimental designs and data acquisition strategies and is compatible with many data processing tools used to identify and quantify spectral features. In the face of ever-increasing complexities of experiments and data processing strategies, the core package of the family, with the same name MSstats, has undergone a series of substantial updates. Its new version MSstats v4.0 improves the usability, versatility, and accuracy of statistical methodology, and the usage of computational resources. New converters integrate the output of upstream processing tools directly with MSstats, requiring less manual work by the user. The package's statistical models have been updated to a more robust workflow. Finally, MSstats' code has been substantially refactored to improve memory use and computation speed. Here we detail these updates, highlighting methodological differences between the new and old versions. An empirical comparison of MSstats v4.0 to its previous implementations, as well as to the packages MSqRob and DEqMS, on controlled mixtures and biological experiments demonstrated a stronger performance and better usability of MSstats v4.0 as compared to existing methods.
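As one concrete example of the statistical machinery in this family of packages: to our understanding, MSstats summarizes feature-level log2 intensities into run-level protein abundances using Tukey's median polish by default. The sketch below is a plain Python rendition of median polish that follows the iteration scheme of R's `stats::medpolish` and assumes a complete features × runs matrix with no missing values. It is not MSstats' code and omits MSstats' handling of censored or missing intensities.

```python
from statistics import median
from typing import List, Sequence, Tuple


def median_polish(x: Sequence[Sequence[float]], max_iter: int = 10, eps: float = 0.01
                  ) -> Tuple[float, List[float], List[float], List[List[float]]]:
    """Tukey's median polish of a features x runs matrix of log2 intensities.

    Fits x[i][j] = overall + row[i] + col[j] + residual[i][j].
    """
    z = [[float(v) for v in row] for row in x]
    n_rows, n_cols = len(z), len(z[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # sweep row (feature) medians out of the residuals
        for i in range(n_rows):
            m = median(z[i])
            z[i] = [v - m for v in z[i]]
            row_eff[i] += m
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # sweep column (run) medians out of the residuals
        for j in range(n_cols):
            m = median([z[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                z[i][j] -= m
            col_eff[j] += m
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # stop once the total absolute residual no longer changes appreciably
        new_sum = sum(abs(v) for zr in z for v in zr)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, z


def run_summaries(log2_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Run-level protein abundance = overall effect + run (column) effect."""
    overall, _, col_eff, _ = median_polish(log2_matrix)
    return [overall + c for c in col_eff]


# Hypothetical log2 intensities: 3 features (rows) x 3 runs (columns).
x = [[20, 21, 22],
     [21, 22, 23],
     [22, 23, 30]]  # the last value is an outlier
print(run_summaries(x))  # [21.0, 22.0, 23.0]
```

The third feature in run 3 is an outlier (30 instead of about 24), yet the run summaries stay at 21, 22, and 23. A plain per-run mean would have placed run 3 at 25.0.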
32
Data-independent acquisition and quantification of extracellular matrix from human lung in chronic inflammation-associated carcinomas. Proteomics 2023; 23:e2200021. [PMID: 36228107 PMCID: PMC10391693 DOI: 10.1002/pmic.202200021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 09/17/2022] [Accepted: 09/21/2022] [Indexed: 11/06/2022]
Abstract
Early events associated with chronic inflammation and cancer involve significant remodeling of the extracellular matrix (ECM), which greatly affects its composition and functional properties. Using lung squamous cell carcinoma (LSCC), a chronic inflammation-associated cancer (CIAC), we optimized a robust proteomic pipeline to discover potential biomarker signatures and protein changes specifically in the stroma. We combined ECM enrichment from fresh human tissues, data-independent acquisition (DIA) strategies, and stringent statistical processing to analyze "Tumor" and matched adjacent histologically normal ("Matched Normal") tissues from patients with LSCC. Overall, 1802 protein groups were quantified with at least two unique peptides, and 56% of those proteins were annotated as "extracellular." Confirming dramatic ECM remodeling during CIAC progression, 529 proteins were significantly altered in the "Tumor" compared to "Matched Normal" tissues. The signature was typified by a coordinated loss of basement membrane proteins and small leucine-rich proteins. The dramatic increase in the stromal levels of SERPINH1/heat shock protein 47, which was discovered using our ECM proteomic pipeline, was validated by immunohistochemistry (IHC) of "Tumor" and "Matched Normal" tissues obtained from an independent cohort of LSCC patients. This integrated workflow provided novel insights into ECM remodeling during CIAC progression and identified potential biomarker signatures and future therapeutic targets.
33
Initial recommendations for performing, benchmarking and reporting single-cell proteomics experiments. Nat Methods 2023; 20:375-386. [PMID: 36864200 PMCID: PMC10130941 DOI: 10.1038/s41592-023-01785-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 39.0] [Received: 06/06/2022] [Accepted: 01/24/2023] [Indexed: 03/04/2023]
Abstract
Analyzing proteins from single cells by tandem mass spectrometry (MS) has recently become technically feasible. While such analysis has the potential to accurately quantify thousands of proteins across thousands of single cells, the accuracy and reproducibility of the results may be undermined by numerous factors affecting experimental design, sample preparation, data acquisition and data analysis. We expect that broadly accepted community guidelines and standardized metrics will enhance rigor, data quality and alignment between laboratories. Here we propose best practices, quality controls and data-reporting recommendations to assist in the broad adoption of reliable quantitative workflows for single-cell proteomics. Resources and discussion forums are available at https://single-cell.net/guidelines .
34
Substantial Downregulation of Mitochondrial and Peroxisomal Proteins during Acute Kidney Injury revealed by Data-Independent Acquisition Proteomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.26.530107. [PMID: 36865241 PMCID: PMC9980295 DOI: 10.1101/2023.02.26.530107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Abstract
Acute kidney injury (AKI) is a major health concern, particularly for the elderly. Understanding AKI-related proteome changes is critical for prevention and for the development of novel therapeutics to recover kidney function and to mitigate susceptibility to recurrent AKI or the development of chronic kidney disease. In this study, mouse kidneys were subjected to ischemia-reperfusion injury, and the contralateral kidneys remained uninjured to enable comparison and assess injury-induced changes in the kidney proteome. A ZenoTOF 7600 mass spectrometer with a fast acquisition rate was used for data-independent acquisition (DIA) for comprehensive protein identification and quantification. Short microflow gradients and the generation of a deep kidney-specific spectral library allowed for high-throughput, comprehensive protein quantification. Upon AKI, the kidney proteome was completely remodeled, and over half of the 3,945 quantified protein groups changed significantly. Downregulated proteins in the injured kidney were involved in energy production, including numerous peroxisomal matrix proteins that function in fatty acid oxidation, such as ACOX1, CAT, EHHADH, ACOT4, ACOT8, and Scp2. Injured mice showed a severe decline in health. The comprehensive and sensitive kidney-specific DIA assays highlighted here feature high-throughput analytical capabilities to achieve deep coverage of the kidney proteome and will serve as useful tools for developing novel therapeutics to remediate kidney function.
35
Challenges and opportunities for single cell computational proteomics. Mol Cell Proteomics 2023; 22:100518. [PMID: 36828128 PMCID: PMC10060113 DOI: 10.1016/j.mcpro.2023.100518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 02/15/2023] [Accepted: 02/17/2023] [Indexed: 02/25/2023] Open
Abstract
Single-cell proteomics is growing rapidly and has seen several technological advances. Because most research has focused on improving instrumentation and sample preparation methods, very little attention has been given to the algorithms responsible for identifying and quantifying proteins. Given the inherent differences between bulk data and single-cell data, it is necessary to recognize that the algorithms currently applied to single-cell data were designed for bulk data and rely on underlying assumptions that may not hold for single-cell data. To develop and optimize algorithms for single-cell data, we need to characterize the differences between single-cell data and bulk data and assess how current algorithms perform on single-cell data. Here, we present a review of algorithms responsible for identifying and quantifying peptides and proteins. We review how each type of algorithm works, the assumptions it relies on, how it performs on single-cell data, and possible optimizations and solutions that could address the differences in single-cell data.
36
Benchmarking bioinformatics pipelines in data-independent acquisition mass spectrometry for immunopeptidomics. Mol Cell Proteomics 2023; 22:100515. [PMID: 36796644 PMCID: PMC10060114 DOI: 10.1016/j.mcpro.2023.100515] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 09/17/2022] [Revised: 01/26/2023] [Accepted: 02/06/2023] [Indexed: 02/16/2023] Open
Abstract
Immunopeptidomes are the peptide repertoires bound by the molecules encoded by the major histocompatibility complex (MHC) (human leukocyte antigen (HLA) in humans). These HLA-peptide complexes are presented on the cell surface for immune T-cell recognition. Immunopeptidomics denotes the utilization of tandem mass spectrometry (MS/MS) to identify and quantify peptides bound to HLA molecules. Data-independent acquisition (DIA) has emerged as a powerful strategy for quantitative proteomics and deep proteome-wide identification; however, DIA application to immunopeptidomics analyses has so far seen limited use. Further, of the many DIA data processing tools currently available, there is no consensus in the immunopeptidomics community on the most appropriate pipeline(s) for in-depth and accurate HLA peptide identification. Herein, we benchmarked four commonly used spectral library-based DIA pipelines developed for proteomics applications (Skyline, Spectronaut, DIA-NN, and PEAKS) for their ability to perform immunopeptidome quantification. We validated and assessed the capability of each tool to identify and quantify HLA-bound peptides. Generally, DIA-NN and PEAKS provided higher immunopeptidome coverage with more reproducible results. Skyline and Spectronaut conferred more accurate peptide identification with lower experimental false-positive rates. All tools demonstrated reasonable correlations in quantifying precursors of HLA-bound peptides. Our benchmarking study suggests a combined strategy of applying at least two complementary DIA software tools to achieve the greatest degree of confidence and in-depth coverage of immunopeptidome data.
37
A Novel Blood Proteomic Signature for Prostate Cancer. Cancers (Basel) 2023; 15:cancers15041051. [PMID: 36831393 PMCID: PMC9954127 DOI: 10.3390/cancers15041051] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/12/2023] [Revised: 02/02/2023] [Accepted: 02/03/2023] [Indexed: 02/11/2023]
Abstract
Prostate cancer is the most common malignant tumour in men. Improved testing for diagnosis, risk prediction, and response to treatment would improve care. Here, we identified a proteomic signature of prostate cancer in peripheral blood using data-independent acquisition mass spectrometry combined with machine learning. A highly predictive signature was derived, which was associated with relevant pathways, including the coagulation, complement, and clotting cascades, as well as plasma lipoprotein particle remodeling. We further validated the identified biomarkers against a second cohort, identifying a panel of five key markers (GP5, SERPINA5, ECM1, IGHG1, and THBS1) which retained most of the diagnostic power of the overall dataset, achieving an AUC of 0.91. Taken together, this study provides a proteomic signature complementary to PSA for the diagnosis of patients with localised prostate cancer, with the further potential for assessing risk of future development of prostate cancer. Data are available via ProteomeXchange with identifier PXD025484.
38
Meiotic nuclear pore complex remodeling provides key insights into nuclear basket organization. J Cell Biol 2023; 222:e202204039. [PMID: 36515990 PMCID: PMC9754704 DOI: 10.1083/jcb.202204039] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/15/2022] [Revised: 09/12/2022] [Accepted: 11/05/2022] [Indexed: 12/15/2022]
Abstract
Nuclear pore complexes (NPCs) are large proteinaceous assemblies that mediate nuclear compartmentalization. NPCs undergo large-scale structural rearrangements during mitosis in metazoans and some fungi. However, our understanding of NPC remodeling beyond mitosis remains limited. Using time-lapse fluorescence microscopy, we discovered that NPCs undergo two mechanistically separable remodeling events during budding yeast meiosis in which parts or all of the nuclear basket transiently dissociate from the NPC core during meiosis I and II, respectively. Meiosis I detachment, observed for Nup60 and Nup2, is driven by Polo kinase-mediated phosphorylation of Nup60 at its interface with the Y-complex. Subsequent reattachment of Nup60-Nup2 to the NPC core is facilitated by a lipid-binding amphipathic helix in Nup60. Preventing Nup60-Nup2 reattachment causes misorganization of the entire nuclear basket in gametes. Strikingly, meiotic nuclear basket remodeling also occurs in the distantly related fission yeast, Schizosaccharomyces pombe. Our study reveals a conserved and developmentally programmed aspect of NPC plasticity, providing key mechanistic insights into the nuclear basket organization.
39
In-depth analysis of the Sirtuin 5-regulated mouse brain malonylome and succinylome using library-free data-independent acquisitions. Proteomics 2023; 23:e2100371. [PMID: 36479818 PMCID: PMC10363399 DOI: 10.1002/pmic.202100371] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/23/2022] [Revised: 10/29/2022] [Accepted: 11/14/2022] [Indexed: 12/12/2022]
Abstract
Post-translational modifications (PTMs) dynamically regulate proteins and biological pathways, typically through the combined effects of multiple PTMs. Lysine residues are targeted by various PTMs, including malonylation and succinylation. However, PTMs pose specific challenges to mass spectrometry-based proteomics during data acquisition and processing. Thus, novel and innovative workflows using data-independent acquisition (DIA) are needed to ensure confident PTM identification, precise site localization, and accurate and robust label-free quantification. In this study, we present a powerful approach that combines antibody-based enrichment with comprehensive DIA acquisitions and spectral library-free data processing using directDIA (Spectronaut). The same DIA data can be used both to generate spectral libraries and to comprehensively identify and quantify PTMs, reducing the amount of enriched sample and acquisition time needed while offering a fully automated workflow. We analyzed brains from wild-type and Sirtuin 5 (SIRT5)-knock-out mice and identified and quantified 466 malonylated and 2211 succinylated peptides. SIRT5 regulation remodeled the acylomes by targeting 164 malonylated and 578 succinylated sites. Affected pathways included carbohydrate and lipid metabolism, the synaptic vesicle cycle, and neurodegenerative diseases. We found 48 common SIRT5-regulated malonylation and succinylation sites, suggesting potential PTM crosstalk. This innovative and efficient workflow offers deeper insights into the mouse brain lysine malonylome and succinylome.
40
Data-independent acquisition boosts quantitative metaproteomics for deep characterization of gut microbiota. NPJ Biofilms Microbiomes 2023; 9:4. [PMID: 36693863 PMCID: PMC9873935 DOI: 10.1038/s41522-023-00373-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/06/2022] [Accepted: 01/11/2023] [Indexed: 01/26/2023]
Abstract
Metaproteomics can provide valuable insights into the functions of the human gut microbiota (GM), but it is challenging because of the extreme complexity and heterogeneity of the GM. Data-independent acquisition (DIA) mass spectrometry (MS) has emerged as a quantitative technique in conventional proteomics but is still at an early stage of development in metaproteomics. Herein, we applied library-free DIA (directDIA)-based metaproteomics and compared directDIA with other MS-based quantification techniques for metaproteomics on simulated microbial communities and on fecal samples spiked with bacteria at known ratios. Considering proteome coverage in identification together with accuracy and precision in quantification, directDIA showed superior performance. We characterized the human GM in two cohorts of clinical fecal samples from patients with pancreatic cancer (PC) and mild cognitive impairment (MCI). About 70,000 microbial proteins were quantified in each cohort and annotated to profile the taxonomic and functional characteristics of the GM in different diseases. Our work demonstrates the utility of directDIA in quantitative metaproteomics for investigating the intestinal microbiota and related disease pathogenesis.
41
Label-free proteome quantification and evaluation. Brief Bioinform 2023; 24:6833644. [PMID: 36403090 DOI: 10.1093/bib/bbac477] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/13/2022] [Revised: 09/24/2022] [Accepted: 10/08/2022] [Indexed: 11/21/2022]
Abstract
Label-free quantification (LFQ) has emerged as an exceptional technique in proteomics owing to its broad proteome coverage, wide dynamic range and enhanced analytical reproducibility. Because in-depth quantification is extremely difficult, LFQ chains that combine a variety of transformation, pretreatment and imputation methods are required. However, identifying a well-performing chain remains challenging, because performance depends strongly on the data studied and the number of possible chains is large. In this study, we therefore constructed an R package, EVALFQ, to evaluate the performance of >3000 LFQ chains. This package is unique in (a) automatically evaluating performance using multiple criteria, (b) exploring quantification accuracy based on spike-in proteins and (c) discovering well-performing chains by comprehensive assessment. Because it assesses performance from multiple perspectives across more than 3000 chains, this package is expected to attract broad interest in the field of proteomic quantification. The package is available at https://github.com/idrblab/EVALFQ.
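To make the notion of an LFQ "chain" concrete, the Python sketch below enumerates every combination of one imputation, one transformation and one normalization method and scores each chain by how closely it recovers known spike-in fold changes, in the spirit of criterion (b). Everything here is an assumption for illustration and is not EVALFQ's code or API: the two-option method menus, the step order (impute, then transform, then normalize), the mean-absolute-error score and the toy data with one over-loaded sample are all hypothetical.

```python
import itertools
import math
import statistics

# Hypothetical two-option menus for each step of a label-free quantification chain.
TRANSFORMS = {"log2": math.log2, "log10": math.log10}


def impute_min(rows):
    """Replace missing values (None) with the smallest observed intensity in the matrix."""
    floor = min(x for r in rows.values() for x in r if x is not None)
    return {p: [floor if x is None else x for x in r] for p, r in rows.items()}


def impute_row_mean(rows):
    """Replace missing values with the mean observed intensity of the same protein."""
    out = {}
    for p, r in rows.items():
        fill = statistics.fmean([x for x in r if x is not None])
        out[p] = [fill if x is None else x for x in r]
    return out


def no_norm(col):
    return list(col)


def median_norm(col):
    """Centre one sample (column) on its median."""
    m = statistics.median(col)
    return [x - m for x in col]


IMPUTERS = {"min": impute_min, "row_mean": impute_row_mean}
NORMALIZERS = {"none": no_norm, "median": median_norm}


def log2_fold_changes(raw, groups, transform, norm, impute):
    """Run one chain (impute -> transform -> normalize) and return B-vs-A log2 fold changes."""
    f = TRANSFORMS[transform]
    rows = {p: [f(x) for x in r] for p, r in IMPUTERS[impute](raw).items()}
    names = list(rows)
    # Normalization acts sample by sample, i.e. on columns of the protein x sample matrix.
    cols = [NORMALIZERS[norm]([rows[p][j] for p in names]) for j in range(len(groups))]
    scale = f(2.0)  # converts a difference in this log base into log2 units
    fcs = {}
    for k, p in enumerate(names):
        a = statistics.fmean([cols[j][k] for j, g in enumerate(groups) if g == "A"])
        b = statistics.fmean([cols[j][k] for j, g in enumerate(groups) if g == "B"])
        fcs[p] = (b - a) / scale
    return fcs


def evaluate_chains(raw, groups, expected):
    """Score every chain by mean absolute error against the expected log2 fold changes."""
    results = []
    for chain in itertools.product(TRANSFORMS, NORMALIZERS, IMPUTERS):
        fcs = log2_fold_changes(raw, groups, *chain)
        err = statistics.fmean([abs(fcs[p] - expected[p]) for p in expected])
        results.append((err, chain))
    return sorted(results, key=lambda r: r[0])


# Toy data: samples A1, A2, B1, B2; sample B2 was accidentally loaded twice as heavily.
groups = ["A", "A", "B", "B"]
loading = [1, 1, 1, 2]
raw, expected = {}, {}
for k, base in enumerate([100, 200, 400, 800, 1600], start=1):
    raw[f"bg{k}"] = [base * w for w in loading]  # unchanged background proteins
    expected[f"bg{k}"] = 0.0
for name, base in [("S1", 3000), ("S2", 5000)]:
    raw[name] = [base * w * (2 if g == "B" else 1) for w, g in zip(loading, groups)]
    expected[name] = 1.0  # spiked in at a 2:1 (B:A) ratio
raw["bg1"][1] = None  # one missing value in sample A2

results = evaluate_chains(raw, groups, expected)
for err, (t, n, i) in results:
    print(f"impute={i:8s} transform={t:5s} normalize={n:6s} MAE={err:.3f}")
```

With two options per step the chain count is 2 × 2 × 2 = 8; a figure such as >3000 chains would arise the same way from much longer method lists. In this toy data set the median-normalized chains with minimum-value imputation recover the expected fold changes almost exactly, while the unnormalized chains inherit the loading bias of sample B2.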
42
Profiling mouse brown and white adipocytes to identify metabolically relevant small ORFs and functional microproteins. Cell Metab 2023; 35:166-183.e11. [PMID: 36599300 PMCID: PMC9889109 DOI: 10.1016/j.cmet.2022.12.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 03/15/2022] [Revised: 09/19/2022] [Accepted: 12/06/2022] [Indexed: 01/05/2023]
Abstract
Microproteins (MPs) are a potentially rich source of uncharacterized metabolic regulators. Here, we use ribosome profiling (Ribo-seq) to curate 3,877 unannotated MP-encoding small ORFs (smORFs) in primary brown, white, and beige mouse adipocytes. Of these, we validated 85 MPs by proteomics, including 33 circulating MPs in mouse plasma. Analyses of MP-encoding mRNAs under different physiological conditions (high-fat diet) revealed that numerous MPs are regulated in adipose tissue in vivo and are co-expressed with established metabolic genes. Furthermore, Ribo-seq provided evidence for the translation of Gm8773, which encodes a secreted MP that is homologous to human and chicken FAM237B. Gm8773 is highly expressed in the arcuate nucleus of the hypothalamus, and intracerebroventricular administration of recombinant mFAM237B showed orexigenic activity in obese mice. Together, these data highlight the value of this adipocyte MP database in identifying MPs with roles in fundamental metabolic and physiological processes such as feeding.
43
A Robust and Clinically Applicable Sample Preparation Protocol for Urinary Extracellular Vesicle Isolation Suitable for Mass Spectrometry-Based Proteomics. Methods Mol Biol 2023; 2718:235-251. [PMID: 37665463 DOI: 10.1007/978-1-0716-3457-8_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Urinary extracellular vesicles (uEVs) are a rich source of noninvasive protein biomarkers. However, for translation to clinical applications, an easy-to-use uEV isolation protocol that is compatible with proteomics is needed. Here, we provide a detailed description of a quick and clinically applicable uEV isolation protocol. We focus on the isolation procedure and subsequent in-depth proteome characterization using LC-MS/MS-based proteomics. As an example, we show how differential analyses can be performed comparing urine samples from prostate cancer patients with urine from controls.
44
Increasing the throughput of sensitive proteomics by plexDIA. Nat Biotechnol 2023; 41:50-59. [PMID: 35835881 PMCID: PMC9839897 DOI: 10.1038/s41587-022-01389-w] [Citation(s) in RCA: 61] [Impact Index Per Article: 61.0] [Received: 11/04/2021] [Accepted: 06/13/2022] [Indexed: 01/22/2023]
Abstract
Current mass spectrometry methods enable high-throughput proteomics of large sample amounts, but proteomics of low sample amounts remains limited in depth and throughput. To increase the throughput of sensitive proteomics, we developed an experimental and computational framework, called plexDIA, for simultaneously multiplexing the analysis of peptides and samples. Multiplexed analysis with plexDIA increases throughput multiplicatively with the number of labels without reducing proteome coverage or quantitative accuracy. By using three-plex non-isobaric mass tags, plexDIA enables quantification of threefold more protein ratios among nanogram-level samples. Using 1-hour active gradients, plexDIA quantified ~8,000 proteins in each sample of labeled three-plex sets and increased data completeness, reducing missing data more than twofold across samples. Applied to single human cells, plexDIA quantified ~1,000 proteins per cell and achieved 98% data completeness within a plexDIA set while using ~5 minutes of active chromatography per cell. These results establish a general framework for increasing the throughput of sensitive and quantitative protein analysis.
45
Using SILAC to Develop Quantitative Data-Independent Acquisition (DIA) Proteomic Methods. Methods Mol Biol 2023; 2603:245-257. [PMID: 36370285 DOI: 10.1007/978-1-0716-2863-8_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Proteins are integral to biological systems and functions. Identifying and quantifying proteins can therefore offer system-wide insights into protein-protein interactions, cellular signaling, and biological pathway activity. Quantitative proteomics has become a method of choice for identifying and quantifying proteins in complex matrices. Proteomics allows researchers to survey hundreds to thousands of proteins in a less biased manner than classical antibody-based protein capture strategies. Discovery approaches have typically used data-dependent acquisition (DDA) methods, but this approach suffers from stochasticity that compromises quantitation. Recent developments in data-independent acquisition (DIA) proteomics workflows enable proteomic profiling of thousands of samples with more consistent peak picking, making DIA an excellent candidate for discovering and assessing biomarkers in clinical samples. However, quantitation of peptides from DIA datasets is computationally intensive, and guidelines on how to establish DIA methods are lacking. Method development and optimization require novel tools to visualize and filter DIA datasets appropriately. Here, we present a protocol and novel script workflow for optimizing quantitative DIA methods using stable isotope labeling by amino acids in cell culture (SILAC). This protocol includes steps for cell growth and labeling, peptide digestion and preparation, and optimization of quantitative DIA methods. In addition, we describe important steps for (1) computational analysis to identify and quantify peptides, (2) data visualization to identify the linear abundance range of each peptide in the sample, and (3) finding high-confidence abundance thresholds for quantitation.
46
MS-DAP Platform for Downstream Data Analysis of Label-Free Proteomics Uncovers Optimal Workflows in Benchmark Data Sets and Increased Sensitivity in Analysis of Alzheimer's Biomarker Data. J Proteome Res 2022; 22:374-386. [PMID: 36541440 PMCID: PMC9903323 DOI: 10.1021/acs.jproteome.2c00513] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 12/23/2022]
Abstract
In the rapidly moving proteomics field, the community uses a diverse patchwork of data analysis pipelines and algorithms for data normalization and differential expression analysis. First, we generated a mass spectrometry downstream analysis pipeline (MS-DAP) that integrates both popular and recently developed algorithms for normalization and statistical analysis. Additional algorithms can easily be added in the future as plugins. MS-DAP is open source and facilitates transparent and reproducible proteome science by generating extensive data visualizations and quality reporting, provided as standardized PDF reports. Second, we performed a systematic evaluation of methods for normalization and statistical analysis on a large variety of data sets, including additional data generated in this study, which revealed key differences. Commonly used approaches for differential testing based on moderated t-statistics were consistently outperformed by more recent statistical models, all integrated in MS-DAP. Third, we introduced a novel normalization algorithm that rescues deficiencies observed in commonly used normalization methods. Finally, we used the MS-DAP platform to reanalyze a recently published large-scale proteomics data set of cerebrospinal fluid (CSF) from patients with Alzheimer's disease (AD). This revealed increased sensitivity, yielding additional significant target proteins that improved overlap with results reported in related studies and included a large set of new potential AD biomarkers in addition to those previously reported.
47
Proteomic overview of hepatocellular carcinoma cell lines and generation of the spectral library. Sci Data 2022; 9:732. [PMID: 36446815 PMCID: PMC9708666 DOI: 10.1038/s41597-022-01845-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 11/14/2022] [Indexed: 12/02/2022]
Abstract
Cell lines are extensively used tools; therefore, a comprehensive proteomic overview of hepatocellular carcinoma (HCC) cell lines and an extensive spectral library for data-independent acquisition (DIA) quantification are needed. Here, we present the proteome of nine commonly used HCC cell lines, covering 9,208 protein groups, and an HCC spectral library containing 253,921 precursors, 168,811 peptides and 10,098 protein groups. The proteomic overview reveals the heterogeneity between different cell lines, as well as their similarity to tumour tissues in proliferation and metastasis characteristics and in drug-target expression. Generating the HCC spectral library required 108 hours of runtime for data-dependent acquisition (DDA) of 48 runs, 24 hours of database searching with MaxQuant version 2.0.3.0, and 1 hour of processing with Spectronaut™ version 15.2. The HCC spectral library supports quantification of 7,637 protein groups from triplicate 2-hour DIA analyses of HepG2 cells and the discovery of biological alterations. This study provides valuable resources for HCC cell lines and for efficient DIA quantification on the LC-Orbitrap platform, which should further help to explore molecular mechanisms and candidate therapeutic targets.
48
Proteome alterations during clonal isolation of established human pancreatic cancer cell lines. Cell Mol Life Sci 2022; 79:561. [PMID: 36271971 PMCID: PMC9587952 DOI: 10.1007/s00018-022-04584-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/13/2022] [Revised: 09/29/2022] [Accepted: 10/01/2022] [Indexed: 11/25/2022]
Abstract
Clonal isolation is an integral step of numerous workflows in genome editing and cell engineering. It comprises the isolation of a single progenitor cell from a defined cell line population with subsequent expansion to obtain a monoclonal cell population. This process is associated with transient loss of cell–cell contacts and absence of a multicellular microenvironment. Previous studies have revealed transcriptomic changes upon clonal isolation, to a cell line-specific extent. Since transcriptome alterations are only partially reflected at the proteome level, we investigated the impact of clonal isolation on the cellular proteome, to a depth of > 6000 proteins, in three established pancreatic cancer cell lines. We show that clonal isolation does affect the cellular proteome, although to a cell line-specific extent, affecting different biological processes and also depending on the isolation method. We demonstrate that clonal isolation affects mesenchymal- and epithelial-derived cell lines differently, mainly affecting cell proliferation, metabolism, cell adhesion and cellular stress. The results bear relevance to the fields of genome editing and cell engineering and highlight the need to consider the impact of clonal isolation when interpreting data from experiments that include this step.
49
Proteotype coevolution and quantitative diversity across 11 mammalian species. Sci Adv 2022; 8:eabn0756. [PMID: 36083897 PMCID: PMC9462687 DOI: 10.1126/sciadv.abn0756] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/01/2021] [Accepted: 07/25/2022] [Indexed: 06/15/2023]
Abstract
Evolutionary profiling has been largely limited to the nucleotide level. Using consistent proteomic methods, we quantified proteomic and phosphoproteomic layers in fibroblasts from 11 common mammalian species, with transcriptomes as reference. Covariation analysis indicates that transcript and protein expression levels and variabilities across mammals remarkably follow functional role, with extracellular matrix-associated expression being the most variable, demonstrating strong transcriptome-proteome coevolution. The biological variability of gene expression is universal at both interindividual and interspecies scales but to a different extent. RNA metabolic processes particularly show higher interspecies versus interindividual variation. Our results further indicate that while the ubiquitin-proteasome system is strongly conserved in mammals, lysosome-mediated protein degradation exhibits remarkable variation between mammalian lineages. In addition, the phosphosite profiles reveal a phosphorylation coevolution network independent of protein abundance.
50
Normics: Proteomic Normalization by Variance and Data-Inherent Correlation Structure. Mol Cell Proteomics 2022; 21:100269. [PMID: 35853575 PMCID: PMC9450154 DOI: 10.1016/j.mcpro.2022.100269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Revised: 06/16/2022] [Accepted: 07/13/2022] [Indexed: 11/17/2022]
Abstract
Several algorithms for the normalization of proteomic data are currently available, each based on a priori assumptions. Among these is the extent to which differential expression (DE) can be present in the dataset. This factor is usually unknown in explorative biomarker screens. At the same time, the increasing depth of proteomic analyses often requires the selection of subsets with a high probability of being DE to obtain meaningful results in downstream bioinformatic analyses. Based on the relationship between technical variation and (true) biological DE of an unknown share of proteins, we propose the “Normics” algorithm: proteins are ranked based on their expression level–corrected variance and their mean correlation with all other proteins. The latter serves as a novel indicator of the non-DE likelihood of a protein in a given dataset. Subsequent normalization is based on a subset of non-DE proteins only. No a priori information such as batch, clinical, or replicate group is necessary. Simulation data demonstrated robust and superior performance across a wide range of stochastically chosen parameters. Five publicly available spike-in and biologically variant datasets were normalized reliably and with quantitative accuracy by Normics, with improved performance compared with standard variance stabilization as well as median, quantile, and LOESS normalizations. In complex biological datasets, Normics correctly determined as DE those proteins that had been cross-validated by an independent transcriptome analysis of the same samples. In both complex datasets, Normics identified the most DE proteins. We demonstrate that combining variance analysis and data-inherent correlation structure to identify non-DE proteins improves data normalization. Standard normalization algorithms can be consolidated against high shares of (one-sided) biological regulation. The statistical power of downstream analyses can be increased by focusing on Normics-selected subsets of high DE likelihood.
Normics is a tool for normalizing proteomic data that builds on existing algorithms. It specifically addresses data with high shares of differential expression, combines variance and data-inherent correlation structure, provides a ranking of differential expression likelihood, and enables normalization based on the most stable proteins.
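The rank-then-normalize idea described in the Normics abstract can be illustrated with a small Python sketch. It is not the published Normics implementation: the expression-level correction is approximated here by dividing each protein's variance by the median variance of its k nearest proteins in mean intensity, the two criteria are combined by simply summing ranks, and the final step is plain median-centring of each sample on the selected subset. The function names, the parameter k and the toy data are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def corrected_variance(data, k=3):
    """Variance of each protein divided by the median variance of its k nearest
    proteins in mean intensity (including itself): a crude stand-in for an
    expression-level trend correction."""
    stats = {p: (statistics.fmean(v), statistics.variance(v)) for p, v in data.items()}
    out = {}
    for p, (m, var) in stats.items():
        nearest = sorted(stats.values(), key=lambda mv: abs(mv[0] - m))[:k]
        trend = statistics.median([v for _, v in nearest])
        out[p] = var / trend if trend > 0 else var
    return out


def mean_correlation(data):
    """Mean Pearson correlation of each protein with all other proteins."""
    return {
        p: statistics.fmean([pearson(x, y) for q, y in data.items() if q != p])
        for p, x in data.items()
    }


def rank_non_de(data, k=3):
    """Order proteins from most to least likely non-DE (low variance, high correlation)."""
    cv, mc = corrected_variance(data, k), mean_correlation(data)
    var_rank = {p: i for i, p in enumerate(sorted(data, key=cv.get))}
    cor_rank = {p: i for i, p in enumerate(sorted(data, key=lambda p: -mc[p]))}
    return sorted(data, key=lambda p: var_rank[p] + cor_rank[p])


def normalize(data, n_stable, k=3):
    """Median-centre every sample on the n_stable top-ranked (putatively non-DE) proteins."""
    stable = rank_non_de(data, k)[:n_stable]
    n_samples = len(next(iter(data.values())))
    medians = [statistics.median([data[p][j] for p in stable]) for j in range(n_samples)]
    centre = statistics.fmean(medians)
    normed = {p: [x - medians[j] + centre for j, x in enumerate(v)] for p, v in data.items()}
    return normed, stable


# Toy log2 intensities in 4 samples, each sample carrying a technical offset.
offsets = [0.0, 0.8, -0.3, 0.5]
noise = {
    "P1": [0.05, -0.02, 0.01, -0.04],
    "P2": [-0.03, 0.04, -0.01, 0.02],
    "P3": [0.02, 0.01, -0.05, 0.03],
    "P4": [0.00, -0.03, 0.04, -0.01],
    "P5": [0.01, 0.02, -0.02, -0.01],
}
bases = {"P1": 20.0, "P2": 22.0, "P3": 24.0, "P4": 21.0, "P5": 23.0}
data = {p: [bases[p] + o + e for o, e in zip(offsets, noise[p])] for p in bases}
# One differentially expressed protein: +3 log2 units in samples 3 and 4.
data["DE1"] = [22.5 + o + d for o, d in zip(offsets, [0.0, 0.0, 3.0, 3.0])]

normed, stable = normalize(data, n_stable=4)
print("normalization subset:", stable)
```

In this toy example the DE protein has both the highest corrected variance and the lowest mean correlation, so it falls to the bottom of the ranking and is left out of the normalization subset; with real data the subset size and the trend correction would need more care.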