1
|
A framework for quality control in quantitative proteomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.12.589318. [PMID: 38645098 PMCID: PMC11030400 DOI: 10.1101/2024.04.12.589318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
A thorough evaluation of the quality, reproducibility, and variability of bottom-up proteomics data is necessary at every stage of a workflow from planning to analysis. We share real-world case studies applying adaptable quality control (QC) measures to assess sample preparation, system function, and quantitative analysis. System suitability samples are repeatedly measured longitudinally with targeted methods, and we share examples where they are used on three instrument platforms to identify severe system failures and track function over months to years. Internal QCs incorporated at protein and peptide-level allow our team to assess sample preparation issues and to differentiate system failures from sample-specific issues. External QC samples prepared alongside our experimental samples are used to verify the consistency and quantitative potential of our results during batch correction and normalization before assessing biological phenotypes. We combine these controls with rapid analysis using Skyline, longitudinal QC metrics using AutoQC, and server-based data deposition using PanoramaWeb. We propose that this integrated approach to QC be used as a starting point for groups to facilitate rapid quality control assessment to ensure that valuable instrument time is used to collect the best quality data possible.
|
2
|
SEAOP: a statistical ensemble approach for outlier detection in quantitative proteomics data. Brief Bioinform 2024; 25:bbae129. [PMID: 38557674 PMCID: PMC10982946 DOI: 10.1093/bib/bbae129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 02/01/2024] [Accepted: 03/07/2024] [Indexed: 04/04/2024]
Abstract
Quality control in quantitative proteomics is a persistent challenge, particularly in identifying and managing outliers. Unsupervised learning models, which rely on data structure rather than predefined labels, offer potential solutions. However, without clear labels, their effectiveness might be compromised. Single models are susceptible to the randomness of parameters and initialization, which can result in a high rate of false positives. Ensemble models, on the other hand, have shown capabilities in effectively mitigating the impacts of such randomness and assisting in accurately detecting true outliers. Therefore, we introduced SEAOP, a Python toolbox that utilizes an ensemble mechanism by integrating multi-round data management and a statistics-based decision pipeline with multiple models. Specifically, SEAOP uses multi-round resampling to create diverse sub-data spaces and employs outlier detection methods to identify candidate outliers in each space. Candidates are then aggregated as confirmed outliers via a chi-square test, adhering to a 95% confidence level, to ensure the precision of the unsupervised approaches. Additionally, SEAOP introduces a visualization strategy, specifically designed to intuitively and effectively display the distribution of both outlier and non-outlier samples. Optimal hyperparameter models of SEAOP for outlier detection were identified by using a gradient-simulated standard dataset and Mann-Kendall trend test. The performance of the SEAOP toolbox was evaluated using three experimental datasets, confirming its reliability and accuracy in handling quantitative proteomics.
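The resample-detect-aggregate idea described in this abstract can be sketched as follows. This is not SEAOP's code or API: the base detector (a median/MAD robust z-score on one summary value per sample), the assumed null flag rate of 5%, and all function names are illustrative assumptions. Only the overall pattern, repeated resampling followed by a chi-square decision at the 95% confidence level, comes from the abstract.

```python
import math
import random
import statistics


def mad_outliers(values, threshold=3.5):
    """Positions whose robust z-score (median/MAD) exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return set()
    return {i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold}


def chi2_sf_1df(x):
    """Survival function of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def ensemble_outliers(values, rounds=200, frac=0.7, null_rate=0.05,
                      alpha=0.05, seed=0):
    """Confirm outliers by aggregating flags over many random sub-samples.

    null_rate is an assumed background flag rate (hypothetical choice);
    a sample is confirmed when it is flagged significantly more often
    than that rate according to a 1-df chi-square goodness-of-fit test.
    """
    rng = random.Random(seed)
    n = len(values)
    k = max(3, int(frac * n))
    drawn = [0] * n      # how often each sample was part of a sub-sample
    flagged = [0] * n    # how often it was flagged when drawn
    for _ in range(rounds):
        subset = rng.sample(range(n), k)
        hits = mad_outliers([values[i] for i in subset])
        for pos, i in enumerate(subset):
            drawn[i] += 1
            if pos in hits:
                flagged[i] += 1

    confirmed = []
    for i in range(n):
        if drawn[i] == 0:
            continue
        exp_flag = null_rate * drawn[i]
        exp_ok = drawn[i] - exp_flag
        obs_flag = flagged[i]
        obs_ok = drawn[i] - obs_flag
        chi2 = ((obs_flag - exp_flag) ** 2 / exp_flag
                + (obs_ok - exp_ok) ** 2 / exp_ok)
        if obs_flag > exp_flag and chi2_sf_1df(chi2) < alpha:
            confirmed.append(i)
    return confirmed


# Hypothetical per-sample summary values (e.g. median log-intensity).
summary = [10.0, 10.1, 9.9, 10.2, 9.8, 10.05, 9.95, 10.15, 9.85, 14.0]
print(ensemble_outliers(summary))
```

Aggregating over sub-samples means a sample is confirmed only if it is flagged consistently, which is the property the abstract attributes to ensemble approaches.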
|
3
|
Observations from the Proteomics Bench. Proteomes 2024; 12:6. [PMID: 38390966 PMCID: PMC10885119 DOI: 10.3390/proteomes12010006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 01/26/2024] [Accepted: 02/02/2024] [Indexed: 02/24/2024]
Abstract
Many challenges in proteomics result from the high-throughput nature of the experiments. This paper first presents pre-analytical problems, which still occur, although the call for standardization in omics has been ongoing for many years. This article also discusses aspects that affect bioinformatic analysis based on three sets of reference data measured with different orbitrap instruments. Despite continuous advances in mass spectrometer technology as well as analysis software, data-set-wise quality control is still necessary, and decoy-based estimation, although challenged by modern instruments, should be utilized. We draw attention to the fact that numerous young researchers perceive proteomics as a mature, readily applicable technology. However, it is important to emphasize that the maximum potential of the technology can only be realized by an educated handling of its limitations.
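For readers unfamiliar with decoy-based estimation, a minimal sketch of the standard target-decoy calculation is given below. The input layout (a list of `(score, is_decoy)` pairs) and the function name are assumptions made for illustration; the paper's own pipeline and settings are not described in this abstract.

```python
def fdr_score_cutoff(psms, max_fdr=0.01):
    """Return the lowest score cutoff at which decoys/targets <= max_fdr.

    psms is a list of (score, is_decoy) pairs, higher score = better match.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything accepted down to this score.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical peptide-spectrum matches: (search score, is_decoy).
example = [(10, False), (9, False), (8, False), (7, True), (6, False)]
print(fdr_score_cutoff(example, max_fdr=0.2))
```

The estimate assumes decoy hits model incorrect target hits one-to-one, which is the usual premise of the approach and is only as good as the decoy database.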
|
4
|
Integration of data-independent acquisition (DIA) with co-fractionation mass spectrometry (CF-MS) to enhance interactome mapping capabilities. Proteomics 2023; 23:e2200278. [PMID: 37144656 DOI: 10.1002/pmic.202200278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 04/03/2023] [Accepted: 04/14/2023] [Indexed: 05/06/2023]
Abstract
Proteomics technologies are continually advancing, providing opportunities to develop stronger and more robust protein interaction networks (PINs). In part, this is due to the ever-growing number of high-throughput proteomics methods that are available. This review discusses how data-independent acquisition (DIA) and co-fractionation mass spectrometry (CF-MS) can be integrated to enhance interactome mapping abilities. Furthermore, integrating these two techniques can improve data quality and network generation through extended protein coverage, less missing data, and reduced noise. CF-DIA-MS shows promise in expanding our knowledge of interactomes, notably for non-model organisms (NMOs). CF-MS is a valuable technique on its own, but upon the integration of DIA, the potential to develop robust PINs increases, offering a unique approach for researchers to gain an in-depth understanding into the dynamics of numerous biological processes.
|
5
|
Evaluating cPILOT Data toward Quality Control Implementation. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2023; 34:1741-1752. [PMID: 37459602 DOI: 10.1021/jasms.3c00179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/03/2023]
Abstract
Multiplexing enables the monitoring of hundreds to thousands of proteins in quantitative proteomics analyses and increases sample throughput. In most mass-spectrometry-based proteomics workflows, multiplexing is achieved by labeling biological samples with heavy isotopes via precursor isotopic labeling or isobaric tagging. Enhanced multiplexing strategies, such as combined precursor isotopic labeling and isobaric tagging (cPILOT), combine multiple technologies to afford an even higher sample throughput. Critical to enhanced multiplexing analyses is ensuring that analytical performance is optimal and that missingness of sample channels is minimized. Automation of sample preparation steps and use of quality control (QC) metrics can be incorporated into multiplexing analyses and reduce the likelihood of missing information, thus maximizing the amount of usable quantitative data. Here, we implemented QC metrics previously developed in our laboratory to evaluate a 36-plex cPILOT experiment that encompassed 144 mouse samples of various tissue types, time points, genotypes, and biological replicates. The evaluation focuses on the use of a sample pool generated from all samples in the experiment to monitor the daily instrument performance and to provide a means for data normalization across sample batches. Our results show that tracking QC metrics enabled the quantification of ∼7000 proteins in each sample batch, of which ∼70% had minimal missing values across up to 36 sample channels. Implementation of QC metrics for future cPILOT studies as well as other enhanced multiplexing strategies will help yield high-quality data sets.
|
6
|
Establishing Quality Control Procedures for Large-Scale Plasma Proteomics Analyses. JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2023. [PMID: 37163770 DOI: 10.1021/jasms.3c00050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
Proteomics research has been transformed by high-throughput liquid chromatography-tandem mass spectrometry (LC-MS/MS) instruments combined with highly sophisticated automated sample preparation and multiplexing workflows. However, scaling proteomics experiments to large sample cohorts (hundreds to thousands) requires thoughtful quality control (QC) protocols. Robust QC protocols can improve reproducibility and quantitative accuracy and provide opportunities for more decisive troubleshooting. Our laboratory conducted a plasma proteomics study of a cohort of N = 335 patient samples using tandem mass tag (TMTpro) 16-plex batches. Over the course of a 10-month data acquisition period for this cohort we collected 271 pooled QC LC-MS/MS result files obtained from MS/MS analysis of a patient-derived pooled plasma sample, representative of the entire cohort population. This sample was tagged with TMTzero or TMTpro reagents and used to inform the daily performance of the LC-MS/MS instruments and to allow within and across sample batch normalization. Analytical variability of a number of instrumental and data analysis metrics, including protein and peptide identifications, peptide spectral matches (PSMs), number of obtained MS/MS spectra, average peptide abundance, percent of peptides with a Δ m/z between ±0.003 Da, percent of MS/MS spectra obtained at the maximum injection time, and the retention time of selected tracking peptides, was evaluated to help inform the design of a robust LC-MS/MS QC workflow for use in future cohort studies. This study also led to general tips for using selected metrics to inform real-time troubleshooting of LC-MS/MS performance issues with daily QC checks.
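One simple way a pooled QC sample can support across-batch normalization is median scaling against the pool, sketched below. The abstract does not state which normalization the authors used, so the scaling scheme, the data layout, and the function names here are assumptions for illustration only.

```python
import statistics


def pool_scaling_factors(pool_by_batch):
    """Per-batch factor that maps each batch's pool median onto a common reference.

    pool_by_batch maps a batch label to the pooled-QC intensities measured
    in that batch. The reference is the median of the batch medians.
    """
    batch_medians = {b: statistics.median(v) for b, v in pool_by_batch.items()}
    reference = statistics.median(batch_medians.values())
    return {b: reference / m for b, m in batch_medians.items()}


def normalize(samples_by_batch, factors):
    """Apply each batch's scaling factor to its sample intensities."""
    return {b: [x * factors[b] for x in xs]
            for b, xs in samples_by_batch.items()}


# Hypothetical intensities: the same pool reads differently in each batch.
pool = {"A": [100.0, 100.0, 100.0], "B": [200.0, 200.0, 200.0], "C": [50.0, 50.0, 50.0]}
samples = {"A": [10.0], "B": [20.0], "C": [5.0]}
factors = pool_scaling_factors(pool)
print(normalize(samples, factors))
```

Because the pool is nominally identical in every batch, any difference in its measured intensity is treated as technical and divided out of the experimental samples.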
|
7
|
Histologic and proteomic remodeling of the pulmonary veins and arteries in a porcine model of chronic pulmonary venous hypertension. Cardiovasc Res 2023; 119:268-282. [PMID: 35022664 DOI: 10.1093/cvr/cvac005] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/12/2021] [Revised: 11/15/2021] [Accepted: 01/10/2022] [Indexed: 11/14/2022]
Abstract
AIMS In heart failure (HF), pulmonary venous hypertension (PVH) produces pulmonary hypertension (PH) with remodeling of pulmonary veins (PV) and arteries (PA). In a porcine PVH model, we performed proteomic-based bioinformatics to investigate unique pathophysiologic mechanisms mediating PA and PV remodeling. METHODS AND RESULTS Large PV were banded (PVH, n = 10) or not (Sham, n = 9) in piglets. At sacrifice, PV and PA were perfusion labelled for vessel-specific histology and proteomics. The PA and PV were separately sampled with laser-capture micro-dissection for mass spectrometry. Pulmonary vascular resistance [Wood Units; 8.6 (95% confidence interval: 6.3, 12.3) vs. 2.0 (1.7, 2.3)] and PA [19.9 (standard error of mean, 1.1) vs. 10.3 (1.1)] and PV [14.2 (1.2) vs. 7.6 (1.1)] wall thickness/external diameter (%) were increased in PVH (P < 0.05 for all). Similar numbers of proteins were identified in PA (2093) and PV (2085) with 94% overlap, but biological processes differed. There were more differentially expressed proteins (287 vs. 161), altered canonical pathways (17 vs. 3), and predicted upstream regulators (PUSR; 22 vs. 6) in PV than PA. In PA and PV, bioinformatics indicated activation of the integrated stress response and mammalian target of rapamycin signalling with dysregulated growth. In PV, there was also activation of Rho/Rho-kinase signalling with decreased actin cytoskeletal signalling and altered tight and adherens junctions, ephrin B, and caveolae-mediated endocytosis signalling; all indicating disrupted endothelial barrier function. Indeed, protein biomarkers and the top PUSR in PV (transforming growth factor-beta) suggested endothelial to mesenchymal transition in PV. Findings were similar in human autopsy specimens. CONCLUSION These findings provide new therapeutic targets to oppose pulmonary vascular remodeling in HF-related PH.
|
8
|
Quality Control—A Stepchild in Quantitative Proteomics: A Case Study for the Human CSF Proteome. Biomolecules 2023; 13:biom13030491. [PMID: 36979426 PMCID: PMC10046854 DOI: 10.3390/biom13030491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 02/08/2023] [Accepted: 03/01/2023] [Indexed: 03/09/2023]
Abstract
Proteomic studies using mass spectrometry (MS)-based quantification are a main approach to the discovery of new biomarkers. However, a number of analytical conditions before and during MS data acquisition can affect the accuracy of the obtained outcome. Therefore, comprehensive quality assessment of the acquired data plays a central role in quantitative proteomics, though, due to the immense complexity of MS data, it is often neglected. Here, we take a practical approach to the quality assessment of quantitative MS data, describing key steps for the evaluation, including the levels of raw data, identification and quantification. With this, four independent datasets from cerebrospinal fluid, an important biofluid for neurodegenerative disease biomarker studies, were assessed, demonstrating that sample processing-based differences are already reflected at all three levels but with varying impacts on the quality of the quantitative data. Specifically, we provide guidance to critically interpret the quality of MS data for quantitative proteomics. Moreover, we provide the free and open source quality control tool MaCProQC, enabling systematic, rapid and uncomplicated data comparison of raw data, identification and feature detection levels through defined quality metrics and a step-by-step quality control workflow.
|
9
|
Review of the Use of Liquid Chromatography-Tandem Mass Spectrometry in Clinical Laboratories: Part II-Operations. Ann Lab Med 2022; 42:531-557. [PMID: 35470272 PMCID: PMC9057814 DOI: 10.3343/alm.2022.42.5.531] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/10/2021] [Revised: 02/08/2022] [Accepted: 04/13/2022] [Indexed: 11/19/2022]
Abstract
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is increasingly utilized in clinical laboratories because it has advantages in terms of specificity and sensitivity over other analytical technologies. These advantages come with additional responsibilities and challenges given that many assays and platforms are not provided to laboratories as a single kit or device. The skills, staff, and assays used in LC-MS/MS are internally developed by the laboratory, with relatively few exceptions. Hence, a laboratory that deploys LC-MS/MS assays must be conscientious of the practices and procedures adopted to overcome the challenges associated with the technology. This review discusses the post-development landscape of LC-MS/MS assays, including validation, quality assurance, operations, and troubleshooting. The content knowledge of LC-MS/MS users is quite broad and deep and spans multiple scientific fields, including biology, clinical chemistry, chromatography, engineering, and MS. However, there are no formal academic programs or specific literature to train laboratory staff on the fundamentals of LC-MS/MS beyond the reports on method development. Therefore, depending on their experience level, some readers may be familiar with aspects of the laboratory practices described herein, while others may be not. This review endeavors to assemble aspects of LC-MS/MS operations in the clinical laboratory to provide a framework for the thoughtful development and execution of LC-MS/MS applications.
|
10
|
Proteomic Discovery and Validation of Novel Fluid Biomarkers for Improved Patient Selection and Prediction of Clinical Outcomes in Alzheimer’s Disease Patient Cohorts. Proteomes 2022; 10:proteomes10030026. [PMID: 35997438 PMCID: PMC9397030 DOI: 10.3390/proteomes10030026] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/18/2022] [Revised: 07/13/2022] [Accepted: 07/23/2022] [Indexed: 01/25/2023]
Abstract
Alzheimer’s disease (AD) is an irreversible neurodegenerative disease characterized by progressive cognitive decline. The two cardinal neuropathological hallmarks of AD include the buildup of cerebral β amyloid (Aβ) plaques and neurofibrillary tangles of hyperphosphorylated tau. The current disease-modifying treatments are still not effective enough to lower the rate of cognitive decline. There is an urgent need to identify early detection and disease progression biomarkers that can facilitate AD drug development. The current established readouts based on the expression levels of amyloid beta, tau, and phospho-tau have shown many discrepancies in patient samples when linked to disease progression. There is an urgent need to identify diagnostic and disease progression biomarkers from blood, cerebrospinal fluid (CSF), or other biofluids that can facilitate the early detection of the disease and provide pharmacodynamic readouts for new drugs being tested in clinical trials. Advances in proteomic approaches using state-of-the-art mass spectrometry are now being increasingly applied to study AD disease mechanisms and identify drug targets and novel disease biomarkers. In this report, we describe the application of quantitative proteomic approaches for understanding AD pathophysiology, summarize the current knowledge gained from proteomic investigations of AD, and discuss the development and validation of new predictive and diagnostic disease biomarkers.
|
11
|
A Sensitive and Controlled Data-Independent Acquisition Method for Proteomic Analysis of Cell Therapies. J Proteome Res 2022; 21:1229-1239. [PMID: 35404046 PMCID: PMC9087334 DOI: 10.1021/acs.jproteome.1c00887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Indexed: 11/29/2022]
Abstract
Mass spectrometry (MS)-based proteomic measurements are uniquely poised to impact the development of cell and gene therapies. With the adoption of rigorous instrumental performance qualifications (PQs), large-scale proteomics can move from a research to a manufacturing control tool. Especially suited, data-independent acquisition (DIA) approaches have distinctive qualities to extend multiattribute method (MAM) principles to characterize the proteome of cell therapies. Here, we describe the development of a DIA method for the sensitive identification and quantification of proteins on a Q-TOF instrument. Using the improved acquisition parameters, we defined a control strategy and highlighted some metrics to improve the reproducibility of SWATH acquisition-based proteomic measurements. Finally, we applied the method to analyze the proteome of Jurkat cells that here serves as a model for human T-cells. Raw and processed data were deposited in PRIDE (PXD029780).
|
12
|
A practical guide to interpreting and generating bottom-up proteomics data visualizations. Proteomics 2022; 22:e2100103. [PMID: 35107884 DOI: 10.1002/pmic.202100103] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/22/2021] [Revised: 12/22/2021] [Accepted: 01/20/2022] [Indexed: 11/10/2022]
Abstract
Mass-spectrometry based bottom-up proteomics is the main method to analyze proteomes comprehensively and the rapid evolution of instrumentation and data analysis has made the technology widely available. Data visualization is an integral part of the analysis process and it is crucial for the communication of results. This is a major challenge due to the immense complexity of MS data. In this review, we provide an overview of commonly used visualizations, starting with raw data of traditional and novel MS technologies, then basic peptide and protein level analyses, and finally visualization of highly complex datasets and networks. We specifically provide guidance on how to critically interpret and discuss the multitude of different proteomics data visualizations. Furthermore, we highlight Python-based libraries and other open science tools that can be applied for independent and transparent generation of customized visualizations. To further encourage programmatic data visualization, we provide the Python code used to generate all data Figures in this review on GitHub (https://github.com/MannLabs/ProteomicsVisualization).
|
13
|
Abstract
Every laboratory performing mass-spectrometry-based proteomics strives to generate high-quality data. Among the many factors that impact the outcome of any experiment in proteomics is the LC–MS system performance, which should be monitored within each specific experiment and also long term. This process is termed quality control (QC). We present an easy-to-use tool that rapidly produces a visual, HTML-based report that includes the key parameters needed to monitor the LC–MS system performance, with a focus on monitoring the performance within an experiment. The tool, named RawBeans, generates a report for individual files or for a set of samples from a whole experiment. We anticipate that it will help proteomics users and experts evaluate raw data quality independent of data processing. The tool is available at https://bitbucket.org/incpm/prot-qc/downloads. The mass-spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifier PXD022816.
|
14
|
Creation and filtering of a recurrent spectral library of CHO cell metabolites and media components. Biotechnol Bioeng 2021; 118:1491-1510. [PMID: 33404064 PMCID: PMC8048470 DOI: 10.1002/bit.27661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/22/2020] [Revised: 12/02/2020] [Accepted: 12/13/2020] [Indexed: 02/02/2023]
Abstract
This paper reports the first implementation of a new type of mass spectral library for the analysis of Chinese hamster ovary (CHO) cell metabolites that allows users to quickly identify most compounds in any complex metabolite sample. We also describe an annotation methodology developed to filter out artifacts and low‐quality spectra from recurrent unidentified spectra of metabolites. CHO cells are commonly used to produce biological therapeutics. Metabolic profiles of CHO cells and media can be used to monitor process variability and look for markers that discriminate between batches of product. We have created a comprehensive library of both identified and unidentified metabolites derived from CHO cells that can be used in conjunction with tandem mass spectrometry to identify metabolites. In addition, we present a workflow that can be used for assigning confidence to a NIST MS/MS Library search match based on prior probability of general utility. The goal of our work is to annotate and identify (when possible), all liquid chromatography‐mass spectrometry generated metabolite ions as well as create automatable library building and identification pipelines for use by others in the field.
|
15
|
PTM-Shepherd: Analysis and Summarization of Post-Translational and Chemical Modifications From Open Search Results. Mol Cell Proteomics 2020; 20:100018. [PMID: 33568339 PMCID: PMC7950090 DOI: 10.1074/mcp.tir120.002216] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 07/08/2020] [Revised: 11/13/2020] [Accepted: 12/01/2020] [Indexed: 01/17/2023]
Abstract
Open searching has proven to be an effective strategy for identifying both known and unknown modifications in shotgun proteomics experiments. Rather than being limited to a small set of user-specified modifications, open searches identify peptides with any mass shift that may correspond to a single modification or a combination of several modifications. Here we present PTM-Shepherd, a bioinformatics tool that automates characterization of post-translational modification profiles detected in open searches based on attributes, such as amino acid localization, fragmentation spectra similarity, retention time shifts, and relative modification rates. PTM-Shepherd can also perform multiexperiment comparisons for studying changes in modification profiles, e.g., in data generated in different laboratories or under different conditions. We demonstrate how PTM-Shepherd improves the analysis of data from formalin-fixed and paraffin-embedded samples, detects extreme underalkylation of cysteine in some data sets, discovers an artifactual modification introduced during peptide synthesis, and uncovers site-specific biases in sample preparation artifacts in a multicenter proteomics profiling study.
|
16
|
Abstract
Data-independent acquisition (DIA) is a promising technique for the proteomic analysis of complex protein samples. A number of studies have claimed that DIA experiments are more reproducible than data-dependent acquisition (DDA), but these claims are unsubstantiated since different data analysis methods are used for the two acquisition methods. Data analysis in most DIA workflows depends on spectral library searches, whereas DDA typically employs sequence database searches. In this study, we examined the reproducibility of the DIA and DDA results using both sequence database and spectral library search. The comparison was first performed using a cell lysate and then extended to an interactome study. Protein overlap among the technical replicates in both DDA and DIA experiments was 30% higher with library-based identifications than with sequence database identifications. The reproducibility of quantification was also improved with library search compared to database search, with the mean of the coefficient of variation decreasing more than 30% and a reduction in the number of missing values of more than 35%. Our results show that regardless of the acquisition method, higher identification and quantification reproducibility is observed when library search is used.
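The two quantification-reproducibility measures named in this abstract, the mean coefficient of variation across replicates and the number of missing values, can be computed as in the sketch below. The table layout (protein to list of replicate intensities, with `None` for a missing value) and the function names are assumptions for illustration, not the study's code.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (%) over observed replicate values, or None."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return None
    mean = statistics.mean(observed)
    if mean == 0:
        return None
    return 100.0 * statistics.stdev(observed) / mean


def reproducibility_summary(table):
    """Mean CV (%) across proteins and the fraction of missing values.

    table maps a protein identifier to its replicate intensities,
    with None marking a missing value.
    """
    cvs = [cv for cv in (cv_percent(v) for v in table.values()) if cv is not None]
    total = sum(len(v) for v in table.values())
    missing = sum(x is None for v in table.values() for x in v)
    return {
        "mean_cv_percent": statistics.mean(cvs) if cvs else None,
        "missing_fraction": missing / total if total else 0.0,
    }


# Hypothetical replicate intensities for two proteins.
quant = {"P1": [100.0, 110.0, 90.0], "P2": [50.0, None, 50.0]}
print(reproducibility_summary(quant))
```

Running the same summary on the database-search and library-search outputs of one data set gives a like-for-like comparison of the kind the abstract reports.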
|
17
|
Simple Peptide Quantification Approach for MS-Based Proteomics Quality Control. ACS OMEGA 2020; 5:6754-6762. [PMID: 32258910 PMCID: PMC7114614 DOI: 10.1021/acsomega.0c00080] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/07/2020] [Accepted: 03/04/2020] [Indexed: 06/11/2023]
Abstract
Despite its growing popularity and use, bottom-up proteomics remains a complex analytical methodology. Its general workflow consists of three main steps: sample preparation, liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS), and computational data analysis. Quality assessment of the different steps and components of this workflow is instrumental to identify technical flaws and avoid loss of precious measurement time and sample material. However, assessment of the extent of sample losses along the sample preparation protocol, in particular, after proteolytic digestion, is not yet routinely implemented because of the lack of an accurate and straightforward method to quantify peptides. Here, we report on the use of a microfluidic UV/visible spectrophotometer to quantify MS-ready peptides directly in the MS-loading solvent, consuming only 2 μL of sample. We compared the performance of the microfluidic spectrophotometer with a standard device and determined the optimal sample amount for LC-MS/MS analysis on a Q Exactive HF mass spectrometer using a dilution series of a commercial K562 cell digest. A careful evaluation of selected LC and MS parameters allowed us to define 3 μg as an optimal peptide amount to be injected into this particular LC-MS/MS system. Finally, using tryptic digests from human HEK293T cells, we show that injecting equal peptide amounts, rather than approximate ones, results in less variable LC-MS/MS and protein quantification data. The obtained quality improvement together with easy implementation of the approach makes it possible to routinely quantify MS-ready peptides as a next step in daily proteomics quality control.
|
18
|
Mass Spectrometry Advances and Perspectives for the Characterization of Emerging Adoptive Cell Therapies. Molecules 2020; 25:E1396. [PMID: 32204371 PMCID: PMC7144572 DOI: 10.3390/molecules25061396] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/17/2020] [Revised: 03/06/2020] [Accepted: 03/11/2020] [Indexed: 12/12/2022]
Abstract
Adoptive cell therapy is an emerging anti-cancer modality, whereby the patient's own immune cells are engineered to express T-cell receptor (TCR) or chimeric antigen receptor (CAR). CAR-T cell therapies have advanced the furthest, with the recent approval by the Food and Drug Administration of two treatments, Kymriah (tisagenlecleucel) and Yescarta (axicabtagene ciloleucel). Recent developments in proteomic analysis by mass spectrometry (MS) make this technology uniquely suited to enable the comprehensive identification and quantification of the relevant biochemical architecture of CAR-T cell therapies and fulfill current unmet needs for CAR-T product knowledge. These advances include improved sample preparation methods, enhanced separation technologies, and extension of MS-based proteomics to single cells. Innovative technologies such as proteomic analysis of raw material quality attributes (MQA) and final product quality attributes (PQA) may provide insights that could ultimately fuel development strategies and lead to broad implementation.
|
19
|
Abstract
Mass spectrometry-based proteomics is an invaluable tool for addressing important biological questions. Data-dependent acquisition methods effectuate stochastic acquisition of data in complex mixtures, which results in missing identifications across replicates. We developed a search approach that improves the reproducibility of data acquired from any mass spectrometer. In our approach, a spectral library is built from the identification results from a database search, and then, the library is used to re-search the same data files to obtain the final result. We showed that higher identification and quantification reproducibility is achieved with the dual-search approach than with a typical database search. Four datasets of different complexity were compared: (1) data from a cell lysate study performed in our lab, (2) data from an interactome study performed in our lab, (3) a publicly available extracellular vesicles dataset, and (4) a publicly available phosphoproteomics dataset. Our results show that the dual-search approach can be widely and easily used to improve data quality in proteomics data.
Collapse
|
20
|
viQC: Visual and Intuitive Quality Control for Mass Spectrometry-Based Proteome Analysis. JOURNAL OF ANALYTICAL CHEMISTRY 2019. [DOI: 10.1134/s1061934819140119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
|
21
|
Review of Issues and Solutions to Data Analysis Reproducibility and Data Quality in Clinical Proteomics. Methods Mol Biol 2019. [PMID: 31552637 DOI: 10.1007/978-1-4939-9744-2_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
In any analytical discipline, data analysis reproducibility is closely interlinked with data quality. In this book chapter focused on mass spectrometry-based proteomics approaches, we introduce how data analysis reproducibility and data quality can influence each other and how data quality and data analysis designs can be used to increase robustness and improve reproducibility. We first introduce methods and concepts to design and maintain robust data analysis pipelines such that reproducibility can be increased in parallel. The technical aspects related to data analysis reproducibility are challenging, and current ways to increase the overall robustness are multifaceted. Software containerization and cloud infrastructures play an important part. We will also show how quality control (QC) and quality assessment (QA) approaches can be used to spot analytical issues, reduce the experimental variability, and increase confidence in the analytical results of (clinical) proteomics studies, since experimental variability plays a substantial role in analysis reproducibility. Therefore, we give an overview of existing solutions for QC/QA, including different quality metrics and methods for longitudinal monitoring. The efficient use of both types of approaches undoubtedly provides a way to improve the experimental reliability, reproducibility, and level of consistency in proteomics analytical measurements.
|
22
|
Targeted multiplex proteomics for molecular prescreening and biomarker discovery in metastatic colorectal cancer. Sci Rep 2019; 9:13568. [PMID: 31537838 PMCID: PMC6753065 DOI: 10.1038/s41598-019-49867-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/14/2019] [Accepted: 08/30/2019] [Indexed: 12/27/2022] Open
Abstract
Protein biomarkers are widely used in cancer diagnosis, prognosis, and prediction of treatment response. Here we introduce the use of targeted multiplex proteomics (TMP) as a tool to simultaneously measure a panel of 54 proteins involved in oncogenic, tumour suppression, drug metabolism and resistance, in patients with metastatic colorectal cancer (mCRC). TMP provided valuable diagnostic information by unmasking an occult neuroendocrine differentiation and identifying a misclassified case based on abnormal proteins phenotype. No significant differences in protein levels between unpaired primary and metastatic samples were observed. Four proteins were found differentially expressed in KRAS-mutant as compared to wild-type tumours (overexpressed in mutant: KRAS, EGFR; overexpressed in wild-type: TOPO1, TOP2A). Survival analyses revealed the association between mesothelin expression and poor overall survival, whereas lack of PTEN protein expression associated with lower progression-free survival with anti-EGFR-based therapy in the first-line setting for patients with RAS wild-type tumour. Finally, outlier analysis identified putative targetable proteins in 65% of patients lacking a targetable genomic alteration. Our data show that TMP constitutes a promising, novel molecular prescreening tool in mCRC to identify protein expression alterations that may impact on patient outcomes and more precisely guide patient eligibility to clinical trials with novel targeted experimental therapies.
|
23
|
Mass Spectrometry Fingerprints of Small-Molecule Metabolites in Biofluids: Building a Spectral Library of Recurrent Spectra for Urine Analysis. Anal Chem 2019; 91:12021-12029. [PMID: 31424920 DOI: 10.1021/acs.analchem.9b02977] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/30/2022]
Abstract
A large fraction of ions observed in electrospray liquid chromatography-mass spectrometry (LC-ESI-MS) experiments of biological samples remain unidentified. One of the main reasons for this is that spectral libraries of pure compounds fail to account for the complexity of the metabolite profiling of complex materials. Recently, the NIST Mass Spectrometry Data Center has been developing a novel type of searchable mass spectral library that includes all recurrent unidentified spectra found in the sample profile. These libraries, in conjunction with the NIST tandem mass spectral library, allow analysts to explore most of the chemical space accessible to LC-MS analysis. In this work, we demonstrate how these libraries can provide a reliable fingerprint of the material by applying them to a variety of urine samples, including an extremely altered urine from cancer patients undergoing total body irradiation. The same workflow is applicable to any other biological fluid. The selected class of acylcarnitines is examined in detail, and derived libraries and related software are freely available. They are intended to serve as online resources for continuing community review and improvement.
|
24
|
Online porous graphic carbon chromatography coupled with tandem mass spectrometry for post-translational modification analysis. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2019; 33:1240-1247. [PMID: 31034685 DOI: 10.1002/rcm.8459] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/08/2018] [Revised: 04/08/2019] [Accepted: 04/10/2019] [Indexed: 06/09/2023]
Abstract
RATIONALE Porous graphitic carbon chromatography (PGC) retains tryptic peptides by a different mechanism than reversed-phase chromatography, and in this study we show that coupling PGC with tandem mass spectrometry offers advantages for the quantitation of phosphorylation stoichiometry and the characterization of site-specific glycosylation. METHODS Digests of protein standards (horse myoglobin, bovine fetuin and β-casein) were analyzed with a capillary liquid chromatography/tandem mass spectrometry (LC/MS/MS) system by coupling an Agilent 1100 HPLC system to a Synapt G2-Si HDMS (Waters). Peptides were separated using a HyperCarb PGC column (300 μm i.d. × 100 mm) packed with 3 μm particles. MS/MS data were collected in data-dependent mode and three MS/MS scans were acquired after the full MS scan. RAW data were transformed to .mgf by PLGS (Waters) and searched against the Swissprot database by Mascot. Chromatograms and MS/MS spectra of identified compounds were extracted with Masslynx (Waters) and imported to Origin for analysis. Glycan composition and peptide sequence were manually annotated. RESULTS PGC/MS/MS enabled accurate quantitation of the stoichiometry of specific phosphorylation sites from β-casein by efficient separation of the phosphopeptide and its non-phosphorylated counterpart, which cannot be achieved by reversed-phase chromatography. PGC/MS/MS also enabled comprehensive characterization of protein sialoglycosylation, as isomeric glycopeptides with different combinations of α2-3- and α2-6-linked sialic acids can be separated and the ratios of each combination were verified by exoglycosidase digestion. CONCLUSIONS PGC has demonstrated superior separation of peptides with phosphorylation and glycosylation and can be used as an alternative in the proteomic characterization of post-translational modifications (PTMs) by polar groups.
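As an illustration of the stoichiometry quantitation mentioned in this abstract, site occupancy is commonly estimated as the phosphopeptide peak area divided by the summed areas of the phosphorylated and non-phosphorylated forms. The sketch below is not taken from the paper: the function name is hypothetical, and it assumes the two forms are chromatographically resolved and respond equally in the mass spectrometer, which often does not hold without a correction factor.

```python
def phospho_stoichiometry(phospho_area: float, nonphospho_area: float) -> float:
    """Estimate the phosphorylated fraction of a site from two peak areas.

    Assumption (not from the source): both peptide forms have the same
    MS response, so raw peak areas can be compared directly.
    """
    if phospho_area < 0 or nonphospho_area < 0:
        raise ValueError("peak areas must be non-negative")
    total = phospho_area + nonphospho_area
    if total == 0:
        raise ValueError("at least one peak area must be positive")
    return phospho_area / total


# Made-up areas for illustration: 3 parts phosphorylated, 1 part unmodified.
occupancy = phospho_stoichiometry(3.0, 1.0)
```

With these made-up areas the estimated occupancy is 0.75.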
|
25
|
QCMAP: An Interactive Web-Tool for Performance Diagnosis and Prediction of LC-MS Systems. Proteomics 2019; 19:e1900068. [PMID: 31099962 DOI: 10.1002/pmic.201900068] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/18/2019] [Revised: 05/07/2019] [Indexed: 01/04/2023]
Abstract
The increasing role played by liquid chromatography-mass spectrometry (LC-MS)-based proteomics in biological discovery has led to a growing need for quality control (QC) on the LC-MS systems. While numerous quality control tools have been developed to track the performance of LC-MS systems based on a pre-defined set of performance factors (e.g., mass error, retention time), the precise influence and contribution of the performance factors and their generalization property to different biological samples are not as well characterized. Here, a web-based application (QCMAP) is developed for interactive diagnosis and prediction of the performance of LC-MS systems across different biological sample types. Leveraging on a standardized HeLa cell sample run as QC within a multi-user facility, predictive models are trained on a panel of commonly used performance factors to pinpoint the precise conditions to a (un)satisfactory performance in three LC-MS systems. It is demonstrated that the learned model can be applied to predict LC-MS system performance for brain samples generated from an independent study. By compiling these predictive models into our web-application, QCMAP allows users to benchmark the performance of their LC-MS systems using their own samples and identify key factors for instrument optimization. QCMAP is freely available from: http://shiny.maths.usyd.edu.au/QCMAP/.
|
26
|
Abstract
The performance of ultrasensitive liquid chromatography and tandem mass spectrometry (LC-MS/MS) methods, such as single-cell proteomics by mass spectrometry (SCoPE-MS), depends on multiple interdependent parameters. This interdependence makes it challenging to specifically pinpoint the sources of problems in the LC-MS/MS methods and approaches for resolving them. For example, a low signal at the MS2 level can be due to poor LC separation, ionization, apex targeting, ion transfer, or ion detection. We sought to specifically diagnose such problems by interactively visualizing data from all levels of bottom-up LC-MS/MS analysis. Many software packages, such as MaxQuant, already provide such data, and we developed an open source platform for their interactive visualization and analysis: Data-driven Optimization of MS (DO-MS). We found that in many cases DO-MS not only specifically diagnosed LC-MS/MS problems but also enabled us to rationally optimize them. For example, by using DO-MS to optimize the sampling of the elution peak apexes, we increased ion accumulation times and apex sampling, which resulted in a 370% more efficient delivery of ions for MS2 analysis. DO-MS is easy to install and use, and its GUI allows for interactive data subsetting and high-quality figure generation. The modular design of DO-MS facilitates customization and expansion. DO-MS v1.0.8 is available for download from GitHub: https://github.com/SlavovLab/DO-MS . Additional documentation is available at https://do-ms.slavovlab.net .
|
27
|
RawTools: Rapid and Dynamic Interrogation of Orbitrap Data Files for Mass Spectrometer System Management. J Proteome Res 2018; 18:700-708. [PMID: 30462513 DOI: 10.1021/acs.jproteome.8b00721] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Abstract
Optimizing the quality of proteomics data collected from a mass spectrometer (MS) requires careful selection of acquisition parameters and proper assessment of instrument performance. Software tools capable of extracting a broad set of information from raw files, including meta, scan, quantification, and identification data, are needed to provide guidance for MS system management. In this work, direct extraction and utilization of these data is demonstrated using RawTools, a standalone tool for extracting meta and scan data directly from raw MS files generated on Thermo Orbitrap instruments. RawTools generates summarized and detailed plain text outputs after parsing individual raw files, including scan rates and durations, duty cycle characteristics, precursor and reporter ion quantification, and chromatography performance. RawTools also contains a diagnostic module that includes an optional "preview" database search for facilitating informed decision-making related to optimization of MS performance based on a variety of metrics. RawTools has been developed in C# and utilizes the Thermo RawFileReader library and thus can process raw MS files with high speed and high efficiency on all major operating systems (Windows, MacOS, Linux). To demonstrate the utility of RawTools, the extraction of meta and scan data from both individual and large collections of raw MS files was carried out to identify problematic characteristics of instrument performance. Taken together, the combined rich feature-set of RawTools with the capability for interrogation of MS and experiment performance makes this software a valuable tool for proteomics researchers.
|
28
|
MSstatsQC 2.0: R/Bioconductor Package for Statistical Quality Control of Mass Spectrometry-Based Proteomics Experiments. J Proteome Res 2018; 18:678-686. [DOI: 10.1021/acs.jproteome.8b00732] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
|
29
|
Quality assessment and interference detection in targeted mass spectrometry data using machine learning. Clin Proteomics 2018; 15:33. [PMID: 30323719 PMCID: PMC6173846 DOI: 10.1186/s12014-018-9209-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/30/2018] [Accepted: 09/24/2018] [Indexed: 12/24/2022] Open
Abstract
Advances in the field of targeted proteomics and mass spectrometry have significantly improved assay sensitivity and multiplexing capacity. The high-throughput nature of targeted proteomics experiments has increased the rate of data production, which requires development of novel analytical tools to keep up with data processing demand. Currently, development and validation of targeted mass spectrometry assays require manual inspection of chromatographic peaks from large datasets to ensure quality, a process that is time consuming, prone to inter- and intra-operator variability and limits the efficiency of data interpretation from targeted proteomics analyses. To address this challenge, we have developed TargetedMSQC, an R package that facilitates quality control and verification of chromatographic peaks from targeted proteomics datasets. This tool calculates metrics to quantify several quality aspects of a chromatographic peak, e.g. symmetry, jaggedness and modality, co-elution and shape similarity of monitored transitions in a peak group, as well as the consistency of transitions’ ratios between endogenous analytes and isotopically labeled internal standards and consistency of retention time across multiple runs. The algorithm takes advantage of supervised machine learning to identify peaks with interference or poor chromatography based on a set of peaks that have been annotated by an expert analyst. Using TargetedMSQC to analyze targeted proteomics data reduces the time spent on manual inspection of peaks and improves both speed and accuracy of interference detection. Additionally, by allowing the analysts to customize the tool for application on different datasets, TargetedMSQC gives the users the flexibility to define the acceptable quality for specific datasets. Furthermore, automated and quantitative assessment of peak quality offers a more objective and systematic framework for high throughput analysis of targeted mass spectrometry assay datasets and is a step towards more robust and faster assay implementation.
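To make the peak-quality metrics named in this abstract more concrete, the following sketch computes two simple scores for a single chromatographic trace. The definitions here are illustrative assumptions, not the formulas used by TargetedMSQC: jaggedness is taken as the share of slope sign changes beyond the single change expected at the apex, and symmetry as the ratio of the smaller to the larger summed intensity on either side of the apex.

```python
from typing import Sequence


def jaggedness(intensities: Sequence[float]) -> float:
    """Share of slope sign changes beyond the one expected at the apex.

    Hypothetical definition for illustration: 0.0 means a smooth
    single-apex peak, values near 1.0 mean a highly jagged trace.
    """
    diffs = [b - a for a, b in zip(intensities, intensities[1:])]
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    changes = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    extra = max(changes - 1, 0)  # one sign change at the apex is expected
    return extra / max(len(diffs) - 1, 1)


def symmetry(intensities: Sequence[float]) -> float:
    """Ratio of smaller to larger summed intensity around the apex.

    Hypothetical definition for illustration: 1.0 means both sides of
    the apex carry equal signal.
    """
    apex = max(range(len(intensities)), key=lambda i: intensities[i])
    left = sum(intensities[:apex])
    right = sum(intensities[apex + 1:])
    if max(left, right) == 0:
        return 1.0
    return min(left, right) / max(left, right)


# Made-up traces: one smooth peak and one noisy peak.
smooth = [0, 1, 3, 6, 3, 1, 0]
noisy = [0, 3, 1, 6, 2, 4, 0]
```

In a supervised setting like the one the abstract describes, scores of this kind would serve as input features for a classifier trained on expert-annotated peaks.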
|
30
|
Quality control in mass spectrometry-based proteomics. MASS SPECTROMETRY REVIEWS 2018; 37:697-711. [PMID: 28802010 DOI: 10.1002/mas.21544] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 03/04/2017] [Revised: 07/24/2017] [Accepted: 07/24/2017] [Indexed: 05/21/2023]
Abstract
Mass spectrometry is a highly complex analytical technique and mass spectrometry-based proteomics experiments can be subject to a large variability, which forms an obstacle to obtaining accurate and reproducible results. Therefore, a comprehensive and systematic approach to quality control is an essential requirement to inspire confidence in the generated results. A typical mass spectrometry experiment consists of multiple different phases including the sample preparation, liquid chromatography, mass spectrometry, and bioinformatics stages. We review potential sources of variability that can impact the results of a mass spectrometry experiment occurring in all of these steps, and we discuss how to monitor and remedy the negative influences on the experimental results. Furthermore, we describe how specialized quality control samples of varying sample complexity can be incorporated into the experimental workflow and how they can be used to rigorously assess detailed aspects of the instrument performance.
|
31
|
Quality Control Analysis in Real-time (QC-ART): A Tool for Real-time Quality Control Assessment of Mass Spectrometry-based Proteomics Data. Mol Cell Proteomics 2018; 17:1824-1836. [PMID: 29666158 PMCID: PMC6126382 DOI: 10.1074/mcp.ra118.000648] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 01/31/2018] [Revised: 03/13/2018] [Indexed: 12/29/2022] Open
Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based proteomics studies of large sample cohorts can easily require from months to years to complete. Acquiring consistent, high-quality data in such large-scale studies is challenging because of normal variations in instrumentation performance over time, as well as artifacts introduced by the samples themselves, such as those arising from collection, storage and processing. Existing quality control methods for proteomics data primarily focus on post-hoc analysis to remove low-quality data that would degrade downstream statistics; they are not designed to evaluate the data in near real-time, which would allow for interventions as soon as deviations in data quality are detected. In addition to flagging analyses that demonstrate outlier behavior, evaluating how the data structure changes over time can aid in understanding typical instrument performance or identify issues such as a degradation in data quality caused by the need for instrument cleaning and/or re-calibration. To address this gap for proteomics, we developed Quality Control Analysis in Real-Time (QC-ART), a tool for evaluating data as they are acquired to dynamically flag potential issues with instrument performance or sample quality. QC-ART has accuracy similar to that of standard post-hoc analysis methods, with the additional benefit of real-time analysis. We demonstrate the utility and performance of QC-ART in identifying deviations in data quality caused by both instrument and sample issues in near real-time for LC-MS-based plasma proteomics analyses of a sample subset of The Environmental Determinants of Diabetes in the Young cohort. We also present a case where QC-ART facilitated the identification of oxidative modifications, which are often underappreciated in proteomic experiments.
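The idea of flagging runs as they are acquired can be sketched with a rolling baseline. The example below is not the QC-ART algorithm: it is a minimal stand-in that compares each new value of a single QC metric against the median and median absolute deviation (MAD) of the preceding runs. The function name, the baseline size of 5, and the cutoff of 3.5 are all assumptions chosen for illustration.

```python
from statistics import median
from typing import List, Sequence


def flag_runs(values: Sequence[float], baseline_size: int = 5,
              threshold: float = 3.5) -> List[bool]:
    """Flag each run whose metric deviates from the preceding baseline.

    Hypothetical sketch: uses a robust z-score (median and MAD) over the
    previous `baseline_size` runs. Runs without a full baseline are never
    flagged.
    """
    flags = []
    for i, value in enumerate(values):
        baseline = list(values[max(0, i - baseline_size):i])
        if len(baseline) < baseline_size:
            flags.append(False)  # not enough history yet
            continue
        center = median(baseline)
        mad = median([abs(x - center) for x in baseline])
        scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
        if scale == 0:
            flags.append(value != center)  # flat baseline: any change is flagged
        else:
            flags.append(abs(value - center) / scale > threshold)
    return flags


# Made-up metric values (for example, a retention time in minutes) per run.
metric = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 14.0, 10.0]
flags = flag_runs(metric)
```

Only the seventh run (14.0) is flagged here. A median-based baseline is used so that one earlier outlier does not inflate the spread and mask the next one; a production tool would also track several metrics jointly, which this sketch does not attempt.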
|
32
|
Proteomic Approaches for the Discovery of Biofluid Biomarkers of Neurodegenerative Dementias. Proteomes 2018; 6:32. [PMID: 30200280 PMCID: PMC6161166 DOI: 10.3390/proteomes6030032] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 07/20/2018] [Revised: 08/22/2018] [Accepted: 08/29/2018] [Indexed: 12/11/2022] Open
Abstract
Neurodegenerative dementias are highly complex disorders driven by vicious cycles of intersecting pathophysiologies. While most can be definitively diagnosed by the presence of disease-specific pathology in the brain at postmortem examination, clinical disease presentations often involve substantially overlapping cognitive, behavioral, and functional impairment profiles that hamper accurate diagnosis of the specific disease. As global demographics shift towards an aging population in developed countries, clinicians need more sensitive and specific diagnostic tools to appropriately diagnose, monitor, and treat neurodegenerative conditions. This review is intended as an overview of how modern proteomic techniques (liquid chromatography mass spectrometry (LC-MS/MS) and advanced capture-based technologies) may contribute to the discovery and establishment of better biofluid biomarkers for neurodegenerative disease, and the limitations of these techniques. The review highlights some of the more interesting technical innovations and common themes in the field but is not intended to be an exhaustive systematic review of studies to date. Finally, we discuss clear reporting principles that should be integrated into all studies going forward to ensure data is presented in sufficient detail to allow meaningful comparisons across studies.
|
33
|
Data-independent acquisition-based SWATH-MS for quantitative proteomics: a tutorial. Mol Syst Biol 2018; 14:e8126. [PMID: 30104418 PMCID: PMC6088389 DOI: 10.15252/msb.20178126] [Citation(s) in RCA: 563] [Impact Index Per Article: 93.8] [Received: 11/28/2017] [Revised: 05/11/2018] [Accepted: 05/15/2018] [Indexed: 01/16/2023] Open
Abstract
Many research questions in fields such as personalized medicine, drug screens or systems biology depend on obtaining consistent and quantitatively accurate proteomics data from many samples. SWATH-MS is a specific variant of data-independent acquisition (DIA) methods and is emerging as a technology that combines deep proteome coverage capabilities with quantitative consistency and accuracy. In a SWATH-MS measurement, all ionized peptides of a given sample that fall within a specified mass range are fragmented in a systematic and unbiased fashion using rather large precursor isolation windows. To analyse SWATH-MS data, a strategy based on peptide-centric scoring has been established, which typically requires prior knowledge about the chromatographic and mass spectrometric behaviour of peptides of interest in the form of spectral libraries and peptide query parameters. This tutorial provides guidelines on how to set up and plan a SWATH-MS experiment, how to perform the mass spectrometric measurement and how to analyse SWATH-MS data using peptide-centric scoring. Furthermore, concepts on how to improve SWATH-MS data acquisition, potential trade-offs of parameter settings and alternative data analysis strategies are discussed.
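The systematic fragmentation scheme described in this abstract can be pictured as a set of contiguous precursor isolation windows tiling a chosen m/z range. The sketch below generates such a scheme with fixed-width windows and an optional overlap. The function name and the numbers used (400 to 1200 m/z, 25 m/z windows) are assumptions for illustration only; the abstract does not specify them, and real methods often use variable window widths.

```python
from typing import List, Tuple


def isolation_windows(mz_start: float, mz_end: float, width: float,
                      overlap: float = 0.0) -> List[Tuple[float, float]]:
    """Tile [mz_start, mz_end] with fixed-width precursor isolation windows.

    Hypothetical helper: each window starts `overlap` m/z units before the
    previous one ends, and the last window is clipped to `mz_end`.
    """
    if mz_end <= mz_start:
        raise ValueError("mz_end must be greater than mz_start")
    if width <= overlap or overlap < 0:
        raise ValueError("width must exceed a non-negative overlap")
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        if upper >= mz_end:
            break
        lower = upper - overlap
    return windows


# Assumed example range and width, not taken from the tutorial.
scheme = isolation_windows(400, 1200, 25)
```

With these assumed settings the scheme contains 32 windows, from (400, 425) to (1175, 1200). Narrower windows improve selectivity but lengthen the cycle time, which is one of the parameter trade-offs the tutorial says it discusses.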
|
34
|
Pregnancy-induced changes in metabolome and proteome in ovine uterine flushings. Biol Reprod 2018; 97:273-287. [PMID: 29044433 DOI: 10.1093/biolre/iox078] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/14/2017] [Accepted: 07/15/2017] [Indexed: 12/25/2022] Open
Abstract
Mass spectrometry (MS) approaches were used herein to identify metabolites and proteins in uterine flushings (UF) that may contribute to nourishing the conceptus. Ovine uteri collected on Day 12 of the estrous cycle (n = 5 ewes exposed to vasectomized ram) or Days 12 (n = 4), 14 (n = 5), or 16 (n = 5) of pregnancy (bred with fertile ram) were flushed using buffered saline. Metabolites were extracted using 80% methanol and profiled using ultraperformance liquid chromatography (LC) tandem mass spectrometry. The proteome was examined by digestion with trypsin, followed by the analysis of peptides with LC-MS/MS. Metabolite profiling detected 8510 molecular features, of which 9 were detected only in UF from Day 14-16 pregnant ewes; these function in fatty acid transport (carnitines), hormone synthesis (androstenedione-like), and availability of nutrients (valine). Proteome analysis detected 783 proteins present by Days 14-16 of pregnancy in UF, 7 of which are as follows: annexin (ANX) A1, A2, and A5; calcium-binding protein (S100A11); profilin 1; trophoblast kunitz domain protein 1 (TKDP); and interferon tau (IFNT). These proteins function in endocytosis, exocytosis, calcium signaling, and inhibition of prostaglandins (annexins and S100A11); protecting against maternal proteases (TKDP); remodeling cytoskeleton (profilin 1); and altering uterine release of prostaglandin F2 alpha as well as inducing IFNT-stimulated genes in the endometrium and the corpus luteum (IFNT). Identifying metabolites and proteins produced by the uterus and conceptus advances our understanding of embryo/maternal signaling and provides insights into the possible causes of reproductive failure.
|
35
|
Development of an LC-MS/MS peptide mapping protocol for the NISTmAb. Anal Bioanal Chem 2018; 410:2111-2126. [PMID: 29411091 PMCID: PMC5830484 DOI: 10.1007/s00216-018-0848-6] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 09/27/2017] [Revised: 12/04/2017] [Accepted: 01/03/2018] [Indexed: 11/12/2022]
Abstract
Peptide mapping is a component of the analytical toolbox used within the biopharmaceutical industry to aid in the identity confirmation of a protein therapeutic and to monitor degradative events such as oxidation or deamidation. These methods offer the advantage of providing site-specific information regarding post-translational and chemical modifications that may arise during production, processing or storage. A number of such variations may also be induced by the sample preparation methods themselves which may confound the ability to accurately evaluate the true modification levels. One important focus when developing a peptide mapping method should therefore be the use of sample preparation conditions that will minimize the degree of artificial modifications induced. Unfortunately, the conditions that are amenable to effective reduction, alkylation and digestion are often the same conditions that promote unwanted modifications. Here we describe the optimization of a tryptic digestion protocol used for peptide mapping of the NISTmAb IgG1κ which addresses the challenge of balancing maximum digestion efficiency with minimum artificial modifications. The parameters on which we focused include buffer concentration, digestion time and temperature, as well as the source and type of trypsin (recombinant vs. pancreatic; bovine vs porcine) used. Using the optimized protocol we generated a peptide map of the NISTmAb which allowed us to confirm its identity at the level of primary structure. Graphical abstract Peptide map of the NISTmAb RM 8671 monoclonal antibody. Tryptic digestion was performed using an optimized protocol and followed by LC-UV-MS analysis. The trace represents the total ion chromatogram. Each peak was mapped to peptides identified using mass spectrometry data.
|
36
|
QCloud: A cloud-based quality control system for mass spectrometry-based proteomics laboratories. PLoS One 2018; 13:e0189209. [PMID: 29324744 PMCID: PMC5764250 DOI: 10.1371/journal.pone.0189209] [Citation(s) in RCA: 88] [Impact Index Per Article: 14.7] [Received: 04/26/2017] [Accepted: 11/21/2017] [Indexed: 01/03/2023] Open
Abstract
The increasing number of biomedical and translational applications in mass spectrometry-based proteomics poses new analytical challenges and raises the need for automated quality control systems. Despite previous efforts to set standard file formats, data processing workflows and key evaluation parameters for quality control, automated quality control systems are not yet widespread among proteomics laboratories, which limits the acquisition of high-quality results, inter-laboratory comparisons and the assessment of variability of instrumental platforms. Here we present QCloud, a cloud-based system to support proteomics laboratories in daily quality assessment using a user-friendly interface, easy setup, automated data processing and archiving, and unbiased instrument evaluation. QCloud supports the most common targeted and untargeted proteomics workflows, it accepts data formats from different vendors and it enables the annotation of acquired data and reporting incidences. A complete version of the QCloud system has successfully been developed and it is now open to the proteomics community (http://qcloud.crg.eu). QCloud system is an open source project, publicly available under a Creative Commons License Attribution-ShareAlike 4.0.
|
37
|
OpenMS – A platform for reproducible analysis of mass spectrometry data. J Biotechnol 2017; 261:142-148. [DOI: 10.1016/j.jbiotec.2017.05.016] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 02/22/2017] [Revised: 05/17/2017] [Accepted: 05/22/2017] [Indexed: 10/19/2022]
|
38
|
Protein complex analysis: From raw protein lists to protein interaction networks. MASS SPECTROMETRY REVIEWS 2017; 36:600-614. [PMID: 26709718 DOI: 10.1002/mas.21485] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/04/2015] [Accepted: 11/17/2015] [Indexed: 06/05/2023]
Abstract
The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the most optimal ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.
|
39
|
Multi-laboratory assessment of reproducibility, qualitative and quantitative performance of SWATH-mass spectrometry. Nat Commun 2017; 8:291. [PMID: 28827567 PMCID: PMC5566333 DOI: 10.1038/s41467-017-00249-5] [Citation(s) in RCA: 338] [Impact Index Per Article: 48.3] [Received: 03/17/2017] [Accepted: 06/12/2017] [Indexed: 01/15/2023] Open
Abstract
Quantitative proteomics employing mass spectrometry is an indispensable tool in life science research. Targeted proteomics has emerged as a powerful approach for reproducible quantification but is limited in the number of proteins quantified. SWATH-mass spectrometry consists of data-independent acquisition and a targeted data analysis strategy that aims to maintain the favorable quantitative characteristics (accuracy, sensitivity, and selectivity) of targeted proteomics at large scale. While previous SWATH-mass spectrometry studies have shown high intra-lab reproducibility, this has not been evaluated between labs. In this multi-laboratory evaluation study including 11 sites worldwide, we demonstrate that using SWATH-mass spectrometry data acquisition we can consistently detect and reproducibly quantify >4000 proteins from HEK293 cells. Using synthetic peptide dilution series, we show that the sensitivity, dynamic range and reproducibility established with SWATH-mass spectrometry are uniformly achieved. This study demonstrates that the acquisition of reproducible quantitative proteomics data by multiple labs is achievable, and broadly serves to increase confidence in SWATH-mass spectrometry data acquisition as a reproducible method for large-scale protein quantification. SWATH-mass spectrometry consists of a data-independent acquisition and a targeted data analysis strategy that aims to maintain the favorable quantitative characteristics on the scale of thousands of proteins. Here, using data generated by eleven groups worldwide, the authors show that SWATH-MS is capable of generating highly reproducible data across different laboratories.
|
40
|
Optimizing High-Resolution Mass Spectrometry for the Identification of Low-Abundance Post-Translational Modifications of Intact Proteins. J Proteome Res 2017; 16:3255-3265. [PMID: 28738681 DOI: 10.1021/acs.jproteome.7b00244] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 12/27/2022]
Abstract
Intact protein analysis by liquid chromatography-mass spectrometry (LC-MS) is now possible due to the improved capabilities of mass spectrometers yielding greater resolution, mass accuracy, and extended mass ranges. Concurrent measurement of post-translational modifications (PTMs) during LC-MS of intact proteins is advantageous while monitoring critical proteoform status, such as for clinical samples or during production of reference materials. However, difficulties exist for PTM identification when the protein is large or contains multiple modification sites. In this work, analyses of low-abundance proteoforms of proteins of clinical or therapeutic interest, including C-reactive protein, vitamin D-binding protein, transferrin, and immunoglobulin G (NISTmAb), were performed on an Orbitrap Elite mass spectrometer. This work investigated the effect of various instrument parameters including source temperatures, in-source CID, microscan type and quantity, resolution, and automatic gain control on spectral quality. The signal-to-noise ratio was found to be a suitable spectral attribute which facilitated identification of low abundance PTMs. Source temperature and CID voltage were found to require specific optimization for each protein. This study identifies key instrumental parameters requiring optimization for improved detection of a variety of PTMs by LC-MS and establishes a methodological framework to ensure robust proteoform identifications, the first step in their ultimate quantification.
|
41
|
Quantitative proteomics: challenges and opportunities in basic and applied research. Nat Protoc 2017; 12:1289-1294. [DOI: 10.1038/nprot.2017.040] [Citation(s) in RCA: 149] [Impact Index Per Article: 21.3] [Indexed: 11/08/2022]
|
42
|
ROTS: An R package for reproducibility-optimized statistical testing. PLoS Comput Biol 2017; 13:e1005562. [PMID: 28542205 PMCID: PMC5470739 DOI: 10.1371/journal.pcbi.1005562] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 12/07/2016] [Revised: 06/14/2017] [Accepted: 05/10/2017] [Indexed: 12/21/2022] Open
Abstract
Differential expression analysis is one of the most common types of analyses performed on various biological data (e.g. RNA-seq or mass spectrometry proteomics). It is the process that detects features, such as genes or proteins, showing statistically significant differences between the sample groups under comparison. A major challenge in the analysis is the choice of an appropriate test statistic, as different statistics have been shown to perform well in different datasets. To this end, the reproducibility-optimized test statistic (ROTS) adjusts a modified t-statistic according to the inherent properties of the data and provides a ranking of the features based on their statistical evidence for differential expression between two groups. ROTS has already been successfully applied in a range of different studies from transcriptomics to proteomics, showing competitive performance against other state-of-the-art methods. To promote its widespread use, we introduce here a Bioconductor R package for performing ROTS analysis conveniently on different types of omics data. To illustrate the benefits of ROTS in various applications, we present three case studies, involving proteomics and RNA-seq data from public repositories, including both bulk and single cell data. The package is freely available from Bioconductor (https://www.bioconductor.org/packages/ROTS).
|
43
|
The Human Proteome Organization-Proteomics Standards Initiative Quality Control Working Group: Making Quality Control More Accessible for Biological Mass Spectrometry. Anal Chem 2017; 89:4474-4479. [PMID: 28318237 DOI: 10.1021/acs.analchem.6b04310] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 01/23/2023]
Abstract
To have confidence in results acquired during biological mass spectrometry experiments, a systematic approach to quality control is of vital importance. Nonetheless, until now, only scattered initiatives have been undertaken to this end, and these individual efforts have often not been complementary. To address this issue, the Human Proteome Organization-Proteomics Standards Initiative has established a new working group on quality control at its meeting in the spring of 2016. The goal of this working group is to provide a unifying framework for quality control data. The initial focus will be on providing a community-driven standardized file format for quality control. For this purpose, the previously proposed qcML format will be adapted to support a variety of use cases for both proteomics and metabolomics applications, and it will be established as an official PSI format. An important consideration is to avoid enforcing restrictive requirements on quality control but instead provide the basic technical necessities required to support extensive quality control for any type of mass spectrometry-based workflow. We want to emphasize that this is an open community effort, and we seek participation from all scientists with an interest in this field.
|
44
|
Abstract
Because proteomics experiments are so complex, they can readily fail, and do so without clear cause. Using standard experimental design techniques and incorporating quality control can greatly increase the chances of success. This chapter introduces the relevant concepts and provides examples specific to proteomic workflows. Applying these notions to design successful proteomics experiments is straightforward. It can help identify failure causes and greatly increase the likelihood of inter-laboratory reproducibility.
|
45
|
Computational quality control tools for mass spectrometry proteomics. Proteomics 2016; 17. [DOI: 10.1002/pmic.201600159] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 06/30/2016] [Revised: 07/28/2016] [Accepted: 08/19/2016] [Indexed: 12/30/2022]
|
46
|
Exploring the Bone Proteome to Help Explain Altered Bone Remodeling and Preservation of Bone Architecture and Strength in Hibernating Marmots. Physiol Biochem Zool 2016; 89:364-76. [DOI: 10.1086/687413] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/03/2022]
|
47
|
Abstract
Mass spectrometry (MS) coupled to liquid chromatography (LC) is a commonly used technique in metabolomic and proteomic research. As the size and complexity of LC-MS-based experiments grow, it becomes increasingly difficult to perform quality control of both raw data and processing results. In a practical setting, quality control steps for raw LC-MS data are often overlooked, and assessment of an experiment's success is based on some derived metrics such as "the number of identified compounds". The human brain interprets visual data much better than plain text, hence the saying "a picture is worth a thousand words". Here, we present the BatMass software package, which allows for performing quick quality control of raw LC-MS data through its fast visualization capabilities. It also serves as a testbed for developers of LC-MS data processing algorithms by providing a data access library for open mass spectrometry file formats and a means of visually mapping processing results back to the original data. We illustrate the utility of BatMass with several use cases of quality control and data exploration.
|
48
|
In-Depth Characterization and Spectral Library Building of Glycopeptides in the Tryptic Digest of a Monoclonal Antibody Using 1D and 2D LC–MS/MS. J Proteome Res 2016; 15:1472-86. [DOI: 10.1021/acs.jproteome.5b01046] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Indexed: 12/22/2022]
|
49
|
Proteomic profiling of eccrine sweat reveals its potential as a diagnostic biofluid for active tuberculosis. Proteomics Clin Appl 2016; 10:547-53. [DOI: 10.1002/prca.201500071] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 07/09/2015] [Revised: 11/24/2015] [Accepted: 02/29/2016] [Indexed: 11/11/2022]
|
50
|
Differentiating samples and experimental protocols by direct comparison of tandem mass spectra. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2016; 30:731-738. [PMID: 26864526 DOI: 10.1002/rcm.7494] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/05/2015] [Revised: 12/16/2015] [Accepted: 12/20/2015] [Indexed: 06/05/2023]
Abstract
RATIONALE: Peptide tandem mass spectra can be analyzed by a number of means. They can be compared against predicted spectra of peptides derived from genome sequences, compared against previously acquired and identified spectra, or, sometimes, sequenced de novo. We recently introduced another method which compares spectra between liquid chromatography/tandem mass spectrometry (LC/MS/MS) datasets to determine the shared spectral content, and demonstrated how this can be applied in a molecular phylogenetic study using sera from human and non-human primates. Here we explore whether such a method has other, serendipitous uses. METHODS: We used the existing compareMS2 algorithm without modification on a diverse set of experiments. RESULTS: First we conducted a small phylogenetic study, using (mammalian) bone samples to study old material, and human pathogens aiming to distinguish clinically important strains. Although not as straightforward as primate sera analysis, the method shows significant promise for all these applications. We also used the algorithm to compare 24 different protocols for extraction of proteins from muscle tissue. The results provided useful information in comparing protocols. Finally, we applied compareMS2 aiming for quality control of two traceable protein reference standards (troponin) used in clinical chemistry assays, by analysing the effect of storage conditions. CONCLUSIONS: The results illustrate a broad applicability of the metric based on shared tandem mass spectra between LC/MS/MS datasets for analysing protein digests in different types of experiments. There is no reason to assume that our instance of this method is optimal in any of these situations, as it makes limited or no use of accurate mass and chromatographic retention time. We propose that with further improvement and refinement, this type of analysis can be applied as a simple but informative first step in many pipelines for bottom-up tandem mass spectrometry data analysis in proteomics and other fields, comparing or analysing large numbers of samples or datasets.
|