1
Etourneau L, Fancello L, Wieczorek S, Varoquaux N, Burger T. Penalized likelihood optimization for censored missing value imputation in proteomics. Biostatistics 2024; 26:kxaf006. [PMID: 40120089; DOI: 10.1093/biostatistics/kxaf006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 01/31/2025] [Accepted: 02/03/2025] [Indexed: 03/25/2025] [Open access]
Abstract
Label-free bottom-up proteomics using mass spectrometry and liquid chromatography has long been established as one of the most popular high-throughput analysis workflows for proteome characterization. However, it produces data hindered by complex and heterogeneous missing values, whose imputation has long remained problematic. To cope with this, we introduce Pirat, an algorithm that addresses this challenge using an original likelihood maximization strategy. Notably, it models the instrument limit by learning a global censoring mechanism from the available data. Moreover, it estimates the covariance matrix between enzymatic cleavage products (i.e., peptides or precursor ions), while offering a natural way to integrate complementary transcriptomic information when multi-omic assays are available. Our benchmarking on several datasets covering a variety of experimental designs (number of samples, acquisition mode, missingness patterns, etc.) and using a variety of metrics (differential analysis ground truth or imputation errors) shows that Pirat outperforms all pre-existing imputation methods. Beyond the interest of Pirat as an imputation tool, these results point to the need for a paradigm change in proteomics imputation, as most pre-existing strategies could be boosted by incorporating similar models to account for the instrument censorship or for the correlation structures, whether grounded in the analytical pipeline or arising from a multi-omic approach.
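The key idea in this abstract, treating values below the instrument's detection limit as censored rather than simply absent, can be illustrated with a much simpler model than Pirat's. The Python sketch below is not Pirat's algorithm. It assumes that one peptide's log-intensities are Gaussian and that censoring is a hard, known detection limit, whereas Pirat learns a global censoring mechanism and a peptide covariance matrix. The function names `censored_normal_nll` and `fit_censored_normal` are hypothetical. Observed values contribute their density, and each missing value contributes the probability of falling below the limit (a Tobit-style likelihood). This corrects the upward bias you get from averaging only the observed values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def censored_normal_nll(params, observed, n_missing, limit):
    """Negative log-likelihood of a left-censored Gaussian (illustrative, not Pirat).

    Observed intensities contribute their Gaussian log-density; each missing
    value contributes log P(Y < limit), i.e. it is treated as censored.
    """
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimise on the log scale so sigma stays positive
    ll_obs = norm.logpdf(observed, loc=mu, scale=sigma).sum()
    ll_mis = n_missing * norm.logcdf(limit, loc=mu, scale=sigma)
    return -(ll_obs + ll_mis)


def fit_censored_normal(values, limit):
    """Estimate (mu, sigma) from a vector in which NaN marks a censored value.

    Assumes a known, hard detection limit `limit` (hypothetical parameter).
    """
    values = np.asarray(values, dtype=float)
    is_missing = np.isnan(values)
    observed = values[~is_missing]
    n_missing = int(is_missing.sum())
    # Start from the naive (biased) estimates computed on observed values only.
    start = np.array([observed.mean(), np.log(observed.std(ddof=1))])
    res = minimize(censored_normal_nll, start,
                   args=(observed, n_missing, limit), method="Nelder-Mead")
    mu, log_sigma = res.x
    return float(mu), float(np.exp(log_sigma))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(20.0, 2.0, size=5000)  # simulated log2 intensities
    x[x < 19.0] = np.nan                  # hard detection limit at 19
    print("naive mean of observed:", np.nanmean(x))
    print("censored-likelihood fit (mu, sigma):", fit_censored_normal(x, limit=19.0))
```

On data simulated with mean 20 and a limit of 19, the naive mean of the observed values comes out at roughly 21. The censored fit recovers a mean close to 20. An imputation step would then draw the missing values from the fitted distribution truncated above at the limit. That step is omitted here.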
Affiliation(s)
- Lucas Etourneau
- Univ. Grenoble Alpes, CNRS, CEA, INSERM, BGE UA13, ProFI FR2048, EDyP, Bâtiment 42b, CEA de Grenoble, 17 avenue des Martyrs, 38054 Grenoble Cedex 9, France
- TIMC, Univ. Grenoble Alpes, CNRS, Grenoble INP, Laboratoire TIMC, Rond-Point de la Croix de Vie, 38700 La Tronche, France
- Laura Fancello
- Univ. Grenoble Alpes, CNRS, CEA, INSERM, BGE UA13, ProFI FR2048, EDyP, Bâtiment 42b, CEA de Grenoble, 17 avenue des Martyrs, 38054 Grenoble Cedex 9, France
- Samuel Wieczorek
- Univ. Grenoble Alpes, CNRS, CEA, INSERM, BGE UA13, ProFI FR2048, EDyP, Bâtiment 42b, CEA de Grenoble, 17 avenue des Martyrs, 38054 Grenoble Cedex 9, France
- Nelle Varoquaux
- TIMC, Univ. Grenoble Alpes, CNRS, Grenoble INP, Laboratoire TIMC, Rond-Point de la Croix de Vie, 38700 La Tronche, France
- Thomas Burger
- Univ. Grenoble Alpes, CNRS, CEA, INSERM, BGE UA13, ProFI FR2048, EDyP, Bâtiment 42b, CEA de Grenoble, 17 avenue des Martyrs, 38054 Grenoble Cedex 9, France
2
Ryu SY, Yun MP, Kim S. Integrating Multiple Quantitative Proteomic Analyses Using MetaMSD. Methods Mol Biol 2023; 2426:361-374. [PMID: 36308697; DOI: 10.1007/978-1-0716-1967-4_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
MetaMSD is a proteomics software tool that integrates the results of multiple quantitative mass spectrometry data analyses using statistical summary combination approaches. By using this software, scientists can combine results from their pilot and main studies to maximize biomarker discovery while effectively controlling false discovery rates. It also works for combining proteomic datasets generated by different labeling techniques and/or different types of mass spectrometry instruments. With these advantages, MetaMSD enables biological researchers to explore various proteomic datasets in public repositories to discover new biomarkers and generate interesting hypotheses for future studies. In this protocol, we provide a step-by-step procedure for installing MetaMSD and performing a meta-analysis of quantitative proteomics data with it.
Affiliation(s)
- So Young Ryu
- School of Public Health, University of Nevada Reno, Reno, NV, USA.
- Miriam P Yun
- Department of Psychology Institute for Neuroscience, University of Nevada Reno, Reno, NV, USA
- Sujung Kim
- School of Public Health, University of Nevada Reno, Reno, NV, USA
3
Kong W, Hui HWH, Peng H, Goh WWB. Dealing with missing values in proteomics data. Proteomics 2022; 22:e2200092. [PMID: 36349819; DOI: 10.1002/pmic.202200092] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/14/2022] [Revised: 09/15/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022]
Abstract
Proteomics data are often plagued with missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reducing statistical power, introducing bias, and failing to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods rely on different prior assumptions (e.g., that the data are normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even on the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide the choice of a suitable MVI method, we provide a decision chart that facilitates strategic considerations for datasets with different characteristics. We also draw attention to other issues that can affect proper MVI, such as the presence of confounders (e.g., batch effects), which can influence MVI performance. These, too, should therefore be considered before or during MVI.
Affiliation(s)
- Weijia Kong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Harvard Wai Hann Hui
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Hui Peng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore; Centre for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
4
Ryu SY, Wendt GA. MetaMSD: meta analysis for mass spectrometry data. PeerJ 2019; 7:e6699. [PMID: 30993040; PMCID: PMC6462182; DOI: 10.7717/peerj.6699] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/30/2018] [Accepted: 03/01/2019] [Indexed: 11/25/2022] [Open access]
Abstract
Mass spectrometry-based proteomics facilitates disease understanding by providing information on protein abundance over the course of disease progression. For studies of the same disease, multiple mass spectrometry datasets may be generated. Integrating these datasets can provide valuable information that a single-dataset analysis cannot. In this article, we introduce a meta-analysis software, MetaMSD (Meta Analysis for Mass Spectrometry Data), that is specifically designed for mass spectrometry data. Using Stouffer's or Pearson's test, MetaMSD detects significantly more differential proteins than an analysis based on the single best experiment. We demonstrate the performance of MetaMSD using simulated data, urinary proteomic data from kidney transplant patients, and breast cancer proteomic data. Given the common practice of performing a pilot study before a main study, this software will help proteomics researchers fully exploit the benefit of multiple studies (or datasets), thus optimizing biomarker discovery. MetaMSD is a command line tool that automatically outputs various graphs and differential proteins with confidence scores. It is implemented in R and is freely available for public use at https://github.com/soyoungryu/MetaMSD. The user manual and data are available at the site. The user manual is written so that scientists who are not familiar with R can use MetaMSD.
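Stouffer's method is one of the two combination tests named in this abstract. It converts each study's one-sided p-value for a protein into a z-score, z_i = Φ⁻¹(1 − p_i), and pools them as Z = Σ w_i z_i / √(Σ w_i²). The combined p-value is then 1 − Φ(Z). The Python sketch below shows this textbook form using only the standard library. It is an illustration, not MetaMSD's R implementation, which may weight studies or handle the direction of fold changes differently. The function name `stouffer_combine` is hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def stouffer_combine(p_values, weights=None):
    """Combine one-sided p-values from independent studies (Stouffer's Z-method).

    Each p-value must lie strictly between 0 and 1 (inv_cdf raises otherwise).
    Equal weights are used when `weights` is None.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("need exactly one weight per p-value")
    # A small one-sided p-value maps to a large positive z-score.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    # Rescale the weighted sum so it is again N(0, 1) under the null hypothesis.
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    z_combined = numerator / math.sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(z_combined)


if __name__ == "__main__":
    # The same protein reaches p = 0.05 in both a pilot study and a main study.
    print(stouffer_combine([0.05, 0.05]))
```

For example, a protein with p = 0.05 in both a pilot and a main study gets a combined p-value of about 0.01. Each study contributes z ≈ 1.645, and their sum divided by √2 is about 2.33. Weights proportional to the square root of each study's sample size are a common choice when studies differ in size.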
Affiliation(s)
- So Young Ryu
- School of Community Health Sciences, University of Nevada - Reno, Reno, NV, United States of America
- George A Wendt
- School of Community Health Sciences, University of Nevada - Reno, Reno, NV, United States of America; Department of Epidemiology, University of California, Berkeley, Berkeley, CA, United States of America
5
Pascovici D, Wu JX, McKay MJ, Joseph C, Noor Z, Kamath K, Wu Y, Ranganathan S, Gupta V, Mirzaei M. Clinically Relevant Post-Translational Modification Analyses-Maturing Workflows and Bioinformatics Tools. Int J Mol Sci 2018; 20:E16. [PMID: 30577541; PMCID: PMC6337699; DOI: 10.3390/ijms20010016] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 11/15/2018] [Revised: 12/09/2018] [Accepted: 12/17/2018] [Indexed: 01/04/2023] [Open access]
Abstract
Post-translational modifications (PTMs) can occur soon after translation or at any stage in the lifecycle of a given protein, and they may help regulate protein folding, stability, cellular localisation, activity, or the interactions proteins have with other proteins or biomolecular species. PTMs are crucial to our functional understanding of biology, and new quantitative mass spectrometry (MS) and bioinformatics workflows are maturing in both labelled multiplexed and label-free techniques, offering increasing coverage and new opportunities to study human health and disease. Techniques such as Data Independent Acquisition (DIA) are emerging as promising approaches due to their re-mining capability. Many bioinformatics tools have been developed to support the analysis of PTMs by mass spectrometry, ranging from prediction and PTM site assignment, through open searches enabling better mining of unassigned mass spectra (many of which likely harbour PTMs), to understanding PTM associations and interactions. The remaining challenge lies in extracting functional information from clinically relevant PTM studies. This review canvasses the options and progress of PTM analysis for large quantitative studies, from choosing the platform through to data analysis, with an emphasis on clinically relevant samples such as plasma and other body fluids, and on well-established tools and options for data interpretation.
Affiliation(s)
- Dana Pascovici
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Australian Proteome Analysis Facility, Macquarie University, Sydney, NSW 2109, Australia.
- Jemma X Wu
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Australian Proteome Analysis Facility, Macquarie University, Sydney, NSW 2109, Australia.
- Matthew J McKay
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Australian Proteome Analysis Facility, Macquarie University, Sydney, NSW 2109, Australia.
- Chitra Joseph
- Department of Clinical Medicine, Macquarie University, Sydney, NSW 2109, Australia.
- Zainab Noor
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Karthik Kamath
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Australian Proteome Analysis Facility, Macquarie University, Sydney, NSW 2109, Australia.
- Yunqi Wu
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Australian Proteome Analysis Facility, Macquarie University, Sydney, NSW 2109, Australia.
- Shoba Ranganathan
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Vivek Gupta
- Department of Clinical Medicine, Macquarie University, Sydney, NSW 2109, Australia.
- Mehdi Mirzaei
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Australian Proteome Analysis Facility, Macquarie University, Sydney, NSW 2109, Australia.
- Department of Clinical Medicine, Macquarie University, Sydney, NSW 2109, Australia.
6
Goeminne LJE, Gevaert K, Clement L. Experimental design and data-analysis in label-free quantitative LC/MS proteomics: A tutorial with MSqRob. J Proteomics 2017; 171:23-36. [PMID: 28391044; DOI: 10.1016/j.jprot.2017.04.004] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 03/01/2017] [Revised: 03/29/2017] [Accepted: 04/01/2017] [Indexed: 12/14/2022]
Abstract
Label-free shotgun proteomics is routinely used to assess proteomes. However, extracting relevant information from the massive amounts of generated data remains difficult. This tutorial provides a strong foundation for the analysis of quantitative proteomics data. We present key statistical concepts that help researchers design proteomics experiments, and we showcase how to analyze quantitative proteomics data using our recent free and open-source R package MSqRob, which was developed to implement the peptide-level robust ridge regression method for relative protein quantification described by Goeminne et al. MSqRob can handle virtually any experimental proteomics design and outputs proteins ordered by statistical significance. Moreover, its graphical user interface and interactive diagnostic plots allow easy inspection of the data and detection of anomalies in the data and flaws in the data analysis, enabling a deeper assessment of the validity of results and a critical review of the experimental design. Our tutorial discusses interactive preprocessing, data analysis and visualization of label-free MS-based quantitative proteomics experiments with simple and more complex designs. We provide well-documented scripts to run analyses in bash mode on GitHub, enabling the integration of MSqRob in automated pipelines on cluster environments (https://github.com/statOmics/MSqRob). SIGNIFICANCE: The concepts outlined in this tutorial help in designing better experiments and analyzing the resulting data more appropriately. The two case studies using the MSqRob graphical user interface will contribute to a wider adoption of advanced peptide-based models, resulting in higher-quality data analysis workflows and more reproducible results in the proteomics community. We also provide well-documented scripts for experienced users who aim to automate MSqRob on cluster environments.
Affiliation(s)
- Ludger J E Goeminne
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium; VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biochemistry, Ghent University, Belgium; Bioinformatics Institute Ghent, Ghent University, Belgium.
- Kris Gevaert
- VIB-UGent Center for Medical Biotechnology, VIB, Belgium; Department of Biochemistry, Ghent University, Belgium; Bioinformatics Institute Ghent, Ghent University, Belgium.
- Lieven Clement
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium; Bioinformatics Institute Ghent, Ghent University, Belgium.
7
van Ooijen MP, Jong VL, Eijkemans MJC, Heck AJR, Andeweg AC, Binai NA, van den Ham HJ. Identification of differentially expressed peptides in high-throughput proteomics data. Brief Bioinform 2017; 19:971-981. [DOI: 10.1093/bib/bbx031] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/06/2016] [Indexed: 12/25/2022] [Open access]
Affiliation(s)
- Victor L Jong
- Department of Biostatistics and Research Support, Julius Center, UMC Utrecht, Netherlands
- Marinus J C Eijkemans
- Julius Center for Health Sciences and Primary Care of the University Medical Center Utrecht, Netherlands
- Albert J R Heck
- Biomolecular Mass Spectrometry and Proteomics, Utrecht University, Netherlands
- Arno C Andeweg
- Department of Viroscience, Erasmus MC, CA Rotterdam, Netherlands
- Nadine A Binai
- Biomolecular Mass Spectrometry Group, Utrecht University, Netherlands
8
Zhu D, Zhang P, Xie C, Zhang W, Sun J, Qian WJ, Yang B. Biodegradation of alkaline lignin by Bacillus ligniniphilus L1. Biotechnol Biofuels 2017; 10:44. [PMID: 28239416; PMCID: PMC5320714; DOI: 10.1186/s13068-017-0735-y] [Citation(s) in RCA: 99] [Impact Index Per Article: 12.4] [Received: 10/07/2016] [Accepted: 02/14/2017] [Indexed: 05/07/2023]
Abstract
BACKGROUND: Lignin is the most abundant aromatic biopolymer in the biosphere and comprises up to 30% of plant biomass. Although lignin is the most recalcitrant component of the plant cell wall, some microorganisms are still able to decompose it. Fungi are recognized as the most widely used microbes for lignin degradation, but bacteria are also known to utilize lignin as a carbon or energy source. Bacillus ligniniphilus L1 was selected in this study because of its capability to utilize alkaline lignin as its sole carbon or energy source and its excellent ability to survive in extreme environments. RESULTS: To investigate the aromatic metabolites produced when strain L1 decomposes alkaline lignin, GC-MS analysis was performed, and fifteen single-phenol-ring aromatic compounds were identified. The dominant absorption peaks corresponded to phenylacetic acid, 4-hydroxybenzoic acid, and vanillic acid, with the highest proportion of metabolites reaching 42%. A further comparative proteomic analysis showed that approximately 1447 proteins were produced, 141 of which were up-regulated at least twofold with alkaline lignin as the sole carbon source. The up-regulated proteins fall into various functional categories, including lignin degradation, the ABC transport system, environmental response factors, and protein synthesis and assembly. CONCLUSIONS: GC-MS analysis showed that alkaline lignin degradation by strain L1 produced 15 kinds of aromatic compounds. Comparative proteomic data and metabolic analysis showed that, to ensure lignin degradation and growth of strain L1, multiple aspects of cell metabolism, including transporters, environmental response factors, and protein synthesis, were enhanced.
Based on genome and proteomic analysis, at least four kinds of lignin degradation pathways might be present in strain L1, including the gentisate pathway, the benzoic acid pathway and the β-ketoadipate pathway. The study provides an important basis for understanding lignin degradation by bacteria.
Affiliation(s)
- Daochen Zhu
- School of Environment and Safety Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
- State Key Laboratory of Microbial Culture Collection and Application, Guangdong Institute of Microbiology, Guangzhou, Guangdong, China
- Peipei Zhang
- School of Environment and Safety Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
- Changxiao Xie
- School of Environment and Safety Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
- Weimin Zhang
- State Key Laboratory of Microbial Culture Collection and Application, Guangdong Institute of Microbiology, Guangzhou, Guangdong, China
- Jianzhong Sun
- School of Environment and Safety Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
- Wei-Jun Qian
- Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352 USA
- Bin Yang
- Bioproducts, Sciences and Engineering Laboratory, Department of Biological Systems Engineering, Washington State University, Richland, WA 99354 USA
9
Blein-Nicolas M, Zivy M. Thousand and one ways to quantify and compare protein abundances in label-free bottom-up proteomics. Biochim Biophys Acta 2016; 1864:883-95. [PMID: 26947242; DOI: 10.1016/j.bbapap.2016.02.019] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 11/05/2015] [Revised: 01/21/2016] [Accepted: 02/24/2016] [Indexed: 11/18/2022]
Abstract
How to process and analyze MS data to quantify and statistically compare protein abundances in bottom-up proteomics has been an open debate for nearly fifteen years. Two main approaches are generally used: the first is based on spectral data generated during the identification process (e.g., peptide counting, spectral counting), while the second uses extracted ion currents to quantify chromatographic peaks and infers protein abundances from peptide quantification. These two approaches actually encompass multiple methods that were developed during the last decade but have only recently been subjected to in-depth evaluation. In this paper, we compiled these different methods as exhaustively as possible. We also summarized how they address the different problems raised by bottom-up protein quantification, such as normalization, the presence of shared peptides, unequal peptide measurability and missing data. This article is part of a Special Issue entitled: Plant Proteomics--a bridge between fundamental processes and crop production, edited by Dr. Hans-Peter Mock.
Affiliation(s)
- Mélisande Blein-Nicolas
- GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, F-91190 Gif-sur-Yvette, France
- Michel Zivy
- GQE-Le Moulon, INRA, Univ Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, F-91190 Gif-sur-Yvette, France.
10
Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res 2016; 15:1116-25. [DOI: 10.1021/acs.jproteome.5b00981] [Citation(s) in RCA: 286] [Impact Index Per Article: 31.8] [Indexed: 01/21/2023]
Affiliation(s)
- Cosmin Lazar
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
- Laurent Gatto
- Computational Proteomics Unit, Cambridge CB2 1GA, United Kingdom
- Cambridge Center for Proteomics, Cambridge CB2 1GA, United Kingdom
- Myriam Ferro
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
- Christophe Bruley
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
- Thomas Burger
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France
- CNRS, iRTSV-BGE, F-38000 Grenoble, France
- CEA, iRTSV-BGE, F-38000 Grenoble, France
- INSERM, BGE, F-38000 Grenoble, France
11
Jung K. Statistical Aspects in Proteomic Biomarker Discovery. Methods Mol Biol 2016; 1362:293-310. [PMID: 26519185; DOI: 10.1007/978-1-4939-3106-4_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
In the pursuit of personalized medicine, i.e., the individual treatment of a patient, many medical decisions would ideally be supported by biomarkers that help to make a diagnosis, prediction, or prognosis. Proteomic biomarkers are of special interest because they can be detected not only in tissue samples but often also, and easily, in diverse body fluids. Statistical methods play an important role in the discovery and validation of proteomic biomarkers: they are necessary for planning experiments, for processing raw signals, and for the final data analysis. This review provides an overview of the most frequent experimental settings, including sample size considerations, and focuses on exploratory data analysis and classifier development.
Affiliation(s)
- Klaus Jung
- Department of Medical Statistics, Georg-August-University Göttingen, Humboldtallee 32, 37073, Göttingen, Germany.
12
Computational and statistical methods for high-throughput analysis of post-translational modifications of proteins. J Proteomics 2015. [PMID: 26216596; DOI: 10.1016/j.jprot.2015.07.016] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 12/16/2022]
Abstract
The investigation of post-translational modifications (PTMs) is one of the main research focuses in the study of protein function and cell signaling. Mass spectrometry instrumentation with increasing sensitivity, improved protocols for PTM enrichment, and recently established pipelines for high-throughput experiments allow large-scale identification and quantification of several PTM types. This review addresses the concurrently emerging challenges for the computational analysis of the resulting data and presents PTM-centered approaches for spectra identification, statistical analysis, multivariate analysis and data interpretation. We furthermore discuss the potential of future developments that will help to gain deeper insight into the PTM-ome and its biological role in cells. This article is part of a Special Issue entitled: Computational Proteomics.
13