1
Kalhor M, Lapin J, Picciani M, Wilhelm M. Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors into Peptide Identification. Mol Cell Proteomics 2024:100798. [PMID: 38871251 DOI: 10.1016/j.mcpro.2024.100798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/26/2024] [Accepted: 06/09/2024] [Indexed: 06/15/2024] Open
Abstract
Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
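As a rough illustration of the rescoring idea in this abstract (and not the implementation of any pipeline the review covers), the sketch below compares observed and predicted fragment intensities with a normalized spectral angle and then counts how many target matches pass a simple target-decoy FDR threshold. The function names `spectral_angle` and `accepted_targets`, the choice of spectral angle as the similarity measure, and the decoys/targets FDR estimate are all assumptions made for this example.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral angle in [0, 1]; 1 means identical intensity patterns.

    Assumption: both lists hold intensities for the same fragment ions, in the same order.
    """
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cos_sim) / math.pi


def accepted_targets(psms, fdr=0.01):
    """Count target PSMs accepted at the given FDR.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    Assumption: FDR at a score cutoff is estimated as decoys / targets above it.
    """
    ranked = sorted(psms, key=lambda item: item[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr:
            best = targets  # largest target count that still meets the threshold
    return best
```

In a rescoring setting, a similarity such as `spectral_angle` would be one of several features (alongside, for example, a retention time error) fed to a classifier whose output replaces the search engine score passed to `accepted_targets`.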
Affiliation(s)
- Mostafa Kalhor
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Joel Lapin
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mario Picciani
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Mathias Wilhelm
  - Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
2
Gong Y, Ding W, Wang P, Wu Q, Yao X, Yang Q. Evaluating Machine Learning Methods of Analyzing Multiclass Metabolomics. J Chem Inf Model 2023; 63:7628-7641. [PMID: 38079572 DOI: 10.1021/acs.jcim.3c01525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2023]
Abstract
Multiclass metabolomic studies have become popular for revealing the differences in multiple stages of complex diseases, various lifestyles, or the effects of specific treatments. In multiclass metabolomics, there are multiple data manipulation steps for analyzing raw data, which consist of data filtering, the imputation of missing values, data normalization, marker identification, sample separation, classification, and so on. In each step, several to dozens of machine learning methods can be chosen for the given data set, with potentially hundreds or thousands of method combinations in the whole data processing chain. Therefore, a clear understanding of these machine learning methods is helpful for selecting an appropriate method combination for obtaining stable and reliable analytical results of specific data. However, there has rarely been an overall introduction or evaluation of these methods based on multiclass metabolomic data. Herein, detailed descriptions of these machine learning methods in multiple data manipulation steps are reviewed. Moreover, an assessment of these methods was performed using a benchmark data set for multiclass metabolomics. First, 12 imputation methods for imputing missing values were evaluated based on the PSS (Procrustes statistical shape analysis) and NRMSE (normalized root-mean-square error) values. Second, 17 normalization methods for processing multiclass metabolomic data were evaluated by applying the PMAD (pooled median absolute deviation) value. Third, different methods of identifying markers of multiclass metabolomics were evaluated based on the CWrel (relative weighted consistency) value. Fourth, nine classification methods for constructing multiclass models were assessed using the AUC (area under the curve) value. Performance evaluations of machine learning methods are highly recommended to select the most appropriate method combination before performing the final analysis of the given data. 
Overall, detailed descriptions and evaluation of various machine learning methods are expected to improve analyses of multiclass metabolomic data.
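Two of the evaluation metrics named in this abstract can be sketched in a few lines. The abstract does not give formulas, so the definitions here are assumptions: NRMSE is taken as the root-mean-square error divided by the population standard deviation of the true values, and PMAD is taken as the mean, over all groups and features, of the within-group median absolute deviation. The function names and the data layout are hypothetical.

```python
import math
from statistics import mean, median, pvariance


def nrmse(true_values, imputed_values):
    """Normalized RMSE between true and imputed values (lower is better).

    Assumption: normalization by the population variance of the true values;
    the true values must not all be identical.
    """
    mse = mean((t - i) ** 2 for t, i in zip(true_values, imputed_values))
    return math.sqrt(mse / pvariance(true_values))


def mad(values):
    """Median absolute deviation of a sequence of numbers."""
    center = median(values)
    return median(abs(v - center) for v in values)


def pmad(groups):
    """Pooled median absolute deviation (lower means less intragroup spread).

    groups: dict mapping a class label to a list of samples, each sample a
    list of feature values. Assumption: pooling is a plain mean over every
    (group, feature) MAD.
    """
    mads = []
    for samples in groups.values():
        for feature_values in zip(*samples):  # iterate features column-wise
            mads.append(mad(feature_values))
    return mean(mads)
```

Under these definitions, a normalization method that lowers `pmad` without distorting class differences would be preferred, and an imputation method with a lower `nrmse` on masked values would be preferred.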
Affiliation(s)
- Yaguo Gong
  - State Key Laboratory of Quality Research in Chinese Medicine, School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Wei Ding
  - State Key Laboratory of Quality Research in Chinese Medicine, School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Panpan Wang
  - College of Chemistry and Pharmaceutical Engineering, Huanghuai University, Zhumadian 463000, China
- Qibiao Wu
  - State Key Laboratory of Quality Research in Chinese Medicine, School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Xiaojun Yao
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
- Qingxia Yang
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
  - Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
3
Yang Q, Gong Y, Zhu F. Critical Assessment of the Biomarker Discovery and Classification Methods for Multiclass Metabolomics. Anal Chem 2023; 95:5542-5552. [PMID: 36944135 DOI: 10.1021/acs.analchem.2c04402] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 03/23/2023]
Abstract
Multiclass metabolomics has been widely applied in clinical practice to understand pathophysiological processes involved in disease progression and diagnostic biomarkers of various disorders. In contrast to the binary problem, the multiclass classification problem is more difficult in terms of obtaining reliable and stable results due to the increase in the complexity of determining exact class decision boundaries. In particular, methods of biomarker discovery and classification have a significant effect on the multiclass model because different methods with significantly varied theories produce conflicting results even for the same dataset. However, a systematic assessment for selecting the most appropriate methods of biomarker discovery and classification for multiclass metabolomics is still lacking. Therefore, a comprehensive assessment is essential to measure the suitability of methods in multiclass classification models from multiple perspectives. In this study, five biomarker discovery methods and nine classification methods were assessed based on four benchmark datasets of multiclass metabolomics. The performance assessment of the biomarker discovery and classification methods was performed using three evaluation criteria: assessment a (cluster analysis of sample grouping), assessment b (biomarker consistency in multiple subgroups), and assessment c (accuracy in the classification model). As a result, 13 combining strategies with superior performance were selected under multiple criteria based on these benchmark datasets. In conclusion, superior strategies that performed consistently well are suggested for the discovery of biomarkers and the construction of a classification model for multiclass metabolomics.
Affiliation(s)
- Qingxia Yang
  - Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yaguo Gong
  - School of Pharmacy, Macau University of Science and Technology, Macau 999078, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
4
Mohammed Y, Goodlett D, Borchers CH. Bioinformatics Tools and Knowledgebases to Assist Generating Targeted Assays for Plasma Proteomics. Methods Mol Biol 2023; 2628:557-577. [PMID: 36781806 DOI: 10.1007/978-1-0716-2978-9_32] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/15/2023]
Abstract
In targeted proteomics experiments, selecting the appropriate proteotypic peptides as surrogate for the target protein is a crucial pre-acquisition step. This step is largely a bioinformatics exercise that involves integrating information on the peptides and proteins and using various software tools and knowledgebases. We present here a few resources that automate and simplify the selection process to a great degree. These tools and knowledgebases were developed primarily to streamline targeted proteomics assay development and include PeptidePicker, PeptidePickerDB, MRMAssayDB, MouseQuaPro, and PeptideTracker. We have used these tools to develop and document thousands of targeted proteomics assays, many of them for plasma proteins with focus on human and mouse. An important aspect in all these resources is the integrative approach on which they are based. Using these tools in the first steps of designing a singleplexed or multiplexed targeted proteomic experiment can reduce the necessary experimental steps tremendously. All the tools and knowledgebases we describe here are Web-based and freely accessible so scientists can query the information conveniently from the browser. This chapter provides an overview of these software tools and knowledgebases, their content, and how to use them for targeted plasma proteomics. We further demonstrate how to use them with the results of the HUPO Human Plasma Proteome Project to produce a new database of 3.8 k targeted assays for known human plasma proteins. Upon experimental validation, these assays should help in the further quantitative characterizing of the plasma proteome.
Affiliation(s)
- Yassene Mohammed
  - Center for Proteomics and Metabolomics, Leiden University Medical Center, Leiden, ZA, Netherlands
  - University of Victoria - Genome BC Proteomics Centre, Victoria, BC, Canada
  - Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
- David Goodlett
  - University of Victoria - Genome BC Proteomics Centre, Victoria, BC, Canada
  - Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada
  - University of Gdansk, International Centre for Cancer Vaccine Science, Gdansk, Poland
- Christoph H Borchers
  - Proteomics Centre, Segal Cancer Centre, Lady Davis Institute, Jewish General Hospital, McGill University, Montreal, QC, Canada
  - Gerald Bronfman Department of Oncology, Jewish General Hospital, Montreal, QC, Canada
  - Division of Experimental Medicine, McGill University, Montreal, QC, Canada
  - Department of Pathology, McGill University, Montreal, QC, Canada
5
Rusilowicz M, Newman DW, Creamer DR, Johnson J, Adair K, Harman VM, Grant CM, Beynon RJ, Hubbard SJ. AlacatDesigner: Computational Design of Peptide Concatamers for Protein Quantitation. J Proteome Res 2023; 22:594-604. [PMID: 36688735 PMCID: PMC9903321 DOI: 10.1021/acs.jproteome.2c00608] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/24/2023]
Abstract
Protein quantitation via mass spectrometry relies on peptide proxies for the parent protein from which abundances are estimated. Owing to the variability in signal from individual peptides, accurate absolute quantitation usually relies on the addition of an external standard. Typically, this involves stable isotope-labeled peptides, delivered singly or as a concatenated recombinant protein. Consequently, the selection of the most appropriate surrogate peptides and the attendant design in recombinant proteins termed QconCATs are challenges for proteome science. QconCATs can now be built in an "a-la-carte" assembly method using synthetic biology: ALACATs. To assist their design, we present "AlacatDesigner", a tool that supports the peptide selection for recombinant protein standards based on the user's target protein. The user-customizable tool considers existing databases, occurrence in the literature, potential post-translational modifications, predicted miscleavage, predicted divergence of the peptide and protein quantifications, and ionization potential within the mass spectrometer. We show that peptide selections are enriched for good proteotypic and quantotypic candidates compared to empirical data. The software is freely available to use via a web interface (AlacatDesigner), as a downloadable desktop application, or as a Python package for the command-line interface or for use in scripts.
Affiliation(s)
- Martin Rusilowicz
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- David W. Newman
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- Declan R. Creamer
  - Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- James Johnson
  - GeneMill, Institute of Systems Molecular and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
- Kareena Adair
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
- Victoria M. Harman
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
- Chris M. Grant
  - Division of Molecular and Cellular Function, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
- Robert J. Beynon
  - Centre for Proteome Research, Institute of Systems and Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
- Simon J. Hubbard
  - Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, United Kingdom
6
Morales-Martínez A, Bertrand B, Hernández-Meza JM, Garduño-Juárez R, Silva-Sanchez J, Munoz-Garay C. Membrane fluidity, composition, and charge affect the activity and selectivity of the AMP ascaphin-8. Biophys J 2022; 121:3034-3048. [PMID: 35842753 PMCID: PMC9463648 DOI: 10.1016/j.bpj.2022.07.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Revised: 06/28/2022] [Accepted: 07/12/2022] [Indexed: 12/29/2022] Open
Abstract
Ascaphins are cationic antimicrobial peptides that have been shown to have potential in the treatment of infectious diseases caused by multidrug-resistant pathogens (MDR). However, to date, their principal molecular target and mechanism of action are unknown. Results from peptide prediction software and molecular dynamics simulations confirmed that ascaphin-8 is an alpha-helical peptide. For the first time, the peptide was described as membranotrophic using biophysical approaches including calcein liposome leakage, Laurdan general polarization, and dynamic light scattering. Ascaphin-8's activity and selectivity were modulated by rearranging the spatial distribution of lysine (Var-K5), aspartic acid (Var-D4) residues, or substitution of phenylalanine with tyrosine (Var-Y). The parental peptide and its variants presented high affinity toward the bacterial membrane model (≤2 μM), but lost activity in sterol-enriched membranes (mammal and fungal models, with cholesterol and ergosterol, respectively). The peptide-induced pore size was estimated to be >20 nm in the bacterial model, with no difference among peptides. The same pattern was observed in membrane fluidity (general polarization) assays, where all peptides reduced membrane fluidity of the bacterial model but not in the models containing sterols. The peptides also showed high activity toward MDR bacteria. Moreover, peptide sensitivity of the artificial membrane models compared with pathogenic bacterial isolates were in good agreement.
Affiliation(s)
- Adriana Morales-Martínez
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México (ICF-UNAM), Cuernavaca, Morelos, México
- Brandt Bertrand
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México (ICF-UNAM), Cuernavaca, Morelos, México
- Juan M Hernández-Meza
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México (ICF-UNAM), Cuernavaca, Morelos, México
- Ramón Garduño-Juárez
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México (ICF-UNAM), Cuernavaca, Morelos, México
- Jesús Silva-Sanchez
  - Centro de Investigación sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
- Carlos Munoz-Garay
  - Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México (ICF-UNAM), Cuernavaca, Morelos, México
7
Dincer AB, Lu Y, Schweppe DK, Oh S, Noble WS. Reducing Peptide Sequence Bias in Quantitative Mass Spectrometry Data with Machine Learning. J Proteome Res 2022; 21:1771-1782. [PMID: 35696663 DOI: 10.1021/acs.jproteome.2c00211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
Quantitative mass spectrometry measurements of peptides necessarily incorporate sequence-specific biases that reflect the behavior of the peptide during enzymatic digestion and liquid chromatography and in a mass spectrometer. These sequence-specific effects impair quantification accuracy, yielding peptide quantities that are systematically under- or overestimated. We provide empirical evidence for the existence of such biases, and we use a deep neural network, called Pepper, to automatically identify and reduce these biases. The model generalizes to new proteins and new runs within a related set of tandem mass spectrometry experiments, and the learned coefficients themselves reflect expected physicochemical properties of the corresponding peptide sequences. The resulting adjusted abundance measurements are more correlated with mRNA-based gene expression measurements than the unadjusted measurements. Pepper is suitable for data generated on a variety of mass spectrometry instruments and can be used with labeled or label-free approaches and with data-independent or data-dependent acquisition.
Affiliation(s)
- Ayse B Dincer
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Yang Lu
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Devin K Schweppe
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Sewoong Oh
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- William Stafford Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
8
Gao Z, Chang C, Yang J, Zhu Y, Fu Y. AP3: An Advanced Proteotypic Peptide Predictor for Targeted Proteomics by Incorporating Peptide Digestibility. Anal Chem 2019; 91:8705-8711. [DOI: 10.1021/acs.analchem.9b02520] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/03/2023]
Affiliation(s)
- Zhiqiang Gao
  - National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Cheng Chang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
- Jinghan Yang
  - National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yunping Zhu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China
  - Anhui Medical University, Hefei 230032, China
- Yan Fu
  - National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
9
Zimmer D, Schneider K, Sommer F, Schroda M, Mühlhaus T. Artificial Intelligence Understands Peptide Observability and Assists With Absolute Protein Quantification. Front Plant Sci 2018; 9:1559. [PMID: 30483279 PMCID: PMC6242780 DOI: 10.3389/fpls.2018.01559] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/30/2018] [Accepted: 10/04/2018] [Indexed: 05/20/2023]
Abstract
Targeted mass spectrometry has become the method of choice to gain absolute quantification information of high quality, which is essential for a quantitative understanding of biological systems. However, the design of absolute protein quantification assays remains challenging due to variations in peptide observability and incomplete knowledge about factors influencing peptide detectability. Here, we present a deep learning algorithm for peptide detectability prediction, d::pPop, which allows the informed selection of synthetic proteotypic peptides for the successful design of targeted proteomics quantification assays. The deep neural network is able to learn a regression model that relates the physicochemical properties of a peptide to its ion intensity detected by mass spectrometry. The approach makes use of experimentally detected deviations from the assumed equimolar abundance of all peptides derived from a given protein. Trained on extensive proteomics datasets, d::pPop's plant and non-plant specific models can predict the quality of proteotypic peptides for not yet experimentally identified proteins. Interrogating the deep neural network after learning from ~76,000 peptides per model organism allows to investigate the impact of different physicochemical properties on the observability of a peptide, thus providing insights into peptide observability as a multifaceted process. Empirical evaluation with rank accuracy metrics showed that our prediction approach outperforms existing algorithms. We circumvent the delicate step of selecting positive and negative training sets and at the same time also more closely reflect the need for selecting the top most promising peptides for targeting a protein of interest. Further, we used an artificial QconCAT protein to experimentally validate the observability prediction. 
Our proteotypic peptide prediction approach not only facilitates the design of absolute protein quantification assays via a user-friendly web interface but also enables the selection of proteotypic peptides for not yet observed proteins, hence rendering the tool especially useful for plant research.
Affiliation(s)
- David Zimmer
  - Computational Systems Biology, TU Kaiserslautern, Kaiserslautern, Germany
- Kevin Schneider
  - Computational Systems Biology, TU Kaiserslautern, Kaiserslautern, Germany
- Frederik Sommer
  - Molekulare Biotechnologie & Systembiologie, TU Kaiserslautern, Kaiserslautern, Germany
- Michael Schroda
  - Molekulare Biotechnologie & Systembiologie, TU Kaiserslautern, Kaiserslautern, Germany
- Timo Mühlhaus
  - Computational Systems Biology, TU Kaiserslautern, Kaiserslautern, Germany
10
Manes NP, Nita-Lazar A. Application of targeted mass spectrometry in bottom-up proteomics for systems biology research. J Proteomics 2018; 189:75-90. [PMID: 29452276 DOI: 10.1016/j.jprot.2018.02.008] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 11/13/2017] [Revised: 01/25/2018] [Accepted: 02/07/2018] [Indexed: 02/08/2023]
Abstract
The enormous diversity of proteoforms produces tremendous complexity within cellular proteomes, facilitates intricate networks of molecular interactions, and constitutes a formidable analytical challenge for biomedical researchers. Currently, quantitative whole-proteome profiling often relies on non-targeted liquid chromatography-mass spectrometry (LC-MS), which samples proteoforms broadly, but can suffer from lower accuracy, sensitivity, and reproducibility compared with targeted LC-MS. Recent advances in bottom-up proteomics using targeted LC-MS have enabled previously unachievable identification and quantification of target proteins and posttranslational modifications within complex samples. Consequently, targeted LC-MS is rapidly advancing biomedical research, especially systems biology research in diverse areas that include proteogenomics, interactomics, kinomics, and biological pathway modeling. With the recent development of targeted LC-MS assays for nearly the entire human proteome, targeted LC-MS is positioned to enable quantitative proteomic profiling of unprecedented quality and accessibility to support fundamental and clinical research. Here we review recent applications of bottom-up proteomics using targeted LC-MS for systems biology research. SIGNIFICANCE: Advances in targeted proteomics are rapidly advancing systems biology research. Recent applications include systems-level investigations focused on posttranslational modifications (such as phosphoproteomics), protein conformation, protein-protein interaction, kinomics, proteogenomics, and metabolic and signaling pathways. Notably, absolute quantification of metabolic and signaling pathway proteins has enabled accurate pathway modeling and engineering. 
Integration of targeted proteomics with other technologies, such as RNA-seq, has facilitated diverse research such as the identification of hundreds of "missing" human proteins (genes and transcripts that appear to encode proteins but direct experimental evidence was lacking).
Affiliation(s)
- Nathan P Manes
  - Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Aleksandra Nita-Lazar
  - Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
11
Hoofnagle AN, Whiteaker JR, Carr SA, Kuhn E, Liu T, Massoni SA, Thomas SN, Townsend RR, Zimmerman LJ, Boja E, Chen J, Crimmins DL, Davies SR, Gao Y, Hiltke TR, Ketchum KA, Kinsinger CR, Mesri M, Meyer MR, Qian WJ, Schoenherr RM, Scott MG, Shi T, Whiteley GR, Wrobel JA, Wu C, Ackermann BL, Aebersold R, Barnidge DR, Bunk DM, Clarke N, Fishman JB, Grant RP, Kusebauch U, Kushnir MM, Lowenthal MS, Moritz RL, Neubert H, Patterson SD, Rockwood AL, Rogers J, Singh RJ, Van Eyk JE, Wong SH, Zhang S, Chan DW, Chen X, Ellis MJ, Liebler DC, Rodland KD, Rodriguez H, Smith RD, Zhang Z, Zhang H, Paulovich AG. Recommendations for the Generation, Quantification, Storage, and Handling of Peptides Used for Mass Spectrometry-Based Assays. Clin Chem 2016; 62:48-69. [PMID: 26719571 DOI: 10.1373/clinchem.2015.250563] [Citation(s) in RCA: 151] [Impact Index Per Article: 18.9] [Indexed: 11/06/2022]
Abstract
BACKGROUND: For many years, basic and clinical researchers have taken advantage of the analytical sensitivity and specificity afforded by mass spectrometry in the measurement of proteins. Clinical laboratories are now beginning to deploy these workflows as well. For assays that use proteolysis to generate peptides for protein quantification and characterization, synthetic stable isotope-labeled internal standard peptides are of central importance. No general recommendations are currently available surrounding the use of peptides in protein mass spectrometric assays. CONTENT: The Clinical Proteomic Tumor Analysis Consortium of the National Cancer Institute has collaborated with clinical laboratorians, peptide manufacturers, metrologists, representatives of the pharmaceutical industry, and other professionals to develop a consensus set of recommendations for peptide procurement, characterization, storage, and handling, as well as approaches to the interpretation of the data generated by mass spectrometric protein assays. Additionally, the importance of carefully characterized reference materials (in particular, peptide standards for the improved concordance of amino acid analysis methods across the industry) is highlighted. The alignment of practices around the use of peptides and the transparency of sample preparation protocols should allow for the harmonization of peptide and protein quantification in research and clinical care.
Affiliation(s)
- Tao Liu
  - Pacific Northwest National Laboratory, Richland, WA
- Jing Chen
  - Johns Hopkins University, Baltimore, MD
- Yuqian Gao
  - Pacific Northwest National Laboratory, Richland, WA
- Wei-Jun Qian
  - Pacific Northwest National Laboratory, Richland, WA
- Tujin Shi
  - Pacific Northwest National Laboratory, Richland, WA
- John A Wrobel
  - University of North Carolina School of Medicine, Chapel Hill, NC
- Chaochao Wu
  - Pacific Northwest National Laboratory, Richland, WA
- Ruedi Aebersold
  - Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Russ P Grant
  - Laboratory Corporation of America Holdings, Inc., Burlington, NC
- Mark M Kushnir
  - University of Utah and ARUP Laboratories, Salt Lake City, UT
- Alan L Rockwood
  - University of Utah and ARUP Laboratories, Salt Lake City, UT
- Xian Chen
  - University of North Carolina School of Medicine, Chapel Hill, NC
- Hui Zhang
  - Johns Hopkins University, Baltimore, MD
12
|
Ma S, Downard KM, Wong JW. FluClass: A novel algorithm and approach to score and visualize the phylogeny of the influenza virus using mass spectrometry. Anal Chim Acta 2015; 895:54-61. [DOI: 10.1016/j.aca.2015.09.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/13/2015] [Revised: 08/29/2015] [Accepted: 09/03/2015] [Indexed: 10/23/2022]
|
13
|
Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat Protoc 2015; 10:426-41. [PMID: 25675208 DOI: 10.1038/nprot.2015.015] [Citation(s) in RCA: 220] [Impact Index Per Article: 24.4] [Indexed: 12/18/2022]
Abstract
Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2-3 d to complete, depending on the extent of the library and the computational resources available.
|
14
|
An Advanced Partial Discharge Recognition Strategy of Power Cable. Journal of Electrical and Computer Engineering 2015. [DOI: 10.1155/2015/174538] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Detection and localization of partial discharge are very important in condition monitoring of power cables, so it is necessary to build an accurate recognizer to identify the discharge types. In this paper, firstly, a power cable model based on FDTD simulation is built to obtain typical discharge signals as training samples. Secondly, because the extraction of discharge signal features is crucial, fractal characteristics of the training samples are extracted and input into the recognizer. To make the results more accurate, a multi-SVM recognizer made up of six Support Vector Machines (SVMs) is proposed in this paper. The result of the multi-SVM recognizer is determined by the vote of the six SVMs. Finally, BP neural networks and ELM are compared with the multi-SVM recognizer. The accuracy comparison shows that the multi-SVM recognizer has the best accuracy and stability, and it can recognize the discharge type efficiently.
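The voting step in this abstract can be illustrated with a minimal sketch. It assumes each of the six SVMs has already produced a class label for one signal. The function name `majority_vote` and the discharge-type labels are hypothetical, and the tie-breaking rule (the first label seen among the most frequent) is an assumption, since the abstract does not say how ties among six voters are resolved.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("at least one vote is required")
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(labels).most_common(1)[0][0]


# Hypothetical labels emitted by six independently trained classifiers.
votes = ["corona", "void", "corona", "corona", "surface", "void"]
winner = majority_vote(votes)
```

With an even number of voters, a real recognizer would need an explicit tie rule; the one used here is only a placeholder.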
|
15
|
Muntel J, Boswell SA, Tang S, Ahmed S, Wapinski I, Foley G, Steen H, Springer M. Abundance-based classifier for the prediction of mass spectrometric peptide detectability upon enrichment (PPA). Mol Cell Proteomics 2014; 14:430-40. [PMID: 25473088 DOI: 10.1074/mcp.m114.044321] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/06/2022] Open
Abstract
The function of a large percentage of proteins is modulated by post-translational modifications (PTMs). Currently, mass spectrometry (MS) is the only proteome-wide technology that can identify PTMs. Unfortunately, the inability to detect a PTM by MS is not proof that the modification is not present. The detectability of peptides varies significantly, making MS potentially blind to a large fraction of peptides. Learning from published algorithms that generally focus on predicting the most detectable peptides, we developed a tool that incorporates protein abundance into the peptide prediction algorithm with the aim of determining the detectability of every peptide within a protein. We tested our tool, "Peptide Prediction with Abundance" (PPA), on in-house acquired as well as published data sets from other groups acquired on different instrument platforms. Incorporation of protein abundance into the prediction allows us to assess not only the detectability of all peptides but also whether a peptide of interest is likely to become detectable upon enrichment. We validated the ability of our tool to predict changes in protein detectability with a dilution series of 31 purified proteins at several different concentrations. PPA predicted the concentration-dependent peptide detectability correctly in 78% of the cases, demonstrating its utility for predicting the protein enrichment needed to observe a peptide of interest in targeted experiments. This is especially important in the analysis of PTMs. PPA is available as a web-based or executable package that can work with generally applicable defaults or be retrained from a pilot MS data set.
Affiliation(s)
- Jan Muntel
- Departments of Pathology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Sarah A Boswell
- Department of Systems Biology, Harvard Medical School, Boston, MA
- Shaojun Tang
- Departments of Pathology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Saima Ahmed
- Departments of Pathology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Ilan Wapinski
- Department of Systems Biology, Harvard Medical School, Boston, MA
- Greg Foley
- Department of Systems Biology, Harvard Medical School, Boston, MA
- Hanno Steen
- Departments of Pathology, Boston Children's Hospital and Harvard Medical School, Boston, MA
- Michael Springer
- Department of Systems Biology, Harvard Medical School, Boston, MA
|
16
|
Mohammed Y, Domański D, Jackson AM, Smith DS, Deelder AM, Palmblad M, Borchers CH. PeptidePicker: A scientific workflow with web interface for selecting appropriate peptides for targeted proteomics experiments. J Proteomics 2014; 106:151-61. [DOI: 10.1016/j.jprot.2014.04.018] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 01/16/2014] [Revised: 04/08/2014] [Accepted: 04/10/2014] [Indexed: 01/08/2023]
|
17
|
Schliekelman P, Liu S. Quantifying the effect of competition for detection between coeluting peptides on detection probabilities in mass-spectrometry-based proteomics. J Proteome Res 2013; 13:348-61. [PMID: 24313442 DOI: 10.1021/pr400034z] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
Abstract
There are many factors that contribute to the variation in detection probabilities of proteins in LC-MS/MS experiments, and currently little is known about their relative importance. In this study, we analyze the effect of competition for detection between coeluting peptides on peptide detection probability. Using a novel method for estimating peptide detection probabilities, we show that these probabilities can vary by an order of magnitude between peptides that elute from the liquid chromatograph at the same time as many other peptides and those that elute with fewer other peptides. To explore these results, we use a mathematical model to show that competition for detection between peptides is expected to be a major source of missed detections in complex mixtures because there will be many MS/MS scanning intervals that contain more coeluting peptides than can be subjected to MS/MS analysis. Our data and simulation results show that the number of coeluting peptides is a primary determinant of whether a peptide will be detected. In our data, this had a several-fold larger effect on peptide detection probability than did peptide abundance. Furthermore, the distribution of elution times for the most frequently detected peptides was strongly shifted toward values where there were few coeluting peptides, indicating that the number of coeluting peptides is a major determinant of whether a peptide is proteotypic.
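The capacity argument in this abstract can be made concrete with a toy calculation. This is not the authors' mathematical model: it assumes, purely for illustration, that each scanning interval can fragment at most `capacity` precursors and that the instrument picks among coeluting peptides uniformly at random (real data-dependent acquisition ranks by intensity). The function name and the numbers are hypothetical.

```python
def selection_probability(n_coeluting, capacity):
    """Chance that one given peptide is picked for MS/MS in one interval,
    assuming uniform random selection of at most `capacity` precursors."""
    if n_coeluting <= 0 or capacity < 0:
        raise ValueError("need n_coeluting > 0 and capacity >= 0")
    return min(1.0, capacity / n_coeluting)


# A sparse region versus a crowded region of the chromatogram.
sparse = selection_probability(n_coeluting=5, capacity=10)
crowded = selection_probability(n_coeluting=100, capacity=10)
```

Under these assumptions, moving from 5 to 100 coeluting peptides lowers the per-interval selection chance tenfold, which is the same order-of-magnitude effect the abstract reports for detection probabilities.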
Affiliation(s)
- Paul Schliekelman
- Department of Statistics, University of Georgia , 204 Statistics Building, Athens, Georgia 30602, United States
|
18
|
Abstract
Quantitative measurement of proteins is one of the most fundamental analytical tasks in a biochemistry laboratory, but widely used immunochemical methods often have limited specificity and high measurement variation. In this review, we discuss applications of multiple-reaction monitoring (MRM) mass spectrometry, which allows sensitive, precise quantitative analyses of peptides and the proteins from which they are derived. Systematic development of MRM assays is permitted by databases of peptide mass spectra and sequences, software tools for analysis design and data analysis, and rapid evolution of tandem mass spectrometer technology. Key advantages of MRM assays are the ability to target specific peptide sequences, including variants and modified forms, and the capacity for multiplexing that allows analysis of dozens to hundreds of peptides. Different quantitative standardization methods provide options that balance precision, sensitivity, and assay cost. Targeted protein quantitation by MRM and related mass spectrometry methods can advance biochemistry by transforming approaches to protein measurement.
Affiliation(s)
- Daniel C Liebler
- Department of Biochemistry and Jim Ayers Institute for Precancer Detection and Diagnosis, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, Tennessee 37232-6350, United States.
|
19
|
Zerck A, Nordhoff E, Lehrach H, Reinert K. Optimal precursor ion selection for LC-MALDI MS/MS. BMC Bioinformatics 2013; 14:56. [PMID: 23418672 PMCID: PMC3651328 DOI: 10.1186/1471-2105-14-56] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/20/2012] [Accepted: 01/23/2013] [Indexed: 12/30/2022] Open
Abstract
Background: Liquid chromatography mass spectrometry (LC-MS) maps in shotgun proteomics are often too complex to select every detected peptide signal for fragmentation by tandem mass spectrometry (MS/MS). Standard methods for precursor ion selection, commonly based on data dependent acquisition, select highly abundant peptide signals in each spectrum. However, these approaches produce redundant information and are biased towards high-abundance proteins. Results: We present two algorithms for inclusion list creation that formulate precursor ion selection as an optimization problem. Given an LC-MS map, the first approach maximizes the number of selected precursors given constraints such as a limited number of acquisitions per RT fraction. Second, we introduce a protein sequence-based inclusion list that can be used to monitor proteins of interest. Given only the protein sequences, we create an inclusion list that optimally covers the whole protein set. Additionally, we propose an iterative precursor ion selection that aims at reducing the redundancy obtained with data dependent LC-MS/MS. We overcome the risk of erroneous assignments by including methods for retention time and proteotypicity predictions. We show that our method identifies a set of proteins requiring fewer precursors than standard approaches. Thus, it is well suited for precursor ion selection in experiments with limited sample amount or analysis time. Conclusions: We present three approaches to precursor ion selection with LC-MALDI MS/MS. Using a well-defined protein standard and a complex human cell lysate, we demonstrate that our methods outperform standard approaches. Our algorithms are implemented as part of OpenMS and are available under http://www.openms.de.
Affiliation(s)
- Alexandra Zerck
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany.
|
20
|
Methods and Progress of Mass Spectrometry-based Selected Reaction Monitoring. Prog Biochem Biophys 2012. [DOI: 10.3724/sp.j.1206.2012.00009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
21
|
Yadav AK, Kumar D, Dash D. Learning from decoys to improve the sensitivity and specificity of proteomics database search results. PLoS One 2012. [PMID: 23189209 PMCID: PMC3506577 DOI: 10.1371/journal.pone.0050651] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023] Open
Abstract
The statistical validation of database search results is a complex issue in bottom-up proteomics. The correct and incorrect peptide spectrum match (PSM) scores overlap significantly, making an accurate assessment of true peptide matches challenging. Since complete separation between the true and false hits is practically never achieved, there is a need for better methods and rescoring algorithms to improve upon the primary database search results. Here we describe the calibration and False Discovery Rate (FDR) estimation of database search scores through a dynamic FDR calculation method, FlexiFDR, which increases both the sensitivity and specificity of search results. By modelling a simple linear regression on the decoy hits for different charge states, the method maximized the number of true positives and reduced the number of false negatives in several standard datasets of varying complexity (18-mix, 49-mix, 200-mix) and a few complex datasets (E. coli and Yeast) obtained from a wide variety of MS platforms. The net positive gain for correct spectral and peptide identifications was up to 14.81% and 6.2%, respectively. The approach is applicable to different search methodologies: separate as well as concatenated database searches, high mass accuracy searches, and semi-tryptic and modification searches. FlexiFDR was also applied to Mascot results and showed better performance than before. We have shown that an appropriate threshold learnt from decoys can be very effective in improving the database search results. FlexiFDR adapts itself to different instruments, data types and MS platforms. It learns from the decoy hits and sets a flexible threshold that automatically aligns itself to the underlying variables of data quality and size.
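For orientation, the baseline idea of learning a score threshold from decoys can be sketched as follows. This is the generic target-decoy estimate (decoy hits divided by target hits above a threshold), not FlexiFDR's regression-based calibration, whose details the abstract does not give. The function name, the `(score, is_decoy)` tuple layout, and the example scores are assumptions made for illustration; tied scores are not treated specially.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, higher score = better match.
    FDR at a cutoff is estimated as (#decoys) / (#targets) at or above it.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best_cutoff = score  # accepting everything down to here is still OK
    return best_cutoff


# Hypothetical search results: four targets and two decoys.
psms = [(10.0, False), (9.0, False), (8.0, False),
        (7.0, True), (6.0, False), (5.0, True)]
cutoff = score_threshold_at_fdr(psms, max_fdr=0.25)
```

A fixed global cutoff like this one is what flexible, data-adaptive thresholds aim to improve on.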
Affiliation(s)
- Amit Kumar Yadav
- GNR Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Delhi, India
- Dhirendra Kumar
- GNR Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Delhi, India
- Debasis Dash
- GNR Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Delhi, India
|
22
|
Abstract
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.
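The combinatorial-optimization view of step (2) is often illustrated as a set-cover problem: choose a small set of proteins that explains every identified peptide. The sketch below uses a greedy approximation rather than the integer programming formulation the review mentions, and it is not taken from the review itself. The function name, the protein and peptide identifiers, and the alphabetical tie-break are all hypothetical.

```python
def greedy_protein_cover(protein_to_peptides):
    """Greedily pick proteins until every observed peptide is explained.

    protein_to_peptides: dict mapping protein id -> set of peptide ids.
    Ties are broken alphabetically by protein id.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Protein explaining the most still-unexplained peptides.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & uncovered))
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical mapping with one shared peptide (p3) and one ambiguous one (p4).
mapping = {"A": {"p1", "p2", "p3"}, "B": {"p3", "p4"}, "C": {"p4"}}
inferred = greedy_protein_cover(mapping)
```

The greedy choice does not quantify confidence in each protein, which is where the probabilistic approaches surveyed in the review come in.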
Affiliation(s)
- Yong Fuga Li
- School of Informatics and Computing, Indiana University, Bloomington 150 S, Woodlawn Avenue, Bloomington, Indiana 47405, USA
|
23
|
Christin C, Hoefsloot HCJ, Smilde AK, Hoekman B, Suits F, Bischoff R, Horvatovich P. A critical assessment of feature selection methods for biomarker discovery in clinical proteomics. Mol Cell Proteomics 2012; 12:263-76. [PMID: 23115301 DOI: 10.1074/mcp.m112.022566] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.3] [Indexed: 02/03/2023] Open
Abstract
In this paper, we compare the performance of six different feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery (the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive features elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA)) using human urine and porcine cerebrospinal fluid samples that were spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features that are related to the spiked peptides without selecting unrelated features. Whereas many studies have to rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly from the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and extents of sample class separation determined by the concentration level of spiked compounds. For each feature selection method and data set, the performance for selecting a set of features related to spiked compounds was assessed using the harmonic mean of the recall and the precision (f-score) and the geometric mean of the recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly with increasing sample size up to a point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets independent of spiking level and number of samples. Linear SVM-RFE performs poorly for selecting features related to the spiked compounds, even though the classification error is relatively low.
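The two scores defined in this abstract can be written out directly. The sketch below assumes features are represented as hashable identifiers and that the full feature list is known; the function name `selection_scores` and the example sets are hypothetical, and returning 0.0 for undefined ratios is a convention chosen here, not one stated in the abstract.

```python
from math import sqrt


def selection_scores(selected, spiked, all_features):
    """Return (f_score, g_score) for a feature selection result.

    f-score: harmonic mean of recall and precision.
    g-score: geometric mean of recall and true negative rate.
    """
    selected, spiked, all_features = set(selected), set(spiked), set(all_features)
    tp = len(selected & spiked)                  # spiked features that were selected
    fp = len(selected - spiked)                  # unrelated features that were selected
    fn = len(spiked - selected)                  # spiked features that were missed
    tn = len(all_features - selected - spiked)   # unrelated features correctly left out
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    g_score = sqrt(recall * tnr)
    return f_score, g_score


# Ten features, four truly spiked; the method selects three of them plus one unrelated.
f_score, g_score = selection_scores(
    selected={0, 1, 2, 7}, spiked={0, 1, 2, 3}, all_features=range(10))
```

Because the g-score uses the true negative rate, it stays high when unrelated features vastly outnumber spiked ones, whereas the f-score penalizes every false positive through precision.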
Affiliation(s)
- Christin Christin
- Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
|
24
|
Bereman MS, MacLean B, Tomazela DM, Liebler DC, MacCoss MJ. The development of selected reaction monitoring methods for targeted proteomics via empirical refinement. Proteomics 2012; 12:1134-41. [PMID: 22577014 DOI: 10.1002/pmic.201200042] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Indexed: 11/11/2022]
Abstract
Software advancements in the last several years have had a significant impact on proteomics from method development to data analysis. Herein, we detail a method, which uses our in-house developed software tool termed Skyline, for empirical refinement of candidate peptides from targeted proteins. The method consists of four main steps: generation of a testable hypothesis, method development, peptide refinement, and peptide validation. The ultimate goal is to identify the best performing peptide in terms of ionization efficiency, reproducibility, specificity, and chromatographic characteristics to monitor as a proxy for protein abundance. It is important to emphasize that this method allows the user to perform this refinement procedure in the sample matrix and organism of interest with the instrumentation available. Finally, the method is demonstrated in a case study to determine the best peptide to monitor the abundance of surfactant protein B in lung aspirates.
Affiliation(s)
- Michael S Bereman
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
|
25
|
Rafalko A, Dai S, Hancock WS, Karger BL, Hincapie M. Development of a Chip/Chip/SRM platform using digital chip isoelectric focusing and LC-Chip mass spectrometry for enrichment and quantitation of low abundance protein biomarkers in human plasma. J Proteome Res 2012; 11:808-17. [PMID: 22098410 PMCID: PMC3656385 DOI: 10.1021/pr2006704] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Abstract
Protein biomarkers are critical for diagnosis, prognosis, and treatment of disease. The transition from protein biomarker discovery to verification can be a rate-limiting step in clinical development of new diagnostics. Liquid chromatography-selected reaction monitoring mass spectrometry (LC-SRM MS) is becoming an important tool for biomarker verification studies in highly complex biological samples. Analyte enrichment or sample fractionation is often necessary to reduce sample complexity and improve sensitivity of SRM for quantitation of clinically relevant biomarker candidates present at the low ng/mL range in blood. In this paper, we describe an alternative method for sample preparation for LC-SRM MS, which does not rely on availability of antibodies. This new platform is based on selective enrichment of proteotypic peptides from complex biological peptide mixtures via isoelectric focusing (IEF) on a digital ProteomeChip (dPC) for SRM quantitation using a triple quadrupole (QQQ) instrument with an LC-Chip (Chip/Chip/SRM). To demonstrate the value of this approach, the optimization of the Chip/Chip/SRM platform was performed using prostate specific antigen (PSA) added to female plasma as a model system. The combination of immunodepletion of albumin and IgG with peptide fractionation on the dPC, followed by SRM analysis, resulted in a limit of quantitation of PSA added to female plasma at the level of ∼1-2.5 ng/mL with a CV of ∼13%. The optimized platform was applied to measure levels of PSA in plasma of a small cohort of male patients with prostate cancer (PCa) and healthy matched controls with concentrations ranging from 1.5 to 25 ng/mL. A good correlation (r² = 0.9459) was observed between standard clinical ELISA tests and the SRM-based assay. Our data demonstrate that the combination of IEF on the dPC and SRM (Chip/Chip/SRM) can be successfully applied for verification of low abundance protein biomarkers in complex samples.
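The two summary statistics quoted in this abstract, the coefficient of variation (CV) and the squared correlation between two assays, follow standard definitions that can be sketched in a few lines. The function names and the example numbers are hypothetical and are not data from the study; the CV here uses the sample standard deviation, which is an assumption about the convention used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def r_squared(x, y):
    """Squared Pearson correlation between paired measurements x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical replicate measurements of one sample (ng/mL).
cv_percent = coefficient_of_variation([9.0, 10.0, 11.0])

# Hypothetical paired results from two assays on the same three samples.
agreement = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A high r² shows that the two assays rank and scale samples consistently; it does not by itself show that they agree in absolute concentration.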
Affiliation(s)
- Agnes Rafalko
- Barnett Institute of Chemical and Biological Analysis and Department of Chemistry and Chemical Biology, Northeastern University, 360 Huntington Avenue, Boston, MA 02115
- Shujia Dai
- Barnett Institute of Chemical and Biological Analysis and Department of Chemistry and Chemical Biology, Northeastern University, 360 Huntington Avenue, Boston, MA 02115
- William S. Hancock
- Barnett Institute of Chemical and Biological Analysis and Department of Chemistry and Chemical Biology, Northeastern University, 360 Huntington Avenue, Boston, MA 02115
- Barry L. Karger
- Barnett Institute of Chemical and Biological Analysis and Department of Chemistry and Chemical Biology, Northeastern University, 360 Huntington Avenue, Boston, MA 02115
- Marina Hincapie
- Barnett Institute of Chemical and Biological Analysis and Department of Chemistry and Chemical Biology, Northeastern University, 360 Huntington Avenue, Boston, MA 02115
|
26
|
Abstract
High-throughput proteomics experiments involving tandem mass spectrometry produce large volumes of complex data that require sophisticated computational analyses. As such, the field offers many challenges for computational biologists. In this article, we briefly introduce some of the core computational and statistical problems in the field and then describe a variety of outstanding problems that readers of PLoS Computational Biology might be able to help solve.
Affiliation(s)
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America.
|
27
|
Cannon WR, Rawlins MM, Baxter DJ, Callister SJ, Lipton MS, Bryant DA. Large improvements in MS/MS-based peptide identification rates using a hybrid analysis. J Proteome Res 2011; 10:2306-17. [PMID: 21391700 DOI: 10.1021/pr101130b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
We report a hybrid search method combining database and spectral library searches that allows for a straightforward approach to characterizing the error rates from the combined data. Using these methods, we demonstrate significantly increased sensitivity and specificity in matching peptides to tandem mass spectra. The hybrid search method increased the number of spectra that can be assigned to a peptide in a global proteomics study by 57-147% at an estimated false discovery rate of 5%, with clear room for even greater improvements. The approach combines the general utility of using consensus model spectra typical of database search methods with the accuracy of the intensity information contained in spectral libraries. A common scoring metric based on recent developments linking data analysis and statistical thermodynamics is used, which allows the use of a conservative estimate of error rates for the combined data. We applied this approach to proteomics analysis of Synechococcus sp. PCC 7002, a cyanobacterium that is a model organism for studies of photosynthetic carbon fixation and biofuels development. The increased specificity and sensitivity of this approach allowed us to identify many more peptides involved in the processes important for photoautotrophic growth.
Affiliation(s)
- William R Cannon
- Computational Biology and Bioinformatics Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States.
|
28
|
Gallien S, Duriez E, Domon B. Selected reaction monitoring applied to proteomics. Journal of Mass Spectrometry 2011; 46:298-312. [PMID: 21394846 DOI: 10.1002/jms.1895] [Citation(s) in RCA: 202] [Impact Index Per Article: 15.5] [Indexed: 05/08/2023]
Abstract
Selected reaction monitoring (SRM) performed on triple quadrupole mass spectrometers has been the reference quantitative technique to analyze small molecules for several decades. It is now emerging in proteomics as the ideal tool to complement shotgun qualitative studies; targeted SRM quantitative analysis offers high selectivity, sensitivity and a wide dynamic range. However, SRM applied to proteomics presents singularities that distinguish it from small molecules analysis. This review is an overview of SRM technology and describes the specificities and the technical aspects of proteomics experiments. Ongoing developments aiming at increasing multiplexing capabilities of SRM are discussed; they dramatically improve its throughput and extend its field of application to directed or supervised discovery experiments.
Affiliation(s)
- Sebastien Gallien
- Luxembourg Clinical Proteomics center (LCP), Centre de Recherche Public de la Santé, 1 B rue Thomas Edison, L-1445 Strassen, Luxembourg
|
29
|
Abstract
Methods for predicting protein post-translational modifications have been developed extensively. In this chapter, we review major post-translational modification prediction strategies, with a particular focus on statistical and machine learning approaches. We present the workflow of the methods and summarize the advantages and disadvantages of the methods.
Affiliation(s)
- Chunmei Liu
- Department of Systems and Computer Science, Howard University, Washington, DC, USA.
|
30
|
Li YF, Arnold RJ, Tang H, Radivojac P. The importance of peptide detectability for protein identification, quantification, and experiment design in MS/MS proteomics. J Proteome Res 2010; 9:6288-97. [PMID: 21067214 DOI: 10.1021/pr1005586] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 11/28/2022]
Abstract
Peptide detectability is defined as the probability that a peptide is identified in an LC-MS/MS experiment and has been useful in providing solutions to protein inference and label-free quantification. Previously, predictors for peptide detectability trained on standard or complex samples were proposed. Although the models trained on complex samples may benefit from the large training data sets, it is unclear to what extent they are affected by the unequal abundances of identified proteins. To address this challenge and improve detectability prediction, we present a new algorithm for the iterative learning of peptide detectability from complex mixtures. We provide evidence that the new method approximates detectability with useful accuracy and, based on its design, can be used to interpret the outcome of other learning strategies. We studied the properties of peptides from the bacterium Deinococcus radiodurans and found that at standard quantities, its tryptic peptides can be roughly classified as either detectable or undetectable, with a relatively small fraction having medium detectability. We extend the concept of detectability from peptides to proteins and apply the model to predict the behavior of a replicate LC-MS/MS experiment from a single analysis. Finally, our study summarizes a theoretical framework for peptide/protein identification and label-free quantification.
Affiliation(s)
- Yong Fuga Li
- Department of Chemistry, School of Informatics and Computing, Indiana University, Bloomington, Indiana, United States
|
31
|
Alves G, Ogurtsov AY, Yu YK. Assigning statistical significance to proteotypic peptides via database searches. J Proteomics 2010; 74:199-211. [PMID: 21055489 DOI: 10.1016/j.jprot.2010.10.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/26/2010] [Revised: 10/18/2010] [Accepted: 10/21/2010] [Indexed: 11/19/2022]
Abstract
Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId's knowledge database to include proteotypic information, utilized RAId's statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId's programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those that occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches.
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
32
|
Shah AR, Agarwal K, Baker ES, Singhal M, Mayampurath AM, Ibrahim YM, Kangas LJ, Monroe ME, Zhao R, Belov ME, Anderson GA, Smith RD. Machine learning based prediction for peptide drift times in ion mobility spectrometry. Bioinformatics 2010; 26:1601-7. [PMID: 20495001 PMCID: PMC2913656 DOI: 10.1093/bioinformatics/btq245] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 01/06/2010] [Revised: 04/18/2010] [Accepted: 05/02/2010] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Ion mobility spectrometry (IMS) has gained significant traction over the past few years for rapid, high-resolution separations of analytes based upon gas-phase ion structure, with significant potential impacts in the field of proteomic analysis. IMS coupled with mass spectrometry (MS) affords multiple improvements over traditional proteomics techniques, such as in the elucidation of secondary structure information, identification of post-translational modifications, as well as higher identification rates with reduced experiment times. The high throughput nature of this technique benefits from accurate calculation of cross sections, mobilities and associated drift times of peptides, thereby enhancing downstream data analysis. Here, we present a model that uses physicochemical properties of peptides to accurately predict a peptide's drift time directly from its amino acid sequence. This model is used in conjunction with two mathematical techniques, a partial least squares regression and a support vector regression setting. RESULTS: When tested on an experimentally created high confidence database of 8675 peptide sequences with measured drift times, both techniques statistically significantly outperform the intrinsic size parameters-based calculations, the currently held practice in the field, on all charge states (+2, +3 and +4). AVAILABILITY: The software executable, imPredict, is available for download from http://omics.pnl.gov/software/imPredict.php CONTACT: rds@pnl.gov SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anuj R Shah
- Fundamental and Computational Sciences Directorate, Pacific Northwest National Laboratory, 999 Battelle Boulevard, Richland, WA 99352, USA
|
33
|
Hewel JA, Liu J, Onishi K, Fong V, Chandran S, Olsen JB, Pogoutse O, Schutkowski M, Wenschuh H, Winkler DFH, Eckler L, Zandstra PW, Emili A. Synthetic peptide arrays for pathway-level protein monitoring by liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics 2010; 9:2460-73. [PMID: 20467045 DOI: 10.1074/mcp.m900456-mcp200] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022] Open
Abstract
Effective methods to detect and quantify functionally linked regulatory proteins in complex biological samples are essential for investigating mammalian signaling pathways. Traditional immunoassays depend on proprietary reagents that are difficult to generate and multiplex, whereas global proteomic profiling can be tedious and can miss low abundance proteins. Here, we report a target-driven liquid chromatography-tandem mass spectrometry (LC-MS/MS) strategy for selectively examining the levels of multiple low abundance components of signaling pathways which are refractory to standard shotgun screening procedures and hence appear limited in current MS/MS repositories. Our stepwise approach consists of: (i) synthesizing microscale peptide arrays, including heavy isotope-labeled internal standards, for use as high quality references to (ii) build empirically validated high density LC-MS/MS detection assays with a retention time scheduling system that can be used to (iii) identify and quantify endogenous low abundance protein targets in complex biological mixtures with high accuracy by correlation to a spectral database using new software tools. The method offers a flexible, rapid, and cost-effective means for routine proteomic exploration of biological systems including "label-free" quantification, while minimizing spurious interferences. As proof-of-concept, we have examined the abundance of transcription factors and protein kinases mediating pluripotency and self-renewal in embryonic stem cell populations.
Affiliation(s)
- Johannes A Hewel
- Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada
|
34
|
Liang G, Zhao W. Using factor analysis scales of generalized amino acid information for prediction and characteristic analysis of β-turns in proteins based on a support vector machine model. Sci China Chem 2010. [DOI: 10.1007/s11426-010-0165-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
35
|
Advance of Peptide Detectability Prediction on Mass Spectrometry Platform in Proteomics. CHINESE JOURNAL OF ANALYTICAL CHEMISTRY 2010. [DOI: 10.3724/sp.j.1096.2010.00286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
36
|
Abstract
The field of proteomics, particularly the application of MS analysis to protein samples, is well established and growing rapidly. Proteomic studies generate large volumes of raw experimental data and inferred biological results. To facilitate the dissemination of these data, centralized data repositories have been developed that make the data and results accessible to proteomic researchers and biologists alike. This review of proteomics data repositories focuses exclusively on freely available, centralized data resources that disseminate or store experimental MS data and results. The resources chosen reflect a current "snapshot" of the state of resources available with an emphasis placed on resources that may be of particular interest to yeast researchers. Resources are described in terms of their intended purpose and the features and functionality provided to users.
Affiliation(s)
- Michael Riffle
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
|
37
|
Mujezinovic N, Schneider G, Wildpaner M, Mechtler K, Eisenhaber F. Reducing the haystack to find the needle: improved protein identification after fast elimination of non-interpretable peptide MS/MS spectra and noise reduction. BMC Genomics 2010; 11 Suppl 1:S13. [PMID: 20158870 PMCID: PMC2822527 DOI: 10.1186/1471-2164-11-s1-s13] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
Abstract
Background: Tandem mass spectrometry (MS/MS) has become a standard method for identification of proteins extracted from biological samples, but the huge number and the noise contamination of MS/MS spectra obstruct swift and reliable computer-aided interpretation. Typically, a minor fraction of the spectra per sample (most often, only a few percent) and about 10% of the peaks per spectrum contribute to the final result, if protein identification is not prevented by the noise altogether. Results: Two fast preprocessing screens can substantially reduce the haystack of MS/MS data. (1) Simple sequence ladder rules remove spectra non-interpretable in peptide sequences. (2) Modified Fourier-transform-based criteria clear background in the remaining data. On average, only a remainder of 35% of the MS/MS spectra (each reduced in size by about one quarter) has to be handed over to the interpretation software for reliable protein identification essentially without loss of information, with a trend to improved sequence coverage and with proportional decrease of computer resource consumption. Conclusions: The search for sequence ladders in tandem MS/MS spectra with subsequent noise suppression is a promising strategy to reduce the number of MS/MS spectra from electro-spray instruments and to enhance the reliability of protein matches. Supplementary material and the software are available from an accompanying WWW-site with the URL http://mendel.bii.a-star.edu.sg/mass-spectrometry/MSCleaner-2.0/.
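The idea of a sequence ladder screen named in this abstract can be sketched roughly as follows. This is not the MSCleaner implementation: the function names, the mass tolerance, the minimum ladder length, and the restriction to singly charged fragments are all assumptions made for illustration. The sketch looks for chains of peaks whose consecutive m/z differences each match a monoisotopic amino acid residue mass, and flags a spectrum as non-interpretable when no sufficiently long chain exists.

```python
# Monoisotopic residue masses in Da (leucine and isoleucine share one mass).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def longest_ladder(peaks_mz, tol=0.02):
    """Length, in residues, of the longest chain of peaks whose consecutive
    m/z differences each match one residue mass within `tol`.

    Assumes singly charged fragment ions (hypothetical simplification).
    """
    mzs = sorted(peaks_mz)
    masses = sorted(set(RESIDUE_MASSES.values()))
    best = [0] * len(mzs)  # best[i]: longest ladder ending at peak i
    for i, mz_i in enumerate(mzs):
        for j in range(i):
            diff = mz_i - mzs[j]
            if any(abs(diff - m) <= tol for m in masses):
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)


def is_interpretable(peaks_mz, min_ladder=3, tol=0.02):
    """Keep a spectrum only if it contains a ladder of at least
    `min_ladder` residues (threshold chosen arbitrarily for this sketch)."""
    return longest_ladder(peaks_mz, tol) >= min_ladder


if __name__ == "__main__":
    ladder = [100.0, 157.02146, 228.05857, 315.09060]  # steps of G, A, S
    noise = [100.0, 110.5, 123.7, 140.2]
    print(is_interpretable(ladder), is_interpretable(noise))
```

The quadratic pairwise scan is adequate for the few hundred peaks of a typical spectrum; a production screen would also need to handle multiply charged fragments and modified residues.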
Affiliation(s)
- Nedim Mujezinovic
- Sarajevo School of Science and Technology, Sarajevo, Bosnia-Herzegovina
|
38
|
XU CM, ZHANG JY, LIU H, SUN HC, ZHU YP, XIE HW. Advance of Peptide Detectability Prediction on Mass Spectrometry Platform in Proteomics. CHINESE JOURNAL OF ANALYTICAL CHEMISTRY 2010. [DOI: 10.1016/s1872-2040(09)60023-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
|
39
|
Cham Mead JA, Bianco L, Bessant C. Free computational resources for designing selected reaction monitoring transitions. Proteomics 2010; 10:1106-26. [DOI: 10.1002/pmic.200900396] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 11/08/2022]
|
40
|
Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat Biotechnol 2009; 27:190-8. [PMID: 19169245 DOI: 10.1038/nbt.1524] [Citation(s) in RCA: 238] [Impact Index Per Article: 15.9] [Received: 10/17/2008] [Accepted: 01/03/2009] [Indexed: 12/21/2022]
Abstract
Protein biomarker discovery produces lengthy lists of candidates that must subsequently be verified in blood or other accessible biofluids. Use of targeted mass spectrometry (MS) to verify disease- or therapy-related changes in protein levels requires the selection of peptides that are quantifiable surrogates for proteins of interest. Peptides that produce the highest ion-current response (high-responding peptides) are likely to provide the best detection sensitivity. Identification of the most effective signature peptides, particularly in the absence of experimental data, remains a major resource constraint in developing targeted MS-based assays. Here we describe a computational method that uses protein physicochemical properties to select high-responding peptides and demonstrate its utility in identifying signature peptides in plasma, a complex proteome with a wide range of protein concentrations. Our method, which employs a Random Forest classifier, facilitates the development of targeted MS-based assays for biomarker verification or any application where protein levels need to be measured.
|