1. Madej D, Lam H. On the use of tandem mass spectra acquired from samples of evolutionarily distant organisms to validate methods for false discovery rate estimation. Proteomics 2024:e2300398. [PMID: 38491400 DOI: 10.1002/pmic.202300398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/18/2024]
Abstract
Estimating the false discovery rate (FDR) of peptide identifications is a key step in proteomics data analysis, and many methods have been proposed for this purpose. Recently, an entrapment-inspired protocol to validate methods for FDR estimation appeared in articles showcasing new spectral library search tools. That validation approach involves generating incorrect spectral matches by searching spectra from evolutionarily distant organisms (entrapment queries) against the original target search space. Although this approach may appear similar to the solutions using entrapment databases, it represents a distinct conceptual framework whose correctness has not been verified yet. In this viewpoint, we first discussed the background of the entrapment-based validation protocols and then conducted a few simple computational experiments to verify the assumptions behind them. The results reveal that entrapment databases may, in some implementations, be a reasonable choice for validation, while the assumptions underpinning validation protocols based on entrapment queries are likely to be violated in practice. This article also highlights the need for well-designed frameworks for validating FDR estimation methods in proteomics.
Affiliation(s)
- Dominik Madej
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Henry Lam
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
2. Higdon R, Stewart E, Stanberry L, Haynes W, Choiniere J, Montague E, Anderson N, Yandl G, Janko I, Broomall W, Fishilevich S, Lancet D, Kolker N, Kolker E. MOPED enables discoveries through consistently processed proteomics data. J Proteome Res 2013; 13:107-13. [PMID: 24350770 DOI: 10.1021/pr400884c] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 12/28/2022]
Abstract
The Model Organism Protein Expression Database (MOPED, http://moped.proteinspire.org) is an expanding proteomics resource to enable biological and biomedical discoveries. MOPED aggregates simple, standardized and consistently processed summaries of protein expression and metadata from proteomics (mass spectrometry) experiments from human and model organisms (mouse, worm, and yeast). The latest version of MOPED adds new estimates of protein abundance and concentration as well as relative (differential) expression data. MOPED provides a new updated query interface that allows users to explore information by organism, tissue, localization, condition, experiment, or keyword. MOPED supports the Human Proteome Project's efforts to generate chromosome- and disease-specific proteomes by providing links from proteins to chromosome and disease information as well as many complementary resources. MOPED supports a new omics metadata checklist to harmonize data integration, analysis, and use. MOPED's development is driven by the user community, which spans 90 countries and guides future development that will transform MOPED into a multiomics resource. MOPED encourages users to submit data in a simple format. They can use the metadata checklist to generate a data publication for this submission. As a result, MOPED will provide even greater insights into complex biological processes and systems and enable deeper and more comprehensive biological and biomedical discoveries.
3. Yadav AK, Kumar D, Dash D. Learning from decoys to improve the sensitivity and specificity of proteomics database search results. PLoS One 2012. [PMID: 23189209 PMCID: PMC3506577 DOI: 10.1371/journal.pone.0050651] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
Abstract
The statistical validation of database search results is a complex issue in bottom-up proteomics. The correct and incorrect peptide spectrum match (PSM) scores overlap significantly, making an accurate assessment of true peptide matches challenging. Since the complete separation between the true and false hits is practically never achieved, there is a need for better methods and rescoring algorithms to improve upon the primary database search results. Here we describe the calibration and False Discovery Rate (FDR) estimation of database search scores through a dynamic FDR calculation method, FlexiFDR, which increases both the sensitivity and specificity of search results. Modelling a simple linear regression on the decoy hits for different charge states, the method maximized the number of true positives and reduced the number of false negatives in several standard datasets of varying complexity (18-mix, 49-mix, 200-mix) and a few complex datasets (E. coli and Yeast) obtained from a wide variety of MS platforms. The net positive gain for correct spectral and peptide identifications was up to 14.81% and 6.2% respectively. The approach is applicable to different search methodologies: separate as well as concatenated database search, high mass accuracy, and semi-tryptic and modification searches. FlexiFDR was also applied to Mascot results and showed better performance than before. We have shown that an appropriate threshold learnt from decoys can be very effective in improving the database search results. FlexiFDR adapts itself to different instruments, data types and MS platforms. It learns from the decoy hits and sets a flexible threshold that automatically aligns itself to the underlying variables of data quality and size.
Affiliation(s)
- Amit Kumar Yadav
- GNR Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Delhi, India
- Dhirendra Kumar
- GNR Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Delhi, India
- Debasis Dash
- GNR Knowledge Center for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Delhi, India
4. Koh KH, Jurkovic S, Yang K, Choi SY, Jung JW, Kim KP, Zhang W, Jeong H. Estradiol induces cytochrome P450 2B6 expression at high concentrations: implication in estrogen-mediated gene regulation in pregnancy. Biochem Pharmacol 2012; 84:93-103. [PMID: 22484313 PMCID: PMC3376749 DOI: 10.1016/j.bcp.2012.03.016] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 01/26/2012] [Revised: 03/21/2012] [Accepted: 03/22/2012] [Indexed: 12/19/2022]
Abstract
Pregnancy alters the rate and extent of drug metabolism, but little is known about the underlying molecular mechanism. We have found that 17β-estradiol (E2) upregulates expression of the major drug-metabolizing enzyme CYP2B6 in primary human hepatocytes. Results from promoter reporter assays in HepG2 cells revealed that E2 activates constitutive androstane receptor (CAR) and enhances promoter activity of CYP2B6, for which high concentrations of E2 reached during pregnancy were required. E2 triggered nuclear translocation of CAR in primary rat hepatocytes that were transiently transfected with human CAR as well as in primary human hepatocytes, further confirming transactivation of CAR by E2. E2-activated estrogen receptor (ER) also enhanced CYP2B6 promoter activity. The DNA-binding domain of ER was not required for the induction of CYP2B6 promoter activity by E2, suggesting involvement of a non-classical mechanism of ER action. Results from deletion and mutation assays as well as electrophoretic mobility shift and supershift assays revealed that two AP-1 binding sites (-1782/-1776 and -1664/-1658 of CYP2B6) are critical for ER-mediated activation of the CYP2B6 promoter by E2. Concurrent activation of both ER and CAR by E2 enhanced CYP2B6 expression in a synergistic manner. Our data demonstrate that at high concentrations reached during pregnancy, E2 activates both CAR and ER that synergistically induce CYP2B6 expression. These results illustrate pharmacological activity of E2 that would likely become prominent during pregnancy.
MESH Headings
- Adult
- Aryl Hydrocarbon Hydroxylases/genetics
- Aryl Hydrocarbon Hydroxylases/metabolism
- Binding Sites
- Cell Nucleus/metabolism
- Chromatin Immunoprecipitation
- Chromatography, High Pressure Liquid
- Constitutive Androstane Receptor
- Cytochrome P-450 CYP2B6
- Dose-Response Relationship, Drug
- Electrophoretic Mobility Shift Assay
- Estradiol/blood
- Estradiol/pharmacology
- Estrogens/blood
- Estrogens/pharmacology
- Female
- Gene Expression Profiling
- Gene Expression Regulation, Enzymologic/drug effects
- Genes, Reporter
- Hep G2 Cells
- Hepatocytes/drug effects
- Hepatocytes/enzymology
- Humans
- Luciferases/genetics
- Middle Aged
- Nuclear Proteins/metabolism
- Oligonucleotide Array Sequence Analysis
- Oxidoreductases, N-Demethylating/genetics
- Oxidoreductases, N-Demethylating/metabolism
- Pregnancy/blood
- Pregnancy/genetics
- Promoter Regions, Genetic
- Real-Time Polymerase Chain Reaction
- Receptors, Cytoplasmic and Nuclear/genetics
- Receptors, Cytoplasmic and Nuclear/metabolism
- Receptors, Estrogen/genetics
- Receptors, Estrogen/metabolism
- Tandem Mass Spectrometry
- Transcription Factor AP-1/genetics
- Transcription Factor AP-1/metabolism
- Transcriptional Activation
Affiliation(s)
- Kwi Hye Koh
- Department of Pharmacy Practice, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60612, USA
- Steve Jurkovic
- Department of Biopharmaceutical Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60612, USA
- Kyunghee Yang
- Department of Pharmacy Practice, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60612, USA
- Su-Young Choi
- Center for Pharmaceutical Biotechnology, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60612, USA
- Jin Woo Jung
- Department of Molecular Biotechnology, Institute of Biomedical Science and Technology, Konkuk University, Seoul 143-701, South Korea
- Kwang Pyo Kim
- Department of Molecular Biotechnology, Institute of Biomedical Science and Technology, Konkuk University, Seoul 143-701, South Korea
- Wei Zhang
- Department of Pediatrics, College of Medicine, University of Illinois at Chicago, Chicago, IL 60612, USA
- Hyunyoung Jeong
- Department of Pharmacy Practice, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60612, USA
- Department of Biopharmaceutical Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60612, USA
5. Pang S, Ahsan ES, Valdivia HJ, Minguez J, Foy CA. A pilot study to evaluate the application of a generic protein standard panel for quality control of biomarker detection technologies. BMC Res Notes 2011; 4:281. [PMID: 21834984 PMCID: PMC3162916 DOI: 10.1186/1756-0500-4-281] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/07/2011] [Accepted: 08/11/2011] [Indexed: 11/22/2022]
Abstract
Background: Protein biomarker studies are currently hampered by a lack of measurement standards to demonstrate quality, reliability and comparability across multiple assay platforms. This is especially pertinent for immunoassays where multiple formats for detecting target analytes are commonly used. Findings: In this pilot study a generic panel of six non-human protein standards (50 - 10^7 pg/mL) of varying abundance was prepared as a quality control (QC) material. Simulated "normal" and "diseased" panels of proteins were prepared in pooled human plasma and incorporated into immunoassays using the Meso Scale Discovery® (MSD®) platform to illustrate reliable detection of the component proteins. The protein panel was also evaluated as a spike-in material for a model immunoassay involving detection of ovarian cancer biomarkers within individual human plasma samples. Our selected platform could discriminate between two panels of the proteins exhibiting small differences in abundance. Across distinct experiments, all component proteins exhibited reproducible signal outputs in pooled human plasma. When individual donor samples were used, half the proteins produced signals independent of matrix effects. These proteins may serve as a generic indicator of platform reliability. Each of the remaining proteins exhibits differential signals across the distinct samples, indicative of sample matrix effects, with the three proteins following the same trend. This subset of proteins may be useful for characterising the degree of matrix effects associated with the sample which may impact on the reliability of quantifying target diagnostic biomarkers. Conclusions: We have demonstrated the potential utility of this panel of standards to act as a generic QC tool for evaluating the reproducibility of the platform for protein biomarker detection independent of serum matrix effects.
Affiliation(s)
- Susan Pang
- LGC, Queens Road, Teddington, Middlesex, TW11 0LY, UK.
6. Kolker E, Higdon R, Morgan P, Sedensky M, Welch D, Bauman A, Stewart E, Haynes W, Broomall W, Kolker N. SPIRE: Systematic protein investigative research environment. J Proteomics 2011; 75:122-6. [PMID: 21609792 DOI: 10.1016/j.jprot.2011.05.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 03/08/2011] [Revised: 05/03/2011] [Accepted: 05/05/2011] [Indexed: 12/21/2022]
Abstract
The SPIRE (Systematic Protein Investigative Research Environment) provides web-based experiment-specific mass spectrometry (MS) proteomics analysis (https://www.proteinspire.org). Its emphasis is on usability and integration of the best analytic tools. SPIRE provides an easy-to-use web interface and generates results in both interactive and simple data formats. In contrast to run-based approaches, SPIRE conducts the analysis based on the experimental design. It employs novel methods to generate false discovery rates and local false discovery rates (FDR, LFDR) and integrates the best and complementary open-source search and data analysis methods. The SPIRE approach of integrating X!Tandem, OMSSA and SpectraST can produce an increase in protein IDs (52-88%) over current combinations of scoring and single search engines while also providing accurate multi-faceted error estimation. One of SPIRE's primary assets is combining the results with data on protein function, pathways and protein expression from model organisms. We demonstrate some of SPIRE's capabilities by analyzing mitochondrial proteins from the wild type and 3 mutants of C. elegans. SPIRE also connects results to publicly available proteomics data through its Model Organism Protein Expression Database (MOPED). SPIRE can also provide analysis and annotation for user-supplied protein ID and expression data.
Affiliation(s)
- Eugene Kolker
- Bioinformatics & High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, WA, USA.
7. Bauman A, Higdon R, Rapson S, Loiue B, Hogan J, Stacy R, Napuli A, Guo W, van Voorhis W, Roach J, Lu V, Landorf E, Stewart E, Kolker N, Collart F, Myler P, van Belle G, Kolker E. Design and initial characterization of the SC-200 proteomics standard mixture. OMICS 2011; 15:73-82. [PMID: 21250827 DOI: 10.1089/omi.2010.0118] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
High-throughput (HTP) proteomics studies generate large amounts of data. Interpretation of these data requires effective approaches to distinguish noise from biological signal, particularly as instrument and computational capacity increase and studies become more complex. Resolving this issue requires validated and reproducible methods and models, which in turn requires complex experimental and computational standards. The absence of appropriate standards and data sets for validating experimental and computational workflows hinders the development of HTP proteomics methods. Most protein standards are simple mixtures of proteins or peptides, or undercharacterized reference standards in which the identity and concentration of the constituent proteins are unknown. The Seattle Children's 200 (SC-200) proposed proteomics standard mixture is the next step toward developing realistic, fully characterized HTP proteomics standards. The SC-200 exhibits a unique modular design to extend its functionality, and consists of 200 proteins of known identities and molar concentrations from 6 microbial genomes, distributed into 10 molar concentration tiers spanning a 1,000-fold range. We describe the SC-200's design, potential uses, and initial characterization. We identified 84% of SC-200 proteins with an LTQ-Orbitrap and 65% with an LTQ-Velos (false discovery rate = 1% for both). There were obvious trends in success rate, sequence coverage, and spectral counts with protein concentration; however, protein identification, sequence coverage, and spectral counts vary greatly within concentration levels.
Affiliation(s)
- Andrew Bauman
- Seattle Children's Research Institute, Bioinformatics and High-throughput Analysis Laboratory, Seattle Children's Research Institute, High-throughput Analysis Core, Seattle, Washington 98109, USA
8. Hather G, Higdon R, Bauman A, von Haller PD, Kolker E. Estimating false discovery rates for peptide and protein identification using randomized databases. Proteomics 2010; 10:2369-76. [PMID: 20391536 DOI: 10.1002/pmic.200900619] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
Abstract
MS-based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression-based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two-peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
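The target-versus-randomized comparison described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' isotonic regression: it uses the decoy-to-target count ratio at each score cutoff, then a running minimum to keep the estimate monotonic in the score. The function name `decoy_fdr_curve`, the `(score, is_decoy)` input layout, and the assumption that higher scores are better are all hypothetical choices made for this example.

```python
def decoy_fdr_curve(psms):
    """Estimate a cumulative FDR for each score cutoff.

    psms: iterable of (score, is_decoy) pairs from a search against the
    target database concatenated with its randomized version.
    Returns a list of (score, fdr) sorted by descending score.
    Assumption: higher score means a better match; ties are not grouped.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    raw = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Decoy hits above the cutoff approximate the number of false targets.
        raw.append((score, min(1.0, decoys / max(targets, 1))))

    # Enforce monotonicity: walking from the worst score to the best,
    # the estimate may only stay the same or decrease.
    curve = []
    best = 1.0
    for score, fdr in reversed(raw):
        best = min(best, fdr)
        curve.append((score, best))
    curve.reverse()
    return curve


if __name__ == "__main__":
    example = [(10, False), (9, False), (8, True), (7, False), (6, False)]
    for score, fdr in decoy_fdr_curve(example):
        print(score, fdr)
```

The running minimum is a cruder way to obtain a monotone curve than isotonic regression, and it yields only a cumulative FDR, not the local FDR that the abstract also discusses.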
Affiliation(s)
- Gregory Hather
- Bioinformatics & High-throughput Analysis Laboratory, Seattle Children's Research Institute, Seattle, WA 98101, USA
9. Elliott MH, Smith DS, Parker CE, Borchers C. Current trends in quantitative proteomics. J Mass Spectrom 2009; 44:1637-1660. [PMID: 19957301 DOI: 10.1002/jms.1692] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 05/28/2023]
Abstract
It was inevitable that as soon as mass spectrometrists were able to tell biologists which proteins were in their samples, the next question would be how much of these proteins were present. This has turned out to be a much more challenging question. In this review, we describe the multiple ways that mass spectrometry has attempted to address this issue, both for relative quantitation and for absolute quantitation of proteins. There is no single method that will work for every problem or for every sample. What we present here is a variety of techniques, with guidelines that we hope will assist the researcher in selecting the most appropriate technique for the particular biological problem that needs to be addressed. We need to emphasize that this is a very active area of proteomics research: new quantitative methods are continuously being introduced and some 'pitfalls' of older methods are just being discovered. However, even though there is no perfect technique (and a better technique may be developed tomorrow), valuable information on biomarkers and pathways can be obtained using these currently available methods.
Affiliation(s)
- Monica H Elliott
- University of Victoria Genome BC Proteomics Centre, British Columbia, V8Z 7X8, Canada
10. Volchenboum SL, Kristjansdottir K, Wolfgeher D, Kron SJ. Rapid validation of Mascot search results via stable isotope labeling, pair picking, and deconvolution of fragmentation patterns. Mol Cell Proteomics 2009; 8:2011-22. [PMID: 19435713 DOI: 10.1074/mcp.m800472-mcp200] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/18/2022]
Abstract
Conventional LC-MS/MS data analysis matches each precursor ion and fragmentation pattern to their best fit within databases of theoretical spectra, yielding a peptide identification. Confidence is estimated by a score but can be validated by statistics, false discovery rates, and/or manual validation. A weakness is that each ion is evaluated independently, discarding potentially useful cross-correlations. In a classical approach to de novo sequence analysis, mixtures of peptides differing only in a carboxyl-terminal isotopic label yield fragmentation spectra with single, unlabeled b-type ions but pairs of isotope-labeled y-type ions, facilitating confident assignments. To apply this principle to identification by fragmentation pattern matching, we developed Validator, software that recognizes isotopic peptide pairs and compares their identifications and fragmentation patterns. Testing Validator 1 on a Mascot results file from FT-ICR LC-MS/MS of (16)O/(18)O-labeled yeast cell lysate peptides yielded 2,775 peptide pairs sharing a common identification but differing in carboxyl-terminal label. Comparing observed b- and y-ions with the predicted fragmentation pattern improved the threshold Mascot score for 5% false discovery from 36 to 22, significantly increasing both sensitivity and specificity. Validator 2, which identifies pairs by precursor mass difference alone before comparing observed fragmentation with that predicted by Mascot, found 2,021 isotopic pairs, similarly achieving improved sensitivity and specificity. Finally Validator 3, which finds pairs based on mass difference alone and then deconvolutes fragmentation patterns independently of Mascot, found 964 predicted peptides. Validator 3 allowed raw mass spectrometry data to be mined not only to validate Mascot results but also to discover peptides missed by Mascot. Using standard desktop hardware, the Validator 1-3 software processed the 11,536 spectra in the 93-MB Mascot .DAT file in less than 6 min (32 spectra/s), revealing high confidence peptide identifications without regard to Mascot score, far faster than manual or other independent validation methods.
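Pair picking by precursor mass difference alone, as the abstract describes for Validator 2 and 3, might look like the sketch below. It is not the Validator implementation. It assumes neutral monoisotopic precursor masses have already been computed, and the default shift of 4.0085 Da (two 18O atoms replacing two 16O atoms) and the 0.01 Da tolerance are illustrative assumptions. The function name `pick_isotopic_pairs` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def pick_isotopic_pairs(masses, delta=4.0085, tol=0.01):
    """Find (light, heavy) index pairs whose masses differ by delta +/- tol.

    masses: list of neutral precursor masses in Da (assumed precomputed).
    delta:  expected label mass shift; 4.0085 Da assumes double 18O labeling.
    tol:    absolute mass tolerance in Da (illustrative value).
    Returns a list of (light_index, heavy_index) into the input list.
    """
    # Sort indices by mass so each heavy partner is found by binary search.
    order = sorted(range(len(masses)), key=lambda i: masses[i])
    sorted_masses = [masses[i] for i in order]

    pairs = []
    for pos, mass in enumerate(sorted_masses):
        lo = bisect_left(sorted_masses, mass + delta - tol)
        hi = bisect_right(sorted_masses, mass + delta + tol)
        for j in range(lo, hi):
            pairs.append((order[pos], order[j]))
    return pairs


if __name__ == "__main__":
    precursors = [1000.0, 1004.008, 1500.5, 1200.0, 1504.51]
    print(pick_isotopic_pairs(precursors))
```

Sorting once and using binary search keeps the pairing close to O(n log n). A fuller treatment would also compare retention times and charge states before accepting a pair, which this sketch leaves out.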
Affiliation(s)
- Samuel L Volchenboum
- Department of Pediatrics, The University of Chicago, Chicago, Illinois 60637, USA.