1
Bhushan V, Nita-Lazar A. Recent Advancements in Subcellular Proteomics: Growing Impact of Organellar Protein Niches on the Understanding of Cell Biology. J Proteome Res 2024; 23:2700-2722. [PMID: 38451675 PMCID: PMC11296931 DOI: 10.1021/acs.jproteome.3c00839] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/08/2024]
Abstract
The mammalian cell is a complex entity, with membrane-bound and membrane-less organelles playing vital roles in regulating cellular homeostasis. Organellar protein niches drive discrete biological processes and cell functions, thus maintaining cell equilibrium. Cellular processes such as signaling, growth, proliferation, motility, and programmed cell death require dynamic protein movements between cell compartments. Aberrant protein localization is associated with a wide range of diseases. Therefore, analyzing the subcellular proteome of the cell can provide a comprehensive overview of cellular biology. With recent advancements in mass spectrometry, imaging technology, computational tools, and deep machine learning algorithms, studies pertaining to subcellular protein localization and their dynamic distributions are gaining momentum. These studies reveal changing interaction networks because of "moonlighting proteins" and serve as a discovery tool for disease network mechanisms. Consequently, this review aims to provide a comprehensive repository for recent advancements in subcellular proteomics subcontexting methods, challenges, and future perspectives for method developers. In summary, subcellular proteomics is crucial to the understanding of the fundamental cellular mechanisms and the associated diseases.
Affiliation(s)
- Vanya Bhushan
  - Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
- Aleksandra Nita-Lazar
  - Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
2
Kaddour H, Kopcho S, Lyu Y, Shouman N, Paromov V, Pratap S, Dash C, Kim EY, Martinson J, McKay H, Epeldegui M, Margolick JB, Stapleton JT, Okeoma CM. HIV-infection and cocaine use regulate semen extracellular vesicles proteome and miRNAome in a manner that mediates strategic monocyte haptotaxis governed by miR-128 network. Cell Mol Life Sci 2021; 79:5. [PMID: 34936021 PMCID: PMC9134786 DOI: 10.1007/s00018-021-04068-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/16/2021] [Revised: 11/22/2021] [Accepted: 11/30/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND Extracellular vesicles (EVs) are regulators of cell-cell interactions and mediators of horizontal transfer of bioactive molecules between cells. EV-mediated cell-cell interactions play roles in physiological and pathophysiological processes, which may be modulated by exposure to pathogens and cocaine use. However, the effects of pathogens and cocaine use on EV composition and function are not fully understood. RESULTS Here, we used systems biology and multi-omics analysis to show that HIV infection (HIV+) and cocaine (COC) use (COC+) promote the release of semen-derived EVs (SEV) with dysregulated extracellular proteome (exProtein), miRNAome (exmiR), and exmiR networks. Integrating SEV proteome and miRNAome revealed a significant decrease in the enrichment of disease-associated, brain-enriched, and HIV-associated miR-128-3p (miR-128) in HIV+ COC+ SEV with a concomitant increase in the miR-128 targets PEAK1 and RND3/RhoE. Using two-dimensional-substrate single cell haptotaxis, we observed that in the presence of HIV+ COC+ SEV, contact guidance provided by the extracellular matrix (ECM, collagen type 1) network facilitated far-ranging haptotactic cues that guided monocytes over longer distances. Functionalizing SEV with a miR-128 mimic revealed that the strategic changes in monocyte haptotaxis are in large part the result of SEV-associated miR-128. CONCLUSIONS We propose that compositionally and functionally distinct HIV+ COC+ and HIV− COC− SEVs and their exmiR networks may provide cells relevant but divergent haptotactic guidance in the absence of chemotactic cues, under both physiological and pathophysiological conditions.
Affiliation(s)
- Hussein Kaddour
  - Department of Pharmacology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, 11794-8651, USA
  - Regeneron Pharmaceuticals, Inc., Tarrytown, NY, 10591, USA
- Steven Kopcho
  - Department of Pharmacology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, 11794-8651, USA
- Yuan Lyu
  - Department of Pharmacology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, 11794-8651, USA
- Nadia Shouman
  - Department of Pharmacology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, 11794-8651, USA
- Victor Paromov
  - CRISALIS, School of Graduate Studies and Research, Proteomics Core, Meharry Medical College, Nashville, TN, 37208, USA
- Siddharth Pratap
  - CRISALIS, School of Graduate Studies and Research, Bioinformatics Core, Meharry Medical College, Nashville, TN, 37208, USA
- Chandravanu Dash
  - Department of Biochemistry and Cancer Biology, Meharry Medical College, Nashville, TN, 37208, USA
- Eun-Young Kim
  - Division of Infectious Diseases, Department of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Jeremy Martinson
  - Department of Infectious Diseases and Microbiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- Heather McKay
  - Department of Epidemiology, The Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Marta Epeldegui
  - Department of Obstetrics and Gynecology, David Geffen School of Medicine at UCLA, UCLA AIDS Institute and UCLA Jonsson Comprehensive Cancer Center, Los Angeles, USA
  - David Geffen School of Medicine at UCLA, UCLA AIDS Institute, Los Angeles, USA
  - UCLA Jonsson Comprehensive Cancer Center, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Joseph B Margolick
  - Department of Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21207, USA
- Jack T Stapleton
  - Departments of Internal Medicine, Microbiology and Immunology, University of Iowa and Iowa City Veterans Administration Healthcare, Iowa City, IA, 52242-1081, USA
- Chioma M Okeoma
  - Department of Pharmacology, Stony Brook University Renaissance School of Medicine, Stony Brook, NY, 11794-8651, USA
3
Zhang F, Deng CK, Wang M, Deng B, Barber R, Huang G. Identification of novel alternative splicing biomarkers for breast cancer with LC/MS/MS and RNA-Seq. BMC Bioinformatics 2020; 21:541. [PMID: 33272210 PMCID: PMC7713335 DOI: 10.1186/s12859-020-03824-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/14/2020] [Accepted: 10/19/2020] [Indexed: 01/12/2023] Open
Abstract
Background Alternative splicing isoforms have been reported as a new and robust class of diagnostic biomarkers. Over 95% of human genes are estimated to be alternatively spliced as a powerful means of producing functionally diverse proteins from a single gene. The emergence of next-generation sequencing technologies, especially RNA-seq, provides novel insights into large-scale detection and analysis of alternative splicing at the transcriptional level. Advances in proteomic technologies, such as liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS), have shown tremendous power for the parallel characterization of large numbers of proteins in biological samples. Although previous qualitative comparisons of proteomics and microarray data have generally found poor correspondence, significantly higher degrees of correlation have been observed at the exon level. Combining protein and RNA data by searching LC–MS/MS data against a customized protein database from RNA-Seq may produce a subset of alternatively spliced protein isoform candidates that have higher confidence. Results We developed a bioinformatics workflow to discover alternative splicing biomarkers from LC–MS/MS using RNA-Seq. First, we retrieved high-confidence, novel alternative splicing biomarkers from the breast cancer RNA-Seq database. Then, we translated these sequences into in silico Isoform Junction Peptides, and created a customized alternative splicing database for MS searching. Lastly, we ran the Open Mass spectrometry Search Algorithm against the customized alternative splicing database with the breast cancer plasma proteome. Twenty-six alternative splicing biomarker peptides with one single intron event and one exon-skipping event were identified. Further interpretation of biological pathways with our Integrated Pathway Analysis Database showed that these 26 peptides are associated with Cancer, Signaling, Metabolism, Regulation, Immune System and Hemostasis pathways, which are consistent with the 256 alternative splicing biomarkers from the RNA-Seq. Conclusions This paper presents a bioinformatics workflow for using RNA-seq data to discover novel alternative splicing biomarkers from the breast cancer proteome. As a complement to the synthetic alternative splicing database technique for alternative splicing identification, this method combines the advantages of two platforms, mass spectrometry and next-generation sequencing, and can help identify potentially highly sample-specific alternative splicing isoform biomarkers at an early stage of cancer.
Affiliation(s)
- Fan Zhang
  - Vermont Biomedical Research Network and Department of Biology, University of Vermont, Burlington, VT, 05405, USA
  - Institute for Translational Research and Department of Family Medicine, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA
- Chris K Deng
  - School of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign, Champaign, IL, 61801, USA
- Mu Wang
  - Department of Biochemistry and Molecular Biology, IU School of Medicine, Indianapolis, IN, 46202, USA
  - Indiana Center for Systems Biology and Personalized Medicine, Indianapolis, IN, 46202, USA
- Bin Deng
  - Vermont Biomedical Research Network and Department of Biology, University of Vermont, Burlington, VT, 05405, USA
  - Institute for Translational Research and Department of Family Medicine, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA
- Robert Barber
  - Department of Pharmacology and Neuroscience, University of North Texas Health Science Center, Fort Worth, TX, USA
- Gang Huang
  - Shanghai Key Laboratory for Molecular Imaging, Shanghai University of Medicine and Health Sciences, Shanghai, 201318, People's Republic of China
4
The Power of Three in Cannabis Shotgun Proteomics: Proteases, Databases and Search Engines. Proteomes 2020; 8:proteomes8020013. [PMID: 32549361 PMCID: PMC7356525 DOI: 10.3390/proteomes8020013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/28/2020] [Revised: 06/12/2020] [Accepted: 06/12/2020] [Indexed: 11/29/2022] Open
Abstract
Cannabis research has taken off since the relaxation of legislation, yet proteomics is still lagging. In 2019, we published three proteomics methods aimed at optimizing protein extraction, protein digestion for bottom-up and middle-down proteomics, as well as the analysis of intact proteins for top-down proteomics. The database of Cannabis sativa proteins used in these studies was retrieved from UniProt, the reference repositories for proteins, which is incomplete and therefore underrepresents the genetic diversity of this non-model species. In this fourth study, we remedy this shortcoming by searching larger databases from various sources. We also compare two search engines, the oldest, SEQUEST, and the most popular, Mascot. This shotgun proteomics experiment also utilizes the power of parallel digestions with orthogonal proteases of increasing selectivity, namely chymotrypsin, trypsin/Lys-C and Asp-N. Our results show that the larger the database the greater the list of accessions identified but the longer the duration of the search. Using orthogonal proteases and different search algorithms increases the total number of proteins identified, most of them common despite differing proteases and algorithms, but many of them unique as well.
5
Abstract
Mass spectrometry is extremely efficient for sequencing small peptides generated by, for example, a trypsin digestion of a complex mixture. Current instruments have the capacity to generate 50-100 K MSMS spectra from a single run. Of these, ~30-50% are typically assigned to peptide matches at a 1% FDR threshold. The remaining spectra need more research to explain. We address here whether the 30-50% matched spectra provide consensus matches when using different database-dependent search pipelines. Although the majority of the spectra peptide assignments concur across search engines, our conclusion is that database-dependent search engines still require improvements.
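The consensus question raised in this abstract can be illustrated with a minimal Python sketch. It assumes each search pipeline reports one peptide per spectrum identifier after its own 1% FDR filtering; the function name `consensus_fraction`, the dictionary layout, and the toy spectrum identifiers and peptides are hypothetical and are not taken from the paper.

```python
from typing import Dict


def consensus_fraction(engine_a: Dict[str, str], engine_b: Dict[str, str]) -> float:
    """Fraction of spectra matched by both engines that received the same peptide.

    Each dict maps a spectrum identifier to the peptide assigned by one engine
    (hypothetical layout, for illustration only).
    """
    shared = engine_a.keys() & engine_b.keys()  # spectra matched by both engines
    if not shared:
        return 0.0
    agreeing = sum(1 for spectrum in shared if engine_a[spectrum] == engine_b[spectrum])
    return agreeing / len(shared)


# Toy assignments: two shared spectra, one of which agrees.
engine_a = {"scan1": "PEPTIDEK", "scan2": "ELVISK", "scan3": "SAMPLER"}
engine_b = {"scan1": "PEPTIDEK", "scan2": "LIVESK", "scan4": "ANTHERK"}
print(consensus_fraction(engine_a, engine_b))
```

Restricting the comparison to spectra matched by both engines is one possible design choice; spectra matched by only one engine could instead be counted as disagreements.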
Affiliation(s)
- Rune Matthiesen
  - Computational and Experimental Biology Group, CEDOC, Chronic Diseases Research Centre, NOVA Medical School, Faculdade de Ciências Médicas, Universidade NOVA de Lisboa, Lisboa, Portugal
- Gorka Prieto
  - Department of Communications Engineering, Faculty of Engineering of Bilbao, University of the Basque Country (UPV/EHU), Bilbao, Spain
- Hans Christian Beck
  - Department of Clinical Biochemistry and Pharmacology, Odense University Hospital, Odense C, Denmark
6
Cerqueira FR, Vasconcelos ATR. OCCAM: prediction of small ORFs in bacterial genomes by means of a target-decoy database approach and machine learning techniques. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2020:5989499. [PMID: 33206960 PMCID: PMC7673341 DOI: 10.1093/database/baaa067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Revised: 07/11/2020] [Accepted: 07/27/2020] [Indexed: 11/14/2022]
Abstract
Small open reading frames (ORFs) have been systematically disregarded by automatic genome annotation. The difficulty in finding patterns in tiny sequences is the main reason that small ORFs are overlooked by computational procedures. However, advances in experimental methods show that small proteins can play vital roles in cellular activities. Hence, it is urgent to make progress in the development of computational approaches to speed up the identification of potential small ORFs. In this work, our focus is on bacterial genomes. We improve a previous approach to identify small ORFs in bacteria. Our method uses machine learning techniques and decoy subject sequences to filter out spurious ORF alignments. We show that an advanced multivariate analysis can be more effective in terms of sensitivity than applying the simplistic and widely used e-value cutoff. This is particularly important in the case of small ORFs, for which alignments present higher e-values than usual. Experiments with control datasets show that the machine learning algorithms used in our method to curate significant alignments can achieve average sensitivity and specificity of 97.06% and 99.61%, respectively. Therefore, an important step is provided here toward the construction of more accurate computational tools for the identification of small ORFs in bacteria.
Affiliation(s)
- Fabio R Cerqueira
  - Department of Production Engineering, Universidade Federal Fluminense, Rua Domingos Silvério s/n, Petrópolis, 25 650-050, Rio de Janeiro, Brazil
  - Graduate Program in Computer Science, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
7
Elguoshy A, Hirao Y, Yamamoto K, Xu B, Kinoshita N, Mitsui T, Yamamoto T. Utilization of the Proteome Data Deposited in SRMAtlas for Validating the Existence of the Human Missing Proteins in GPM. J Proteome Res 2019; 18:4197-4205. [PMID: 31646870 DOI: 10.1021/acs.jproteome.9b00355] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
The Human Proteome Project (HPP) has made great efforts to clarify the existing evidence of human proteins since 2012. However, according to the recent release of neXtProt (2019-1), approximately 10% of all human genes still have inadequate or no experimental evidence of their translation at the protein level. They were categorized as missing proteins (PE2-PE4). To further the goal of HPP, we developed a two-step bioinformatic strategy that uses the SRMAtlas synthetic peptides corresponding to the missing proteins as an exclusive reference in order to explore their natural counterparts within GPM. In the first step, we searched the GPM for the non-nested SRMAtlas peptides corresponding to the missing proteins, considering only those detected via ≥2 non-nested unitypic/proteotypic peptides ("stranded peptides") with length ≥9 amino acids in the same proteomic study. As a result, 51 missing proteins were newly detected in 35 different proteomic studies. In the second step, we validated these newly detected missing proteins based on matching the spectra of their synthetic and natural peptides in SRMAtlas and GPM, respectively. The results showed that 23 of the missing proteins with ≥2 non-nested peptides were validated by careful spectral matching.
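The first-step filter described in this abstract (at least 2 non-nested peptides of length ≥9 per protein) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: "nested" is taken to mean that one peptide sequence is fully contained in another, the "same proteomic study" and unitypic/proteotypic requirements are not modeled, and all names (`non_nested`, `detected_proteins`, the protein labels and peptide sequences) are hypothetical.

```python
from typing import Dict, List

MIN_LENGTH = 9    # minimum peptide length stated in the abstract
MIN_PEPTIDES = 2  # minimum number of non-nested peptides stated in the abstract


def non_nested(peptides: List[str]) -> List[str]:
    """Drop peptides that are fully contained in a longer peptide of the list."""
    ordered = sorted(set(peptides), key=lambda p: (-len(p), p))  # longest first
    kept: List[str] = []
    for pep in ordered:
        if not any(pep in longer for longer in kept):
            kept.append(pep)
    return kept


def detected_proteins(hits: Dict[str, List[str]]) -> List[str]:
    """Return proteins supported by enough long, non-nested peptides."""
    result = []
    for protein, peptides in hits.items():
        long_enough = [p for p in peptides if len(p) >= MIN_LENGTH]
        if len(non_nested(long_enough)) >= MIN_PEPTIDES:
            result.append(protein)
    return sorted(result)


# Toy input: protein label -> peptides observed for it (made-up sequences).
hits = {
    "P1": ["ELVISLIVESK", "LVISLIVES"],        # second peptide is nested in the first
    "P2": ["ELVISLIVESK", "QWERTYIPASDFGHK"],  # two independent long peptides
    "P3": ["SHRTPEPK", "ELVISLIVESK"],         # first peptide is shorter than 9 residues
}
print(detected_proteins(hits))
```

Only the hypothetical protein "P2" passes, because "P1" has one peptide left after nesting is removed and "P3" has one peptide left after the length filter.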
Affiliation(s)
- Amr Elguoshy
  - Biofluid and Biomarker Center, Graduate School of Medical and Dental Sciences, Niigata University, Niigata 950-2181, Japan
  - Graduate School of Science and Technology, Niigata University, Niigata 950-2181, Japan
  - Biotechnology Department, Faculty of Agriculture, Al-Azhar University, Cairo 11651, Egypt
- Yoshitoshi Hirao
  - Biofluid and Biomarker Center, Graduate School of Medical and Dental Sciences, Niigata University, Niigata 950-2181, Japan
- Keiko Yamamoto
  - Biofluid and Biomarker Center, Graduate School of Medical and Dental Sciences, Niigata University, Niigata 950-2181, Japan
- Bo Xu
  - Biofluid and Biomarker Center, Graduate School of Medical and Dental Sciences, Niigata University, Niigata 950-2181, Japan
- Naohiko Kinoshita
  - Biofluid and Biomarker Center, Graduate School of Medical and Dental Sciences, Niigata University, Niigata 950-2181, Japan
  - Department of Health Informatics, Niigata University of Health and Welfare, Niigata 950-3102, Japan
- Toshiaki Mitsui
  - Graduate School of Science and Technology, Niigata University, Niigata 950-2181, Japan
- Tadashi Yamamoto
  - Biofluid and Biomarker Center, Graduate School of Medical and Dental Sciences, Niigata University, Niigata 950-2181, Japan
  - Department of Clinical Laboratory, Shinrakuen Hospital, Niigata 950-2087, Japan
8
Liu S, Xu F, Yin Y, Zhang J, Wang F, Li Y, Xu P. LysargiNase enhances protein identification on the basis of trypsin on formalin-fixed paraffin-embedded samples. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2019; 33:1381-1389. [PMID: 31066118 DOI: 10.1002/rcm.8479] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/05/2019] [Revised: 04/30/2019] [Accepted: 05/01/2019] [Indexed: 06/09/2023]
Abstract
RATIONALE Formalin-Fixed Paraffin-Embedded (FFPE) samples are valuable for proteomic studies of disease. However, the crosslinks between proteins and between proteins and nucleic acids, and other covalent chemical modifications such as methylation introduced by formaldehyde, can interfere with trypsin digestion in proteomics studies. LysargiNase was reported to have a better full-cleavage rate at methylated sites and better b-ion coverage than trypsin. The contribution of LysargiNase in the proteomic study of FFPE samples was assessed and compared with trypsin in this study for the first time to facilitate proteomic research on FFPE samples. METHODS The FFPE proteins were extracted with an "antigen retrieval" method. Digestion parameters were optimized by visualization of the digests on the tricine gel by silver staining. Then the FFPE proteins were separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), cut into 16 gel bands, and in-gel digested by trypsin and LysargiNase, respectively. Peptides were desalted with Stage-Tips and separated via liquid chromatography. Electrospray ionization was conducted and peptide mass was measured in the LTQ Orbitrap Velos in the data-dependent mode. RESULTS High concentrations of enzyme improve the digestion efficiency of FFPE samples. A total of 32,294 peptides and 3445 proteins were identified with LysargiNase and trypsin combined in two replicates. LysargiNase increased peptide identification by 18.9% and protein identification by 13.4% on the basis of trypsin. Consistently, LysargiNase increased C-terminal peptide identification by 47.7%. Moreover, LysargiNase showed a better full-cleavage rate (49.3%) at methylated sites than trypsin (23.9%). LysargiNase and trypsin combined can improve the b-ion coverage by 50% on FFPE samples. CONCLUSIONS FFPE samples can be more efficiently digested at high concentrations of LysargiNase and trypsin. LysargiNase can better digest methylated peptides and improve the proteome identification by 13.4% and the b-ion coverage by 50% on the basis of trypsin in FFPE samples.
Affiliation(s)
- Shu Liu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Feng Xu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Yanan Yin
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery of the Ministry of Education, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430072, China
- Junling Zhang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Fuqiang Wang
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Yanchang Li
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
- Ping Xu
  - State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery of the Ministry of Education, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430072, China
  - Anhui Medical University, Hefei, 230032, China
9
Dou M, Tsai CF, Piehowski PD, Wang Y, Fillmore TL, Zhao R, Moore RJ, Zhang P, Qian WJ, Smith RD, Liu T, Kelly RT, Shi T, Zhu Y. Automated Nanoflow Two-Dimensional Reversed-Phase Liquid Chromatography System Enables In-Depth Proteome and Phosphoproteome Profiling of Nanoscale Samples. Anal Chem 2019; 91:9707-9715. [PMID: 31241912 DOI: 10.1021/acs.analchem.9b01248] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 12/16/2022]
Abstract
Two-dimensional reversed-phase capillary liquid chromatography (2D RPLC) separations have enabled comprehensive proteome profiling of biological systems. However, milligram sample quantities of proteins are typically required due to significant losses during offline fractionation. Such a large sample requirement generally precludes application to samples in the nanogram to low-microgram range. To achieve in-depth proteomic analysis of such small-sized samples, we have developed the nanoFAC (nanoflow Fractionation and Automated Concatenation) 2D RPLC platform, in which the first-dimension high-pH fractionation was performed on a 75-μm i.d. capillary column at a 300 nL/min flow rate with automated fraction concatenation, instead of on a typically used 2.1 mm column at a 200 μL/min flow rate with manual concatenation. Each fraction was then fully transferred to the second-dimension low-pH nanoLC separation using an autosampler equipped with a custom-machined syringe. We have found that using a polypropylene 96-well plate as the collection device, as well as the addition of n-dodecyl β-D-maltoside (0.01%) to the collection buffer, can significantly improve sample recovery. We have demonstrated that the nanoFAC 2D RPLC platform can achieve confident identifications of ∼49,000-94,000 unique peptides, corresponding to ∼6,700-8,300 protein groups, using only 100-1000 ng of HeLa tryptic digest (equivalent to ∼500-5,000 cells). Furthermore, by integrating with phosphopeptide enrichment, the nanoFAC 2D RPLC platform can identify ∼20,000 phosphopeptides from 100 μg of MCF-7 cell lysate.
Affiliation(s)
- Maowei Dou
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Chia-Feng Tsai
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Paul D Piehowski
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Yang Wang
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Thomas L Fillmore
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Rui Zhao
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Ronald J Moore
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Pengfei Zhang
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Wei-Jun Qian
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Richard D Smith
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Tao Liu
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Ryan T Kelly
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Tujin Shi
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Ying Zhu
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
10
Lisitsa AV, Petushkova NA, Levitsky LI, Zgoda VG, Larina OV, Kisrieva YS, Frankevich VE, Gamidov SI. Comparative Analysis of the Performance of Mascot and IdentiPy Algorithms on a Benchmark Dataset Obtained by Tandem Mass Spectrometry Analysis of Testicular Biopsies. Mol Biol 2019. [DOI: 10.1134/s0026893319010096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
11
Wang X, Jones DR, Shaw TI, Cho JH, Wang Y, Tan H, Xie B, Zhou S, Li Y, Peng J. Target-Decoy-Based False Discovery Rate Estimation for Large-Scale Metabolite Identification. J Proteome Res 2018; 17:2328-2334. [PMID: 29790753 DOI: 10.1021/acs.jproteome.8b00019] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 11/28/2022]
Abstract
Metabolite identification is a crucial step in mass spectrometry (MS)-based metabolomics. However, it is still challenging to assess the confidence of assigned metabolites. We report a novel method for estimating the false discovery rate (FDR) of metabolite assignment with a target-decoy strategy, in which the decoys are generated through violating the octet rule of chemistry by adding small odd numbers of hydrogen atoms. The target-decoy strategy was integrated into JUMPm, an automated metabolite identification pipeline for large-scale MS analysis and was also evaluated with two other metabolomics tools, mzMatch and MZmine 2. The reliability of FDR calculation was examined by false data sets, which were simulated by altering MS1 or MS2 spectra. Finally, we used the JUMPm pipeline coupled to the target-decoy strategy to process unlabeled and stable-isotope-labeled metabolomic data sets. The results demonstrate that the target-decoy strategy is a simple and effective method for evaluating the confidence of high-throughput metabolite identification.
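The target-decoy idea in this abstract can be illustrated with a minimal Python sketch. It assumes each candidate assignment carries a score and a flag saying whether it matched a decoy; the function name `estimate_fdr`, the tuple layout, the toy scores, and the simple decoys-over-targets estimator are illustrative assumptions, not the JUMPm implementation.

```python
from typing import Iterable, Tuple


def estimate_fdr(matches: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (decoy hits) / (target hits) among matches scoring >= threshold.

    matches: (score, is_decoy) pairs; a hypothetical layout used for illustration.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in matches:
        if score < threshold:
            continue  # below the acceptance threshold, not counted
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        return 0.0  # no accepted target hits, so nothing to estimate
    return decoys / targets


# Toy matches: four target hits and one decoy hit score at or above 7.0.
matches = [(9.1, False), (8.7, False), (8.2, True), (7.9, False), (7.5, False), (3.0, True)]
print(estimate_fdr(matches, threshold=7.0))
```

In practice the threshold would be adjusted until the estimate falls below a chosen level such as 1%; that search is left out to keep the sketch short.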
12
Anapindi KDB, Romanova EV, Southey BR, Sweedler JV. Peptide identifications and false discovery rates using different mass spectrometry platforms. Talanta 2018; 182:456-463. [PMID: 29501178 DOI: 10.1016/j.talanta.2018.01.062] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 01/05/2018] [Accepted: 01/28/2018] [Indexed: 12/29/2022]
Abstract
Characterization of endogenous neuropeptides produced from post-translational proteolytic processing of precursor proteins is a demanding task. A variety of complex prohormone processing steps generate molecular diversity from neuropeptide prohormones, making in silico neuropeptide discovery difficult. In addition, the wide range of endogenous peptide concentrations as well as significant peptide complexity further challenge the structural characterization of neuropeptides. Liquid chromatography-mass spectrometry (MS), performed in conjunction with bioinformatics, allows for high-throughput characterization of peptides. Mass analyzers and molecular dissociation techniques render specific characteristics to the acquired data and thus, influence the analysis of the MS data using bioinformatic algorithms for follow-up peptide identification. Here we evaluated the efficacy of several distinct peptidomic workflows using two mass spectrometers, the Thermo Orbitrap Fusion Tribrid and Bruker Impact HD UHR-QqTOF, for confident peptide discovery and characterization. We compared the results in several categories, including the numbers of identified peptides, full-length mature neuropeptides among all identifications, and precursor proteins mapped by the identified peptides. We also characterized the peptide false discovery rate (FDR) based on the occurrence of amidation, a known post-translational modification (PTM) that has been shown to require the presence of a C-terminal glycine. Thus, amidation events without a preceding glycine were considered false-positive amidation assignments. We compared the FDR calculated by the search engine used here to the minimum FDR estimated via false amidation assignments. The search engine severely underestimated the rate of false PTM assignments among the identified peptides, regardless of the specific MS platform used.
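The amidation-based check in this abstract can be sketched in Python as follows. The sketch assumes each amidated identification has already been annotated with whether the required glycine is present in the precursor sequence. The abstract does not give the exact formula, so the ratio used here (unsupported amidated hits over all amidated hits) is an assumption, and the names `AmidatedHit` and `minimum_amidation_fdr` and the toy peptides are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AmidatedHit:
    peptide: str
    glycine_supported: bool  # True if the required glycine is present in the precursor


def minimum_amidation_fdr(hits: List[AmidatedHit]) -> float:
    """Lower bound on the false assignment rate among amidated identifications.

    An amidation call without the required glycine is counted as a false positive
    (assumed estimator: unsupported hits / all amidated hits).
    """
    if not hits:
        return 0.0
    unsupported = sum(1 for hit in hits if not hit.glycine_supported)
    return unsupported / len(hits)


# Toy identifications (made-up sequences): one of four lacks glycine support.
hits = [
    AmidatedHit("ELVISLIVESK", True),
    AmidatedHit("QWERTYIPASDFGHK", True),
    AmidatedHit("SHRTPEPK", False),
    AmidatedHit("PEPTIDEK", True),
]
print(minimum_amidation_fdr(hits))
```

The result is only a minimum because amidation calls that do have glycine support can still be wrong for other reasons.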
Affiliation(s)
- Krishna D B Anapindi
- Department of Chemistry and the Beckman Institute, University of Illinois at Urbana-Champaign, Urbana 61801, IL, USA
- Elena V Romanova
- Department of Chemistry and the Beckman Institute, University of Illinois at Urbana-Champaign, Urbana 61801, IL, USA
- Bruce R Southey
- Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana 61801, IL, USA
- Jonathan V Sweedler
- Department of Chemistry and the Beckman Institute, University of Illinois at Urbana-Champaign, Urbana 61801, IL, USA
|
13
|
Dufresne J, Florentinus-Mefailoski A, Ajambo J, Ferwa A, Bowden P, Marshall J. Random and independent sampling of endogenous tryptic peptides from normal human EDTA plasma by liquid chromatography micro electrospray ionization and tandem mass spectrometry. Clin Proteomics 2017; 14:41. [PMID: 29234243 PMCID: PMC5721679 DOI: 10.1186/s12014-017-9176-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 05/23/2017] [Accepted: 11/26/2017] [Indexed: 12/12/2022] Open
Abstract
Background: Normal human EDTA plasma samples were collected on ice, processed ice cold, and stored in a freezer at −80 °C prior to experiments. Plasma test samples from the −80 °C freezer were thawed on ice or intentionally warmed to room temperature. Methods: Protein content was measured by CBBR binding and the release of alcohol-soluble amines by the Cd ninhydrin assay. Plasma peptides released over time were collected over C18 for random and independent sampling by liquid chromatography micro electrospray ionization and tandem mass spectrometry (LC–ESI–MS/MS) and correlated with X!TANDEM. Results: Fully tryptic correlations by X!TANDEM returned a similar set of proteins to “no enzyme” correlations but were more computationally efficient. Plasma samples maintained on ice, or on ice with a cocktail of protease inhibitors, showed lower background amounts of plasma peptides compared to samples incubated at room temperature. Regression analysis indicated that warming plasma to room temperature, versus keeping it ice cold, resulted in an approximately twofold increase in the frequency of peptide identification over hours to days of incubation at room temperature. The type I error rate of the protein identification from the X!TANDEM algorithm combined was estimated to be low compared to a null model of computer-generated random MS/MS spectra. Conclusion: The peptides of human plasma were identified and quantified with low error rates by random and independent sampling that revealed 1000s of peptides from hundreds of human plasma proteins from endogenous tryptic peptides.
Affiliation(s)
- Jaimie Dufresne
- Ryerson University, 350 Victoria Street, Toronto, ON M5B 2K3 Canada
- Juliet Ajambo
- Ryerson University, 350 Victoria Street, Toronto, ON M5B 2K3 Canada
- Ammara Ferwa
- Ryerson University, 350 Victoria Street, Toronto, ON M5B 2K3 Canada
- Peter Bowden
- Ryerson University, 350 Victoria Street, Toronto, ON M5B 2K3 Canada
- John Marshall
- Ryerson University, 350 Victoria Street, Toronto, ON M5B 2K3 Canada; Integrated BioBank of Luxembourg, 6 r. Nicolas-Ernest Barblé, Dudelange, 1210 Luxembourg
|
14
|
Guruceaga E, Garin-Muga A, Prieto G, Bejarano B, Marcilla M, Marín-Vicente C, Perez-Riverol Y, Casal JI, Vizcaíno JA, Corrales FJ, Segura V. Enhanced Missing Proteins Detection in NCI60 Cell Lines Using an Integrative Search Engine Approach. J Proteome Res 2017; 16:4374-4390. [PMID: 28960077 PMCID: PMC5737412 DOI: 10.1021/acs.jproteome.7b00388] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 06/06/2017] [Indexed: 12/17/2022]
Abstract
The Human Proteome Project (HPP) aims to decipher the complete map of the human proteome. In the past few years, significant efforts of the HPP teams have been dedicated to the experimental detection of the missing proteins, which lack reliable mass spectrometry evidence of their existence. In this endeavor, an in-depth analysis of shotgun experiments might represent a valuable resource for selecting a biological matrix when designing validation experiments. In this work, we used all the proteomic experiments from the NCI60 cell lines and applied an integrative approach based on the results obtained from Comet, Mascot, OMSSA, and X!Tandem. This workflow benefits from the complementarity of these search engines to increase the proteome coverage. Five missing proteins compliant with the C-HPP guidelines were identified, although further validation is needed. Moreover, 165 missing proteins were detected with only one unique peptide, and their functional analysis supported their participation in cellular pathways, as was also proposed in other studies. Finally, we performed a combined analysis of the gene expression levels and the proteomic identifications from the common cell lines between the NCI60 and the CCLE project to suggest alternatives for further validation of missing protein observations.
Affiliation(s)
- Elizabeth Guruceaga
- Bioinformatics Unit, Center for Applied Medical Research, University of Navarra, Pamplona 31008, Spain
- IdiSNA, Navarra Institute for Health Research, Pamplona 31008, Spain
- Alba Garin-Muga
- Bioinformatics Unit, Center for Applied Medical Research, University of Navarra, Pamplona 31008, Spain
- Gorka Prieto
- Department of Communications Engineering, University of the Basque Country (UPV/EHU), Bilbao 48013, Spain
- Miguel Marcilla
- Proteomics Unit, Spanish National Biotechnology Centre, CSIC, Madrid 28049, Spain
- Consuelo Marín-Vicente
- Functional Proteomics, Department of Cellular and Molecular Medicine and Proteomic Facility, Centro de Investigaciones Biológicas (CIB-CSIC), Ramiro de Maeztu 9, Madrid 28040, Spain
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- J. Ignacio Casal
- Functional Proteomics, Department of Cellular and Molecular Medicine and Proteomic Facility, Centro de Investigaciones Biológicas (CIB-CSIC), Ramiro de Maeztu 9, Madrid 28040, Spain
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Fernando J. Corrales
- Proteomics Unit, Spanish National Biotechnology Centre, CSIC, Madrid 28049, Spain
- Victor Segura
- Bioinformatics Unit, Center for Applied Medical Research, University of Navarra, Pamplona 31008, Spain
- IdiSNA, Navarra Institute for Health Research, Pamplona 31008, Spain
|
15
|
Elguoshy A, Hirao Y, Xu B, Saito S, Quadery AF, Yamamoto K, Mitsui T, Yamamoto T. Identification and Validation of Human Missing Proteins and Peptides in Public Proteome Databases: Data Mining Strategy. J Proteome Res 2017; 16:4403-4414. [PMID: 28980472 DOI: 10.1021/acs.jproteome.7b00423] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/18/2022]
Abstract
In an attempt to complete the Human Proteome Project (HPP), the Chromosome-Centric Human Proteome Project (C-HPP) launched the investigation of missing proteins (MPs) in 2012. However, 2579 and 572 protein entries in neXtProt (2017-1) are still considered missing and uncertain proteins, respectively. Thus, in this study, we proposed a pipeline to analyze, identify, and validate human missing and uncertain proteins in open-access transcriptomics and proteomics databases. Analysis of RNA expression patterns for missing proteins in the Human Protein Atlas showed that 28% of them, such as Olfactory receptor 1I1 (O60431), had no RNA expression, suggesting the necessity to consider uncommon tissues for transcriptomic and proteomic studies. Interestingly, 21% had an elevated expression level in a particular tissue (tissue-enriched proteins), indicating the importance of targeting such proteins in the tissues where they are elevated. Additionally, the analysis of RNA expression levels for missing proteins showed that 95% had no or low expression (0-10 transcripts per million), indicating that low abundance is one of the major obstacles facing the detection of missing proteins. Moreover, missing proteins are predicted to generate fewer unique tryptic peptides than the identified proteins. Searching for these predicted unique tryptic peptides that correspond to missing and uncertain proteins in the experimental peptide list of open-access MS-based databases (PA, GPM) resulted in the detection of 402 missing and 19 uncertain proteins with at least two unique peptides (≥9 aa) at <(5 × 10⁻⁴)% FDR. Finally, matching the native spectra for the experimentally detected peptides with their SRMAtlas synthetic counterparts at three transition sources (QQQ, QTOF, QTRAP) gave us an opportunity to validate 41 missing proteins by ≥2 proteotypic peptides.
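The prediction of tryptic peptides mentioned in this abstract can be sketched with the commonly used trypsin rule (cleave after K or R unless the next residue is P). Whether the study applied the proline exception or allowed missed cleavages is not stated, so both are assumptions here, and `tryptic_peptides` is a hypothetical helper name. Checking uniqueness against the rest of the proteome is left out.

```python
import re


def tryptic_peptides(protein, min_len=9):
    """In silico digest: cut after K or R, but not when P follows.

    No missed cleavages are modelled. Only peptides with at least
    `min_len` residues are kept, mirroring the >=9 aa filter in the text.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if len(p) >= min_len]
```

For example, `tryptic_peptides("A" * 9 + "K" + "CCCRP" + "D" * 9 + "K" + "EE")` keeps the two long fragments and drops the short C-terminal `EE`.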
Affiliation(s)
- Amr Elguoshy
- Biofluid and Biomarker Center, Niigata University, Niigata 950-2181, Japan; Graduate School of Science and Technology, Niigata University, Niigata 950-2181, Japan; Biotechnology Department - Faculty of Agriculture, Al-azhar University, Cairo 11651, Egypt
- Yoshitoshi Hirao
- Biofluid and Biomarker Center, Niigata University, Niigata 950-2181, Japan
- Bo Xu
- Biofluid and Biomarker Center, Niigata University, Niigata 950-2181, Japan
- Suguru Saito
- Biofluid and Biomarker Center, Niigata University, Niigata 950-2181, Japan
- Ali F Quadery
- Biofluid and Biomarker Center, Niigata University, Niigata 950-2181, Japan
- Keiko Yamamoto
- Biofluid and Biomarker Center, Niigata University, Niigata 950-2181, Japan
- Toshiaki Mitsui
- Graduate School of Science and Technology, Niigata University, Niigata 950-2181, Japan
- Tadashi Yamamoto
- Biofluid and Biomarker Center, Niigata University, Niigata 950-2181, Japan
|
16
|
Cerqueira FR, Ricardo AM, de Paiva Oliveira A, Graber A, Baumgartner C. MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm. BMC Bioinformatics 2016; 17:472. [PMID: 28105913 PMCID: PMC5249030 DOI: 10.1186/s12859-016-1341-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
Background: This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run, which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools, since manual evaluation is not practicable. The target-decoy database strategy is widely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. Results: In a previous study, we proposed the method MUMAL, which applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra at the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached an improvement of around 15% in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. Conclusions: The inclusion of a cost matrix and a probability threshold selector algorithm in the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics.
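The target-decoy error estimation that this abstract builds on can be illustrated with one common estimator: the number of decoy hits above a score threshold divided by the number of target hits above it. This is a generic sketch and not the MUMAL2 implementation; the `(score, is_decoy)` tuple layout and the function name are assumptions.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    `psms` is an iterable of (score, is_decoy) pairs. Decoy hits passing
    the threshold serve as a proxy for the number of false target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0
```

Raising the threshold lowers the estimated FDR but discards true matches, which is the sensitivity trade-off the abstract addresses.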
Affiliation(s)
- Adilson Mendes Ricardo
- Department of Informatics, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil; Department of Computing and Construction, Centro Federal de Educação Tecnológica de Minas Gerais, Rua 19 de Novembro, 121, Timóteo, 35180-008, Brazil
- Alcione de Paiva Oliveira
- Department of Informatics, Universidade Federal de Viçosa, Viçosa, 36570-900, Brazil; Department of Computer Science, University of Sheffield, Western Bank, S10 2TN, Sheffield, UK
- Armin Graber
- Research and Product Development of Genoptix, a Novartis company, 2110 Rutherford Rd, Carlsbad, 92008, USA
- Christian Baumgartner
- Institute of Health Care Engineering with European Notified Body of Medical Devices, Graz University of Technology, Stremayrgasse 16/II, Graz, A-8010, Austria
|
17
|
Park GW, Hwang H, Kim KH, Lee JY, Lee HK, Park JY, Ji ES, Park SKR, Yates JR, Kwon KH, Park YM, Lee HJ, Paik YK, Kim JY, Yoo JS. Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate. J Proteome Res 2016; 15:4082-4090. [PMID: 27537616 DOI: 10.1021/acs.jproteome.6b00376] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 11/30/2022]
Abstract
In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid-chromatography and mass-spectrometry-based large proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four steps, as follows. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. We then compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral patterns from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
Affiliation(s)
- Gun Wook Park
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea; Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
- Heeyoun Hwang
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
- Kwang Hoe Kim
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea; Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
- Ju Yeon Lee
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
- Hyun Kyoung Lee
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea; Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
- Ji Yeong Park
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea; Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
- Eun Sun Ji
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
- Sung-Kyu Robin Park
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California 92037, United States
- John R Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, California 92037, United States
- Kyung-Hoon Kwon
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
- Young Mok Park
- Center for Cognition and Sociality, Institute for Basic Science, Daejeon 305-811, Republic of Korea
- Hyoung-Joo Lee
- Yonsei Proteome Research Center and Department of Integrated OMICS for Biomedical Science, and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Republic of Korea
- Young-Ki Paik
- Yonsei Proteome Research Center and Department of Integrated OMICS for Biomedical Science, and Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul 120-749, Republic of Korea
- Jin Young Kim
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea
- Jong Shin Yoo
- Biomedical Omics Group, Korea Basic Science Institute, 162 YeonGuDanji-Ro, Ochang 363-883, Republic of Korea; Graduate School of Analytical Science and Technology, Chungnam National University, Daejeon 34134, Republic of Korea
|
18
|
Abstract
In vivo isotopic labeling coupled with high-resolution proteomics is used to investigate primary metabolism in techniques such as stable isotope probing (protein-SIP) and peptide-based metabolic flux analysis (PMFA). Isotopic enrichment of carbon substrates and intracellular metabolism determine the distribution of isotopes within amino acids. The resulting amino acid mass distributions (AMDs) are convoluted into peptide mass distributions (PMDs) during protein synthesis. With no a priori knowledge on metabolic fluxes, the PMDs are therefore unknown. This complicates labeled peptide identification because prior knowledge on PMDs is used in all available peptide identification software. An automated framework for the identification and quantification of PMDs for nonuniformly labeled samples is therefore lacking. To unlock the potential of peptide labeling experiments for high-throughput flux analysis and other complex labeling experiments, an unsupervised peptide identification and quantification method was developed that uses discrete deconvolution of mass distributions of identified peptides to inform on the mass distributions of otherwise unidentifiable peptides. Uniformly ¹³C-labeled Escherichia coli protein was used to test the developed feature reconstruction and deconvolution algorithms. The peptide identification was validated by comparing MS²-identified peptides to peptides identified from PMDs using unlabeled E. coli protein. Nonuniformly labeled Glycine max protein was used to demonstrate the technology on a representative sample suitable for flux analysis. Overall, automatic peptide identification and quantification were comparable or superior to manual extraction, enabling proteomics-based technology for high-throughput flux analysis studies.
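The statement that amino acid mass distributions are convoluted into peptide mass distributions can be made concrete with a small sketch of the forward direction only (the paper's deconvolution method is not reproduced). Each distribution is assumed to be a list whose index i holds the fraction of molecules carrying i extra mass units; the function names and the toy values in the usage note are hypothetical.

```python
def convolve(a, b):
    """Discrete convolution of two mass distributions."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def peptide_mass_distribution(sequence, amds):
    """Build a PMD by convolving the AMD of every residue in `sequence`.

    `amds` maps a one-letter amino acid code to its mass distribution.
    """
    pmd = [1.0]  # neutral element: all mass at shift 0
    for aa in sequence:
        pmd = convolve(pmd, amds[aa])
    return pmd
```

With toy distributions `{"G": [0.5, 0.5], "A": [0.25, 0.75]}`, the dipeptide `"GA"` gives `[0.125, 0.5, 0.375]`, which still sums to 1.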
Affiliation(s)
- Joshua E Goldford
- Biotechnology Institute, University of Minnesota, Saint Paul, Minnesota 55108, United States
- Igor G L Libourel
- Biotechnology Institute, University of Minnesota, Saint Paul, Minnesota 55108, United States
- Department of Plant Biology, 1500 Gortner Avenue, University of Minnesota, Saint Paul, Minnesota 55108, United States
|
19
|
Ivanov MV, Levitsky LI, Lobas AA, Tarasova IA, Pridatchenko ML, Zgoda VG, Moshkovskii SA, Mitulovic G, Gorshkov MV. Peptide identification in “shotgun” proteomics using tandem mass spectrometry: Comparison of search engine algorithms. J Anal Chem 2015. [DOI: 10.1134/s1061934815140075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
|
20
|
Quantitative proteomic analysis of Shigella flexneri and Shigella sonnei Generalized Modules for Membrane Antigens (GMMA) reveals highly pure preparations. Int J Med Microbiol 2015; 306:99-108. [PMID: 26746581 PMCID: PMC4820968 DOI: 10.1016/j.ijmm.2015.12.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/20/2015] [Revised: 11/15/2015] [Accepted: 12/18/2015] [Indexed: 11/20/2022] Open
Abstract
Outer membrane blebs are naturally shed by Gram-negative bacteria and are candidates of interest for vaccine development. Genetic modification of bacteria to induce hyperblebbing greatly increases the yield of blebs, called Generalized Modules for Membrane Antigens (GMMA). The composition of the GMMA from hyperblebbing mutants of Shigella flexneri 2a and Shigella sonnei was quantitatively analyzed using high-sensitivity mass spectrometry with the label-free iBAQ procedure and compared to the composition of the solubilized cells of the GMMA-producing strains. There were 2306 proteins identified, 659 in GMMA and 2239 in bacteria, of which 290 (GMMA) and 1696 (bacteria) were common to both S. flexneri 2a and S. sonnei. Predicted outer membrane and periplasmic proteins constituted 95.7% and 98.7% of the protein mass of S. flexneri 2a and S. sonnei GMMA, respectively. Among the remaining proteins, small quantities of ribosomal proteins collectively accounted for more than half of the predicted cytoplasmic protein impurities in the GMMA. In GMMA, the outer membrane and periplasmic proteins were enriched 13.3-fold (S. flexneri 2a) and 8.3-fold (S. sonnei) compared to their abundance in the parent bacteria. Both periplasmic and outer membrane proteins were enriched similarly, suggesting that GMMA have a surface to volume ratio similar to the surface to periplasmic volume ratio in these mutant bacteria. Results in S. flexneri 2a and S. sonnei showed high reproducibility, indicating a robust GMMA-producing process, and the low contamination by cytoplasmic proteins supports the use of GMMA for vaccines. Data are available via ProteomeXchange with identifier PXD002517.
|
21
|
Griss J, Perez-Riverol Y, Hermjakob H, Vizcaíno JA. Identifying novel biomarkers through data mining-a realistic scenario? Proteomics Clin Appl 2015; 9:437-43. [PMID: 25347964 PMCID: PMC4833187 DOI: 10.1002/prca.201400107] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 08/07/2014] [Revised: 10/08/2014] [Accepted: 10/21/2014] [Indexed: 12/12/2022]
Abstract
In this article we discuss the requirements to use data mining of published proteomics datasets to assist proteomics-based biomarker discovery, the use of external data integration to solve the issue of inadequate small sample sizes and finally, we try to estimate the probability that new biomarkers will be identified through data mining alone.
Affiliation(s)
- Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
|
22
|
Schnell G, Boeuf A, Jaulhac B, Boulanger N, Collin E, Barthel C, De Martino S, Ehret-Sabatier L. Proteomic analysis of three Borrelia burgdorferi sensu lato native species and disseminating clones: relevance for Lyme vaccine design. Proteomics 2015; 15:1280-90. [PMID: 25475896 DOI: 10.1002/pmic.201400177] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/29/2014] [Revised: 11/06/2014] [Accepted: 11/28/2014] [Indexed: 11/10/2022]
Abstract
Lyme borreliosis is the most important vector-borne disease in the Northern hemisphere. It is caused by Borrelia burgdorferi sensu lato bacteria transmitted to humans by the bite of hard ticks, Ixodes spp. Although antibiotic treatments are efficient in the early stage of the infection, a significant number of patients develop disseminated manifestations (articular, neurological, and cutaneous) due to unnoticed or absent erythema migrans, or to inappropriate treatment. A vaccine could be an efficient approach to decrease Lyme disease incidence. We have developed a proteomic approach, based on one-dimensional gel electrophoresis followed by LC-MS/MS, to identify new vaccine candidates. We analyzed a disseminating clone and the associated wild-type strain for each major pathogenic Borrelia species: B. burgdorferi sensu stricto, B. garinii, and B. afzelii. We identified proteins specific to, and proteins common to, the disseminating clones of the three main species. In parallel, we used a spectral counting strategy to identify upregulated proteins common to the clones. Finally, 40 proteins were found that could potentially be involved in bacterial virulence and be of interest in the development of a new vaccine. We selected the three proteins specifically detected in the disseminating clones of the three Borrelia species and checked by RT-PCR whether they are expressed in mouse skin upon B. burgdorferi ss inoculation. Interestingly, BB0566 appears as a potential vaccine candidate. All MS data have been deposited in the ProteomeXchange with identifier PXD000876 (http://proteomecentral.proteomexchange.org/dataset/PXD000876).
Affiliation(s)
- Gilles Schnell
- Laboratoire de Spectrométrie de Masse BioOrganique, Institut Pluridisciplinaire Hubert Curien, Université de Strasbourg, Strasbourg, France
|
23
|
Küster SK, Pabst M, Zenobi R, Dittrich PS. [Automated detection of protein phosphorylation by nanoliter enzyme reactions on microarrays] (in German). Angew Chem Int Ed Engl 2014. [DOI: 10.1002/ange.201409440] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
|
24
|
Küster SK, Pabst M, Zenobi R, Dittrich PS. Screening for protein phosphorylation using nanoscale reactions on microdroplet arrays. Angew Chem Int Ed Engl 2014; 54:1671-5. [PMID: 25504774 DOI: 10.1002/anie.201409440] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 09/24/2014] [Indexed: 12/25/2022]
Abstract
We present a novel and straightforward screening method to detect protein phosphorylations in complex protein mixtures. A proteolytic digest is separated by a conventional nanoscale liquid chromatography (nano-LC) separation and the eluate is immediately compartmentalized into microdroplets, which are spotted on a microarray MALDI plate. Subsequently, the enzyme alkaline phosphatase is applied to every second microarray spot to remove the phosphate groups from phosphorylated peptides, which results in a mass shift of n × (−80) Da. The MALDI-MS scan of the microarray is then evaluated by a software algorithm to automatically identify the phosphorylated peptides by exploiting the characteristic chromatographic peak profile induced by the phosphatase treatment. This screening method does not require extensive MS/MS experiments or peak list evaluation and can be easily extended to other enzymatic or chemical reactions.
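The mass shift caused by phosphatase treatment can be illustrated with a simplified sketch that pairs peaks from untreated and treated spots whose masses differ by a whole number of phosphate losses. This is not the authors' algorithm, which exploits the chromatographic peak profile; the function name, the `max_sites` and `tol` parameters, and the use of singly charged ions are assumptions. The constant is the monoisotopic mass of HPO3, the exact value behind the nominal 80 Da.

```python
HPO3_MASS = 79.96633  # monoisotopic mass of HPO3 in Da


def find_phospho_candidates(untreated, treated, max_sites=3, tol=0.02):
    """Pair peaks whose mass difference matches n phosphate losses.

    `untreated` and `treated` are lists of peak masses (Da) from adjacent
    spots without and with phosphatase. Returns (untreated_mass,
    treated_mass, n) tuples for 1 <= n <= max_sites within `tol` Da.
    """
    hits = []
    for m_u in untreated:
        for m_t in treated:
            for n in range(1, max_sites + 1):
                if abs((m_u - m_t) - n * HPO3_MASS) <= tol:
                    hits.append((m_u, m_t, n))
    return hits
```

A peak at 1079.966 Da that reappears at 1000.000 Da after treatment would be reported as a singly phosphorylated candidate.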
Affiliation(s)
- Simon K Küster
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 3, 8093 Zurich (Switzerland)
|
25
|
Zaccarin M, Falda M, Roveri A, Bosello-Travain V, Bordin L, Maiorino M, Ursini F, Toppo S. Quantitative label-free redox proteomics of reversible cysteine oxidation in red blood cell membranes. Free Radic Biol Med 2014; 71:90-98. [PMID: 24642086 DOI: 10.1016/j.freeradbiomed.2014.03.013] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 11/22/2013] [Revised: 03/01/2014] [Accepted: 03/04/2014] [Indexed: 01/06/2023]
Abstract
Reversible oxidation of cysteine residues is a relevant posttranslational modification of proteins. However, the low activation energy and transitory nature of the redox switch, together with the intrinsic complexity of the analysis, make rigorous high-throughput screening of the redox status of redox-sensitive cysteine residues quite challenging. We describe here a quantitative workflow for redox proteomics, where the ratio between the oxidized forms of proteins in the control vs treated samples is determined by a robust label-free approach. We critically present the convenience of the procedure by specifically addressing the following aspects: (i) the accurate ratio, calculated from the whole set of identified peptides rather than just isotope-tagged fragments; (ii) the application of a robust analytical pipeline to frame the most consistent data averaged over the biological variability; (iii) the relevance of using stringent criteria of analysis, even at the cost of losing potentially interesting but statistically uncertain data. The pipeline has been assessed on red blood cell membranes challenged with diamide as a model of a mild oxidative condition. The cluster of identified proteins that were more oxidized encompassed components of the cytoskeleton. Indirectly, our analysis confirmed the previous observation that oxidized hemoglobin binds to membranes while oxidized peroxiredoxin 2 loses affinity.
Affiliation(s)
- Mattia Zaccarin
- Department of Molecular Medicine, via A. Gabelli, 63, I-35121 Padova, Italy
- Marco Falda
- Department of Molecular Medicine, via A. Gabelli, 63, I-35121 Padova, Italy
- Antonella Roveri
- Department of Molecular Medicine, via A. Gabelli, 63, I-35121 Padova, Italy
- Luciana Bordin
- Department of Molecular Medicine, via A. Gabelli, 63, I-35121 Padova, Italy
- Matilde Maiorino
- Department of Molecular Medicine, via A. Gabelli, 63, I-35121 Padova, Italy
- Fulvio Ursini
- Department of Molecular Medicine, via A. Gabelli, 63, I-35121 Padova, Italy
- Stefano Toppo
- Department of Molecular Medicine, via A. Gabelli, 63, I-35121 Padova, Italy.
26
Dupae J, Bohler S, Noben JP, Carpentier S, Vangronsveld J, Cuypers A. Problems inherent to a meta-analysis of proteomics data: a case study on the plants' response to Cd in different cultivation conditions. J Proteomics 2014; 108:30-54. [PMID: 24821411 DOI: 10.1016/j.jprot.2014.04.029] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 12/08/2013] [Revised: 03/07/2014] [Accepted: 04/15/2014] [Indexed: 01/14/2023]
Abstract
This meta-analysis focuses on plant-proteome responses to cadmium (Cd) stress. Initially, some general topics related to a proteomics meta-analysis are discussed: (1) obstacles encountered during data analysis, (2) a consensus in proteomic research, (3) validation and good reporting practices for protein identification and (4) guidelines for statistical analysis of differentially abundant proteins. In a second part, the Cd responses in leaves and roots obtained from a proteomics meta-analysis are discussed in (1) a time comparison (short versus long term exposure), and (2) a culture comparison (hydroponics versus soil cultivation). Data of the meta-analysis confirmed the existence of an initial alarm phase upon Cd exposure. Whereas no metabolic equilibrium is established in hydroponically exposed plants, an equilibrium seems to be manifested in roots of plants grown in Cd-contaminated soil after long term exposure. In leaves, the carbohydrate metabolism is primarily affected independent of the exposure time and the cultivation method. In addition, a metabolic shift from CO2-fixation towards respiration is manifested, independent of the cultivation system. Finally, some ideas for the improvement of proteomics setups and for comparisons between studies are discussed. BIOLOGICAL SIGNIFICANCE This meta-analysis focuses on the plant responses to Cd stress in leaves and roots at the proteome level. This meta-analysis points out the encountered obstacles when performing a proteomics meta-analysis related to inherent technologies, but also related to experimental setups. Furthermore, the question is addressed whether an extrapolation of results obtained in hydroponic cultivation towards soil-grown plants is possible.
Affiliation(s)
- Joke Dupae
- Environmental Biology, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium.
- Sacha Bohler
- Environmental Biology, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium.
- Jean-Paul Noben
- Biomedical Institute, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium.
- Sebastien Carpentier
- Afdeling Plantenbiotechniek, Catholic University Leuven, Willem de Croylaan 42 - bus 2455, 3001 Leuven, Belgium.
- Jaco Vangronsveld
- Environmental Biology, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium.
- Ann Cuypers
- Environmental Biology, Hasselt University, Agoralaan - Gebouw D, 3590 Diepenbeek, Belgium.
27
Pathak RR, Davé V. Integrating omics technologies to study pulmonary physiology and pathology at the systems level. Cell Physiol Biochem 2014; 33:1239-60. [PMID: 24802001 PMCID: PMC4396816 DOI: 10.1159/000358693] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Accepted: 03/11/2014] [Indexed: 12/13/2022]
Abstract
Assimilation and integration of "omics" technologies, including genomics, epigenomics, proteomics, and metabolomics has readily altered the landscape of medical research in the last decade. The vast and complex nature of omics data can only be interpreted by linking molecular information at the organismic level, forming the foundation of systems biology. Research in pulmonary biology/medicine has necessitated integration of omics, network, systems and computational biology data to differentially diagnose, interpret, and prognosticate pulmonary diseases, facilitating improvement in therapy and treatment modalities. This review describes how to leverage this emerging technology in understanding pulmonary diseases at the systems level, called a "systomic" approach. Considering the operational wholeness of cellular and organ systems, the diseased genome, proteome, and metabolome need to be conceptualized at the systems level to understand disease pathogenesis and progression. Currently available omics technology and resources require a certain degree of training and proficiency in addition to dedicated hardware and applications, making them relatively less user friendly for pulmonary biologists and clinicians. Herein, we discuss the various strategies, computational tools and approaches required to study pulmonary diseases at the systems level for biomedical scientists and clinical researchers.
Affiliation(s)
- Ravi Ramesh Pathak
- Morsani College of Medicine, Department of Pathology and Cell Biology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL USA
- Vrushank Davé
- Morsani College of Medicine, Department of Pathology and Cell Biology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL USA
- Department of Molecular Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL USA
28
Carapito C, Burel A, Guterl P, Walter A, Varrier F, Bertile F, Van Dorsselaer A. MSDA, a proteomics software suite for in-depth Mass Spectrometry Data Analysis using grid computing. Proteomics 2014; 14:1014-9. [DOI: 10.1002/pmic.201300415] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 09/22/2013] [Revised: 01/15/2014] [Accepted: 01/15/2014] [Indexed: 12/20/2022]
Affiliation(s)
- Christine Carapito
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Alexandre Burel
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Patrick Guterl
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Alexandre Walter
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Fabrice Varrier
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Fabrice Bertile
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
- Alain Van Dorsselaer
- Laboratoire de Spectrométrie de Masse BioOrganique; IPHC; Université de Strasbourg; CNRS; UMR7178 Strasbourg France
29
Benevento M, Munoz J. Role of mass spectrometry-based proteomics in the study of cellular reprogramming and induced pluripotent stem cells. Expert Rev Proteomics 2014; 9:379-99. [DOI: 10.1586/epr.12.30] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 12/15/2022]
30
Winkler R. MASSyPup--an 'out of the box' solution for the analysis of mass spectrometry data. J Mass Spectrom 2014; 49:37-42. [PMID: 24446261 DOI: 10.1002/jms.3314] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 09/12/2013] [Revised: 11/05/2013] [Accepted: 11/18/2013] [Indexed: 05/18/2023]
Abstract
Mass spectrometry has evolved into a key technology in the areas of metabolomics and proteomics. Centralized facilities generate vast amounts of data, which frequently need to be processed off-site. Therefore, the distribution of data and software, as well as the training of personnel in the analysis of mass spectrometry data, becomes increasingly important. Thus, we created a comprehensive collection of mass spectrometry software which can be run directly from different media such as DVD or USB without local installation. MASSyPup is based on a Linux Live distribution and was complemented with programs for conversion, visualization and analysis of mass spectrometry (MS) data. A special emphasis was put on protein analysis and proteomics, encompassing the measurement of complete proteins, the identification of proteins based on Peptide Mass Fingerprints (PMF) or LC-MS/MS data, and de novo sequencing. Another focus was directed to the study of metabolites and metabolomics, covering the detection, identification and quantification of compounds, as well as subsequent statistical analyses. Additionally, we added software for Mass Spectrometry Imaging (MSI), including hardware support for self-made MSI devices. MASSyPup represents a 'ready to work' system for teaching or MS data analysis, but also represents an ideal platform for the distribution of MS data and the development of related software. The current Live DVD version can be downloaded free of charge from http://www.bioprocess.org/massypup.
Affiliation(s)
- Robert Winkler
- Department of Biotechnology and Biochemistry, CINVESTAV Unidad Irapuato, Irapuato, Mexico
31
Aagaard JE, George RD, Fishman L, MacCoss MJ, Swanson WJ. Selection on plant male function genes identifies candidates for reproductive isolation of yellow monkeyflowers. PLoS Genet 2013; 9:e1003965. [PMID: 24339787 PMCID: PMC3854799 DOI: 10.1371/journal.pgen.1003965] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/03/2013] [Accepted: 10/04/2013] [Indexed: 11/18/2022]
Abstract
Understanding the genetic basis of reproductive isolation promises insight into speciation and the origins of biological diversity. While progress has been made in identifying genes underlying barriers to reproduction that function after fertilization (post-zygotic isolation), we know much less about earlier acting pre-zygotic barriers. Of particular interest are barriers involved in mating and fertilization that can evolve extremely rapidly under sexual selection, suggesting they may play a prominent role in the initial stages of reproductive isolation. A significant challenge to the field of speciation genetics is developing new approaches for identification of candidate genes underlying these barriers, particularly among non-traditional model systems. We employ powerful proteomic and genomic strategies to study the genetic basis of conspecific pollen precedence, an important component of pre-zygotic reproductive isolation among yellow monkeyflowers (Mimulus spp.) resulting from male pollen competition. We use isotopic labeling in combination with shotgun proteomics to identify more than 2,000 male function (pollen tube) proteins within maternal reproductive structures (styles) of M. guttatus flowers where pollen competition occurs. We then sequence array-captured pollen tube exomes from a large outcrossing population of M. guttatus, and identify those genes with evidence of selective sweeps or balancing selection consistent with their role in pollen competition. We also test for evidence of positive selection on these genes more broadly across yellow monkeyflowers, because a signal of adaptive divergence is a common feature of genes causing reproductive isolation. Together the molecular evolution studies identify 159 pollen tube proteins that are candidate genes for conspecific pollen precedence. 
Our work demonstrates how powerful proteomic and genomic tools can be readily adapted to non-traditional model systems, allowing for genome-wide screens towards the goal of identifying the molecular basis of genetically complex traits.
Affiliation(s)
- Jan E. Aagaard
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Renee D. George
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Lila Fishman
- Division of Biological Sciences, University of Montana, Missoula, Montana, United States of America
- Michael J. MacCoss
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Willie J. Swanson
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
32
Gu Q, Yu LR. Proteomics quality and standard: from a regulatory perspective. J Proteomics 2013; 96:353-9. [PMID: 24316359 DOI: 10.1016/j.jprot.2013.11.024] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/15/2012] [Revised: 11/07/2013] [Accepted: 11/22/2013] [Indexed: 12/30/2022]
Abstract
Proteomics has emerged as a rapidly expanding field dealing with large-scale protein analyses. It is anticipated that proteomics data will be increasingly submitted to the U.S. Food and Drug Administration (FDA) for biomarker qualification or in conjunction with applications for the approval of drugs, medical devices, and other FDA-regulated consumer products. To date, however, no established guideline has been available regarding the generation, submission and assessment of the quality of proteomics data that will be reviewed by regulatory agencies for decision making. Therefore, this commentary is aimed at provoking some thoughts and debates towards developing a framework which can guide future proteomics data submission. The ultimate goal is to establish quality control standards for proteomics data generation and evaluation, and to prepare government agencies such as the FDA to meet future obligations utilizing proteomics data to support regulatory decision.
Affiliation(s)
- Qiang Gu
- Division of Systems Biology, National Center for Toxicological Research, Food and Drug Administration, USA
- Li-Rong Yu
- Division of Systems Biology, National Center for Toxicological Research, Food and Drug Administration, USA.
33
Abstract
BACKGROUND The identification of proteins based on analysis of tandem mass spectrometry (MS/MS) data is a valuable tool that is not fully realized because of the difficulty in carrying out automated analysis of large numbers of spectra. MS/MS spectra consist of peaks that represent each peptide fragment, usually b and y ions, with experimentally determined mass to charge ratios. Whether the strategy employed is database matching or De Novo sequencing, a major obstacle is distinguishing signal from noise. Improved ability to distinguish signal peaks of low intensity from background noise increases the likelihood of correctly identifying the peptide, as valuable information is preserved while extraneous information is not left to mislead. RESULTS This paper introduces an automated noise filtering method based on the construction of orthogonal polynomials. By subdividing the spectrum into a variable number (3 to 11) of bins, peaks that are considered "noise" are identified at a local level. Using a De Novo sequencing algorithm that we are developing, this filtering method was applied to a published dataset of more than 3000 mass spectra and an original dataset of more than 300 spectra. The samples were peptides from purified known proteins; therefore, the solutions could be compared to the correct sequences and the peaks corresponding to b, y and other fragments of significance could be identified. The same procedure was applied using two other published filtering methods. The ratios of the number of significant peaks that were preserved relative to the total number of peaks in each spectrum were determined. In the event that filtering out too many or too few signal peaks can lead to inaccuracy in sequence determination, the percentage of amino acid residues in the correct positions relative to the total number of amino acid residues in the correct sequence was also calculated for each sequence determined. 
CONCLUSIONS The results show that an orthogonal polynomial-based method of distinguishing signal peaks from background in mass spectra preserves a greater portion of signal peaks than compared methods, improving accuracy in sequence determination.
Affiliation(s)
- Jason Gallia
- SUNY Binghamton Computer Science Department, Binghamton, NY, USA
- Katelyn Lavrich
- SUNY Binghamton Biological Sciences Department, Binghamton, NY, USA
- Anna Tan-Wilson
- SUNY Binghamton Biological Sciences Department, Binghamton, NY, USA
- Patrick H Madden
- SUNY Binghamton Computer Science Department, Binghamton, NY, USA
34
Chalkley RJ, Bandeira N, Chambers MC, Clauser KR, Cottrell JS, Deutsch EW, Kapp EA, Lam HHN, McDonald WH, Neubert TA, Sun RX. Proteome informatics research group (iPRG)_2012: a study on detecting modified peptides in a complex mixture. Mol Cell Proteomics 2013; 13:360-71. [PMID: 24187338 DOI: 10.1074/mcp.m113.032813] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/02/2023]
Abstract
The proteome informatics research group of the Association of Biomolecular Resource Facilities conducted a study to assess the community's ability to detect and characterize peptides bearing a range of biologically occurring post-translational modifications when present in a complex peptide background. A data set derived from a mixture of synthetic peptides with biologically occurring modifications combined with a yeast whole cell lysate as background was distributed to a large group of researchers and their results were collectively analyzed. The results from the twenty-four participants, who represented a broad spectrum of experience levels with this type of data analysis, produced several important observations. First, there is significantly more variability in the ability to assess whether a result is significant than there is to determine the correct answer. Second, labile post-translational modifications, particularly tyrosine sulfation, present a challenge for most researchers. Finally, for modification site localization there are many tools being employed, but researchers are currently unsure of the reliability of the results these programs are producing.
35
Omasits U, Quebatte M, Stekhoven DJ, Fortes C, Roschitzki B, Robinson MD, Dehio C, Ahrens CH. Directed shotgun proteomics guided by saturated RNA-seq identifies a complete expressed prokaryotic proteome. Genome Res 2013; 23:1916-27. [PMID: 23878158 PMCID: PMC3814891 DOI: 10.1101/gr.151035.112] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Indexed: 01/08/2023]
Abstract
Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low abundant, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ∼90% of the expressed protein-coding genes. Genes that were detected at the transcript but not protein level, were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor.
Affiliation(s)
- Ulrich Omasits
- Quantitative Model Organism Proteomics, Institute of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
36
Dong NP, Liang YZ, Yi LZ, Lu HM. Investigation of scrambled ions in tandem mass spectra, part 2. On the influence of the ions on peptide identification. J Am Soc Mass Spectrom 2013; 24:857-867. [PMID: 23504644 DOI: 10.1007/s13361-013-0591-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2012] [Revised: 01/19/2013] [Accepted: 01/20/2013] [Indexed: 06/01/2023]
Abstract
A comprehensive investigation was performed to understand the influence of sequence scrambling in peptide ions on peptide identification results. To achieve this, four tandem mass spectrometry datasets with scrambled ions included and with them excluded were analyzed by Crux, X!Tandem, SpectraST, Lutefisk, and PepNovo. While the different algorithms differed in their performance, an increase in the number of correctly identified peptides was generally observed when removing scrambled ions, with the exception of the SpectraST algorithm. However, the variation of the match scores upon removal was unpredictable. Following these investigations, an interpretation was given on how the scrambled ions affect peptide identification. Lastly, a simulated theoretical mass spectral library derived from the NIST peptide Libraries was constructed and searched by SpectraST to study whether scrambled ions in predicted mass spectra could affect peptide identification. Consistent with the peptide library search results, no significant variations for dot product scores as well as peptide identification results were observed when these ions were included in the theoretical MS/MS spectra. From the five adopted algorithms, the SpectraST and Crux provided the most robust results, whereas X!Tandem, PepNovo, and Lutefisk were sensitive to the existence of the scrambled ions, especially the latter two de novo sequencing algorithms.
Affiliation(s)
- Nai-ping Dong
- College of Chemistry and Chemical Engineering, Central South University, Changsha, People's Republic of China
37
Kalli A, Smith GT, Sweredoski MJ, Hess S. Evaluation and optimization of mass spectrometric settings during data-dependent acquisition mode: focus on LTQ-Orbitrap mass analyzers. J Proteome Res 2013; 12:3071-86. [PMID: 23642296 DOI: 10.1021/pr3011588] [Citation(s) in RCA: 123] [Impact Index Per Article: 10.3] [Indexed: 12/13/2022]
Abstract
Mass-spectrometry-based proteomics has evolved as the preferred method for the analysis of complex proteomes. Undoubtedly, recent advances in mass spectrometry instrumentation have greatly enhanced proteomic analysis. A popular instrument platform in proteomics research is the LTQ-Orbitrap mass analyzer. In this tutorial, we discuss the significance of evaluating and optimizing mass spectrometric settings on the LTQ-Orbitrap during CID data-dependent acquisition (DDA) mode to improve protein and peptide identification rates. We focus on those MS and MS/MS parameters that have been systematically examined and evaluated by several researchers and are commonly used during DDA. More specifically, we discuss the effect of mass resolving power, preview mode for FTMS scan, monoisotopic precursor selection, signal threshold for triggering MS/MS events, number of microscans per MS/MS scan, number of MS/MS events, automatic gain control target value (ion population) for MS and MS/MS, maximum ion injection time for MS/MS, rapid and normal scan rate, and prediction of ion injection time. We furthermore present data from the latest generation LTQ-Orbitrap system, the Orbitrap Elite, along with recommended MS and MS/MS parameters. The Orbitrap Elite outperforms the Orbitrap Classic in terms of scan speed, sensitivity, dynamic range, and resolving power and results in higher identification rates. Several of the optimized MS parameters determined on the LTQ-Orbitrap Classic and XL were easily transferable to the Orbitrap Elite, whereas others needed to be reevaluated. Finally, the Q Exactive and HCD are briefly discussed, as well as sample preparation, LC-optimization, and bioinformatics analysis. We hope this tutorial will serve as guidance for researchers new to the field of proteomics and assist in achieving optimal results.
Affiliation(s)
- Anastasia Kalli
- Proteome Exploration Laboratory, Division of Biology, Beckman Institute, California Institute of Technology, Pasadena, California 91125, USA
38
Madsen JA, Xu H, Robinson MR, Horton AP, Shaw JB, Giles DK, Kaoud TS, Dalby KN, Trent MS, Brodbelt JS. High-throughput database search and large-scale negative polarity liquid chromatography-tandem mass spectrometry with ultraviolet photodissociation for complex proteomic samples. Mol Cell Proteomics 2013; 12:2604-14. [PMID: 23695934 DOI: 10.1074/mcp.o113.028258] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 11/06/2022]
Abstract
The use of ultraviolet photodissociation (UVPD) for the activation and dissociation of peptide anions is evaluated for broader coverage of the proteome. To facilitate interpretation and assignment of the resulting UVPD mass spectra of peptide anions, the MassMatrix database search algorithm was modified to allow automated analysis of negative polarity MS/MS spectra. The new UVPD algorithms were developed based on the MassMatrix database search engine by adding specific fragmentation pathways for UVPD. The new UVPD fragmentation pathways in MassMatrix were rigorously and statistically optimized using two large data sets with high mass accuracy and high mass resolution for both MS(1) and MS(2) data acquired on an Orbitrap mass spectrometer for complex Halobacterium and HeLa proteome samples. Negative mode UVPD led to the identification of 3663 and 2350 peptides for the Halo and HeLa tryptic digests, respectively, corresponding to 655 and 645 peptides that were unique when compared with electron transfer dissociation (ETD), higher energy collision-induced dissociation, and collision-induced dissociation results for the same digests analyzed in the positive mode. In sum, 805 and 619 proteins were identified via UVPD for the Halobacterium and HeLa samples, respectively, with 49 and 50 unique proteins identified in contrast to the more conventional MS/MS methods. The algorithm also features automated charge determination for low mass accuracy data, precursor filtering (including intact charge-reduced peaks), and the ability to combine both positive and negative MS/MS spectra into a single search, and it is freely open to the public. The accuracy and specificity of the MassMatrix UVPD search algorithm was also assessed for low resolution, low mass accuracy data on a linear ion trap. 
Analysis of a known mixture of three mitogen-activated kinases yielded similar sequence coverage percentages for UVPD of peptide anions versus conventional collision-induced dissociation of peptide cations, and when these methods were combined into a single search, an increase of up to 13% sequence coverage was observed for the kinases. The ability to sequence peptide anions and cations in alternating scans in the same chromatographic run was also demonstrated. Because ETD has a significant bias toward identifying highly basic peptides, negative UVPD was used to improve the identification of the more acidic peptides in conjunction with positive ETD for the more basic species. In this case, tryptic peptides from the cytosolic section of HeLa cells were analyzed by polarity switching nanoLC-MS/MS utilizing ETD for cation sequencing and UVPD for anion sequencing. Relative to searching using ETD alone, positive/negative polarity switching significantly improved sequence coverages across identified proteins, resulting in a 33% increase in unique peptide identifications and more than twice the number of peptide spectral matches.
Affiliation(s)
- James A Madsen
- Department of Chemistry and Biochemistry, The University of Texas at Austin, 1 University Station A5300, Austin, Texas 78712, USA
39
Wang P, Wilson SR. Mass spectrometry-based protein identification by integrating de novo sequencing with database searching. BMC Bioinformatics 2013; 14 Suppl 2:S24. [PMID: 23369017 PMCID: PMC3549845 DOI: 10.1186/1471-2105-14-s2-s24] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022]
Abstract
BACKGROUND Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach firstly infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then puts these sequences into a database search to see if a close peptide match can be found. However the current implementation of this integrative approach has several limitations. Firstly, simplistic de novo sequencing is applied and only very short sequence tags are used. Secondly, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Thirdly, by applying these methods the integrated de novo sequencing makes a limited contribution to the scoring model which is still largely based on database searching. RESULTS We have developed a new integrative protein identification method which can integrate de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.
Affiliation(s)
- Penghao Wang
- Prince of Wales Clinical School, University of New South Wales, Australia.
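The abstract above describes looking up de novo sequence tags in a database while tolerating sequence errors. A minimal sketch of that idea follows, assuming substitution-only errors and a plain list of peptide strings; the function names, the example peptides, and the mismatch limit are hypothetical illustrations, not the authors' method or scoring model.

```python
def hamming_within(a: str, b: str, max_mismatches: int) -> bool:
    """True if equal-length strings differ in at most max_mismatches positions."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) <= max_mismatches


def candidates_for_tag(tag: str, peptides: list[str], max_mismatches: int = 1) -> list[str]:
    """Return peptides that contain the tag, allowing a few residue substitutions."""
    hits = []
    for pep in peptides:
        # Slide a window of the tag's length along the peptide.
        windows = (pep[i:i + len(tag)] for i in range(len(pep) - len(tag) + 1))
        if any(hamming_within(tag, w, max_mismatches) for w in windows):
            hits.append(pep)
    return hits


# Hypothetical peptide database and a de novo tag with one wrong residue (S for T).
database = ["LVNELTEFAK", "HLVDEPQNLIK", "YLYEIAR"]
print(candidates_for_tag("NELSEF", database))
```

With exact matching (`max_mismatches=0`) the erroneous tag finds nothing, which is the limitation of exact-match tag searches that the abstract points out.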
|
40
|
Hoopmann MR, Moritz RL. Current algorithmic solutions for peptide-based proteomics data generation and identification. Curr Opin Biotechnol 2012; 24:31-8. [PMID: 23142544 DOI: 10.1016/j.copbio.2012.10.013] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 07/19/2012] [Revised: 10/08/2012] [Accepted: 10/18/2012] [Indexed: 12/28/2022]
Abstract
Peptide-based proteomic data sets are ever increasing in size and complexity. These data sets provide computational challenges when attempting to quickly analyze spectra and obtain correct protein identifications. Database search and de novo algorithms must consider high-resolution MS/MS spectra and alternative fragmentation methods. Protein inference is a tricky problem when analyzing large data sets of degenerate peptide identifications. Combining multiple algorithms for improved peptide identification puts significant strain on computational systems when investigating large data sets. This review highlights some of the recent developments in peptide and protein identification algorithms for analyzing shotgun mass spectrometry data when encountering the aforementioned hurdles. Also explored are the roles that analytical pipelines, public spectral libraries, and cloud computing play in the evolution of peptide-based proteomics.
|
41
|
Akhtar MN, Southey BR, Andrén PE, Sweedler JV, Rodriguez-Zas SL. Evaluation of database search programs for accurate detection of neuropeptides in tandem mass spectrometry experiments. J Proteome Res 2012; 11:6044-55. [PMID: 23082934 PMCID: PMC3516866 DOI: 10.1021/pr3007123] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 12/02/2022]
Abstract
Neuropeptide identification in mass spectrometry experiments using database search programs developed for proteins is challenging. Unlike proteins, the detection of the complete sequence using a single spectrum is required to identify neuropeptides or prohormone peptides. This study compared the performance of three open-source programs used to identify proteins, OMSSA, X!Tandem and Crux, to identify prohormone peptides. From a target database of 7850 prohormone peptides, 23550 query spectra were simulated across different scenarios. Crux was the only program that correctly matched all peptides regardless of p-value, and at p-value < 1 × 10⁻², 33%, 64%, and >75% of the 5, 6, and ≥7 amino acid-peptides were detected. Crux also had the best performance in the identification of peptides from chimera spectra and in a variety of missing ion scenarios. OMSSA, X!Tandem and Crux correctly detected 98.9% (99.9%), 93.9% (97.4%) and 88.7% (98.3%) of the peptides at E- or p-value < 1 × 10⁻⁶ (< 1 × 10⁻²), respectively. OMSSA and X!Tandem outperformed the other programs in significance level and computational speed, respectively. A consensus approach is not recommended because some prohormone peptides were only identified by one program.
Affiliation(s)
- Malik N Akhtar
- Department of Animal Sciences, University of Illinois Urbana-Champaign, Illinois 61801, United States
|
42
|
Abstract
Shotgun proteomics has recently emerged as a powerful approach to characterizing proteomes in biological samples. Its overall objective is to identify the form and quantity of each protein in a high-throughput manner by coupling liquid chromatography with tandem mass spectrometry. As a consequence of its high throughput nature, shotgun proteomics faces challenges with respect to the analysis and interpretation of experimental data. Among such challenges, the identification of proteins present in a sample has been recognized as an important computational task. This task generally consists of (1) assigning experimental tandem mass spectra to peptides derived from a protein database, and (2) mapping assigned peptides to proteins and quantifying the confidence of identified proteins. Protein identification is fundamentally a statistical inference problem with a number of methods proposed to address its challenges. In this review we categorize current approaches into rule-based, combinatorial optimization and probabilistic inference techniques, and present them using integer programming and Bayesian inference frameworks. We also discuss the main challenges of protein identification and propose potential solutions with the goal of spurring innovative research in this area.
Affiliation(s)
- Yong Fuga Li
- School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, Indiana 47405, USA
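Step (2) in the abstract above, mapping assigned peptides to proteins, is often approached as a parsimony problem: find a small set of proteins that explains every identified peptide. The sketch below uses a greedy set-cover heuristic as one illustration of the combinatorial optimization category. It is an assumption chosen for brevity, not the integer programming or Bayesian formulation presented in the review, and the protein and peptide labels are hypothetical.

```python
def greedy_protein_inference(protein_to_peptides: dict[str, set[str]]) -> list[str]:
    """Greedily pick proteins until every identified peptide is explained."""
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Pick the protein explaining the most still-unexplained peptides;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical mapping with degenerate (shared) peptides b, c and d.
mapping = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"d"},
    "P4": {"c", "d"},
}
print(greedy_protein_inference(mapping))
```

The greedy choice is only an approximation of the minimal set. It also assigns no confidence to the reported proteins, which is where the probabilistic approaches mentioned in the abstract differ.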
|
43
|
Cerqueira FR, Ferreira RS, Oliveira AP, Gomes AP, Ramos HJO, Graber A, Baumgartner C. MUMAL: multivariate analysis in shotgun proteomics using machine learning techniques. BMC Genomics 2012; 13 Suppl 5:S4. [PMID: 23095859 PMCID: PMC3477001 DOI: 10.1186/1471-2164-13-s5-s4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
Background The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied for identification of proteins in complex mixtures. This method gives rise to thousands of spectra in a single run, which are interpreted by computational tools. Such tools normally use a protein database from which peptide sequences are extracted for matching with experimentally derived mass spectral data. After the database search, the correctness of obtained peptide-spectrum matches (PSMs) needs to be evaluated also by algorithms, as a manual curation of these huge datasets would be impractical. The target-decoy database strategy is largely used to perform spectrum evaluation. Nonetheless, this method has been applied without considering sensitivity, i.e., only error estimation is taken into account. A recently proposed method termed MUDE treats the target-decoy analysis as an optimization problem, where sensitivity is maximized. This method demonstrates a significant increase in the retrieved number of PSMs for a fixed error rate. However, the MUDE model is constructed in such a way that linear decision boundaries are established to separate correct from incorrect PSMs. Besides, the described heuristic for solving the optimization problem has to be executed many times to achieve a significant augmentation in sensitivity. Results Here, we propose a new method, termed MUMAL, for PSM assessment that is based on machine learning techniques. Our method can establish nonlinear decision boundaries, leading to a higher chance to retrieve more true positives. Furthermore, we need few iterations to achieve high sensitivities, strikingly shortening the running time of the whole process. Experiments show that our method achieves a considerably higher number of PSMs compared with standard tools such as MUDE, PeptideProphet, and typical target-decoy approaches. 
Conclusion Our approach not only enhances the computational performance, and thus the turn around time of MS-based experiments in proteomics, but also improves the information content with benefits of a higher proteome coverage. This improvement, for instance, increases the chance to identify important drug targets or biomarkers for drug development or molecular diagnostics.
Affiliation(s)
- Fabio R Cerqueira
- Department of Informatics, Federal University of Viçosa, 36570-000 Minas Gerais, Brazil.
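The target-decoy strategy mentioned in the abstract above can be illustrated with a short sketch. It assumes a single score per PSM and the simple estimator FDR ≈ decoys / targets above a threshold. The data and function names are hypothetical, and this is the baseline threshold approach rather than the MUDE or MUMAL models.

```python
def fdr_at_threshold(psms: list[tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as decoy hits / target hits among PSMs scoring >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def accepted_targets(psms: list[tuple[float, bool]], max_fdr: float) -> int:
    """Largest number of target PSMs retrievable while keeping estimated FDR <= max_fdr."""
    best = 0
    for threshold in sorted({score for score, _ in psms}):
        targets = sum(1 for s, d in psms if s >= threshold and not d)
        if targets and fdr_at_threshold(psms, threshold) <= max_fdr:
            best = max(best, targets)
    return best


# Hypothetical (score, is_decoy) pairs from a concatenated target-decoy search.
psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, False), (6.1, True), (5.9, False),
]
print(accepted_targets(psms, max_fdr=0.2))
```

Scanning every threshold and keeping the one with the most accepted targets is a crude version of treating the analysis as a sensitivity maximization at a fixed error rate. A single score gives only a one-dimensional cut, whereas the abstract's point is that multivariate, nonlinear boundaries can recover more PSMs.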
|
44
|
Armean IM, Lilley KS, Trotter MWB. Popular computational methods to assess multiprotein complexes derived from label-free affinity purification and mass spectrometry (AP-MS) experiments. Mol Cell Proteomics 2012; 12:1-13. [PMID: 23071097 DOI: 10.1074/mcp.r112.019554] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 11/06/2022]
Abstract
Advances in sensitivity, resolution, mass accuracy, and throughput have considerably increased the number of protein identifications made via mass spectrometry. Despite these advances, state-of-the-art experimental methods for the study of protein-protein interactions yield more candidate interactions than may be expected biologically owing to biases and limitations in the experimental methodology. In silico methods, which distinguish between true and false interactions, have been developed and applied successfully to reduce the number of false positive results yielded by physical interaction assays. Such methods may be grouped according to: (1) the type of data used: methods based on experiment-specific measurements (e.g., spectral counts or identification scores) versus methods that extract knowledge encoded in external annotations (e.g., public interaction and functional categorisation databases); (2) the type of algorithm applied: the statistical description and estimation of physical protein properties versus predictive supervised machine learning or text-mining algorithms; (3) the type of protein relation evaluated: direct (binary) interaction of two proteins in a cocomplex versus probability of any functional relationship between two proteins (e.g., co-occurrence in a pathway, subcellular compartment); and (4) initial motivation: elucidation of experimental data by evaluation versus prediction of novel protein-protein interaction, to be experimentally validated a posteriori. This work reviews several popular computational scoring methods and software platforms for protein-protein interactions evaluation according to their methodology, comparative strengths and weaknesses, data representation, accessibility, and availability. The scoring methods and platforms described include: CompPASS, SAINT, Decontaminator, MINT, IntAct, STRING, and FunCoup. References to related work are provided throughout in order to provide a concise but thorough introduction to a rapidly growing interdisciplinary field of investigation.
Affiliation(s)
- Irina M Armean
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, UK
|
45
|
Han NY, Kim EH, Choi J, Lee H, Hahm KB. Quantitative proteomic approaches in biomarker discovery of inflammatory bowel disease. J Dig Dis 2012; 13:497-503. [PMID: 22988922 DOI: 10.1111/j.1751-2980.2012.00625.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/11/2022]
Abstract
Proteomics offers considerable opportunities for either enhancing our biological understanding or discovering biomarkers. Blood and biopsied specimen-based proteomic approaches provide reproducible and quantitative tools that can complement clinical assessments and aid clinicians in the diagnosis and treatment of inflammatory bowel disease (IBD). Sometimes a differential diagnosis of Crohn's disease (CD) and ulcerative colitis (UC) and the prediction of treatment response can be deduced by finding meaningful biomarkers, for which the central platform for proteomics is tandem mass spectrometry (MS/MS). A range of workflows is available for protein (or peptide) separation prior to MS/MS as well as bioinformatics analysis to achieve protein identification, for which two-dimensional electrophoresis (2-DE) and subsequent mass spectrometry (MS), liquid chromatography-MS, difference gel electrophoresis following 2-DE, isobaric tags for relative and absolute quantification (iTRAQ), stable isotope labeling by amino acids, and label-free quantification are under development. In this article, the current status and perspective of these advanced proteomic technologies are introduced, with examples of recent biomarkers focused on the diagnosis, treatment response, and prognosis of IBD, and even colitis-associated carcinogenesis in both animal models and human patients.
Affiliation(s)
- Na-Young Han
- Lee Gil Ya Cancer and Diabetes Institute, Gachon University, Incheon, Korea
|
46
|
Yang P, Ma J, Wang P, Zhu Y, Zhou BB, Yang YHJ. Improving X!Tandem on peptide identification from mass spectrometry by self-boosted Percolator. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1273-1280. [PMID: 22689082 DOI: 10.1109/tcbb.2012.86] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 06/01/2023]
Abstract
A critical component in mass spectrometry (MS)-based proteomics is an accurate protein identification procedure. Database search algorithms commonly generate a list of peptide-spectrum matches (PSMs). The validity of these PSMs is critical for downstream analysis since proteins that are present in the sample are inferred from those PSMs. A variety of postprocessing algorithms have been proposed to validate and filter PSMs. Among them, the most popular ones include a semi-supervised learning (SSL) approach known as Percolator and an empirical modeling approach known as PeptideProphet. However, they are predominantly designed for commercial database search algorithms, i.e., SEQUEST and MASCOT. Therefore, it is highly desirable to extend and optimize those PSM postprocessing algorithms for open source database search algorithms such as X!Tandem. In this paper, we propose a Self-boosted Percolator for postprocessing X!Tandem search results. We find that the SSL algorithm utilized by Percolator depends heavily on the initial ranking of PSMs. Starting with a poor PSM ranking list may cause Percolator to perform suboptimally. By implementing Percolator in a cascade learning manner, we can progressively improve the performance through multiple boost runs, enabling many more PSM identifications without sacrificing false discovery rate (FDR).
Affiliation(s)
- Pengyi Yang
- School of Information Technologies, University of Sydney, NSW 2006, Australia.
|
47
|
Fang X, Wang C, Balgley BM, Zhao K, Wang W, He F, Weil RJ, Lee CS. Targeted tissue proteomic analysis of human astrocytomas. J Proteome Res 2012; 11:3937-46. [PMID: 22794670 DOI: 10.1021/pr300303t] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Abstract
Complicating proteomic analysis of whole tissues is the obvious problem of cell heterogeneity in tissues, which often results in misleading or confusing molecular findings. Thus, the coupling of tissue microdissection for tumor cell enrichment with capillary isotachophoresis-based selective analyte concentration not only serves as a synergistic strategy to characterize low abundance proteins, but it can also be employed to conduct comparative proteomic studies of human astrocytomas. A set of fresh frozen brain biopsies were selectively microdissected to provide an enriched, high quality, and reproducible sample of tumor cells. Despite sharing many common proteins, there are significant differences in the protein expression level among different grades of astrocytomas. A large number of proteins, such as plasma membrane proteins EGFR and Erbb2, are up-regulated in glioblastoma. Besides facilitating the prioritization of follow-on biomarker selection and validation, comparative proteomics involving measurements in changes of pathways are expected to reveal the molecular relationships among different pathological grades of gliomas and potential molecular mechanisms that drive gliomagenesis.
Affiliation(s)
- Xueping Fang
- Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, United States
|
48
|
Degroeve S, Staes A, De Bock PJ, Martens L. The effect of peptide identification search algorithms on MS2-based label-free protein quantification. OMICS 2012; 16:443-8. [PMID: 22804230 DOI: 10.1089/omi.2011.0137] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
Several approaches exist for the quantification of proteins in complex samples processed by liquid chromatography-mass spectrometry followed by fragmentation analysis (MS2). One of these approaches is label-free MS2-based quantification, which takes advantage of the information computed from MS2 spectrum observations to estimate the abundance of a protein in a sample. As a first step in this approach, fragmentation spectra are typically matched to the peptides that generated them by a search algorithm. Because different search algorithms identify overlapping but non-identical sets of peptides, here we investigate whether these differences in peptide identification have an impact on the quantification of the proteins in the sample. We therefore evaluated the effect of using different search algorithms by examining the reproducibility of protein quantification in technical repeat measurements of the same sample. From our results, it is clear that a search engine effect does exist for MS2-based label-free protein quantification methods. As a general conclusion, it is recommended to address the overall possibility of search engine-induced bias in the protein quantification results of label-free MS2-based methods by performing the analysis with two or more distinct search engines.
Affiliation(s)
- Sven Degroeve
- Department of Medical Protein Research, VIB, and Ghent University, Faculty of Medicine and Health Sciences, Ghent, Belgium
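The reproducibility check described in the abstract above can be sketched with spectral counting, one common MS2-based label-free measure. The abstract does not name the exact quantification index, so spectral counts and the coefficient of variation across technical repeats are assumptions here, and all data and names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def spectral_counts(psm_proteins: list[str]) -> Counter:
    """Count how many identified MS2 spectra were assigned to each protein."""
    return Counter(psm_proteins)


def coefficient_of_variation(counts: list[int]) -> float:
    """Population standard deviation divided by the mean (lower = more reproducible)."""
    m = mean(counts)
    return pstdev(counts) / m if m else float("nan")


# Hypothetical protein assignments of PSMs in three technical repeats,
# as reported by one search engine.
replicates = [
    ["P1", "P1", "P2", "P1"],
    ["P1", "P2", "P1", "P1"],
    ["P1", "P1", "P2", "P2"],
]
per_replicate = [spectral_counts(r) for r in replicates]
cv = {
    protein: coefficient_of_variation([c[protein] for c in per_replicate])
    for protein in ("P1", "P2")
}
print(cv)
```

Repeating the same calculation on the output of a second search engine and comparing the per-protein values would give a rough view of the search engine effect the abstract reports.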
|
49
|
Mancuso F, Bunkenborg J, Wierer M, Molina H. Data extraction from proteomics raw data: an evaluation of nine tandem MS tools using a large Orbitrap data set. J Proteomics 2012; 75:5293-303. [PMID: 22728601 DOI: 10.1016/j.jprot.2012.06.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 03/22/2012] [Revised: 06/07/2012] [Accepted: 06/12/2012] [Indexed: 10/28/2022]
Abstract
In shot-gun proteomics raw tandem MS data are processed with extraction tools to produce condensed peak lists that can be uploaded to database search engines. Many extraction tools are available but to our knowledge, a systematic comparison of such tools has not yet been carried out. Using raw data containing more than 400,000 tandem MS spectra acquired using an Orbitrap Velos we compared 9 tandem MS extraction tools, freely available as well as commercial. We compared the tools with respect to number of extracted MS/MS events, fragment ion information, number of matches, precursor mass accuracies and agreement in-between tools. Processing a primary data set with 9 different tandem MS extraction tools resulted in a low overlap of identified peptides. The tools differ by assigned charge states of precursors, precursor and fragment ion masses, and we show that peptides identified very confidently using one extraction tool might not be matched when using another tool. We also found a bias towards peptides of lower charge state when extracting fragment ion data from higher resolution raw data without deconvolution. Collecting and comparing the extracted data from the same raw data allow adjusting parameters and expectations and selecting the right tool for extraction of tandem MS data.
Affiliation(s)
- Francesco Mancuso
- Centro de Regulación Genòmica (CRG), C/Dr. Aiguader 88, 08003 Barcelona, Spain
|
50
|
Hawse WF, Champion MM, Joyce MV, Hellman LM, Hossain M, Ryan V, Pierce BG, Weng Z, Baker BM. Cutting edge: Evidence for a dynamically driven T cell signaling mechanism. J Immunol 2012; 188:5819-23. [PMID: 22611242 DOI: 10.4049/jimmunol.1200952] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Indexed: 12/27/2022]
Abstract
T cells use the αβ TCR to bind peptides presented by MHC proteins (pMHC) on APCs. Formation of a TCR-pMHC complex initiates T cell signaling via a poorly understood process, potentially involving changes in oligomeric state, altered interactions with CD3 subunits, and mechanical stress. These mechanisms could be facilitated by binding-induced changes in the TCR, but the nature and extent of any such alterations are unclear. Using hydrogen/deuterium exchange, we demonstrate that ligation globally rigidifies the TCR, which via entropic and packing effects will promote associations with neighboring proteins and enhance the stability of existing complexes. TCR regions implicated in lateral associations and signaling are particularly affected. Computational modeling demonstrated a high degree of dynamic coupling between the TCR constant and variable domains that is dampened upon ligation. These results raise the possibility that TCR triggering could involve a dynamically driven, allosteric mechanism.
Affiliation(s)
- William F Hawse
- Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
|