1
Bhushan V, Nita-Lazar A. Recent Advancements in Subcellular Proteomics: Growing Impact of Organellar Protein Niches on the Understanding of Cell Biology. J Proteome Res 2024. [PMID: 38451675; DOI: 10.1021/acs.jproteome.3c00839]
Abstract
The mammalian cell is a complex entity, with membrane-bound and membrane-less organelles playing vital roles in regulating cellular homeostasis. Organellar protein niches drive discrete biological processes and cell functions, thus maintaining cell equilibrium. Cellular processes such as signaling, growth, proliferation, motility, and programmed cell death require dynamic protein movements between cell compartments. Aberrant protein localization is associated with a wide range of diseases. Therefore, analyzing the subcellular proteome of the cell can provide a comprehensive overview of cellular biology. With recent advancements in mass spectrometry, imaging technology, computational tools, and deep machine learning algorithms, studies pertaining to subcellular protein localization and their dynamic distributions are gaining momentum. These studies reveal changing interaction networks because of "moonlighting proteins" and serve as a discovery tool for disease network mechanisms. Consequently, this review aims to provide a comprehensive repository for recent advancements in subcellular proteomics subcontexting methods, challenges, and future perspectives for method developers. In summary, subcellular proteomics is crucial to the understanding of the fundamental cellular mechanisms and the associated diseases.
Affiliation(s)
- Vanya Bhushan
- Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
- Aleksandra Nita-Lazar
- Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
2
Hasan MK, Scott NE, Hays MP, Hardwidge PR, El Qaidi S. Salmonella T3SS effector SseK1 arginine-glycosylates the two-component response regulator OmpR to alter bile salt resistance. Sci Rep 2023; 13:9018. [PMID: 37270573; DOI: 10.1038/s41598-023-36057-9] [Received: 03/02/2023; Accepted: 05/28/2023]
Abstract
Type III secretion system (T3SS) effector proteins are primarily recognized for binding host proteins to subvert host immune response during infection. Besides their known host target proteins, several T3SS effectors also interact with endogenous bacterial proteins. Here we demonstrate that the Salmonella T3SS effector glycosyltransferase SseK1 glycosylates the bacterial two-component response regulator OmpR on two arginine residues, R15 and R122. Arg-glycosylation of OmpR results in reduced expression of ompF, a major outer membrane porin gene. Glycosylated OmpR has reduced affinity to the ompF promoter region, as compared to the unglycosylated form of OmpR. Additionally, the Salmonella ΔsseK1 mutant strain had higher bile salt resistance and increased capacity to form biofilms, as compared to WT Salmonella, thus linking OmpR glycosylation to several important aspects of bacterial physiology.
Affiliation(s)
- Md Kamrul Hasan
- College of Veterinary Medicine, Kansas State University, Manhattan, KS, 66506, USA
- Nichollas E Scott
- Department of Microbiology and Immunology, University of Melbourne within the Peter Doherty Institute for Infection and Immunity, Melbourne, 3000, Australia
- Michael P Hays
- College of Veterinary Medicine, Kansas State University, Manhattan, KS, 66506, USA
- Samir El Qaidi
- College of Veterinary Medicine, Kansas State University, Manhattan, KS, 66506, USA
3
El Qaidi S, Scott NE, Hays MP, Hardwidge PR. Arginine glycosylation regulates UDP-GlcNAc biosynthesis in Salmonella enterica. Sci Rep 2022; 12:5293. [PMID: 35351940; PMCID: PMC8964723; DOI: 10.1038/s41598-022-09276-9] [Received: 01/18/2022; Accepted: 03/21/2022]
Abstract
The Salmonella enterica SseK1 protein is a type three secretion system effector that glycosylates host proteins during infection on specific arginine residues with N-acetyl glucosamine (GlcNAc). SseK1 also Arg-glycosylates endogenous bacterial proteins, and we thus hypothesized that SseK1 activities might be integrated with regulating the intrabacterial abundance of UDP-GlcNAc, the sugar-nucleotide donor used by this effector. After searching for new SseK1 substrates, we found that SseK1 glycosylates arginine residues in the dual repressor-activator protein NagC, leading to increased DNA-binding affinity and enhanced expression of the NagC-regulated genes glmU and glmS. SseK1 also glycosylates arginine residues in GlmR, a protein that enhances GlmS activity. This Arg-glycosylation improves the ability of GlmR to enhance GlmS activity. We also discovered that NagC is a direct activator of glmR expression. Salmonella lacking SseK1 produce significantly reduced amounts of UDP-GlcNAc as compared with Salmonella expressing SseK1. Overall, we conclude that SseK1 up-regulates UDP-GlcNAc synthesis both by enhancing the DNA-binding activity of NagC and by increasing GlmS activity through GlmR glycosylation. Such regulatory activities may have evolved to maintain sufficient levels of UDP-GlcNAc for both bacterial cell wall precursors and for SseK1 to modify other bacterial and host targets in response to environmental changes and during infection.
Affiliation(s)
- Samir El Qaidi
- College of Veterinary Medicine, Kansas State University, Manhattan, KS, 66506, USA
- Nichollas E Scott
- Department of Microbiology and Immunology, University of Melbourne within the Peter Doherty Institute for Infection and Immunity, Melbourne, 3000, Australia
- Michael P Hays
- College of Veterinary Medicine, Kansas State University, Manhattan, KS, 66506, USA
- Philip R Hardwidge
- College of Veterinary Medicine, Kansas State University, Manhattan, KS, 66506, USA
4
Abstract
Recently, there has been growing interest in genome sequencing, driven by advances in sequencing technology, in terms of both efficiency and affordability. These developments have allowed many to envision whole-genome sequencing as an invaluable tool for both personalized medical care and public health. As a result, increasingly large and ubiquitous genomic data sets are being generated. This poses a significant challenge for the storage and transmission of these data. Already, it is more expensive to store genomic data for a decade than it is to obtain the data in the first place. This situation calls for efficient representations of genomic information. In this review, we emphasize the need for designing specialized compressors tailored to genomic data and describe the main solutions already proposed. We also give general guidelines for storing these data and conclude with our thoughts on the future of genomic formats and compressors.
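As a minimal illustration of why a representation tailored to genomic data helps (a generic textbook example, not a method taken from the review), the Python sketch below packs an A/C/G/T sequence into 2 bits per base instead of 8-bit ASCII, a fourfold reduction before any entropy coding is applied. The names `pack` and `unpack` are hypothetical, and ambiguity codes such as N are deliberately not handled.

```python
# Assumed 2-bit code per nucleotide; N and other IUPAC ambiguity codes
# are not supported in this sketch.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes holding 4 bases each."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)  # base j occupies bits 2j..2j+1
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the sequence; `length` is required because the last byte
    may be only partially filled."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BASES[(byte >> (2 * j)) & 0b11])
    return "".join(bases[:length])


if __name__ == "__main__":
    seq = "ACGTTGCAAC"
    packed = pack(seq)
    print(len(seq), "bytes as ASCII ->", len(packed), "bytes packed")  # 10 bytes as ASCII -> 3 bytes packed
    assert unpack(packed, len(seq)) == seq
```

Storing the sequence length separately is the design cost of fixed-width packing; real genomic compressors go further by modelling the statistics of the sequence.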
Affiliation(s)
- Mikel Hernaez
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Dmitri Pavlichin
- Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
- Tsachy Weissman
- Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
- Idoia Ochoa
- Department of Electrical and Computer Engineering, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
5
Yang R, Chen X, Ochoa I. MassComp, a lossless compressor for mass spectrometry data. BMC Bioinformatics 2019; 20:368. [PMID: 31262247; PMCID: PMC6604446; DOI: 10.1186/s12859-019-2962-7] [Received: 02/06/2019; Accepted: 06/20/2019]
Abstract
BACKGROUND Mass Spectrometry (MS) is a widely used technique in biology research, and has become key in proteomics and metabolomics analyses. As a result, the amount of MS data has significantly increased in recent years. For example, the MS repository MassIVE contains more than 123TB of data. Surprisingly, these data are stored uncompressed, hence incurring a significant storage cost. Efficient representation of these data is therefore paramount to lessen the burden of storage and facilitate their dissemination. RESULTS We present MassComp, a lossless compressor optimized for the numerical (m/z)-intensity pairs that account for most of the MS data. We tested MassComp on several MS datasets and show that it delivers on average a 46% reduction in the size of the numerical data, and up to 89%. These results correspond to an average improvement of more than 27% when compared to the general compressor gzip and of 40% when compared to the state-of-the-art numerical compressor FPC. When tested on entire files retrieved from the MassIVE repository, MassComp achieves on average a 59% size reduction. MassComp is written in C++ and freely available at https://github.com/iochoa/MassComp . CONCLUSIONS The compression performance of MassComp demonstrates its potential to significantly reduce the footprint of MS data, and shows the benefits of designing specialized compression algorithms tailored to MS data. MassComp is an addition to the family of omics compression algorithms designed to lessen the storage burden and facilitate the exchange and dissemination of omics data.
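The abstract does not describe MassComp's internals. As a hedged illustration of the general idea of transforming numeric MS arrays before generic compression, the sketch below XORs each float64 bit pattern with the previous one (a simplified relative of the predictor-plus-XOR scheme used by numerical compressors such as FPC) and then hands the result to `zlib`. Because it works on raw bit patterns, the transform is exactly reversible. The function names and the synthetic data are hypothetical; this is not MassComp's algorithm.

```python
import struct
import zlib


def xor_transform(values: list[float]) -> bytes:
    """Replace each float64 by the XOR of its bit pattern with the previous
    value's bit pattern. Sorted, closely spaced m/z values tend to share
    sign, exponent and leading mantissa bits, so the XOR tends to have
    many leading zero bits that a generic compressor can exploit."""
    prev = 0
    out = []
    for v in values:
        bits = struct.unpack("<Q", struct.pack("<d", v))[0]
        out.append(bits ^ prev)
        prev = bits
    return struct.pack(f"<{len(out)}Q", *out)


def inverse_xor_transform(data: bytes) -> list[float]:
    """Undo xor_transform bit for bit (the transform is lossless)."""
    n = len(data) // 8
    prev = 0
    values = []
    for x in struct.unpack(f"<{n}Q", data):
        bits = x ^ prev
        values.append(struct.unpack("<d", struct.pack("<Q", bits))[0])
        prev = bits
    return values


if __name__ == "__main__":
    # Synthetic, evenly spaced m/z values (hypothetical data, not from MassIVE).
    mz = [400.0 + 0.01 * i for i in range(1000)]
    raw = struct.pack(f"<{len(mz)}d", *mz)
    print("zlib on raw floats:      ", len(zlib.compress(raw)), "bytes")
    print("zlib after XOR transform:", len(zlib.compress(xor_transform(mz))), "bytes")
    assert inverse_xor_transform(xor_transform(mz)) == mz
```

How much the transform helps depends on the data; intensities are usually far less regular than m/z values, which is one reason specialized compressors model the two separately.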
Affiliation(s)
- Ruochen Yang
- Electrical Engineering Department, University of Southern California, Los Angeles, CA USA
- Xi Chen
- Electrical and Computer Engineering Department, University of Illinois at Urbana-Champaign, Urbana, IL USA
- Idoia Ochoa
- Electrical and Computer Engineering Department, University of Illinois at Urbana-Champaign, Urbana, IL USA
6
Marshall NC, Klein T, Thejoe M, von Krosigk N, Kizhakkedathu J, Finlay BB, Overall CM. Global Profiling of Proteolysis from the Mitochondrial Amino Terminome during Early Intrinsic Apoptosis Prior to Caspase-3 Activation. J Proteome Res 2018; 17:4279-4296. [PMID: 30371095; DOI: 10.1021/acs.jproteome.8b00675]
Abstract
The human genome encodes ∼20 mitochondrial proteases, yet we know little of how they sculpt the mitochondrial proteome, particularly during important mitochondrial events such as the initiation of apoptosis. To characterize global mitochondrial proteolysis we refined our technique, terminal amine isotopic labeling of substrates, for mitochondrial SILAC (MS-TAILS) to identify proteolysis across mitochondria and parent cells in parallel. Our MS-TAILS analyses identified 45% of the mitochondrial proteome and identified protein amino (N)-termini from 26% of mitochondrial proteins, the highest reported coverage of the human mitochondrial N-terminome. MS-TAILS revealed 97 previously unknown proteolytic sites. MS-TAILS also identified mitochondrial targeting sequence (MTS) removal by proteolysis during protein import, confirming 101 MTS sites and identifying 135 new MTS sites, revealing a wobbly requirement for the MTS cleavage motif. To examine the relatively unknown initial cleavage events occurring before the well-studied activation of caspase-3 in intrinsic apoptosis, we quantitatively compared N-terminomes of mitochondria and their parent cells before and after initiation of apoptosis at very early time points. By identifying altered levels of >400 N-termini, MS-TAILS analyses implicated specific mitochondrial pathways including protein import, fission, and iron homeostasis in apoptosis initiation. Notably, both staurosporine and Bax activator molecule-7 triggered in common 7 mitochondrial and 85 cellular cleavage events that are potentially part of an essential core of apoptosis-initiating events. All mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the dataset identifier PXD009054.
Affiliation(s)
- Natalie C Marshall
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Maichael Thejoe
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Niklas von Krosigk
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
- Jayachandran Kizhakkedathu
- Department of Pathology and Laboratory Medicine and Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z2, Canada
- B Brett Finlay
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, V6T 1Z4, Canada
7
Gustafsson OJR, Winderbaum LJ, Condina MR, Boughton BA, Hamilton BR, Undheim EAB, Becker M, Hoffmann P. Balancing sufficiency and impact in reporting standards for mass spectrometry imaging experiments. Gigascience 2018; 7:5074354. [PMID: 30124809; PMCID: PMC6203951; DOI: 10.1093/gigascience/giy102] [Received: 06/07/2018; Revised: 07/24/2018; Accepted: 08/07/2018]
Abstract
Reproducibility, or a lack thereof, is an increasingly important topic across many research fields. A key aspect of reproducibility is accurate reporting of both experiments and the resulting data. Herein, we propose a reporting guideline for mass spectrometry imaging (MSI). Previous standards have laid out guidelines sufficient to guarantee a certain quality of reporting; however, they set a high bar and as a consequence can be exhaustive and broad, thus limiting uptake. To help address this lack of uptake, we propose a reporting supplement, Minimum Information About a Mass Spectrometry Imaging Experiment (MIAMSIE), and its abbreviated reporting standard version, MSIcheck. MIAMSIE is intended to improve author-driven reporting. It is intentionally not exhaustive, but is rather designed for extensibility and could therefore eventually become analogous to existing standards that aim to guarantee reporting quality. Conversely, its abbreviated form MSIcheck is intended as a diagnostic tool focused on key aspects in MSI reporting. We discuss how existing standards influenced MIAMSIE/MSIcheck and how these new approaches could positively impact reporting quality, followed by test implementation of both standards to demonstrate their use. For MIAMSIE, we report on author reviews of four articles and a dataset. For MSIcheck, we show a snapshot review of a one-month subset of the MSI literature that indicated issues with data provision and the reporting of both data analysis steps and calibration settings for MS systems. Although our contribution is MSI specific, we believe the underlying approach could be considered as a general strategy for improving scientific reporting.
Affiliation(s)
- Ove J R Gustafsson
- ARC Centre of Excellence in Convergent Bio-Nano Science & Technology (CBNS), University of South Australia, Mawson Lakes, South Australia 5095, Australia
- Future Industries Institute, University of South Australia, Mawson Lakes, South Australia 5095, Australia
- Lyron J Winderbaum
- Future Industries Institute, University of South Australia, Mawson Lakes, South Australia 5095, Australia
- Mark R Condina
- Future Industries Institute, University of South Australia, Mawson Lakes, South Australia 5095, Australia
- Berin A Boughton
- Metabolomics Australia, School of BioSciences, University of Melbourne, Parkville, Victoria 3010, Australia
- Brett R Hamilton
- Centre for Microscopy and Microanalysis, University of Queensland, St. Lucia, Queensland 4072, Australia
- Centre for Advanced Imaging, University of Queensland, St. Lucia, Queensland 4072, Australia
- Eivind A B Undheim
- Centre for Advanced Imaging, University of Queensland, St. Lucia, Queensland 4072, Australia
- Michael Becker
- Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach a.d. Riss 88397, Germany
- Peter Hoffmann
- Future Industries Institute, University of South Australia, Mawson Lakes, South Australia 5095, Australia
8
A Golden Age for Working with Public Proteomics Data. Trends Biochem Sci 2017; 42:333-341. [PMID: 28118949; PMCID: PMC5414595; DOI: 10.1016/j.tibs.2017.01.001] [Received: 10/02/2016; Revised: 12/13/2016; Accepted: 01/02/2017]
Abstract
Data sharing in mass spectrometry (MS)-based proteomics is becoming common scientific practice, as it already is in other, more mature ‘omics’ disciplines such as genomics and transcriptomics. We want to highlight that this situation, unprecedented in the field, opens a plethora of opportunities for data scientists. First, we explain in some detail some of the work already achieved, such as systematic reanalysis efforts. We also explain existing applications of public proteomics data, such as proteogenomics and the creation of spectral libraries and spectral archives. Finally, we discuss the main existing challenges and mention the first attempts to combine public proteomics data with other types of omics data sets. The field of proteomics has matured and diversified substantially over the past 10 years. Proteomics data are increasingly shared through centralized, public repositories. Standardization efforts have ensured that a large proportion of these public data can be read and processed by any interested researcher. Because any proteomics data set is only partially understood, there is great opportunity for (orthogonal) reuse of public data. While public proteomics data have so far remained outside ethics and privacy discussions, recent work indicates that there is an inherent risk.
9
Ischenko D, Alexeev D, Shitikov E, Kanygina A, Malakhova M, Kostryukova E, Larin A, Kovalchuk S, Pobeguts O, Butenko I, Anikanov N, Altukhov I, Ilina E, Govorun V. Large scale analysis of amino acid substitutions in bacterial proteomics. BMC Bioinformatics 2016; 17:450. [PMID: 27821049; PMCID: PMC5100282; DOI: 10.1186/s12859-016-1301-5] [Received: 07/05/2016; Accepted: 10/21/2016]
Abstract
BACKGROUND Proteomics of bacterial pathogens is a developing field exploring microbial physiology, gene expression and the complex interactions between bacteria and their hosts. One complication of the proteomic approach is the micro- and macro-heterogeneity of bacterial species, which makes it impossible to build a comprehensive database of bacterial genomes for identification, whereas most existing algorithms rely heavily on genomic data. RESULTS Here we present a large-scale study of the identification of single amino acid polymorphisms (SAPs) between bacterial strains. An ad hoc method was developed based on MS/MS spectra comparison without the support of a genomic database. Whole-genome sequencing was used to validate the accuracy of polymorphism detection. Several approaches previously presented to the proteomics community as useful for polymorphism detection were tested on isolates of Helicobacter pylori, Neisseria gonorrhoeae and Escherichia coli. CONCLUSION The developed method is a promising approach in bacterial proteomics, allowing the identification of hundreds of peptides with novel SAPs from a single proteome.
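The method summarized above compares MS/MS spectra directly; one step that any database-free SAP search needs is deciding which single-residue substitutions could explain an observed peptide mass shift. A minimal Python sketch of that lookup only, assuming standard monoisotopic residue masses and a fixed absolute tolerance (both illustrative choices, not parameters from the paper); `candidate_substitutions` is a hypothetical helper:

```python
# Monoisotopic residue masses (Da) of the standard amino acids.
# Leucine and isoleucine are isobaric, so they share one entry, "L/I".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidate_substitutions(mass_shift: float, tol: float = 0.01) -> list[tuple[str, str]]:
    """Return (original, substituted) residue pairs whose mass difference
    (substituted minus original) matches mass_shift within +/- tol Da."""
    hits = []
    for orig, m1 in RESIDUE_MASS.items():
        for new, m2 in RESIDUE_MASS.items():
            if orig != new and abs((m2 - m1) - mass_shift) <= tol:
                hits.append((orig, new))
    return hits


if __name__ == "__main__":
    # A +14.016 Da shift (one extra CH2 group) fits several substitutions.
    print(candidate_substitutions(14.01565))
    # [('G', 'A'), ('S', 'T'), ('V', 'L/I'), ('N', 'Q'), ('D', 'E')]
```

Glutamine and lysine differ by only about 0.036 Da, so the tolerance must be tight enough to keep them apart, and Leu/Ile cannot be distinguished by mass at all; fragment-ion evidence from the MS/MS spectra is what localizes and disambiguates a substitution.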
Affiliation(s)
- Dmitry Ischenko
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
- Dmitry Alexeev
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
- Egor Shitikov
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Alexandra Kanygina
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
- Maja Malakhova
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Elena Kostryukova
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Andrey Larin
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Sergey Kovalchuk
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Olga Pobeguts
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Ivan Butenko
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Nikolay Anikanov
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Ilya Altukhov
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Moscow Institute of Physics and Technology, Institutskiy pereulok, 9, Dolgoprudny, 141700, Russian Federation
- Elena Ilina
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
- Vadim Govorun
- Research Institute of Physical Chemical Medicine, Malaya Pirogovskaya, 1a, Moscow, 119435, Russian Federation
10
Martens L. Public proteomics data: How the field has evolved from sceptical inquiry to the promise of in silico proteomics. EuPA Open Proteomics 2016; 11:42-44. [PMID: 29900110; PMCID: PMC5988554; DOI: 10.1016/j.euprot.2016.02.005] [Received: 12/12/2015; Revised: 02/13/2016; Accepted: 02/15/2016]
Abstract
Proteomics data sharing moved from validation to re-use. New tools and services make data very easily accessible. Metadata provision can still benefit from improvements. Quality control metrics will soon be reported along with submitted data. Data re-use will enable the advent of actual in silico proteomics.
Affiliation(s)
- Lennart Martens
- Department of Medical Protein Research, VIB, 9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, 9000 Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, 9000 Ghent, Belgium
11
Vihinen M, Hancock JM, Maglott DR, Landrum MJ, Schaafsma GCP, Taschner P. Human Variome Project Quality Assessment Criteria for Variation Databases. Hum Mutat 2016; 37:549-58. [PMID: 26919176; DOI: 10.1002/humu.22976] [Received: 11/16/2015; Revised: 01/25/2016; Accepted: 02/12/2016]
Abstract
Numerous databases containing information about DNA, RNA, and protein variations are available. Gene-specific variant databases (locus-specific variation databases, LSDBs) are typically curated and maintained for single genes or groups of genes for a certain disease(s). These databases are widely considered the most reliable information source for a particular gene/protein/disease, but it should also be made clear that they may have widely varying contents, infrastructure, and quality. Quality is very important to evaluate because these databases may affect health decision-making, research, and clinical practice. The Human Variome Project (HVP) established a Working Group for Variant Database Quality Assessment. The basic principle was to develop a simple system that nevertheless provides a good overview of the quality of a database. The HVP quality evaluation criteria that resulted are divided into four main components: data quality, technical quality, accessibility, and timeliness. This report elaborates on the developed quality criteria and how implementation of the quality scheme can be achieved. Examples are provided for the current status of the quality items in two different databases, BTKbase, an LSDB, and ClinVar, a central archive of submissions about variants and their clinical significance.
Affiliation(s)
- Mauno Vihinen
- Department of Experimental Medical Science, Lund University, BMC B13, SE-22184, Lund, Sweden
- John M Hancock
- The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- Donna R Maglott
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, 20892
- Melissa J Landrum
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, 20892
- Gerard C P Schaafsma
- Department of Experimental Medical Science, Lund University, BMC B13, SE-22184, Lund, Sweden
- Peter Taschner
- Generade Center of Expertise Genomics and University of Applied Sciences Leiden, Leiden, The Netherlands
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
12
Vizcaíno JA, Csordas A, del-Toro N, Dianes JA, Griss J, Lavidas I, Mayer G, Perez-Riverol Y, Reisinger F, Ternent T, Xu QW, Wang R, Hermjakob H. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res 2016; 44:D447-56. [PMID: 26527722; PMCID: PMC4702828; DOI: 10.1093/nar/gkv1145] [Received: 09/08/2015; Revised: 10/14/2015; Accepted: 10/16/2015]
Abstract
The PRoteomics IDEntifications (PRIDE) database is one of the world-leading data repositories of mass spectrometry (MS)-based proteomics data. Since the beginning of 2014, PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) has been the new PRIDE archival system, replacing the original PRIDE database. Here we summarize the developments in PRIDE resources and related tools since the previous update manuscript in the Database Issue in 2013. PRIDE Archive constitutes a complete redevelopment of the original PRIDE, comprising a new storage backend, data submission system and web interface, among other components. PRIDE Archive supports the most widely used PSI (Proteomics Standards Initiative) data standard formats (mzML and mzIdentML) and implements the data requirements and guidelines of the ProteomeXchange Consortium. The wide adoption of ProteomeXchange within the community has triggered an unprecedented increase in the number of submitted data sets (around 150 data sets per month). We outline some statistics on the current PRIDE Archive data contents. We also report on the status of the PRIDE-related stand-alone tools: PRIDE Inspector, PRIDE Converter 2 and the ProteomeXchange submission tool. Finally, we give a brief update on the resources under development, 'PRIDE Cluster' and 'PRIDE Proteomes', which provide a complementary view and quality-scored information of the peptide and protein identification data available in PRIDE Archive.
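Because PRIDE Archive accepts the PSI mzML format, a common first step when reusing deposited data is simply reading those files. As a hedged sketch using only the Python standard library, the function below streams an mzML document and counts its `<spectrum>` elements in the PSI mzML namespace. The helper name `count_spectra` and the toy document are illustrative assumptions; the toy omits many elements that a schema-valid mzML file requires, and dedicated mzML readers are the better choice for real analyses.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI used by PSI mzML documents.
MZML_NS = "{http://psi.hupo.org/ms/mzml}"


def count_spectra(source) -> int:
    """Stream through an mzML file (path or binary file object) and count
    <spectrum> elements, clearing each one after it is counted so that
    its peak-array data is not kept in memory."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == MZML_NS + "spectrum":
            count += 1
            elem.clear()
    return count


if __name__ == "__main__":
    # Toy, schema-incomplete document for illustration only.
    toy = b"""<?xml version="1.0" encoding="utf-8"?>
<mzML xmlns="http://psi.hupo.org/ms/mzml" version="1.1.0">
  <run id="toy_run">
    <spectrumList count="2">
      <spectrum index="0" id="scan=1" defaultArrayLength="0"/>
      <spectrum index="1" id="scan=2" defaultArrayLength="0"/>
    </spectrumList>
  </run>
</mzML>"""
    print(count_spectra(io.BytesIO(toy)))  # 2
```

Streaming with `iterparse` rather than loading the whole tree is the design choice that matters here, since mzML files with embedded binary peak arrays are often several gigabytes.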
Affiliation(s)
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Attila Csordas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Noemi del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- José A Dianes
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Johannes Griss
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria
- Ilias Lavidas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Gerhard Mayer
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Medizinisches Proteom Center (MPC), Ruhr-Universität Bochum, D-44801 Bochum, Germany
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Florian Reisinger
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Qing-Wei Xu
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Department of Computer Science and Technology, Hubei University of Education, Wuhan, China
- Rui Wang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- National Center for Protein Sciences, Beijing, China
13
Perez-Riverol Y, Alpi E, Wang R, Hermjakob H, Vizcaíno JA. Making proteomics data accessible and reusable: current state of proteomics databases and repositories. Proteomics 2015; 15:930-49. [PMID: 25158685; PMCID: PMC4409848; DOI: 10.1002/pmic.201400302] [Received: 07/01/2014; Revised: 08/06/2014; Accepted: 08/22/2014]
Abstract
Compared to other data-intensive disciplines such as genomics, public deposition and storage of MS-based proteomics data are still less developed due to, among other reasons, the inherent complexity of the data and the variety of data types and experimental workflows. In order to address this need, several public repositories for MS proteomics experiments have been developed, each with different purposes in mind. The most established resources are the Global Proteome Machine Database (GPMDB), PeptideAtlas, and the PRIDE database. Additionally, there are other useful (in many cases recently developed) resources such as ProteomicsDB, Mass Spectrometry Interactive Virtual Environment (MassIVE), Chorus, MaxQB, PeptideAtlas SRM Experiment Library (PASSEL), Model Organism Protein Expression Database (MOPED), and the Human Proteinpedia. In addition, the ProteomeXchange consortium has recently been developed to enable better integration of public repositories and the coordinated sharing of proteomics information, maximizing its benefit to the scientific community. Here, we review each of the major proteomics resources independently, along with some tools that enable the integration, mining and reuse of the data. We also discuss some of the major challenges and current pitfalls in the integration and sharing of the data.
Affiliation(s)
- Yasset Perez-Riverol
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

14
Szabo Z, Janaky T. Challenges and developments in protein identification using mass spectrometry. Trends Analyt Chem 2015. [DOI: 10.1016/j.trac.2015.03.007]

15
Bowden-Davies K, Connolly J, Burghardt P, Koch LG, Britton SL, Burniston JG. Label-free profiling of white adipose tissue of rats exhibiting high or low levels of intrinsic exercise capacity. Proteomics 2015; 15:2342-9. [PMID: 25758023 DOI: 10.1002/pmic.201400537] [Received: 11/16/2014] [Revised: 01/29/2015] [Accepted: 03/04/2015]
Abstract
Divergent selection has created rat phenotypes of high- and low-capacity runners (HCR and LCR, respectively) that differ in aerobic capacity and in correlated traits such as adiposity. We analyzed visceral adipose tissue of HCR and LCR using label-free high-definition MS (elevated energy) profiling. The running capacity of HCR was ninefold greater than that of LCR. Proteome profiling encompassed 448 proteins and detected 30 significant (p < 0.05; false discovery rate < 10%, calculated using q-values) differences. Approximately half of the proteins analyzed were of mitochondrial origin, but there were no significant differences in the abundance of proteins involved in aerobic metabolism. Instead, adipose tissue of LCR rats exhibited greater abundances of proteins associated with adipogenesis (e.g. cathepsin D), ER stress (e.g. 78 kDa glucose-regulated protein), and inflammation (e.g. Ig gamma-2B chain C region), whereas the abundance of antioxidant enzymes such as superoxide dismutase [Cu-Zn] was greater in HCR tissue. Putative adipokines were also detected; in particular, protein S100-B was 431% more abundant in LCR adipose tissue. These findings reveal that low running capacity is associated with a pathological visceral adipose tissue proteome profile despite no detectable differences in mitochondrial protein abundance.
Affiliation(s)
- Kelly Bowden-Davies
- Research Institute for Sport and Exercise Sciences, Liverpool John Moores University, Liverpool, UK
- Paul Burghardt
- Department of Anaesthesiology, University of Michigan, Ann Arbor, MI, USA
- Lauren G Koch
- Department of Anaesthesiology, University of Michigan, Ann Arbor, MI, USA
- Steven L Britton
- Department of Anaesthesiology, University of Michigan, Ann Arbor, MI, USA
- K.G. Jebsen Center for Exercise in Medicine, Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, Trondheim, Norway
- Jatin G Burniston
- Research Institute for Sport and Exercise Sciences, Liverpool John Moores University, Liverpool, UK

16
Ternent T, Csordas A, Qi D, Gómez-Baena G, Beynon RJ, Jones AR, Hermjakob H, Vizcaíno JA. How to submit MS proteomics data to ProteomeXchange via the PRIDE database. Proteomics 2014; 14:2233-41. [DOI: 10.1002/pmic.201400120] [Received: 04/02/2014] [Revised: 06/11/2014] [Accepted: 07/17/2014]
Affiliation(s)
- Tobias Ternent
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Attila Csordas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Da Qi
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Robert J. Beynon
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Andrew R. Jones
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

17
Altmäe S, Esteban FJ, Stavreus-Evers A, Simón C, Giudice L, Lessey BA, Horcajadas JA, Macklon NS, D'Hooghe T, Campoy C, Fauser BC, Salamonsen LA, Salumets A. Guidelines for the design, analysis and interpretation of 'omics' data: focus on human endometrium. Hum Reprod Update 2014; 20:12-28. [PMID: 24082038 PMCID: PMC3845681 DOI: 10.1093/humupd/dmt048] [Received: 04/19/2013] [Revised: 08/04/2013] [Accepted: 08/19/2013]
Abstract
BACKGROUND: 'Omics' high-throughput analyses, including genomics, epigenomics, transcriptomics, proteomics and metabolomics, are widely applied in human endometrial studies. Analysis of endometrial transcriptome patterns in physiological and pathophysiological conditions has been, to date, the most commonly applied 'omics' technique in human endometrium. As the technologies improve, proteomics holds the next big promise for this field. The 'omics' technologies have undoubtedly advanced our knowledge of human endometrium in relation to fertility and different diseases. Nevertheless, the challenges arising from the vast amount of data generated and the broad variation of 'omics' profiles across different environments and stimuli make it difficult to assess the validity, reproducibility and interpretation of such 'omics' data. With the expansion of 'omics' analyses in the study of the endometrium, there is a growing need to develop guidelines for the design of studies and for the analysis and interpretation of 'omics' data. METHODS: A systematic review of the literature in PubMed, together with references from relevant articles, was conducted up to March 2013. RESULTS: The current review aims to provide guidelines for future 'omics' studies on human endometrium, together with a summary of the status and trends, promise and shortcomings of the high-throughput technologies. In addition, the approaches presented here can be adapted to other areas of high-throughput 'omics' studies. CONCLUSION: A highly rigorous approach to future studies, based on the guidelines provided here, is a prerequisite for obtaining data on biological systems that can be shared among researchers worldwide and will ultimately be of clinical benefit.
Affiliation(s)
- Signe Altmäe
- Competence Centre on Reproductive Medicine and Biology, Tartu, Estonia
- School of Medicine, Department of Paediatrics, University of Granada, 18012 Granada, Spain
- Anneli Stavreus-Evers
- Department of Women's and Children's Health, Uppsala University, Akademiska Sjukhuset, 75185 Uppsala, Sweden
- Carlos Simón
- Fundación Instituto Valenciano de Infertilidad (FIVI) and Instituto Universitario IVI/INCLIVA, Valencia University, 46021 Valencia, Spain
- Linda Giudice
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of California San Francisco, San Francisco, CA 94143-0132, USA
- Bruce A. Lessey
- Division of Reproductive Endocrinology, Department of Obstetrics and Gynecology, University Medical Group, Greenville Hospital System, Greenville, South Carolina, SC 29605, USA
- Jose A. Horcajadas
- Araid-Hospital Miguel Servet, 50004 Zaragoza, Spain
- Department of Genetics, Universidad Pablo de Olavide, 41013 Sevilla, Spain
- Nick S. Macklon
- Department of Obstetrics and Gynaecology, Division of Developmental Origins of Adult Disease, University of Southampton, Princess Anne Hospital, SO16 5YA Southampton, UK
- Department of Reproductive Medicine and Gynaecology, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
- Thomas D'Hooghe
- Leuven University Fertility Center, Department of Obstetrics and Gynecology, University Hospital Leuven, Leuven, Belgium
- Department of Development and Regeneration, KU Leuven (Leuven University), 3000 Leuven, Belgium
- Cristina Campoy
- School of Medicine, Department of Paediatrics, University of Granada, 18012 Granada, Spain
- Bart C. Fauser
- Department of Reproductive Medicine and Gynaecology, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
- Lois A. Salamonsen
- Prince Henry's Institute of Medical Research, Melbourne, Victoria 3168, Australia
- Andres Salumets
- Competence Centre on Reproductive Medicine and Biology, Tartu, Estonia
- Department of Obstetrics and Gynaecology, University of Tartu, 51014 Tartu, Estonia

18
Verheggen K, Barsnes H, Martens L. Distributed computing and data storage in proteomics: many hands make light work, and a stronger memory. Proteomics 2013; 14:367-77. [PMID: 24285552 DOI: 10.1002/pmic.201300288] [Received: 07/12/2013] [Revised: 09/09/2013] [Accepted: 09/23/2013]
Abstract
Modern-day proteomics generates ever more complex data, causing the storage and processing requirements for such data to outgrow the capacity of most desktop computers. To cope with the increased computational demands, distributed architectures have gained substantial popularity in recent years. In this review, we provide an overview of current techniques for distributed computing, along with examples of how these techniques are currently being employed in the field of proteomics. We thus underline the benefits of distributed computing in proteomics, while also pointing out the potential issues and pitfalls involved.
Affiliation(s)
- Kenneth Verheggen
- Department of Medical Protein Research, VIB, Ghent, Belgium; Department of Biochemistry, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium

19
Sacchi R, Li J, Villarreal F, Gardell AM, Kültz D. Salinity-induced regulation of the myo-inositol biosynthesis pathway in tilapia gill epithelium. J Exp Biol 2013; 216:4626-38. [PMID: 24072791 DOI: 10.1242/jeb.093823]
Abstract
The myo-inositol biosynthesis (MIB) pathway converts glucose-6-phosphate to the compatible osmolyte myo-inositol that protects cells from osmotic stress. Using proteomics, the enzymes that constitute the MIB pathway, myo-inositol phosphate synthase (MIPS) and inositol monophosphatase 1 (IMPA1), are identified in tilapia (Oreochromis mossambicus) gill epithelium. Targeted, quantitative, label-free proteomics reveals that they are both upregulated during salinity stress. Upregulation is stronger when fish are exposed to severe (34 ppt acute and 90 ppt gradual) relative to moderate (70 ppt gradual) salinity stress. IMPA1 always responds more strongly than MIPS, suggesting that MIPS is more stable during salinity stress. MIPS is N-terminally acetylated and the corresponding peptide increases proportionally to MIPS protein, while non-acetylated N-terminal peptide is not detectable, indicating that MIPS acetylation is constitutive and may serve to stabilize the protein. Hyperosmotic induction of MIPS and IMPA1 is confirmed using western blot and real-time qPCR and is much higher at the mRNA than at the protein level. Two distinct MIPS mRNA variants are expressed in the gill, but one is more strongly regulated by salinity than the other. A single MIPS gene is encoded in the tilapia genome whereas the zebrafish genome lacks MIPS entirely. The genome of euryhaline tilapia contains four IMPA genes, two of which are expressed, but only one is salinity regulated in gill epithelium. The genome of stenohaline zebrafish contains a single IMPA gene. We conclude that the MIB pathway represents a major salinity stress coping mechanism that is regulated at multiple levels in euryhaline fish but absent in stenohaline zebrafish.
Affiliation(s)
- Romina Sacchi
- Physiological Genomics Group, Department of Animal Sciences, University of California, Davis, One Shields Avenue, Meyer Hall, Davis, CA 95616, USA

20
Kültz D, Li J, Gardell A, Sacchi R. Quantitative molecular phenotyping of gill remodeling in a cichlid fish responding to salinity stress. Mol Cell Proteomics 2013; 12:3962-75. [PMID: 24065692 DOI: 10.1074/mcp.m113.029827]
Abstract
A two-tiered label-free quantitative (LFQ) proteomics workflow was used to elucidate how salinity affects the molecular phenotype, i.e. proteome, of gills from a cichlid fish, the euryhaline tilapia (Oreochromis mossambicus). The workflow consists of initial global profiling of relative tryptic peptide abundances in treated versus control samples followed by targeted identification (by MS/MS) and quantitation (by chromatographic peak area integration) of validated peptides for each protein of interest. Freshwater-acclimated tilapia were independently exposed in separate experiments to acute short-term (34 ppt) and gradual long-term (70 ppt, 90 ppt) salinity stress followed by molecular phenotyping of the gill proteome. The severity of salinity stress can be deduced with high technical reproducibility from the initial global label-free quantitative profiling step alone at both peptide and protein levels. However, an accurate regulation ratio can only be determined by targeted label-free quantitative profiling because not all peptides used for protein identification are also valid for quantitation. Of the three salinity challenges, gradual acclimation to 90 ppt has the most pronounced effect on gill molecular phenotype. Known salinity effects on tilapia gills, including an increase in the size and number of mitochondria-rich ionocytes, activities of specific ion transporters, and induction of specific molecular chaperones, are reflected in the regulation of abundances of the corresponding proteins. Moreover, specific protein isoforms that are responsive to environmental salinity change are resolved, and it is revealed that salinity effects on the mitochondrial proteome are nonuniform. Furthermore, protein NDRG1 has been identified as a novel key component of molecular phenotype restructuring during salinity-induced gill remodeling.
In conclusion, besides confirming known effects of salinity on gills of euryhaline fish, molecular phenotyping reveals novel insight into proteome changes that underlie the remodeling of tilapia gill epithelium in response to environmental salinity change.
Affiliation(s)
- Dietmar Kültz
- Physiological Genomics Group, Department of Animal Sciences, University of California Davis, One Shields Avenue, Davis, California 95616

21
Castelli D, Manghi P, Thanos C. A vision towards Scientific Communication Infrastructures. Int J Digit Libr 2013. [DOI: 10.1007/s00799-013-0106-7]

22
Csordas A, Wang R, Ríos D, Reisinger F, Foster JM, Slotta DJ, Vizcaíno JA, Hermjakob H. From Peptidome to PRIDE: public proteomics data migration at a large scale. Proteomics 2013; 13:1692-5. [PMID: 23533138 PMCID: PMC3717177 DOI: 10.1002/pmic.201200514] [Received: 11/09/2012] [Revised: 02/14/2013] [Accepted: 02/28/2013]
Abstract
The PRIDE database, developed and maintained at the European Bioinformatics Institute (EBI), is one of the most prominent data repositories dedicated to high throughput MS-based proteomics data. Peptidome, developed by the National Center for Biotechnology Information (NCBI) as a sibling resource to PRIDE, was discontinued due to funding constraints in April 2011. A joint effort between the two teams was started soon after the Peptidome closure to ensure that the data were not “lost” to the wider proteomics community, by exporting them to PRIDE. As a result, data in the low terabyte range have been migrated from Peptidome to PRIDE and made publicly available under experiment accessions 17 900–18 271, representing 54 projects, ∼53 million mass spectra, ∼10 million peptide identifications, ∼650 000 protein identifications, ∼1.1 million biologically relevant protein modifications, and 28 species, from more than 30 different labs.
Affiliation(s)
- Attila Csordas
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

23
Hulstaert N, Reisinger F, Rameseder J, Barsnes H, Vizcaíno JA, Martens L. Pride-asap: automatic fragment ion annotation of identified PRIDE spectra. J Proteomics 2013; 95:89-92. [PMID: 23603108 PMCID: PMC4085470 DOI: 10.1016/j.jprot.2013.04.011] [Received: 10/09/2012] [Revised: 03/27/2013] [Accepted: 04/09/2013]
Abstract
We present an open source software application and library written in Java that provides a uniform annotation of identified spectra stored in the PRIDE database. Pride-asap can be run in a command line mode for automated processing of multiple PRIDE experiments, but also has a graphical user interface that allows end users to annotate the spectra in PRIDE experiments and to inspect the results in detail. Pride-asap binaries, source code and additional information can be downloaded from http://pride-asa-pipeline.googlecode.com. This article is part of a Special Issue entitled: Standardization and Quality Control in Proteomics. We have built an automatic spectrum annotation pipeline for PRIDE. The tool provides both a GUI and a command-line interface. The provided annotations are robust and consistent. The tool can be applied easily to thousands of PRIDE experiments. Results are available in the GUI, and as text files for downstream analysis.
Affiliation(s)
- Niels Hulstaert
- Department of Medical Protein Research, VIB, Ghent, Belgium; Department of Biochemistry, Ghent University, Ghent, Belgium

24
Griss J, Foster JM, Hermjakob H, Vizcaíno JA. PRIDE Cluster: building a consensus of proteomics data. Nat Methods 2013; 10:95-6. [PMID: 23361086 DOI: 10.1038/nmeth.2343]

25
Medina-Aunon JA, Krishna R, Ghali F, Albar JP, Jones AJ. A guide for integration of proteomic data standards into laboratory workflows. Proteomics 2013; 13:480-92. [DOI: 10.1002/pmic.201200268] [Received: 06/29/2012] [Revised: 08/14/2012] [Accepted: 09/10/2012]
Affiliation(s)
- Ritesh Krishna
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Fawaz Ghali
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Juan P. Albar
- Centro Nacional de Biotecnología, CSIC, Madrid, Spain
- Andrew J. Jones
- Institute of Integrative Biology, University of Liverpool, Liverpool, UK

26
Liu G, Zhang J, Choi H, Lambert JP, Srikumar T, Larsen B, Nesvizhskii AI, Raught B, Tyers M, Gingras AC. Using ProHits to store, annotate, and analyze affinity purification-mass spectrometry (AP-MS) data. Curr Protoc Bioinformatics 2012; Chapter 8:8.16.1-8.16.32. [PMID: 22948730 DOI: 10.1002/0471250953.bi0816s39]
Abstract
Affinity purification coupled with mass spectrometry (AP-MS) is a robust technique used to identify protein-protein interactions. With recent improvements in sample preparation, and dramatic advances in MS instrumentation speed and sensitivity, this technique is becoming more widely used throughout the scientific community. To meet the needs of research groups both large and small, we have developed software solutions for tracking, scoring and analyzing AP-MS data. Here, we provide details for the installation and utilization of ProHits, a Laboratory Information Management System designed specifically for AP-MS interaction proteomics. This protocol explains: (i) how to install the complete ProHits system, including modules for the management of mass spectrometry files and the analysis of interaction data, and (ii) alternative options for the use of pre-existing search results in simpler versions of ProHits, including a virtual machine implementation of our ProHits Lite software. We also describe how to use the main features of the software to analyze AP-MS data.
Affiliation(s)
- Guomin Liu
- Centre for Systems Biology, Samuel Lunenfeld Research Institute at Mount Sinai Hospital, Toronto, Ontario, Canada

27
Pan B, Sheng J, Sun W, Zhao Y, Hao P, Li X. OrysPSSP: a comparative platform for small secreted proteins from rice and other plants. Nucleic Acids Res 2012. [PMID: 23203890 PMCID: PMC3531210 DOI: 10.1093/nar/gks1090]
Abstract
Plants have large diverse families of small secreted proteins (SSPs) that play critical roles in the processes of development, differentiation, defense, flowering, stress response, symbiosis, etc. Oryza sativa is one of the major crops worldwide and an excellent model for monocotyledonous plants. However, there had not been any effort to systematically analyze rice SSPs. Here, we constructed a comparative platform, OrysPSSP (http://www.genoportal.org/PSSP/index.do), involving >100 000 SSPs from rice and 25 plant species. OrysPSSP is composed of a core SSP database and a dynamic web interface that integrates a variety of user tools and resources. The current release (v0530) of core SSP database contains a total of 101 048 predicted SSPs, which were generated through a rigid computation/curation pipeline. The web interface consists of eight different modules, providing users with rich resources/functions, e.g. browsing SSP by chromosome, searching and filtering SSP, validating SSP with omics data, comparing SSP among multiple species and querying core SSP database with BLAST. Some cases of application are discussed to demonstrate the utility of OrysPSSP. OrysPSSP serves as a comprehensive resource to explore SSP on the genome scale and across the phylogeny of plant species.
Affiliation(s)
- Bohu Pan
- Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China

28
Vizcaíno JA, Côté RG, Csordas A, Dianes JA, Fabregat A, Foster JM, Griss J, Alpi E, Birim M, Contell J, O'Kelly G, Schoenegger A, Ovelleiro D, Pérez-Riverol Y, Reisinger F, Ríos D, Wang R, Hermjakob H. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res 2012. [PMID: 23203882 PMCID: PMC3531176 DOI: 10.1093/nar/gks1262]
Abstract
The PRoteomics IDEntifications (PRIDE, http://www.ebi.ac.uk/pride) database at the European Bioinformatics Institute is one of the most prominent data repositories of mass spectrometry (MS)-based proteomics data. Here, we summarize recent developments in the PRIDE database and related tools. First, we provide up-to-date statistics in data content, splitting the figures by groups of organisms and species, including peptide and protein identifications, and post-translational modifications. We then describe the tools that are part of the PRIDE submission pipeline, especially the recently developed PRIDE Converter 2 (new submission tool) and PRIDE Inspector (visualization and analysis tool). We also give an update about the integration of PRIDE with other MS proteomics resources in the context of the ProteomeXchange consortium. Finally, we briefly review the quality control efforts that are ongoing at present and outline our future plans.
Affiliation(s)
- Juan Antonio Vizcaíno
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

29
Gaudet P, Arighi C, Bastian F, Bateman A, Blake JA, Cherry MJ, D'Eustachio P, Finn R, Giglio M, Hirschman L, Kania R, Klimke W, Martin MJ, Karsch-Mizrachi I, Munoz-Torres M, Natale D, O'Donovan C, Ouellette F, Pruitt KD, Robinson-Rechavi M, Sansone SA, Schofield P, Sutton G, Van Auken K, Vasudevan S, Wu C, Young J, Mazumder R. Recent advances in biocuration: meeting report from the Fifth International Biocuration Conference. Database (Oxford) 2012; 2012:bas036. [PMID: 23110974 PMCID: PMC3483532 DOI: 10.1093/database/bas036]
Abstract
The 5th International Biocuration Conference brought together over 300 scientists to exchange ideas on their work, as well as to discuss issues relevant to the mission of the International Society for Biocuration (ISB). Recurring themes this year included the creation and promotion of gold standards, the need for more ontologies, and more formal interactions with journals. The conference is an essential part of the ISB's goal to support exchanges among members of the biocuration community. Next year's conference will be held in Cambridge, UK, from 7 to 10 April 2013. In the meantime, the ISB website provides information about the society's activities (http://biocurator.org), as well as related events of interest.
Affiliation(s)
- Pascale Gaudet
- International Society for Biocuration and CALIPHO Group, Swiss Institute of Bioinformatics, 1 Rue Michel Servet, Geneva, Switzerland.

30
Côté RG, Griss J, Dianes JA, Wang R, Wright JC, van den Toorn HWP, van Breukelen B, Heck AJR, Hulstaert N, Martens L, Reisinger F, Csordas A, Ovelleiro D, Perez-Riverol Y, Barsnes H, Hermjakob H, Vizcaíno JA. The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium. Mol Cell Proteomics 2012; 11:1682-9. [PMID: 22949509 PMCID: PMC3518121 DOI: 10.1074/mcp.o112.021543]
Abstract
The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.
Affiliation(s)
- Richard G Côté
- Proteomics Services Team, EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

31
Schneider M, UniProt Consortium, Poux S. UniProtKB amid the turmoil of plant proteomics research. Front Plant Sci 2012; 3:270. [PMID: 23230445 PMCID: PMC3515866 DOI: 10.3389/fpls.2012.00270] [Received: 08/30/2012] [Accepted: 11/19/2012]
Abstract
The UniProt KnowledgeBase (UniProtKB) provides a single, centralized, authoritative resource for protein sequences and functional information. The majority of its records are based on automatic translation of coding sequences (CDS) provided by submitters at the time of initial deposition to the nucleotide sequence databases (INSDC). This article gives a general overview of the current situation, with specific illustrations drawn from our annotation of the Arabidopsis and rice proteomes. Increasingly often, only the raw sequence of a complete genome is deposited to the nucleotide sequence databases, while the gene model predictions and annotations are kept in separate, specialized model organism databases (MODs). To provide the complete proteome of model organisms, UniProtKB therefore had to implement pipelines to import protein sequences from Ensembl and EnsemblGenomes. A single genome can be the target of several unrelated sequencing projects, and the final assemblies and gene model predictions may diverge quite significantly. In addition, several cultivars of the same species are often sequenced (the sequencing of 1001 Arabidopsis cultivars is currently under way), and the resulting proteomes are far from identical. One challenge for UniProtKB is therefore to store and organize these data conveniently and to clearly define the reference proteomes that should be made available to users. Manual annotation is one of the hallmarks of the Swiss-Prot section of UniProtKB. Besides adding functional annotation, curators check, and often correct, gene model predictions. For plants, this task is limited to Arabidopsis thaliana and Oryza sativa subsp. japonica. Proteomics data providing experimental evidence that confirms the existence of proteins or identifies sequence features such as post-translational modifications are also imported into UniProtKB records, and the knowledgebase is cross-referenced to numerous proteomics resources.
Affiliation(s)
- Michel Schneider
- Swiss-Prot, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, Geneva, Switzerland
- *Correspondence: Michel Schneider, Swiss-Prot, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland. e-mail:
- the UniProt Consortium
- Swiss-Prot, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, Geneva, Switzerland
- European Bioinformatics Institute, Hinxton, UK
- Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA
- Sylvain Poux
- Swiss-Prot, SIB Swiss Institute of Bioinformatics, Centre Médical Universitaire, Geneva, Switzerland