1
Hentschker C, Maaß S, Junker S, Hecker M, Hammerschmidt S, Otto A, Becher D. Comprehensive Spectral Library from the Pathogenic Bacterium Streptococcus pneumoniae with Focus on Phosphoproteins. J Proteome Res 2020; 19:1435-1446. [DOI: 10.1021/acs.jproteome.9b00615] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Affiliation(s)
- Christian Hentschker
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Sandra Maaß
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Sabryna Junker
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Michael Hecker
- Department of Microbial Physiology and Molecular Biology, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Sven Hammerschmidt
- Department of Molecular Genetics and Infection Biology, Interfaculty Institute for Genetics and Functional Genomics, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Andreas Otto
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Dörte Becher
- Department of Microbial Proteomics, Institute of Microbiology, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
2
Fernández-Costa C, Martínez-Bartolomé S, McClatchy D, Yates JR. Improving Proteomics Data Reproducibility with a Dual-Search Strategy. Anal Chem 2020; 92:1697-1701. [PMID: 31880919 DOI: 10.1021/acs.analchem.9b04955] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Mass spectrometry-based proteomics is an invaluable tool for addressing important biological questions. Data-dependent acquisition methods effectuate stochastic acquisition of data in complex mixtures, which results in missing identifications across replicates. We developed a search approach that improves the reproducibility of data acquired from any mass spectrometer. In our approach, a spectral library is built from the identification results from a database search, and then the library is used to re-search the same data files to obtain the final result. We showed that higher identification and quantification reproducibility is achieved with the dual-search approach than with a typical database search. Four datasets with different complexity were compared: (1) data from a cell lysate study performed in our lab, (2) data from an interactome study performed in our lab, (3) a publicly available extracellular vesicles dataset, and (4) a publicly available phosphoproteomics dataset. Our results show that the dual-search approach can be widely and easily used to improve data quality in proteomics data.
Affiliation(s)
- Carolina Fernández-Costa
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
- Salvador Martínez-Bartolomé
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
- Daniel McClatchy
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
- John R Yates
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037, United States
3
Griss J. Spectral library searching in proteomics. Proteomics 2016; 16:729-40. [PMID: 26616598 DOI: 10.1002/pmic.201500296] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 07/16/2015] [Revised: 10/15/2015] [Accepted: 10/29/2015] [Indexed: 12/12/2022]
Abstract
Spectral library searching has become a mature method to identify tandem mass spectra in proteomics data analysis. This review provides a comprehensive overview of available spectral library search engines and highlights their distinct features. Additionally, resources providing spectral libraries are summarized and tools presented that extend experimental spectral libraries by simulating spectra. Finally, spectrum clustering algorithms are discussed that utilize the same spectrum-to-spectrum matching algorithms as spectral library search engines and allow novel methods to analyse proteomics data.
Affiliation(s)
- Johannes Griss
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Austria; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
4
Yang Y, Feng J, Li T, Ge F, Zhao J. CyanOmics: an integrated database of omics for the model cyanobacterium Synechococcus sp. PCC 7002. Database: The Journal of Biological Databases and Curation 2015; 2015:bau127. [PMID: 25632108 PMCID: PMC4309022 DOI: 10.1093/database/bau127] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/17/2023]
Abstract
Cyanobacteria are an important group of organisms that carry out oxygenic photosynthesis and play vital roles in both the carbon and nitrogen cycles of the Earth. The annotated genome of Synechococcus sp. PCC 7002, as an ideal model cyanobacterium, is available. A series of transcriptomic and proteomic studies of Synechococcus sp. PCC 7002 cells grown under different conditions have been reported. However, no database of such integrated omics studies has been constructed. Here we present CyanOmics, a database based on the results of Synechococcus sp. PCC 7002 omics studies. CyanOmics comprises one genomic dataset, 29 transcriptomic datasets and one proteomic dataset and should prove useful for systematic and comprehensive analysis of all those data. Powerful browsing and searching tools are integrated to help users directly access information of interest with enhanced visualization of the analytical results. Furthermore, BLAST is included for sequence-based similarity searching, and Cluster 3.0 and the R hclust function are provided for cluster analyses, to increase CyanOmics's usefulness. To the best of our knowledge, it is the first integrated omics analysis database for cyanobacteria. This database should further the understanding of the transcriptional patterns and proteomic profiles of Synechococcus sp. PCC 7002 and other cyanobacteria. Additionally, the entire database framework is applicable to any sequenced prokaryotic genome and could be applied to other integrated omics analysis projects. Database URL: http://lag.ihb.ac.cn/cyanomics
Affiliation(s)
- Yaohua Yang
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China; College of Life Science, Peking University, Beijing 100871, China
- Jie Feng
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China; College of Life Science, Peking University, Beijing 100871, China
- Tao Li
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China; College of Life Science, Peking University, Beijing 100871, China
- Feng Ge
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China; College of Life Science, Peking University, Beijing 100871, China
- Jindong Zhao
- Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China; College of Life Science, Peking University, Beijing 100871, China
5
Ludwig M, Pandelia ME, Chew CY, Zhang B, Golbeck JH, Krebs C, Bryant DA. ChlR protein of Synechococcus sp. PCC 7002 is a transcription activator that uses an oxygen-sensitive [4Fe-4S] cluster to control genes involved in pigment biosynthesis. J Biol Chem 2014; 289:16624-39. [PMID: 24782315 DOI: 10.1074/jbc.m114.561233] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 12/30/2022]
Abstract
Synechococcus sp. PCC 7002 and many other cyanobacteria have two genes that encode key enzymes involved in chlorophyll a, biliverdin, and heme biosynthesis: acsFI/acsFII, ho1/ho2, and hemF/hemN. Under atmospheric O2 levels, AcsFI synthesizes 3,8-divinyl protochlorophyllide from Mg-protoporphyrin IX monomethyl ester, Ho1 oxidatively cleaves heme to form biliverdin, and HemF oxidizes coproporphyrinogen III to protoporphyrinogen IX. Under microoxic conditions, another set of genes directs the synthesis of alternative enzymes AcsFII, Ho2, and HemN. In Synechococcus sp. PCC 7002, open reading frame SynPCC7002_A1993 encodes a MarR family transcriptional regulator, which is located immediately upstream from the operon comprising acsFII, ho2, hemN, and desF (the latter encodes a putative fatty acid desaturase). Deletion and complementation analyses showed that this gene, denoted chlR, is a transcriptional activator that is essential for transcription of the acsFII-ho2-hemN-desF operon under microoxic conditions. Global transcriptome analyses showed that ChlR controls the expression of only these four genes. Co-expression of chlR with a yfp reporter gene under the control of the acsFII promoter from Synechocystis sp. PCC 6803 in Escherichia coli demonstrated that no other cyanobacterium-specific components are required for proper functioning of this regulatory circuit. A combination of analytical methods and Mössbauer and EPR spectroscopies showed that reconstituted, recombinant ChlR forms homodimers that harbor one oxygen-sensitive [4Fe-4S] cluster. We conclude that ChlR is a transcriptional activator that uses a [4Fe-4S] cluster to sense O2 levels and thereby control the expression of the acsFII-ho2-hemN-desF operon.
Affiliation(s)
- Marcus Ludwig
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802
- Maria-Eirini Pandelia
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802
- Chyue Yie Chew
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802
- Bo Zhang
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802
- John H Golbeck
- Departments of Biochemistry and Molecular Biology and Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802
- Carsten Krebs
- Departments of Biochemistry and Molecular Biology and Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802
- Donald A Bryant
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, and the Department of Chemistry and Biochemistry, Montana State University, Bozeman, Montana 59717
6
Alves G, Yu YK. Improving peptide identification sensitivity in shotgun proteomics by stratification of search space. J Proteome Res 2013; 12:2571-81. [PMID: 23668635 DOI: 10.1021/pr301139y] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/16/2022]
Abstract
Because of its high specificity, trypsin is the enzyme of choice in shotgun proteomics. Nonetheless, several publications do report the identification of semitryptic and nontryptic peptides. Many of these peptides are thought to be signaling peptides or to have formed during sample preparation. It is known that only a small fraction of tandem mass spectra from a trypsin-digested protein mixture can be confidently matched to tryptic peptides. If other possibilities such as post-translational modifications and single-amino acid polymorphisms are ignored, this suggests that many unidentified spectra originate from semitryptic and nontryptic peptides. To include them in database searches, however, may not improve overall peptide identification because of the possible sensitivity reduction from search space expansion. To circumvent this issue for E-value-based search methods, we have designed a scheme that categorizes qualified peptides (i.e., peptides whose differences in molecular weight from the parent ion are within a specified error tolerance) into three tiers: tryptic, semitryptic, and nontryptic. This classification allows peptides that belong to different tiers to have different Bonferroni correction factors. Our results show that this scheme can significantly improve retrieval performance compared to those of search strategies that assign equal Bonferroni correction factors to all qualified peptides.
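The tiering idea in this abstract can be illustrated with a minimal Python sketch. It assumes that an E-value is a P-value multiplied by a Bonferroni factor, and that each tier's factor is simply the number of qualified peptides in that tier. The function name `tiered_e_values` and the tier sizes in the usage note are hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
from typing import Dict


def tiered_e_values(p_value: float, tier: str, tier_sizes: Dict[str, int]) -> Dict[str, float]:
    """Compare a pooled Bonferroni correction with a per-tier correction.

    Assumption: the E-value is p_value times the number of candidates
    the spectrum was tested against (a plain Bonferroni factor).
    """
    if tier not in tier_sizes:
        raise KeyError(f"unknown tier: {tier}")
    pooled_factor = sum(tier_sizes.values())  # all qualified peptides treated alike
    tier_factor = tier_sizes[tier]            # only peptides in the same tier
    return {
        "pooled": p_value * pooled_factor,
        "tiered": p_value * tier_factor,
    }
```

With hypothetical tier sizes of 100 tryptic, 2,000 semitryptic and 50,000 nontryptic candidates, a tryptic match with P = 1e-4 gets a pooled E-value of about 5.21 but a tiered E-value of about 0.01, which is the sensitivity gain the abstract describes for the smaller tier.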
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, United States
7
Ji C, Arnold RJ, Sokoloski KJ, Hardy RW, Tang H, Radivojac P. Extending the coverage of spectral libraries: a neighbor-based approach to predicting intensities of peptide fragmentation spectra. Proteomics 2013; 13:756-65. [PMID: 23303707 DOI: 10.1002/pmic.201100670] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/27/2011] [Revised: 10/19/2012] [Accepted: 11/11/2012] [Indexed: 01/10/2023]
Abstract
Searching spectral libraries in MS/MS is an important new approach to improving the quality of peptide and protein identification. The idea relies on the observation that ion intensities in an MS/MS spectrum of a given peptide are generally reproducible across experiments, and thus, matching between spectra from an experiment and the spectra of previously identified peptides stored in a spectral library can lead to better peptide identification compared to the traditional database search. However, the use of libraries is greatly limited by their coverage of peptide sequences: even for well-studied organisms a large fraction of peptides have not been previously identified. To address this issue, we propose to expand spectral libraries by predicting the MS/MS spectra of peptides based on the spectra of peptides with similar sequences. We first demonstrate that the intensity patterns of dominant fragment ions between similar peptides tend to be similar. In accordance with this observation, we develop a neighbor-based approach that first selects peptides that are likely to have spectra similar to the target peptide and then combines their spectra using a weighted K-nearest neighbor method to accurately predict fragment ion intensities corresponding to the target peptide. This approach has the potential to predict spectra for every peptide in the proteome. When rigorous quality criteria are applied, we estimate that the method increases the coverage of spectral libraries available from the National Institute of Standards and Technology by 20-60%, although the values vary with peptide length and charge state. We find that the overall best search performance is achieved when spectral libraries are supplemented by the high quality predicted spectra.
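A weighted K-nearest neighbor prediction of the kind described in this abstract might look like the following Python sketch. The similarity measure (fraction of identical residues between equal-length peptides), the function names, and the example data are assumptions made for illustration; the paper's actual neighbor selection and weighting are not reproduced here.

```python
from typing import Dict, List


def sequence_similarity(a: str, b: str) -> float:
    """Fraction of identical positions; 0.0 for peptides of different length (an assumption)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_intensities(target: str, library: Dict[str, List[float]], k: int = 3) -> List[float]:
    """Predict fragment ion intensities as a similarity-weighted mean over the k nearest peptides.

    `library` maps a peptide sequence to its fragment intensity vector; all
    vectors for equal-length peptides are assumed to have the same layout.
    """
    scored = [
        (sequence_similarity(target, peptide), intensities)
        for peptide, intensities in library.items()
        if peptide != target
    ]
    # Keep only peptides with some similarity, most similar first.
    neighbors = sorted((s for s in scored if s[0] > 0.0), key=lambda s: s[0], reverse=True)[:k]
    if not neighbors:
        raise ValueError("no similar peptides in the library")
    total_weight = sum(weight for weight, _ in neighbors)
    n_ions = len(neighbors[0][1])
    return [
        sum(weight * intensities[i] for weight, intensities in neighbors) / total_weight
        for i in range(n_ions)
    ]
```

For example, predicting `PEPTIDEQ` from a toy library holding `PEPTIDEK` and `PEPTIDER` averages the two neighbor spectra with equal weight, because both share seven of eight residues with the target.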
Affiliation(s)
- Chao Ji
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
8
Lewis S, Csordas A, Killcoyne S, Hermjakob H, Hoopmann MR, Moritz RL, Deutsch EW, Boyle J. Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework. BMC Bioinformatics 2012; 13:324. [PMID: 23216909 PMCID: PMC3538679 DOI: 10.1186/1471-2105-13-324] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 06/02/2012] [Accepted: 11/26/2012] [Indexed: 11/15/2022]
Abstract
Background: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. Results: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. Conclusion: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
Affiliation(s)
- Steven Lewis
- Institute for Systems Biology, Seattle, WA, USA.
9
Pechan T, Gwaltney SR. Calculations of relative intensities of fragment ions in the MSMS spectra of a doubly charged penta-peptide. BMC Bioinformatics 2012; 13 Suppl 15:S13. [PMID: 23046347 PMCID: PMC3439735 DOI: 10.1186/1471-2105-13-s15-s13] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Currently, the tandem mass spectrometry (MSMS) of peptides is a dominant technique used to identify peptides and consequently proteins. The peptide fragmentation inside the mass analyzer typically offers a spectrum containing several different groups of ions. The mass to charge (m/z) values of these ions can be exactly calculated following simple rules based on the possible peptide fragmentation reactions. But the (relative) intensities of the particular ions cannot be simply predicted from the amino-acid sequence of the peptide. This study presents initial work towards developing a theoretical fundamental approach to ion intensity elucidation by utilizing quantum mechanical computations. METHODS: MSMS spectra of the doubly charged GAVLK peptide were collected on electrospray ion trap mass spectrometers using low energy modes of fragmentation. Density functional theory (DFT) calculations were performed on the population of ion precursors to determine the fragment ion intensities corresponding to a Boltzmann distribution of the protonation of nitrogens in the peptide backbone amide bonds. RESULTS: We were able to a) predict the intensity order of the y and b ions in concert with the experimental observation; b) predict relative intensities of y ions with errors not exceeding the experimental variation. CONCLUSIONS: These results suggest that the GAVLK peptide fragmentation process in the ion trap mass spectrometer is predominantly driven by the thermodynamic stability of the precursor ions formed upon ionization of the sample. The computational approach presented in this manuscript successfully calculated ion intensities in the mass spectra of this doubly charged tryptic peptide, based solely on its amino acid sequence. As such, this work indicates a potential of incorporating quantum mechanical calculations into mass spectrometry based algorithms for molecular identification.
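The Boltzmann distribution mentioned in this abstract can be sketched in a few lines of Python. The sketch only converts relative energies of precursor states into fractional populations; the energy values in the usage note are invented, the function name is hypothetical, and the step from populations to fragment ion intensities is the paper's own analysis and is not reproduced here.

```python
import math
from typing import List

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def boltzmann_populations(energies_kj_mol: List[float], temperature_k: float = 298.15) -> List[float]:
    """Return the fractional population of each state from its energy in kJ/mol."""
    if not energies_kj_mol:
        raise ValueError("at least one energy is required")
    e_min = min(energies_kj_mol)  # shift by the minimum for numerical stability
    weights = [
        math.exp(-(energy - e_min) / (R_KJ_PER_MOL_K * temperature_k))
        for energy in energies_kj_mol
    ]
    partition = sum(weights)
    return [w / partition for w in weights]
```

Calling `boltzmann_populations([0.0, 5.0, 10.0])` with these made-up energies gives populations that sum to 1 and decrease as the energy rises, so the most stable protonation site dominates.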
Affiliation(s)
- Tibor Pechan
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi Agricultural and Forestry Experiment Station, High Performance Computing Collaboratory, Mississippi State University, Mississippi State, MS 39762, USA.
10
Yang C, He Z, Yang C, Yu W. Peptide reranking with protein-peptide correspondence and precursor peak intensity information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:1212-1219. [PMID: 22350209 DOI: 10.1109/tcbb.2012.29] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Searching tandem mass spectra against a protein database has been a mainstream method for peptide identification. Improving peptide identification results by ranking true Peptide-Spectrum Matches (PSMs) over their false counterparts leads to the development of various reranking algorithms. In peptide reranking, discriminative information is essential to distinguish true PSMs from false PSMs. Generally, most peptide reranking methods obtain discriminative information directly from database search scores or by training machine learning models. Information in the protein database and MS1 spectra (i.e., single stage MS spectra) is ignored. In this paper, we propose to use information in the protein database and MS1 spectra to rerank peptide identification results. To quantitatively analyze their effects on peptide reranking results, three peptide reranking methods are proposed: PPMRanker, PPIRanker, and MIRanker. PPMRanker only uses Protein-Peptide Map (PPM) information from the protein database, PPIRanker only uses Precursor Peak Intensity (PPI) information, and MIRanker employs both PPM information and PPI information. According to our experiments on a standard protein mixture data set, a human data set and a mouse data set, PPMRanker and MIRanker achieve better peptide reranking results than PeptideProphet, PeptideProphet+NSP (number of sibling peptides) and a score regularization method SRPI. The source codes of PPMRanker, PPIRanker, and MIRanker, and all supplementary documents are available at our website: http://bioinformatics.ust.hk/pepreranking/. Alternatively, these documents can also be downloaded from: http://sourceforge.net/projects/pepreranking/.
Affiliation(s)
- Chao Yang
- The Hong Kong University of Science and Technology, RM B007D, University Apartment Tower B, Clear Water Bay, Kowloon, Hong Kong.
11
Peterson ES, McCue LA, Schrimpe-Rutledge AC, Jensen JL, Walker H, Kobold MA, Webb SR, Payne SH, Ansong C, Adkins JN, Cannon WR, Webb-Robertson BJM. VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data. BMC Genomics 2012; 13:131. [PMID: 22480257 PMCID: PMC3364912 DOI: 10.1186/1471-2164-13-131] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 12/27/2011] [Accepted: 04/05/2012] [Indexed: 11/10/2022]
Abstract
Background: The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information; however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA), a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes through the integration of proteomics and transcriptomics data with current genome location coordinates. Results: VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA, or potential coding regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to show the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data.
Conclusions: VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.
Affiliation(s)
- Elena S Peterson
- Scientific Data Management, Pacific Northwest National Laboratory, Richland, WA, USA
12
Dasari S, Chambers MC, Martinez MA, Carpenter KL, Ham AJL, Vega-Montoto LJ, Tabb DL. Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment. J Proteome Res 2012; 11:1686-95. [PMID: 22217208 DOI: 10.1021/pr200874e] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Indexed: 12/21/2022]
Abstract
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
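The dot product score that this abstract criticizes can be sketched as follows in Python. The sketch assumes spectra are lists of (m/z, intensity) pairs that are summed into fixed-width m/z bins before a normalized dot product is taken; the bin width and function names are illustrative choices, and this is neither Pepitome's nor SpectraST's implementation.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def normalized_dot_product(query: List[Peak], library: List[Peak], bin_width: float = 1.0) -> float:
    """Cosine-style similarity in [0, 1] between two binned spectra."""
    q = bin_spectrum(query, bin_width)
    l = bin_spectrum(library, bin_width)
    dot = sum(value * l.get(index, 0.0) for index, value in q.items())
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    l_norm = math.sqrt(sum(v * v for v in l.values()))
    if q_norm == 0.0 or l_norm == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (q_norm * l_norm)
```

Under these assumptions, a peak at m/z 100.05 and a peak at m/z 100.95 fall into the same 1 Da bin and score as a perfect match, which illustrates the abstract's point that a plain dot product ignores fragment ion m/z discrepancies.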
Affiliation(s)
- Surendra Dasari
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232-8575, United States
13
Abstract
It is generally accepted that cyanobacteria have an incomplete tricarboxylic acid (TCA) cycle because they lack 2-oxoglutarate dehydrogenase and thus cannot convert 2-oxoglutarate to succinyl-coenzyme A (CoA). Genes encoding a novel 2-oxoglutarate decarboxylase and succinic semialdehyde dehydrogenase were identified in the cyanobacterium Synechococcus sp. PCC 7002. Together, these two enzymes convert 2-oxoglutarate to succinate and thus functionally replace 2-oxoglutarate dehydrogenase and succinyl-CoA synthetase. These genes are present in all cyanobacterial genomes except those of Prochlorococcus and marine Synechococcus species. Closely related genes occur in the genomes of some methanogens and other anaerobic bacteria, which are also thought to have incomplete TCA cycles.
Affiliation(s)
- Shuyi Zhang
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
14
Kalyanaraman A, Cannon WR, Latt B, Baxter DJ. MapReduce implementation of a hybrid spectral library-database search method for large-scale peptide identification. Bioinformatics 2011; 27:3072-3. [PMID: 21926122 PMCID: PMC3198583 DOI: 10.1093/bioinformatics/btr523] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
SUMMARY: A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours, for processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs. AVAILABILITY: The source code along with user documentation are available on http://compbio.eecs.wsu.edu/MR-MSPolygraph. CONTACT: ananth@eecs.wsu.edu; william.cannon@pnnl.gov. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ananth Kalyanaraman
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164-2752, USA.
15
Ahrné E, Ohta Y, Nikitin F, Scherl A, Lisacek F, Müller M. An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates. Proteomics 2011; 11:4085-95. [DOI: 10.1002/pmic.201000665] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 10/19/2010] [Revised: 07/13/2011] [Accepted: 07/29/2011] [Indexed: 11/06/2022]