1901
|
Fagerquist CK, Bates AH, Heath S, King BC, Garbus BR, Harden LA, Miller WG. Sub-Speciating Campylobacter jejuni by Proteomic Analysis of Its Protein Biomarkers and Their Post-Translational Modifications. J Proteome Res 2006; 5:2527-38. [PMID: 17022624 DOI: 10.1021/pr050485w] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
We have identified several protein biomarkers of three Campylobacter jejuni strains (RM1221, RM1859, and RM3782) by proteomic techniques. The protein biomarkers identified are prominently observed in the time-of-flight mass spectra (TOF MS) of bacterial cell lysate supernatants ionized by matrix-assisted laser desorption/ionization (MALDI). The protein biomarkers identified were: DNA-binding protein HU, translation initiation factor IF-1, cytochrome c553, a transthyretin-like periplasmic protein, chaperonin GroES, thioredoxin Trx, and ribosomal proteins: L7/L12 (50S), L24 (50S), S16 (30S), L29 (50S), and S15 (30S), and conserved proteins similar to strain NCTC 11168 proteins Cj1164 and Cj1225. The protein biomarkers identified appear to represent high copy, intact proteins. The significant findings are as follows: (1) Biomarker mass shifts between these strains were due to amino acid substitutions of the primary polypeptide sequence and not due to changes in post-translational modifications (PTMs). (2) If present, a PTM of a protein biomarker appeared consistently for all three strains, which supported that the biomarker mass shifts observed between strains were not due to PTM variability. (3) The PTMs observed included N-terminal methionine (N-Met) cleavage as well as a number of other PTMs. (4) It was discovered that protein biomarkers of C. jejuni (as well as other thermophilic Campylobacters) appear to violate the N-Met cleavage rule of bacterial proteins, which predicts N-Met cleavage if the penultimate residue is threonine. Two protein biomarkers (HU and 30S ribosomal protein S16) that have a penultimate threonine residue do not show N-Met cleavage. In all other cases, the rule correctly predicted N-Met cleavage among the biomarkers analyzed. This exception to the N-Met cleavage rule has implications for the development of bioinformatics algorithms for protein/pathogen identification. 
(5) There were fewer biomarker mass shifts between strains RM1221 and RM1859 compared to strain RM3782. As the mass shifts were due to the frequency of amino acid substitutions (and thus underlying genetic variations), this suggested that strains RM1221 and RM1859 were phylogenetically closer to one another than to strain RM3782 (in addition, a protein biomarker prominent in the spectra of RM1221 and RM1859 was absent from the RM3782 spectrum due to a nonsense mutation in the gene of the biomarker). These observations were confirmed by a nitrate reduction test, which showed that RM1221 and RM1859 were C. jejuni subsp. jejuni whereas RM3782 was C. jejuni subsp. doylei. This result suggests that detection/identification of protein biomarkers by pattern recognition and/or bioinformatics algorithms may easily subspeciate bacterial microorganisms. (6) Finally, the number and variation of PTMs detected in this relatively small number of protein biomarkers suggest that bioinformatics algorithms for pathogen identification may need to incorporate many more possible PTMs than suggested previously in the literature.
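The N-Met cleavage rule discussed in finding (4) can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: the set of "small" penultimate residues is the commonly cited rule of thumb and is an assumption here (the abstract itself only mentions threonine), the function names and the `retains_nmet` override are hypothetical, and the sequences in the usage note are toy strings rather than real C. jejuni proteins.

```python
# Penultimate residues assumed to permit N-terminal methionine removal.
# ASSUMPTION: commonly cited small-side-chain rule; the abstract only names threonine.
SMALL_PENULTIMATE = frozenset("GASTCPV")


def predicts_nmet_cleavage(sequence):
    """Return True if the generic rule predicts removal of the initiator Met."""
    return (
        len(sequence) >= 2
        and sequence[0] == "M"
        and sequence[1] in SMALL_PENULTIMATE
    )


def mature_sequence(name, sequence, retains_nmet=frozenset()):
    """Apply the rule, but let known exceptions keep their N-terminal Met.

    `retains_nmet` is a hypothetical per-protein override table, motivated by
    the abstract's report that HU and 30S ribosomal protein S16 keep N-Met
    despite a penultimate threonine.
    """
    if name in retains_nmet:
        return sequence
    if predicts_nmet_cleavage(sequence):
        return sequence[1:]
    return sequence
```

For example, `mature_sequence("toy", "MTKAEL")` drops the leading methionine, while `mature_sequence("HU", "MTKAEL", retains_nmet={"HU"})` leaves the toy sequence intact. An identification algorithm that computes expected biomarker masses would need such an override (or would have to consider both forms) for organisms where the generic rule fails.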
Affiliation(s)
- Clifton K Fagerquist
- Western Regional Research Center, Agricultural Research Service, United States Department of Agriculture, 800 Buchanan Street, Albany, California 94710, USA.
|
1902
|
Chan QWT, Howes CG, Foster LJ. Quantitative comparison of caste differences in honeybee hemolymph. Mol Cell Proteomics 2006; 5:2252-62. [PMID: 16920818 DOI: 10.1074/mcp.m600197-mcp200] [Citation(s) in RCA: 117] [Impact Index Per Article: 6.2] [Indexed: 11/06/2022]
Abstract
The honeybee, Apis mellifera, is an invaluable partner in agriculture around the world both for its production of honey and, more importantly, for its role in pollination. Honeybees are largely unexplored at the molecular level despite a long and distinguished career as a model organism for understanding social behavior. Like other eusocial insects, honeybees can be divided into several castes: the queen (fertile female), workers (sterile females), and drones (males). Each caste has different energetic and metabolic requirements, and each differs in its susceptibility to pathogens, many of which have evolved to take advantage of the close social network inside a colony. Hemolymph, arthropods' equivalent to blood, distributes nutrients throughout the bee, and the immune components contained within it form one of the primary lines of defense against invading microorganisms. In this study we have applied qualitative and quantitative proteomics to gain a better understanding of honeybee hemolymph and how it varies among the castes and during development. We found large differences in hemolymph protein composition, especially between larval and adult stage bees and between male and female castes but even between adult workers and queens. We also provide experimental evidence for the expression of several unannotated honeybee genes and for the detection of biomarkers of a viral infection. Our data provide an initial molecular picture of honeybee hemolymph, to a greater depth than previous studies in other insects, and will pave the way for future biochemical studies of innate immunity in this animal.
Affiliation(s)
- Queenie W T Chan
- UBC Centre for Proteomics, Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
|
1903
|
Qian WJ, Jacobs JM, Liu T, Camp DG, Smith RD. Advances and challenges in liquid chromatography-mass spectrometry-based proteomics profiling for clinical applications. Mol Cell Proteomics 2006; 5:1727-44. [PMID: 16887931 PMCID: PMC1781927 DOI: 10.1074/mcp.m600162-mcp200] [Citation(s) in RCA: 281] [Impact Index Per Article: 14.8] [Indexed: 01/16/2023]
Abstract
Recent advances in proteomics technologies provide tremendous opportunities for biomarker-related clinical applications; however, the distinctive characteristics of human biofluids such as the high dynamic range in protein abundances and extreme complexity of the proteomes present tremendous challenges. In this review we summarize recent advances in LC-MS-based proteomics profiling and its applications in clinical proteomics as well as discuss the major challenges associated with implementing these technologies for more effective candidate biomarker discovery. Developments in immunoaffinity depletion and various fractionation approaches in combination with substantial improvements in LC-MS platforms have enabled the plasma proteome to be profiled with considerably greater dynamic range of coverage, allowing many proteins at low ng/ml levels to be confidently identified. Despite these significant advances and efforts, major challenges associated with the dynamic range of measurements and extent of proteome coverage, confidence of peptide/protein identifications, quantitation accuracy, analysis throughput, and the robustness of present instrumentation must be addressed before a proteomics profiling platform suitable for efficient clinical applications can be routinely implemented.
Affiliation(s)
- Wei-Jun Qian
- Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352, USA
|
1904
|
McAfee KJ, Duncan DT, Assink M, Link AJ. Analyzing Proteomes and Protein Function Using Graphical Comparative Analysis of Tandem Mass Spectrometry Results. Mol Cell Proteomics 2006; 5:1497-513. [PMID: 16707483 DOI: 10.1074/mcp.t500027-mcp200] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022]
Abstract
Although generating large amounts of proteomic data using tandem mass spectrometry has become routine, there is currently no single set of comprehensive tools for the rigorous analysis of tandem mass spectrometry results given the large variety of possible experimental aims. Currently available applications are typically designed for displaying proteins and posttranslational modifications from the point of view of the mass spectrometrist and are not versatile enough to allow investigators to develop biological models of protein function, protein structure, or cell state. In addition, storage and dissemination of mass spectrometry-based proteomic data are problems facing the scientific community. To address these issues, we have developed a relational database model that efficiently stores and manages large amounts of tandem mass spectrometry results. We have developed an integrated suite of multifunctional analysis software for interpreting, comparing, and displaying these results. Our system, Bioinformatic Graphical Comparative Analysis Tools (BIGCAT), allows sophisticated analysis of tandem mass spectrometry results in a biologically intuitive format and provides a solution to many data storage and dissemination issues.
Affiliation(s)
- K Jill McAfee
- Department of Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232-2363, USA
|
1905
|
MacLean B, Eng JK, Beavis RC, McIntosh M. General framework for developing and evaluating database scoring algorithms using the TANDEM search engine. Bioinformatics 2006; 22:2830-2. [PMID: 16877754 DOI: 10.1093/bioinformatics/btl379] [Citation(s) in RCA: 182] [Impact Index Per Article: 9.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on developing improved database scoring methods, we modified the software to allow users to redefine the scoring function and replace the native TANDEM scoring function while leaving the remaining core application intact. Redefinition is performed at run time so multiple scoring functions are available to be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and also provide implementations of two TANDEM compatible scoring functions, one previously described scoring function compatible with PeptideProphet and one very simple scoring function that quantitative researchers may use to begin their development. This extension builds on the open-source TANDEM project and will facilitate research into and dissemination of novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++.
AVAILABILITY: Source code for the scoring functions is available from http://proteomics.fhcrc.org
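The idea of selecting a scoring function at run time can be illustrated with a small registry. This is only a sketch of the general pattern in Python, not TANDEM's actual C++ interface: the names `SCORERS`, `register`, `score`, and both toy scoring functions are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, Sequence

# Hypothetical signature: (observed m/z, theoretical m/z, tolerance) -> score
ScoreFn = Callable[[Sequence[float], Sequence[float], float], float]

# Registry mapping a scorer name to its implementation.
SCORERS: Dict[str, ScoreFn] = {}


def register(name: str):
    """Decorator that adds a scoring function to the registry under `name`."""
    def wrap(fn: ScoreFn) -> ScoreFn:
        SCORERS[name] = fn
        return fn
    return wrap


def matched_peaks(observed, theoretical, tol):
    """Count theoretical fragments that have an observed peak within `tol`."""
    return sum(1 for t in theoretical if any(abs(o - t) <= tol for o in observed))


@register("count")
def count_score(observed, theoretical, tol):
    """Toy score: number of matched theoretical fragments."""
    return float(matched_peaks(observed, theoretical, tol))


@register("fraction")
def fraction_score(observed, theoretical, tol):
    """Toy score: fraction of theoretical fragments that were matched."""
    if not theoretical:
        return 0.0
    return matched_peaks(observed, theoretical, tol) / len(theoretical)


def score(name, observed, theoretical, tol=0.5):
    """Look up the scorer by name at run time and apply it."""
    return SCORERS[name](observed, theoretical, tol)
```

Because the rest of the search code only calls `score(name, ...)`, a new scoring function can be added by registering it, without touching the surrounding logic. That is the property the abstract describes for the modified search engine.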
|
1906
|
Binns D, Januszewski T, Chen Y, Hill J, Markin VS, Zhao Y, Gilpin C, Chapman KD, Anderson RGW, Goodman JM. An intimate collaboration between peroxisomes and lipid bodies. J Cell Biol 2006; 173:719-31. [PMID: 16735577 PMCID: PMC2063889 DOI: 10.1083/jcb.200511125] [Citation(s) in RCA: 294] [Impact Index Per Article: 15.5] [Indexed: 11/22/2022]
Abstract
Although peroxisomes oxidize lipids, the metabolism of lipid bodies and peroxisomes is thought to be largely uncoupled from one another. In this study, using oleic acid-cultured Saccharomyces cerevisiae as a model system, we provide evidence that lipid bodies and peroxisomes have a close physiological relationship. Peroxisomes adhere stably to lipid bodies, and they can even extend processes into lipid body cores. Biochemical experiments and proteomic analysis of the purified lipid bodies suggest that these processes are limited to enzymes of fatty acid beta oxidation. Peroxisomes that are unable to oxidize fatty acids promote novel structures within lipid bodies ("gnarls"), which may be organized arrays of accumulated free fatty acids. However, gnarls are suppressed, and fatty acids are not accumulated in the absence of peroxisomal membranes. Our results suggest that the extensive physical contact between peroxisomes and lipid bodies promotes the coupling of lipolysis within lipid bodies with peroxisomal fatty acid oxidation.
Affiliation(s)
- Derk Binns
- Department of Pharmacology, University of Texas Southwestern Medical School, Dallas, TX 75390, USA
|
1907
|
Russeth KP, Higgins L, Andrews MT. Identification of proteins from non-model organisms using mass spectrometry: application to a hibernating mammal. J Proteome Res 2006; 5:829-39. [PMID: 16602690 DOI: 10.1021/pr050306a] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
A major challenge in the life sciences is the extraction of detailed molecular information from plants and animals that are not among the handful of exhaustively studied "model organisms." As a consequence, certain species with novel phenotypes are often ignored due to the lack of searchable databases, tractable genetics, stock centers, and more recently, a sequenced genome. Characterization of phenotype at the molecular level commonly relies on the identification of differentially expressed proteins by combining database searching with tandem mass spectrometry (MS) of peptides derived from protein fragmentation. However, the identification of short peptides from nonmodel organisms can be hampered by the lack of sufficient amino acid sequence homology with proteins in existing databases; therefore, a database search strategy that encompasses both identity and homology can provide stronger evidence than a single search alone. The use of multiple algorithms for database searches may also increase the probability of correct protein identification since it is unlikely that each program would produce false negative or positive hits for the same peptides. In this study, four software packages, Mascot, Pro ID, Sequest, and Pro BLAST, were compared in their ability to identify proteins from the thirteen-lined ground squirrel (Spermophilus tridecemlineatus), a hibernating mammal that lacks a completely sequenced genome. Our results show similarities as well as the degree of variability among different software packages when the identical protein database is searched. In the process of this study, we identified the up-regulation of succinyl CoA-transferase (SCOT) in the heart of hibernators. SCOT is the rate-limiting enzyme in the catabolism of ketone bodies, an important alternative fuel source during hibernation.
Affiliation(s)
- Kevin P Russeth
- Department of Biology, University of Minnesota Duluth, 1035 Kirby Drive, 55812, USA
|
1908
|
Kiebel GR, Auberry KJ, Jaitly N, Clark DA, Monroe ME, Peterson ES, Tolić N, Anderson GA, Smith RD. PRISM: a data management system for high-throughput proteomics. Proteomics 2006; 6:1783-90. [PMID: 16470653 DOI: 10.1002/pmic.200500500] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.1] [Indexed: 11/09/2022]
Abstract
Advanced proteomic research efforts involving areas such as systems biology or biomarker discovery are enabled by the use of high level informatics tools that allow the effective analysis of large quantities of differing types of data originating from various studies. Performing such analyses on a large scale is not feasible without a computational platform that performs data processing and management tasks. Such a platform must be able to provide high-throughput operation while having sufficient flexibility to accommodate evolving data analysis tools and methodologies. The Proteomics Research Information Storage and Management system (PRISM) provides a platform that serves the needs of the accurate mass and time tag approach developed at Pacific Northwest National Laboratory. PRISM incorporates a diverse set of analysis tools and allows a wide range of operations to be incorporated by using a state machine that is accessible to independent, distributed computational nodes. The system has scaled well as data volume has increased over several years, while allowing adaptability for incorporating new and improved data analysis tools for more effective proteomics research.
Affiliation(s)
- Gary R Kiebel
- Pacific Northwest National Laboratory, Richland, WA 99352, USA
|
1909
|
Kolker E, Higdon R, Hogan JM. Protein identification and expression analysis using mass spectrometry. Trends Microbiol 2006; 14:229-35. [PMID: 16603360 DOI: 10.1016/j.tim.2006.03.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Received: 12/19/2005] [Revised: 03/02/2006] [Accepted: 03/22/2006] [Indexed: 11/28/2022]
Abstract
The identification and quantification of the proteins that a whole organism expresses under certain conditions is a main focus of high-throughput proteomics. Advanced proteomics approaches generate new biologically relevant data and potent hypotheses. A practical report of what proteome studies can and cannot accomplish in common laboratory settings is presented here. The review discusses the most popular tandem mass-spectrometry-based methods and focuses on how to produce reliable results. A step-by-step description of proteome experiments is given, including sample preparation, digestion, labeling, liquid chromatography, data processing, database searching and statistical analysis. The difficulties and bottlenecks of proteome analysis are addressed and the requirements for further improvements are discussed. Several diverse high-throughput proteomics-based studies of microorganisms are described.
Affiliation(s)
- Eugene Kolker
- The BIATECH Institute, 19310 North Creek Parkway, Suite 115, Bothell, WA 98011, USA.
|
1910
|
Fermin D, Allen BB, Blackwell TW, Menon R, Adamski M, Xu Y, Ulintz P, Omenn GS, States DJ. Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics. Genome Biol 2006; 7:R35. [PMID: 16646984 PMCID: PMC1557991 DOI: 10.1186/gb-2006-7-4-r35] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.0] [Received: 01/05/2006] [Revised: 02/22/2006] [Accepted: 03/27/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database.
RESULTS: Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene.
CONCLUSION: This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures.
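A six-frame translation, as used to build the search database described above, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard genetic code, writes unknown codons as `X`, and treats an "open reading frame" simply as a stop-to-stop stretch of the translated frame. The function names are hypothetical and the authors' pipeline may have differed in all of these respects.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame):
    """Translate one forward frame (0, 1 or 2); unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames


def open_reading_frames(protein, min_len=2):
    """Split a translated frame at stop codons ('*') and keep longer pieces.

    ASSUMPTION: stop-to-stop segments stand in for open reading frames here.
    """
    return [seg for seg in protein.split("*") if len(seg) >= min_len]
```

Applied genome-wide, the six translated frames yield a database far larger than an annotated protein set, which is why the abstract stresses careful significance testing.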
Affiliation(s)
- Damian Fermin
- Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA
- Baxter B Allen
- Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA
- Thomas W Blackwell
- Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA
- Rajasree Menon
- Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA
- Marcin Adamski
- Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA
- Yin Xu
- Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA
- Peter Ulintz
- Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA
- Gilbert S Omenn
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- David J States
- Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|
1911
|
Nesvizhskii AI, Roos FF, Grossmann J, Vogelzang M, Eddes JS, Gruissem W, Baginsky S, Aebersold R. Dynamic Spectrum Quality Assessment and Iterative Computational Analysis of Shotgun Proteomic Data. Mol Cell Proteomics 2006; 5:652-70. [PMID: 16352522 DOI: 10.1074/mcp.m500319-mcp200] [Citation(s) in RCA: 139] [Impact Index Per Article: 7.3] [Indexed: 02/02/2023]
Abstract
In mass spectrometry-based proteomics, frequently hundreds of thousands of MS/MS spectra are collected in a single experiment. Of these, a relatively small fraction is confidently assigned to peptide sequences, whereas the majority of the spectra are not further analyzed. Spectra are not assigned to peptides for diverse reasons. These include deficiencies of the scoring schemes implemented in the database search tools, sequence variations (e.g. single nucleotide polymorphisms) or omissions in the database searched, post-translational or chemical modifications of the peptide analyzed, or the observation of sequences that are not anticipated from the genomic sequence (e.g. splice forms, somatic rearrangement, and processed proteins). To increase the amount of information that can be extracted from proteomic MS/MS datasets we developed a robust method that detects high quality spectra within the fraction of spectra unassigned by conventional sequence database searching and computes a quality score for each spectrum. We also demonstrate that iterative search strategies applied to such detected unassigned high quality spectra significantly increase the number of spectra that can be assigned from datasets and that biologically interesting new insights can be gained from existing data.
|
1912
|
Sniatynski MJ, Rogalski JC, Hoffman MD, Kast J. Correlation and Convolution Analysis of Peptide Mass Spectra. Anal Chem 2006; 78:2600-7. [PMID: 16615769 DOI: 10.1021/ac051639u] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
As proteomics continues to establish itself as an effective postgenomic research tool, there is an increasingly urgent need for efficient, automated analysis techniques capable of effectively dealing with the vast amounts of data generated via mass spectrometry. Wholesale analysis packages, often used to deal with these enormous amounts of data, may benefit from supplementary, targeted analyses as current research begins to emphasize posttranscriptional/translational protein modifications, protein truncations, and poorly characterized mutations. We demonstrate the application of a new analysis technique based on mathematical correlation that is computationally efficient and robust against different instruments, noise levels, and experimental conditions. We have previously shown that this technique is able to extract pertinent mass shift signals from MS data, corresponding to the neutral loss of a modification from a peptide, e.g., a loss of 79.97 Th from phosphorylated tyrosine. Here we show that an extension of this method is applicable to MS and MS/MS data in general, allowing visualization of ions that produce a particular mass shift signal, be it from differential stable isotope labeling, overlap of fragment ions in a series, or ions that produce a neutral loss. The application of this method allows the researcher to discover individual features, such as the presence of specific modified or isotopically labeled peptides, to eliminate overlapping fragment ion series, and to localize specific sites of modification.
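The general idea of looking for a specific mass shift by correlating a spectrum with a shifted copy of itself can be sketched on a peak list. This is a simplified stand-in, not the authors' method: the pairwise intensity-product score, the tolerance value, and the function name `shift_correlation` are all assumptions made for illustration, and the peaks in the usage note are invented.

```python
def shift_correlation(peaks, shift, tol=0.02):
    """Score a spectrum for peak pairs separated by `shift` (within `tol`).

    `peaks` is a list of (m/z, intensity) tuples. The score is the sum of
    intensity products over all matching pairs, which is the value a discrete
    autocorrelation of the spectrum would take at that lag. The matching
    pairs are returned too, so the contributing ions can be inspected.
    """
    ordered = sorted(peaks)
    total = 0.0
    pairs = []
    for i, (mz_low, int_low) in enumerate(ordered):
        for mz_high, int_high in ordered[i + 1:]:
            diff = mz_high - mz_low
            if diff > shift + tol:
                break  # later peaks are even further away
            if abs(diff - shift) <= tol:
                total += int_low * int_high
                pairs.append((mz_low, mz_high))
    return total, pairs
```

With toy peaks such as `[(500.00, 10.0), (579.97, 5.0), (620.00, 2.0)]` and the 79.97 shift mentioned in the abstract, only the first two peaks form a matching pair. Returning the pairs alongside the score reflects the abstract's emphasis on visualizing which ions produce a given shift signal.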
Affiliation(s)
- Matthew J Sniatynski
- Biomedical Research Centre, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
|
1913
|
Rauch A, Bellew M, Eng J, Fitzgibbon M, Holzman T, Hussey P, Igra M, Maclean B, Lin CW, Detter A, Fang R, Faca V, Gafken P, Zhang H, Whiteaker J, Whitaker J, States D, Hanash S, Paulovich A, McIntosh MW. Computational Proteomics Analysis System (CPAS): an extensible, open-source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J Proteome Res 2006; 5:112-21. [PMID: 16396501 DOI: 10.1021/pr0503533] [Citation(s) in RCA: 157] [Impact Index Per Article: 8.3] [Indexed: 11/28/2022]
Abstract
The open-source Computational Proteomics Analysis System (CPAS) contains an entire data analysis and management pipeline for Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS) proteomics, including experiment annotation, protein database searching and sequence management, and mining LC-MS/MS peptide and protein identifications. CPAS architecture and features, such as a general experiment annotation component, installation software, and data security management, make it useful for collaborative projects across geographical locations and for proteomics laboratories without substantial computational support.
Affiliation(s)
- Adam Rauch
- Fred Hutchinson Cancer Research Center, Seattle, Washington, LabKey Software, Seattle, Washington 98109-1024, USA
|
1914
|
Higdon R, Hogan JM, Van Belle G, Kolker E. Randomized sequence databases for tandem mass spectrometry peptide and protein identification. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2006; 9:364-79. [PMID: 16402894 DOI: 10.1089/omi.2005.9.364] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.7] [Indexed: 11/12/2022]
Abstract
Tandem mass spectrometry (MS/MS) combined with database searching is currently the most widely used method for high-throughput peptide and protein identification. Many different algorithms, scoring criteria, and statistical models have been used to identify peptides and proteins in complex biological samples, and many studies, including our own, describe the accuracy of these identifications, using at best generic terms such as "high confidence." False positive identification rates for these criteria can vary substantially with changing organisms under study, growth conditions, sequence databases, experimental protocols, and instrumentation; therefore, study-specific methods are needed to estimate the accuracy (false positive rates) of these peptide and protein identifications. We present and evaluate methods for estimating false positive identification rates based on searches of randomized databases (reversed and reshuffled). We examine the use of separate searches of a forward then a randomized database and combined searches of a randomized database appended to a forward sequence database. Estimated error rates from randomized database searches are first compared against actual error rates from MS/MS runs of known protein standards. These methods are then applied to biological samples of the model microorganism Shewanella oneidensis strain MR-1. Based on the results obtained in this study, we recommend the use of combined searches of a reshuffled database appended to a forward sequence database as a means of providing quantitative estimates of false positive identification rates of peptides and proteins. This will allow researchers to set criteria and thresholds to achieve a desired error rate and provide the scientific community with direct and quantifiable measures of peptide and protein identification accuracy as opposed to vague assessments such as "high confidence."
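The randomized-database approach can be sketched in a few functions: build reversed or reshuffled decoy sequences, append them to the forward database, and estimate the false positive rate from how many accepted hits are decoys. This is a minimal sketch under stated assumptions: the `REV_`/`SHUF_` name prefixes, the function names, and the estimator (twice the decoy count divided by all accepted hits, a common convention for combined searches) are not taken from the abstract, and the hits in the usage note are invented.

```python
import random


def reversed_database(proteins):
    """Build decoys by reversing each sequence; names get a REV_ prefix."""
    return {f"REV_{name}": seq[::-1] for name, seq in proteins.items()}


def reshuffled_database(proteins, seed=0):
    """Build decoys by shuffling the residues of each sequence."""
    rng = random.Random(seed)  # fixed seed keeps the decoy set reproducible
    decoys = {}
    for name, seq in proteins.items():
        residues = list(seq)
        rng.shuffle(residues)
        decoys[f"SHUF_{name}"] = "".join(residues)
    return decoys


def estimated_false_positive_rate(hits, threshold,
                                  decoy_prefixes=("REV_", "SHUF_")):
    """Estimate the error rate among hits scoring at or above `threshold`.

    `hits` is a list of (protein_name, score) from a search of the forward
    database with decoys appended. ASSUMPTION: each decoy hit implies roughly
    one equally wrong forward hit, hence the factor of two.
    """
    accepted = [name for name, score in hits if score >= threshold]
    if not accepted:
        return 0.0
    decoy_hits = sum(name.startswith(decoy_prefixes) for name in accepted)
    return min(1.0, 2.0 * decoy_hits / len(accepted))
```

For instance, with hits `[("P1", 50), ("P2", 40), ("REV_P3", 35), ("P4", 30), ("SHUF_P5", 10)]` and a threshold of 30, one of the four accepted hits is a decoy. Raising the threshold until the estimate falls below a target value is one way to set the study-specific criteria the abstract calls for.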
Affiliation(s)
- Roger Higdon
- The BIATECH Institute, 19310 N. Creek Parkway South, Suite 115, Bothell, WA 98011, USA
|
1915
|
Williams D, Zhu P, Bowden P, Stacey C, McDonell M, Kowalski P, Kowalski JM, Evans K, Diamandis EP, Michael Siu KW, Marshall J. Comparison of methods to examine the endogenous peptides of fetal calf serum. Clin Proteomics 2006. [DOI: 10.1385/cp:2:1:67] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022]
|
1916
|
Hernandez P, Müller M, Appel RD. Automated protein identification by tandem mass spectrometry: issues and strategies. MASS SPECTROMETRY REVIEWS 2006; 25:235-54. [PMID: 16284939 DOI: 10.1002/mas.20068] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.6] [Indexed: 05/05/2023]
Abstract
Protein identification by tandem mass spectrometry (MS/MS) is key to most proteomics projects and has been widely explored in bioinformatics research. Obtaining good and trustful identification results has important implications for biological and clinical work. Although well matured, automated software identification of proteins from MS/MS data still faces a number of obstacles due to the complexity of the proteome or procedural issues of mass spectrometry data acquisition. Expected or unexpected modifications of the peptide sequences, polymorphisms, errors in databases, missed or non-specific cleavages, unusual fragmentation patterns, and single MS/MS spectra of multiple peptides of the same m/z are so many pitfalls for identification algorithms. A lot of research work has been carried out in recent years that yielded new strategies to handle a number of these issues. Multiple MS/MS identification algorithms are now available or have been theoretically described. The difficulty resides in choosing the most adapted method for each type of spectra being identified. This review presents an overview of the state-of-the-art bioinformatics approaches to the identification of proteins by MS/MS to help the reader doing the spade work of finding the right tools among the many possibilities offered.
|
1917
|
Jones P, Côté RG, Martens L, Quinn AF, Taylor CF, Derache W, Hermjakob H, Apweiler R. PRIDE: a public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Res 2006; 34:D659-63. [PMID: 16381953 PMCID: PMC1347500 DOI: 10.1093/nar/gkj138] [Citation(s) in RCA: 219] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023] Open
Abstract
PRIDE, the ‘PRoteomics IDEntifications database’, is a database of protein and peptide identifications that have been described in the scientific literature. These identifications will typically be from specific species, tissues and sub-cellular locations, perhaps under specific disease conditions. Any post-translational modifications that have been identified on individual peptides can be described. These identifications may be annotated with supporting mass spectra. At the time of writing, PRIDE includes the full set of identifications as submitted by individual laboratories participating in the HUPO Plasma Proteome Project and a profile of the human platelet proteome submitted by the University of Ghent in Belgium. By late 2005 PRIDE is expected to contain the identifications and spectra generated by the HUPO Brain Proteome Project. Proteomics laboratories are encouraged to submit their identifications and spectra to PRIDE to support their manuscript submissions to proteomics journals. Data can be submitted in PRIDE XML format if identifications are included or mzData format if the submitter is depositing mass spectra without identifications. PRIDE is a web application, so submission, searching and data retrieval can all be performed using an internet browser. PRIDE can be searched by experiment accession number, protein accession number, literature reference and sample parameters including species, tissue, sub-cellular location and disease state. Data can be retrieved as machine-readable PRIDE or mzData XML (the latter for mass spectra without identifications), or as human-readable HTML.
Affiliation(s)
- Philip Jones
- EMBL Outstation, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
1918
|
Balmer Y, Vensel WH, Cai N, Manieri W, Schürmann P, Hurkman WJ, Buchanan BB. A complete ferredoxin/thioredoxin system regulates fundamental processes in amyloplasts. Proc Natl Acad Sci U S A 2006; 103:2988-93. [PMID: 16481623 PMCID: PMC1413819 DOI: 10.1073/pnas.0511040103] [Citation(s) in RCA: 145] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023] Open
Abstract
A growing number of processes throughout biology are regulated by redox via thiol-disulfide exchange. This mechanism is particularly widespread in plants, where almost 200 proteins have been linked to thioredoxin (Trx), a widely distributed small regulatory disulfide protein. The current study extends regulation by Trx to amyloplasts, organelles prevalent in heterotrophic plant tissues that, among other biosynthetic activities, catalyze the synthesis and storage of copious amounts of starch. Using proteomics and immunological methods, we identified the components of the ferredoxin/Trx system (ferredoxin, ferredoxin-Trx reductase, and Trx), originally described for chloroplasts, in amyloplasts isolated from wheat starchy endosperm. Ferredoxin is reduced not by light, as in chloroplasts, but by metabolically generated NADPH via ferredoxin-NADP reductase. However, once reduced, ferredoxin appears to act as established for chloroplasts, i.e., via ferredoxin-Trx reductase and a Trx (m-type). A proteomics approach in combination with affinity chromatography and a fluorescent thiol probe led to the identification of 42 potential Trx target proteins, 13 not previously recognized, including a major membrane transporter (Brittle-1 or ADP-glucose transporter). The proteins function in a range of processes in addition to starch metabolism: biosynthesis of lipids, amino acids, and nucleotides; protein folding; and several miscellaneous reactions. The results suggest a mechanism whereby light is initially recognized as a thiol signal in chloroplasts, then as a sugar during transit to the sink, where it is converted again to a thiol signal. In this way, amyloplast reactions in the grain can be coordinated with photosynthesis taking place in leaves.
Affiliation(s)
- Yves Balmer
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall, Berkeley, CA 94720
- William H. Vensel
- Western Regional Research Center, U.S. Department of Agriculture Agricultural Research Service, Albany, CA 94710
- Nick Cai
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall, Berkeley, CA 94720
- Wanda Manieri
- Laboratoire de Biochimie Végétale, Université de Neuchâtel, 2007 Neuchâtel, Switzerland
- Peter Schürmann
- Laboratoire de Biochimie Végétale, Université de Neuchâtel, 2007 Neuchâtel, Switzerland
- William J. Hurkman
- Western Regional Research Center, U.S. Department of Agriculture Agricultural Research Service, Albany, CA 94710
- Bob B. Buchanan
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall, Berkeley, CA 94720
|
1919
|
Park GW, Kwon KH, Kim JY, Lee JH, Yun SH, Kim SI, Park YM, Cho SY, Paik YK, Yoo JS. Human plasma proteome analysis by reversed sequence database search and molecular weight correlation based on a bacterial proteome analysis. Proteomics 2006; 6:1121-32. [PMID: 16429460 DOI: 10.1002/pmic.200500318] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
In shotgun proteomics, proteins can be fractionated by 1-D gel electrophoresis and digested into peptides, followed by liquid chromatography to separate the peptide mixture. Mass spectrometry generates hundreds of thousands of tandem mass spectra from these fractions, and proteins are identified by database searching. However, the search scores are usually not sufficient to distinguish the correct peptides. In this study, we propose a confident protein identification method for high-throughput analysis of the human proteome. To build a filtering protocol in database search, we chose Pseudomonas putida KT2440 as a reference because this bacterial proteome contains fewer modifications and is simpler than the human proteome. First, the P. putida KT2440 proteome was filtered by reversed sequence database search and correlated by the molecular weight in 1-D-gel band positions. The characterization protocol was then applied to determine the criteria for clustering of the human plasma proteome into three different groups. This protein filtering method, based on bacterial proteome data analysis, represents a rapid way to generate a higher-confidence protein list for the human proteome, which includes some heavily modified and cleaved proteins.
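The reversed sequence database search mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: the function names, the `REV_` prefix, and the decoys-over-targets estimator are assumptions chosen for illustration, and the abstract does not state which estimator or thresholds were used.

```python
def reversed_decoys(proteins):
    """Build decoy entries by reversing each target sequence.

    `proteins` maps a protein name to its amino acid sequence.
    The "REV_" prefix is an illustrative convention, not a standard.
    """
    return {"REV_" + name: seq[::-1] for name, seq in proteins.items()}


def estimated_false_positive_rate(hits, threshold):
    """Estimate the false-positive rate among target hits at a score cutoff.

    `hits` is a list of (score, is_decoy) pairs. The estimate used here is
    decoy hits divided by target hits at or above the threshold, which is
    one common convention (an assumption, not taken from the abstract).
    """
    targets = sum(1 for score, is_decoy in hits if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold(hits, max_rate):
    """Return the lowest observed score whose estimated rate is <= max_rate."""
    for threshold in sorted({score for score, _ in hits}):
        if estimated_false_positive_rate(hits, threshold) <= max_rate:
            return threshold
    return None
```

Searching the combined target and decoy database and then calling `lowest_threshold(hits, 0.05)` would give a score cutoff at which decoy matches make up at most roughly 5% of the accepted target matches, under the assumptions above.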
Affiliation(s)
- Gun Wook Park
- Proteomics Team, Korea Basic Science Institute, Yusung-Ku, Daejeon, Korea
|
1920
|
Falkner JA, Falkner JW, Andrews PC. ProteomeCommons.org JAF: reference information and tools for proteomics. Bioinformatics 2006; 22:632-3. [PMID: 16434446 DOI: 10.1093/bioinformatics/btk015] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
SUMMARY: Analysis of proteomics data, specifically mass spectrometry data, commonly relies on libraries of known information such as atomic masses, known stable isotopes, atomic compositions of amino acids, observed modifications of known amino acids and ion masses that directly correspond to known amino acid sequences. The Java Analysis Framework (JAF) for proteomics provides a freely usable, open-source library of Java code that abstracts all of the aforementioned data, enabling more rapid development of proteomics tools. The JAF also includes several user tools that can be run directly from a web browser. AVAILABILITY: The current version and an archive of all older versions of the Java Analysis Framework for Proteomics are freely available, including complete source code, at http://www.proteomecommons.org/current/511/.
Affiliation(s)
- J A Falkner
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48104, USA.
|
1921
|
Duncan DT, Craig R, Link AJ. Parallel tandem: a program for parallel processing of tandem mass spectra using PVM or MPI and X!Tandem. J Proteome Res 2006; 4:1842-7. [PMID: 16212440 DOI: 10.1021/pr050058i] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
A method for the rapid correlation of tandem mass spectra to a list of protein sequences in a database has been developed. The combination of the fast and accurate computational search algorithm, X!Tandem, and a Linux cluster parallel computing environment with PVM or MPI significantly reduces the time required to perform the correlation of tandem mass spectra to protein sequences in a database. A file of tandem mass spectra is divided into a specified number of files, each containing an equal number of the spectra from the larger file. These files are then searched in parallel against a protein sequence database. The results of each parallel output file are collated into one file for viewing through a web interface. Thousands of spectra can be searched in an accurate, practical, and time-effective manner. The source code for running Parallel Tandem utilizing either PVM or MPI on the Linux operating system is available from http://www.thegpm.org. This source code is made available under Artistic License from the authors.
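The split, search, and collate workflow described in this abstract can be sketched in Python as follows. This is an illustration only: it swaps in a thread pool for PVM or MPI, and `search_chunk` is a hypothetical stand-in for a real search-engine run (it does not call X!Tandem).

```python
from concurrent.futures import ThreadPoolExecutor


def split_spectra(spectra, n_parts):
    """Divide a list of spectra into n_parts chunks of near-equal size."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(spectra), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional spectrum each.
        size = base + (1 if i < extra else 0)
        chunks.append(spectra[start:start + size])
        start += size
    return chunks


def search_chunk(chunk):
    """Hypothetical placeholder for searching one chunk against a database."""
    return [(spectrum_id, "no-match") for spectrum_id in chunk]


def parallel_search(spectra, n_parts, search=search_chunk):
    """Search each chunk concurrently, then collate results in input order."""
    chunks = split_spectra(spectra, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_results = list(pool.map(search, chunks))  # map preserves chunk order
    return [hit for part in partial_results for hit in part]
```

Because each chunk is searched independently, the collation step is a simple concatenation; that independence is what makes the problem straightforward to distribute across cluster nodes.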
Affiliation(s)
- Dexter T Duncan
- Department of Microbiology and Immunology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
|
1922
|
Tabb DL, Friedman DB, Ham AJL. Verification of automated peptide identifications from proteomic tandem mass spectra. Nat Protoc 2006; 1:2213-22. [PMID: 17406459 PMCID: PMC2819013 DOI: 10.1038/nprot.2006.330] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Shotgun proteomics yields tandem mass spectra of peptides that can be identified by database search algorithms. When only a few observed peptides suggest the presence of a protein, establishing the accuracy of the peptide identifications is necessary for accepting or rejecting the protein identification. In this protocol, we describe the properties of peptide identifications that can differentiate legitimately identified peptides from spurious ones. The chemistry of fragmentation, as embodied in the 'mobile proton' and 'pathways in competition' models, informs the process of confirming or rejecting each spectral match. Examples of ion-trap and tandem time-of-flight (TOF/TOF) mass spectra illustrate these principles of fragmentation.
Affiliation(s)
- David L Tabb
- Department of Biochemistry, Vanderbilt University Medical Center, Nashville, Tennessee 37232-8340, USA.
|
1923
|
McLaughlin T, Siepen JA, Selley J, Lynch JA, Lau KW, Yin H, Gaskell SJ, Hubbard SJ. PepSeeker: a database of proteome peptide identifications for investigating fragmentation patterns. Nucleic Acids Res 2006; 34:D649-54. [PMID: 16381951 PMCID: PMC1347429 DOI: 10.1093/nar/gkj066] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2005] [Revised: 10/08/2005] [Accepted: 10/08/2005] [Indexed: 11/14/2022] Open
Abstract
Proteome science relies on bioinformatics tools to characterize proteins via their proteolytic peptides which are identified via characteristic mass spectra generated after their ions undergo fragmentation in the gas phase within the mass spectrometer. The resulting secondary ion mass spectra are compared with protein sequence databases in order to identify the amino acid sequence. Although these search tools (e.g. SEQUEST, Mascot, X!Tandem, Phenyx) are frequently successful, much is still not understood about the amino acid sequence patterns which promote/protect particular fragmentation pathways, and hence lead to the presence/absence of particular ions from different ion series. In order to advance this area, we have developed a database, PepSeeker (http://nwsr.smith.man.ac.uk/pepseeker), which captures this peptide identification and ion information from proteome experiments. The database currently contains >185,000 peptides and associated database search information. Users may query this resource to retrieve peptide, protein and spectral information based on protein or peptide information, including the amino acid sequence itself represented by regular expressions coupled with ion series information. We believe this database will be useful to proteome researchers wishing to understand gas phase peptide ion chemistry in order to improve peptide identification strategies. Questions can be addressed to j.selley@manchester.ac.uk.
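Querying peptide sequences with regular expressions, as this abstract describes, might look like the following minimal Python sketch. It is not PepSeeker's actual query interface; the function name and the example peptides are made up for illustration.

```python
import re


def find_peptides(peptides, pattern):
    """Return the peptides whose sequence matches a regular expression.

    `peptides` is a list of one-letter amino acid strings and `pattern`
    is a standard Python regular expression.
    """
    regex = re.compile(pattern)
    return [peptide for peptide in peptides if regex.search(peptide)]


# Illustrative sequences only; these are not taken from any database.
example_peptides = ["AVDPK", "LLGER", "DPSTK"]

# Peptides containing an Asp-Pro motif anywhere in the sequence.
asp_pro = find_peptides(example_peptides, r"DP")

# Peptides that start with Asp.
starts_with_asp = find_peptides(example_peptides, r"^D")
```

A pattern-based query of this kind lets a user pull out every identified peptide that shares a sequence motif and then compare which ion series were observed for them.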
Affiliation(s)
- Julian Selley
- Faculty of Life Sciences, University of Manchester, M13 9PT, UK
- Jennifer A. Lynch
- Faculty of Life Sciences, University of Manchester, M13 9PT, UK
- School of Electrical and Electronic Engineering, Faculty of Engineering and Physical Sciences, University of Manchester, M13 9PT, UK
- King Wai Lau
- School of Chemistry, University of Manchester, M13 9PT, UK
- Hujun Yin
- School of Chemistry, University of Manchester, M13 9PT, UK
- Simon J. Gaskell
- School of Electrical and Electronic Engineering, Faculty of Engineering and Physical Sciences, University of Manchester, M13 9PT, UK
|
1924
|
Bandeira N, Tsur D, Frank A, Pevzner P. A New Approach to Protein Identification. LECTURE NOTES IN COMPUTER SCIENCE 2006. [DOI: 10.1007/11732990_31] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
|
1925
|
Balmer Y, Vensel WH, DuPont FM, Buchanan BB, Hurkman WJ. Proteome of amyloplasts isolated from developing wheat endosperm presents evidence of broad metabolic capability. JOURNAL OF EXPERIMENTAL BOTANY 2006; 57:1591-602. [PMID: 16595579 DOI: 10.1093/jxb/erj156] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2023]
Abstract
In contrast to chloroplasts, our knowledge of amyloplasts, organelles that synthesize and store starch in heterotrophic plant tissues, is in a formative stage. While our understanding of what is considered their primary function, i.e. the biosynthesis and degradation of starch, has increased dramatically in recent years, relatively little is known about other biochemical processes taking place in these organelles. To help fill this gap, a proteomic analysis of amyloplasts isolated from the starchy endosperm of wheat seeds (10 d post-anthesis) has been conducted. The study has led to the identification of 289 proteins that function in a range of processes, including carbohydrate metabolism, cytoskeleton/plastid division, energetics, nitrogen and sulphur metabolism, nucleic acid-related reactions, synthesis of various building blocks, protein-related reactions, transport, signalling, stress, and a variety of other activities grouped under 'miscellaneous'. The function of 12% of the proteins was unknown. The results highlight the role of the amyloplast as a starch-storing organelle that fulfills a spectrum of biosynthetic needs of the parent tissue. When compared with a recent proteomic analysis of whole endosperm, the current study demonstrates the advantage of using isolated organelles in proteomic studies.
Affiliation(s)
- Yves Balmer
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall, Berkeley, CA 94720, USA
|
1926
|
Chen H, Wilkerson CG, Kuchar JA, Phinney BS, Howe GA. Jasmonate-inducible plant enzymes degrade essential amino acids in the herbivore midgut. Proc Natl Acad Sci U S A 2005; 102:19237-42. [PMID: 16357201 PMCID: PMC1323180 DOI: 10.1073/pnas.0509026102] [Citation(s) in RCA: 215] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2005] [Indexed: 11/18/2022] Open
Abstract
The plant hormone jasmonic acid (JA) activates host defense responses against a broad spectrum of herbivores. Although it is well established that JA controls the expression of a large set of target genes in response to tissue damage, very few gene products have been shown to play a direct role in reducing herbivore performance. To test the hypothesis that JA-inducible proteins (JIPs) thwart attack by disrupting digestive processes in the insect gut, we used an MS-based approach to identify host proteins that accumulate in the midgut of Manduca sexta larvae reared on tomato (Solanum lycopersicum) plants. We show that two JIPs, arginase and threonine deaminase (TD), act in the M. sexta midgut to catabolize the essential amino acids Arg and Thr, respectively. Transgenic plants that overexpress arginase were more resistant to M. sexta larvae, and this effect was correlated with reduced levels of midgut Arg. We present evidence indicating that the ability of TD to degrade Thr in the midgut is enhanced by herbivore-induced proteolytic removal of the enzyme's C-terminal regulatory domain, which confers negative feedback regulation by isoleucine in planta. Our results demonstrate that the JA signaling pathway strongly influences the midgut protein content of phytophagous insects and support the hypothesis that catabolism of amino acids in the insect digestive tract by host enzymes plays a role in plant protection against herbivores.
Affiliation(s)
- Hui Chen
- Department of Energy Plant Research Laboratory, Michigan Proteome Consortium, Michigan State University, East Lansing, MI 48824, USA
|
1927
|
Moulder R, Filén JJ, Salmi J, Katajamaa M, Nevalainen OS, Oresic M, Aittokallio T, Lahesmaa R, Nyman TA. A comparative evaluation of software for the analysis of liquid chromatography-tandem mass spectrometry data from isotope coded affinity tag experiments. Proteomics 2005; 5:2748-60. [PMID: 15952233 DOI: 10.1002/pmic.200401187] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
The options available for processing quantitative data from isotope coded affinity tag (ICAT) experiments have mostly been confined to software specific to the instrument of acquisition. However, recent developments with data format conversion have subsequently increased such processing opportunities. In the present study, data sets from ICAT experiments, analysed with liquid chromatography/tandem mass spectrometry (MS/MS), using an Applied Biosystems QSTAR Pulsar quadrupole-TOF mass spectrometer, were processed in triplicate using separate mass spectrometry software packages. The programs Pro ICAT, Spectrum Mill and SEQUEST with XPRESS were employed. Attention was paid to the extent of common identification and agreement of quantitative results, with additional interest in the flexibility and productivity of these programs. The comparisons were made with data from the analysis of a specifically prepared test mixture, nine proteins at a range of relative concentration ratios from 0.1 to 10 (light to heavy labelled forms), as a known control, and data selected from an ICAT study involving the measurement of cytokine induced protein expression in human lymphoblasts, as an applied example. Dissimilarities were detected in peptide identification that reflected how the associated scoring parameters favoured information from the MS/MS data sets. Accordingly, there were differences in the numbers of peptides and protein identifications, although from these it was apparent that both confirmatory and complementary information was present. In the quantitative results from the three programs, no statistically significant differences were observed.
Affiliation(s)
- Robert Moulder
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, 20521 Turku, Finland.
|
1928
|
Patwardhan AJ, Strittmatter EF, Camp DG, Smith RD, Pallavicini MG. Comparison of Normal and Breast Cancer Cell Lines Using Proteome, Genome, and Interactome Data. J Proteome Res 2005; 4:1952-60. [PMID: 16335939 DOI: 10.1021/pr0501315] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Normal and cancer cell line proteomes were profiled using high throughput mass spectrometry techniques. Application of protein-level and peptide-level sample fractionation combined with LC-MS/MS analysis enabled identification of 2235 unmodified proteins representing a broad range of functional and compartmental classes. An iterative multistep search strategy was used to identify post-translational modifications, revealing several proteins that are preferentially modified in cancer cells. Information regarding both unmodified and modified protein forms was combined with publicly available gene expression and protein-protein interaction data. The resulting integrated dataset revealed several functionally related proteins that are differentially regulated between normal and cancer cell lines.
Affiliation(s)
- Anil J Patwardhan
- UCSF Comprehensive Cancer Center, University of California-San Francisco, California 94143-0808, USA
|
1929
|
Matthiesen R, Trelle MB, Højrup P, Bunkenborg J, Jensen ON. VEMS 3.0: Algorithms and Computational Tools for Tandem Mass Spectrometry Based Identification of Post-translational Modifications in Proteins. J Proteome Res 2005; 4:2338-47. [PMID: 16335983 DOI: 10.1021/pr050264q] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Protein and peptide mass analysis and amino acid sequencing by mass spectrometry is widely used for identification and annotation of post-translational modifications (PTMs) in proteins. Modification-specific mass increments, neutral losses or diagnostic fragment ions in peptide mass spectra provide direct evidence for the presence of post-translational modifications, such as phosphorylation, acetylation, methylation or glycosylation. However, the commonly used database search engines are not always practical for exhaustive searches for multiple modifications and concomitant missed proteolytic cleavage sites in large-scale proteomic datasets, since the search space is dramatically expanded. We present a formal definition of the problem of searching databases with tandem mass spectra of peptides that are partially (sub-stoichiometrically) modified. In addition, an improved search algorithm and peptide scoring scheme that includes modification specific ion information from MS/MS spectra was implemented and tested using the Virtual Expert Mass Spectrometrist (VEMS) software. A set of 2825 peptide MS/MS spectra were searched with 16 variable modifications and 6 missed cleavages. The scoring scheme returned a large set of post-translationally modified peptides including precise information on modification type and position. The scoring scheme was able to extract and distinguish the near-isobaric modifications of trimethylation and acetylation of lysine residues based on the presence and absence of diagnostic neutral losses and immonium ions. In addition, the VEMS software contains a range of new features for analysis of mass spectrometry data obtained in large-scale proteomic experiments. Windows binaries are available at http://www.yass.sdu.dk/.
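The expansion of the search space with variable modifications, noted in this abstract, can be illustrated with a short Python sketch. It makes a simplifying assumption that is not from the abstract: each modifiable residue independently carries either no modification or one of the variable modifications listed for it. The modification counts in the example are illustrative and are not VEMS parameters.

```python
def count_modified_forms(peptide, variable_mods):
    """Count the candidate modification states of one peptide.

    `variable_mods` maps a residue letter to the number of alternative
    variable modifications considered for it. Each site contributes a
    factor of (1 + number of modifications), the 1 being "unmodified".
    """
    total = 1
    for residue in peptide:
        total *= 1 + variable_mods.get(residue, 0)
    return total


# Illustrative setup: one modification each on S, T and Y, and two on K.
example_mods = {"S": 1, "T": 1, "Y": 1, "K": 2}
forms = count_modified_forms("SKTY", example_mods)  # 2 * 3 * 2 * 2 = 24
```

Even this four-residue example yields 24 candidate forms instead of one, which shows why exhaustive searches with many variable modifications and missed cleavages become expensive.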
Affiliation(s)
- Rune Matthiesen
- Department of Biochemistry & Molecular Biology, University of Southern Denmark, Odense, Denmark.
|
1930
|
Ulintz PJ, Zhu J, Qin ZS, Andrews PC. Improved classification of mass spectrometry database search results using newer machine learning approaches. Mol Cell Proteomics 2005; 5:497-509. [PMID: 16321970 DOI: 10.1074/mcp.m500233-mcp200] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
Manual analysis of mass spectrometry data is a current bottleneck in high throughput proteomics. In particular, the need to manually validate the results of mass spectrometry database searching algorithms can be prohibitively time-consuming. Development of software tools that attempt to quantify the confidence in the assignment of a protein or peptide identity to a mass spectrum is an area of active interest. We sought to extend work in this area by investigating the potential of recent machine learning algorithms to improve the accuracy of these approaches and as a flexible framework for accommodating new data features. Specifically we demonstrated the ability of boosting and random forest approaches to improve the discrimination of true hits from false positive identifications in the results of mass spectrometry database search engines compared with thresholding and other machine learning approaches. We accommodated additional attributes obtainable from database search results, including a factor addressing proton mobility. Performance was evaluated using publicly available electrospray data and a new collection of MALDI data generated from purified human reference proteins.
Affiliation(s)
- Peter J Ulintz
- National Resource for Proteomics and Pathways, School of Public Health, University of Michigan, Ann Arbor, Michigan 48109, USA.
|
1931
|
Kapp EA, Schütz F, Connolly LM, Chakel JA, Meza JE, Miller CA, Fenyo D, Eng JK, Adkins JN, Omenn GS, Simpson RJ. An evaluation, comparison, and accurate benchmarking of several publicly available MS/MS search algorithms: sensitivity and specificity analysis. Proteomics 2005; 5:3475-90. [PMID: 16047398 DOI: 10.1002/pmic.200500126] [Citation(s) in RCA: 255] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
MS/MS and associated database search algorithms are essential proteomic tools for identifying peptides. Due to their widespread use, it is now time to perform a systematic analysis of the various algorithms currently in use. Using blood specimens used in the HUPO Plasma Proteome Project, we have evaluated five search algorithms with respect to their sensitivity and specificity, and have also accurately benchmarked them based on specified false-positive (FP) rates. Spectrum Mill and SEQUEST performed well in terms of sensitivity, but were inferior to MASCOT, X!Tandem, and Sonar in terms of specificity. Overall, MASCOT, a probabilistic search algorithm, correctly identified most peptides based on a specified FP rate. The rescoring algorithm, PeptideProphet, enhanced the overall performance of the SEQUEST algorithm and provided predictable FP error rates. Ideally, score thresholds should be calculated for each peptide spectrum or minimally, derived from a reversed-sequence search as demonstrated in this study based on a validated data set. The availability of open-source search algorithms, such as X!Tandem, makes it feasible to further improve the validation process (manual or automatic) on the basis of "consensus scoring", i.e., the use of multiple (at least two) search algorithms to reduce the number of FPs.
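The "consensus scoring" idea in this abstract, accepting an identification only when at least two search algorithms agree, can be sketched in Python as follows. The data layout, function name, and engine labels are assumptions made for illustration; the abstract does not specify how agreement was computed.

```python
from collections import Counter


def consensus_identifications(results_by_engine, min_engines=2):
    """Keep peptide assignments reported by at least `min_engines` engines.

    `results_by_engine` maps an engine label to a dict of
    spectrum_id -> peptide sequence. With more than three engines, two
    different peptides could both reach the cutoff for one spectrum;
    this sketch then keeps whichever is encountered last.
    """
    votes = Counter()
    for assignments in results_by_engine.values():
        for spectrum_id, peptide in assignments.items():
            votes[(spectrum_id, peptide)] += 1
    return {
        spectrum_id: peptide
        for (spectrum_id, peptide), count in votes.items()
        if count >= min_engines
    }


# Hypothetical engine labels and made-up assignments.
example_results = {
    "engine_a": {"scan1": "AVDPK", "scan2": "LLGER"},
    "engine_b": {"scan1": "AVDPK", "scan2": "LIGER"},
    "engine_c": {"scan2": "LLGER", "scan3": "DPSTK"},
}
agreed = consensus_identifications(example_results)
```

Here `scan1` and `scan2` are retained because two engines agree on the same peptide, while `scan3` is dropped because only one engine reported it.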
Affiliation(s)
- Eugene A Kapp
- Joint ProteomicS Laboratory, Ludwig Institute for Cancer Research (Melbourne Branch)/Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
|
1932
|
Abstract
The shotgun proteomic strategy based on digesting proteins into peptides and sequencing them using tandem mass spectrometry and automated database searching has become the method of choice for identifying proteins in most large scale studies. However, the peptide-centric nature of shotgun proteomics complicates the analysis and biological interpretation of the data especially in the case of higher eukaryote organisms. The same peptide sequence can be present in multiple different proteins or protein isoforms. Such shared peptides therefore can lead to ambiguities in determining the identities of sample proteins. In this article we illustrate the difficulties of interpreting shotgun proteomic data and discuss the need for common nomenclature and transparent informatic approaches. We also discuss related issues such as the state of protein sequence databases and their role in shotgun proteomic analysis, interpretation of relative peptide quantification data in the presence of multiple protein isoforms, the integration of proteomic and transcriptional data, and the development of a computational infrastructure for the integration of multiple diverse datasets.
|
1933
|
Kalume DE, Peri S, Reddy R, Zhong J, Okulate M, Kumar N, Pandey A. Genome annotation of Anopheles gambiae using mass spectrometry-derived data. BMC Genomics 2005; 6:128. [PMID: 16171517 PMCID: PMC1249570 DOI: 10.1186/1471-2164-6-128] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2005] [Accepted: 09/19/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A large number of animal and plant genomes have been completely sequenced over the last decade and are now publicly available. Although genomes can be rapidly sequenced, identifying protein-coding genes still remains a problematic task. Availability of protein sequence data allows direct confirmation of protein-coding genes. Mass spectrometry has recently emerged as a powerful tool for proteomic studies. Protein identification using mass spectrometry is usually carried out by searching against databases of known proteins or transcripts. This approach generally does not allow identification of proteins that have not yet been predicted or whose transcripts have not been identified. RESULTS We searched 3,967 mass spectra from 16 LC-MS/MS runs of Anopheles gambiae salivary gland homogenates against the Anopheles gambiae genome database. This allowed us to validate 23 known transcripts and 50 novel transcripts. In addition, a novel gene was identified on the basis of peptides that matched a genomic region where no gene was known and no transcript had been predicted. The amino termini of proteins encoded by two predicted transcripts were confirmed based on N-terminally acetylated peptides sequenced by tandem mass spectrometry. Finally, six sequence polymorphisms could be annotated based on experimentally obtained peptide sequences. CONCLUSION The peptide sequences from this study were mapped onto the genomic sequence using the distributed annotation system available at Ensembl and can be visualized in the context of all other existing annotations. The strategy described in this paper can be used to correct and confirm genome annotations and permit discovery of novel proteins in a high-throughput manner by mass spectrometry.
Affiliation(s)
- Dário E Kalume
- McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry and Oncology, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Suraj Peri
- McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry and Oncology, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, DK-5230, Denmark
- Raghunath Reddy
- McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry and Oncology, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Institute of Bioinformatics, Discoverer Unit 1, 7th Floor International Tech Park Ltd., Whitefield Road, Bangalore – 560 066, India
- Jun Zhong
- McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry and Oncology, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Mobolaji Okulate
- Department of Molecular Microbiology and Immunology, Johns Hopkins Malaria Research Institute, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Nirbhay Kumar
- Department of Molecular Microbiology and Immunology, Johns Hopkins Malaria Research Institute, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Akhilesh Pandey
- McKusick-Nathans Institute of Genetic Medicine and Department of Biological Chemistry and Oncology, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
|
1934
|
Alves G, Yu YK. Robust accurate identification of peptides (RAId): deciphering MS2 data using a structured library search with de novo based statistics. Bioinformatics 2005; 21:3726-32. [PMID: 16105903 DOI: 10.1093/bioinformatics/bti620] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The key to MS-based proteomics is peptide sequencing. The major challenge in peptide sequencing, whether library search or de novo, is to better infer statistical significance and better attain noise reduction. Since the noise in a spectrum depends on experimental conditions, the instrument used and many other factors, it cannot be predicted even if the peptide sequence is known. The characteristics of the noise can only be uncovered once a spectrum is given. We wish to overcome such issues. RESULTS We designed RAId to identify peptides from their associated tandem mass spectrometry data. RAId performs a novel de novo sequencing followed by a search in a peptide library that we created. Through de novo sequencing, we establish the spectrum-specific background score statistics for the library search. When the database search fails to return significant hits, the top-ranking de novo sequences become potential candidates for new peptides that are not yet in the database. The use of spectrum-specific background statistics seems to enable RAId to perform well even when the spectral quality is marginal. Other important features of RAId include its potential in de novo sequencing alone and the ease of incorporating post-translational modifications.
Affiliation(s)
- Gelio Alves
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health Bethesda, MD 20894, USA
|
1935
|
Keller A, Eng J, Zhang N, Li XJ, Aebersold R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 2005; 1:2005.0017. [PMID: 16729052 PMCID: PMC1681455 DOI: 10.1038/msb4100024] [Citation(s) in RCA: 567] [Impact Index Per Article: 28.4] [Received: 03/29/2005] [Accepted: 06/21/2005] [Indexed: 11/19/2022] Open
Abstract
The analysis of tandem mass (MS/MS) data to identify and quantify proteins is hampered by the heterogeneity of file formats at the raw spectral data, peptide identification, and protein identification levels. Different mass spectrometers output their raw spectral data in a variety of proprietary formats, and alternative methods that assign peptides to MS/MS spectra and infer protein identifications from those peptide assignments each write their results in different formats. Here we describe an MS/MS analysis platform, the Trans-Proteomic Pipeline, which makes use of open XML file formats for storage of data at the raw spectral data, peptide, and protein levels. This platform enables uniform analysis and exchange of MS/MS data generated from a variety of different instruments, and assigned peptides using a variety of different database search programs. We demonstrate this by applying the pipeline to data sets generated by ThermoFinnigan LCQ, ABI 4700 MALDI-TOF/TOF, and Waters Q-TOF instruments, and searched in turn using SEQUEST, Mascot, and COMET.
|
1936
|
Rudnick PA, Wang Y, Evans E, Lee CS, Balgley BM. Large Scale Analysis of MASCOT Results Using a Mass Accuracy-Based THreshold (MATH) Effectively Improves Data Interpretation. J Proteome Res 2005; 4:1353-60. [PMID: 16083287 DOI: 10.1021/pr0500509] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
In this report, we take a heuristic approach to studying the effects of mass tolerance settings and database size on the sensitivity and specificity of MASCOT. We also examine the efficacy of the MASCOT Identity Threshold as a discriminator when applied to QqTOF data with an average mass accuracy of 10 ppm or better. As predicted, arbitrarily large mass tolerance settings negatively affect MASCOT's specificity, and to a lesser degree, sensitivity. Increased mass tolerances also render the generation of a significance threshold less effective. To study these effects, we used Bayes' Law to calculate MASCOT's predictive values. With a relatively small search database (Human IPI), MASCOT had a mean positive predictive value of 0.993 when combined with MASCOT's Identity Threshold. However, the corresponding average negative predictive value, or the probability that an ion was not present given no score or a score below threshold, was reduced as mass tolerances were tightened, and had an average value of 0.717. This value was improved upon by extrapolating an empirical threshold using a reversed database search and a new algorithm to rapidly identify false positive identifications. Using the empirical threshold reduced false negative identifications on the average 17% while limiting the false positive rate to below 5%; even larger reductions were obtained using mass tolerances approaching two times the actual error of the experimental data. A simple application of this strategy to the analysis of a microdissected glioblastoma multiforme sample analyzed by IEF/LC-MS/MS is reported, as is a description of the tools required to implement a large scale analysis using this alternative approach.
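The two calculations named in this abstract, predictive values from Bayes' law and an empirical score threshold from a reversed-database search, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' MATH algorithm: the function names, the definition of the false positive rate as reversed hits divided by forward hits, and all input numbers are assumptions made for the example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed with Bayes' law.

    All three inputs are fractions in [0, 1]; the values used in any call
    are illustrative and are not taken from the paper.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def empirical_threshold(target_scores, decoy_scores, max_fp_rate=0.05):
    """Lowest score cutoff whose estimated false positive rate is acceptable.

    Assumption: the false positive rate at a cutoff is estimated as
    (reversed-database hits >= cutoff) / (forward-database hits >= cutoff).
    Returns None when no cutoff satisfies max_fp_rate.
    """
    for cutoff in sorted(set(target_scores)):
        n_target = sum(s >= cutoff for s in target_scores)
        n_decoy = sum(s >= cutoff for s in decoy_scores)
        if n_decoy / n_target <= max_fp_rate:
            return cutoff
    return None
```

For instance, with hypothetical forward scores [10, 20, 30, 40, 50] and reversed scores [5, 12, 15], `empirical_threshold` returns 20, the lowest cutoff at which no reversed hit survives.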
Affiliation(s)
- Paul A Rudnick
- Calibrant Biosystems, 7507 Standish Pl., Rockville, MD 20855, USA.
|
1937
|
Fagerquist CK, Miller WG, Harden LA, Bates AH, Vensel WH, Wang G, Mandrell RE. Genomic and Proteomic Identification of a DNA-Binding Protein Used in the “Fingerprinting” of Campylobacter Species and Strains by MALDI-TOF-MS Protein Biomarker Analysis. Anal Chem 2005; 77:4897-907. [PMID: 16053303 DOI: 10.1021/ac040193z] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]
Abstract
We have identified a prominent approximately 10-kDa protein biomarker observed in the matrix-assisted laser desorption/ionization time-of-flight mass spectra (MALDI-TOF-MS) of cell lysates of five thermophilic species of Campylobacter: jejuni, coli, lari, upsaliensis, and helveticus. The biomarker was unambiguously identified by genomic and proteomic sequencing as a DNA-binding protein HU. We report the amino acid sequence of HU as determined by sequencing the hup gene of four species (12 strains): C. jejuni (2), C. coli (4), C. upsaliensis (4) and C. lari (2). Confirmation of the amino acid sequence was obtained by nanoflow high-performance liquid chromatography-tandem mass spectrometry of the tryptic peptides of the extracted/digested HU protein. Protein identification was also confirmed by comparison of the molecular weight (MW) predicted from the hup gene and the MW of HU as measured by high-resolution mass spectrometry. We found the HU protein to be particularly useful as a biomarker in that it strongly ionizes by MALDI and its MW varies between species and among strains within a species. Intra- and interspecies variation of the HU MW is due to changes in the amino acid sequence of the HU protein and not due to co- or posttranslational modifications. The strong ionization efficiency of HU by MALDI is likely due, in part, to four lysine residues clustered at the carboxyl end of the protein. We also report identification of the HU protein biomarker for a C. helveticus strain, whose hup gene was not sequenced, but whose HU amino acid sequence was partially conserved in C. upsaliensis strains. We have also tentatively assigned an approximately 10.5-kDa protein biomarker of a C. concisus strain as an HU protein.
Affiliation(s)
- Clifton K Fagerquist
- Western Regional Research Service, Agricultural Research Service, U.S. Department of Agriculture, Albany, California 94710, USA.
|
1938
|
Abstract
Shotgun proteomics has emerged as a powerful approach for the analysis of complex protein mixtures, including biofluids, tissues, cells, organelles or protein complexes. Having evolved from the integration of chromatography and mass spectrometry, innovations in sample preparation, multidimensional chromatography, mass spectrometry and proteomic informatics continually facilitate, enable and challenge shotgun proteomics. As a result, shotgun proteomics continues to evolve and enable new areas of biological research, and is beginning to impact human disease diagnosis and therapeutic intervention.
Affiliation(s)
- Selene K Swanson
- Stowers Institute for Medical Research, 1000 E. 50th St., Kansas City, MO 64110, USA
|
1939
|
Higgs RE, Knierman MD, Gelfanova V, Butler JP, Hale JE. Comprehensive label-free method for the relative quantification of proteins from biological samples. J Proteome Res 2005; 4:1442-50. [PMID: 16083298 DOI: 10.1021/pr050109b] [Citation(s) in RCA: 163] [Impact Index Per Article: 8.2] [Indexed: 12/12/2022]
Abstract
Pharmaceutical companies and regulatory agencies are broadly pursuing biomarkers as a means to increase the productivity of drug development. Quantifying differential levels of proteins from complex biological samples such as plasma or cerebrospinal fluid is one specific approach being used to identify markers of drug action, efficacy, toxicity, etc. We have developed a comprehensive, fully automated, and label-free approach to relative protein quantification from LC-MS/MS experiments of proteolytic protein digests including: de-noising, mass and charge state estimation, chromatographic alignment, and peptide quantification via integration of extracted ion chromatograms. Results from a variance components study of the entire method indicate that most of the variability is attributable to the LC-MS injection, with a median peptide LC-MS injection coefficient of variation of 8% on a ThermoFinnigan LTQ mass spectrometer. Spiked recovery results suggest a quantifiable range of approximately 32-fold for a sample protein.
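Two of the quantities mentioned in this abstract, the area of an extracted ion chromatogram and a coefficient of variation across injections, can be illustrated with a minimal sketch. It assumes simple trapezoidal integration and a plain replicate CV (sample standard deviation divided by mean); the published method derives its CV from a variance components study, which is not reproduced here, and the function names are hypothetical.

```python
import statistics


def xic_area(times, intensities):
    """Trapezoidal area under an extracted ion chromatogram.

    times and intensities are equal-length sequences; times must be sorted.
    """
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (needs >= 2 values)."""
    return statistics.stdev(values) / statistics.mean(values)
```

Applied to peak areas from repeated injections of the same sample, `coefficient_of_variation` gives a fraction that can be compared with the 8% median reported above, keeping in mind that the two statistics are not computed the same way.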
Affiliation(s)
- Richard E Higgs
- Lilly Research Laboratories, MS 1533, Lilly Corporate Center, Indianapolis, IN 46285, USA.
|
1940
|
Affiliation(s)
- Jean-Philippe Lambert
- Ottawa Institute of Systems Biology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, Canada K1H 8M5
|
1941
|
Phinney BS, Thelen JJ. Proteomic Characterization of A Triton-Insoluble Fraction from Chloroplasts Defines A Novel Group of Proteins Associated with Macromolecular Structures. J Proteome Res 2005; 4:497-506. [PMID: 15822927 DOI: 10.1021/pr049791k] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
Proteomic analysis of a Triton X-100 insoluble, 30,000 x g pellet from purified pea chloroplasts resulted in the identification of 179 nonredundant proteins. This chloroplast fraction was mostly depleted of chloroplast membranes since only 23% and 9% of the identified proteins were also observed in envelope and thylakoid membranes, respectively. One of the most abundant proteins in this fraction was sulfite reductase, a dual function protein previously shown to act as a plastid DNA condensing protein. Approximately 35 other proteins known (or predicted) to be associated with high-density protein-nucleic acid particles (nucleoids) were also identified including a family of DNA gyrases, as well as proteins involved in plastid transcription and translation. Although nucleoids appeared to be the predominant component of 30k x g Triton-insoluble chloroplast preparations, multi-enzyme protein complexes were also present including each subunit to the pyruvate dehydrogenase and acetyl-CoA carboxylase multi-enzyme complexes, as well as a proposed assembly of the first three enzymes of the Calvin cycle. Approximately 18% of the proteins identified were annotated as unknown or hypothetical proteins and another 20% contained "putative" or "like" in the identifier tag. This is the first proteomic characterization of a membrane-depleted, high-density fraction from plastids and demonstrates the utility of this simple procedure to isolate intact macromolecular structures from purified organelles for analysis of protein-protein and protein-nucleic acid interactions.
Affiliation(s)
- Brett S Phinney
- Michigan State University, Proteomics and Mass Spectrometry Facility, East Lansing, Michigan 48824, USA
|
1942
|
Falkner J, Andrews P. Fast tandem mass spectra-based protein identification regardless of the number of spectra or potential modifications examined. Bioinformatics 2005; 21:2177-84. [PMID: 15746284 DOI: 10.1093/bioinformatics/bti362] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Comparing tandem mass spectra (MSMS) against a known dataset of protein sequences is a common method for identifying unknown proteins; however, the processing of MSMS by current software often limits certain applications, including comprehensive coverage of post-translational modifications, non-specific searches and real-time searches to allow result-dependent instrument control. This problem deserves attention as new mass spectrometers provide the ability for higher throughput and as known protein datasets rapidly grow in size. New software algorithms need to be devised in order to address the performance issues of conventional MSMS protein dataset-based protein identification. METHODS This paper describes a novel algorithm based on converting a collection of monoisotopic, centroided spectra to a new data structure, named 'peptide finite state machine' (PFSM), which may be used to rapidly search a known dataset of protein sequences, regardless of the number of spectra searched or the number of potential modifications examined. The algorithm is verified using a set of commercially available tryptic digest protein standards analyzed using an ABI 4700 MALDI TOFTOF mass spectrometer, and a free, open source PFSM implementation. It is illustrated that a PFSM can accurately search large collections of spectra against large datasets of protein sequences (e.g. NCBI nr) using a regular desktop PC; however, this paper only details the method for identifying peptide and subsequently protein candidates from a dataset of known protein sequences. The concept of using a PFSM as a peptide pre-screening technique for MSMS-based search engines is validated by using PFSM with Mascot and XTandem. 
AVAILABILITY Complete source code, documentation and examples for the reference PFSM implementation are freely available at the Proteome Commons, http://www.proteomecommons.org and source code may be used both commercially and non-commercially as long as the original authors are credited for their work.
Affiliation(s)
- Jayson Falkner
- Department of Biological Chemistry and Program in Bioinformatics, University of Michigan, 1301 Catherine Street, Ann Arbor, MI 48109, USA.
|
1943
|
Craig R, Cortens JP, Beavis RC. The use of proteotypic peptide libraries for protein identification. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2005; 19:1844-50. [PMID: 15945033 DOI: 10.1002/rcm.1992] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.6] [Indexed: 05/02/2023]
Abstract
This paper describes an algorithm to apply proteotypic peptide sequence libraries to protein identifications performed using tandem mass spectrometry (MS/MS). Proteotypic peptides are those peptides in a protein sequence that are most likely to be confidently observed by current MS-based proteomics methods. Libraries of proteotypic peptide sequences were compiled from the Global Proteome Machine Database for Homo sapiens and Saccharomyces cerevisiae model species proteomes. These libraries were used to scan through collections of tandem mass spectra to discover which proteins were represented by the data sets, followed by detailed analysis of the spectra with the full protein sequences corresponding to the discovered proteotypic peptides. This algorithm (Proteotypic Peptide Profiling, or P3) resulted in sequence-to-spectrum matches comparable to those obtained by conventional protein identification algorithms using only full protein sequences, with a 20-fold reduction in the time required to perform the identification calculations. The proteotypic peptide libraries, the open source code for the implementation of the search algorithm and a website for using the software have been made freely available. Approximately 4% of the residues in the H. sapiens proteome were required in the proteotypic peptide library to successfully identify proteins.
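The first pass of the two-stage idea described above (scan with a small proteotypic library, then restrict the detailed search to the proteins it flags) might be sketched as below. This is a strong simplification and not the P3 implementation: it matches library peptides to spectra by precursor mass only rather than by scoring fragment ions, and the function names, the tolerance, and the example protein identifiers are hypothetical. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum for a free peptide


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in sequence) + WATER


def shortlist_proteins(library, precursor_masses, tol=0.01):
    """First pass: proteins with at least one library peptide near an observed mass.

    library maps a protein identifier to its list of proteotypic peptides;
    precursor_masses holds neutral precursor masses observed in the run.
    Only the returned proteins would go on to a full-sequence search.
    """
    hits = set()
    for protein, peptides in library.items():
        for peptide in peptides:
            mass = peptide_mass(peptide)
            if any(abs(mass - obs) <= tol for obs in precursor_masses):
                hits.add(protein)
                break
    return hits
```

The time saving reported in the abstract comes from the second pass searching only the shortlisted proteins; this sketch shows the shortlist step alone.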
|
1944
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2448604 DOI: 10.1002/cfg.419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022] Open
|
1945
|
John Wiley & Sons, Ltd. Current literature in mass spectrometry. JOURNAL OF MASS SPECTROMETRY : JMS 2004; 39:1383-1394. [PMID: 15532071 PMCID: PMC7166839 DOI: 10.1002/jms.712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
In order to keep subscribers up‐to‐date with the latest developments in their field, John Wiley & Sons are providing a current awareness service in each issue of the journal. The bibliography contains newly published material in the field of mass spectrometry. Each bibliography is divided into 11 sections: 1 Books, Reviews & Symposia; 2 Instrumental Techniques & Methods; 3 Gas Phase Ion Chemistry; 4 Biology/Biochemistry: Amino Acids, Peptides & Proteins; Carbohydrates; Lipids; Nucleic Acids; 5 Pharmacology/Toxicology; 6 Natural Products; 7 Analysis of Organic Compounds; 8 Analysis of Inorganics/Organometallics; 9 Surface Analysis; 10 Environmental Analysis; 11 Elemental Analysis. Within each section, articles are listed in alphabetical order with respect to author (5 Weeks journals ‐ Search completed at 8th. Sept. 2004)
|