1
Haga SW, Wu HF. Overview of software options for processing, analysis and interpretation of mass spectrometric proteomic data. JOURNAL OF MASS SPECTROMETRY : JMS 2014; 49:959-969. [PMID: 25303385 DOI: 10.1002/jms.3414] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/17/2014] [Revised: 05/23/2014] [Accepted: 06/13/2014] [Indexed: 06/04/2023]
Abstract
Recently, the interests in proteomics have been intensively increased, and the proteomic methods have been widely applied to many problems in cell biology. If the age of 1990s is considered to be a decade of genomics, we can claim that the following years of the new century is a decade of proteomics. The rapid evolution of proteomics has continued through these years, with a series of innovations in separation techniques and the core technologies of two-dimensional gel electrophoresis and MS. Both technologies are fueled by automation and high throughput computation for profiling of proteins from biological systems. As Patterson ever mentioned, 'data analysis is the Achilles heel of proteomics and our ability to generate data now outstrips our ability to analyze it'. The development of automatic and high throughput technologies for rapid identification of proteins is essential for large-scale proteome projects and automatic protein identification and characterization is essential for high throughput proteomics. This review provides a snap shot of the tools and applications that are available for mass spectrometric high throughput biocomputation. The review starts with a brief introduction of proteomics and MS. Computational tools that can be employed at various stages of analysis are presented, including that for data processing, identification, quantification, and the understanding of the biological functions of individual proteins and their dynamic interactions. The challenges of computation software development and its future trends in MS-based proteomics have also been speculated.
Affiliation(s)
- Steve W Haga
- Department of Computer Science and Engineering, National Sun Yat Sen University, Kaohsiung, 804, Taiwan
2
Yadeta KA, Elmore JM, Coaker G. Advancements in the analysis of the Arabidopsis plasma membrane proteome. FRONTIERS IN PLANT SCIENCE 2013; 4:86. [PMID: 23596451 PMCID: PMC3622881 DOI: 10.3389/fpls.2013.00086] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 12/04/2012] [Accepted: 03/22/2013] [Indexed: 05/09/2023]
Abstract
The plasma membrane (PM) regulates diverse processes essential to plant growth, development, and survival in an ever-changing environment. In addition to maintaining normal cellular homeostasis and plant nutrient status, PM proteins perceive and respond to a myriad of environmental cues. Here we review recent advances in the analysis of the plant PM proteome with a focus on the model plant Arabidopsis thaliana. Due to membrane heterogeneity, hydrophobicity, and low relative abundance, analysis of the PM proteome has been a special challenge. Various experimental techniques to enrich PM proteins and different protein and peptide separation strategies have facilitated the identification of thousands of integral and membrane-associated proteins. Numerous classes of proteins are present at the PM with diverse biological functions. PM microdomains have attracted much attention. However, it still remains a challenge to characterize these cell membrane compartments. Dynamic changes in the PM proteome in response to different biotic and abiotic stimuli are highlighted. Future prospects for PM proteomics research are also discussed.
Affiliation(s)
- Koste A. Yadeta
- Department of Plant Pathology, University of California Davis, Davis, CA, USA
- J. Mitch Elmore
- Department of Plant Pathology, University of California Davis, Davis, CA, USA
- Gitta Coaker
- Department of Plant Pathology, University of California Davis, Davis, CA, USA
3
4
Helbig AO, Heck AJR, Slijper M. Exploring the membrane proteome--challenges and analytical strategies. J Proteomics 2010; 73:868-78. [PMID: 20096812 DOI: 10.1016/j.jprot.2010.01.005] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Received: 06/09/2009] [Revised: 01/08/2010] [Accepted: 01/08/2010] [Indexed: 12/22/2022]
Abstract
The analysis of proteins in biological membranes forms a major challenge in proteomics. Despite continuous improvements and the development of more sensitive analytical methods, the analysis of membrane proteins has always been hampered by their hydrophobic properties and relatively low abundance. In this review, we describe recent successful strategies that have led to in-depth analyses of the membrane proteome. To facilitate membrane proteome analysis, it is essential that biochemical enrichment procedures are combined with special analytical workflows that are all optimized to cope with hydrophobic polypeptides. These include techniques for protein solubilization, and also well-matched developments in protein separation and protein digestion procedures. Finally, we discuss approaches to target membrane-protein complexes and lipid-protein interactions, as such approaches offer unique insights into function and architecture of cellular membranes.
Affiliation(s)
- Andreas O Helbig
- Biomolecular Mass Spectrometry and Proteomics Group, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
5
Abstract
Mass spectrometry instrumentation has continued to develop rapidly in the last two decades, enabled in part by advances in microelectronic hardware controllers and computerized control and data acquisition systems. The wealth and complexity of data produced by a modern instrument is such that the data can no longer be analyzed manually. Computerized data analysis has become de rigueur and the bioinformatics field has expanded to provide software applications for all aspects of the data analysis needed by LC-MS/MS. The bioinformatics field is evolving rapidly and software applications are continually being improved or replaced for existing applications as well as developed to support new types of experiments and analysis enabled by modern instrumentation. Entire books have been written on MS data analysis in proteomics but this review will be necessarily brief. In this chapter we will review the bioinformatics software applications available for different LC-MS/MS analysis tasks.
6
Abstract
Proteomics has advanced in leaps and bounds over the past couple of decades. However, the continuing dependency of mass spectrometry-based protein identification on the searching of spectra against protein sequence databases limits many proteomics experiments. If there is no sequenced genome for a given species, then cross species proteomics is required, attempting to identify proteins across the species boundary, typically using the sequenced genome of a closely related species. Unlike sequence searching for homologues, the proteomics equivalent is confounded by small differences in amino acid sequences, leading to large differences in peptide masses; this renders mass matching of peptides and their product ions difficult. Therefore, the phylogenetic distance between the two species and the attendant level of conservation between the homologous proteins play a huge part in determining the extent of protein identification that is possible across the species boundary. In this chapter, we review the cross species challenge itself, as well as various approaches taken to deal with it and the success met with in past studies. This is followed by recommendations of best practice and suggestions to researchers facing this challenge as well as a final section predicting developments, which may help improve cross species proteomics in the future.
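The mass-matching problem described in this abstract can be illustrated with a small calculation. The following is a minimal sketch, not taken from the chapter: the two peptide sequences are hypothetical homologues differing by a single alanine-to-serine substitution, and the 20 ppm tolerance is an assumed, typical search setting. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum of a free peptide


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def matches(observed: float, theoretical: float, tol_ppm: float = 20.0) -> bool:
    """True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# Hypothetical homologous peptides: database species vs. unsequenced species.
reference = "LSAPATLNSR"
observed_species = "LSSPATLNSR"  # single A -> S substitution

shift = peptide_mass(observed_species) - peptide_mass(reference)
print(f"mass shift: {shift:.4f} Da")
print("within tolerance:", matches(peptide_mass(observed_species), peptide_mass(reference)))
```

A one-residue change that a sequence-homology search would tolerate easily moves the peptide mass by about 16 Da here, far outside a ppm-level matching window.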
Affiliation(s)
- J C Wright
- Department Veterinary Preclinical Sciences, University of Liverpool, Crown Street, Liverpool, UK
7
Hoelzle LE. Haemotrophic mycoplasmas: recent advances in Mycoplasma suis. Vet Microbiol 2008; 130:215-26. [PMID: 18358641 DOI: 10.1016/j.vetmic.2007.12.023] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Received: 07/14/2007] [Revised: 12/07/2007] [Accepted: 12/11/2007] [Indexed: 10/22/2022]
Abstract
Haemotrophic mycoplasmas (haemoplasmas) are uncultivable, small epicellular, cell wall less, tetracycline-sensitive bacteria that attach to the surface of host erythrocytes. Today, haemotrophic mycoplasmas are found in a large number of animals, with Mycoplasma suis being the porcine pathogen. Haemoplasmas can cause infections which are clinically marked, either by an overt life-threatening haemolytic anaemia or a mild chronic anaemia, by illthrift, infertility, and immune suppression. The life cycle of haemoplasmas on the surface of nucleus-less red blood cells is unique for mycoplasma and therefore, it is evident that these haemotrophic pathogens must have features that allow them to colonise and replicate on red blood cells. However, the mechanisms of adhesion and replication of M. suis on erythrocytes, for instance, as well as the significance of metabolic interchanges between the agent and the target cells, are completely unknown to date. Far from having gained clear insight into the clinical significance of the haemoplasmas, our knowledge about the physiology, genetics, and host-pathogen interaction of this novel group of bacteria within the Mollicutes order is rather limited. This can be explained primarily by the unculturability of these bacteria. The enormous advances in molecular biology witnessed in recent years have had a major impact on several areas of biological sciences, i.e. the fields of modern medical bacteriology and infectious diseases. This review describes progress made in research of the pathobiology of M. suis these past few years.
Affiliation(s)
- L E Hoelzle
- Institute of Veterinary Bacteriology, University of Zurich, Winterthurerstrasse 270, Zurich, Switzerland.
8
Abstract
Heart diseases resulting in heart failure are among the leading causes of morbidity and mortality in developed countries. Underlying molecular causes of cardiac dysfunction in most heart diseases are still largely unknown but are expected to result from causal alterations in gene and protein expression. Proteomic technology now allows us to examine global alterations in protein expression in the diseased heart and can provide new insights into cellular mechanisms involved in cardiac dysfunction. The majority of proteomic investigations still use 2D gel electrophoresis (2-DE) with immobilized pH gradients to separate the proteins in a sample and combine this with mass spectrometry (MS) technologies to identify proteins. In spite of the development of novel gel-free technologies, 2-DE remains the only technique that can be routinely applied to parallel quantitative expression profiling of large sets of complex protein mixtures such as whole cell lysates. It can resolve >5000 proteins simultaneously (approximately 2000 proteins routinely) and can detect <1 ng of protein per spot. Furthermore, 2-DE delivers a map of intact proteins, which reflects changes in protein expression level, isoforms, or post-translational modifications. The use of proteomics to investigate heart disease should result in the generation of new diagnostic and therapeutic markers. In this article, we review the current status of proteomic technologies, describing the 2-DE proteomics workflow, with an overview of protein identification by MS and how these technologies are being applied to studies of human heart disease.
9
Yang D, Ramkissoon K, Hamlett E, Giddings MC. High-accuracy peptide mass fingerprinting using peak intensity data with machine learning. J Proteome Res 2007; 7:62-9. [PMID: 17914788 DOI: 10.1021/pr070088g] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
For MALDI-TOF mass spectrometry, we show that the intensity of a peptide-ion peak is directly correlated with its sequence, with the residues M, H, P, R, and L having the most substantial effect on ionization. We developed a machine learning approach that exploits this relationship to significantly improve peptide mass fingerprint (PMF) accuracy based on training data sets from both true-positive and false-positive PMF searches. The model's cross-validated accuracy in distinguishing real versus false-positive database search results is 91%, rivaling the accuracy of MS/MS-based protein identification.
Affiliation(s)
- Dongmei Yang
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
10
Sriyam S, Sinchaikul S, Tantipaiboonwong P, Tzao C, Phutrakul S, Chen ST. Enhanced detectability in proteome studies. J Chromatogr B Analyt Technol Biomed Life Sci 2006; 849:91-104. [PMID: 17140866 DOI: 10.1016/j.jchromb.2006.10.065] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 06/07/2006] [Revised: 10/11/2006] [Accepted: 10/27/2006] [Indexed: 11/30/2022]
Abstract
The discovery of candidate biomarkers from biological materials coupled with the development of detection methods holds both incredible clinical potential as well as significant challenges. However, the proteomic techniques still provide the low dynamic range of protein detection at lower abundances. This review describes the current development of potential methods to enhance the detection and quantification in proteome studies. It also includes the bioinformatics tools that are helpfully used for data mining of protein ontology. Therefore, we believe that this review provided many proteomic approaches, which would be very potent and useful for proteome studies and for further diagnostic and therapeutic applications.
Affiliation(s)
- Supawadee Sriyam
- Institute of Biological Chemistry and Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
11
Biron DG, Brun C, Lefevre T, Lebarbenchon C, Loxdale HD, Chevenet F, Brizard JP, Thomas F. The pitfalls of proteomics experiments without the correct use of bioinformatics tools. Proteomics 2006; 6:5577-96. [PMID: 16991202 DOI: 10.1002/pmic.200600223] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Indexed: 02/04/2023]
Abstract
The elucidation of the entire genomic sequence of various organisms, from viruses to complex metazoans, most recently man, is undoubtedly the greatest triumph of molecular biology since the discovery of the DNA double helix. Over the past two decades, the focus of molecular biology has gradually moved from genomes to proteomes, the intention being to discover the functions of the genes themselves. The postgenomic era stimulated the development of new techniques (e.g. 2-DE and MS) and bioinformatics tools to identify the functions, reactions, interactions and location of the gene products in tissues and/or cells of living organisms. Both 2-DE and MS have been very successfully employed to identify proteins involved in biological phenomena (e.g. immunity, cancer, host-parasite interactions, etc.), although recently, several papers have emphasised the pitfalls of 2-DE experiments, especially in relation to experimental design, poor statistical treatment and the high rate of 'false positive' results with regard to protein identification. In the light of these perceived problems, we review the advantages and misuses of bioinformatics tools - from realisation of 2-DE gels to the identification of candidate protein spots - and suggest some useful avenues to improve the quality of 2-DE experiments. In addition, we present key steps which, in our view, need to be taken into consideration during such analyses. Lastly, we present novel biological entities named 'interactomes', and the bioinformatics tools developed to analyse the large protein-protein interaction networks they form, along with several new perspectives of the field.
Affiliation(s)
- David G Biron
- GEMI, UMR CNRS/IRD 2724, Centre IRD, Montpellier, France.
12
Hernandez P, Müller M, Appel RD. Automated protein identification by tandem mass spectrometry: issues and strategies. MASS SPECTROMETRY REVIEWS 2006; 25:235-54. [PMID: 16284939 DOI: 10.1002/mas.20068] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.8] [Indexed: 05/05/2023]
Abstract
Protein identification by tandem mass spectrometry (MS/MS) is key to most proteomics projects and has been widely explored in bioinformatics research. Obtaining good and trustful identification results has important implications for biological and clinical work. Although well matured, automated software identification of proteins from MS/MS data still faces a number of obstacles due to the complexity of the proteome or procedural issues of mass spectrometry data acquisition. Expected or unexpected modifications of the peptide sequences, polymorphisms, errors in databases, missed or non-specific cleavages, unusual fragmentation patterns, and single MS/MS spectra of multiple peptides of the same m/z are so many pitfalls for identification algorithms. A lot of research work has been carried out in recent years that yielded new strategies to handle a number of these issues. Multiple MS/MS identification algorithms are now available or have been theoretically described. The difficulty resides in choosing the most adapted method for each type of spectra being identified. This review presents an overview of the state-of-the-art bioinformatics approaches to the identification of proteins by MS/MS to help the reader doing the spade work of finding the right tools among the many possibilities offered.
13
van der Merwe DE, Oikonomopoulou K, Marshall J, Diamandis EP. Mass Spectrometry: Uncovering the Cancer Proteome for Diagnostics. Adv Cancer Res 2006; 96:23-50. [PMID: 17161675 DOI: 10.1016/s0065-230x(06)96002-3] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Indexed: 01/23/2023]
Abstract
Despite impressive scientific achievements over the past few decades, cancer is still a leading cause of death. One of the major reasons is that most cancer patients are diagnosed with advanced disease. This is clearly illustrated with ovarian cancer in which the overall 5-year survival rates are only 20-30%. Conversely, when ovarian cancer is detected early (stage 1), the 5-year survival rate increases to 95%. Biomarkers, as tools for preclinical detection of cancer, have the potential to revolutionize the field of clinical diagnostics. The emerging field of clinical proteomics has found applications across a wide spectrum of cancer research. This chapter will focus on mass spectrometry as a proteomic technology implemented in three areas of cancer: diagnostics, tissue imaging, and biomarker discovery. Despite its power, it is also important to realize the preanalytical, analytical, and postanalytical limitations currently associated with this methodology. The ultimate endpoint of clinical proteomics is individualized therapy. It is essential that research groups, the industry, and physicians collaborate to conduct large prospective, multicenter clinical trials to validate and standardize this technology, for it to have real clinical impact.
Affiliation(s)
- Da-Elene van der Merwe
- Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, Ontario M5G1X5, Canada
14
Caporale C, Bertini L, Pucci P, Buonocore V, Caruso C. CysMap and CysJoin: Database and tools for protein disulphides localisation. FEBS Lett 2005; 579:3048-54. [PMID: 15896787 DOI: 10.1016/j.febslet.2005.04.061] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/28/2005] [Revised: 03/30/2005] [Accepted: 04/20/2005] [Indexed: 10/25/2022]
Abstract
We have developed a computer program able to make user-customised databases derived from the public PIR non-redundant reference protein database. When the database of interest has been created, the user will generate the map of all the possible linear peptides containing one and two cysteines for each protein and combine them to calculate the mass of all the possible clusters of linear peptides linked by a disulphide bridge with a cysteine pair. It is also possible to create selected maps corresponding to peptides formed by the action of specific proteases. In this way, mass spectrometric data obtained from the hydrolysis of proteins of unknown sequence can be related to that contained in the database for quick disulphide assignment and protein identification. To confirm signal attribution, the program will also furnish the expected mass of cluster peptides after performing a cycle of Edman degradation. The utility of the program is discussed and examples of application are given.
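The mass calculation outlined in this abstract can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the CysMap/CysJoin code: the function names, the toy sequence, and the choice of trypsin (cleavage after K or R, except before P) are all hypothetical, and only inter-peptide bridges between two cysteine-containing peptides are enumerated. The one fixed chemical fact used is that forming a disulphide bridge removes two hydrogen atoms.

```python
from itertools import combinations

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
HYDROGEN = 1.007825


def tryptic_peptides(seq: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (assumed protease rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def disulphide_pair_masses(seq: str) -> dict[tuple[str, str], float]:
    """Mass of every pair of Cys-containing peptides joined by one S-S bridge."""
    cys_peptides = [p for p in tryptic_peptides(seq) if "C" in p]
    return {
        (a, b): peptide_mass(a) + peptide_mass(b) - 2 * HYDROGEN
        for a, b in combinations(cys_peptides, 2)
    }


# Hypothetical toy sequence with two cysteine-containing tryptic peptides.
for pair, mass in disulphide_pair_masses("GCKAACRGGK").items():
    print(pair, f"{mass:.4f}")
```

Observed masses from a digest could then be compared against such a table to propose which peptides are bridged, which is the kind of lookup the abstract describes.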
Affiliation(s)
- Carlo Caporale
- Dipartimento di Agrobiologia ed Agrochimica, Universita della Tuscia, via S. Camillo de Lellis, 01100 Viterbo, Italy.
15
Rögnvaldsson T, Häkkinen J, Lindberg C, Marko-Varga G, Potthast F, Samuelsson J. Improving automatic peptide mass fingerprint protein identification by combining many peak sets. J Chromatogr B Analyt Technol Biomed Life Sci 2005; 807:209-15. [PMID: 15203031 DOI: 10.1016/j.jchromb.2004.04.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/10/2003] [Revised: 04/08/2004] [Accepted: 04/09/2004] [Indexed: 11/24/2022]
Abstract
An automated peak picking strategy is presented where several peak sets with different signal-to-noise levels are combined to form a more reliable statement on the protein identity. The strategy is compared against both manual peak picking and industry standard automated peak picking on a set of mass spectra obtained after tryptic in gel digestion of 2D-gel samples from human fetal fibroblasts. The set of spectra contain samples ranging from strong to weak spectra, and the proposed multiple-scale method is shown to be much better on weak spectra than the industry standard method and a human operator, and equal in performance to these on strong and medium strong spectra. It is also demonstrated that peak sets selected by a human operator display a considerable variability and that it is impossible to speak of a single "true" peak set for a given spectrum. The described multiple-scale strategy both avoids time-consuming parameter tuning and exceeds the human operator in protein identification efficiency. The strategy therefore promises reliable automated user-independent protein identification using peptide mass fingerprints.
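The idea of combining peak sets picked at several signal-to-noise levels can be sketched as below. The abstract does not state how the sets are combined, so everything here is an assumption made for illustration: the threshold values, the 0.2 Da matching tolerance, the toy spectrum, the candidate names, and the rule of simply summing match counts across peak sets.

```python
def pick_peaks(spectrum, noise, snr):
    """Keep m/z values whose intensity is at least snr times the noise level."""
    return [mz for mz, intensity in spectrum if intensity >= snr * noise]


def count_matches(peaks, theoretical, tol=0.2):
    """Number of picked peaks within tol Da of any theoretical peptide mass."""
    return sum(any(abs(mz - t) <= tol for t in theoretical) for mz in peaks)


def combined_scores(spectrum, noise, candidates, snr_levels=(3, 5, 10)):
    """Sum match counts over peak sets picked at several S/N thresholds.

    Summing is an assumed combination rule, not the published one.
    """
    scores = {name: 0 for name in candidates}
    for snr in snr_levels:
        peaks = pick_peaks(spectrum, noise, snr)
        for name, masses in candidates.items():
            scores[name] += count_matches(peaks, masses)
    return scores


# Hypothetical spectrum as (m/z, intensity) pairs and two candidate proteins.
spectrum = [(1000.5, 50.0), (1200.6, 12.0), (1500.7, 4.0), (1800.9, 3.5)]
candidates = {
    "protein_A": [1000.5, 1200.6, 1500.7],
    "protein_B": [1800.9, 900.4],
}
scores = combined_scores(spectrum, noise=1.0, candidates=candidates)
print(scores)
```

Because every threshold contributes, a candidate supported by both strong and weak peaks accumulates evidence without anyone having to choose a single "correct" threshold, which is the motivation the abstract gives for avoiding parameter tuning.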
Affiliation(s)
- Thorsteinn Rögnvaldsson
- School of Information Science, Computer and Electrical Engineering, Halmstad University, Box 823, SE-301 18 Halmstad, Sweden.
16
Dudkiewicz M, Mackiewicz P, Mackiewicz D, Kowalczuk M, Nowicka A, Polak N, Smolarczyk K, Banaszak J, Dudek MR, Cebrat S. Higher mutation rate helps to rescue genes from the elimination by selection. Biosystems 2004; 80:193-9. [PMID: 15823418 DOI: 10.1016/j.biosystems.2004.11.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/26/2004] [Revised: 06/17/2004] [Accepted: 11/23/2004] [Indexed: 11/26/2022]
Abstract
Directional mutation pressure associated with replication processes is the main cause of the asymmetry between the leading and lagging DNA strands in bacterial genomes. On the other hand, the asymmetry between sense and antisense strands of protein coding sequences is a result of both mutation and selection pressures. Thus, there are two different ways of superposition of the sense strand, on the leading or lagging strand. Besides many other implications of these two possible situations, one seems to be very important - because of the asymmetric replication-associated mutation pressure, the mutation rate of genes depends on their location. Using Monte Carlo methods, we have simulated, under experimentally determined directional mutation pressure, the divergence rate and the elimination rate of genes depending on their location in respect to the leading/lagging DNA strands in the asymmetric prokaryotic genome. We have found that the best survival strategy for the majority of genes is to sometimes switch between DNA strands. Paradoxically, this strategy results in higher substitution rates but remains in agreement with observations in bacterial genomes that such inversions are very frequent and divergence rate between homologs lying on different DNA strands is very high.
Affiliation(s)
- Malgorzata Dudkiewicz
- Institute of Genetics and Microbiology, University of Wrocław, ul. Przybyszewskiego, Wrocław, Poland
17
Russell SA, Old W, Resing KA, Hunter L. Proteomic informatics. INTERNATIONAL REVIEW OF NEUROBIOLOGY 2004; 61:127-57. [PMID: 15482814 DOI: 10.1016/s0074-7742(04)61006-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Affiliation(s)
- Steven A Russell
- Center for Computational Pharmacology, University of Colorado Health Sciences Center, Aurora, CO 80045, USA
18
Schneider M, Tognolli M, Bairoch A. The Swiss-Prot protein knowledgebase and ExPASy: providing the plant community with high quality proteomic data and tools. PLANT PHYSIOLOGY AND BIOCHEMISTRY : PPB 2004; 42:1013-21. [PMID: 15707838 DOI: 10.1016/j.plaphy.2004.10.009] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Received: 06/01/2004] [Accepted: 10/01/2004] [Indexed: 05/01/2023]
Abstract
The Swiss-Prot protein knowledgebase provides manually annotated entries for all species, but concentrates on the annotation of entries from model organisms to ensure the presence of high quality annotation of representative members of all protein families. A specific Plant Protein Annotation Program (PPAP) was started to cope with the increasing amount of data produced by the complete sequencing of plant genomes. Its main goal is the annotation of proteins from the model plant organism Arabidopsis thaliana. In addition to bibliographic references, experimental results, computed features and sometimes even contradictory conclusions, direct links to specialized databases connect amino acid sequences with the current knowledge in plant sciences. As protein families and groups of plant-specific proteins are regularly reviewed to keep up with current scientific findings, we hope that the wealth of information of Arabidopsis origin accumulated in our knowledgebase, and the numerous software tools provided on the Expert Protein Analysis System (ExPASy) web site might help to identify and reveal the function of proteins originating from other plants. Recently, a single, centralized, authoritative resource for protein sequences and functional information, UniProt, was created by joining the information contained in Swiss-Prot, Translation of the EMBL nucleotide sequence (TrEMBL), and the Protein Information Resource-Protein Sequence Database (PIR-PSD). A rising problem is that an increasing number of nucleotide sequences are not being submitted to the public databases, and thus the proteins inferred from such sequences will have difficulties finding their way to the Swiss-Prot or TrEMBL databases.
Affiliation(s)
- Michel Schneider
- Swiss Institute of Bioinformatics, CMU, 1, Rue Michel Servet, 1211 Geneva-4, Switzerland.
19
Abstract
Proteomics is a multifaceted approach to study various aspects of protein expression, post-translational modification, interactions, organization and function at a global level. While DNA constitutes the 'information archive of the genome', it is the proteins that actually serve as the functional effectors of cellular processes. Thus, analysis of protein derangements on a proteome-wide scale will reveal insights into deregulated pathways and networks involved in the pathogenesis of disease. Although the field of proteomics has advanced tremendously in recent years, there are significant technical challenges that pose limitations to the routine application of mass spectrometry to clinical research. Despite these challenges, proteomic studies have yielded unparalleled information and understanding of the cellular biology of diseased states. The application of mass spectrometry to the study of diseases will ultimately lead to identification of biomarkers that are critical for the detection, diagnosis, prognosis and treatment of specific disease entities.
Affiliation(s)
- Megan S Lim
- Department of Pathology, University of Utah Health Sciences Center, Salt Lake City, UT 84132, USA.
20
Affiliation(s)
- Joseph A Loo
- Departments of Biochemistry and Biological Chemistry, Molecular Biology Institute, University of California, Los Angeles, California 90095, USA
21
Creasey EA, Delahay RM, Daniell SJ, Frankel G. Yeast two-hybrid system survey of interactions between LEE-encoded proteins of enteropathogenic Escherichia coli. MICROBIOLOGY (READING, ENGLAND) 2003; 149:2093-2106. [PMID: 12904549 DOI: 10.1099/mic.0.26355-0] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.8] [Indexed: 01/22/2023]
Abstract
Many Gram-negative pathogens employ a specific secretion pathway, termed type III secretion, to deliver virulence effector proteins directly to the membranes and cytosol of host eukaryotic cells. Subsequent functions of many effector proteins delivered in this manner result in subversion of host-signalling pathways to facilitate bacterial entry, survival and dissemination to neighbouring cells and tissues. Whereas the secreted components of type III secretion systems (TTSSs) from different pathogens are structurally and functionally diverse, the structural components and the secretion apparatus itself are largely conserved. TTSSs are large macromolecular assemblies built through interactions between protein components of hundreds of individual subunits. The goal of this project was to screen, using the standard yeast two-hybrid system, pair-wise interactions between components of the enteropathogenic Escherichia coli TTSS. To this end 37 of the 41 genes encoded by the LEE pathogenicity island were cloned into both yeast two-hybrid system vectors and all possible permutations of interacting protein pairs were screened for. This paper reports the identification of 22 novel interactions, including interactions between inner-membrane structural TTSS proteins; between the type III secreted translocator protein EspD and structural TTSS proteins; between established and putative chaperones and their cognate secreted proteins; and between proteins of undefined function.
Affiliation(s)
- Elizabeth A Creasey
- Centre for Molecular Microbiology and Infection, Department of Biological Sciences, Flowers Building, Imperial College, London SW7 2AZ, UK
- Robin M Delahay
- Centre for Molecular Microbiology and Infection, Department of Biological Sciences, Flowers Building, Imperial College, London SW7 2AZ, UK
- Sarah J Daniell
- Centre for Molecular Microbiology and Infection, Department of Biological Sciences, Flowers Building, Imperial College, London SW7 2AZ, UK
- Gad Frankel
- Centre for Molecular Microbiology and Infection, Department of Biological Sciences, Flowers Building, Imperial College, London SW7 2AZ, UK
|
22
|
Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A. ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 2003; 31:3784-8. [PMID: 12824418 PMCID: PMC168970 DOI: 10.1093/nar/gkg563] [Citation(s) in RCA: 3326] [Impact Index Per Article: 158.4] [Indexed: 12/13/2022] Open
Abstract
The ExPASy (the Expert Protein Analysis System) World Wide Web server (http://www.expasy.org) is provided as a service to the life science community by a multidisciplinary team at the Swiss Institute of Bioinformatics (SIB). It provides access to a variety of databases and analytical tools dedicated to proteins and proteomics. ExPASy databases include SWISS-PROT and TrEMBL, SWISS-2DPAGE, PROSITE, ENZYME and the SWISS-MODEL repository. Analysis tools are available for specific tasks relevant to proteomics, similarity searches, pattern and profile searches, post-translational modification prediction, topology prediction, primary, secondary and tertiary structure analysis and sequence alignment. These databases and tools are tightly interlinked: a special emphasis is placed on integration of database entries with related resources developed at the SIB and elsewhere, and the proteomics tools have been designed to read the annotations in SWISS-PROT in order to enhance their predictions. ExPASy started to operate in 1993, as the first WWW server in the field of life sciences. In addition to the main site in Switzerland, seven mirror sites in different continents currently serve the user community.
Affiliation(s)
- Elisabeth Gasteiger
- Swiss Institute of Bioinformatics, Centre Médical Universitaire, 1 Rue Michel Servet, 1211 Geneva 4, Switzerland.
|
23
|
Abstract
Matrix-assisted laser desorption/ionization-time of flight mass spectrometry has become a valuable tool in proteomics. With the increasing acquisition rate of mass spectrometers, one of the major issues is the development of accurate, efficient and automatic peptide mass fingerprinting (PMF) identification tools. Current tools are mostly based on counting the number of experimental peptide masses that match theoretical masses. Almost all of them use additional criteria such as isoelectric point, molecular weight, PTMs, taxonomy or enzymatic cleavage rules to enhance prediction performance. However, these identification tools seldom use peak intensities as a parameter, as there is currently no model predicting the intensities based on the physicochemical properties of peptides. In this work, we used standard data mining methods such as classification and regression to find correlations between peak intensities and the properties of the peptides composing a PMF spectrum. These methods were applied to a dataset comprising a series of PMF experiments involving 157 proteins. We found that the C4.5 method gave the most informative results for the classification task (prediction of the presence or absence of a peptide in a spectrum) and M5' for the regression task (prediction of the normalized intensity of a peptide peak). The C4.5 result correctly classified 88% of the theoretical peaks, whereas the M5' peak intensities had a correlation coefficient of 0.6743 with the experimental peak intensities. These methods enabled us to obtain decision and model trees that can be directly used for prediction and identification of PMF results. This work lays the foundations of a method to analyze factors influencing the peak intensity of PMF spectra. A simple extension of this analysis with a larger dataset could improve the accuracy of the results. Additional peptide characteristics or even PMF experimental parameters can also be taken into account in the data mining process to analyze their influence on the peak intensity. Furthermore, this data mining approach can certainly be extended to the tandem mass spectrometry domain or other mass spectrometry derived methods.
Affiliation(s)
- Steven Gay
- Swiss Institute of Bioinformatics, Geneva, Switzerland
|
24
|
Lester PJ, Hubbard SJ. Comparative bioinformatic analysis of complete proteomes and protein parameters for cross-species identification in proteomics. Proteomics 2002; 2:1392-405. [PMID: 12422356 DOI: 10.1002/1615-9861(200210)2:10<1392::aid-prot1392>3.0.co;2-l] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.4] [Indexed: 11/06/2022]
Abstract
Peptide mass fingerprinting (PMF) remains the most amenable technique for protein identification in proteomics, using mass spectrometry as the primary analytical technique coupled with bioinformatics. This relies on the presence of the amino acid sequence of the protein in the current databanks. Despite this, it is desirable to be able to use the technique for organisms whose genomes are not yet fully sequenced and apply cross-species protein identification. In this study, we have re-examined the feasibility of such approaches by considering the extent of protein similarity between genome sequences using a data set of 29 complete bacterial and two eukaryotic genomes. A range of protein and peptide features are considered, including protein isoelectric point, protein mass, and amino acid conservation. The effectiveness of PMF approaches has then been tested with a series of computer simulations with varying peptide number and mass accuracy for several cross-species tests. The results show that PMF alone is unsuitable in general for divergent species jumps, or when protein similarity is less than 70% identity. Despite this, there exists a considerable enrichment above random of tryptic peptide conservation, and PMF promises to remain useful for cross-species protein identification when combined with data other than just peptide masses.
Affiliation(s)
- Patrick J Lester
- Department of Biomolecular Sciences, University of Manchester Institute of Science and Technology, Manchester, UK
|
25
|
Choi W, Song SW, Zhang W. Understanding cancer through proteomics. Technol Cancer Res Treat 2002; 1:221-30. [PMID: 12625780 DOI: 10.1177/153303460200100402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Proteomics is a rapidly expanding discipline that aims to gain a comprehensive understanding of the expressions, modification, interactions, and regulation of proteins in cells. New high-throughput technologies, such as protein chips and isotope-coded affinity tag peptide labeling, coupled with classic technologies such as two-dimensional gel electrophoresis and mass spectrometry, complement genomic technologies, providing cancer researchers with powerful tools for cancer diagnosis and prognosis and for the identification of targets for therapy.
Affiliation(s)
- Woonyoung Choi
- Department of Pathology, The University of Texas, M. D. Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
|
26
|
Affiliation(s)
- R Aebersold
- Institute for Systems Biology, 4225 Roosevelt Way NE, Seattle, Washington 98105, USA.
|
27
|
Verrills NM, Harry JH, Walsh BJ, Hains PG, Robinson ES. Cross-matching marsupial proteins with eutherian mammal databases: proteome analysis of cells from UV-induced skin tumours of an opossum (Monodelphis domestica). Electrophoresis 2000; 21:3810-22. [PMID: 11271499 DOI: 10.1002/1522-2683(200011)21:17<3810::aid-elps3810>3.0.co;2-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 01/04/2023]
Abstract
The identification and characterisation of Monodelphis proteins has required cross-species analysis. Protein expression was investigated in normal, nonirradiated adult fibroblasts and also in fibroblastic cells from a benign cutaneous tumour after chronic ultraviolet (UVB) exposure and a metastatic cutaneous tumour after intermittent exposure. Proteins were separated and visualised by two-dimensional gel electrophoresis (2-D PAGE) and a peptide mass fingerprint (PMF) was obtained for protein spots using matrix-assisted laser desorption/ionisation-time of flight-mass spectrometry (MALDI-TOF-MS). Cross-species PMF database analysis facilitated the identification of 120 proteins, constituting 46.5% of the proteins analysed. The identification of two proteins was confirmed by internal amino acid sequencing using tandem MS. Differential protein expression was observed between normal fibroblasts and those in tumours chronically or intermittently exposed. A number of tropomyosin and vimentin isoforms were expressed only in cells from the metastatic tumour induced by intermittent exposure to UV radiation. These results highlight the value of cross-species PMF analysis for the rapid characterisation of proteins from a poorly defined species and also show how proteomics can be used to detect changes in protein expression in differentially treated cells.
Affiliation(s)
- N M Verrills
- Australian Proteome Analysis Facility, Macquarie University, Sydney.
|
28
|
Abstract
Proteomics offers a new set of tools for investigating parasites and parasite-associated disease. In this article, John Barrett, Jim Jefferies and Peter Brophy describe the key technologies involved, including two-dimensional gel electrophoresis, image analysis, biological mass spectroscopy and database searching. The potential applications of proteomics in drug and vaccine discovery are reviewed, as are possible future developments.
Affiliation(s)
- J Barrett
- Institute of Biological Sciences, University of Wales, Aberystwyth, UK SY23 3DA.
|
29
|
Abstract
The interest in proteomics has recently increased dramatically and proteomic methods are now applied to many problems in cell biology. The method of choice in proteomics for identifying and characterizing proteins is mass spectrometry combined with database searching. Software tools have been improved to increase the sensitivity of protein identification and methods for evaluating the search results have been incorporated.
Affiliation(s)
- D Fenyö
- ProteoMetrics, LLC, New York, NY 10018, USA
|
30
|
|
31
|
Abstract
Mass spectrometry (MS) has become the technique of choice to identify proteins. This has been largely accomplished by the combination of high-resolution two-dimensional (2-D) gel separation with robotic sample preparation, automated MS measurement, data analysis, and database query. Developments during the last five years in MS associated with protein gel separation are reviewed.
Affiliation(s)
- H W Lahm
- F. Hoffmann-LaRoche Ltd., Pharmaceutical Research, Roche Genetics, Basel, Switzerland.
|
32
|
Abstract
The field of proteomics is becoming increasingly important as genome sequences are being completed and annotated. Recent advances in proteomics include experimental and mathematical proofs of the need to complement microarray analysis with protein analysis, improved sensitivity for mass spectrometric analysis of separated proteins, better informatic tools for gel analysis and protein spot annotation, first steps towards automated experimental procedures, and new technology for quantitation of protein changes.
Affiliation(s)
- M J Dutt
- School of Chemical Engineering, Cornell University, Ithaca, NY 14853-5201, USA
|
33
|
Abstract
The pathogenic mechanisms underlying cardiac dysfunction in heart disease are still largely unknown. It is likely, though, that significant alterations in myocardial gene and protein expression underlie these disease processes and determine their progression and outcome. Most molecular studies of cardiac dysfunction have been carried out on specific cellular systems. However, the application of the proteomic approach to the study of heart disease has made it possible to characterize global alterations in protein expression. This promises new insights into the cellular mechanisms involved in cardiac dysfunction and is likely to result in the discovery of novel diagnostic markers and new therapeutic opportunities.
|
34
|
Perkins DN, Pappin DJC, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999. [DOI: 10.1002/(sici)1522-2683(19991201)20:18<3551::aid-elps3551>3.0.co;2-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
|
35
|
Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999; 20:3551-67. [PMID: 10612281 DOI: 10.1002/(sici)1522-2683(19991201)20:18<3551::aid-elps3551>3.0.co;2-2] [Citation(s) in RCA: 6096] [Impact Index Per Article: 243.8] [Indexed: 11/12/2022]
Abstract
Several algorithms have been described in the literature for protein identification by searching a sequence database using mass spectrometry data. In some approaches, the experimental data are peptide molecular weights from the digestion of a protein by an enzyme. Other approaches use tandem mass spectrometry (MS/MS) data from one or more peptides. Still others combine mass data with amino acid sequence data. We present results from a new computer program, Mascot, which integrates all three types of search. The scoring algorithm is probability based, which has a number of advantages: (i) A simple rule can be used to judge whether a result is significant or not. This is particularly useful in guarding against false positives. (ii) Scores can be compared with those from other types of search, such as sequence homology. (iii) Search parameters can be readily optimised by iteration. The strengths and limitations of probability-based scoring are discussed, particularly in the context of high throughput, fully automated protein identification.
|