401
Dönnes P, Höglund A. Predicting protein subcellular localization: past, present, and future. Genomics Proteomics Bioinformatics 2005; 2:209-15. [PMID: 15901249 PMCID: PMC5187447 DOI: 10.1016/s1672-0229(04)02027-3] [Citation(s) in RCA: 78] [Impact Index Per Article: 3.9] [Indexed: 11/04/2022]
Abstract
Functional characterization of every single protein is a major challenge of the post-genomic era. The large-scale analysis of a cell’s proteins, proteomics, seeks to provide these proteins with reliable annotations regarding their interaction partners and functions in the cellular machinery. An important step on this way is to determine the subcellular localization of each protein. Eukaryotic cells are divided into subcellular compartments, or organelles. Transport across the membrane into the organelles is a highly regulated and complex cellular process. Predicting the subcellular localization by computational means has been an area of vivid activity during recent years. The publicly available prediction methods differ mainly in four aspects: the underlying biological motivation, the computational method used, localization coverage, and reliability, which are of importance to the user. This review provides a short description of the main events in the protein sorting process and an overview of the most commonly used methods in this field.
402
Guda C, Subramaniam S. pTARGET [corrected] a new method for predicting protein subcellular localization in eukaryotes. Bioinformatics 2005; 21:3963-9. [PMID: 16144808 DOI: 10.1093/bioinformatics/bti650] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.9] [Indexed: 11/15/2022]
Abstract
MOTIVATION There is a scarcity of efficient computational methods for predicting protein subcellular localization in eukaryotes. Currently available methods have several limitations that make them inadequate for genome-scale predictions. Here, we present a new prediction method, pTARGET, that can predict proteins targeted to nine different subcellular locations in eukaryotic animal species. RESULTS The nine subcellular locations predicted by pTARGET are cytoplasm, endoplasmic reticulum, extracellular/secretory, Golgi, lysosomes, mitochondria, nucleus, plasma membrane and peroxisomes. Predictions are based on location-specific protein functional domains and on differences in amino acid composition across subcellular locations. Overall, this method can predict 68-87% of the true positives at accuracy rates of 96-99%. Comparison of prediction performance against PSORT showed that pTARGET prediction rates are higher by 11-60% in 6 of the 8 locations tested. Moreover, pTARGET is robust enough for genome-scale prediction of protein subcellular localization because it does not rely on the presence of signal or target peptides. AVAILABILITY A public web server based on the pTARGET method is accessible at the URL http://bioinformatics.albany.edu/~ptarget. Datasets used for developing pTARGET can be downloaded from this web server. Source code will be available on request from the corresponding author.
Affiliation(s)
- Chittibabu Guda
- Gen*NY*sis Center for Excellence in Cancer Genomics, State University of New York, One Discovery Drive, Rensselaer, NY 12144-3456, USA.
403
Sharabiani MTA, Siermala M, Lehtinen TO, Vihinen M. Dynamic covariation between gene expression and proteome characteristics. BMC Bioinformatics 2005; 6:215. [PMID: 16131395 PMCID: PMC1236912 DOI: 10.1186/1471-2105-6-215] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/30/2004] [Accepted: 08/30/2005] [Indexed: 02/07/2023]
Abstract
Background Cells react to changing intra- and extracellular signals by dynamically modulating complex biochemical networks. Cellular responses to extracellular signals lead to changes in gene and protein expression. Since the majority of genes encode proteins, we investigated possible correlations between protein parameters and gene expression patterns to identify proteome-wide characteristics indicative of trends common to expressed proteins. Results Numerous bioinformatics methods were used to filter and merge information regarding gene and protein annotations. A new statistical time point-oriented analysis was developed for the study of dynamic correlations in large time series data. The method was applied to investigate microarray datasets for different cell types, organisms and processes, including human B and T cell stimulation, Drosophila melanogaster life span, and Saccharomyces cerevisiae cell cycle. Conclusion We show that the properties of proteins synthesized correlate dynamically with the gene expression profile, indicating that not only is the actual identity and function of expressed proteins important for cellular responses but that several physicochemical and other protein properties correlate with gene expression as well. Gene expression correlates strongly with amino acid composition, composition- and sequence-derived variables, functional, structural, localization and gene ontology parameters. Thus, our results suggest that a dynamic relationship exists between proteome properties and gene expression in many biological systems, and therefore this relationship is fundamental to understanding cellular mechanisms in health and disease.
Affiliation(s)
- Markku Siermala
- Institute of Medical Technology, FI-33014 University of Tampere, Finland
- Tommi O Lehtinen
- Institute of Medical Technology, FI-33014 University of Tampere, Finland
- Mauno Vihinen
- Institute of Medical Technology, FI-33014 University of Tampere, Finland
- Research Unit, Tampere University Hospital, FI-33520 Tampere, Finland
404
Kulkarni OC, Vigneshwar R, Jayaraman VK, Kulkarni BD. Identification of coding and non-coding sequences using local Hölder exponent formalism. Bioinformatics 2005; 21:3818-23. [PMID: 16118261 DOI: 10.1093/bioinformatics/bti639] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
MOTIVATION Accurate prediction of genes in genomes has always been a challenging task for bioinformaticians and computational biologists. The discovery of distinct scaling relations in coding and non-coding sequences has opened new perspectives on the understanding of DNA sequences. This has motivated us to exploit the differences in the local singularity distributions for the characterization and classification of coding and non-coding sequences. RESULTS The local singularity density distribution in the coding and non-coding sequences of four genomes was first estimated using the wavelet transform modulus maxima methodology. A support vector machine classifier was then trained with the extracted features. The trained classifier provides an average test accuracy of 97.7%. The local singularity features in a DNA sequence can thus be exploited for successful identification of coding and non-coding sequences. CONTACT Available on request from bd.kulkarni@ncl.res.in.
405
Bhasin M, Raghava GPS. GPCRsclass: a web tool for the classification of amine type of G-protein-coupled receptors. Nucleic Acids Res 2005; 33:W143-7. [PMID: 15980444 PMCID: PMC1160112 DOI: 10.1093/nar/gki351] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/24/2022]
Abstract
Receptors of the amine subfamily are major drug targets for the therapy of nervous disorders and psychiatric diseases. Recognition of novel amine-type receptors and their cognate ligands is of paramount interest to pharmaceutical companies. In the past, Chou and co-workers have shown that different types of amine receptors are correlated with their amino acid composition and are predictable on its basis with considerable accuracy [Elrod and Chou (2002) Protein Eng., 15, 713–715]. This motivated us to develop a better method for the recognition of novel amine receptors and for their further classification. The method was developed on the basis of the amino acid composition and dipeptide composition of proteins using support vector machines (SVMs). The method was trained and tested on 167 proteins of the amine subfamily of G-protein-coupled receptors (GPCRs). It discriminated the amine subfamily of GPCRs from globular proteins with Matthews correlation coefficients of 0.98 and 0.99 using amino acid composition and dipeptide composition, respectively. In classifying different types of amine receptors using amino acid composition and dipeptide composition, the method achieved accuracies of 89.8% and 96.4%, respectively. The performance of the method was evaluated using 5-fold cross-validation. The dipeptide composition-based method predicted 67.6% of protein sequences with a reliability index ≥5, and these predictions were 100% accurate. A web server GPCRsclass has been developed for predicting amine-binding receptors from their amino acid sequences [ and (mirror site)].
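The two encodings described above can be stated precisely. A minimal sketch, assuming overlapping residue pairs and that non-standard residues are simply skipped (both choices made here, not taken from the paper): amino acid composition yields 20 fractions and dipeptide composition 400, and either vector would then be passed to an SVM, which is not shown.

```python
from itertools import product

# The 20 standard amino acids in one-letter code. This ordering is an
# arbitrary convention chosen here, not necessarily the one GPCRsclass uses.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq: str) -> list[float]:
    """Fraction of each standard residue in seq (20 values summing to 1)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each of the 400 ordered residue pairs, counted with overlap."""
    s = seq.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)]
    # Drop any pair that touches a non-standard residue (an assumption).
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        raise ValueError("sequence has no valid dipeptides")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


aac = aa_composition("MKTAYIAK")         # 20-dimensional feature vector
dpc = dipeptide_composition("MKTAYIAK")  # 400-dimensional feature vector
```

Dipeptide composition keeps some local order information that plain composition discards, at the cost of a 20-fold larger feature vector.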
MESH Headings
- Artificial Intelligence
- Dipeptides/chemistry
- Internet
- Receptors, Adrenergic/chemistry
- Receptors, Adrenergic/classification
- Receptors, Biogenic Amine/chemistry
- Receptors, Biogenic Amine/classification
- Receptors, Cholinergic/chemistry
- Receptors, Cholinergic/classification
- Receptors, Dopamine/chemistry
- Receptors, Dopamine/classification
- Receptors, G-Protein-Coupled/chemistry
- Receptors, G-Protein-Coupled/classification
- Receptors, Serotonin/chemistry
- Receptors, Serotonin/classification
- Sequence Analysis, Protein
- Software
Affiliation(s)
- G. P. S. Raghava
- To whom correspondence should be addressed. Tel: +91 172 2690557/2695225; Fax: +91 172 2690632/2690585;
406
Xie D, Li A, Wang M, Fan Z, Feng H. LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST. Nucleic Acids Res 2005; 33:W105-10. [PMID: 15980436 PMCID: PMC1160120 DOI: 10.1093/nar/gki359] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.0] [Indexed: 11/26/2022]
Abstract
The subcellular location of a protein is one of its key functional characteristics, as proteins must be localized correctly at the subcellular level to have normal biological function. In this paper, a novel method named LOCSVMPSI is introduced, based on the support vector machine (SVM) and the position-specific scoring matrix generated from PSI-BLAST profiles. With a jackknife test on the RH2427 data set, LOCSVMPSI achieved an overall prediction accuracy of 90.2%, higher than that of SubLoc and ESLpred on this data set. In addition, the prediction performance of LOCSVMPSI was evaluated with a 5-fold cross-validation test on the PK7579 data set, and the results were consistently better than those of the previous method based on several SVMs using the composition of both amino acids and amino acid pairs. A further test on the SWISSPROT new-unique data set showed that LOCSVMPSI also performed better than some widely used prediction methods, such as PSORTII, TargetP and LOCnet. All these results indicate that LOCSVMPSI is a powerful tool for predicting eukaryotic protein subcellular localization. An online web server (current version 1.3) based on this method has been developed and is freely available to both academic and commercial users; it can be accessed at .
Affiliation(s)
- Zhewen Fan
- Department of Biomedical Engineering, City University of New York, NY, USA
- Huanqing Feng
- To whom correspondence should be addressed. Tel: +86 551 3601800; Fax: +86 551 3601522;
407
Stochastic molecular descriptors for polymers. 3. Markov electrostatic moments as polymer 2D-folding descriptors: RNA–QSAR for mycobacterial promoters. Polymer 2005. [DOI: 10.1016/j.polymer.2005.04.104] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
408
Cai YD, Chou KC. Predicting membrane protein type by functional domain composition and pseudo-amino acid composition. J Theor Biol 2005; 238:395-400. [PMID: 16040052 DOI: 10.1016/j.jtbi.2005.05.035] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.6] [Received: 04/11/2005] [Revised: 05/25/2005] [Accepted: 05/26/2005] [Indexed: 10/25/2022]
Abstract
Given the sequence of a protein, how can we predict whether it is a membrane protein or a non-membrane protein? If it is a membrane protein, which membrane protein type does it belong to? Since these questions are closely related to the function of an uncharacterized protein, their importance is self-evident. In particular, with the explosion of protein sequences entering databanks and the relatively much slower progress in using biochemical experiments to determine their functions, an automated method that can answer these questions quickly is highly desirable. By hybridizing functional domain (FunD) composition and pseudo-amino acid composition (PseAA), a new strategy called the FunD-PseAA predictor was introduced. To test the power of the predictor, a highly non-homologous data set was constructed in which none of the proteins has ≥25% sequence identity to any other. The overall success rates obtained with the FunD-PseAA predictor on this data set by the jackknife cross-validation test were 85% for distinguishing membrane from non-membrane proteins, and 91% for identifying the membrane protein type among the following five categories: (1) type-1 membrane protein, (2) type-2 membrane protein, (3) multipass transmembrane protein, (4) lipid chain-anchored membrane protein, and (5) GPI-anchored membrane protein. These rates are much higher than those obtained by other methods on the same stringent data set, indicating that the FunD-PseAA predictor may become a useful high-throughput tool in bioinformatics and proteomics.
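The abstract does not give the PseAA formula. For orientation, the sketch below implements Chou's widely cited original (type-1) pseudo-amino acid composition: 20 normalized residue frequencies followed by λ sequence-order correlation factors θ_k, where θ_k averages the squared differences of standardized residue properties between residues k positions apart, weighted by w. The property tables, λ = 3 and w = 0.05 are placeholders chosen here, not values taken from Cai and Chou, and the functional-domain (FunD) half of their hybrid predictor is not shown.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop: dict[str, float]) -> dict[str, float]:
    """Zero-mean, unit-variance scaling of a property over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / 20
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / 20)
    if std == 0:
        raise ValueError("property is constant over the 20 amino acids")
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def pseaa(seq: str, properties: list[dict[str, float]],
          lam: int = 3, w: float = 0.05) -> list[float]:
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector.

    properties: hypothetical residue -> raw value tables supplied by the caller.
    """
    seq = "".join(r for r in seq.upper() if r in AMINO_ACIDS)
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]

    def correlation(a: str, b: str) -> float:
        # Mean squared difference of the standardized properties of a and b.
        return sum((p[b] - p[a]) ** 2 for p in props) / len(props)

    # Sequence-order correlation factors theta_1 .. theta_lam.
    thetas = []
    for k in range(1, lam + 1):
        total = sum(correlation(seq[i], seq[i + k]) for i in range(len(seq) - k))
        thetas.append(total / (len(seq) - k))

    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    # First 20 components: composition; last lam: weighted order effects.
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Toy property (residue index 0..19) standing in for real scales.
toy = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
vector = pseaa("ACDEFGHIKLMNPQ", [toy], lam=3)
```

Because all components share one denominator, the vector always sums to 1; w controls how much weight the sequence-order factors receive relative to plain composition.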
Affiliation(s)
- Yu-Dong Cai
- Biomolecular Sciences Department, University of Manchester Institute of Science & Technology, P.O. Box 88, Manchester, M60 1QD, UK.
409
González-Díaz H, Molina R, Uriarte E. Recognition of stable protein mutants with 3D stochastic average electrostatic potentials. FEBS Lett 2005; 579:4297-301. [PMID: 16081074 DOI: 10.1016/j.febslet.2005.06.065] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Received: 11/03/2004] [Revised: 06/07/2005] [Accepted: 06/23/2005] [Indexed: 11/15/2022]
Abstract
As more and more proteins are used in biochemical research, there is increasing interest in studying their stability. In this study, a Markov model has been used to calculate molecular descriptors of protein structure, called average electrostatic potentials (ξ(k)). These descriptors were intended to encode indirect pairwise electrostatic interactions between amino acids located at Euclidean distance k within a given 3D protein backbone. The ξ(k) values can be calculated for the protein as a whole or for specific protein regions (orbits), which include amino acids that lie within a given range of distances from the protein's center of charge. In this work we calculated the ξ(k) values for 657 mutants of different proteins. A Linear Discriminant Analysis model correctly classified 435 out of 493 proteins in a subset according to their thermal stability, a level of predictability of 88.2%. This experiment was repeated with three additional subsets of proteins selected at random from the initial series of 657. More specifically, the model predicted 314/356 (88.2%) of mutants with higher stability than the corresponding wild-type protein and 264/301 (86.7%) of proteins with near wild-type stability. These results illustrate the potential of the average stochastic potentials ξ(k) in the study of 3D-structure/property relationships for biochemically relevant proteins.
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Spain.
410
Wang J, Sung WK, Krishnan A, Li KB. Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines. BMC Bioinformatics 2005; 6:174. [PMID: 16011808 PMCID: PMC1190155 DOI: 10.1186/1471-2105-6-174] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.2] [Received: 02/14/2005] [Accepted: 07/13/2005] [Indexed: 11/10/2022]
Abstract
Background Predicting the subcellular localization of proteins is important for determining their function. Previous work on predicting protein localization in Gram-negative bacteria obtained good results; however, these methods had relatively low accuracy for extracellular proteins. This paper studies ways to improve the accuracy of predicting extracellular localization in Gram-negative bacteria. Results We have developed a system for predicting the subcellular localization of proteins in Gram-negative bacteria based on amino acid subalphabets and a combination of multiple support vector machines. The recall of the extracellular site and the overall recall of our predictor reach 86.0% and 89.8%, respectively, in 5-fold cross-validation. To the best of our knowledge, these are the most accurate results for predicting subcellular localization in Gram-negative bacteria. Conclusion Clustering the 20 amino acids into a few groups with the proposed greedy algorithm provides a new way to extract features from protein sequences that cover more adjacent amino acids while reducing the dimensionality of the input feature vector. A good amino acid grouping was observed to improve prediction performance. Furthermore, a proper choice of a subset of complementary support vector machines, constructed from different protein features, maximizes the prediction accuracy.
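The abstract does not list the groupings its greedy search produced. The sketch below only illustrates why a subalphabet helps: after mapping each residue to one of four groups (a hypothetical hydrophobic/polar/positive/negative partition chosen here for illustration, not the paper's), composition over overlapping k-mers needs 4^k features instead of 20^k, so k = 3 gives 64 features rather than 8,000.

```python
from itertools import product

# Hypothetical 4-group partition, for illustration only; Wang et al. derive
# their groupings with a greedy search that is not reproduced here.
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "GSTNQYP",   # polar / small
    "+": "KRH",       # positively charged
    "-": "DE",        # negatively charged
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced (group-label) alphabet."""
    return "".join(RESIDUE_TO_GROUP[r] for r in seq.upper() if r in RESIDUE_TO_GROUP)


def grouped_kmer_composition(seq: str, k: int = 3) -> list[float]:
    """Frequencies of all overlapping group-level k-mers (len(GROUPS)**k values)."""
    reduced = reduce_sequence(seq)
    n = len(reduced) - k + 1
    if n <= 0:
        raise ValueError("sequence shorter than k after reduction")
    kmers = ["".join(p) for p in product(sorted(GROUPS), repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(n):
        counts[reduced[i:i + k]] += 1
    return [counts[m] / n for m in kmers]


features = grouped_kmer_composition("MKDEAVLLKRST", k=3)  # 64 features
```

The trade-off is resolution: residues within a group become indistinguishable, which is why the choice of grouping matters for accuracy.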
Affiliation(s)
- Jiren Wang
- Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
- Wing-Kin Sung
- Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543
- Arun Krishnan
- Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
- Kuo-Bin Li
- Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
411
van Diepen MT, Spencer GE, van Minnen J, Gouwenberg Y, Bouwman J, Smit AB, van Kesteren RE. The molluscan RING-finger protein L-TRIM is essential for neuronal outgrowth. Mol Cell Neurosci 2005; 29:74-81. [PMID: 15866048 DOI: 10.1016/j.mcn.2005.01.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 12/24/2004] [Accepted: 01/17/2005] [Indexed: 01/23/2023]
Abstract
The tripartite motif proteins TRIM-2 and TRIM-3 have been proposed as putative organizers of neuronal outgrowth and structural plasticity. Here, we identified a molluscan orthologue of TRIM-2/3, named L-TRIM, which is up-regulated during in vitro neurite outgrowth of central neurons. In adult animals, L-Trim mRNA is ubiquitously expressed at low levels in the central nervous system and in peripheral tissues. Central nervous system expression of L-Trim mRNA is increased during postnatal brain development and during in vitro and in vivo neuronal regeneration. In vitro double-stranded RNA knock-down of L-Trim mRNA resulted in a >70% inhibition of neurite outgrowth. Together, our data establish a crucial role for L-TRIM in developmental neurite outgrowth and functional neuronal regeneration and indicate that TRIM-2/3 family members may have evolutionarily conserved functions in neuronal differentiation.
Affiliation(s)
- M T van Diepen
- Department of Molecular and Cellular Neurobiology, Faculty of Earth and Life Sciences, Institute of Neuroscience, Vrije Universiteit, De Boelelaan 1085, 1081HV Amsterdam, The Netherlands
412
Huang N, Chen H, Sun Z. CTKPred: an SVM-based method for the prediction and classification of the cytokine superfamily. Protein Eng Des Sel 2005; 18:365-8. [PMID: 15980017 DOI: 10.1093/protein/gzi041] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
Cell proliferation, differentiation and death are controlled by a multitude of cell-cell signals, and loss of this control has devastating consequences. Prominent among these regulatory signals is the cytokine superfamily, which has crucial functions in the development, differentiation and regulation of immune cells. In this study, a support vector machine (SVM)-based method was developed for predicting families and subfamilies of cytokines using dipeptide composition. The method follows the cytokine superfamily taxonomy described in the Cytokine Family cDNA Database (dbCFC), and the dataset used for training and testing was obtained from dbCFC and the Structural Classification of Proteins (SCOP). The method classified cytokines and non-cytokines with an accuracy of 92.5% by 7-fold cross-validation. It can further predict seven major classes of cytokines with an overall accuracy of 94.7%. A server for recognition and classification of cytokines based on multi-class SVMs has been set up at http://bioinfo.tsinghua.edu.cn/~huangni/CTKPred/.
Affiliation(s)
- Ni Huang
- Institute of Bioinformatics and System Biology, MOE Key Laboratory of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Department of Biological Science and Biotechnology, Tsinghua University, Beijing 100084, China
413
Sarda D, Chua GH, Li KB, Krishnan A. pSLIP: SVM based protein subcellular localization prediction using multiple physicochemical properties. BMC Bioinformatics 2005; 6:152. [PMID: 15963230 PMCID: PMC1182350 DOI: 10.1186/1471-2105-6-152] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.5] [Received: 12/01/2004] [Accepted: 06/17/2005] [Indexed: 12/04/2022]
Abstract
BACKGROUND Protein subcellular localization is an important determinant of protein function and hence, reliable methods for prediction of localization are needed. A number of prediction algorithms have been developed based on amino acid compositions or on the N-terminal characteristics (signal peptides) of proteins. However, such approaches lead to a loss of contextual information. Moreover, where information about the physicochemical properties of amino acids has been used, the methods employed to exploit that information are less than optimal and could use the information more effectively. RESULTS In this paper, we propose a new algorithm called pSLIP which uses Support Vector Machines (SVMs) in conjunction with multiple physicochemical properties of amino acids to predict protein subcellular localization in eukaryotes across six different locations, namely, chloroplast, cytoplasmic, extracellular, mitochondrial, nuclear and plasma membrane. The algorithm was applied to the dataset provided by Park and Kanehisa and we obtained prediction accuracies for the different classes ranging from 87.7%-97.0% with an overall accuracy of 93.1%. CONCLUSION This study presents a physicochemical property based protein localization prediction algorithm. Unlike other algorithms, contextual information is preserved by dividing the protein sequences into clusters. The prediction accuracy shows an improvement over other algorithms based on various types of amino acid composition (single, pair and gapped pair). We have also implemented a web server to predict protein localization across the six classes (available at http://pslip.bii.a-star.edu.sg/).
Affiliation(s)
- Deepak Sarda
- Bioinformatics Institute, 30, Biopolis Street, #07-01, Singapore – 138671
- Gek Huey Chua
- Bioinformatics Institute, 30, Biopolis Street, #07-01, Singapore – 138671
- Kuo-Bin Li
- Bioinformatics Institute, 30, Biopolis Street, #07-01, Singapore – 138671
- Arun Krishnan
- Bioinformatics Institute, 30, Biopolis Street, #07-01, Singapore – 138671
414
Saíz-Urra L, González-Díaz H, Uriarte E. Proteins Markovian 3D-QSAR with spherically-truncated average electrostatic potentials. Bioorg Med Chem 2005; 13:3641-7. [PMID: 15862992 DOI: 10.1016/j.bmc.2005.03.041] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Received: 12/18/2004] [Revised: 03/16/2005] [Accepted: 03/21/2005] [Indexed: 11/17/2022]
Abstract
Protein 3D-QSAR is an emerging field of bioorganic chemistry. However, the large size of protein structures may become a bottleneck when scaling up classic QSAR problems to proteins. A truncation approach, as used in molecular dynamics, could therefore be applied to keep the calculations tractable. Spherical truncation of the electrostatic field with different functions cuts off long-range interactions at a given cutoff distance (r(off)), leaving only short-range ones. A Markov chain (MC) model can then approximate the average electrostatic potentials of the spatial distribution of charges within the protein backbone. These average electrostatic potentials can be used to predict protein properties. Herein, we explore the effect of abrupt, shifting, force-shifting, and switching truncation functions on 3D-QSAR models classifying 26 proteins with different functions: lysozymes, dihydrofolate reductases, and alcohol dehydrogenases. Almost all methods showed overall accuracies higher than 73%. This result points to acceptable robustness of the MC model across different truncation schemes and r(off) values. The best accuracy, 92% with abrupt truncation, coincides with our recent communication. We also developed models with the same accuracy for other truncation functions; however, these functions are more complex. A principal component analysis (PCA) of 152 non-homologous proteins showed five main eigenvalues, which explain more than 87% of the variance of the studied properties. The present molecular descriptors may encode structural information not fully captured by previous ones, so they could be expected to succeed where classic descriptors fail. The present result confirms the utility of our Markov models combined with a truncation approach for generating protein molecular descriptors for bioorganic QSAR.
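The abstract names four truncation schemes but not their formulas. The sketch below applies the forms commonly used in molecular-dynamics codes (CHARMM-style shift, force-shift and switch) to a bare Coulomb kernel 1/r, assuming unit charges, no dielectric constant and r > 0. Whether Saíz-Urra et al. used exactly these expressions, and which r_on they chose for switching, is an assumption.

```python
def abrupt(r: float, r_off: float) -> float:
    """Bare 1/r inside the cutoff, zero beyond it (discontinuous at r_off)."""
    return 1.0 / r if r < r_off else 0.0


def shift(r: float, r_off: float) -> float:
    """Scale 1/r by (1 - (r/r_off)^2)^2 so the potential reaches 0 at r_off."""
    if r >= r_off:
        return 0.0
    return (1.0 / r) * (1.0 - (r / r_off) ** 2) ** 2


def force_shift(r: float, r_off: float) -> float:
    """Scale 1/r by (1 - r/r_off)^2 so potential and force both vanish at r_off."""
    if r >= r_off:
        return 0.0
    return (1.0 / r) * (1.0 - r / r_off) ** 2


def switch(r: float, r_on: float, r_off: float) -> float:
    """Leave 1/r untouched below r_on, then taper it smoothly to 0 at r_off."""
    if r <= r_on:
        return 1.0 / r
    if r >= r_off:
        return 0.0
    a, b, x = r_off ** 2, r_on ** 2, r ** 2
    # Switching polynomial: equals 1 at r_on and 0 at r_off.
    s = (a - x) ** 2 * (a + 2 * x - 3 * b) / (a - b) ** 3
    return s / r


# Compare the schemes at a few distances (r_off = 10, r_on = 8 for switching).
for r in (2.0, 6.0, 9.5):
    print(r, abrupt(r, 10.0), shift(r, 10.0), force_shift(r, 10.0), switch(r, 8.0, 10.0))
```

The abrupt scheme is the simplest but leaves a jump at r_off; the other three remove that discontinuity at the price of distorting the potential inside the cutoff, which is consistent with the abstract's remark that they are more complex functions.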
Affiliation(s)
- Liane Saíz-Urra
- Chemical Bioactives Center, Central University of Las Villas 54830, Cuba
415
Gao QB, Wang ZZ, Yan C, Du YH. Prediction of protein subcellular location using a combined feature of sequence. FEBS Lett 2005; 579:3444-8. [PMID: 15949806 DOI: 10.1016/j.febslet.2005.05.021] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Received: 04/09/2005] [Revised: 05/10/2005] [Accepted: 05/10/2005] [Indexed: 11/20/2022]
Abstract
To understand the structure and function of a protein, an important task is to know where it occurs in the cell. Thus, a computational method for properly predicting the subcellular location of proteins would be significant in interpreting the raw data produced by large-scale genome sequencing projects. The present work explores an effective method for extracting features from the protein primary sequence and a novel similarity measure among proteins for assigning a protein to its proper subcellular location. We considered four locations in eukaryotic cells and three locations in prokaryotic cells, which have been investigated by several groups in the past. A combined feature of the primary sequence, defined as a 430-dimensional (430D) vector, was used to represent a protein; it includes 20 amino acid compositions, 400 dipeptide compositions and 10 physicochemical properties. To evaluate the prediction performance of this encoding scheme, a jackknife test based on the nearest neighbor algorithm was employed. The prediction accuracies for cytoplasmic, extracellular, mitochondrial, and nuclear proteins in the former (eukaryotic) dataset were 86.3%, 89.2%, 73.5% and 89.4%, respectively, and the total prediction accuracy reached 86.3%. For the latter (prokaryotic) dataset, the prediction accuracies for cytoplasmic, extracellular, and periplasmic proteins were 97.4%, 86.0%, and 79.7%, respectively, and a total prediction accuracy of 92.5% was achieved. The results indicate that this method outperforms some existing approaches based on amino acid composition alone or on amino acid and dipeptide composition combined.
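The jackknife test mentioned here is leave-one-out evaluation: each protein is held out in turn and classified by its nearest neighbor among all the others. A minimal sketch follows, assuming proteins are already encoded as fixed-length numeric vectors (for example the 430 = 20 + 400 + 10 features described above). Euclidean distance via `math.dist` is a stand-in; the paper's own similarity measure is not specified in the abstract.

```python
import math
from typing import Callable, Hashable, Sequence


def jackknife_1nn(
    vectors: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    distance: Callable[[Sequence[float], Sequence[float]], float] = math.dist,
) -> tuple[float, dict]:
    """Leave-one-out (jackknife) accuracy of a 1-nearest-neighbor classifier.

    Returns the overall accuracy and a per-class (per-location) accuracy dict.
    """
    n = len(vectors)
    if n < 2 or n != len(labels):
        raise ValueError("need at least two samples and one label per sample")
    correct = 0
    per_class: dict = {}  # label -> [correct, total]
    for i in range(n):
        # Nearest neighbor among all samples except the held-out one.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: distance(vectors[i], vectors[j]),
        )
        hit = labels[nearest] == labels[i]
        correct += hit
        stats = per_class.setdefault(labels[i], [0, 0])
        stats[0] += hit
        stats[1] += 1
    return correct / n, {lab: c / t for lab, (c, t) in per_class.items()}


X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
y = ["cytoplasmic", "cytoplasmic", "nuclear", "nuclear"]
overall, per_location = jackknife_1nn(X, y)
```

The per-class dictionary corresponds to the per-location accuracies quoted in the abstract, and the overall value to its total accuracy. Each held-out sample costs n - 1 distance evaluations, so the full test is quadratic in the dataset size.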
Affiliation(s)
- Qing-Bin Gao
- Institute of Automation, National University of Defense Technology, Changsha 410073, People's Republic of China.
416
Bodén M, Hawkins J. Prediction of subcellular localization using sequence-biased recurrent networks. Bioinformatics 2005; 21:2279-86. [PMID: 15746276 DOI: 10.1093/bioinformatics/bti372] [Citation(s) in RCA: 112] [Impact Index Per Article: 5.6] [Indexed: 11/12/2022]
Abstract
MOTIVATION Targeting peptides direct nascent proteins to their specific subcellular compartment. Knowledge of targeting signals enables informed drug design and reliable annotation of gene products. However, due to the low similarity of such sequences and the dynamical nature of the sorting process, the computational prediction of subcellular localization of proteins is challenging. RESULTS We contrast the use of feed forward models as employed by the popular TargetP/SignalP predictors with a sequence-biased recurrent network model. The models are evaluated in terms of performance at the residue level and at the sequence level, and demonstrate that recurrent networks improve the overall prediction performance. Compared to the original results reported for TargetP, an ensemble of the tested models increases the accuracy by 6 and 5% on non-plant and plant data, respectively. AVAILABILITY The Protein Prowler incorporating the recurrent network predictor described in this paper is available online at http://pprowler.imb.uq.edu.au/
Affiliation(s)
- Mikael Bodén
- School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, Australia.
417
Rey S, Acab M, Gardy JL, Laird MR, deFays K, Lambert C, Brinkman FSL. PSORTdb: a protein subcellular localization database for bacteria. Nucleic Acids Res 2005; 33:D164-8. [PMID: 15608169 PMCID: PMC539981 DOI: 10.1093/nar/gki027] [Citation(s) in RCA: 100] [Impact Index Per Article: 5.0] [Indexed: 11/17/2022]
Abstract
Information about bacterial subcellular localization (SCL) is important for protein function prediction and for identifying suitable drug, vaccine and diagnostic targets. PSORTdb (http://db.psort.org/) is a web-accessible SCL database for bacteria. It contains both information determined through laboratory experimentation and computational predictions. The dataset of experimentally verified information (∼2000 proteins) was manually curated by us and represents the largest dataset of its kind. Earlier versions have been used for training SCL predictors. Its incorporation into this new PSORTdb resource, with associated additional annotation information and dataset version control, should aid researchers in the future development of improved SCL predictors. The second component of this database contains computational analyses of proteins deduced from the most recent NCBI dataset of completely sequenced genomes. Analyses are currently calculated using PSORTb, the most precise automated SCL predictor for bacterial proteins. Both datasets can be accessed through the web using a flexible text search engine, a data browser, or BLAST, and the entire database or search results may be downloaded in various formats. Features such as Gene Ontology (GO) terms and multiple accession numbers are incorporated to facilitate integration with other bioinformatics resources. PSORTdb is freely available under the GNU General Public License.
Affiliation(s)
- Sébastien Rey
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada V5A 1S6
418
Chou KC, Cai YD. Using GO-PseAA predictor to identify membrane proteins and their types. Biochem Biophys Res Commun 2005; 327:845-7. [PMID: 15649422 DOI: 10.1016/j.bbrc.2004.12.069] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.3] [Received: 12/04/2004] [Indexed: 11/21/2022]
Abstract
Cell membranes are crucial to the life of a cell. The lipid bilayer provides the basic structure of a biological membrane, but most of the membrane's specific functions are carried out by membrane proteins. Knowledge of membrane protein type often offers important clues toward determining the function of an uncharacterized protein. Two tasks are therefore important and challenging problems in bioinformatics and proteomics: predicting the type of a membrane protein from its primary sequence, and simply identifying whether an uncharacterized protein is a membrane protein at all. To address these problems, we introduce the GO-PseAA predictor, which operates in a hybrid space formed by combining the gene ontology and pseudo amino acid composition. To test the prediction quality, a dataset was constructed that contains 6476 non-membrane proteins and 5122 membrane proteins classified into five different types. To avoid redundancy and bias, none of the included proteins has ≥40% sequence identity to any other. The overall success rate of the jackknife cross-validation test was 94.76% in distinguishing non-membrane from membrane proteins, and 95.84% in identifying the five membrane protein types. These high success rates suggest that the GO-PseAA predictor captures the core features of the statistical samples concerned and may become an automated high-throughput tool in molecular and cell biology.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, CA 92130, USA.
419
Huang SW, Hwang JK. Computation of conformational entropy from protein sequences using the machine-learning method-Application to the study of the relationship between structural conservation and local structural stability. Proteins 2005; 59:802-9. [PMID: 15828008 DOI: 10.1002/prot.20462] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
Abstract
A complete protein sequence can usually determine a unique conformation; however, the situation is different for shorter subsequences--some of them are able to adopt unique conformations, independent of context; while others assume diverse conformations in different contexts. The conformations of subsequences are determined by the interplay between local and nonlocal interactions. A quantitative measure of such structural conservation or variability will be useful in the understanding of the sequence-structure relationship. In this report, we developed an approach using the support vector machine method to compute the conformational variability directly from sequences, which is referred to as the sequence structural entropy. As a practical application, we studied the relationship between sequence structural entropy and the hydrogen exchange for a set of well-studied proteins. We found that the slowest exchange cores usually comprise amino acids of the lowest sequence structural entropy. Our results indicate that structural conservation is closely related to the local structural stability. This relationship may have interesting implications in the protein folding processes, and may be useful in the study of the sequence-structure relationship.
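The abstract does not give the formula behind "sequence structural entropy". This minimal sketch assumes a common way to quantify conformational variability: the Shannon entropy of a distribution over conformational states. That distribution could be SVM-predicted state probabilities, or the states observed for one subsequence across different contexts. The function names and state labels are hypothetical.

```python
import math
from collections import Counter


def structural_entropy(weights):
    """Shannon entropy in bits of a (possibly unnormalised) distribution over states."""
    total = sum(weights)
    h = 0.0
    for w in weights:
        if w > 0:  # zero-probability states contribute nothing
            q = w / total
            h -= q * math.log2(q)
    return h


def entropy_from_observations(states):
    """Entropy of the conformational states seen for one subsequence in different contexts."""
    return structural_entropy(list(Counter(states).values()))


print(structural_entropy([1.0, 0.0, 0.0]))              # 0.0: one conserved conformation
print(entropy_from_observations(["H", "H", "E", "C"]))  # 1.5: variable conformation
```

Low values mark subsequences that keep one conformation regardless of context. In the abstract's terms, these are the candidates for slow-exchange, structurally conserved cores.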
Affiliation(s)
- Shao-Wei Huang
- Institute of Bioinformatics, National Chiao Tung University, Taiwan, Republic of China
420
Lu Z, Hunter L. GO molecular function terms are predictive of subcellular localization. Pacific Symposium on Biocomputing 2005:151-61. [PMID: 15759622 PMCID: PMC2652875 DOI: 10.1142/9789812702456_0015] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
A protein's function is closely linked to its subcellular localization. Use of Gene Ontology (GO) molecular function terms to extend sequence-based subcellular localization prediction has been previously shown to improve predictive performance. Here, we explore directly the relationship between GO function annotations and localization information, identifying both highly predictive single terms, and terms with large information gain with respect to location. The results identify a number of predictive and informative GO terms with respect to subcellular location, particularly nucleus, extracellular space, membrane, mitochondrion, endoplasmic reticulum and Golgi. There are several clear examples illustrating why the addition of function information provides additional predictive power over sequence alone. Other interesting phenomena can also be seen in the results. Most predictive or informative terms are imperfect, and incorrect prediction may often call out significant biological phenomena. Finally, these results may be useful in the GO annotation process.
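Information gain, as used here, is the standard reduction in entropy of the location label once a GO term's presence is known. The minimal sketch below assumes each protein is reduced to a present/absent flag for one term. The labels are hypothetical toy data, not from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(has_term, locations):
    """Drop in location entropy once we know, per protein, whether it carries the GO term."""
    n = len(locations)
    remainder = 0.0
    for value in (True, False):
        subset = [loc for flag, loc in zip(has_term, locations) if flag == value]
        if subset:
            remainder += len(subset) / n * entropy(subset)
    return entropy(locations) - remainder


# Hypothetical toy data: term presence per protein and annotated location.
term = [True, True, False, False]
loc = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
print(information_gain(term, loc))  # 1.0: the term fully separates the two locations
```

A term scattered evenly across locations scores near zero, even if it is common.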
Affiliation(s)
- Z Lu
- Center for Computational Pharmacology, University of Colorado Health Sciences Centre, School of Medicine, Denver, CO, USA
421
Wang ML, Yao H, Xu WB. Prediction by support vector machines and analysis by Z-score of poly-L-proline type II conformation based on local sequence. Comput Biol Chem 2005; 29:95-100. [PMID: 15833437 DOI: 10.1016/j.compbiolchem.2005.02.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/15/2004] [Revised: 01/08/2005] [Accepted: 02/18/2005] [Indexed: 11/25/2022]
Abstract
In recent years, the poly-L-proline type II (PPII) conformation has attracted increasing attention. This structure plays vital roles in many biological processes, but few studies have attempted to predict PPII secondary structure computationally. The support vector machine (SVM) is a relatively new approach to supervised pattern classification and has been successfully applied to a wide range of pattern recognition problems. In this paper, we present an SVM method for predicting PPII conformation from local sequence. The overall accuracy on both the independent test set and the jackknife test reached approximately 70%, and the Matthews correlation coefficient (MCC) reached 0.4. Training and testing datasets with different sequence identities gave different results, which suggests that the method's performance correlates with the sequence identity of the dataset. The SVM kernel function parameter was an important factor in the method's performance. The propensities of residues at different positions were also analyzed. By computing Z-scores, we found that P and G were the two most important residues for PPII conformation.
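The sketch below computes accuracy and the Matthews correlation coefficient from a binary confusion matrix. It also computes one common binomial form of a residue Z-score. The abstract does not state which Z-score formula the authors used, so that form is an assumption. The counts are invented to show that about 70% accuracy can coincide with an MCC of 0.4. They are not the paper's data.

```python
import math


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def residue_z_score(observed, total, background_freq):
    """Binomial Z-score: how far a residue's count at a position exceeds its expectation."""
    expected = total * background_freq
    return (observed - expected) / math.sqrt(expected * (1 - background_freq))


# Invented counts, chosen only to illustrate the metrics.
print(accuracy(35, 35, 15, 15))     # 0.7
print(matthews_cc(35, 35, 15, 15))  # 0.4
print(residue_z_score(30, 100, 0.1))
```

Unlike accuracy, MCC penalizes a predictor that simply favors the majority class. This is why both figures are worth reporting for a minority conformation such as PPII.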
Affiliation(s)
- Ming-Lei Wang
- Laboratory of Bioinformatics, The Key Laboratory of Industrial Biotechnology, Ministry of Education, Southern Yangtze University, Wuxi 214036, China.
422
Huang J, Shi F. Support vector machines for predicting apoptosis proteins types. Acta Biotheor 2005; 53:39-47. [PMID: 15906142 DOI: 10.1007/s10441-005-7002-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 01/15/2004] [Revised: 05/17/2004] [Accepted: 10/07/2004] [Indexed: 10/25/2022]
Abstract
Apoptosis proteins have a central role in the development and homeostasis of an organism. These proteins are very important for understanding the mechanism of programmed cell death, and their function is related to their types. According to the classification scheme of Zhou and Doctor (2003), apoptosis proteins are categorized into four types: (1) cytoplasmic proteins; (2) plasma membrane-bound proteins; (3) mitochondrial inner and outer proteins; (4) other proteins. A powerful learning machine, the support vector machine (SVM), is applied to predict the type of a given apoptosis protein by incorporating the sqrt-amino acid composition effect. High success rates were obtained in the re-substitution test (98/98 = 100%) and the jackknife test (89/98 = 90.8%).
Affiliation(s)
- Jing Huang
- School of Computer, Wuhan University, Hubei Province, PR China.
423
Zheng L, Yang J, Landwehr C, Fan F, Ji Y. Identification of an essential glycoprotease in Staphylococcus aureus. FEMS Microbiol Lett 2005; 245:279-85. [PMID: 15837383 DOI: 10.1016/j.femsle.2005.03.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 01/24/2005] [Revised: 03/11/2005] [Accepted: 03/13/2005] [Indexed: 11/19/2022]
Abstract
The emergence of multi-drug-resistant bacterial pathogens is generating enormous public health concern and highlights an urgent need for new, alternative agents to treat them. The gene products essential for bacterial growth in vitro and survival during infection constitute an initial set of protein targets for the development of antibacterial agents. In this study, we employed regulated gene expression approaches and demonstrated that a putative glycoprotease (Gcp) is required for staphylococcal growth in culture. We found that Staphylococcus aureus becomes more sensitive to the Zn(2+) ion when Gcp expression is downregulated in vitro. Bioinformatic analyses showed that Gcp is conserved in many Gram-positive pathogens and is also present in a variety of Gram-negative pathogens. Our results indicate that Gcp is a potential novel target for the development of antimicrobials against S. aureus infection.
Affiliation(s)
- Li Zheng
- Department of Veterinary and Biomedical Sciences, College of Veterinary Medicine, University of Minnesota, 1971 Commonwealth Avenue, St. Paul, MN 55108, USA
424
Nair R, Rost B. Mimicking Cellular Sorting Improves Prediction of Subcellular Localization. J Mol Biol 2005; 348:85-100. [PMID: 15808855 DOI: 10.1016/j.jmb.2005.02.025] [Citation(s) in RCA: 219] [Impact Index Per Article: 11.0] [Received: 12/27/2004] [Revised: 02/08/2005] [Accepted: 02/09/2005] [Indexed: 11/24/2022]
Abstract
Predicting the native subcellular compartment of a protein is an important step toward elucidating its function. Here we introduce LOCtree, a hierarchical system combining support vector machines (SVMs) and other prediction methods. LOCtree predicts the subcellular compartment of a protein by mimicking the mechanism of cellular sorting and exploiting a variety of sequence and predicted structural features as input. Currently, LOCtree does not predict localization for membrane proteins, because their compositional properties differ significantly from those of non-membrane proteins. Although the system can use any available information about function, we present performance estimates that are valid when only the amino acid sequence of a protein is known. When evaluated on a non-redundant test set, LOCtree achieved sustained accuracy levels of 74% for non-plant eukaryotes, 70% for plants, and 84% for prokaryotes. We rigorously benchmarked LOCtree against the best alternative methods for localization prediction, and it outperformed all other methods in nearly all benchmarks. Localization assignments by LOCtree agreed well with data from recent large-scale experiments. We carried out a preliminary analysis of a few completely sequenced organisms: human (Homo sapiens), yeast (Saccharomyces cerevisiae), and weed (Arabidopsis thaliana). It suggested that over 35% of all non-membrane proteins are nuclear and about 20% are retained in the cytosol, and that every fifth protein in the weed resides in the chloroplast.
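The idea of mimicking sorting can be illustrated with a decision tree whose internal nodes each make one routing decision. This is a minimal sketch. The tree layout and the feature flags (`signal_peptide`, `er_retention`, `nls`) are hypothetical stand-ins for the SVMs and predicted features LOCtree actually uses. This is not LOCtree's real topology.

```python
def classify(node, protein):
    """Walk from the root; each internal node's decision function picks a child."""
    while not isinstance(node, str):  # leaves are compartment names
        decide, children = node
        node = children[decide(protein)]
    return node


# Hypothetical tree: each decision reads a precomputed (assumed) feature flag.
SORTING_TREE = (
    lambda p: "secretory" if p["signal_peptide"] else "intracellular",
    {
        "secretory": (
            lambda p: "retained" if p["er_retention"] else "secreted",
            {"retained": "endoplasmic reticulum", "secreted": "extracellular"},
        ),
        "intracellular": (
            lambda p: "nuclear" if p["nls"] else "other",
            {"nuclear": "nucleus", "other": "cytoplasm"},
        ),
    },
)

protein = {"signal_peptide": True, "er_retention": False, "nls": False}
print(classify(SORTING_TREE, protein))  # extracellular
```

Each node answers only one narrow question, so in a real system each decision function could be a separately trained binary classifier.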
Affiliation(s)
- Rajesh Nair
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
425
González-Díaz H, Saíz-Urra L, Molina R, Uriarte E. Stochastic molecular descriptors for polymers. 2. Spherical truncation of electrostatic interactions on entropy based polymers 3D-QSAR. POLYMER 2005. [DOI: 10.1016/j.polymer.2005.01.066] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
426
Chou KC, Cai YD. Prediction of Membrane Protein Types by Incorporating Amphipathic Effects. J Chem Inf Model 2005; 45:407-13. [PMID: 15807506 DOI: 10.1021/ci049686v] [Citation(s) in RCA: 147] [Impact Index Per Article: 7.4] [Indexed: 11/30/2022]
Abstract
According to their intramolecular arrangement and position in a cell, membrane proteins are generally classified into six types: (1) type I transmembrane, (2) type II transmembrane, (3) multipass transmembrane, (4) lipid chain-anchored membrane, (5) GPI-anchored membrane, and (6) peripheral membrane. Because they are situated in a heteropolar environment, these six types of membrane proteins must have quite different amphiphilic sequence-order patterns to stabilize their respective frameworks. To incorporate this feature into the predictor, an amphiphilic pseudo amino acid composition was formulated that contains a series of hydrophobic and hydrophilic correlation factors. The success rates in identifying membrane protein types were remarkably enhanced, as demonstrated by both the jackknife test and an independent dataset test.
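The sketch below computes an amphiphilic pseudo amino acid composition in the general form described above. It has 20 composition terms, followed by alternating hydrophobic and hydrophilic correlation factors for sequence gaps 1 to λ, weighted by w. Several assumptions apply. The Kyte-Doolittle hydropathy scale stands in for the hydrophobicity scale the authors used. Hopp-Woods values are used for hydrophilicity. Both are given as commonly tabulated and should be verified before use. λ = 2 and w = 0.5 are illustrative defaults.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed scales, listed in AMINO_ACIDS order. Kyte-Doolittle hydropathy stands in
# for the authors' hydrophobicity scale; Hopp-Woods values give hydrophilicity.
HYDROPHOBICITY = dict(zip(AMINO_ACIDS, [1.8, 2.5, -3.5, -3.5, 2.8, -0.4, -3.2, 4.5, -3.9, 3.8,
                                        1.9, -3.5, -1.6, -3.5, -4.5, -0.8, -0.7, 4.2, -0.9, -1.3]))
HYDROPHILICITY = dict(zip(AMINO_ACIDS, [-0.5, -1.0, 3.0, 3.0, -2.5, 0.0, -0.5, -1.8, 3.0, -1.8,
                                        -1.3, 0.2, 0.0, 0.2, 3.0, 0.3, -0.4, -1.5, -3.4, -2.3]))


def standardize(scale):
    """Rescale a 20-residue property scale to zero mean and unit (population) variance."""
    mean = sum(scale.values()) / len(scale)
    sd = math.sqrt(sum((v - mean) ** 2 for v in scale.values()) / len(scale))
    return {aa: (v - mean) / sd for aa, v in scale.items()}


def amphiphilic_pseaac(seq, lam=2, w=0.5):
    """Return a (20 + 2*lam)-dimensional amphiphilic pseudo amino acid composition."""
    residues = [aa for aa in seq.upper() if aa in HYDROPHOBICITY]
    if len(residues) <= lam:
        raise ValueError("sequence must be longer than lam")
    h1, h2 = standardize(HYDROPHOBICITY), standardize(HYDROPHILICITY)
    freqs = [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]
    taus = []
    for j in range(1, lam + 1):
        n = len(residues) - j
        # Hydrophobic, then hydrophilic, sequence-order correlation factor at gap j.
        taus.append(sum(h1[residues[i]] * h1[residues[i + j]] for i in range(n)) / n)
        taus.append(sum(h2[residues[i]] * h2[residues[i + j]] for i in range(n)) / n)
    denom = sum(freqs) + w * sum(taus)
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


vec = amphiphilic_pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
print(len(vec))  # 26
```

The shared denominator makes the 20 + 2λ components sum to 1. The weight w controls how much the sequence-order factors count relative to plain composition.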
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, California 92130, USA.
427
Drabkin HJ, Hollenbeck C, Hill DP, Blake JA. Ontological visualization of protein-protein interactions. BMC Bioinformatics 2005; 6:29. [PMID: 15707487 PMCID: PMC550656 DOI: 10.1186/1471-2105-6-29] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 12/09/2004] [Accepted: 02/11/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Cellular processes require the interaction of many proteins across several cellular compartments. Determining the collective network of such interactions is an important aspect of understanding the role and regulation of individual proteins. The Gene Ontology (GO) is used by model organism databases and other bioinformatics resources to provide functional annotation of proteins. The annotation process provides a mechanism to document the binding of one protein with another. We have constructed protein interaction networks for mouse proteins using the information encoded in the GO annotations. The work reported here presents a methodology for integrating and visualizing information on protein-protein interactions. RESULTS GO annotation at Mouse Genome Informatics (MGI) captures 1318 curated, documented interactions. These include 129 binary interactions and 125 interactions involving three or more gene products. Three networks involve over 30 partners, the largest involving 109 proteins. Several tools are available at MGI to visualize and analyze these data. CONCLUSIONS Curators at the MGI database annotate protein-protein interaction data from experimental reports in the literature. Integrating these data with the other types of data curated at MGI places protein binding data into the larger context of mouse biology. It also facilitates the generation of new biological hypotheses based on physical interactions among gene products.
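Grouping curated interactions into connected networks is a connected-components problem, whether an interaction is binary or involves three or more gene products. The minimal union-find sketch below shows one way to do it. The protein identifiers are hypothetical, and this is not MGI's actual tooling.

```python
from collections import Counter


def network_sizes(interactions):
    """Group proteins into connected networks; each interaction lists two or more proteins."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for members in interactions:
        members = list(members)
        root = find(members[0])
        for other in members[1:]:
            other_root = find(other)
            if other_root != root:
                parent[other_root] = root  # merge the two networks
    sizes = Counter(find(p) for p in list(parent))
    return sorted(sizes.values(), reverse=True)


# Hypothetical interactions: two binary chains and one four-member complex.
records = [("protA", "protB"), ("protB", "protC"),
           ("protD", "protE", "protF", "protG"), ("protH", "protI")]
print(network_sizes(records))  # [4, 3, 2]
```

Treating an n-ary interaction as one set, rather than expanding it into all pairs, gives the same components with fewer union operations.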
Affiliation(s)
- Harold J Drabkin
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
- David P Hill
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
- Judith A Blake
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
428
Bhasin M, Garg A, Raghava GPS. PSLpred: prediction of subcellular localization of bacterial proteins. Bioinformatics 2005; 21:2522-4. [PMID: 15699023 DOI: 10.1093/bioinformatics/bti309] [Citation(s) in RCA: 168] [Impact Index Per Article: 8.4] [Indexed: 11/12/2022]
Abstract
SUMMARY We developed a web server, PSLpred, for predicting the subcellular localization of Gram-negative bacterial proteins with an overall accuracy of 91.2%. PSLpred is a hybrid method that integrates PSI-BLAST and three SVM modules based on compositions of residues, dipeptides and physico-chemical properties. Prediction accuracies of 90.7%, 86.8%, 90.3%, 95.2% and 90.6% were attained for cytoplasmic, extracellular, inner-membrane, outer-membrane and periplasmic proteins, respectively. Furthermore, PSLpred predicted approximately 74% of sequences with an average prediction accuracy of 98% at a reliability index (RI) of 5. AVAILABILITY PSLpred is available at http://www.imtech.res.in/raghava/pslpred/
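The trade-off between coverage and accuracy at a reliability threshold can be computed as shown below. This sketch assumes each prediction already carries an integer RI and treats "RI = 5" as a minimum threshold. The abstract does not describe how PSLpred derives RI, and the example data are invented.

```python
def coverage_and_accuracy(predictions, min_ri):
    """predictions: (predicted, actual, reliability_index) triples.

    Returns the fraction of proteins with RI >= min_ri and the accuracy on that subset.
    """
    kept = [(pred, true) for pred, true, ri in predictions if ri >= min_ri]
    if not kept:
        return 0.0, float("nan")
    coverage = len(kept) / len(predictions)
    accuracy = sum(pred == true for pred, true in kept) / len(kept)
    return coverage, accuracy


# Invented example predictions for illustration only.
preds = [
    ("cytoplasmic", "cytoplasmic", 9),
    ("extracellular", "extracellular", 6),
    ("periplasmic", "cytoplasmic", 2),
    ("outer membrane", "outer membrane", 5),
]
print(coverage_and_accuracy(preds, min_ri=5))  # (0.75, 1.0)
print(coverage_and_accuracy(preds, min_ri=0))  # (1.0, 0.75)
```

Raising the threshold typically trades coverage for accuracy. This is the pattern behind the "about 74% of sequences at 98% accuracy" figure.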
Affiliation(s)
- Manoj Bhasin
- Institute of Microbial Technology, Sector 39A, Chandigarh, India
429
González-Díaz H, Cruz-Monteagudo M, Molina R, Tenorio E, Uriarte E. Predicting multiple drugs side effects with a general drug-target interaction thermodynamic Markov model. Bioorg Med Chem 2005; 13:1119-29. [PMID: 15670920 DOI: 10.1016/j.bmc.2004.11.030] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.6] [Received: 11/04/2004] [Revised: 11/09/2004] [Accepted: 11/12/2004] [Indexed: 10/26/2022]
Abstract
Most current molecular descriptors consider only the molecular structure. In the present article, we aim to extend the use of Markov chain models to define novel molecular descriptors that also consider other parameters, such as the target site or toxic effect. Specifically, this molecular descriptor takes into consideration not only the molecular structure but also the specific system the drug affects. Herein, we develop a general Markov model that describes 39 different drug side effects, grouped into 11 affected systems, for 301 drugs, giving 686 cases in total. The data were processed by linear discriminant analysis (LDA) to classify drugs according to their specific side effects, with forward stepwise selection as the variable-selection strategy. The average percentages of good classification and the numbers of compounds used in the training/prediction sets were as follows:
- 100/100% for systemic phenomena (47 out of 47)/(12 out of 12), metabolic (18 out of 18)/(5 out of 5), muscular-skeletal (23 out of 23)/(6 out of 6) and neurological manifestations (33 out of 33)/(8 out of 8);
- 97.6/96.7% for cardiovascular manifestations (122 out of 125)/(30 out of 31);
- 97.1/97.5% for breathing manifestations (34 out of 35)/(8 out of 9);
- 97/99.4% for gastrointestinal manifestations (159 out of 164)/(40 out of 41);
- 97/95% for endocrine manifestations (32 out of 33)/(7 out of 8);
- 96.4/94.6% for psychiatric manifestations (53 out of 55)/(13 out of 14);
- 95.1/99.1% for hematological manifestations (98 out of 103)/(25 out of 26);
- 88/92.3% for dermal manifestations (44 out of 50)/(12 out of 13).

In addition, we report a preliminary experimental observation of a reversible decrease in the lymphocyte differential count after administration of the antibacterial drug G-1 in mice. This coincides with a posterior probability (P% = 74.91) predicted by the model.
This article develops a model that encompasses a large number of side effects grouped in specific organ systems in a single stochastic framework for the first time.
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15782, Spain
430
de Armas RR, Díaz HG, Molina R, Uriarte E. Stochastic-based descriptors studying biopolymers biological properties: Extended MARCH-INSIDE methodology describing antibacterial activity of lactoferricin derivatives. Biopolymers 2005; 77:247-56. [PMID: 15682438 DOI: 10.1002/bip.20202] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
Lactoferricins are a group of related peptides derived from the enzymatic cleavage of lactoferrin, an iron-binding protein. These peptides, and others derived from them by simple amino acid substitutions, have shown interesting antibacterial activity. In this paper, we applied the MARCH-INSIDE methodology, extended to peptides and proteins, to a QSAR study of the antibacterial activity of 31 lactoferricin derivatives against E. coli and S. aureus. The study used linear discriminant analysis (LDA) and multiple linear regression (MLR). With LDA we obtained models that correctly classify more than 80% of all cases (85.7% for E. coli antibacterial activity and 83.9% for S. aureus). With a leave-one-out cross-validation procedure, the percentage of good classification of both models remained near the values reported above (87.1% for E. coli antibacterial activity and 83.9% for S. aureus). We obtained several linear regression models taking into account total and local descriptors. Including the local descriptors improved the correlation parameters, the statistical quality, and the predictive power relative to the earlier model obtained with total descriptors only. The best models explained more than 80% of the experimental variance in the antimicrobial activity of these compounds. These results are comparable with those reported previously by Strom (Strom, M. B.; Rekdal, O.; Svendsen, J. S. J Peptide Res 2001, 57, 127-139.) and Lejon (Lejon, T.; Strom, M.; Svendsen, S. J Protein Sci 2001, 7, 74-78.; Lejon, T.; Svendsen J. S.; Haug, B. E. J Peptide Sci 2002, 8, 302-306.). Those studies used a smaller dataset, with Z-scales and volume-based descriptors and PLS as the statistical technique.
431
Garg A, Bhasin M, Raghava GPS. Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search. J Biol Chem 2005; 280:14427-32. [PMID: 15647269 DOI: 10.1074/jbc.m411789200] [Citation(s) in RCA: 153] [Impact Index Per Article: 7.7] [Indexed: 11/06/2022]
Abstract
Here we report a systematic approach for predicting the subcellular localization (cytoplasmic, mitochondrial, nuclear, and plasma membrane) of human proteins. First, support vector machine (SVM)-based modules using traditional amino acid and dipeptide (i + 1) composition achieved overall accuracies of 76.6% and 77.8%, respectively. A PSI-BLAST similarity search against a nonredundant database of experimentally annotated proteins yielded 73.3% accuracy. To gain further insight, a hybrid module (hybrid1) was developed based on amino acid composition, dipeptide composition, and similarity information; it attained a better accuracy of 84.9%. In addition, SVM modules based on higher-order dipeptides (i + 2, i + 3, and i + 4) were constructed for predicting the subcellular localization of human proteins. These achieved overall accuracies of 79.7%, 77.5%, and 77.1%, respectively. Furthermore, another SVM module, hybrid2, was developed using traditional dipeptide (i + 1) and higher-order dipeptide (i + 2, i + 3, and i + 4) compositions, giving an overall accuracy of 81.3%. We also developed the SVM module hybrid3, based on amino acid composition, traditional and higher-order dipeptide compositions, and PSI-BLAST output, which achieved an overall accuracy of 84.4%. A Web server, HSLPred (www.imtech.res.in/raghava/hslpred/ or bioinformatics.uams.edu/raghava/hslpred/), has been designed to predict the subcellular localization of human proteins using the above approaches.
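The "higher-order dipeptide" compositions can be read as frequencies of residue pairs separated by a fixed gap, with i + 1 being ordinary dipeptides. The sketch below follows that interpretation. Concatenating gaps 1 to 4 gives a 1600-dimensional vector similar in spirit to the hybrid2 input. The exact feature layout used by HSLPred is an assumption here, and `hybrid_gapped_features` is a hypothetical helper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def gapped_dipeptide_composition(seq, gap=1):
    """Fractions of residue pairs (seq[i], seq[i + gap]); gap=1 gives ordinary dipeptides."""
    seq = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    n = len(seq) - gap
    if n <= 0:
        raise ValueError("sequence too short for this gap")
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(n):
        counts[seq[i] + seq[i + gap]] += 1
    return [counts[p] / n for p in PAIRS]


def hybrid_gapped_features(seq, gaps=(1, 2, 3, 4)):
    """Concatenate compositions for several gaps (400 values per gap)."""
    return [x for g in gaps for x in gapped_dipeptide_composition(seq, g)]


v = gapped_dipeptide_composition("ACAC", gap=2)
print(v[PAIRS.index("AA")], v[PAIRS.index("CC")])  # 0.5 0.5
print(len(hybrid_gapped_features("MKTAYIAKQRQISFVKSHFSRQ")))  # 1600
```

Larger gaps capture residue correlations that ordinary dipeptides miss, such as the i, i + 3/i + 4 spacing typical of helices.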
Affiliation(s)
- Aarti Garg
- Bioinformatics Centre, Institute of Microbial Technology, Sector 39A, Chandigarh, India
432
Predicting Subcellular Localization of Proteins Using Support Vector Machine with N-Terminal Amino Composition. ACTA ACUST UNITED AC 2005. [DOI: 10.1007/11527503_73] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
433
Wang L, Chen K, Ong YS. Bio-kernel Self-organizing Map for HIV Drug Resistance Classification. LECTURE NOTES IN COMPUTER SCIENCE 2005. [PMCID: PMC7122014 DOI: 10.1007/11539087_20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
The kernel self-organizing map has recently been studied by Fyfe and colleagues [1]. This paper investigates the use of a novel bio-kernel function for the kernel self-organizing map. For verification, we present an application of the proposed kernel self-organizing map to HIV drug resistance classification using mutation patterns in protease sequences. The original self-organizing map with the distributed encoding method was used for comparison. It has been found that the use of the kernel self-organizing map with the novel bio-kernel function leads to better classification and a faster convergence rate ...
Affiliation(s)
- Lipo Wang
- School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798 Singapore
- Ke Chen
- School of Software, Sun Yat-Sen University, 510275 Guangzhou, China
- Yew Soon Ong
- School of Computer Engineering, Nanyang Technological University, BLK N4, 2b-39, Nanyang Avenue, 639798 Singapore
434
González-Díaz H, Uriarte E, Ramos de Armas R. Predicting stability of Arc repressor mutants with protein stochastic moments. Bioorg Med Chem 2005; 13:323-31. [PMID: 15598555 DOI: 10.1016/j.bmc.2004.10.024] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Received: 09/15/2004] [Revised: 10/08/2004] [Accepted: 10/09/2004] [Indexed: 11/18/2022]
Abstract
As more and more protein structures are determined and applied to drug manufacture, there is increasing interest in studying their stability. In this study, the stochastic moments ((SR)π(k)) of 53 Arc repressor mutants were introduced as molecular descriptors for modeling protein stability. The linear discriminant analysis model correctly classified 43 of 53 proteins (81.13%) according to their thermal stability. More specifically, the model classified 20/28 (71.4%) proteins with near-wild-type stability and 23/25 (92%) proteins with reduced stability. The model was further validated by re-substitution procedures (81.0%). In addition, the stochastic-moments-based model compared favorably with models based on physicochemical and geometric parameters, which achieved less than 77% accuracy. These parameters included D-Fire potential, surface area, volume, partition coefficient, and molar refractivity. This result illustrates the potential of the stochastic moments method for studying proteins relevant to bioorganic and medicinal chemistry.
Affiliation(s)
- Humberto González-Díaz
- Department of Organic Chemistry, Faculty of Pharmacy, University of Santiago de Compostela 15706, Spain.
435
Heazlewood JL, Millar AH. AMPDB: the Arabidopsis Mitochondrial Protein Database. Nucleic Acids Res 2005; 33:D605-10. [PMID: 15608271 PMCID: PMC540002 DOI: 10.1093/nar/gki048] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.2] [Received: 08/12/2004] [Revised: 09/30/2004] [Accepted: 09/30/2004] [Indexed: 12/02/2022]
Abstract
The Arabidopsis Mitochondrial Protein Database is an Internet-accessible relational database containing information on the predicted and experimentally confirmed protein complement of mitochondria from the model plant Arabidopsis thaliana (http://www.ampdb.bcs.uwa.edu.au/). The database was formed using the total non-redundant nuclear and organelle encoded sets of protein sequences and allows relational searching of published proteomic analyses of Arabidopsis mitochondrial samples, a set of predictions from six independent subcellular-targeting prediction programs, and orthology predictions based on pairwise comparison of the Arabidopsis protein set with known yeast and human mitochondrial proteins and with the proteome of Rickettsia. A variety of precomputed physical-biochemical parameters are also searchable as well as a more detailed breakdown of mass spectral data produced from our proteomic analysis of Arabidopsis mitochondria. It contains hyperlinks to other Arabidopsis genomic resources (MIPS, TIGR and TAIR), which provide rapid access to changing gene models as well as hyperlinks to T-DNA insertion resources, Massively Parallel Signature Sequencing (MPSS) and Genome Tiling Array data and a variety of other Arabidopsis online resources. It also incorporates basic analysis tools built into the query structure such as a BLAST facility and tools for protein sequence alignments for convenient analysis of queried results.
Affiliation(s)
- Joshua L Heazlewood
- Plant Molecular Biology Group, School of Biomedical and Chemical Sciences, The University of Western Australia, Crawley 6009, WA, Australia
436
Voting Fuzzy k-NN to Predict Protein Subcellular Localization from Normalized Amino Acid Pair Compositions. ACTA ACUST UNITED AC 2005. [DOI: 10.1007/11430919_23] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
437
Millar AH, Heazlewood JL, Kristensen BK, Braun HP, Møller IM. The plant mitochondrial proteome. TRENDS IN PLANT SCIENCE 2005; 10:36-43. [PMID: 15642522 DOI: 10.1016/j.tplants.2004.12.002] [Citation(s) in RCA: 127] [Impact Index Per Article: 6.4] [Indexed: 05/08/2023]
Abstract
The plant mitochondrial proteome might contain as many as 2000-3000 different gene products, each of which might undergo post-translational modification. Recent studies using analytical methods, such as one-, two- and three-dimensional gel electrophoresis and one- and two-dimensional liquid chromatography linked on-line with tandem mass spectrometry, have identified >400 mitochondrial proteins, including subunits of mitochondrial respiratory complexes, supercomplexes, phosphorylated proteins and oxidized proteins. The results also highlight a range of new mitochondrial proteins, new mitochondrial functions and possible new mechanisms for regulating mitochondrial metabolism. More than 70 identified proteins in Arabidopsis mitochondrial samples lack similarity to any protein of known function. In some cases, unknown proteins were found to form part of protein complexes, which allows a functional context to be defined for them. There are indications that some of these proteins add novel activities to mitochondrial protein complexes in plants.
Affiliation(s)
- A Harvey Millar
- Plant Molecular Biology Group, School of Biomedical and Chemical Sciences, University of Western Australia, Crawley 6009, W.A., Australia.
438
Yu CS, Lin CJ, Hwang JK. Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci 2004; 13:1402-6. [PMID: 15096640 PMCID: PMC2286765 DOI: 10.1110/ps.03479604] [Citation(s) in RCA: 628] [Impact Index Per Article: 29.9] [Indexed: 10/26/2022]
Abstract
Gram-negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. We present an approach to predict subcellular localization for Gram-negative bacteria. This method uses support vector machines trained on multiple feature vectors based on n-peptide compositions. For a standard data set comprising 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate ever reported. Our prediction accuracy is 14% higher than that of the recently developed multimodular PSORT-B. Because of its simplicity, this approach can be easily extended to other organisms and should be a useful tool for the high-throughput and large-scale analysis of proteomic and genomic data.
Affiliation(s)
- Chin-Sheng Yu
- Department of Biological Science and Technology, National Chiao Tung University, HsinChu 30050, Taiwan
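The n-peptide composition features described in the Yu et al. abstract above can be illustrated with a short Python sketch. It computes, for a given n, the fraction of overlapping length-n windows that match each of the 20^n possible n-peptides. The function name `npeptide_composition` and the decision to skip windows containing non-standard residues are assumptions of this sketch, and the SVM training stage is omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def npeptide_composition(seq: str, n: int) -> list[float]:
    """Return the 20**n-dimensional n-peptide composition of `seq`.

    Entry k is the fraction of overlapping length-n windows equal to the
    k-th n-peptide in lexicographic order over AMINO_ACIDS. Windows that
    contain non-standard residues are skipped (an assumption of this sketch).
    """
    index = {"".join(p): k for k, p in enumerate(product(AMINO_ACIDS, repeat=n))}
    counts = [0] * len(index)
    total = 0
    for start in range(len(seq) - n + 1):
        k = index.get(seq[start:start + n])
        if k is not None:
            counts[k] += 1
            total += 1
    if total == 0:
        return [0.0] * len(index)
    return [c / total for c in counts]


# Concatenating mono- and dipeptide compositions gives a 20 + 400 = 420-dim
# vector that could then be passed to any SVM implementation.
features = npeptide_composition("ACDAC", 1) + npeptide_composition("ACDAC", 2)
print(len(features))  # 420
```

Several such vectors (n = 1, 2, ...) can be combined, which is one plausible reading of "multiple feature vectors"; how the original method weighted or combined them is not stated in the abstract.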
439
Chou KC, Cai YD. Using GO-PseAA predictor to predict enzyme sub-class. Biochem Biophys Res Commun 2004; 325:506-9. [PMID: 15530421 DOI: 10.1016/j.bbrc.2004.10.058] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.7] [Received: 09/30/2004] [Indexed: 11/25/2022]
Abstract
Enzyme function is much less conserved than anticipated; that is, the sequence similarity required to imply similar enzymatic function is much higher than that required to imply similar protein structure. This is because enzyme function is an extremely complicated property that may depend on very subtle structural details as well as many other physicochemical factors. Accordingly, an approach based simply on sequence similarity would hardly achieve a decent success rate in predicting enzyme sub-class, even for a dataset consisting of samples with 50% sequence identity. To cope with this situation, the GO-PseAA predictor was adopted to identify the sub-class within each of the six main enzyme families. Even for much more stringent datasets, in which no enzyme has 25% sequence identity to any other, the overall success rates are 73-95%, suggesting that the GO-PseAA predictor can capture the core features of the statistical samples concerned and may become a useful high-throughput tool in proteomics and bioinformatics.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, CA 92130, USA.
440
Collier N, Takeuchi K. Comparison of character-level and part of speech features for name recognition in biomedical texts. J Biomed Inform 2004; 37:423-35. [PMID: 15542016 DOI: 10.1016/j.jbi.2004.08.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 07/23/2004] [Indexed: 10/26/2022]
Abstract
The immense volume of data now available from experiments in molecular biology has led to an explosion in reported results, most of which are available only as unstructured text. For this reason there has been great interest in text mining to aid in fact extraction, document screening, citation analysis, and linkage with large gene and gene-product databases. In particular, the named entity (NE) task, a core technology in all of these applications, has been investigated intensively, driven by the availability of high-volume training sets such as the GENIA v3.02 corpus. Despite such large training sets, accuracy for biology NE has proven to be consistently far below the high levels of performance in the news domain, where F scores above 90, which can be considered close to human performance, are commonly reported. We argue that more rigorous analysis of the factors that contribute to the model's performance is crucial to discover where the underlying limitations are and what our future research direction should be. In this paper we report on variations of two widely used feature types, part of speech (POS) tags and character-level orthographic features, and compare how these variations influence performance. We base our experiments on a proven state-of-the-art model, support vector machines, using a high-quality subset of 100 annotated MEDLINE abstracts. Experiments reveal that the best-performing features are orthographic features, with an F score of 72.6. Although the Brill tagger trained in-domain on the GENIA v3.02p POS corpus gives the best overall performance of any POS tagger, at an F score of 68.6, this is still significantly below the orthographic features. In combination these two feature types appear to interfere with each other and degrade performance slightly, to an F score of 72.3.
Affiliation(s)
- Nigel Collier
- National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan.
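The F scores quoted above (72.6, 68.6, 72.3) are on a 0-100 scale. A minimal sketch of an exact-match, entity-level F score follows, assuming entities are represented as hypothetical `(start, end, type)` tuples and that a prediction counts only when boundaries and type both match; the evaluation script actually used in the study may differ in its matching rules.

```python
Span = tuple[int, int, str]  # (start token, end token, entity type); illustrative


def ne_f_score(gold: set[Span], predicted: set[Span], beta: float = 1.0) -> float:
    """Exact-match F_beta score on a 0-100 scale."""
    tp = len(gold & predicted)  # entities with identical boundaries and type
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    b2 = beta * beta
    return 100.0 * (1 + b2) * precision * recall / (b2 * precision + recall)


gold = {(0, 2, "protein"), (5, 7, "DNA"), (9, 10, "cell_type")}
predicted = {(0, 2, "protein"), (5, 7, "RNA")}  # second entity has the wrong type
print(round(ne_f_score(gold, predicted), 1))  # 40.0
```

With precision 0.5 and recall 1/3, the harmonic mean is 0.4, so a wrong type label costs as much as a missed entity under this strict criterion.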
441
Nucleic Acid Quadratic Indices of the “Macromolecular Graph’s Nucleotides Adjacency Matrix”. Modeling of Footprints after the Interaction of Paromomycin with the HIV-1 Ψ-RNA Packaging Region. Int J Mol Sci 2004. [DOI: 10.3390/i5110276] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
442
Scott MS, Thomas DY, Hallett MT. Predicting subcellular localization via protein motif co-occurrence. Genome Res 2004; 14:1957-66. [PMID: 15466294 PMCID: PMC524420 DOI: 10.1101/gr.2650004] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]
Abstract
The prediction of subcellular localization of proteins from their primary sequence is a challenging problem in bioinformatics. We have created a Bayesian network localization predictor called PSLT that is based on the combinatorial presence of InterPro motifs and specific membrane domains in human proteins. This probabilistic framework generates a likelihood of localization to every organelle and allows the prediction of multicompartmental proteins. When predicting across nine compartments, PSLT achieves an accuracy of 78%, as estimated by 10-fold cross-validation, and a coverage of 74%. When used to predict the localization of proteins from other closely related species, it achieves both a prediction accuracy and a coverage >80%. We compared the localization predictions of PSLT with those determined through GFP-tagging and microscopy for a group of human proteins. We found two general classes of proteins that are mislocalized by the GFP-tagging strategy but correctly localized by PSLT. This suggests that PSLT can be used in combination with experimental localization approaches to identify proteins for which additional experimental validation is required. We used our predictor to annotate all 9793 human proteins from SWISS-PROT release 41.25, 16% of which are predicted by PSLT to be present in more than one compartment.
Affiliation(s)
- Michelle S Scott
- McGill Center for Bioinformatics, McGill University, Montreal, Quebec H3A 2B4, Canada
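PSLT reports accuracy and coverage as separate numbers. One common way such a pair arises, assumed here rather than taken from the paper, is that the predictor abstains whenever its highest likelihood falls below a threshold: coverage is then the fraction of proteins that receive a call, and accuracy is measured only over those calls. The function name, the threshold rule and the example likelihoods below are all illustrative.

```python
def accuracy_and_coverage(likelihoods: list[dict[str, float]], truths: list[str],
                          threshold: float) -> tuple[float, float]:
    """Accuracy over answered proteins, and the fraction of proteins answered.

    A protein is 'answered' when its top localization likelihood is at least
    `threshold` (an assumed abstention rule, not necessarily PSLT's).
    """
    answered = 0
    correct = 0
    for scores, truth in zip(likelihoods, truths):
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:  # confident enough to make a call
            answered += 1
            if best == truth:
                correct += 1
    coverage = answered / len(truths)
    accuracy = correct / answered if answered else 0.0
    return accuracy, coverage


likelihoods = [
    {"nucleus": 0.9, "cytoplasm": 0.1},
    {"nucleus": 0.4, "mitochondrion": 0.6},
    {"ER": 0.5, "Golgi": 0.5},  # below threshold: no call is made
    {"cytoplasm": 0.8, "nucleus": 0.2},
]
truths = ["nucleus", "nucleus", "ER", "cytoplasm"]
acc, cov = accuracy_and_coverage(likelihoods, truths, threshold=0.55)
print(f"accuracy={acc:.2f} coverage={cov:.2f}")  # accuracy=0.67 coverage=0.75
```

Raising the threshold usually trades coverage for accuracy, which is why the two figures are best read together.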
443
Jiang-Ning S, Wei-Jiang L, Wen-Bo X. Cooperativity of the oxidization of cysteines in globular proteins. J Theor Biol 2004; 231:85-95. [PMID: 15363931 DOI: 10.1016/j.jtbi.2004.06.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Received: 11/25/2003] [Revised: 06/01/2004] [Accepted: 06/07/2004] [Indexed: 11/17/2022]
Abstract
Based on 639 non-homologous proteins with well-resolved three-dimensional structures, comprising 2910 cysteine-containing segments, a novel approach has been proposed to predict the disulfide-bonding state of cysteines in proteins. It uses a two-stage classifier that combines a global linear discriminator based on amino acid composition with a local support vector machine classifier. Using the rigorous jack-knife procedure, the overall prediction accuracy of this hybrid classifier reached 84.1% and 80.1% when measured on a per-cysteine and a per-protein basis, respectively. This shows that whether cysteines form disulfide bonds depends not only on the global structural features of proteins but also on the local sequence environment. The result demonstrates the applicability of this novel method, whose prediction performance is comparable to that of existing methods for predicting the oxidation states of cysteines in proteins.
Affiliation(s)
- Song Jiang-Ning
- The Key Laboratory of Industrial Biotechnology, Ministry of Education, Southern Yangtze University, 170 Huihe Road, Wuxi 214036, China.
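The abstract above reports two accuracies, one per cysteine and one per protein. The sketch below shows why the per-protein figure is normally the lower of the two, assuming the usual convention that a protein counts as correct only when every one of its cysteines is assigned the right bonding state. The input format and the function name are hypothetical.

```python
def cysteine_and_protein_accuracy(
        proteins: list[tuple[list[bool], list[bool]]]) -> tuple[float, float]:
    """Each item is (observed, predicted) bonding states of one protein's cysteines.

    True means 'disulfide-bonded'. Returns (per-cysteine, per-protein) accuracy.
    """
    n_cys = cys_correct = prot_correct = 0
    for observed, predicted in proteins:
        if len(observed) != len(predicted):
            raise ValueError("one prediction per cysteine is required")
        matches = [o == p for o, p in zip(observed, predicted)]
        n_cys += len(matches)
        cys_correct += sum(matches)
        # A protein counts only if all of its cysteines are predicted correctly.
        prot_correct += all(matches)
    return cys_correct / n_cys, prot_correct / len(proteins)


proteins = [
    ([True, True, False], [True, True, False]),  # all correct
    ([False, False], [False, True]),             # one cysteine wrong
    ([True, True], [True, True]),                # all correct
]
cys_acc, prot_acc = cysteine_and_protein_accuracy(proteins)
print(f"{cys_acc:.3f} {prot_acc:.3f}")  # 0.857 0.667
```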
444
Reczko M, Hatzigeorgiou A. Prediction of the subcellular localization of eukaryotic proteins using sequence signals and composition. Proteomics 2004; 4:1591-6. [PMID: 15174129 DOI: 10.1002/pmic.200300769] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 12/11/2022]
Abstract
A tool called Locfind for the sequence-based prediction of the localization of eukaryotic proteins is introduced. It is based on bidirectional recurrent neural networks trained to read the amino acid sequence sequentially and produce localization information along the sequence. Systematic variation of the network architecture, in combination with an efficient learning algorithm, led to 91% correct localization predictions for novel proteins in fivefold cross-validation. The data and evaluation procedure are the same as those used for the non-plant part of the widely used TargetP tool by Emanuelsson et al. The Locfind system is available on the WWW for predictions (http://www.stepc.gr/~synaptic/locfind.html).
Affiliation(s)
- Martin Reczko
- Bioinformatics Lab, Institute of Computer Science, Foundation for Research and Technology--Hellas (FORTH), Heraklion, Crete, Greece.
445
Abstract
MOTIVATION Most existing methods for predicting protein subcellular location were developed for cases involving only two to five localizations, and only a few of them can be effectively extended to cover 12-14 localizations, because the success rate generally drops as the number of locations grows. In addition, some proteins occur in several different subcellular locations, i.e. they have 'multiplex locations'. So far no method can effectively handle this difficult multiplex-location problem. The present study was initiated to address (1) how to efficiently identify the localization of a query protein among many possible subcellular locations, and (2) how to deal with multiplex locations. RESULTS By hybridizing gene ontology, functional domain and pseudo amino acid composition approaches, a new method has been developed that can predict the subcellular localization of proteins with multiplex locations. A global analysis of budding yeast proteins classified into 22 locations was performed with the new method using jack-knife cross-validation. The overall success rate thus obtained is 70%, whereas the corresponding rates obtained with some other existing methods were only 13-14%, indicating that the new method is very powerful and promising. Furthermore, predictions were made for the four proteins whose localizations could not be determined experimentally, as well as for the 236 proteins whose localizations in budding yeast were ambiguous according to experimental observations. According to our predictions, many of these 'ambiguous proteins' have the same score and ranking for several different subcellular locations, implying that they may exist simultaneously in, or move between, these locations. This finding is intriguing because it reflects the dynamic behaviour of these proteins in a cell, which may be associated with special biological functions.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute San Diego, CA 92130, USA.
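The success rates in this and several neighbouring abstracts are obtained by jack-knife (leave-one-out) cross-validation: each protein is removed in turn, the predictor is built from the remaining proteins, and the held-out protein is predicted. The sketch below shows only that protocol. A plain nearest-neighbour classifier on toy two-dimensional feature vectors stands in for the hybrid gene ontology, functional domain and pseudo amino acid composition predictor, which is not reproduced here.

```python
import math
from typing import Callable, Sequence

Sample = tuple[Sequence[float], str]  # (feature vector, location label)


def nearest_neighbour(train: list[Sample], query: Sequence[float]) -> str:
    """Label of the training vector with the smallest Euclidean distance to `query`."""
    best_label, best_dist = "", math.inf
    for vec, label in train:
        d = math.dist(vec, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def jackknife_success_rate(
        samples: list[Sample],
        predict: Callable[[list[Sample], Sequence[float]], str] = nearest_neighbour,
) -> float:
    """Leave each sample out in turn, predict it from the rest, return the hit rate."""
    hits = 0
    for i, (vec, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        if predict(train, vec) == label:
            hits += 1
    return hits / len(samples)


samples = [
    ((0.0, 0.0), "nucleus"),
    ((0.1, 0.0), "nucleus"),
    ((1.0, 1.0), "mitochondrion"),
    ((1.1, 1.0), "mitochondrion"),
    ((0.9, 0.95), "nucleus"),  # sits among mitochondrial samples, so it is missed
]
print(jackknife_success_rate(samples))  # 0.8
```

Because no sample is ever used to predict itself, the jack-knife gives a deterministic estimate that does not depend on a random split, which is one reason it is favoured in this literature.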
446
Jiang XS, Dai J, Sheng QH, Zhang L, Xia QC, Wu JR, Zeng R. A comparative proteomic strategy for subcellular proteome research: ICAT approach coupled with bioinformatics prediction to ascertain rat liver mitochondrial proteins and indication of mitochondrial localization for catalase. Mol Cell Proteomics 2004; 4:12-34. [PMID: 15507458 DOI: 10.1074/mcp.m400079-mcp200] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Indexed: 11/06/2022]
Abstract
Subcellular proteomics, as an important step towards functional proteomics, has been a focus of proteomic research. However, co-purification of "contaminating" proteins has been the major problem in all subcellular proteomic research, including every kind of mitochondrial proteome study. It is often difficult to conclude whether these "contaminants" represent true endogenous partners or artificial associations induced by cell disruption or incomplete purification. To solve this problem, we applied a high-throughput comparative proteomic strategy, the ICAT approach with two-dimensional LC-MS/MS analysis, coupled with the combined use of different bioinformatics tools, to study the proteome of rat liver mitochondria prepared by traditional centrifugation (CM) or further purified on a Nycodenz gradient (PM). A total of 169 proteins were convincingly identified and quantified in the ICAT analysis, of which 90 have an ICAT ratio of PM:CM>1.0 and the other 79 have an ICAT ratio of PM:CM<1.0. Almost all the proteins annotated as mitochondrial according to Swiss-Prot annotation, bioinformatics prediction, and literature reports have a ratio of PM:CM>1.0, while proteins annotated as extracellular or secreted, cytoplasmic, endoplasmic reticulum, ribosomal, and so on have a ratio of PM:CM<1.0. Catalase and AP endonuclease 1, which are known as peroxisomal and nuclear proteins, respectively, showed a ratio of PM:CM>1.0, confirming reports of their mitochondrial location. Moreover, the 125 proteins with subcellular location annotation were used as a test dataset to evaluate how efficiently ICAT analysis and bioinformatics tools such as PSORT, TargetP, SubLoc, MitoProt, and Predotar ascertain mitochondrial proteins. The results indicated that ICAT analysis coupled with the combined use of different bioinformatics tools could effectively ascertain mitochondrial proteins and distinguish contaminant proteins and even multilocation proteins. Using this strategy, many novel proteins, known proteins without subcellular location annotation, and even known proteins annotated to other locations were strongly indicated to be mitochondrial.
Affiliation(s)
- Xiao-Sheng Jiang
- Research Centre for Proteome Analysis, Key Lab of Proteomics, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Graduate School of the Chinese Academy of Sciences, Shanghai 200031, China
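The strategy above combines an experimental signal (enrichment in the purified PM fraction, i.e. an ICAT ratio PM:CM>1.0) with agreement among localization predictors. The sketch below encodes one hypothetical decision rule of that kind. The vote threshold, the category names and the protein names are invented for illustration, and the five votes simply mirror the five tools named in the abstract (PSORT, TargetP, SubLoc, MitoProt, Predotar); this is not the authors' published procedure.

```python
def classify_mitochondrial(proteins: dict[str, tuple[float, list[bool]]],
                           min_votes: int = 2) -> dict[str, str]:
    """Label proteins from (PM:CM ICAT ratio, per-tool 'predicted mitochondrial' votes).

    Hypothetical rule: enrichment (ratio > 1.0) plus at least `min_votes`
    predictor votes -> mitochondrial; enrichment alone -> putative, needs
    validation; ratio <= 1.0 -> likely contaminant (ratios of exactly 1.0
    are treated as not enriched, an assumption of this sketch).
    """
    labels = {}
    for name, (ratio, votes) in proteins.items():
        enriched = ratio > 1.0
        if enriched and sum(votes) >= min_votes:
            labels[name] = "mitochondrial"
        elif enriched:
            labels[name] = "putative mitochondrial (validate)"
        else:
            labels[name] = "likely contaminant"
    return labels


example = {  # hypothetical proteins and values, not data from the study
    "protA": (2.1, [True, True, False, True, False]),
    "protB": (1.4, [False, False, False, False, False]),
    "protC": (0.6, [False, True, False, False, False]),
}
print(classify_mitochondrial(example))
# {'protA': 'mitochondrial', 'protB': 'putative mitochondrial (validate)', 'protC': 'likely contaminant'}
```

The middle category matters for cases like catalase in the abstract: an enrichment signal that predictors may not support is flagged for follow-up rather than discarded.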
447
Gardy JL, Laird MR, Chen F, Rey S, Walsh CJ, Ester M, Brinkman FSL. PSORTb v.2.0: Expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 2004; 21:617-23. [PMID: 15501914 DOI: 10.1093/bioinformatics/bti057] [Citation(s) in RCA: 573] [Impact Index Per Article: 27.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION PSORTb v.1.1 is the most precise bacterial localization prediction tool available. However, the program's predictive coverage and recall are low and the method is only applicable to Gram-negative bacteria. The goals of the present work are as follows: increase PSORTb's coverage while maintaining the existing precision level, expand it to include Gram-positive bacteria and then carry out a comparative analysis of localization. RESULTS An expanded database of proteins of known localization and new modules using frequent subsequence-based support vector machines was introduced into PSORTb v.2.0. The program attains a precision of 96% for Gram-positive and Gram-negative bacteria and predictive coverage comparable to other tools for whole proteome analysis. We show that the proportion of proteins at each localization is remarkably consistent across species, even in species with varying proteome size. AVAILABILITY Web-based version: http://www.psort.org/psortb. Standalone version: Available through the website under GNU General Public License. CONTACT psort-mail@sfu.ca, brinkman@sfu.ca SUPPLEMENTARY INFORMATION http://www.psort.org/psortb/supplementaryinfo.html.
Affiliation(s)
- J L Gardy
- Department of Molecular Biology and Biochemistry, Simon Fraser University Burnaby, BC, Canada V5A 1S6
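The PSORTb abstract distinguishes precision from recall and predictive coverage. A minimal sketch of these per-location measures follows, assuming that the tool returns "Unknown" when it makes no confident call and that coverage is the fraction of proteins receiving a definite localization; the function name and example labels are hypothetical.

```python
def location_metrics(truths: list[str], predictions: list[str], location: str,
                     unknown: str = "Unknown") -> dict[str, float]:
    """Precision and recall for one location, plus overall predictive coverage."""
    pairs = list(zip(truths, predictions))
    tp = sum(1 for t, p in pairs if p == location and t == location)
    fp = sum(1 for t, p in pairs if p == location and t != location)
    fn = sum(1 for t, p in pairs if t == location and p != location)  # includes 'Unknown'
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "coverage": sum(1 for p in predictions if p != unknown) / len(predictions),
    }


truths = ["Cytoplasmic", "Cytoplasmic", "Periplasmic", "Cytoplasmic", "Extracellular"]
predictions = ["Cytoplasmic", "Unknown", "Periplasmic", "Unknown", "Extracellular"]
print(location_metrics(truths, predictions, "Cytoplasmic"))
# {'precision': 1.0, 'recall': 0.3333333333333333, 'coverage': 0.6}
```

The example reproduces the pattern the abstract attributes to v.1.1: abstaining keeps precision high but depresses recall and coverage.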
448
Huff T, Rosorius O, Otto AM, Müller CSG, Ballweber E, Hannappel E, Mannherz HG. Nuclear localisation of the G-actin sequestering peptide thymosin β4. J Cell Sci 2004; 117:5333-41. [PMID: 15466884 DOI: 10.1242/jcs.01404] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.5] [Indexed: 11/20/2022]
Abstract
Thymosin β4 is regarded as the main G-actin sequestering peptide in the cytoplasm of mammalian cells. It is also thought to be involved in cellular events such as carcinogenesis, apoptosis, angiogenesis, blood coagulation and wound healing. Thymosin β4 has previously been reported to localise intracellularly to the cytoplasm, as detected by immunofluorescence. It can be selectively labelled at two of its glutamine residues with fluorescent Oregon Green cadaverine using transglutaminase, and this labelling does not interfere with its interaction with G-actin. Here we show that after microinjection into intact cells, fluorescently labelled thymosin β4 gives diffuse cytoplasmic and pronounced nuclear staining. Enzymatic cleavage of fluorescently labelled thymosin β4 with AsnC-endoproteinase yielded two mono-labelled fragments of the peptide. After microinjection of these fragments, only the larger N-terminal fragment, containing the proposed actin-binding sequence, exhibited nuclear localisation, whereas the smaller C-terminal fragment remained confined to the cytoplasm. We further showed that in digitonin-permeabilised and extracted cells, fluorescent thymosin β4 was localised solely within the cytoplasm, whereas it was found concentrated within the cell nuclei after an additional Triton X-100 extraction. We therefore conclude that thymosin β4 is specifically translocated into the cell nucleus by an active transport mechanism requiring an unidentified soluble cytoplasmic factor. Our data further suggest that this peptide may also serve as a G-actin sequestering peptide in the nucleus, although additional nuclear functions cannot be excluded.
Affiliation(s)
- Thomas Huff
- Institut für Biochemie, Medizinische Fakultät, Universität Erlangen-Nürnberg, Fahrstr. 17, 91054 Erlangen, Germany.
449
Cai YD, Chou KC. Predicting 22 protein localizations in budding yeast. Biochem Biophys Res Commun 2004; 323:425-8. [PMID: 15369769 DOI: 10.1016/j.bbrc.2004.08.113] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.6] [Received: 07/20/2004] [Indexed: 11/30/2022]
Abstract
According to recent experiments, proteins in budding yeast can be classified into 22 distinct subcellular locations. Some of these proteins are multi-locational, i.e., they occur in more than one location. However, all existing methods for predicting protein subcellular location were developed to deal only with the mono-locational case, where a query protein is assumed to belong to one, and only one, subcellular location. To stimulate the development of subcellular location prediction, an augmentation procedure is formulated that enables existing methods to tackle the multi-locational problem as well. A jackknife cross-validation test showed that the success rate obtained by the augmented GO-FunD-PseAA algorithm [BBRC 320 (2004) 1236] is overwhelmingly higher than those obtained by the other augmented methods. It is anticipated that the augmented GO-FunD-PseAA predictor will become a very useful tool for predicting protein subcellular localization in both basic research and practical applications.
Affiliation(s)
- Yu-Dong Cai
- Biomolecular Sciences Department, UMIST, P.O. Box 88, Manchester M60 1QD, UK; Gordon Life Science Institute, San Diego, CA 92130, USA.
450
Chou KC, Cai YD. Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochem Biophys Res Commun 2004; 320:1236-9. [PMID: 15249222 DOI: 10.1016/j.bbrc.2004.06.073] [Citation(s) in RCA: 123] [Impact Index Per Article: 5.9] [Received: 06/10/2004] [Indexed: 11/18/2022]
Abstract
The localization of a protein in a cell is closely correlated with its biological function. With the explosion of protein sequences entering DataBanks, it is highly desirable to develop an automated method that can rapidly identify their subcellular locations. This will expedite the annotation process, providing timely and useful information for both basic research and industrial application. In view of this, a powerful predictor has been developed by hybridizing the gene ontology approach [Nat. Genet. 25 (2000) 25], the functional domain composition approach [J. Biol. Chem. 277 (2002) 45765], and the pseudo-amino acid composition approach [Proteins Struct. Funct. Genet. 43 (2001) 246; Erratum: ibid. 44 (2001) 60]. As a showcase, the recently constructed dataset [Bioinformatics 19 (2003) 1656] was used for demonstration. The dataset contains 7589 proteins classified into 12 subcellular locations: chloroplast, cytoplasmic, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, and vacuolar. The overall success rate of prediction obtained by jackknife cross-validation was 92%. This is so far the highest success rate achieved on this dataset using an objective and rigorous cross-validation procedure.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, CA 92130, USA.
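The pseudo-amino acid (PseAA) composition component of the hybrid predictor above can be sketched on its own, following the commonly cited type-1 formulation: 20 residue frequencies plus λ sequence-order correlation factors θ_j, all divided by the same normaliser so the vector sums to 1. The Kyte-Doolittle hydropathy scale is a stand-in chosen for this sketch (the original PseAA work combined several physicochemical properties), and the defaults λ = 3 and weight w = 0.05 are illustrative, not the settings behind the reported 92% rate. The gene ontology and functional domain components are not shown.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values: a stand-in property for this sketch.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardise(scale: dict[str, float]) -> dict[str, float]:
    """Rescale a property to zero mean and unit population SD over the 20 residues."""
    values = list(scale.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return {aa: (v - mean) / sd for aa, v in scale.items()}


def pseaac(seq: str, lam: int = 3, weight: float = 0.05,
           scales: tuple[dict[str, float], ...] = (KYTE_DOOLITTLE,)) -> list[float]:
    """Return a (20 + lam)-dimensional type-1 pseudo amino acid composition."""
    if any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardise(s) for s in scales]

    def correlation(a: str, b: str) -> float:
        # Mean squared difference of the standardised properties of two residues.
        return sum((p[b] - p[a]) ** 2 for p in props) / len(props)

    # theta_j: average correlation between residues j positions apart, j = 1..lam.
    thetas = [
        sum(correlation(seq[i], seq[i + j]) for i in range(len(seq) - j)) / (len(seq) - j)
        for j in range(1, lam + 1)
    ]
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
print(len(vec), round(sum(vec), 6))  # 23 1.0
```

The extra λ components carry coarse sequence-order information that a plain 20-dimensional composition discards; the weight w controls how much they count relative to the composition terms.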