101
Tiwari AK, Srivastava R. A survey of computational intelligence techniques in protein function prediction. International Journal of Proteomics 2014; 2014:845479. [PMID: 25574395 PMCID: PMC4276698 DOI: 10.1155/2014/845479] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 09/10/2014] [Revised: 10/31/2014] [Accepted: 11/07/2014] [Indexed: 02/08/2023]
Abstract
Knowledge of previously uncharacterized proteins has grown massively with the advancement of high-throughput microarray technologies. Protein function prediction is among the most challenging problems in bioinformatics. In the past, homology-based approaches were used to predict protein function, but they failed when a new protein differed from previously characterized ones. To alleviate the problems associated with traditional homology-based approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a comprehensive state-of-the-art review of computational intelligence techniques for protein function prediction using sequence, structure, protein-protein interaction network, and gene expression data. These techniques are applied in a wide range of areas, such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. The paper also summarizes the results obtained by many researchers who applied computational intelligence techniques to appropriate datasets to improve prediction performance. The summary shows that ensemble classifiers and the integration of multiple heterogeneous data sources are useful for protein function prediction.
Affiliation(s)
- Arvind Kumar Tiwari
- Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India
- Rajeev Srivastava
- Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India
102
Kumar R, Kumari B, Srivastava A, Kumar M. NRfamPred: a proteome-scale two level method for prediction of nuclear receptor proteins and their sub-families. Sci Rep 2014; 4:6810. [PMID: 25351274 PMCID: PMC5381360 DOI: 10.1038/srep06810] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/26/2014] [Accepted: 10/09/2014] [Indexed: 11/09/2022]
Abstract
Nuclear receptor proteins (NRPs) are transcription factors that regulate many vital cellular processes in animal cells. NRPs form a super-family of phylogenetically related proteins and are divided into different sub-families on the basis of ligand characteristics and their functions. In the post-genomic era, when new proteins are being added to the database in a high-throughput mode, it becomes imperative to identify new NRPs using information from the amino acid sequence alone. In this study we report an SVM-based two-level prediction system, NRfamPred, that uses the dipeptide composition of proteins as input. At the 1st level, NRfamPred screens whether the query protein is an NRP or a non-NRP; if the query protein belongs to the NRP class, prediction moves to the 2nd level and predicts the sub-family. Using leave-one-out cross-validation, we achieved an overall accuracy of 97.88% at the 1st level and an overall accuracy of 98.11% at the 2nd level with dipeptide composition. Benchmarking on independent datasets showed that NRfamPred had accuracy comparable to that of other existing methods developed on the same dataset. Our method predicted the existence of 76 NRPs in the human proteome, of which 14 are novel NRPs. NRfamPred also predicted the sub-families of these 14 NRPs.
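The dipeptide composition used as input above can be illustrated with a short sketch. This is not the NRfamPred code: the function name `dipeptide_composition`, the choice to report fractions rather than percentages, and the decision to skip pairs containing non-standard residues are all assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered pairs of them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list with the fraction of each dipeptide.

    Assumption: overlapping pairs are counted, and pairs that contain a
    non-standard residue (for example 'X') are ignored.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or without any standard dipeptide.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy sequence, not a real nuclear receptor.
features = dipeptide_composition("ACAC")
```

The resulting fixed-length vector has the same size for every protein, which is what lets a classifier such as an SVM take sequences of different lengths as input.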
Affiliation(s)
- Ravindra Kumar
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, India-110021
- Bandana Kumari
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, India-110021
- Abhishikha Srivastava
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, India-110021
- Manish Kumar
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, India-110021
103
Stetson LC, Pearl T, Chen Y, Barnholtz-Sloan JS. Computational identification of multi-omic correlates of anticancer therapeutic response. BMC Genomics 2014; 15 Suppl 7:S2. [PMID: 25573145 PMCID: PMC4243102 DOI: 10.1186/1471-2164-15-s7-s2] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 01/02/2023]
Abstract
Background: A challenge in precision medicine is the transformation of genomic data into knowledge that can be used to stratify patients into treatment groups based on predicted clinical response. Although clinical trials remain the only way to truly measure drug toxicities and effectiveness, as a scientific community we lack the resources to clinically assess all drugs presently under development. Therefore, an effective preclinical model system that enables prediction of anticancer drug response could significantly speed the broader adoption of personalized medicine. Results: Three large-scale pharmacogenomic studies have screened anticancer compounds in more than 1000 distinct human cancer cell lines. We combined these datasets to generate and validate multi-omic predictors of drug response. We compared drug response signatures built using a penalized linear regression model and two non-linear machine learning techniques, random forest and support vector machine. The precision and robustness of each drug response signature were assessed using cross-validation across three independent datasets. Fifteen drugs were common among the datasets. We validated prediction signatures for eleven of the fifteen tested drugs (17-AAG, AZD0530, AZD6244, Erlotinib, Lapatinib, Nutlin-3, Paclitaxel, PD0325901, PD0332991, PF02341066, and PLX4720). Conclusions: Multi-omic predictors of drug response can be generated and validated for many drugs. Specifically, the random forest algorithm generated more precise and robust prediction signatures than support vector machines and the more commonly used elastic net regression. The resulting drug response signatures can be used to stratify patients into treatment groups based on their individual tumor biology, with two major benefits: speeding the process of bringing preclinical drugs to market, and the repurposing and repositioning of existing anticancer therapies.
104
Abbas SS, Dijkstra TMH, Heskes T. A comparative study of cell classifiers for image-based high-throughput screening. BMC Bioinformatics 2014; 15:342. [PMID: 25336059 PMCID: PMC4287552 DOI: 10.1186/1471-2105-15-342] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/07/2014] [Accepted: 09/29/2014] [Indexed: 11/24/2022]
Abstract
Background: Millions of cells are present in thousands of images created in high-throughput screening (HTS). Biologists could classify each of these cells into a phenotype by visual inspection, but in the presence of millions of cells this visual classification task becomes infeasible. Biologists train classification models on a few thousand visually classified example cells and iteratively improve the training data by visual inspection of the important misclassified phenotypes. Classification methods differ in performance and performance evaluation time. We present a comparative study of computational performance of gentle boosting, joint boosting CellProfiler Analyst (CPA), support vector machines (linear and radial basis function) and linear discriminant analysis (LDA) on two data sets of HT29 and HeLa cancer cells. Results: For the HT29 data set we find that gentle boosting, SVM (linear) and SVM (RBF) are close in performance but SVM (linear) is faster than gentle boosting and SVM (RBF). For the HT29 data set the average performance difference between SVM (RBF) and SVM (linear) is 0.42%. For the HeLa data set we find that SVM (RBF) outperforms other classification methods and is on average 1.41% better in performance than SVM (linear). Conclusions: Our study proposes SVM (linear) for iterative improvement of the training data and SVM (RBF) for the final classifier to classify all unlabeled cells in the whole data set. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-342) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Syed Saiden Abbas
- Institute for Computing and Information Sciences, Radboud University, Nijmegen, Netherlands
105
Pacharawongsakda E, Theeramunkong T. Predict subcellular locations of singleplex and multiplex proteins by semi-supervised learning and dimension-reducing general mode of Chou's PseAAC. IEEE Trans Nanobioscience 2014; 12:311-20. [PMID: 23864226 DOI: 10.1109/tnb.2013.2272014] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 11/06/2022]
Abstract
Predicting protein subcellular location is one of the major challenges in bioinformatics, since such knowledge helps us understand protein functions and enables us to select targeted proteins during the drug discovery process. While many computational techniques have been proposed to improve predictive performance for protein subcellular location, they have several shortcomings. In this work, we propose a method to address three main issues in such techniques: i) handling of multiplex proteins, which may exist in or move between multiple cellular compartments, ii) handling of high dimensionality in the input and output spaces, and iii) the requirement of sufficient labeled data for model training. To address these issues, this work presents a new computational method for predicting proteins that have either single or multiple locations. The proposed technique, namely iFLAST-CORE, incorporates dimensionality reduction in the feature and label spaces with the co-training paradigm for semi-supervised multi-label classification. For this purpose, Singular Value Decomposition (SVD) is applied to transform the high-dimensional feature space and label space into lower-dimensional spaces. After that, because labeled data are limited, co-training regression makes use of unlabeled data by predicting the target values of the unlabeled data in the lower-dimensional spaces. In the last step, the SVD components are used to project labels in the lower-dimensional space back to the original space, and an adaptive threshold is used to map each numeric value to a binary value for label determination. A set of experiments on viral proteins and gram-negative bacterial proteins shows that our proposed method improves classification performance in terms of various evaluation metrics such as Aiming (or Precision), Coverage (or Recall) and macro F-measure, compared to the traditional method that uses only labeled data.
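To make the macro F-measure mentioned above concrete, the following is a minimal sketch of the standard label-wise definition for multi-label predictions. It is an illustration only: the function name `macro_f_measure`, the toy location names, and the convention of scoring a label as 0 when it has no true or predicted positives are assumptions, and the paper's exact metric definitions may differ.

```python
def macro_f_measure(true_labels, predicted_labels, label_names):
    """Average the per-label F-measure over all labels.

    true_labels and predicted_labels are lists of sets, one set of
    location names per protein (a multiplex protein has several).
    """
    scores = []
    for label in label_names:
        tp = fp = fn = 0
        for truth, predicted in zip(true_labels, predicted_labels):
            in_truth = label in truth
            in_predicted = label in predicted
            if in_truth and in_predicted:
                tp += 1
            elif in_predicted:
                fp += 1
            elif in_truth:
                fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)  # assumed convention for undefined cases
    return sum(scores) / len(scores)


# Toy example with three proteins and two hypothetical locations.
truth = [{"nucleus"}, {"nucleus", "cytoplasm"}, {"cytoplasm"}]
predicted = [{"nucleus"}, {"nucleus"}, {"nucleus", "cytoplasm"}]
score = macro_f_measure(truth, predicted, ["nucleus", "cytoplasm"])
```

Because every label contributes equally to the average, rare locations weigh as much as common ones, which is one reason macro averaging is often reported for multi-label localization tasks.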
106
Hooper CM, Tanz SK, Castleden IR, Vacher MA, Small ID, Millar AH. SUBAcon: a consensus algorithm for unifying the subcellular localization data of the Arabidopsis proteome. Bioinformatics 2014; 30:3356-64. [PMID: 25150248 DOI: 10.1093/bioinformatics/btu550] [Citation(s) in RCA: 123] [Impact Index Per Article: 11.2] [Indexed: 12/11/2022]
Abstract
MOTIVATION: Knowing the subcellular location of proteins is critical for understanding their function and developing accurate networks representing eukaryotic biological processes. Many computational tools have been developed to predict proteome-wide subcellular location, and abundant experimental data from green fluorescent protein (GFP) tagging or mass spectrometry (MS) are available in the model plant, Arabidopsis. None of these approaches is error-free, and thus, results are often contradictory. RESULTS: To help unify these multiple data sources, we have developed the SUBcellular Arabidopsis consensus (SUBAcon) algorithm, a naive Bayes classifier that integrates 22 computational prediction algorithms, experimental GFP and MS localizations, protein-protein interaction and co-expression data to derive a consensus call and probability. SUBAcon classifies protein location in Arabidopsis more accurately than single predictors. AVAILABILITY: SUBAcon is a useful tool for recovering proteome-wide subcellular locations of Arabidopsis proteins and is displayed in the SUBA3 database (http://suba.plantenergy.uwa.edu.au). The source code and input data are available through the SUBA3 server (http://suba.plantenergy.uwa.edu.au//SUBAcon.html) and the Arabidopsis SUbproteome REference (ASURE) training set can be accessed using the ASURE web portal (http://suba.plantenergy.uwa.edu.au/ASURE).
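The idea of deriving a consensus call and probability from several imperfect sources with a naive Bayes classifier can be sketched as follows. This is not the SUBAcon source code: the function `naive_bayes_consensus`, the source names, and every probability in the toy tables are hypothetical values chosen for illustration, and the sketch assumes all probabilities are non-zero (for example after smoothing).

```python
import math


def naive_bayes_consensus(calls, likelihoods, priors):
    """Combine independent location calls into a posterior per location.

    calls:       {source: location the source reported}
    likelihoods: {source: {true location: {reported location: probability}}}
    priors:      {location: prior probability}
    """
    log_scores = {}
    for location, prior in priors.items():
        score = math.log(prior)
        for source, reported in calls.items():
            # Naive Bayes assumption: sources are conditionally independent.
            score += math.log(likelihoods[source][location][reported])
        log_scores[location] = score
    # Normalize in a numerically stable way.
    peak = max(log_scores.values())
    weights = {loc: math.exp(s - peak) for loc, s in log_scores.items()}
    total = sum(weights.values())
    posterior = {loc: w / total for loc, w in weights.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Hypothetical tables: one sequence-based predictor and one GFP experiment.
priors = {"plastid": 0.5, "mitochondrion": 0.5}
likelihoods = {
    "predictor_a": {
        "plastid": {"plastid": 0.8, "mitochondrion": 0.2},
        "mitochondrion": {"plastid": 0.3, "mitochondrion": 0.7},
    },
    "gfp": {
        "plastid": {"plastid": 0.9, "mitochondrion": 0.1},
        "mitochondrion": {"plastid": 0.1, "mitochondrion": 0.9},
    },
}
calls = {"predictor_a": "mitochondrion", "gfp": "plastid"}
best, posterior = naive_bayes_consensus(calls, likelihoods, priors)
```

In this toy case the two sources disagree, and the more reliable one (the hypothetical GFP table) dominates the posterior, which mirrors how a consensus approach can resolve contradictory evidence.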
Affiliation(s)
- Cornelia M Hooper
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Sandra K Tanz
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Ian R Castleden
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Michael A Vacher
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Ian D Small
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- A Harvey Millar
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
107
Mao R, Raj Kumar PK, Guo C, Zhang Y, Liang C. Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine. PLoS One 2014; 9:e104049. [PMID: 25110928 PMCID: PMC4128822 DOI: 10.1371/journal.pone.0104049] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 02/24/2014] [Accepted: 07/06/2014] [Indexed: 01/04/2023]
Abstract
One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on the regulation of their expression, while little systematic classification of RIs versus constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and a support vector machine (SVM) with a radial basis function (RBF) kernel to differentiate these two types of introns in Arabidopsis. By comparing the coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach classified RIs and CSIs more accurately than four other approaches. The optimal penalty parameter C and the RBF kernel parameter of the SVM were set using a particle swarm optimization algorithm (PSOSVM). Our classification performance showed an F-measure of 80.8% (random forest) and 77.4% (PSOSVM). Not only were the basic sequence features and positional distribution characteristics of RIs obtained, but putative regulatory motifs in intron splicing were also predicted based on our feature extraction approach. Our study will facilitate a better understanding of the underlying mechanisms involved in intron retention.
Affiliation(s)
- Rui Mao
- College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Department of Biology, Miami University, Oxford, Ohio, United States of America
- Cheng Guo
- Department of Biology, Miami University, Oxford, Ohio, United States of America
- Yang Zhang
- College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling, Shaanxi, China
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- * E-mail: (YZ); (CL)
- Chun Liang
- Department of Biology, Miami University, Oxford, Ohio, United States of America
- Department of Computer Sciences and Software Engineering, Miami University, Oxford, Ohio, United States of America
- * E-mail: (YZ); (CL)
108
Mamun K, Sharma A. Importance of Computational Intelligent in Proteomics. Journal of Advanced Computational Intelligence and Intelligent Informatics 2014. [DOI: 10.20965/jaciii.2014.p0469] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
Computational intelligence (CI) techniques have become a clear necessity in many bioinformatics applications. In this article, we make the interested reader aware of the necessity of CI by providing a basic taxonomy of proteomics and discussing the use, variety and potential of CI techniques in a number of common as well as upcoming proteomics applications.
109
Ding S, Yan S, Qi S, Li Y, Yao Y. A protein structural classes prediction method based on PSI-BLAST profile. J Theor Biol 2014; 353:19-23. [DOI: 10.1016/j.jtbi.2014.02.034] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 12/16/2013] [Revised: 01/27/2014] [Accepted: 02/24/2014] [Indexed: 11/27/2022]
110
Kumar R, Jain S, Kumari B, Kumar M. Protein sub-nuclear localization prediction using SVM and Pfam domain information. PLoS One 2014; 9:e98345. [PMID: 24897370 PMCID: PMC4045734 DOI: 10.1371/journal.pone.0098345] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 09/12/2013] [Accepted: 05/01/2014] [Indexed: 12/24/2022]
Abstract
The nucleus is the largest and a highly organized organelle of eukaryotic cells. Within the nucleus there exist a number of pseudo-compartments, which are not separated by any membrane, yet each of them contains only a specific set of proteins. Understanding protein sub-nuclear localization can hence be an important step towards understanding the biological functions of the nucleus. Here we describe SubNucPred, a method we developed for predicting the sub-nuclear localization of proteins. This method predicts protein localization for 10 different sub-nuclear locations sequentially by combining the presence or absence of a unique Pfam domain with an amino acid composition based SVM model. The prediction accuracy during leave-one-out cross-validation was 85.05% for centromeric proteins, 76.85% for chromosomal proteins, 81.27% for nuclear speckle proteins, 81.79% for nucleolar proteins, 79.37% for nuclear envelope proteins, 77.78% for nuclear matrix proteins, 76.98% for nucleoplasm proteins, 88.89% for nuclear pore complex proteins, 75.40% for PML body proteins and 83.33% for telomeric proteins. Comparison with other reported methods showed that SubNucPred performs better than existing methods. A web-server for predicting protein sub-nuclear localization named SubNucPred has been established at http://14.139.227.92/mkumar/subnucpred/. A standalone version of SubNucPred can also be downloaded from the web-server.
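The leave-one-out cross-validation used to report the accuracies above can be sketched in a few lines. The sketch is illustrative only: the nearest-centroid classifier is a simple stand-in for the SVM described in the abstract, and the function names and the two-dimensional toy feature vectors are assumptions, not data from the study.

```python
def leave_one_out_accuracy(samples, labels, train, predict):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def train_centroids(xs, ys):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: squared_distance(model[y]))


# Toy feature vectors for two hypothetical sub-nuclear classes.
samples = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
labels = ["nucleolus", "nucleolus", "telomere", "telomere"]
accuracy = leave_one_out_accuracy(samples, labels, train_centroids, predict_centroid)
```

Because every protein is predicted by a model that never saw it during training, the procedure gives a less optimistic estimate than scoring on the training data, at the cost of training one model per sample.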
Affiliation(s)
- Ravindra Kumar
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
- Sohni Jain
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
- Bandana Kumari
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
- Manish Kumar
- Department of Biophysics, University of Delhi South Campus, New Delhi, India
- * E-mail:
111
Pan R, Kaur N, Hu J. The Arabidopsis mitochondrial membrane-bound ubiquitin protease UBP27 contributes to mitochondrial morphogenesis. The Plant Journal: for Cell and Molecular Biology 2014; 78:1047-59. [PMID: 24707813 DOI: 10.1111/tpj.12532] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 03/04/2014] [Revised: 03/28/2014] [Accepted: 04/01/2014] [Indexed: 05/13/2023]
Abstract
Mitochondria are essential organelles with dynamic morphology and function. Post-translational modifications (PTMs), which include protein ubiquitination, are critically involved in animal and yeast mitochondrial dynamics. How PTMs contribute to plant mitochondrial dynamics is just beginning to be elucidated, and mitochondrial enzymes involved in ubiquitination have not been reported from plants. In this study, we identified an Arabidopsis mitochondria-localized ubiquitin protease, UBP27, through a screen that combined bioinformatics and fluorescent fusion protein targeting analysis. We characterized UBP27 with respect to its membrane topology and enzymatic activities, and analysed the mitochondrial morphological changes in UBP27 T-DNA insertion mutants and overexpression lines. We have shown that UBP27 is embedded in the mitochondrial outer membrane with an Nin-Cout orientation and possesses ubiquitin protease activities in vitro. UBP27 is similar in sub-cellular localization, domain structure, membrane topology and enzymatic activities to two mitochondrial deubiquitinases, yeast ScUBP16 and human HsUSP30, which indicates that these proteins are functional orthologues in eukaryotes. Although loss-of-function mutants of UBP27 do not show obvious phenotypes in plant growth and mitochondrial morphology, UBP27 overexpression can change mitochondrial morphology from rod to spherical shape and reduce the mitochondrial association of dynamin-related protein 3 (DRP3) proteins, large GTPases that serve as the main mitochondrial fission factors. Thus, our study has uncovered a plant ubiquitin protease that plays a role in mitochondrial morphogenesis, possibly through modulation of the function of organelle division proteins.
Affiliation(s)
- Ronghui Pan
- MSU-DOE Plant Research Laboratory, Michigan State University, East Lansing, MI, 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
112
Frost PC, Song K, Wagner ND. A beginner's guide to nutritional profiling in physiology and ecology. Integr Comp Biol 2014; 54:873-9. [PMID: 24876193 DOI: 10.1093/icb/icu054] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
The nutritional history of an organism is often difficult to ascertain. Nonetheless, this information on past diet can be particularly important when explaining the role of nutrition in physiological responses and ecological dynamics. One approach to infer the past dietary history of an individual is through characterization of its nutritional phenotype, an interrelated set of molecular and physiological properties that are sensitive to dietary stress. Comparisons of nutritional phenotypes between a study organism and reference phenotypes have the potential to provide insight into the type and intensity of past dietary constraints. Here, we describe this process of nutritional profiling for ecophysiological research in which a suite of molecular and physiological responses are cataloged for animals experiencing known types and intensities of dietary stress and are quantitatively compared with those of unknown individuals. We supplement this delineation of the process of nutritional profiling with a first-order analysis of its sensitivity to the number of response variables in the reference database, their responsiveness to diet, and the size of reference populations. In doing so, we demonstrate the considerable promise this approach has to transform future studies of nutrition by its ability to provide more and better information on responses to dietary stress in animals and their populations.
Affiliation(s)
- Paul C Frost
- Department of Biology, Trent University, Peterborough, Ontario K9J 7B8, Canada; Environmental and Life Sciences Graduate Program, Trent University, Peterborough, Ontario K9J 7B8, Canada
- Keunyea Song
- Department of Biology, Trent University, Peterborough, Ontario K9J 7B8, Canada; Environmental and Life Sciences Graduate Program, Trent University, Peterborough, Ontario K9J 7B8, Canada
- Nicole D Wagner
- Department of Biology, Trent University, Peterborough, Ontario K9J 7B8, Canada; Environmental and Life Sciences Graduate Program, Trent University, Peterborough, Ontario K9J 7B8, Canada
113
Zhang L, Zhao X, Kong L. Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition. J Theor Biol 2014; 355:105-10. [PMID: 24735902 DOI: 10.1016/j.jtbi.2014.04.008] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 12/06/2013] [Revised: 02/26/2014] [Accepted: 04/04/2014] [Indexed: 10/25/2022]
Abstract
Knowledge of protein structural class plays an important role in characterizing the overall folding type of a given protein. At present, predicting the structural class of low-similarity sequences from the protein sequence alone remains a challenge in computational biology. In this study, a novel sequence representation method based on the position-specific scoring matrix is proposed for protein structural class prediction. Using a defined evolutionary difference formula, proteins of varying length are expressed as vectors of uniform dimension, which represent evolutionary difference information between the adjacent residues of a given protein. To evaluate the proposed method, a support vector machine and jackknife tests are employed on three widely used datasets, 25PDB, 1189 and 640, with sequence similarity lower than 25%, 40% and 25%, respectively. Comparison of our results with previous methods shows that our method is promising for predicting protein structural class, especially for low-similarity sequences.
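The abstract does not give the evolutionary difference formula, so the following sketch only illustrates the general idea of turning a variable-length position-specific scoring matrix (PSSM) into a fixed-length vector from differences between adjacent residues. The mean squared difference used here, and the function name `evolutionary_difference_vector`, are assumptions for illustration and should not be read as the authors' formula.

```python
def evolutionary_difference_vector(pssm):
    """Map an L x W score matrix to a W-element vector.

    pssm is a list of L rows (one per residue), each with W scores
    (W is typically 20, one column per amino acid type).
    Assumed formula: for each column, the mean of the squared
    differences between consecutive rows.
    """
    if len(pssm) < 2:
        raise ValueError("at least two residues are needed")
    width = len(pssm[0])
    totals = [0.0] * width
    for previous_row, next_row in zip(pssm, pssm[1:]):
        for j in range(width):
            totals[j] += (next_row[j] - previous_row[j]) ** 2
    adjacent_pairs = len(pssm) - 1
    return [total / adjacent_pairs for total in totals]


# Toy matrices with two columns instead of 20, for readability.
short_protein = [[0, 1], [2, 1], [2, 4]]
long_protein = [[1, 1], [1, 2], [3, 2], [3, 3], [0, 3]]
short_vector = evolutionary_difference_vector(short_protein)
long_vector = evolutionary_difference_vector(long_protein)
```

Both toy proteins yield vectors of the same length even though they have different numbers of residues, which is the property that lets a fixed-input classifier such as a support vector machine be applied.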
Affiliation(s)
- Lichao Zhang
- College of Marine Life Science, Ocean University of China, Yushan Road, Qingdao 266003, PR China
- Xiqiang Zhao
- College of Mathematical Science, Ocean University of China, Songling Road, Qingdao 266100, PR China
- Liang Kong
- College of Mathematics and Information Technology, Hebei Normal University of Science and Technology, Qinhuangdao 066004, PR China
114
Verma JK, Gayali S, Dass S, Kumar A, Parveen S, Chakraborty S, Chakraborty N. OsAlba1, a dehydration-responsive nuclear protein of rice (Oryza sativa L. ssp. indica), participates in stress adaptation. Phytochemistry 2014; 100:16-25. [PMID: 24534105 DOI: 10.1016/j.phytochem.2014.01.015] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 08/09/2013] [Revised: 01/16/2014] [Accepted: 01/22/2014] [Indexed: 05/13/2023]
Abstract
Alba proteins have exhibited great functional plasticity through the course of evolution and constitute a superfamily that spans the three domains of life. Earlier, we had developed the dehydration-responsive nuclear proteome of an indica rice cultivar, screening of which led to the identification of an Alba protein. Here we describe, for the first time, the complete sequence of the candidate gene OsAlba1, its genomic organization, and its possible function(s) in plants. Phylogenetic analysis showed that it is closer to Alba proteins of other monocots than to those of dicots. Protein-DNA interaction prediction indicates a DNA-binding property for OsAlba1. Confocal microscopy showed the localization of the OsAlba1-GFP fusion protein to the nucleus, and also sparsely to the cytoplasm. Water-deficit conditions triggered OsAlba1 expression, suggesting a function in dehydration stress, possibly through an ABA-dependent pathway. Functional complementation of the yeast mutant ΔPop6 established that OsAlba1 also functions in oxidative stress tolerance. The preferential expression of OsAlba1 in the flag leaves implies a role in grain filling. Our findings suggest that Alba components such as OsAlba1, especially from a plant where there is no evidence for a major chromosomal role, might play an important function in stress adaptation.
Affiliation(s)
- Jitendra Kumar Verma
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Saurabh Gayali
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Suchismita Dass
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Amit Kumar
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Shaista Parveen
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Subhra Chakraborty
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
- Niranjan Chakraborty
- National Institute of Plant Genome Research, Jawaharlal Nehru University Campus, Aruna Asaf Ali Marg, New Delhi 110067, India
115
|
Ding S, Li Y, Shi Z, Yan S. A protein structural classes prediction method based on predicted secondary structure and PSI-BLAST profile. Biochimie 2014; 97:60-5. [DOI: 10.1016/j.biochi.2013.09.013] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 04/01/2013] [Accepted: 09/16/2013] [Indexed: 10/26/2022]
|
116
|
Ghosh S, Vishveshwara S. Ranking the quality of protein structure models using sidechain based network properties. F1000Res 2014; 3:17. [PMID: 25580218 PMCID: PMC4038323 DOI: 10.12688/f1000research.3-17.v1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Accepted: 01/20/2014] [Indexed: 01/31/2023] Open
Abstract
Determining the correct structure of a protein given its sequence still remains an arduous task with many researchers working towards this goal. Most structure prediction methodologies result in the generation of a large number of probable candidates with the final challenge being to select the best amongst these. In this work, we have used Protein Structure Networks of native and modeled proteins in combination with Support Vector Machines to estimate the quality of a protein structure model and finally to provide ranks for these models. Model ranking is performed using regression analysis and helps in model selection from a group of many similar and good quality structures. Our results show that structures with a rank greater than 16 exhibit native protein-like properties while those below 10 are non-native like. The tool is also made available as a web server (http://vishgraph.mbu.iisc.ernet.in/GraProStr/native_non_native_ranking.html), where five modelled structures can be evaluated at a time.
Affiliation(s)
- Soma Ghosh
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India; I.I.Sc. Mathematics Initiative, Indian Institute of Science, Bangalore, 560012, India
|
117
|
Chen X, Li J, Hou J, Xie Z, Yang F. Mammalian mitochondrial proteomics: insights into mitochondrial functions and mitochondria-related diseases. Expert Rev Proteomics 2014; 7:333-45. [DOI: 10.1586/epr.10.22] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 01/09/2023]
|
118
|
Du P, Xu C. Predicting multisite protein subcellular locations: progress and challenges. Expert Rev Proteomics 2014; 10:227-37. [DOI: 10.1586/epr.13.16] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 11/08/2022]
|
119
|
Abstract
Bioinformatic tools are an increasingly important resource for Arabidopsis researchers. With them, it is possible to rapidly query the large data sets covering genomes, transcriptomes, proteomes, epigenomes, and other "omes" that have been generated in the past decade. Often these tools can be used to generate quality hypotheses at the click of a mouse. In this chapter, we cover the use of bioinformatic tools for examining gene expression and coexpression patterns, performing promoter analyses, looking for functional classification enrichment for sets of genes, and investigating protein-protein interactions. We also introduce bioinformatic tools that allow integration of data from several sources for improved hypothesis generation.
Affiliation(s)
- Miguel de Lucas
- Department of Plant Biology and Genome Center, UC Davis, Davis, CA, USA
|
120
|
Hajisharifi Z, Piryaiee M, Mohammad Beigi M, Behbahani M, Mohabatkar H. Predicting anticancer peptides with Chou's pseudo amino acid composition and investigating their mutagenicity via Ames test. J Theor Biol 2014; 341:34-40. [DOI: 10.1016/j.jtbi.2013.08.037] [Citation(s) in RCA: 210] [Impact Index Per Article: 19.1] [Received: 06/30/2013] [Revised: 08/28/2013] [Accepted: 08/31/2013] [Indexed: 12/27/2022]
|
121
|
Zhang S, Liang Y, Yuan X. Improving the prediction accuracy of protein structural class: Approached with alternating word frequency and normalized Lempel–Ziv complexity. J Theor Biol 2014; 341:71-7. [DOI: 10.1016/j.jtbi.2013.10.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 05/21/2013] [Revised: 09/08/2013] [Accepted: 10/08/2013] [Indexed: 10/26/2022]
|
122
|
Palanisamy B, Heese K. Oxygen distribution in proteins defines functional significance of the genome and proteome of the malaria parasite Plasmodium falciparum 3D7. FEMS Microbiol Lett 2013; 351:59-63. [DOI: 10.1111/1574-6968.12355] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/25/2013] [Accepted: 12/06/2013] [Indexed: 11/27/2022] Open
Affiliation(s)
- Balamurugan Palanisamy
- School of Biotechnology and Health Sciences; Karunya University; Coimbatore Tamil Nadu India
- Klaus Heese
- Graduate School of Biomedical Science and Engineering; Hanyang University; Seoul Korea
|
123
|
Tian J, Zhang Y, Liu B, Zuo D, Jiang T, Guo J, Zhang W, Wu N, Fan Y. Presep: predicting the propensity of a protein being secreted into the supernatant when expressed in Pichia pastoris. PLoS One 2013; 8:e79749. [PMID: 24278168 PMCID: PMC3836778 DOI: 10.1371/journal.pone.0079749] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/01/2013] [Accepted: 10/02/2013] [Indexed: 11/19/2022] Open
Abstract
Pichia pastoris is commonly used for the production of recombinant proteins due to its preferential secretion of recombinant proteins, resulting in lower production costs and increased yields of target proteins. However, not all recombinant proteins can be successfully secreted in P. pastoris. A computational method that predicts the likelihood of a protein being secreted into the supernatant would be of considerable value; however, to the best of our knowledge, no such tool has yet been developed. We present a machine-learning approach called Presep to assess the likelihood of a recombinant protein being secreted by P. pastoris based on its pseudo amino acid composition (PseAA). Using a 20-fold cross validation, Presep demonstrated a high degree of accuracy, with Matthews correlation coefficient (MCC) and overall accuracy (Q2) scores of 0.78 and 95%, respectively. Computational results were validated experimentally, with six β-galactosidase genes expressed in P. pastoris strain GS115 to verify Presep model predictions. A strong correlation (R² = 0.967) was observed between the secretion propensity predicted by Presep and the experimental secretion percentage. Together, these results demonstrate the ability of the Presep model to predict the secretion propensity of P. pastoris for a given protein. This model may serve as a valuable tool for determining the utility of P. pastoris as a host organism prior to initiating biological experiments. The Presep prediction tool can be freely downloaded at http://www.mobioinfor.cn/Presep.
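The two scores quoted in this abstract, MCC and overall accuracy (Q2), can both be derived from a binary confusion matrix. The sketch below is illustrative only: the function names are assumptions, and it is not code or data from the Presep paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a marginal is empty and MCC is undefined.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def overall_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Q2: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

MCC is often reported alongside accuracy because it stays informative when the secreted and non-secreted classes are unbalanced, where accuracy alone can look high for a trivial classifier.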
Affiliation(s)
- Jian Tian
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Yuhong Zhang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Bo Liu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Dongyang Zuo
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Tao Jiang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Jun Guo
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Wei Zhang
- Key Laboratory of Agricultural Genomics (Beijing), Ministry of Agriculture, Beijing, China
- Ningfeng Wu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Yunliu Fan
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
|
124
|
Niarchou A, Alexandridou A, Athanasiadis E, Spyrou G. C-PAmP: large scale analysis and database construction containing high scoring computationally predicted antimicrobial peptides for all the available plant species. PLoS One 2013; 8:e79728. [PMID: 24244550 PMCID: PMC3823563 DOI: 10.1371/journal.pone.0079728] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 05/29/2013] [Accepted: 10/04/2013] [Indexed: 12/03/2022] Open
Abstract
Background Antimicrobial peptides are a promising alternative to conventional antibiotics. Plants are an important source of such peptides; their pharmacological properties have been known since antiquity. Access to relevant information, however, is not straightforward, as there are practically no major repositories of experimentally validated and/or predicted plant antimicrobial peptides. PhytAMP is the only database dedicated to plant peptides with confirmed antimicrobial action, holding 273 entries. Data on such peptides can otherwise be retrieved from generic repositories. Description We present C-PAmP, a database of computationally predicted plant antimicrobial peptides. C-PAmP contains 15,174,905 peptides, 5–100 amino acids long, derived from 33,877 proteins of 2,112 plant species in UniProtKB/Swiss-Prot. Its web interface allows queries based on peptide/protein sequence, protein accession number and species. Users can view the corresponding predicted peptides along with their probability score, their classification according to the Collection of Anti-Microbial Peptides (CAMP), and their PhytAMP id where applicable. Moreover, users can visualise protein regions with a high concentration of predicted antimicrobial peptides. In order to identify potential antimicrobial peptides we used a classification algorithm, based on a modified version of the pseudo amino acid concept. The classifier tested all subsequences ranging from 5 to 100 amino acids of the plant proteins in UniProtKB/Swiss-Prot and stored those classified as antimicrobial with a high probability score (>90%). Its performance measures across a 10-fold cross-validation are more than satisfactory (accuracy: 0.91, sensitivity: 0.93, specificity: 0.90) and it succeeded in classifying 99.5% of the PhytAMP peptides correctly. Conclusions We have compiled a major repository of predicted plant antimicrobial peptides using a highly performing classification algorithm. Our repository is accessible from the web and supports multiple querying options to optimise data retrieval. We hope it will greatly benefit drug design research by significantly limiting the range of plant peptides to be experimentally tested for antimicrobial activity.
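The scanning step described in this abstract (testing every subsequence of 5 to 100 residues and keeping those scored above 90%) can be sketched as follows. This is a minimal illustration under stated assumptions: `candidate_peptides`, `keep_high_scoring`, and the `score_fn` callable are hypothetical names, and the classifier itself is not reproduced here.

```python
from typing import Callable, Iterator, List, Tuple


def candidate_peptides(
    sequence: str, min_len: int = 5, max_len: int = 100
) -> Iterator[Tuple[int, str]]:
    """Yield (start, peptide) for every subsequence of min_len..max_len residues."""
    n = len(sequence)
    for start in range(n - min_len + 1):
        longest = min(max_len, n - start)
        for length in range(min_len, longest + 1):
            yield start, sequence[start:start + length]


def keep_high_scoring(
    sequence: str,
    score_fn: Callable[[str], float],  # hypothetical classifier returning a probability
    threshold: float = 0.90,
) -> List[Tuple[int, str, float]]:
    """Keep peptides whose predicted probability exceeds the threshold."""
    hits = []
    for start, peptide in candidate_peptides(sequence):
        score = score_fn(peptide)
        if score > threshold:
            hits.append((start, peptide, score))
    return hits
```

The number of candidates grows roughly with protein length times the width of the length window, which is consistent with a repository of millions of peptides being derived from tens of thousands of proteins.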
|
125
|
Palanisamy B, Ekambaram R, Heese K. Thymine distribution in genes provides novel insight into the functional significance of the proteome of the malaria parasite Plasmodium falciparum 3D7. Bioinformatics 2013; 30:597-600. [DOI: 10.1093/bioinformatics/btt587] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022] Open
|
126
|
Kaundal R, Sahu SS, Verma R, Weirick T. Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning. BMC Bioinformatics 2013; 14 Suppl 14:S7. [PMID: 24266945 PMCID: PMC3851450 DOI: 10.1186/1471-2105-14-s14-s7] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts are generating a huge amount of DNA sequence data every day. The predicted proteome of these genomes needs annotation at a faster pace. In view of this, one such annotation need is to develop an automated system that can distinguish between plastid and non-plastid proteins accurately, and further classify plastid-types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity-search and machine learning. RESULTS In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing the plastid vs. non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features: amino acid composition, dipeptide composition, the pseudo amino acid composition, N(terminal)-Center-C(terminal) composition and the protein physicochemical properties are used to develop SVM models. Overall, the dipeptide composition-based module shows the best performance with an accuracy of 86.80% and Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with an MCC of 0.44 in phase-II. On independent test data, this model also performs better with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively. The similarity-based PSI-BLAST module shows very low performance with about 50% prediction accuracy for distinguishing plastid vs. non-plastids and only 20% in classifying various plastid-types, indicating the need and importance of machine learning algorithms. CONCLUSION The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred, available at http://bioinfo.okstate.edu/PLpred/ for real time identification/characterization. We believe this tool will be very useful in the functional annotation of various genomes.
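Dipeptide composition, the best-performing feature in this abstract, is commonly computed as a 400-dimensional vector of adjacent residue-pair frequencies. The sketch below follows that common convention; the abstract does not give the exact definition used, so the normalisation and the handling of non-standard residues are assumptions, as are the function and constant names.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> List[float]:
    """Return the 400-dimensional dipeptide frequency vector of a protein."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: pairs containing non-standard residues (e.g. X) are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    # Assumption: normalise by the number of counted pairs so the vector sums to 1.
    return [counts[d] / total for d in DIPEPTIDES]
```

A fixed-length vector like this is what allows proteins of different lengths to be fed to an SVM, which expects every sample to have the same number of features.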
|
127
|
Rosillo R, Giner J, de la Fuente D. The effectiveness of the combined use of VIX and Support Vector Machines on the prediction of S&P 500. Neural Comput Appl 2013. [DOI: 10.1007/s00521-013-1487-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
|
128
|
Zhang L, Zhao X, Kong L. A protein structural class prediction method based on novel features. Biochimie 2013; 95:1741-4. [DOI: 10.1016/j.biochi.2013.05.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 02/22/2013] [Accepted: 05/28/2013] [Indexed: 11/28/2022]
|
129
|
Armengaud J, Christie-Oleza JA, Clair G, Malard V, Duport C. Exoproteomics: exploring the world around biological systems. Expert Rev Proteomics 2013. [PMID: 23194272 DOI: 10.1586/epr.12.52] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Indexed: 12/11/2022]
Abstract
The term 'exoproteome' describes the protein content that can be found in the extracellular proximity of a given biological system. These proteins arise from cellular secretion, other protein export mechanisms or cell lysis, but only the most stable proteins in this environment will remain in abundance. It has been shown that these proteins reflect the physiological state of the cells in a given condition and are indicators of how living systems interact with their environments. High-throughput proteomic approaches based on a shotgun strategy, and high-resolution mass spectrometers, have modified the authors' view of exoproteomes. In the present review, the authors describe how these new approaches should be exploited to obtain the maximum useful information from a sample, whatever its origin. The methodologies used for studying secretion from model cell lines derived from eukaryotic, multicellular organisms, virulence determinants of pathogens and environmental bacteria and their relationships with their habitats are illustrated with several examples. The implication of such data, in terms of proteogenomics and the discovery of novel protein functions, is discussed.
Affiliation(s)
- Jean Armengaud
- CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze, F-30207, France.
|
130
|
Wang X, Li GZ. Multilabel learning via random label selection for protein subcellular multilocations prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:436-446. [PMID: 23929867 DOI: 10.1109/tcbb.2013.21] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 06/02/2023]
Abstract
Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular localization methods are only used to deal with single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they only adopt a simple strategy, that is, transforming the multilocation proteins to multiple proteins with a single location, which does not take correlations among different subcellular locations into account. In this paper, a novel method named random label selection (RALS) (multilabel learning via RALS), which extends the simple binary relevance (BR) method, is proposed to learn from multilocation proteins in an effective and efficient way. RALS does not explicitly find the correlations among labels, but rather implicitly attempts to learn the label correlations from data by augmenting the original feature space with randomly selected labels as additional input features. Through a fivefold cross-validation test on a benchmark data set, we demonstrate that our proposed method, which considers label correlations, clearly outperforms the baseline BR method, which does not, indicating that correlations among different subcellular locations really exist and contribute to improved prediction performance. Experimental results on two benchmark data sets also show that our proposed methods achieve significantly higher performance than some other state-of-the-art methods in predicting subcellular multilocations of proteins. The prediction web server is available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/ for public use.
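The core idea stated in this abstract, extending binary relevance by appending randomly selected labels to the feature vector, can be sketched for a single target label as below. This is only an illustration of the augmentation step: the function name and arguments are hypothetical, and the abstract does not say how the appended labels are supplied at prediction time, so that part is deliberately left out.

```python
import random
from typing import List, Sequence, Tuple


def augment_with_random_labels(
    X: Sequence[Sequence[float]],
    Y: Sequence[Sequence[int]],
    target: int,
    k: int,
    rng: random.Random,
) -> Tuple[List[List[float]], List[int]]:
    """Append k randomly chosen non-target label columns of Y to each row of X.

    Returns the augmented feature rows and the indices of the chosen labels.
    Requires k to be at most the number of labels minus one.
    """
    n_labels = len(Y[0])
    others = [j for j in range(n_labels) if j != target]
    chosen = sorted(rng.sample(others, k))
    X_aug = [list(x) + [y[j] for j in chosen] for x, y in zip(X, Y)]
    return X_aug, chosen
```

A binary classifier for the target label would then be trained on the augmented rows, one classifier per label as in plain binary relevance, so that each classifier can pick up dependencies on the other locations it was shown.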
Affiliation(s)
- Xiao Wang
- Key Laboratory of Embedded System and Service Computing, Ministry of Education, Department of Control Science and Engineering, Tongji University, Shanghai 201804, China
|
131
|
Dutta A, Katarkar A, Chaudhuri K. In-silico structural and functional characterization of a V. cholerae O395 hypothetical protein containing a PDZ1 and an uncommon protease domain. PLoS One 2013; 8:e56725. [PMID: 23441214 PMCID: PMC3575494 DOI: 10.1371/journal.pone.0056725] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/12/2012] [Accepted: 01/14/2013] [Indexed: 11/18/2022] Open
Abstract
Vibrio cholerae, the causative agent of epidemic cholera, has been a constant source of concern for decades. It has constantly evolved itself in order to survive the changing environment. Acquisition of new genetic elements through genomic islands has played a major role in its evolutionary process. In this present study a hypothetical protein was identified which was present in one of the predicted genomic island regions of the large chromosome of V. cholerae O395 showing a strong homology with a conserved phage encoded protein. In-silico physicochemical analysis revealed that the hypothetical protein was a periplasmic protein. Homology modeling study indicated that the hypothetical protein was an unconventional and atypical serine protease belonging to HtrA protein family. The predicted 3D-model of the hypothetical protein revealed a catalytic centre serine utilizing a single catalytic residue for proteolysis. The predicted catalytic triad may help to deduce the active site for the recruitment of the substrate for proteolysis. The active site arrangements of this predicted serine protease homologue with atypical catalytic triad is expected to allow these proteases to work in different environments of the host.
Affiliation(s)
- Avirup Dutta
- CSIR-SRF, Molecular and Human Genetics Division, CSIR - Indian Institute of Chemical Biology, Kolkata, West Bengal, India
- Atul Katarkar
- ICMR-SRF, Molecular and Human Genetics Division, CSIR - Indian Institute of Chemical Biology, Kolkata, West Bengal, India
- Keya Chaudhuri
- Chief Scientist, Molecular and Human Genetics Division, and Head Academic Affairs Division, CSIR - Indian Institute of Chemical Biology, Kolkata, West Bengal, India
|
132
|
Li GZ, Wang X, Hu X, Liu JM, Zhao RW. Multilabel learning for protein subcellular location prediction. IEEE Trans Nanobioscience 2013; 11:237-43. [PMID: 22987129 DOI: 10.1109/tnb.2012.2212249] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]
Abstract
Protein subcellular localization aims at predicting the location of a protein within a cell using computational methods. Knowledge of subcellular localization of proteins indicates protein functions and helps in identifying drug targets. Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular localization methods are only used to deal with single-location proteins. To better reflect the characteristics of multiplex proteins, we formulate prediction of subcellular localization of multiplex proteins as a multilabel learning problem. We present and compare two multilabel learning approaches, which exploit correlations between labels and leverage label-specific features, respectively, to induce a high quality prediction model. Experimental results on six protein data sets under various organisms show that our described methods achieve significantly higher performance than any of the existing methods. Among the different multilabel learning methods, we find that methods exploiting label correlations perform better than those leveraging label-specific features.
Affiliation(s)
- Guo-Zheng Li
- Key Laboratory of Embedded System and Service Computing, Ministry of Education, Department of Control Science and Engineering, Tongji University, Shanghai 201804, China.
|
133
|
On the structural context and identification of enzyme catalytic residues. BIOMED RESEARCH INTERNATIONAL 2013; 2013:802945. [PMID: 23484160 PMCID: PMC3581254 DOI: 10.1155/2013/802945] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/29/2012] [Accepted: 12/28/2012] [Indexed: 11/25/2022]
Abstract
Enzymes play important roles in most of the biological processes. Although only a small fraction of residues are directly involved in catalytic reactions, these catalytic residues are the most crucial parts in enzymes. The study of the fundamental and unique features of catalytic residues benefits the understanding of enzyme functions and catalytic mechanisms. In this work, we analyze the structural context of catalytic residues based on theoretical and experimental structure flexibility. The results show that catalytic residues have distinct structural features and context. Their neighboring residues, whether sequence or structure neighbors within specific range, are usually structurally more rigid than those of noncatalytic residues. The structural context feature is combined with support vector machine to identify catalytic residues from enzyme structure. The prediction results are better or comparable to those of recent structure-based prediction methods.
|
134
|
Zhang X, Shen Y, Ding G, Tian Y, Liu Z, Li B, Wang Y, Jiang C. TFPP: an SVM-based tool for recognizing flagellar proteins in Trypanosoma brucei. PLoS One 2013; 8:e54032. [PMID: 23349782 PMCID: PMC3547966 DOI: 10.1371/journal.pone.0054032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/28/2012] [Accepted: 12/07/2012] [Indexed: 11/18/2022] Open
Abstract
Trypanosoma brucei is a unicellular flagellated eukaryotic parasite that causes African trypanosomiasis in humans and domestic animals with devastating health and economic consequences. Recent studies have revealed the important roles of the single flagellum of T. brucei in many aspects, especially that the flagellar motility is required for the viability of the bloodstream form T. brucei, suggesting that impairment of the flagellar function may provide a promising cure for African sleeping sickness. Knowing the flagellum proteome is crucial to study the molecular mechanism of the flagellar functions. Here we present a novel computational method for identifying flagellar proteins in T. brucei, called trypanosome flagellar protein predictor (TFPP). TFPP was developed based on a list of selected discriminating features derived from protein sequences, and could predict flagellar proteins with ∼92% specificity at a ∼84% sensitivity rate. Applied to the whole T. brucei proteome, TFPP reveals 811 more flagellar proteins with high confidence, suggesting that the flagellar proteome covers ∼10% of the whole proteome. Comparison of the expression profiles of the whole T. brucei proteome at three typical life cycle stages found that ∼45% of the flagellar proteins were significantly changed in expression levels between the three life cycle stages, indicating life cycle stage-specific regulation of flagellar functions in T. brucei. Overall, our study demonstrated that TFPP is highly effective in identifying flagellar proteins and could provide opportunities to study the trypanosome flagellar proteome systematically. Furthermore, the web server for TFPP can be freely accessed at http://wukong.tongji.edu.cn/tfpp.
Affiliation(s)
- Xiaobai Zhang
- Department of Bioinformatics, the School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yuefeng Shen
- Department of Bioinformatics, the School of Life Sciences and Technology, Tongji University, Shanghai, China
- Guitao Ding
- Department of Bioinformatics, the School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yi Tian
- Department of Bioinformatics, the School of Life Sciences and Technology, Tongji University, Shanghai, China
- Zhenping Liu
- Department of Bioinformatics, the School of Life Sciences and Technology, Tongji University, Shanghai, China
- Bing Li
- Department of Bioinformatics, the School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yun Wang
- Department of Bioinformatics, the School of Life Sciences and Technology, Tongji University, Shanghai, China
- Cizhong Jiang
- Department of Bioinformatics, the School of Life Sciences and Technology, Tongji University, Shanghai, China
|
135
|
Lei JB, Yin JB, Shen HB. GFO: A data driven approach for optimizing the Gaussian function based similarity metric in computational biology. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.07.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
|
136
|
Tanz SK, Castleden I, Hooper CM, Vacher M, Small I, Millar HA. SUBA3: a database for integrating experimentation and prediction to define the SUBcellular location of proteins in Arabidopsis. Nucleic Acids Res 2013; 41:D1185-91. [PMID: 23180787 PMCID: PMC3531127 DOI: 10.1093/nar/gks1151] [Citation(s) in RCA: 236] [Impact Index Per Article: 19.7] [Received: 09/07/2012] [Revised: 10/24/2012] [Accepted: 10/25/2012] [Indexed: 12/27/2022] Open
Abstract
The subcellular location database for Arabidopsis proteins (SUBA3, http://suba.plantenergy.uwa.edu.au) combines manual literature curation of large-scale subcellular proteomics, fluorescent protein visualization and protein-protein interaction (PPI) datasets with subcellular targeting calls from 22 prediction programs. More than 14 500 new experimental locations have been added since its first release in 2007. Overall, nearly 650 000 new calls of subcellular location for 35 388 non-redundant Arabidopsis proteins are included (almost six times the information in the previous SUBA version). A re-designed interface makes the SUBA3 site more intuitive and easier to use than earlier versions and provides powerful options to search for PPIs within the context of cell compartmentation. SUBA3 also includes detailed localization information for reference organelle datasets and incorporates green fluorescent protein (GFP) images for many proteins. To determine as objectively as possible where a particular protein is located, we have developed SUBAcon, a Bayesian approach that incorporates experimental localization and targeting prediction data to best estimate a protein's location in the cell. The probabilities of subcellular location for each protein are provided and displayed as a pictographic heat map of a plant cell in SUBA3.
Affiliation(s)
- Sandra K. Tanz
- Centre of Excellence in Computational Systems Biology, ARC Centre of Excellence in Plant Energy Biology and Centre for Comparative Analysis on Biomolecular Networks (CABiN), The University of Western Australia, Perth, WA 6009, Australia
- Ian Castleden
- Centre of Excellence in Computational Systems Biology, ARC Centre of Excellence in Plant Energy Biology and Centre for Comparative Analysis on Biomolecular Networks (CABiN), The University of Western Australia, Perth, WA 6009, Australia
- Cornelia M. Hooper
- Centre of Excellence in Computational Systems Biology, ARC Centre of Excellence in Plant Energy Biology and Centre for Comparative Analysis on Biomolecular Networks (CABiN), The University of Western Australia, Perth, WA 6009, Australia
- Michael Vacher
- Centre of Excellence in Computational Systems Biology, ARC Centre of Excellence in Plant Energy Biology and Centre for Comparative Analysis on Biomolecular Networks (CABiN), The University of Western Australia, Perth, WA 6009, Australia
- Ian Small
- Centre of Excellence in Computational Systems Biology, ARC Centre of Excellence in Plant Energy Biology and Centre for Comparative Analysis on Biomolecular Networks (CABiN), The University of Western Australia, Perth, WA 6009, Australia
- Harvey A. Millar
- Centre of Excellence in Computational Systems Biology, ARC Centre of Excellence in Plant Energy Biology and Centre for Comparative Analysis on Biomolecular Networks (CABiN), The University of Western Australia, Perth, WA 6009, Australia
|
137
|
Su ECY, Chang JM, Cheng CW, Sung TY, Hsu WL. Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing. BMC Bioinformatics 2012; 13 Suppl 17:S13. [PMID: 23282098 PMCID: PMC3521467 DOI: 10.1186/1471-2105-13-s17-s13] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022] Open
Abstract
Background Identification of subcellular localization in proteins is crucial to elucidate cellular processes and molecular functions in a cell. However, given the tremendous amount of sequence data generated in the post-genomic era, determining protein localization based on biological experiments can be expensive and time-consuming. Therefore, developing prediction systems to analyze uncharacterised proteins efficiently has played an important role in high-throughput protein analyses. In a eukaryotic cell, many essential biological processes take place in the nucleus. Nuclear proteins shuttle between nucleus and cytoplasm based on recognition of nuclear translocation signals, including nuclear localization signals (NLSs) and nuclear export signals (NESs). Currently, only a few approaches have been developed specifically to predict nuclear localization using sequence features, such as putative NLSs. However, it has been shown that prediction coverage based on the NLSs is very low. In addition, most existing approaches only attained prediction accuracy and Matthews correlation coefficient (MCC) of around 54%~70% and 0.250~0.380, respectively, on an independent test set. Moreover, no predictor can generate sequence motifs to characterize features of potential NESs, whose biological properties are not well understood from existing experimental studies. Results In this study, we first propose PSLNuc (Protein Subcellular Localization prediction for Nucleus) for predicting nuclear localization in proteins. For feature representation, a protein is represented by gapped-dipeptides and the feature values are weighted by homology information from a smoothed position-specific scoring matrix. After that, we incorporate probabilistic latent semantic indexing (PLSI) for feature reduction. Finally, the reduced features are used as input for a support vector machine (SVM) classifier. In addition to PSLNuc, we further identify gapped-dipeptide signatures for putative NLSs and NESs to develop a prediction method, PSLNTS (Protein Subcellular Localization prediction using Nuclear Translocation Signals). We apply PLSI to generate gapped-dipeptide signatures from both nuclear and non-nuclear proteins, and propose candidate sequence motifs for putative NLSs and NESs. Then, we incorporate only the proposed gapped-dipeptide signatures in an SVM classifier to mimic biological properties of NLSs and NESs for predicting nuclear localization in PSLNTS. Conclusions Experimental results demonstrate that the proposed method shows a significant improvement for nuclear localization prediction. To compare our predictive performance with other approaches, we incorporate two non-redundant benchmark data sets, a training set and an independent test set. Evaluated by five-fold cross-validation on the training set, PSLNuc attains an overall accuracy of 79.7%, which is a 4.8% improvement over the state-of-the-art system. In addition, our method also enhances the MCC from 0.497 to 0.595. Compared on the independent test set, PSLNuc outperforms other predictors by 3.9%~19.9% on accuracy and 0.077~0.207 on MCC. This suggests that, in addition to NLSs, which have been shown to be important for nuclear proteins, NESs can also be an effective indicator to detect non-nuclear proteins. Most notably, using only a few proposed gapped-dipeptide signatures as input features for the SVM classifier, PSLNTS further enhances the accuracy and MCC to 80.9% and 0.618, respectively. Our results demonstrate that gapped-dipeptide signatures can better discriminate nuclear and non-nuclear proteins. Moreover, the proposed gapped-dipeptide signatures can be biologically interpreted and used in further experimental analyses of nuclear translocation signals, including NLSs and NESs.
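Illustrative note: the gapped-dipeptide representation mentioned in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses plain occurrence counts instead of the PSSM-weighted values the abstract describes, and the function name and the `max_gap` parameter are hypothetical.

```python
from collections import Counter


def gapped_dipeptide_counts(sequence: str, max_gap: int = 2) -> Counter:
    """Count gapped dipeptides in a protein sequence.

    A key (a, gap, b) means residue `a` followed by residue `b`
    with exactly `gap` arbitrary residues between them.
    Assumption: raw counts only; no PSSM weighting or smoothing.
    """
    counts: Counter = Counter()
    for gap in range(max_gap + 1):
        step = gap + 1  # distance between the two residues of the pair
        for i in range(len(sequence) - step):
            counts[(sequence[i], gap, sequence[i + step])] += 1
    return counts


# Toy sequence, chosen only for illustration.
counts = gapped_dipeptide_counts("MKRKA", max_gap=1)
print(counts[("K", 1, "K")])  # K, one residue in between, then K
```

The resulting sparse count vector could then be passed to a dimensionality-reduction step and a classifier, which is the general pipeline shape the abstract outlines.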
Affiliation(s)
- Emily Chia-Yu Su
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan.
|
138
|
Resende DM, Rezende AM, Oliveira NJD, Batista ICA, Corrêa-Oliveira R, Reis AB, Ruiz JC. An assessment on epitope prediction methods for protozoa genomes. BMC Bioinformatics 2012; 13:309. [PMID: 23170965 PMCID: PMC3543197 DOI: 10.1186/1471-2105-13-309] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 03/15/2012] [Accepted: 11/11/2012] [Indexed: 12/03/2022] Open
Abstract
Background Epitope prediction using computational methods represents one of the most promising approaches to vaccine development. Reduction of time, cost, and the availability of completely sequenced genomes are key points and highly motivating regarding the use of reverse vaccinology. Parasites of the genus Leishmania are widely spread and they are the etiologic agents of leishmaniasis. Currently, there is no efficient vaccine against this pathogen and the drug treatment is highly toxic. The lack of sufficiently large datasets of experimentally validated parasite epitopes represents a serious limitation, especially for trypanosomatid genomes. In this work we highlight the predictive performances of several algorithms that were evaluated through the development of a MySQL database built with the purpose of: a) evaluating the prediction performances of individual algorithms and their combination for CD8+ T cell epitopes, B-cell epitopes and subcellular localization by means of AUC (Area Under Curve) performance and a threshold dependent method that employs a confusion matrix; b) integrating data from experimentally validated and in silico predicted epitopes; and c) integrating the subcellular localization predictions and experimental data. NetCTL, NetMHC, BepiPred, BCPred12, and AAP12 algorithms were used for in silico epitope prediction and WoLF PSORT, Sigcleave and TargetP for in silico subcellular localization prediction against trypanosomatid genomes. Results A database-driven epitope prediction method was developed with built-in functions that were capable of: a) removing experimental data redundancy; b) parsing algorithm predictions and storing experimentally validated and predicted data; and c) evaluating algorithm performances. Results show that a better performance is achieved when the combined prediction is considered. This is particularly true for B-cell epitope predictors, where the combined prediction of AAP12 and BCPred12 reached an AUC value of 0.77. For CD8+ T cell epitope predictors, the combined prediction of NetCTL and NetMHC reached an AUC value of 0.64. Finally, regarding the subcellular localization prediction, the best performance is achieved when the combined prediction of Sigcleave, TargetP and WoLF PSORT is used. Conclusions Our study indicates that the combination of B-cell epitope predictors is the best tool for predicting epitopes on protozoan parasite proteins. Regarding subcellular localization, the best result was obtained when the predictions of the three algorithms were combined. The developed pipeline is available upon request to authors.
Affiliation(s)
- Daniela M Resende
- Programa de Pós-Graduação em Ciências Farmacêuticas-CiPharma, Laboratório de Pesquisas Clínicas, Escola de Farmácia, Universidade Federal de Ouro Preto, Campus Morro do Cruzeiro, Ouro Preto, MG 35400-000, Brazil
|
139
|
Chen YK, Li KB. Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition. J Theor Biol 2012; 318:1-12. [PMID: 23137835 DOI: 10.1016/j.jtbi.2012.10.033] [Citation(s) in RCA: 98] [Impact Index Per Article: 7.5] [Received: 09/21/2012] [Revised: 10/25/2012] [Accepted: 10/26/2012] [Indexed: 01/04/2023]
Abstract
The type information of un-annotated membrane proteins provides an important hint for their biological functions. The experimental determination of membrane protein types, despite being more accurate and reliable, is not always feasible due to the costly laboratory procedures, thereby creating a need for the development of bioinformatics methods. This article describes a novel computational classifier for the prediction of membrane protein types using proteins' sequences. The classifier, comprising a collection of one-versus-one support vector machines, makes use of the following sequence attributes: (1) the cationic patch sizes, the orientation, and the topology of transmembrane segments; (2) the amino acid physicochemical properties; (3) the presence of signal peptides or anchors; and (4) the specific protein motifs. A new voting scheme was implemented to cope with the multi-class prediction. Both the training and the testing sequences were collected from SwissProt. Homologous proteins were removed such that there is no pair of sequences left in the datasets with a sequence identity higher than 40%. The performance of the classifier was evaluated by a jackknife cross-validation and an independent test. Results show that the proposed classifier outperforms earlier predictors in prediction accuracy in seven of the eight membrane protein types. The overall accuracy was increased from 78.3% to 88.2%. Unlike earlier approaches which largely depend on position-specific substitution matrices and amino acid compositions, most of the sequence attributes implemented in the proposed classifier are supported by evidence in the literature. The classifier has been deployed as a web server and can be accessed at http://bsaltools.ym.edu.tw/predmpt.
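Illustrative note: a collection of one-versus-one classifiers, as mentioned in this abstract, is usually combined by letting every pairwise classifier vote for one of its two classes. The sketch below shows plain majority voting only; the abstract's "new voting scheme" is not described there and is not reproduced here. The class names, the centroid values, and the `nearest` decision rule are hypothetical stand-ins for trained pairwise SVMs.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(classes, pairwise_decide, sample):
    """Majority vote over all pairwise (one-versus-one) decisions.

    `pairwise_decide(a, b, sample)` must return either `a` or `b`.
    Ties are broken by the order of `classes` (an arbitrary choice).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, sample)] += 1
    return max(classes, key=lambda c: votes[c])


# Hypothetical 1-D stand-in for trained pairwise classifiers:
# each pair picks the class whose centroid is closer to the sample.
centroids = {"type I": 0.0, "type II": 5.0, "GPI-anchored": 10.0}


def nearest(a, b, x):
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b


print(one_vs_one_predict(list(centroids), nearest, 4.2))
```

With k classes this scheme evaluates k(k-1)/2 pairwise classifiers per sample, so eight membrane protein types would require 28 of them.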
Affiliation(s)
- Yen-Kuang Chen
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec 2, Lih-Nong Street, Taipei, 112, Taiwan, ROC
|
140
|
Xia X. Position weight matrix, Gibbs sampler, and the associated significance tests in motif characterization and prediction. Scientifica 2012; 2012:917540. [PMID: 24278755 PMCID: PMC3820676 DOI: 10.6064/2012/917540] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 08/22/2012] [Accepted: 10/11/2012] [Indexed: 05/31/2023]
Abstract
Position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable statistical tests are available for evaluating the significance of site patterns, PWM, and PWM scores (PWMS) of putative motifs. Statistical significance tests of the PWM output, that is, site-specific frequencies, PWM itself, and PWMS, are in disparate sources and have never been collected in a single paper, with the consequence that many implementations of PWM do not include any significance test. Here I review PWM-based methods used in motif characterization and prediction (including a detailed illustration of the Gibbs sampler for de novo motif discovery), present statistical and probabilistic rationales behind statistical significance tests relevant to PWM, and illustrate their application with real data. The multiple comparison problem associated with the test of site-specific frequencies is best handled by false discovery rate methods. The test of PWM, due to the use of pseudocounts, is best done by resampling methods. The test of individual PWMS for each sequence segment should be based on the extreme value distribution.
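Illustrative note: the basic PWM construction and scoring that this abstract refers to can be sketched as below. This is a minimal sketch, assuming a uniform background distribution and a pseudocount of 1 per symbol; the function names and the toy binding sites are hypothetical, and none of the significance tests reviewed in the paper are implemented.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0, background=None):
    """Build a log2-odds position weight matrix from aligned sites.

    Returns one dict per position mapping symbol -> log2(p / background).
    Assumption: uniform background unless one is supplied.
    """
    if background is None:
        background = {s: 1.0 / len(alphabet) for s in alphabet}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        pwm.append({
            sym: math.log2(((column.count(sym) + pseudocount) / total)
                           / background[sym])
            for sym in alphabet
        })
    return pwm


def pwm_score(pwm, segment):
    """PWM score (PWMS) of one segment: sum of per-position log-odds."""
    return sum(col[sym] for col, sym in zip(pwm, segment))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window."""
    width = len(pwm)
    return max((pwm_score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


# Toy aligned sites, for illustration only.
pwm = build_pwm(["TATA", "TATA", "TATT", "TACA"])
score, start = best_hit(pwm, "GGTATAGG")
print(start, round(score, 2))
```

The pseudocount keeps unseen symbols from producing a log of zero, which is the reason the abstract notes that testing the PWM itself is best done by resampling.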
Affiliation(s)
- Xuhua Xia
- Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa, ON, Canada K1N 6N5
|
141
|
Karaçali B. Hierarchical motif vectors for prediction of functional sites in amino acid sequences using quasi-supervised learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2012; 9:1432-1441. [PMID: 22585139 DOI: 10.1109/tcbb.2012.68] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
We propose hierarchical motif vectors to represent local amino acid sequence configurations for predicting the functional attributes of amino acid sites on a global scale in a quasi-supervised learning framework. The motif vectors are constructed via wavelet decomposition on the variations of physico-chemical amino acid properties along the sequences. We then formulate a prediction scheme for the functional attributes of amino acid sites in terms of the respective motif vectors using the quasi-supervised learning algorithm that carries out predictions for all sites in consideration using only the experimentally verified sites. We have carried out comparative performance evaluation of the proposed method on the prediction of N-glycosylation of 55,184 sites possessing the consensus N-glycosylation sequon identified over 15,104 human proteins, out of which only 1,939 were experimentally verified N-glycosylation sites. In the experiments, the proposed method achieved better predictive performance than the alternative strategies from the literature. In addition, the predicted N-glycosylation sites showed good agreement with existing potential annotations, while the novel predictions belonged to proteins known to be modified by glycosylation.
Affiliation(s)
- Bilge Karaçali
- Department of Electrical and Electronics Engineering, Izmir Institute of Technology, Urla Izmir, Turkey.
|
142
|
Liu X, Luo M, Zhang W, Zhao J, Zhang J, Wu K, Tian L, Duan J. Histone acetyltransferases in rice (Oryza sativa L.): phylogenetic analysis, subcellular localization and expression. BMC Plant Biology 2012; 12:145. [PMID: 22894565 PMCID: PMC3502346 DOI: 10.1186/1471-2229-12-145] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Received: 11/23/2011] [Accepted: 08/10/2012] [Indexed: 05/20/2023]
Abstract
BACKGROUND Histone acetyltransferases (HATs) play an important role in eukaryotic transcription. Eight HATs identified in rice (OsHATs) can be organized into four families, namely the CBP (OsHAC701, OsHAC703, and OsHAC704), TAFII250 (OsHAF701), GNAT (OsHAG702, OsHAG703, and OsHAG704), and MYST (OsHAM701) families. The biological functions of HATs in rice remain unknown, so a comprehensive protein sequence analysis of the HAT families was conducted to investigate their potential functions. In addition, the subcellular localization and expression patterns of the eight OsHATs were analyzed. RESULTS On the basis of a phylogenetic and domain analysis, monocotyledonous CBP family proteins can be subdivided into two groups, namely Group I and Group II. Similarly, dicotyledonous CBP family proteins can be divided into two groups, namely Group A and Group B. High similarities of protein sequences, conserved domains and three-dimensional models were identified among OsHATs and their homologs in Arabidopsis thaliana and maize. Subcellular localization predictions indicated that all OsHATs might localize in both the nucleus and cytosol. Transient expression in Arabidopsis protoplasts confirmed the nuclear and cytosolic localization of OsHAC701, OsHAG702, and OsHAG704. Real-time quantitative polymerase chain reaction analysis demonstrated that the eight OsHATs were expressed in all tissues examined with significant differences in transcript abundance, and their expression was modulated by abscisic acid and salicylic acid as well as abiotic factors such as salt, cold, and heat stresses. CONCLUSIONS Both monocotyledonous and dicotyledonous CBP family proteins can be divided into two distinct groups, which suggest the possibility of functional diversification. The high similarities of protein sequences, conserved domains and three-dimensional models among OsHATs and their homologs in Arabidopsis and maize suggested that OsHATs have multiple functions. 
OsHAC701, OsHAG702, and OsHAG704 were localized in both the nucleus and cytosol in transient expression analyses with Arabidopsis protoplasts. OsHATs were expressed constitutively in rice, and their expression was regulated by exogenous hormones and abiotic stresses, which suggested that OsHATs may play important roles in plant defense responses.
Affiliation(s)
- Xia Liu
- South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Graduate School of the Chinese Academy of Sciences, Beijing 100039, China
- Southern Crop Protection and Food Research Centre, Agriculture and Agri-Food Canada, London, ON N5V 4T3, Canada
- Ming Luo
- South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Institute of Plant Biology, National Taiwan University, Taipei 106, Taiwan
- Wei Zhang
- South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Graduate School of the Chinese Academy of Sciences, Beijing 100039, China
- Jinhui Zhao
- South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Graduate School of the Chinese Academy of Sciences, Beijing 100039, China
- Jianxia Zhang
- South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Keqiang Wu
- Institute of Plant Biology, National Taiwan University, Taipei 106, Taiwan
- Lining Tian
- Southern Crop Protection and Food Research Centre, Agriculture and Agri-Food Canada, London, ON N5V 4T3, Canada
- Jun Duan
- South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
|
143
|
Sears KT, Ceraul SM, Gillespie JJ, Allen ED, Popov VL, Ammerman NC, Rahman MS, Azad AF. Surface proteome analysis and characterization of surface cell antigen (Sca) or autotransporter family of Rickettsia typhi. PLoS Pathog 2012; 8:e1002856. [PMID: 22912578 PMCID: PMC3415449 DOI: 10.1371/journal.ppat.1002856] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Received: 09/08/2011] [Accepted: 06/26/2012] [Indexed: 11/20/2022] Open
Abstract
Surface proteins of the obligate intracellular bacterium Rickettsia typhi, the agent of murine or endemic typhus fever, comprise an important interface for host-pathogen interactions including adherence, invasion and survival in the host cytoplasm. In this report, we present analyses of the surface exposed proteins of R. typhi based on a suite of predictive algorithms complemented by experimental surface-labeling with thiol-cleavable sulfo-NHS-SS-biotin and identification of labeled peptides by LC-MS/MS. Further, we focus on proteins belonging to the surface cell antigen (Sca) autotransporter (AT) family which are known to be involved in rickettsial infection of mammalian cells. Each species of Rickettsia has a different complement of sca genes in various states; R. typhi has genes sca1 through sca5. In silico analyses indicate divergence of the Sca paralogs across the four Rickettsia groups and concur with previous evidence of positive selection. Transcripts for each sca were detected during infection of L929 cells and four of the five Sca proteins were detected in the surface proteome analysis. We observed that each R. typhi Sca protein is expressed during in vitro infections and selected Sca proteins were expressed during in vivo infections. Using biotin-affinity pull down assays, negative staining electron microscopy, and flow cytometry, we demonstrate that the Sca proteins in R. typhi are localized to the surface of the bacteria. All Scas were detected during infection of L929 cells by immunogold electron microscopy. Immunofluorescence assays demonstrate that Scas 1–3 and 5 are expressed in the spleens of infected Sprague-Dawley rats and Scas 3, 4 and 5 are expressed in cat fleas (Ctenocephalides felis). Sca proteins may be crucial in the recognition and invasion of different host cell types. In short, continuous expression of all Scas may ensure that rickettsiae are primed i) to infect mammalian cells should the flea bite a host, ii) to remain infectious when extracellular and iii) to infect the flea midgut when ingested with a blood meal. Each Sca protein may be important for survival of R. typhi and the lack of host restricted expression may indicate a strategy of preparedness for infection of a new host.
Rickettsia typhi, a member of the typhus group (TG) rickettsiae, is the agent of murine or endemic typhus fever – a disease exhibiting mild to severe flu-like symptoms resulting in significant morbidity. It is maintained in a flea-rodent transmission cycle in urban and suburban environments. The obligate intracellular lifestyle of rickettsiae makes genetic manipulation difficult and impedes progress towards identification of virulence factors. All five Scas were detected on the surface of R. typhi using a combination of a biotin-labeled affinity assay, negative stain electron microscopy and flow cytometry. Sca proteins are members of the autotransporter (AT) family or type V secretion system (TVSS). We employed detailed bioinformatic analyses and evaluated their transcript abundance in an in vitro infection model where sca transcripts are detected at varying levels over the course of a 5-day in vitro infection. We also observe expression of selected Sca proteins during infection of fleas and rats. Our study provides a proteomic analysis of the bacterial surface and an initial characterization of the Sca family as it exists in R. typhi.
Affiliation(s)
- Khandra T Sears
- Department of Microbiology and Immunology, School of Medicine, University of Maryland Baltimore, Baltimore, Maryland, United States of America.
|
144
|
Renier S, Micheau P, Talon R, Hébraud M, Desvaux M. Subcellular localization of extracytoplasmic proteins in monoderm bacteria: rational secretomics-based strategy for genomic and proteomic analyses. PLoS One 2012; 7:e42982. [PMID: 22912771 PMCID: PMC3415414 DOI: 10.1371/journal.pone.0042982] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 02/10/2012] [Accepted: 07/13/2012] [Indexed: 11/20/2022] Open
Abstract
Genome-scale prediction of subcellular localization (SCL) is not only useful for inferring protein function but also for supporting proteomic data. In line with the secretome concept, a rational and original analytical strategy mimicking the secretion steps that determine ultimate SCL was developed for Gram-positive (monoderm) bacteria. Based on the biology of protein secretion, a flowchart and decision trees were designed considering (i) membrane targeting, (ii) protein secretion systems, (iii) membrane retention, and (iv) cell-wall retention by domains or post-translocational modifications, as well as (v) incorporation to cell-surface supramolecular structures. Using Listeria monocytogenes as a case study, results were compared with known data set from SCL predictors and experimental proteomics. While in good agreement with experimental extracytoplasmic fractions, the secretomics-based method outperforms other genomic analyses, which were simply not intended to be as inclusive. Compared to all other localization predictors, this method does not only supply a static snapshot of protein SCL but also offers the full picture of the secretion process dynamics: (i) the protein routing is detailed, (ii) the number of distinct SCL and protein categories is comprehensive, (iii) the description of protein type and topology is provided, (iv) the SCL is unambiguously differentiated from the protein category, and (v) the multiple SCL and protein category are fully considered. In that sense, the secretomics-based method is much more than a SCL predictor. Besides a major step forward in genomics and proteomics of protein secretion, the secretomics-based method appears as a strategy of choice to generate in silico hypotheses for experimental testing.
Affiliation(s)
- Sandra Renier
- INRA, UR454 Microbiology, Saint-Genès Champanelle, France
- Pierre Micheau
- INRA, UR454 Microbiology, Saint-Genès Champanelle, France
- Régine Talon
- INRA, UR454 Microbiology, Saint-Genès Champanelle, France
- Michel Hébraud
- INRA, UR454 Microbiology, Saint-Genès Champanelle, France
- Mickaël Desvaux
- INRA, UR454 Microbiology, Saint-Genès Champanelle, France
|
145
|
Furtado C, Kunrath-Lima M, Rajão MA, Mendes IC, de Moura MB, Campos PC, Macedo AM, Franco GR, Pena SDJ, Teixeira SMR, Van Houten B, Machado CR. Functional characterization of 8-oxoguanine DNA glycosylase of Trypanosoma cruzi. PLoS One 2012; 7:e42484. [PMID: 22876325 PMCID: PMC3411635 DOI: 10.1371/journal.pone.0042484] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 05/21/2012] [Accepted: 07/06/2012] [Indexed: 11/18/2022] Open
Abstract
The oxidative lesion 8-oxoguanine (8-oxoG) is removed during base excision repair (BER) by the 8-oxoguanine DNA glycosylase 1 (Ogg1). This lesion can erroneously pair with adenine, and the excision of this damaged base by Ogg1 enables the insertion of a guanine and prevents DNA mutation. In this report, we identified and characterized Ogg1 from the protozoan parasite Trypanosoma cruzi (TcOgg1), the causative agent of Chagas disease. Like most living organisms, T. cruzi is susceptible to oxidative stress, hence DNA repair is essential for its survival and improvement of infection. We verified that the TcOGG1 gene encodes an 8-oxoG DNA glycosylase by complementing an Ogg1-defective Saccharomyces cerevisiae strain. Heterologous expression of TcOGG1 reestablished the mutation frequency of the yeast mutant ogg1(-/-) (CD138) to wild type levels. We also demonstrate that the overexpression of TcOGG1 increases T. cruzi sensitivity to hydrogen peroxide (H2O2). Analysis of DNA lesions using quantitative PCR suggests that the increased susceptibility to H2O2 of TcOGG1-overexpressors could be a consequence of uncoupled BER in abasic sites and/or strand breaks generated after TcOgg1 removes 8-oxoG, which are not rapidly repaired by the subsequent BER enzymes. This hypothesis is supported by the observation that TcOGG1-overexpressors have reduced levels of 8-oxoG both in the nucleus and in the parasite mitochondrion. The localization of TcOgg1 was examined in parasites transfected with a TcOgg1-GFP fusion, which confirmed that this enzyme is in both organelles. Taken together, our data indicate that T. cruzi has a functional Ogg1 ortholog that participates in nuclear and mitochondrial BER.
Affiliation(s)
- Carolina Furtado
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Marianna Kunrath-Lima
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Matheus Andrade Rajão
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Isabela Cecília Mendes
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Michelle Barbi de Moura
- Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine and the University of Pittsburgh Cancer Institute, Hillman Cancer Center, Pittsburgh, Pennsylvania, United States of America
- Priscila Carneiro Campos
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Andrea Mara Macedo
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Glória Regina Franco
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Sérgio Danilo Junho Pena
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Santuza Maria Ribeiro Teixeira
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Bennett Van Houten
- Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine and the University of Pittsburgh Cancer Institute, Hillman Cancer Center, Pittsburgh, Pennsylvania, United States of America
- Carlos Renato Machado
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
|
146
|
Schiller M, Massalski C, Kurth T, Steinebrunner I. The Arabidopsis apyrase AtAPY1 is localized in the Golgi instead of the extracellular space. BMC Plant Biology 2012; 12:123. [PMID: 22849572 PMCID: PMC3511161 DOI: 10.1186/1471-2229-12-123] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 01/30/2012] [Accepted: 07/09/2012] [Indexed: 05/04/2023]
Abstract
BACKGROUND The two highly similar Arabidopsis apyrases AtAPY1 and AtAPY2 were previously shown to be involved in plant growth and development, evidently by regulating extracellular ATP signals. The subcellular localization of AtAPY1 was investigated to corroborate an extracellular function.
RESULTS Transgenic Arabidopsis lines expressing AtAPY1 fused to the SNAP-(O(6)-alkylguanine-DNA alkyltransferase)-tag were used for indirect immunofluorescence and AtAPY1 was detected in punctate structures within the cell. The same signal pattern was found in seedlings stably overexpressing AtAPY1-GFP by indirect immunofluorescence and live imaging. In order to identify the nature of the AtAPY1-positive structures, AtAPY1-GFP expressing seedlings were treated with the endocytic marker stain FM4-64 (N-(3-triethylammoniumpropyl)-4-(p-diethylaminophenyl-hexatrienyl)-pyridinium dibromide) and crossed with a transgenic line expressing the trans-Golgi marker Rab E1d. Neither FM4-64 nor Rab E1d co-localized with AtAPY1. However, live imaging of transgenic Arabidopsis lines expressing AtAPY1-GFP and either the fluorescent protein-tagged Golgi marker Membrin 12, Syntaxin of plants 32 or Golgi transport 1 protein homolog showed co-localization. The Golgi localization was confirmed by immunogold labeling of AtAPY1-GFP. There was no indication of extracellular AtAPY1 by indirect immunofluorescence using antibodies against SNAP and GFP, live imaging of AtAPY1-GFP and immunogold labeling of AtAPY1-GFP. Activity assays with AtAPY1-GFP revealed GDP, UDP and IDP as substrates, but neither ATP nor ADP. To determine if AtAPY1 is a soluble or membrane protein, microsomal membranes were isolated and treated with various solubilizing agents. Only SDS and urea (not alkaline or high salt conditions) were able to release the AtAPY1 protein from microsomal membranes.
CONCLUSIONS AtAPY1 is an integral Golgi protein with the substrate specificity typical for Golgi apyrases. It is therefore not likely to regulate extracellular nucleotide signals as previously thought. We propose instead that AtAPY1 exerts its growth and developmental effects by possibly regulating glycosylation reactions in the Golgi.
Affiliation(s)
- Madlen Schiller
- Department of Biology, Section of Molecular Biotechnology, Technische Universität Dresden, Helmholtzstraße 10, Dresden 01069, Germany
| | - Carolin Massalski
- Department of Biology, Section of Molecular Biotechnology, Technische Universität Dresden, Helmholtzstraße 10, Dresden 01069, Germany
| | - Thomas Kurth
- DFG-Center for Regenerative Therapies Dresden (CRTD), Technische Universität Dresden, Fetscherstraße 105, Dresden 01307, Germany
| | - Iris Steinebrunner
- Department of Biology, Section of Molecular Biotechnology, Technische Universität Dresden, Helmholtzstraße 10, Dresden 01069, Germany
|
147
|
Predicted protein subcellular localization in dominant surface ocean bacterioplankton. Appl Environ Microbiol 2012; 78:6550-7. [PMID: 22773648 DOI: 10.1128/aem.01406-12] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
Abstract
Bacteria consume dissolved organic matter (DOM) through hydrolysis, transport and intracellular metabolism, and these activities occur in distinct subcellular localizations. Bacterial protein subcellular localizations for several major marine bacterial groups were predicted using genomic, metagenomic and metatranscriptomic data sets following modification of MetaP software for use with partial gene sequences. The most distinct pattern of subcellular localization was found for Bacteroidetes, whose genomes were substantially enriched with outer membrane and extracellular proteins but depleted of inner membrane proteins compared with five other taxa (SAR11, Roseobacter, Synechococcus, Prochlorococcus, oligotrophic marine Gammaproteobacteria). When subcellular localization patterns were compared between genes and transcripts, three taxa had expression biased toward proteins localized to cell locations outside of the cytosol (SAR11, Roseobacter, and Synechococcus), as expected based on the importance of carbon and nutrient acquisition in an oligotrophic ocean, but two taxa did not (oligotrophic marine Gammaproteobacteria and Bacteroidetes). Diel variations in the fraction and putative gene functions of transcripts encoding inner membrane and periplasmic proteins compared to cytoplasmic proteins suggest a close coupling of photosynthetic extracellular release and bacterial consumption, providing insights into interactions between phytoplankton, bacteria, and DOM.
|
148
|
Lin JR, Mondal AM, Liu R, Hu J. Minimalist ensemble algorithms for genome-wide protein localization prediction. BMC Bioinformatics 2012; 13:157. [PMID: 22759391 PMCID: PMC3426488 DOI: 10.1186/1471-2105-13-157] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 12/26/2011] [Accepted: 07/03/2012] [Indexed: 01/09/2023]
Abstract
BACKGROUND Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms.
RESULTS This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors.
CONCLUSIONS We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
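The general idea of a filter step followed by a logistic regression classifier over the outputs of individual predictors can be illustrated with a minimal sketch. This is not the authors' implementation: the correlation-based filter, the synthetic predictor scores, and all function names (`select_predictors`, `fit_logistic`, and so on) are assumptions made purely for illustration, and the abstract's contribution scores are not reproduced here.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def select_predictors(X, y, k):
    """Filter step (assumed): keep the k predictors whose scores
    correlate most strongly with the true label."""
    n_predictors = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_predictors)]
    ranked = sorted(range(n_predictors), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression by batch gradient descent; bias is the last weight."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + w[-1])
            err = p - label
            for j, xj in enumerate(row):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def predict(w, row):
    """Return 1 if the ensemble probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + w[-1]) >= 0.5)

# Synthetic data (assumption): five hypothetical predictors, two informative
# (columns 0 and 1) and three that output pure noise (columns 2 to 4).
random.seed(0)
y = [random.randint(0, 1) for _ in range(200)]
X = [[label + random.gauss(0, 0.3),
      label + random.gauss(0, 0.5),
      random.random(), random.random(), random.random()] for label in y]

selected = select_predictors(X, y, k=2)           # minimalist subset
X_small = [[row[j] for j in selected] for row in X]
weights = fit_logistic(X_small, y)                # LR combiner over the subset
acc = sum(predict(weights, r) == t for r, t in zip(X_small, y)) / len(y)
print("selected predictors:", sorted(selected), "training accuracy:", acc)
```

A simple correlation filter like this one would keep redundant predictors if they were both informative, which is exactly the redundancy issue the abstract raises; a more careful filter would also penalize overlap between the selected predictors.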
Affiliation(s)
- Jhih-Rong Lin
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
|
149
|
He J, Gu H, Liu W. Imbalanced multi-modal multi-label learning for subcellular localization prediction of human proteins with both single and multiple sites. PLoS One 2012; 7:e37155. [PMID: 22715364 PMCID: PMC3371015 DOI: 10.1371/journal.pone.0037155] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 11/24/2011] [Accepted: 04/14/2012] [Indexed: 12/20/2022]
Abstract
It is well known that an important step toward understanding the functions of a protein is to determine its subcellular location. Although numerous prediction algorithms have been developed, most of them have focused on proteins with only one location. In recent years, researchers have begun to pay attention to subcellular localization prediction for proteins with multiple sites. However, almost all existing approaches fail to take into account the correlations among locations that arise from proteins with multiple sites, which may be important information for improving prediction accuracy for such proteins. In this paper, a new algorithm that can effectively exploit the correlations among the locations is proposed, using a Gaussian process model. In addition, the algorithm can realize an optimal linear combination of various feature extraction technologies and is robust to imbalanced data sets. Experimental results on a human protein data set show that the proposed algorithm is valid and can achieve better performance than the existing approaches.
Affiliation(s)
- Jianjun He
- School of Control Science and Engineering, Dalian University of Technology, Dalian, Liaoning, China
| | - Hong Gu
- School of Control Science and Engineering, Dalian University of Technology, Dalian, Liaoning, China
| | - Wenqi Liu
- School of Control Science and Engineering, Dalian University of Technology, Dalian, Liaoning, China
|
150
|
Wang X, Li GZ. A multi-label predictor for identifying the subcellular locations of singleplex and multiplex eukaryotic proteins. PLoS One 2012; 7:e36317. [PMID: 22629314 PMCID: PMC3358325 DOI: 10.1371/journal.pone.0036317] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 12/21/2011] [Accepted: 04/01/2012] [Indexed: 01/30/2023]
Abstract
Subcellular locations of proteins are important functional attributes. An effective and efficient subcellular localization predictor is necessary for rapidly and reliably annotating subcellular locations of proteins. Most existing subcellular localization methods deal only with single-location proteins. In fact, proteins may simultaneously exist at, or move between, two or more different subcellular locations. To better reflect the characteristics of multiplex proteins, it is highly desirable to develop new methods for dealing with them. In this paper, a new predictor called Euk-ECC-mPLoc has been developed to handle systems containing both singleplex and multiplex eukaryotic proteins; it introduces a powerful multi-label learning approach that exploits correlations between subcellular locations and hybridizes gene ontology with dipeptide composition information. It can be utilized to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centrosome, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Experimental results on a stringent benchmark dataset of eukaryotic proteins by jackknife cross validation test show that the average success rate and overall success rate obtained by Euk-ECC-mPLoc were 69.70% and 81.54%, respectively, indicating that our approach is quite promising. In particular, the success rates achieved by Euk-ECC-mPLoc for small subsets were remarkably improved, indicating that it holds a high potential for stimulating the development of the area. As a user-friendly web-server, Euk-ECC-mPLoc is freely accessible to the public at the website http://levis.tongji.edu.cn:8080/bioinfo/Euk-ECC-mPLoc/. We believe that Euk-ECC-mPLoc may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying subcellular locations of eukaryotic proteins.
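Dipeptide composition, one of the two feature sources mentioned in the abstract, is commonly computed as the normalized frequency of each of the 400 ordered pairs of standard amino acids among adjacent residues. The sketch below illustrates that common definition only; it is not taken from Euk-ECC-mPLoc, the function name `dipeptide_composition` and the example sequence are hypothetical, and the gene ontology features and the multi-label learner are not shown.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order that defines the feature vector.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Pairs containing non-standard residues (e.g. 'X') are skipped,
    which is an assumption of this sketch.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]

# Hypothetical toy sequence with five adjacent pairs: MK, KK, KL, LL, LA.
features = dipeptide_composition("MKKLLA")
nonzero = {d: f for d, f in zip(DIPEPTIDES, features) if f > 0}
print(nonzero)
```

Because the vector has a fixed length regardless of protein length, it can be concatenated with other fixed-length descriptors before being passed to a classifier.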
Affiliation(s)
| | - Guo-Zheng Li
- The MOE Key Laboratory of Embedded System and Service Computing, Department of Control Science and Engineering, Tongji University, Shanghai, China
|