Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hua S, Sun Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics 2001;17:721-8. [PMID: 11524373 DOI: 10.1093/bioinformatics/17.8.721] [Citation(s) in RCA: 479] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Number

Cited by Other Article(s)

101

Tiwari AK, Srivastava R. A survey of computational intelligence techniques in protein function prediction. INTERNATIONAL JOURNAL OF PROTEOMICS 2014;2014:845479. [PMID: 25574395 PMCID: PMC4276698 DOI: 10.1155/2014/845479] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 09/10/2014] [Revised: 10/31/2014] [Accepted: 11/07/2014] [Indexed: 02/08/2023]

102

Kumar R, Kumari B, Srivastava A, Kumar M. NRfamPred: a proteome-scale two level method for prediction of nuclear receptor proteins and their sub-families. Sci Rep 2014;4:6810. [PMID: 25351274 PMCID: PMC5381360 DOI: 10.1038/srep06810] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2014] [Accepted: 10/09/2014] [Indexed: 11/09/2022] Open

103

Stetson LC, Pearl T, Chen Y, Barnholtz-Sloan JS. Computational identification of multi-omic correlates of anticancer therapeutic response. BMC Genomics 2014;15 Suppl 7:S2. [PMID: 25573145 PMCID: PMC4243102 DOI: 10.1186/1471-2164-15-s7-s2] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open

Abstract

Background

A challenge in precision medicine is the transformation of genomic data into knowledge that can be used to stratify patients into treatment groups based on predicted clinical response. Although clinical trials remain the only way to truly measure drug toxicities and effectiveness, as a scientific community we lack the resources to clinically assess all drugs presently under development. Therefore, an effective preclinical model system that enables prediction of anticancer drug response could significantly speed the broader adoption of personalized medicine.

Results

Three large-scale pharmacogenomic studies have screened anticancer compounds in more than 1000 distinct human cancer cell lines. We combined these datasets to generate and validate multi-omic predictors of drug response. We compared drug response signatures built using a penalized linear regression model and two non-linear machine learning techniques, random forest and support vector machine. The precision and robustness of each drug response signature were assessed using cross-validation across three independent datasets. Fifteen drugs were common among the datasets. We validated prediction signatures for eleven out of fifteen tested drugs (17-AAG, AZD0530, AZD6244, Erlotinib, Lapatinib, Nutlin-3, Paclitaxel, PD0325901, PD0332991, PF02341066, and PLX4720).

Conclusions

Multi-omic predictors of drug response can be generated and validated for many drugs. Specifically, the random forest algorithm generated more precise and robust prediction signatures when compared to support vector machines and the more commonly used elastic net regression. The resulting drug response signatures can be used to stratify patients into treatment groups based on their individual tumor biology, with two major benefits: speeding the process of bringing preclinical drugs to market, and the repurposing and repositioning of existing anticancer therapies.

104

Abbas SS, Dijkstra TMH, Heskes T. A comparative study of cell classifiers for image-based high-throughput screening. BMC Bioinformatics 2014;15:342. [PMID: 25336059 PMCID: PMC4287552 DOI: 10.1186/1471-2105-15-342] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2014] [Accepted: 09/29/2014] [Indexed: 11/24/2022] Open

105

Pacharawongsakda E, Theeramunkong T. Predict subcellular locations of singleplex and multiplex proteins by semi-supervised learning and dimension-reducing general mode of Chou's PseAAC. IEEE Trans Nanobioscience 2014;12:311-20. [PMID: 23864226 DOI: 10.1109/tnb.2013.2272014] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Abstract

Predicting protein subcellular location is one of the major challenges in bioinformatics, since such knowledge helps us understand protein functions and enables us to select target proteins during the drug discovery process. While many computational techniques have been proposed to improve predictive performance for protein subcellular location, they have several shortcomings. In this work, we propose a method to address three main issues in such techniques: i) handling of multiplex proteins, which may exist in or move between multiple cellular compartments, ii) handling of high dimensionality in the input and output spaces, and iii) the requirement for sufficient labeled data for model training. To address these issues, this work presents a new computational method for predicting proteins that have either single or multiple locations. The proposed technique, namely iFLAST-CORE, incorporates dimensionality reduction in the feature and label spaces with the co-training paradigm for semi-supervised multi-label classification. For this purpose, Singular Value Decomposition (SVD) is applied to transform the high-dimensional feature space and label space into lower-dimensional spaces. After that, because labeled data are limited, co-training regression makes use of unlabeled data by predicting the target values of the unlabeled data in the lower-dimensional spaces. In the last step, the SVD components are used to project labels in the lower-dimensional space back to the original space, and an adaptive threshold is used to map each numeric value to a binary value for label determination. A set of experiments on viral proteins and gram-negative bacterial proteins shows that our proposed method improves classification performance in terms of various evaluation metrics, such as Aiming (or Precision), Coverage (or Recall) and macro F-measure, compared to the traditional method that uses only labeled data.
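The final thresholding step and the macro F-measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes a fixed threshold of 0.5 in place of the adaptive threshold the authors describe, and it uses one common label-averaged definition of macro F-measure; the function names `binarize`, `f1` and `macro_f_measure` are hypothetical and do not come from iFLAST-CORE.

```python
def binarize(scores, threshold=0.5):
    """Map numeric label scores to 0/1 labels.

    Assumption: a fixed threshold stands in for the adaptive one in the abstract.
    """
    return [[1 if s >= threshold else 0 for s in row] for row in scores]


def f1(tp, fp, fn):
    """F1 score from counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f_measure(y_true, y_pred):
    """Average the per-label F1 over all labels (label-based macro averaging)."""
    n_labels = len(y_true[0])
    per_label = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        per_label.append(f1(tp, fp, fn))
    return sum(per_label) / n_labels


# Toy data: three proteins, two candidate locations (illustrative values only).
y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = binarize([[0.9, 0.1], [0.4, 0.8], [0.2, 0.7]])
print(macro_f_measure(y_true, y_pred))
```

In the toy data the second protein's first location is missed, so the first label scores lower than the second and the macro average falls between the two.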

106

Hooper CM, Tanz SK, Castleden IR, Vacher MA, Small ID, Millar AH. SUBAcon: a consensus algorithm for unifying the subcellular localization data of the Arabidopsis proteome. Bioinformatics 2014;30:3356-64. [PMID: 25150248 DOI: 10.1093/bioinformatics/btu550] [Citation(s) in RCA: 123] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

Affiliation(s)

Cornelia M Hooper Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
Sandra K Tanz Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
Ian R Castleden Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
Michael A Vacher Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
Ian D Small Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
A Harvey Millar Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia

107

Mao R, Raj Kumar PK, Guo C, Zhang Y, Liang C. Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine. PLoS One 2014;9:e104049. [PMID: 25110928 PMCID: PMC4128822 DOI: 10.1371/journal.pone.0104049] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2014] [Accepted: 07/06/2014] [Indexed: 01/04/2023] Open

Abstract

One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies have focused on the positional distribution of retained introns (RIs) among different genic regions and on the regulation of their expression, while little systematic classification of RIs from constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and support vector machine (SVM) with radial basis kernel function (RBF) to differentiate these two types of introns in Arabidopsis. By comparing coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach classified RIs from CSIs more accurately than four other approaches. The optimal penalty parameter C and the RBF kernel parameter in SVM were set based on a particle swarm optimization algorithm (PSOSVM). Our classification performance showed an F-Measure of 80.8% (random forest) and 77.4% (PSOSVM). Not only were the basic sequence features and positional distribution characteristics of RIs obtained, but putative regulatory motifs in intron splicing were also predicted based on our feature extraction approach. Our study will facilitate a better understanding of the underlying mechanisms involved in intron retention.

108

Mamun K, Sharma A. Importance of Computational Intelligent in Proteomics. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2014. [DOI: 10.20965/jaciii.2014.p0469] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

109

Ding S, Yan S, Qi S, Li Y, Yao Y. A protein structural classes prediction method based on PSI-BLAST profile. J Theor Biol 2014;353:19-23. [DOI: 10.1016/j.jtbi.2014.02.034] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2013] [Revised: 01/27/2014] [Accepted: 02/24/2014] [Indexed: 11/27/2022]

110

Kumar R, Jain S, Kumari B, Kumar M. Protein sub-nuclear localization prediction using SVM and Pfam domain information. PLoS One 2014;9:e98345. [PMID: 24897370 PMCID: PMC4045734 DOI: 10.1371/journal.pone.0098345] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2013] [Accepted: 05/01/2014] [Indexed: 12/24/2022] Open

111

Pan R, Kaur N, Hu J. The Arabidopsis mitochondrial membrane-bound ubiquitin protease UBP27 contributes to mitochondrial morphogenesis. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2014;78:1047-59. [PMID: 24707813 DOI: 10.1111/tpj.12532] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/04/2014] [Revised: 03/28/2014] [Accepted: 04/01/2014] [Indexed: 05/13/2023]

112

Frost PC, Song K, Wagner ND. A beginner's guide to nutritional profiling in physiology and ecology. Integr Comp Biol 2014;54:873-9. [PMID: 24876193 DOI: 10.1093/icb/icu054] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

113

Zhang L, Zhao X, Kong L. Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition. J Theor Biol 2014;355:105-10. [PMID: 24735902 DOI: 10.1016/j.jtbi.2014.04.008] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2013] [Revised: 02/26/2014] [Accepted: 04/04/2014] [Indexed: 10/25/2022]

114

Verma JK, Gayali S, Dass S, Kumar A, Parveen S, Chakraborty S, Chakraborty N. OsAlba1, a dehydration-responsive nuclear protein of rice (Oryza sativa L. ssp. indica), participates in stress adaptation. PHYTOCHEMISTRY 2014;100:16-25. [PMID: 24534105 DOI: 10.1016/j.phytochem.2014.01.015] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/09/2013] [Revised: 01/16/2014] [Accepted: 01/22/2014] [Indexed: 05/13/2023]

115

Ding S, Li Y, Shi Z, Yan S. A protein structural classes prediction method based on predicted secondary structure and PSI-BLAST profile. Biochimie 2014;97:60-5. [DOI: 10.1016/j.biochi.2013.09.013] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2013] [Accepted: 09/16/2013] [Indexed: 10/26/2022]

116

Ghosh S, Vishveshwara S. Ranking the quality of protein structure models using sidechain based network properties. F1000Res 2014;3:17. [PMID: 25580218 PMCID: PMC4038323 DOI: 10.12688/f1000research.3-17.v1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/20/2014] [Indexed: 01/31/2023] Open

117

Chen X, Li J, Hou J, Xie Z, Yang F. Mammalian mitochondrial proteomics: insights into mitochondrial functions and mitochondria-related diseases. Expert Rev Proteomics 2014;7:333-45. [DOI: 10.1586/epr.10.22] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]

118

Du P, Xu C. Predicting multisite protein subcellular locations: progress and challenges. Expert Rev Proteomics 2014;10:227-37. [DOI: 10.1586/epr.13.16] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

119

de Lucas M, Provart NJ, Brady SM. Bioinformatic tools in Arabidopsis research. Methods Mol Biol 2014;1062:97-136. [PMID: 24057362 DOI: 10.1007/978-1-62703-580-4_5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]

120

Hajisharifi Z, Piryaiee M, Mohammad Beigi M, Behbahani M, Mohabatkar H. Predicting anticancer peptides with Chou's pseudo amino acid composition and investigating their mutagenicity via Ames test. J Theor Biol 2014;341:34-40. [DOI: 10.1016/j.jtbi.2013.08.037] [Citation(s) in RCA: 210] [Impact Index Per Article: 19.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2013] [Revised: 08/28/2013] [Accepted: 08/31/2013] [Indexed: 12/27/2022]

121

Zhang S, Liang Y, Yuan X. Improving the prediction accuracy of protein structural class: Approached with alternating word frequency and normalized Lempel–Ziv complexity. J Theor Biol 2014;341:71-7. [DOI: 10.1016/j.jtbi.2013.10.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2013] [Revised: 09/08/2013] [Accepted: 10/08/2013] [Indexed: 10/26/2022]

122

Palanisamy B, Heese K. Oxygen distribution in proteins defines functional significance of the genome and proteome of the malaria parasite Plasmodium falciparum 3D7. FEMS Microbiol Lett 2013;351:59-63. [DOI: 10.1111/1574-6968.12355] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2013] [Accepted: 12/06/2013] [Indexed: 11/27/2022] Open

123

Tian J, Zhang Y, Liu B, Zuo D, Jiang T, Guo J, Zhang W, Wu N, Fan Y. Presep: predicting the propensity of a protein being secreted into the supernatant when expressed in Pichia pastoris. PLoS One 2013;8:e79749. [PMID: 24278168 PMCID: PMC3836778 DOI: 10.1371/journal.pone.0079749] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2013] [Accepted: 10/02/2013] [Indexed: 11/19/2022] Open

124

Niarchou A, Alexandridou A, Athanasiadis E, Spyrou G. C-PAmP: large scale analysis and database construction containing high scoring computationally predicted antimicrobial peptides for all the available plant species. PLoS One 2013;8:e79728. [PMID: 24244550 PMCID: PMC3823563 DOI: 10.1371/journal.pone.0079728] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2013] [Accepted: 10/04/2013] [Indexed: 12/03/2022] Open

Abstract

Background

Antimicrobial peptides are a promising alternative to conventional antibiotics. Plants are an important source of such peptides; their pharmacological properties have been known since antiquity. Access to relevant information, however, is not straightforward, as there are practically no major repositories of experimentally validated and/or predicted plant antimicrobial peptides. PhytAMP is the only database dedicated to plant peptides with confirmed antimicrobial action, holding 273 entries. Data on such peptides can otherwise be retrieved from generic repositories.

Description

We present C-PAmP, a database of computationally predicted plant antimicrobial peptides. C-PAmP contains 15,174,905 peptides, 5–100 amino acids long, derived from 33,877 proteins of 2,112 plant species in UniProtKB/Swiss-Prot. Its web interface allows queries based on peptide/protein sequence, protein accession number and species. Users can view the corresponding predicted peptides along with their probability score, their classification according to the Collection of Anti-Microbial Peptides (CAMP), and their PhytAMP id where applicable. Moreover, users can visualise protein regions with a high concentration of predicted antimicrobial peptides. In order to identify potential antimicrobial peptides we used a classification algorithm, based on a modified version of the pseudo amino acid concept. The classifier tested all subsequences ranging from 5 to 100 amino acids of the plant proteins in UniProtKB/Swiss-Prot and stored those classified as antimicrobial with a high probability score (>90%). Its performance measures across a 10-fold cross-validation are more than satisfactory (accuracy: 0.91, sensitivity: 0.93, specificity: 0.90) and it succeeded in classifying 99.5% of the PhytAMP peptides correctly.
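The exhaustive scan described above (every subsequence of 5 to 100 residues) can be sketched as a sliding-window enumeration. This is only an illustration of the window generation; the classifier itself is not reproduced, and the names `subsequences` and `count_windows` are hypothetical rather than part of C-PAmP.

```python
def subsequences(protein, min_len=5, max_len=100):
    """Yield (start, peptide) for every contiguous window of min_len..max_len residues."""
    n = len(protein)
    for length in range(min_len, min(max_len, n) + 1):
        for start in range(n - length + 1):
            yield start, protein[start:start + length]


def count_windows(n, min_len=5, max_len=100):
    """Number of candidate peptides produced by a protein of length n."""
    return sum(n - length + 1 for length in range(min_len, min(max_len, n) + 1))


# A 7-residue toy sequence yields 3 windows of length 5, 2 of length 6, 1 of length 7.
for start, peptide in subsequences("ACDEFGH"):
    print(start, peptide)
```

Because the number of windows grows roughly linearly with protein length once the protein is longer than 100 residues (96 window lengths per start position), a collection of tens of thousands of proteins can plausibly give rise to millions of candidate peptides before filtering by probability score.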

Conclusions

We have compiled a major repository of predicted plant antimicrobial peptides using a highly performing classification algorithm. Our repository is accessible from the web and supports multiple querying options to optimise data retrieval. We hope it will greatly benefit drug design research by significantly limiting the range of plant peptides to be experimentally tested for antimicrobial activity.

125

Palanisamy B, Ekambaram R, Heese K. Thymine distribution in genes provides novel insight into the functional significance of the proteome of the malaria parasite Plasmodium falciparum 3D7. Bioinformatics 2013;30:597-600. [DOI: 10.1093/bioinformatics/btt587] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

126

Kaundal R, Sahu SS, Verma R, Weirick T. Identification and characterization of plastid-type proteins from sequence-attributed features using machine learning. BMC Bioinformatics 2013;14 Suppl 14:S7. [PMID: 24266945 PMCID: PMC3851450 DOI: 10.1186/1471-2105-14-s14-s7] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open

Abstract

BACKGROUND

Plastids are an important component of plant cells, being the site of manufacture and storage of chemical compounds used by the cell, and contain pigments such as those used in photosynthesis, starch synthesis/storage, cell color etc. They are essential organelles of the plant cell, also present in algae. Recent advances in genomic technology and sequencing efforts are generating a huge amount of DNA sequence data every day. The predicted proteomes of these genomes need annotation at a faster pace. One such annotation need is an automated system that can distinguish between plastid and non-plastid proteins accurately, and further classify plastid-types based on their functionality. We compared the amino acid compositions of plastid proteins with those of non-plastid ones and found significant differences, which were used as a basis to develop various feature-based prediction models using similarity-search and machine learning.

RESULTS

In this study, we developed separate Support Vector Machine (SVM) trained classifiers for characterizing the plastids in two steps: first distinguishing the plastid vs. non-plastid proteins, and then classifying the identified plastids into their various types based on their function (chloroplast, chromoplast, etioplast, and amyloplast). Five diverse protein features (amino acid composition, dipeptide composition, the pseudo amino acid composition, N(terminal)-Center-C(terminal) composition and the protein physicochemical properties) are used to develop SVM models. Overall, the dipeptide composition-based module shows the best performance, with an accuracy of 86.80% and a Matthews Correlation Coefficient (MCC) of 0.74 in phase-I and 78.60% with an MCC of 0.44 in phase-II. On independent test data, this model also performs better, with an overall accuracy of 76.58% and 74.97% in phase-I and phase-II, respectively. The similarity-based PSI-BLAST module shows very low performance, with about 50% prediction accuracy for distinguishing plastid vs. non-plastids and only 20% in classifying various plastid-types, indicating the need and importance of machine learning algorithms.
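Two of the quantities named in these results, the composition features and the Matthews Correlation Coefficient, can be sketched as follows. This is a minimal illustration under stated assumptions: the 20-letter alphabet ordering, the handling of non-standard residues (ignored), and the function names are choices made here, not details taken from PLpred.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """20-dimensional vector: fraction of each standard residue in seq."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dimensional vector: fraction of each adjacent residue pair in seq."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return [counts[p] / total for p in DIPEPTIDES]


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


features = dipeptide_composition("ACAC")
print(len(features), features[DIPEPTIDES.index("AC")])
print(mcc(tp=25, tn=25, fp=25, fn=25))
```

MCC ranges from -1 to 1 and equals 0 for predictions no better than chance, which is why it is reported alongside accuracy when classes are imbalanced.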

CONCLUSION

The current work is a first attempt to develop a methodology for classifying various plastid-type proteins. The prediction modules have also been made available as a web tool, PLpred, at http://bioinfo.okstate.edu/PLpred/ for real time identification/characterization. We believe this tool will be very useful in the functional annotation of various genomes.

127

Rosillo R, Giner J, de la Fuente D. The effectiveness of the combined use of VIX and Support Vector Machines on the prediction of S&P 500. Neural Comput Appl 2013. [DOI: 10.1007/s00521-013-1487-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]

128

Zhang L, Zhao X, Kong L. A protein structural class prediction method based on novel features. Biochimie 2013;95:1741-4. [DOI: 10.1016/j.biochi.2013.05.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2013] [Accepted: 05/28/2013] [Indexed: 11/28/2022]

129

Armengaud J, Christie-Oleza JA, Clair G, Malard V, Duport C. Exoproteomics: exploring the world around biological systems. Expert Rev Proteomics 2013. [PMID: 23194272 DOI: 10.1586/epr.12.52] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

130

Wang X, Li GZ. Multilabel learning via random label selection for protein subcellular multilocations prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013;10:436-446. [PMID: 23929867 DOI: 10.1109/tcbb.2013.21] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]

Abstract

Prediction of protein subcellular localization is an important but challenging problem, particularly when proteins may simultaneously exist at, or move between, two or more different subcellular location sites. Most existing protein subcellular localization methods deal only with single-location proteins. In the past few years, only a few methods have been proposed to tackle proteins with multiple locations. However, they adopt only a simple strategy, that is, transforming the multilocation proteins into multiple proteins with a single location, which does not take correlations among different subcellular locations into account. In this paper, a novel method named random label selection (RALS) (multilabel learning via RALS), which extends the simple binary relevance (BR) method, is proposed to learn from multilocation proteins in an effective and efficient way. RALS does not explicitly find the correlations among labels, but rather implicitly attempts to learn the label correlations from data by augmenting the original feature space with randomly selected labels as additional input features. Through a fivefold cross-validation test on a benchmark data set, we demonstrate that our proposed method, which considers label correlations, clearly outperforms the baseline BR method, which does not, indicating that correlations among different subcellular locations really exist and contribute to improved prediction performance. Experimental results on two benchmark data sets also show that our proposed methods achieve significantly higher performance than some other state-of-the-art methods in predicting subcellular multilocations of proteins. The prediction web server is available at http://levis.tongji.edu.cn:8080/bioinfo/MLPred-Euk/ for public use.
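The feature-augmentation idea in this abstract can be sketched for a single target label. The sketch covers only the construction of a training set in which randomly chosen other labels are appended to the input features; how RALS obtains those label values at prediction time is not stated in the abstract and is not modelled here. The function name and parameters are hypothetical.

```python
import random


def augment_with_random_labels(X, Y, target, k, rng):
    """Build a binary-relevance training set for one label with extra label features.

    X: list of feature vectors; Y: list of 0/1 label vectors;
    target: index of the label to be learned; k: number of other labels to append;
    rng: a random.Random instance, passed in so the selection is reproducible.
    Returns (X_aug, y_target, chosen) where chosen lists the appended label indices.
    """
    n_labels = len(Y[0])
    candidates = [j for j in range(n_labels) if j != target]
    chosen = sorted(rng.sample(candidates, k))
    X_aug = [list(x) + [y[j] for j in chosen] for x, y in zip(X, Y)]
    y_target = [y[target] for y in Y]
    return X_aug, y_target, chosen


# Toy data: two proteins, two features, three candidate locations (illustrative only).
X = [[0.1, 0.2], [0.3, 0.4]]
Y = [[1, 0, 1], [0, 1, 1]]
X_aug, y_target, chosen = augment_with_random_labels(X, Y, target=0, k=1, rng=random.Random(0))
print(chosen, X_aug, y_target)
```

Any binary classifier could then be trained on `X_aug` and `y_target`; plain binary relevance corresponds to `k=0`, where no label features are appended.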

131

Dutta A, Katarkar A, Chaudhuri K. In-silico structural and functional characterization of a V. cholerae O395 hypothetical protein containing a PDZ1 and an uncommon protease domain. PLoS One 2013;8:e56725. [PMID: 23441214 PMCID: PMC3575494 DOI: 10.1371/journal.pone.0056725] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2012] [Accepted: 01/14/2013] [Indexed: 11/18/2022] Open

132

Li GZ, Wang X, Hu X, Liu JM, Zhao RW. Multilabel learning for protein subcellular location prediction. IEEE Trans Nanobioscience 2013;11:237-43. [PMID: 22987129 DOI: 10.1109/tnb.2012.2212249] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

133

On the structural context and identification of enzyme catalytic residues. BIOMED RESEARCH INTERNATIONAL 2013;2013:802945. [PMID: 23484160 PMCID: PMC3581254 DOI: 10.1155/2013/802945] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/29/2012] [Accepted: 12/28/2012] [Indexed: 11/25/2022]

134

Zhang X, Shen Y, Ding G, Tian Y, Liu Z, Li B, Wang Y, Jiang C. TFPP: an SVM-based tool for recognizing flagellar proteins in Trypanosoma brucei. PLoS One 2013;8:e54032. [PMID: 23349782 PMCID: PMC3547966 DOI: 10.1371/journal.pone.0054032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2012] [Accepted: 12/07/2012] [Indexed: 11/18/2022] Open

135

Lei JB, Yin JB, Shen HB. GFO: A data driven approach for optimizing the Gaussian function based similarity metric in computational biology. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.07.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

136

Tanz SK, Castleden I, Hooper CM, Vacher M, Small I, Millar HA. SUBA3: a database for integrating experimentation and prediction to define the SUBcellular location of proteins in Arabidopsis. Nucleic Acids Res 2013;41:D1185-91. [PMID: 23180787 PMCID: PMC3531127 DOI: 10.1093/nar/gks1151] [Citation(s) in RCA: 236] [Impact Index Per Article: 19.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2012] [Revised: 10/24/2012] [Accepted: 10/25/2012] [Indexed: 12/27/2022] Open

137

Su ECY, Chang JM, Cheng CW, Sung TY, Hsu WL. Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing. BMC Bioinformatics 2012;13 Suppl 17:S13. [PMID: 23282098 PMCID: PMC3521467 DOI: 10.1186/1471-2105-13-s17-s13] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open

Abstract

Background

Identification of subcellular localization in proteins is crucial to elucidate cellular processes and molecular functions in a cell. However, given a tremendous amount of sequence data generated in the post-genomic era, determining protein localization based on biological experiments can be expensive and time-consuming. Therefore, developing prediction systems to analyze uncharacterised proteins efficiently has played an important role in high-throughput protein analyses. In a eukaryotic cell, many essential biological processes take place in the nucleus. Nuclear proteins shuttle between nucleus and cytoplasm based on recognition of nuclear translocation signals, including nuclear localization signals (NLSs) and nuclear export signals (NESs). Currently, only a few approaches have been developed specifically to predict nuclear localization using sequence features, such as putative NLSs. However, it has been shown that prediction coverage based on the NLSs is very low. In addition, most existing approaches only attained prediction accuracy and Matthews correlation coefficient (MCC) around 54%~70% and 0.250~0.380 on an independent test set, respectively. Moreover, no predictor can generate sequence motifs to characterize features of potential NESs, whose biological properties are not well understood from existing experimental studies.

Results

In this study, we propose PSLNuc (Protein Subcellular Localization prediction for Nucleus) for predicting nuclear localization in proteins. First, for feature representation, a protein is represented by gapped-dipeptides and the feature values are weighted by homology information from a smoothed position-specific scoring matrix. After that, we incorporate probabilistic latent semantic indexing (PLSI) for feature reduction. Finally, the reduced features are used as input for a support vector machine (SVM) classifier. In addition to PSLNuc, we further identify gapped-dipeptide signatures for putative NLSs and NESs to develop a prediction method, PSLNTS (Protein Subcellular Localization prediction using Nuclear Translocation Signals). We apply PLSI to generate gapped-dipeptide signatures from both nuclear and non-nuclear proteins, and propose candidate sequence motifs for putative NLSs and NESs. Then, we incorporate only the proposed gapped-dipeptide signatures in an SVM classifier to mimic biological properties of NLSs and NESs for predicting nuclear localization in PSLNTS.
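The gapped-dipeptide representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only counts raw occurrences, whereas the abstract describes weighting by a smoothed position-specific scoring matrix followed by PLSI reduction. The function name `gapped_dipeptide_counts` and the `max_gap` parameter are assumptions made for illustration.

```python
from collections import Counter


def gapped_dipeptide_counts(sequence, max_gap):
    """Count residue pairs (a, gap, b) where b follows a after `gap` residues.

    Hypothetical helper: raw counts only, with no PSSM weighting
    and no PLSI feature reduction.
    """
    counts = Counter()
    for gap in range(max_gap + 1):
        # The last valid start index leaves room for the gap and the partner.
        for i in range(len(sequence) - gap - 1):
            counts[(sequence[i], gap, sequence[i + gap + 1])] += 1
    return counts


# Toy sequence, not taken from any benchmark data set.
features = gapped_dipeptide_counts("ACA", max_gap=1)
```

Each key is a triple of first residue, gap length, and second residue, so a fixed `max_gap` yields a fixed-size feature space over the 20 amino acids that could then be passed to a classifier.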

Conclusions

Experimental results demonstrate that the proposed method shows a significant improvement for nuclear localization prediction. To compare our predictive performance with other approaches, we incorporate two non-redundant benchmark data sets, a training set and an independent test set. Evaluated by five-fold cross-validation on the training set, PSLNuc attains an overall accuracy of 79.7%, which is a 4.8% improvement over the state-of-the-art system. In addition, our method also enhances the MCC from 0.497 to 0.595. Compared on the independent test set, PSLNuc outperforms other predictors by 3.9%~19.9% on accuracy and 0.077~0.207 on MCC. This suggests that, in addition to NLSs, which have been shown to be important for nuclear proteins, NESs can also be an effective indicator to detect non-nuclear proteins. Most notably, using only a few proposed gapped-dipeptide signatures as input features for the SVM classifier, PSLNTS further enhances the accuracy and MCC to 80.9% and 0.618, respectively. Our results demonstrate that gapped-dipeptide signatures can better discriminate nuclear and non-nuclear proteins. Moreover, the proposed gapped-dipeptide signatures can be biologically interpreted and used in further experimental analyses of nuclear translocation signals, including NLSs and NESs.
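For readers unfamiliar with the two metrics quoted in this abstract, the following sketch computes accuracy and the Matthews correlation coefficient from binary confusion-matrix counts using the standard definitions. The function names are illustrative, and returning 0.0 when the MCC denominator is zero is a common convention assumed here, not something stated in the abstract.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for two-class problems such as nuclear versus non-nuclear prediction.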

138

Resende DM, Rezende AM, Oliveira NJD, Batista ICA, Corrêa-Oliveira R, Reis AB, Ruiz JC. An assessment on epitope prediction methods for protozoa genomes. BMC Bioinformatics 2012;13:309. [PMID: 23170965 PMCID: PMC3543197 DOI: 10.1186/1471-2105-13-309] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2012] [Accepted: 11/11/2012] [Indexed: 12/03/2022] Open

Abstract

Background

Epitope prediction using computational methods represents one of the most promising approaches to vaccine development. Reduction of time, cost, and the availability of completely sequenced genomes are key points and highly motivating regarding the use of reverse vaccinology. Parasites of genus Leishmania are widely spread and they are the etiologic agents of leishmaniasis. Currently, there is no efficient vaccine against this pathogen and the drug treatment is highly toxic. The lack of sufficiently large datasets of experimentally validated parasite epitopes represents a serious limitation, especially for trypanosomatid genomes. In this work we highlight the predictive performances of several algorithms that were evaluated through the development of a MySQL database built with the purpose of: a) evaluating the prediction performance of individual algorithms and their combination for CD8+ T cell epitopes, B-cell epitopes and subcellular localization by means of AUC (Area Under Curve) performance and a threshold-dependent method that employs a confusion matrix; b) integrating data from experimentally validated and in silico predicted epitopes; and c) integrating the subcellular localization predictions and experimental data. NetCTL, NetMHC, BepiPred, BCPred12, and AAP12 algorithms were used for in silico epitope prediction and WoLF PSORT, Sigcleave and TargetP for in silico subcellular localization prediction against trypanosomatid genomes.

Results

A database-driven epitope prediction method was developed with built-in functions that were capable of: a) removing experimental data redundancy; b) parsing algorithm predictions and storing experimentally validated and predicted data; and c) evaluating algorithm performances. Results show that a better performance is achieved when the combined prediction is considered. This is particularly true for B cell epitope predictors, where the combined prediction of AAP12 and BCPred12 reached an AUC value of 0.77. For T CD8+ epitope predictors, the combined prediction of NetCTL and NetMHC reached an AUC value of 0.64. Finally, regarding the subcellular localization prediction, the best performance is achieved when the combined prediction of Sigcleave, TargetP and WoLF PSORT is used.
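The AUC values reported above can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes AUC that way and shows one possible way to combine two predictors. Averaging the scores is an assumption made purely for illustration; the abstract does not say how the authors combined predictions, and the function names are hypothetical.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def combine_by_mean(score_lists):
    """Assumed combination rule: average the predictors' scores per example."""
    return [sum(column) / len(column) for column in zip(*score_lists)]
```

The pairwise form is quadratic in the number of examples, which is acceptable for a small illustration but would normally be replaced by a rank-based computation on large data sets.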

Conclusions

Our study indicates that the combination of B-cell epitope predictors is the best tool for predicting epitopes on protozoan parasite proteins. Regarding subcellular localization, the best result was obtained when the predictions of the three algorithms were combined. The developed pipeline is available upon request from the authors.

139

Chen YK, Li KB. Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition. J Theor Biol 2012;318:1-12. [PMID: 23137835 DOI: 10.1016/j.jtbi.2012.10.033] [Citation(s) in RCA: 98] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2012] [Revised: 10/25/2012] [Accepted: 10/26/2012] [Indexed: 01/04/2023]

140

Xia X. Position weight matrix, gibbs sampler, and the associated significance tests in motif characterization and prediction. SCIENTIFICA 2012;2012:917540. [PMID: 24278755 PMCID: PMC3820676 DOI: 10.6064/2012/917540] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/22/2012] [Accepted: 10/11/2012] [Indexed: 05/31/2023]

141

Karaçali B. Hierarchical motif vectors for prediction of functional sites in amino acid sequences using quasi-supervised learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012;9:1432-1441. [PMID: 22585139 DOI: 10.1109/tcbb.2012.68] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]

142

Liu X, Luo M, Zhang W, Zhao J, Zhang J, Wu K, Tian L, Duan J. Histone acetyltransferases in rice (Oryza sativa L.): phylogenetic analysis, subcellular localization and expression. BMC PLANT BIOLOGY 2012;12:145. [PMID: 22894565 PMCID: PMC3502346 DOI: 10.1186/1471-2229-12-145] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/23/2011] [Accepted: 08/10/2012] [Indexed: 05/20/2023]

Abstract

BACKGROUND

Histone acetyltransferases (HATs) play an important role in eukaryotic transcription. Eight HATs identified in rice (OsHATs) can be organized into four families, namely the CBP (OsHAC701, OsHAC703, and OsHAC704), TAFII250 (OsHAF701), GNAT (OsHAG702, OsHAG703, and OsHAG704), and MYST (OsHAM701) families. The biological functions of HATs in rice remain unknown, so a comprehensive protein sequence analysis of the HAT families was conducted to investigate their potential functions. In addition, the subcellular localization and expression patterns of the eight OsHATs were analyzed.

RESULTS

On the basis of a phylogenetic and domain analysis, monocotyledonous CBP family proteins can be subdivided into two groups, namely Group I and Group II. Similarly, dicotyledonous CBP family proteins can be divided into two groups, namely Group A and Group B. High similarities of protein sequences, conserved domains and three-dimensional models were identified among OsHATs and their homologs in Arabidopsis thaliana and maize. Subcellular localization predictions indicated that all OsHATs might localize in both the nucleus and cytosol. Transient expression in Arabidopsis protoplasts confirmed the nuclear and cytosolic localization of OsHAC701, OsHAG702, and OsHAG704. Real-time quantitative polymerase chain reaction analysis demonstrated that the eight OsHATs were expressed in all tissues examined with significant differences in transcript abundance, and their expression was modulated by abscisic acid and salicylic acid as well as abiotic factors such as salt, cold, and heat stresses.

CONCLUSIONS

Both monocotyledonous and dicotyledonous CBP family proteins can be divided into two distinct groups, which suggest the possibility of functional diversification. The high similarities of protein sequences, conserved domains and three-dimensional models among OsHATs and their homologs in Arabidopsis and maize suggested that OsHATs have multiple functions. OsHAC701, OsHAG702, and OsHAG704 were localized in both the nucleus and cytosol in transient expression analyses with Arabidopsis protoplasts. OsHATs were expressed constitutively in rice, and their expression was regulated by exogenous hormones and abiotic stresses, which suggested that OsHATs may play important roles in plant defense responses.

143

Sears KT, Ceraul SM, Gillespie JJ, Allen ED, Popov VL, Ammerman NC, Rahman MS, Azad AF. Surface proteome analysis and characterization of surface cell antigen (Sca) or autotransporter family of Rickettsia typhi. PLoS Pathog 2012;8:e1002856. [PMID: 22912578 PMCID: PMC3415449 DOI: 10.1371/journal.ppat.1002856] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2011] [Accepted: 06/26/2012] [Indexed: 11/20/2022] Open

Abstract

Surface proteins of the obligate intracellular bacterium Rickettsia typhi, the agent of murine or endemic typhus fever, comprise an important interface for host-pathogen interactions including adherence, invasion and survival in the host cytoplasm. In this report, we present analyses of the surface exposed proteins of R. typhi based on a suite of predictive algorithms complemented by experimental surface-labeling with thiol-cleavable sulfo-NHS-SS-biotin and identification of labeled peptides by LC MS/MS. Further, we focus on proteins belonging to the surface cell antigen (Sca) autotransporter (AT) family which are known to be involved in rickettsial infection of mammalian cells. Each species of Rickettsia has a different complement of sca genes in various states; R. typhi has genes sca1 through sca5. In silico analyses indicate divergence of the Sca paralogs across the four Rickettsia groups and concur with previous evidence of positive selection. Transcripts for each sca were detected during infection of L929 cells and four of the five Sca proteins were detected in the surface proteome analysis. We observed that each R. typhi Sca protein is expressed during in vitro infections and selected Sca proteins were expressed during in vivo infections. Using biotin-affinity pull down assays, negative staining electron microscopy, and flow cytometry, we demonstrate that the Sca proteins in R. typhi are localized to the surface of the bacteria. All Scas were detected during infection of L929 cells by immunogold electron microscopy. Immunofluorescence assays demonstrate that Scas 1–3 and 5 are expressed in the spleens of infected Sprague-Dawley rats and Scas 3, 4 and 5 are expressed in cat fleas (Ctenocephalides felis). Sca proteins may be crucial in the recognition and invasion of different host cell types.
In short, continuous expression of all Scas may ensure that rickettsiae are primed i) to infect mammalian cells should the flea bite a host, ii) to remain infectious when extracellular and iii) to infect the flea midgut when ingested with a blood meal. Each Sca protein may be important for survival of R. typhi and the lack of host restricted expression may indicate a strategy of preparedness for infection of a new host.

Rickettsia typhi, a member of the typhus group (TG) rickettsia, is the agent of murine or endemic typhus fever – a disease exhibiting mild to severe flu-like symptoms resulting in significant morbidity. It is maintained in a flea-rodent transmission cycle in urban and suburban environments. The obligate intracellular lifestyle of rickettsiae makes genetic manipulation difficult and impedes progress towards identification of virulence factors. All five Scas were detected on the surface of R. typhi using a combination of a biotin-labeled affinity assay, negative stain electron microscopy and flow cytometry. Sca proteins are members of the autotransporter (AT) family or type V secretion system (TVSS). We employed detailed bioinformatic analyses and evaluated their transcript abundance in an in vitro infection model where sca transcripts are detected at varying levels over the course of a 5-day in vitro infection. We also observe expression of selected Sca proteins during infection of fleas and rats. Our study provides a proteomic analysis of the bacterial surface and an initial characterization of the Sca family as it exists in R. typhi.

144

Renier S, Micheau P, Talon R, Hébraud M, Desvaux M. Subcellular localization of extracytoplasmic proteins in monoderm bacteria: rational secretomics-based strategy for genomic and proteomic analyses. PLoS One 2012;7:e42982. [PMID: 22912771 PMCID: PMC3415414 DOI: 10.1371/journal.pone.0042982] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2012] [Accepted: 07/13/2012] [Indexed: 11/20/2022] Open

145

Furtado C, Kunrath-Lima M, Rajão MA, Mendes IC, de Moura MB, Campos PC, Macedo AM, Franco GR, Pena SDJ, Teixeira SMR, Van Houten B, Machado CR. Functional characterization of 8-oxoguanine DNA glycosylase of Trypanosoma cruzi. PLoS One 2012;7:e42484. [PMID: 22876325 PMCID: PMC3411635 DOI: 10.1371/journal.pone.0042484] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2012] [Accepted: 07/06/2012] [Indexed: 11/18/2022] Open

Affiliation(s)

Carolina Furtado Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Marianna Kunrath-Lima Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Matheus Andrade Rajão Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Isabela Cecília Mendes Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Michelle Barbi de Moura Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine and the University of Pittsburgh Cancer Institute, Hillman Cancer Center, Pittsburgh, Pennsylvania, United States of America
Priscila Carneiro Campos Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Andrea Mara Macedo Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Glória Regina Franco Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Sérgio Danilo Junho Pena Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Santuza Maria Ribeiro Teixeira Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Bennett Van Houten Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine and the University of Pittsburgh Cancer Institute, Hillman Cancer Center, Pittsburgh, Pennsylvania, United States of America
Carlos Renato Machado Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

146

Schiller M, Massalski C, Kurth T, Steinebrunner I. The Arabidopsis apyrase AtAPY1 is localized in the Golgi instead of the extracellular space. BMC PLANT BIOLOGY 2012;12:123. [PMID: 22849572 PMCID: PMC3511161 DOI: 10.1186/1471-2229-12-123] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/30/2012] [Accepted: 07/09/2012] [Indexed: 05/04/2023]

Abstract

BACKGROUND

The two highly similar Arabidopsis apyrases AtAPY1 and AtAPY2 were previously shown to be involved in plant growth and development, evidently by regulating extracellular ATP signals. The subcellular localization of AtAPY1 was investigated to corroborate an extracellular function.

RESULTS

Transgenic Arabidopsis lines expressing AtAPY1 fused to the SNAP-(O(6)-alkylguanine-DNA alkyltransferase)-tag were used for indirect immunofluorescence and AtAPY1 was detected in punctate structures within the cell. The same signal pattern was found in seedlings stably overexpressing AtAPY1-GFP by indirect immunofluorescence and live imaging. In order to identify the nature of the AtAPY1-positive structures, AtAPY1-GFP expressing seedlings were treated with the endocytic marker stain FM4-64 (N-(3-triethylammoniumpropyl)-4-(p-diethylaminophenyl-hexatrienyl)-pyridinium dibromide) and crossed with a transgenic line expressing the trans-Golgi marker Rab E1d. Neither FM4-64 nor Rab E1d co-localized with AtAPY1. However, live imaging of transgenic Arabidopsis lines expressing AtAPY1-GFP and either the fluorescent protein-tagged Golgi marker Membrin 12, Syntaxin of plants 32 or Golgi transport 1 protein homolog showed co-localization. The Golgi localization was confirmed by immunogold labeling of AtAPY1-GFP. There was no indication of extracellular AtAPY1 by indirect immunofluorescence using antibodies against SNAP and GFP, live imaging of AtAPY1-GFP and immunogold labeling of AtAPY1-GFP. Activity assays with AtAPY1-GFP revealed GDP, UDP and IDP as substrates, but neither ATP nor ADP. To determine if AtAPY1 is a soluble or membrane protein, microsomal membranes were isolated and treated with various solubilizing agents. Only SDS and urea (not alkaline or high salt conditions) were able to release the AtAPY1 protein from microsomal membranes.

CONCLUSIONS

AtAPY1 is an integral Golgi protein with the substrate specificity typical for Golgi apyrases. It is therefore not likely to regulate extracellular nucleotide signals as previously thought. We propose instead that AtAPY1 exerts its growth and developmental effects by possibly regulating glycosylation reactions in the Golgi.

147

Predicted protein subcellular localization in dominant surface ocean bacterioplankton. Appl Environ Microbiol 2012;78:6550-7. [PMID: 22773648 DOI: 10.1128/aem.01406-12] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

148

Lin JR, Mondal AM, Liu R, Hu J. Minimalist ensemble algorithms for genome-wide protein localization prediction. BMC Bioinformatics 2012;13:157. [PMID: 22759391 PMCID: PMC3426488 DOI: 10.1186/1471-2105-13-157] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2011] [Accepted: 07/03/2012] [Indexed: 01/09/2023] Open

Abstract

BACKGROUND

Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms.

RESULTS

This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors.

CONCLUSIONS

We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
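The idea of a classifier-based ensemble, where a logistic regression model learns how much to trust each individual predictor, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' LR ensemble: it omits the feature-selection filter and the contribution scores described in the abstract, uses plain batch gradient descent, and every name, hyperparameter, and data value is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, predictor_scores):
    """Probability of the positive class given one score per base predictor."""
    z = bias + sum(w * s for w, s in zip(weights, predictor_scores))
    return sigmoid(z)


def fit_logistic_ensemble(score_rows, labels, learning_rate=0.5, epochs=2000):
    """Fit one weight per base predictor by batch gradient descent on log loss."""
    n_predictors = len(score_rows[0])
    weights = [0.0] * n_predictors
    bias = 0.0
    n = len(score_rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_predictors
        grad_b = 0.0
        for row, y in zip(score_rows, labels):
            error = predict_proba(weights, bias, row) - y
            for j, s in enumerate(row):
                grad_w[j] += error * s
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up scores: predictor 0 tracks the label, predictor 1 is close to noise.
rows = [[0.9, 0.5], [0.8, 0.4], [0.7, 0.6], [0.1, 0.5], [0.2, 0.6], [0.3, 0.4]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = fit_logistic_ensemble(rows, labels)
```

In this toy setting the learned weight for the informative predictor ends up larger than the weight for the noisy one, which is the behaviour that lets a classifier-based ensemble down-weight redundant members instead of letting them vote equally.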

149

He J, Gu H, Liu W. Imbalanced multi-modal multi-label learning for subcellular localization prediction of human proteins with both single and multiple sites. PLoS One 2012;7:e37155. [PMID: 22715364 PMCID: PMC3371015 DOI: 10.1371/journal.pone.0037155] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2011] [Accepted: 04/14/2012] [Indexed: 12/20/2022] Open

150

Wang X, Li GZ. A multi-label predictor for identifying the subcellular locations of singleplex and multiplex eukaryotic proteins. PLoS One 2012;7:e36317. [PMID: 22629314 PMCID: PMC3358325 DOI: 10.1371/journal.pone.0036317] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2011] [Accepted: 04/01/2012] [Indexed: 01/30/2023] Open

Abstract

Subcellular locations of proteins are important functional attributes. An effective and efficient subcellular localization predictor is necessary for rapidly and reliably annotating subcellular locations of proteins. Most existing subcellular localization methods are only used to deal with single-location proteins. Actually, proteins may simultaneously exist at, or move between, two or more different subcellular locations. To better reflect characteristics of multiplex proteins, it is highly desired to develop new methods for dealing with them. In this paper, a new predictor, called Euk-ECC-mPLoc, has been developed that can be used to deal with systems containing both singleplex and multiplex eukaryotic proteins; it introduces a powerful multi-label learning approach which exploits correlations between subcellular locations and hybridizes gene ontology with dipeptide composition information. It can be utilized to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centrosome, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Experimental results on a stringent benchmark dataset of eukaryotic proteins by jackknife cross validation test show that the average success rate and overall success rate obtained by Euk-ECC-mPLoc were 69.70% and 81.54%, respectively, indicating that our approach is quite promising. Particularly, the success rates achieved by Euk-ECC-mPLoc for small subsets were remarkably improved, indicating that it holds a high potential for stimulating the development of the area.
As a user-friendly web-server, Euk-ECC-mPLoc is freely accessible to the public at the website http://levis.tongji.edu.cn:8080/bioinfo/Euk-ECC-mPLoc/. We believe that Euk-ECC-mPLoc may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying subcellular locations of eukaryotic proteins.