Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Hua S, Sun Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics 2001;17:721-8. [PMID: 11524373 DOI: 10.1093/bioinformatics/17.8.721] [Citation(s) in RCA: 479] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

For:	Hua S, Sun Z. Support vector machine approach for protein subcellular localization prediction. Bioinformatics 2001;17:721-8. [PMID: 11524373 DOI: 10.1093/bioinformatics/17.8.721] [Citation(s) in RCA: 479] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Number

Cited by Other Article(s)

301

Hillson NJ, Hu P, Andersen GL, Shapiro L. Caulobacter crescentus as a whole-cell uranium biosensor. Appl Environ Microbiol 2007;73:7615-21. [PMID: 17905881 PMCID: PMC2168040 DOI: 10.1128/aem.01566-07] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

302

Bhasin M, Raghava GPS. A hybrid approach for predicting promiscuous MHC class I restricted T cell epitopes. J Biosci 2007;32:31-42. [PMID: 17426378 DOI: 10.1007/s12038-007-0004-5] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]

303

Rashid M, Saha S, Raghava GPS. Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs. BMC Bioinformatics 2007;8:337. [PMID: 17854501 PMCID: PMC2147037 DOI: 10.1186/1471-2105-8-337] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2007] [Accepted: 09/13/2007] [Indexed: 11/17/2022] Open

Abstract

Background

In past number of methods have been developed for predicting subcellular location of eukaryotic, prokaryotic (Gram-negative and Gram-positive bacteria) and human proteins but no method has been developed for mycobacterial proteins which may represent repertoire of potent immunogens of this dreaded pathogen. In this study, attempt has been made to develop method for predicting subcellular location of mycobacterial proteins.

Results

The models were trained and tested on 852 mycobacterial proteins and evaluated using five-fold cross-validation technique. First SVM (Support Vector Machine) model was developed using amino acid composition and overall accuracy of 82.51% was achieved with average accuracy (mean of class-wise accuracy) of 68.47%. In order to utilize evolutionary information, a SVM model was developed using PSSM (Position-Specific Scoring Matrix) profiles obtained from PSI-BLAST (Position-Specific Iterated BLAST) and overall accuracy achieved was of 86.62% with average accuracy of 73.71%. In addition, HMM (Hidden Markov Model), MEME/MAST (Multiple Em for Motif Elicitation/Motif Alignment and Search Tool) and hybrid model that combined two or more models were also developed. We achieved maximum overall accuracy of 86.8% with average accuracy of 89.00% using combination of PSSM based SVM model and MEME/MAST. Performance of our method was compared with that of the existing methods developed for predicting subcellular locations of Gram-positive bacterial proteins.

Conclusion

A highly accurate method has been developed for predicting subcellular location of mycobacterial proteins. This method also predicts very important class of proteins that is membrane-attached proteins. This method will be useful in annotating newly sequenced or hypothetical mycobacterial proteins. Based on above study, a freely accessible web server TBpred http://www.imtech.res.in/raghava/tbpred/ has been developed.

Collapse

304

Su ECY, Chiu HS, Lo A, Hwang JK, Sung TY, Hsu WL. Protein subcellular localization prediction based on compartment-specific features and structure conservation. BMC Bioinformatics 2007;8:330. [PMID: 17825110 PMCID: PMC2040162 DOI: 10.1186/1471-2105-8-330] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2007] [Accepted: 09/08/2007] [Indexed: 01/17/2023] Open

Abstract

Background

Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from the problem of low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins.

Results

We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machines (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers, in which biological features derived from Gram-negative bacteria translocation pathways are incorporated. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracy of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. In the assessment of the evaluation data sets, our method also attains accurate prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets of sequence identity less than 30%.

Conclusion

Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant improvement. The biological features are interpretable and can be applied in advanced analyses and experimental designs. Moreover, the overall accuracy of combining the structural homology approach is further improved, which suggests that structural conservation could be a useful indicator for inferring localization in addition to sequence homology. The proposed method can be used in large-scale analyses of proteomes.

Collapse

305

Wang Z, Jiang L, Li M, Sun L, Lin R. Fast Fourier transform-based support vector machine for subcellular localization prediction using different substitution models. Acta Biochim Biophys Sin (Shanghai) 2007;39:715-21. [PMID: 17805467 DOI: 10.1111/j.1745-7270.2007.00326.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open

306

Huang WL, Tung CW, Huang HL, Hwang SF, Ho SY. ProLoc: Prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features. Biosystems 2007;90:573-81. [PMID: 17291684 DOI: 10.1016/j.biosystems.2007.01.001] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2006] [Revised: 12/09/2006] [Accepted: 01/01/2007] [Indexed: 11/24/2022]

307

Chen YL, Li QZ. Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition. J Theor Biol 2007;248:377-81. [PMID: 17572445 DOI: 10.1016/j.jtbi.2007.05.019] [Citation(s) in RCA: 113] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2007] [Revised: 04/18/2007] [Accepted: 05/10/2007] [Indexed: 10/23/2022]

308

Tan F, Feng X, Fang Z, Li M, Guo Y, Jiang L. Prediction of mitochondrial proteins based on genetic algorithm - partial least squares and support vector machine. Amino Acids 2007;33:669-75. [PMID: 17701100 DOI: 10.1007/s00726-006-0465-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2006] [Accepted: 10/15/2006] [Indexed: 11/25/2022]

309

GO molecular function coding based protein subcellular localization prediction. CHINESE SCIENCE BULLETIN-CHINESE 2007. [DOI: 10.1007/s11434-007-0336-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]

310

Liu J, Kang S, Tang C, Ellis LB, Li T. Meta-prediction of protein subcellular localization with reduced voting. Nucleic Acids Res 2007;35:e96. [PMID: 17670799 PMCID: PMC1976432 DOI: 10.1093/nar/gkm562] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

311

Fungal CSL transcription factors. BMC Genomics 2007;8:233. [PMID: 17629904 PMCID: PMC1973085 DOI: 10.1186/1471-2164-8-233] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2007] [Accepted: 07/13/2007] [Indexed: 11/28/2022] Open

312

Quantitative assessment of relationship between sequence similarity and function similarity. BMC Genomics 2007;8:222. [PMID: 17620139 PMCID: PMC1949826 DOI: 10.1186/1471-2164-8-222] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/05/2006] [Accepted: 07/09/2007] [Indexed: 11/16/2022] Open

313

Thireou T, Reczko M. Bidirectional Long Short-Term Memory Networks for predicting the subcellular localization of eukaryotic proteins. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2007;4:441-446. [PMID: 17666763 DOI: 10.1109/tcbb.2007.1015] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2023]

314

Emanuelsson O, Brunak S, von Heijne G, Nielsen H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2007;2:953-71. [PMID: 17446895 DOI: 10.1038/nprot.2007.131] [Citation(s) in RCA: 2496] [Impact Index Per Article: 138.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

315

Zhou XB, Chen C, Li ZC, Zou XY. Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J Theor Biol 2007;248:546-51. [PMID: 17628605 DOI: 10.1016/j.jtbi.2007.06.001] [Citation(s) in RCA: 233] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2007] [Revised: 05/16/2007] [Accepted: 06/02/2007] [Indexed: 10/23/2022]

316

Lu CH, Chen YC, Yu CS, Hwang JK. Predicting disulfide connectivity patterns. Proteins 2007;67:262-70. [PMID: 17285623 DOI: 10.1002/prot.21309] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Abstract

Disulfide bonds play an important role in stabilizing protein structure and regulating protein function. Therefore, the ability to infer disulfide connectivity from protein sequences will be valuable in structural modeling and functional analysis. However, to predict disulfide connectivity directly from sequences presents a challenge to computational biologists due to the nonlocal nature of disulfide bonds, i.e., the close spatial proximity of the cysteine pair that forms the disulfide bond does not necessarily imply the short sequence separation of the cysteine residues. Recently, Chen and Hwang (Proteins 2005;61:507-512) treated this problem as a multiple class classification by defining each distinct disulfide pattern as a class. They used multiple support vector machines based on a variety of sequence features to predict the disulfide patterns. Their results compare favorably with those in the literature for a benchmark dataset sharing less than 30% sequence identity. However, since the number of disulfide patterns grows rapidly when the number of disulfide bonds increases, their method performs unsatisfactorily for the cases of large number of disulfide bonds. In this work, we propose a novel method to represent disulfide connectivity in terms of cysteine pairs, instead of disulfide patterns. Since the number of bonding states of the cysteine pairs is independent of that of disulfide bonds, the problem of class explosion is avoided. The bonding states of the cysteine pairs are predicted using the support vector machines together with the genetic algorithm optimization for feature selection. The complete disulfide patterns are then determined from the connectivity matrices that are constructed from the predicted bonding states of the cysteine pairs. Our approach outperforms the current approaches in the literature.

Collapse

317

Jiang P, Wu H, Wei J, Sang F, Sun X, Lu Z. RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features. Nucleic Acids Res 2007;35:W47-51. [PMID: 17478517 PMCID: PMC1933199 DOI: 10.1093/nar/gkm217] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

318

Guo J, Pu X, Lin Y, Leung H. Protein subcellular localization based on PSI-BLAST and machine learning. J Bioinform Comput Biol 2007;4:1181-95. [PMID: 17245809 DOI: 10.1142/s0219720006002405] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2006] [Revised: 08/02/2006] [Accepted: 08/02/2006] [Indexed: 11/18/2022]

319

Jia P, Qian Z, Zeng Z, Cai Y, Li Y. Prediction of subcellular protein localization based on functional domain composition. Biochem Biophys Res Commun 2007;357:366-70. [PMID: 17428441 DOI: 10.1016/j.bbrc.2007.03.139] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2007] [Accepted: 03/21/2007] [Indexed: 11/19/2022]

320

Oğul H, Mumcuoğu EU. Subcellular localization prediction with new protein encoding schemes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2007;4:227-32. [PMID: 17473316 DOI: 10.1109/tcbb.2007.070209] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]

321

Ensemblator: An ensemble of classifiers for reliable classification of biological data. Pattern Recognit Lett 2007. [DOI: 10.1016/j.patrec.2006.10.012] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

322

Holloway DT, Kon M, DeLisi C. Machine learning for regulatory analysis and transcription factor target prediction in yeast. SYSTEMS AND SYNTHETIC BIOLOGY 2007;1:25-46. [PMID: 19003435 PMCID: PMC2533145 DOI: 10.1007/s11693-006-9003-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]

Abstract

High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps-the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73 vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active. We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data.

Collapse

323

Klee EW, Sosa CP. Computational classification of classically secreted proteins. Drug Discov Today 2007;12:234-40. [PMID: 17331888 DOI: 10.1016/j.drudis.2007.01.008] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2006] [Revised: 01/05/2007] [Accepted: 01/23/2007] [Indexed: 11/21/2022]

324

Raghuraj R, Lakshminarayanan S. VPMCD: Variable interaction modeling approach for class discrimination in biological systems. FEBS Lett 2007;581:826-30. [PMID: 17289035 DOI: 10.1016/j.febslet.2007.01.052] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2006] [Revised: 01/19/2007] [Accepted: 01/25/2007] [Indexed: 11/20/2022]

325

Carrie C, Murcha MW, Millar AH, Smith SM, Whelan J. Nine 3-ketoacyl-CoA thiolases (KATs) and acetoacetyl-CoA thiolases (ACATs) encoded by five genes in Arabidopsis thaliana are targeted either to peroxisomes or cytosol but not to mitochondria. PLANT MOLECULAR BIOLOGY 2007;63:97-108. [PMID: 17120136 DOI: 10.1007/s11103-006-9075-1] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/25/2006] [Accepted: 08/10/2006] [Indexed: 05/12/2023]

326

Mukai Y, Hirokawa T, Tomii K, Asai K, Akiyama Y, Suwa M. Identification of Glycosyltransferases Focusing on Golgi Transmembrane Region. TRENDS GLYCOSCI GLYC 2007. [DOI: 10.4052/tigg.19.41] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

327

González-Díaz H, Pérez-Castillo Y, Podda G, Uriarte E. Computational chemistry comparison of stable/nonstable protein mutants classification models based on 3D and topological indices. J Comput Chem 2007;28:1990-5. [PMID: 17450569 DOI: 10.1002/jcc.20700] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

328

Nakai K, Horton P. Computational prediction of subcellular localization. Methods Mol Biol 2007;390:429-66. [PMID: 17951705 DOI: 10.1007/978-1-59745-466-7_29] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

329

González-Díaz H, Uriarte E. Biopolymer stochastic moments. I. Modeling human rhinovirus cellular recognition with protein surface electrostatic moments. Biopolymers 2006;77:296-303. [PMID: 15648087 DOI: 10.1002/bip.20234] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

330

Bodył A, Mackiewicz P. Analysis of the targeting sequences of an iron-containing superoxide dismutase (SOD) of the dinoflagellate Lingulodinium polyedrum suggests function in multiple cellular compartments. Arch Microbiol 2006;187:281-96. [PMID: 17143625 DOI: 10.1007/s00203-006-0194-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2006] [Accepted: 11/06/2006] [Indexed: 01/19/2023]

331

Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinformatics 2006;7:518. [PMID: 17134515 PMCID: PMC1716183 DOI: 10.1186/1471-2105-7-518] [Citation(s) in RCA: 137] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2006] [Accepted: 11/30/2006] [Indexed: 11/10/2022] Open

332

Machine learning techniques in disease forecasting: a case study on rice blast prediction. BMC Bioinformatics 2006;7:485. [PMID: 17083731 PMCID: PMC1647291 DOI: 10.1186/1471-2105-7-485] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2006] [Accepted: 11/03/2006] [Indexed: 11/22/2022] Open

Abstract

Background

Diverse modeling approaches viz. neural networks and multiple regression have been followed to date for disease prediction in plant populations. However, due to their inability to predict value of unknown data points and longer training times, there is need for exploiting new prediction softwares for better understanding of plant-pathogen-environment relationships. Further, there is no online tool available which can help the plant researchers or farmers in timely application of control measures. This paper introduces a new prediction approach based on support vector machines for developing weather-based prediction models of plant diseases.

Results

Six significant weather variables were selected as predictor variables. Two series of models (cross-location and cross-year) were developed and validated using a five-fold cross validation procedure. For cross-year models, the conventional multiple regression (REG) approach achieved an average correlation coefficient (r) of 0.50, which increased to 0.60 and percent mean absolute error (%MAE) decreased from 65.42 to 52.24 when back-propagation neural network (BPNN) was used. With generalized regression neural network (GRNN), the r increased to 0.70 and %MAE also improved to 46.30, which further increased to r = 0.77 and %MAE = 36.66 when support vector machine (SVM) based method was used. Similarly, cross-location validation achieved r = 0.48, 0.56 and 0.66 using REG, BPNN and GRNN respectively, with their corresponding %MAE as 77.54, 66.11 and 58.26. The SVM-based method outperformed all the three approaches by further increasing r to 0.74 with improvement in %MAE to 44.12. Overall, this SVM-based prediction approach will open new vistas in the area of forecasting plant diseases of various crops.

Conclusion

Our case study demonstrated that SVM is better than existing machine learning techniques and conventional REG approaches in forecasting plant diseases. In this direction, we have also developed a SVM-based web server for rice blast prediction, a first of its kind worldwide, which can help the plant science community and farmers in their decision making process. The server is freely available at .

Collapse

333

Huang WL, Chen HM, Hwang SF, Ho SY. Accurate prediction of enzyme subfamily class using an adaptive fuzzy k-nearest neighbor method. Biosystems 2006;90:405-13. [PMID: 17140725 DOI: 10.1016/j.biosystems.2006.10.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2006] [Revised: 10/15/2006] [Accepted: 10/22/2006] [Indexed: 10/24/2022]

334

Zhang ZH, Wang ZH, Zhang ZR, Wang YX. A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine. FEBS Lett 2006;580:6169-74. [PMID: 17069811 DOI: 10.1016/j.febslet.2006.10.017] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2006] [Revised: 09/25/2006] [Accepted: 10/06/2006] [Indexed: 11/25/2022]

335

Yu CS, Chen YC, Lu CH, Hwang JK. Prediction of protein subcellular localization. Proteins 2006;64:643-51. [PMID: 16752418 DOI: 10.1002/prot.21018] [Citation(s) in RCA: 1120] [Impact Index Per Article: 58.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Abstract

Because the protein's function is usually related to its subcellular localization, the ability to predict subcellular localization directly from protein sequences will be useful for inferring protein functions. Recent years have seen a surging interest in the development of novel computational tools to predict subcellular localization. At present, these approaches, based on a wide range of algorithms, have achieved varying degrees of success for specific organisms and for certain localization categories. A number of authors have noticed that sequence similarity is useful in predicting subcellular localization. For example, Nair and Rost (Protein Sci 2002;11:2836-2847) have carried out extensive analysis of the relation between sequence similarity and identity in subcellular localization, and have found a close relationship between them above a certain similarity threshold. However, many existing benchmark data sets used for the prediction accuracy assessment contain highly homologous sequences-some data sets comprising sequences up to 80-90% sequence identity. Using these benchmark test data will surely lead to overestimation of the performance of the methods considered. Here, we develop an approach based on a two-level support vector machine (SVM) system: the first level comprises a number of SVM classifiers, each based on a specific type of feature vectors derived from sequences; the second level SVM classifier functions as the jury machine to generate the probability distribution of decisions for possible localizations. We compare our approach with a global sequence alignment approach and other existing approaches for two benchmark data sets-one comprising prokaryotic sequences and the other eukaryotic sequences. Furthermore, we carried out all-against-all sequence alignment for several data sets to investigate the relationship between sequence homology and subcellular localization. Our results, which are consistent with previous studies, indicate that the homology search approach performs well down to 30% sequence identity, although its performance deteriorates considerably for sequences sharing lower sequence identity. A data set of high homology levels will undoubtedly lead to biased assessment of the performances of the predictive approaches-especially those relying on homology search or sequence annotations. Our two-level classification system based on SVM does not rely on homology search; therefore, its performance remains relatively unaffected by sequence homology. When compared with other approaches, our approach performed significantly better. Furthermore, we also develop a practical hybrid method, which combines the two-level SVM classifier and the homology search method, as a general tool for the sequence annotation of subcellular localization.

Collapse

336

Song J, Burrage K. Predicting residue-wise contact orders in proteins by support vector regression. BMC Bioinformatics 2006;7:425. [PMID: 17014735 PMCID: PMC1618864 DOI: 10.1186/1471-2105-7-425] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2006] [Accepted: 10/03/2006] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships.

RESULTS

We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods.

CONCLUSION

The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.

Collapse

337

Chou KC, Shen HB. Predicting protein subcellular location by fusing multiple classifiers. J Cell Biochem 2006;99:517-27. [PMID: 16639720 DOI: 10.1002/jcb.20879] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Abstract

One of the fundamental goals in cell biology and proteomics is to identify the functions of proteins in the context of compartments that organize them in the cellular environment. Knowledge of subcellular locations of proteins can provide key hints for revealing their functions and understanding how they interact with each other in cellular networking. Unfortunately, it is both time-consuming and expensive to determine the localization of an uncharacterized protein in a living cell purely based on experiments. With the avalanche of newly found protein sequences emerging in the post genomic era, we are facing a critical challenge, that is, how to develop an automated method to fast and reliably identify their subcellular locations so as to be able to timely use them for basic research and drug discovery. In view of this, an ensemble classifier was developed by the approach of fusing many basic individual classifiers through a voting system. Each of these basic classifiers was trained in a different dimension of the amphiphilic pseudo amino acid composition (Chou [2005] Bioinformatics 21: 10-19). As a demonstration, predictions were performed with the fusion classifier for proteins among the following 14 localizations: (1) cell wall, (2) centriole, (3) chloroplast, (4) cytoplasm, (5) cytoskeleton, (6) endoplasmic reticulum, (7) extracellular, (8) Golgi apparatus, (9) lysosome, (10) mitochondria, (11) nucleus, (12) peroxisome, (13) plasma membrane, and (14) vacuole. The overall success rates thus obtained via the resubstitution test, jackknife test, and independent dataset test were all significantly higher than those by the existing classifiers. It is anticipated that the novel ensemble classifier may also become a very useful vehicle in classifying other attributes of proteins according to their sequences, such as membrane protein type, enzyme family/sub-family, G-protein coupled receptor (GPCR) type, and structural class, among many others. The fusion ensemble classifier will be available at www.pami.sjtu.edu.cn/people/hbshen.

Collapse

338

Zhang T, Ding Y, Chou KC. Prediction of protein subcellular location using hydrophobic patterns of amino acid sequence. Comput Biol Chem 2006;30:367-71. [PMID: 16963318 DOI: 10.1016/j.compbiolchem.2006.08.003] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2006] [Accepted: 08/03/2006] [Indexed: 11/17/2022]

339

Wang Y, Xue Z, Xu J. Better prediction of the location of alpha-turns in proteins with support vector machine. Proteins 2006;65:49-54. [PMID: 16894602 DOI: 10.1002/prot.21062] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

340

Gardy JL, Brinkman FSL. Methods for predicting bacterial protein subcellular localization. Nat Rev Microbiol 2006;4:741-51. [PMID: 16964270 DOI: 10.1038/nrmicro1494] [Citation(s) in RCA: 120] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]

341

Lee K, Kim DW, Na D, Lee KH, Lee D. PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res 2006;34:4655-66. [PMID: 16966337 PMCID: PMC1636404 DOI: 10.1093/nar/gkl638] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

342

Guo J, Lin Y, Liu X. GNBSL: A new integrative system to predict the subcellular location for Gram-negative bacteria proteins. Proteomics 2006;6:5099-105. [PMID: 16955516 DOI: 10.1002/pmic.200600064] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

343

Hao Z, Li X, Qiao T, Du R, Zhang G, Fan D. Subcellular localization of CIAPIN1. J Histochem Cytochem 2006;54:1437-44. [PMID: 16957168 PMCID: PMC3958118 DOI: 10.1369/jhc.6a6960.2006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open

344

Azizi AA, Gelpi E, Yang JW, Rupp B, Godwin AK, Slater C, Slavc I, Lubec G. Mass spectrometric identification of serine hydrolase OVCA2 in the medulloblastoma cell line DAOY. Cancer Lett 2006;241:235-49. [PMID: 16368187 DOI: 10.1016/j.canlet.2005.10.023] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2005] [Revised: 10/14/2005] [Accepted: 10/17/2005] [Indexed: 11/18/2022]

345

Wang Y, Xue ZD, Shi XH, Xu J. Prediction of π-turns in proteins using PSI-BLAST profiles and secondary structure information. Biochem Biophys Res Commun 2006;347:574-80. [PMID: 16844090 DOI: 10.1016/j.bbrc.2006.06.066] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2006] [Accepted: 06/14/2006] [Indexed: 11/28/2022]

346

Guda C. pTARGET: a web server for predicting protein subcellular localization. Nucleic Acids Res 2006;34:W210-3. [PMID: 16844995 PMCID: PMC1538910 DOI: 10.1093/nar/gkl093] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open

347

Haveman SA, Holmes DE, Ding YHR, Ward JE, Didonato RJ, Lovley DR. c-Type cytochromes in Pelobacter carbinolicus. Appl Environ Microbiol 2006;72:6980-5. [PMID: 16936056 PMCID: PMC1636167 DOI: 10.1128/aem.01128-06] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

348

Ofran Y, Margalit H. Proteins of the same fold and unrelated sequences have similar amino acid composition. Proteins 2006;64:275-9. [PMID: 16565950 DOI: 10.1002/prot.20964] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

349

Moguerza JM, Muñoz A. Support Vector Machines with Applications. Stat Sci 2006. [DOI: 10.1214/088342306000000493] [Citation(s) in RCA: 153] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]

350

Cruz-Monteagudo M, González-Díaz H, Uriarte E. Simple Stochastic Fingerprints Towards Mathematical Modeling in Biology and Medicine 2. Unifying Markov Model for Drugs Side Effects. Bull Math Biol 2006;68:1527-54. [PMID: 16847720 DOI: 10.1007/s11538-005-9013-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2005] [Accepted: 05/09/2005] [Indexed: 10/24/2022]

Abstract

Most of present mathematical models for biological activity consider just the molecular structure. In the present article we pretend extending the use of Markov chain models to define novel molecular descriptors, which consider in addition other parameters like target site or biological effect. Specifically, this mathematical model takes into consideration not only the molecular structure but the specific biological system the drug affects too. Herein, a general Markov model is developed that describes 19 different drugs side effects grouped in eight affected biological systems for 178 drugs, being 270 cases finally. The data was processed by linear discriminant analysis (LDA) classifying drugs according to their specific side effects, forward stepwise was fixed as strategy for variables selection. The average percentage of good classification and number of compounds used in the training/predicting sets were 100/95.8% for endocrine manifestations, (18 out of 18)/(13 out of 14); 90.5/92.3% for gastrointestinal manifestations, (38 out of 42)/(30 out of 32); 88.5/86.5% for systemic phenomena, (23 out of 26)/(17 out of 20); 81.8/77.3% for neurological manifestations, (27 out of 33)/(19 out of 25); 81.6/86.2% for dermal manifestations, (31 out of 38)/(25 out of 29); 78.4/85.1% for cardiovascular manifestation, (29 out of 37)/(24 out of 28); 77.1/75.7% for breathing manifestations, (27 out of 35)/(20 out of 26) and 75.6/75% for psychiatric manifestations, (31 out of 41)/(23 out of 31). Additionally a back-projection analysis (BPA) was carried out for two ulcerogenic drugs to prove in structural terms the physical interpretation of the models obtained. This article develops a mathematical model that encompasses a large number of drugs side effects grouped in specifics biological systems using stochastic absolute probabilities of interaction ((A)pi(k)(j)) by the first time.

Collapse