201
|
Sun C, Zhao XM, Tang W, Chen L. FGsub: Fusarium graminearum protein subcellular localizations predicted from primary structures. BMC Systems Biology 2010; 4 Suppl 2:S12. [PMID: 20840726 PMCID: PMC2982686 DOI: 10.1186/1752-0509-4-s2-s12] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/30/2023]
Abstract
Background The fungal pathogen Fusarium graminearum (teleomorph Gibberella zeae) is the causal agent of several destructive crop diseases, in which sets of genes usually work in concert to cause disease. To function appropriately, the F. graminearum proteins inside one cell should be assigned to different compartments, i.e. subcellular localizations. Therefore, the subcellular localizations of F. graminearum proteins can provide insights into protein functions and pathogenic mechanisms of this destructive pathogenic fungus. Unfortunately, no subcellular localization information is currently available for F. graminearum proteins. Because laboratory experiments are expensive and time-consuming, computational approaches provide an alternative way to predict F. graminearum protein subcellular localizations. Results In this paper, we developed a novel predictor, namely FGsub, to predict F. graminearum protein subcellular localizations from the primary structures. First, a non-redundant fungal data set with subcellular localization annotation is collected from the UniProtKB database and used as the training set, where the subcellular locations are classified into 10 groups. Subsequently, a support vector machine (SVM) is trained on the training set and used to predict subcellular localizations for those F. graminearum proteins that do not have significant sequence similarity to proteins in the training set. The performance of the SVMs on the training set with 10-fold cross-validation demonstrates the efficiency and effectiveness of the proposed method. In addition, for F. graminearum proteins that have significant sequence similarity to proteins in the training set, BLAST is used to transfer annotations of homologous proteins to uncharacterized F. graminearum proteins so that the F. graminearum proteins are annotated more comprehensively. Conclusions In this work, we present FGsub to predict F. graminearum protein subcellular localizations in a comprehensive manner.
We make four contributions to this field. First, we present a new algorithm to cope with the imbalance problem that arises in protein subcellular localization prediction and to avoid false positive results. Second, we design an ensemble classifier which employs feature selection to further improve prediction accuracy. Third, we use BLAST to complement machine learning based methods, which enlarges our prediction coverage. Last and most important, we predict the subcellular localizations of 12786 F. graminearum proteins, which provide insights into protein functions and pathogenic mechanisms of this destructive pathogenic fungus.
Affiliation(s)
- Chenglei Sun
- Institute of Systems Biology, Shanghai University, Shanghai, China.
|
202
|
Kaundal R, Saini R, Zhao PX. Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis. Plant Physiology 2010; 154:36-54. [PMID: 20647376 PMCID: PMC2938157 DOI: 10.1104/pp.110.156851] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.4] [Received: 03/25/2010] [Accepted: 07/13/2010] [Indexed: 05/20/2023]
Abstract
A complete map of the Arabidopsis (Arabidopsis thaliana) proteome is clearly a major goal for the plant research community in terms of determining the function and regulation of each encoded protein. Developing genome-wide prediction tools such as for localizing gene products at the subcellular level will substantially advance Arabidopsis gene annotation. To this end, we performed a comprehensive study in Arabidopsis and created an integrative support vector machine-based localization predictor called AtSubP (for Arabidopsis subcellular localization predictor) that is based on the combinatorial presence of diverse protein features, such as its amino acid composition, sequence-order effects, terminal information, Position-Specific Scoring Matrix, and similarity search-based Position-Specific Iterated-Basic Local Alignment Search Tool information. When used to predict seven subcellular compartments through a 5-fold cross-validation test, our hybrid-based best classifier achieved an overall sensitivity of 91% with high-confidence precision and Matthews correlation coefficient values of 90.9% and 0.89, respectively. Benchmarking AtSubP on two independent data sets, one from Swiss-Prot and another containing green fluorescent protein- and mass spectrometry-determined proteins, showed a significant improvement in the prediction accuracy of species-specific AtSubP over some widely used "general" tools such as TargetP, LOCtree, PA-SUB, MultiLoc, WoLF PSORT, Plant-PLoc, and our newly created All-Plant method. Cross-comparison of AtSubP on six nontrained eukaryotic organisms (rice [Oryza sativa], soybean [Glycine max], human [Homo sapiens], yeast [Saccharomyces cerevisiae], fruit fly [Drosophila melanogaster], and worm [Caenorhabditis elegans]) revealed inferior predictions. AtSubP significantly outperformed all the prediction tools being currently used for Arabidopsis proteome annotation and, therefore, may serve as a better complement for the plant research community. 
A supplemental Web site that hosts all the training/testing data sets and whole proteome predictions is available at http://bioinfo3.noble.org/AtSubP/.
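The sensitivity, precision, and Matthews correlation coefficient quoted in the AtSubP abstract above are standard confusion-matrix statistics. The following is a minimal sketch for a single compartment treated one-vs-rest; the function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, precision, MCC) from confusion-matrix counts.

    tp/fp/tn/fn are counts for one compartment scored against all others.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, precision, mcc


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, prec, mcc = binary_metrics(tp=90, fp=10, tn=90, fn=10)
print(f"sensitivity={sens:.2f} precision={prec:.2f} MCC={mcc:.2f}")
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside sensitivity when compartment sizes are unequal.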
|
203
|
Zahiri A, Heimel K, Wahl R, Rath M, Kämper J. The Ustilago maydis forkhead transcription factor Fox1 is involved in the regulation of genes required for the attenuation of plant defenses during pathogenic development. Molecular Plant-Microbe Interactions: MPMI 2010; 23:1118-29. [PMID: 20687802 DOI: 10.1094/mpmi-23-9-1118] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 05/13/2023]
Abstract
Ustilago maydis is a plant-pathogenic fungus that establishes a biotrophic relationship with its host plant, Zea mays. The pathogenic stage of U. maydis is initiated by the fusion of two haploid cells, resulting in the formation of a dikaryotic hypha that invades the plant cell. The switch from saprophytic, yeast-like cells to the biotrophic hyphae requires the complex regulation of a multitude of biological processes to constitute the compatible host-fungus interaction. Transcriptional regulators involved in the establishment of the infectious dikaryon and penetration of the host tissue have been identified; however, regulators required during the post-penetration stages remained to be elucidated. In this study, we report the identification of a U. maydis forkhead transcription factor, Fox1, which is exclusively expressed during biotrophic development. Deletion of fox1 results in reduced virulence and impaired tumor development. The Δfox1 hyphae induce the accumulation of H2O2 in and around infected cells and a maize defense response phenotypically represented by the encasement of proliferating hyphae in a cellulose-containing matrix. The phenotype can be attributed to the fox1-dependent deregulation of several effector genes that are linked to pathogenic development and host defense suppression.
Affiliation(s)
- Alexander Zahiri
- Karlsruhe Institute of Technology, Institute for Applied Biosciences, Department of Genetics, D-76187 Karlsruhe, Germany
|
204
|
Yu L, Guo Y, Li Y, Li G, Li M, Luo J, Xiong W, Qin W. SecretP: identifying bacterial secreted proteins by fusing new features into Chou's pseudo-amino acid composition. J Theor Biol 2010; 267:1-6. [PMID: 20691704 DOI: 10.1016/j.jtbi.2010.08.001] [Citation(s) in RCA: 98] [Impact Index Per Article: 6.5] [Received: 05/31/2010] [Revised: 07/30/2010] [Accepted: 08/01/2010] [Indexed: 11/17/2022]
Abstract
Protein secretion plays an important role in bacterial lifestyles. Secreted proteins are crucial for bacterial pathogenesis because they let bacteria interact with their environments, in particular by delivering pathogenic and symbiotic bacteria into their eukaryotic hosts. Therefore, identification of bacterial secreted proteins is an important step in the study of various diseases and the corresponding drugs. In this paper, by fusing several new features into Chou's pseudo-amino acid composition (PseAAC), two support vector machine (SVM)-based ternary classifiers are developed to predict secreted proteins of Gram-negative and Gram-positive bacteria. For the two types of bacteria, high accuracies of 94.03% and 94.36% are obtained in distinguishing classically secreted, non-classically secreted and non-secreted proteins by our method. To compare the practical ability of our method in identifying bacterial secreted proteins with that of six published methods, proteins in Escherichia coli and Bacillus subtilis are collected to construct the test sets of Gram-negative and Gram-positive bacteria, and the prediction results of our method are comparable to those of existing methods. When applied to two public independent data sets for predicting non-classically secreted proteins (NCSPs), it also yields satisfactory results for Gram-negative bacterial proteins. The prediction server SecretP can be accessed at http://cic.scu.edu.cn/bioinformatics/secretPV2/index.htm.
Affiliation(s)
- Lezheng Yu
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
|
205
|
Zhang X, Cui J, Nilsson D, Gunasekera K, Chanfon A, Song X, Wang H, Xu Y, Ochsenreiter T. The Trypanosoma brucei MitoCarta and its regulation and splicing pattern during development. Nucleic Acids Res 2010; 38:7378-87. [PMID: 20660476 PMCID: PMC2995047 DOI: 10.1093/nar/gkq618] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 01/25/2023]
Abstract
It has long been known that trypanosomes regulate mitochondrial biogenesis during the life cycle of the parasite; however, the mitochondrial protein inventory (MitoCarta) and its regulation remain unknown. We present a novel computational method for genome-wide prediction of mitochondrial proteins using a support vector machine-based classifier with ∼90% prediction accuracy. Using this method, we predicted the mitochondrial localization of 468 proteins with high confidence and have experimentally verified the localization of a subset of these proteins. We then applied a recently developed parallel sequencing technology to determine the expression profiles and the splicing patterns of a total of 1065 predicted MitoCarta transcripts during the development of the parasite, and showed that 435 of the transcripts significantly changed their expression while 630 remained unchanged in all three life stages analyzed. Furthermore, we identified 298 alternative splicing events, a small subset of which could lead to dual localization of the corresponding proteins.
Affiliation(s)
- Xiaobai Zhang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016 China
|
206
|
Yang Y, Lu BL. Protein subcellular multi-localization prediction using a min-max modular support vector machine. Int J Neural Syst 2010; 20:13-28. [PMID: 20180250 DOI: 10.1142/s0129065710002206] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 02/04/2023]
Abstract
Prediction of protein subcellular localization is an important issue in computational biology because it provides important clues for the characterization of protein functions. Currently, much research has been dedicated to developing automatic prediction tools. Most, however, focus on mono-locational proteins, i.e., they assume that proteins exist in only one location. It should be noted that many proteins bear multi-locational characteristics and carry out crucial functions in biological processes. This work aims to develop a general pattern classifier for predicting multiple subcellular locations of proteins. We use an ensemble classifier, called the min-max modular support vector machine (M³-SVM), to solve protein subcellular multi-localization problems, and propose a module decomposition method based on gene ontology (GO) semantic information for M³-SVM. The amino acid composition with secondary structure and solvent accessibility information is adopted to represent features of protein sequences. We apply our method to two multi-locational protein data sets. The M³-SVMs show higher accuracy and efficiency than traditional SVMs using the same feature vectors. The GO decomposition also helps to improve prediction accuracy. Moreover, our method has a much higher rate of accuracy than existing subcellular localization predictors in predicting protein multi-localization.
Affiliation(s)
- Yang Yang
- Department of Computer Science and Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai, 201306, China.
|
207
|
A time-series-based feature extraction approach for prediction of protein structural class. EURASIP Journal on Bioinformatics & Systems Biology 2010:235451. [PMID: 18464911 PMCID: PMC3171390 DOI: 10.1155/2008/235451] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 05/28/2007] [Revised: 11/21/2007] [Accepted: 03/10/2008] [Indexed: 11/17/2022]
Abstract
This paper presents a novel feature vector based on physicochemical properties of amino acids for predicting protein structural classes. The proposed method is divided into three stages. First, a discrete time series representation of protein sequences using a physicochemical scale is provided. Next, a wavelet-based time-series technique is proposed for extracting features from the mapped amino acid sequence, and a fixed-length feature vector for classification is constructed. The proposed feature space summarizes the variance information of ten different biological properties of amino acids. Finally, an optimized support vector machine model is constructed for prediction of each protein structural class. The proposed approach is evaluated using leave-one-out cross-validation tests on two standard datasets. Comparison of our results with existing approaches shows that the overall accuracy achieved by our approach is better than that of existing methods.
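The first two stages described in this abstract (mapping a sequence to a numeric series, then summarizing wavelet coefficients by their variance) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the ten properties or the wavelet family, so the sketch uses a single property (the Kyte-Doolittle hydropathy scale) and a plain Haar transform, and all function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values; chosen here only as an example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_series(sequence, scale=KYTE_DOOLITTLE):
    """Map residues to numbers, skipping symbols missing from the scale."""
    return [scale[aa] for aa in sequence.upper() if aa in scale]


def haar_step(series):
    """One Haar level: return (approximation, detail) coefficients."""
    root2 = math.sqrt(2)
    pairs = range(0, len(series) - 1, 2)  # a trailing odd element is dropped
    approx = [(series[i] + series[i + 1]) / root2 for i in pairs]
    detail = [(series[i] - series[i + 1]) / root2 for i in pairs]
    return approx, detail


def variance(values):
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def wavelet_variance_features(sequence, levels=3):
    """Fixed-length vector: variance of detail coefficients per level."""
    approx = to_series(sequence)
    features = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(variance(detail))
    return features


print(wavelet_variance_features("MKVLAAGIVGLLLAQPSS"))
```

The vector length depends only on `levels`, not on the sequence length, which is what lets proteins of different sizes share one classifier input format. Repeating the computation for several property scales and concatenating the results would give a longer vector in the same spirit.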
|
208
|
Lapins M, Wikberg JE. Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques. BMC Bioinformatics 2010; 11:339. [PMID: 20569422 PMCID: PMC2910025 DOI: 10.1186/1471-2105-11-339] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 11/17/2009] [Accepted: 06/22/2010] [Indexed: 01/03/2023]
Abstract
BACKGROUND Protein kinases play crucial roles in cell growth, differentiation, and apoptosis. Abnormal function of protein kinases can lead to many serious diseases, such as cancer. Kinase inhibitors have potential for treatment of these diseases. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor-kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents that show better efficacy and lower toxicity. RESULTS We applied proteochemometric modelling to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors (12,046 inhibitor-kinase combinations) to the respective combination's interaction dissociation constant (Kd). We compared six approaches for description of protein kinases and several linear and non-linear correlation methods. The best performing models encoded kinase sequences with amino acid physico-chemical z-scale descriptors and used support vector machines or partial least-squares projections to latent structures for the correlations. Modelling performance was estimated by double cross-validation. The best models showed high predictive ability; the squared correlation coefficient for new kinase-inhibitor pairs ranged over P2 = 0.67-0.73, and for new kinases it ranged over P2kin = 0.65-0.70. Models could also separate interacting from non-interacting inhibitor-kinase pairs with high sensitivity and specificity; the areas under the ROC curves ranged over AUC = 0.92-0.93. We also investigated the relationship between the number of protein kinases in the dataset and the modelling results. Using only 10% of all data, a valid model was still obtained, with P2 = 0.47, P2kin = 0.42 and AUC = 0.83. CONCLUSIONS Our results strongly support the applicability of proteochemometrics for kinome-wide interaction modelling.
Proteochemometrics might be used to speed up identification and optimization of protein kinase targeted and multi-targeted inhibitors.
Affiliation(s)
- Maris Lapins
- Department of Pharmaceutical Pharmacology, Uppsala University, Sweden
|
209
|
Li Z, Zhou X, Dai Z, Zou X. Classification of G-protein coupled receptors based on support vector machine with maximum relevance minimum redundancy and genetic algorithm. BMC Bioinformatics 2010; 11:325. [PMID: 20550715 PMCID: PMC2905366 DOI: 10.1186/1471-2105-11-325] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 12/18/2009] [Accepted: 06/16/2010] [Indexed: 11/25/2022]
Abstract
Background Because a priori knowledge about the function of G protein-coupled receptors (GPCRs) can provide useful information to pharmaceutical research, the determination of their function is a meaningful topic in protein science. However, with the rapid increase of GPCR sequences entering databanks, the gap between the number of known sequences and the number of known functions is widening rapidly, and it is both time-consuming and expensive to determine their function based only on experimental techniques. Therefore, it is important to develop a computational method for quick and accurate classification of GPCRs. Results In this study, a novel three-layer predictor based on support vector machine (SVM) and feature selection is developed for predicting and classifying GPCRs directly from amino acid sequence data. Maximum relevance minimum redundancy (mRMR) is applied to pre-evaluate features with discriminative information, while a genetic algorithm (GA) is used to find the optimized feature subsets. SVM is used for the construction of classification models. The overall accuracies of the three-layer predictor at the levels of superfamily, family and subfamily are obtained by cross-validation tests on two non-redundant datasets. The results are about 0.5% to 16% higher than those of GPCR-CA and GPCRPred. Conclusion The high success rates indicate that the proposed predictor is a useful automated tool for predicting GPCRs. GPCR-SVMFS, a corresponding executable program for GPCR prediction and classification, can be acquired freely on request from the authors.
Affiliation(s)
- Zhanchao Li
- School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, PR China
|
210
|
Medema MH, Zhou M, van Hijum SAFT, Gloerich J, Wessels HJCT, Siezen RJ, Strous M. A predicted physicochemically distinct sub-proteome associated with the intracellular organelle of the anammox bacterium Kuenenia stuttgartiensis. BMC Genomics 2010; 11:299. [PMID: 20459862 PMCID: PMC2881027 DOI: 10.1186/1471-2164-11-299] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 01/29/2010] [Accepted: 05/12/2010] [Indexed: 12/31/2022]
Abstract
BACKGROUND Anaerobic ammonium-oxidizing (anammox) bacteria perform a key step in global nitrogen cycling. These bacteria make use of an organelle to oxidize ammonia anaerobically to nitrogen (N2) and so contribute approximately 50% of the nitrogen in the atmosphere. It is currently unknown which proteins constitute the organellar proteome and how anammox bacteria are able to specifically target organellar and cell-envelope proteins to their correct final destinations. Experimental approaches are complicated by the absence of pure cultures and genetic accessibility. However, the genome of the anammox bacterium Candidatus "Kuenenia stuttgartiensis" has recently been sequenced. Here, we make use of these genome data to predict the organellar sub-proteome and address the molecular basis of protein sorting in anammox bacteria. RESULTS Two training sets representing organellar (30 proteins) and cell envelope (59 proteins) proteins were constructed based on previous experimental evidence and comparative genomics. Random forest (RF) classifiers trained on these two sets could differentiate between organellar and cell envelope proteins with ~89% accuracy using 400 features consisting of frequencies of two adjacent amino acid combinations. A physicochemically distinct organellar sub-proteome containing 562 proteins was predicted with the best RF classifier. This set included almost all catabolic and respiratory factors encoded in the genome. Apparently, the cytoplasmic membrane performs no catabolic functions. We predict that the Tat-translocation system is located exclusively in the organellar membrane, whereas the Sec-translocation system is located on both the organellar and cytoplasmic membranes. Canonical signal peptides were predicted and validated experimentally, but a specific (N- or C-terminal) signal that could be used for protein targeting to the organelle remained elusive. 
CONCLUSIONS A physicochemically distinct organellar sub-proteome was predicted from the genome of the anammox bacterium K. stuttgartiensis. This result provides strong in silico support for the existing experimental evidence for the existence of an organelle in this bacterium, and is an important step forward in unravelling a geochemically relevant case of cytoplasmic differentiation in bacteria. The predicted dual location of the Sec-translocation system and the apparent absence of a specific N- or C-terminal signal in the organellar proteins suggests that additional chaperones may be necessary that act on an as-yet unknown property of the targeted proteins.
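The "400 features consisting of frequencies of two adjacent amino acid combinations" mentioned in the Results correspond to what is commonly called dipeptide composition (20 × 20 ordered residue pairs). A minimal sketch of how such a vector could be computed is shown below; the function name, the feature ordering, and the handling of non-standard residues are assumptions made for illustration, not details from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so every protein yields
# a vector with the same layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_frequencies(sequence):
    """Return 400 relative frequencies of adjacent residue pairs.

    Pairs containing a non-standard symbol (e.g. X) are ignored.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


features = dipeptide_frequencies("MKKLLA")
print(len(features), features[DIPEPTIDES.index("KK")])
```

Vectors of this form could then be passed to any standard classifier; the abstract reports using random forests.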
Affiliation(s)
- Marnix H Medema
- Department of Microbiology, Radboud University Nijmegen, Toernooiveld 1, 6525 ED Nijmegen, the Netherlands
|
211
|
Cui Z, Hou J, Chen X, Li J, Xie Z, Xue P, Cai T, Wu P, Xu T, Yang F. The Profile of Mitochondrial Proteins and Their Phosphorylation Signaling Network in INS-1 β Cells. J Proteome Res 2010; 9:2898-908. [DOI: 10.1021/pr100139z] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022]
Affiliation(s)
- Ziyou Cui, Junjie Hou, Xiulan Chen, Jing Li, Zhensheng Xie, Peng Xue, Tanxi Cai, Peng Wu, Tao Xu, Fuquan Yang
- Laboratory of Proteomics, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China, Tianjin Key Laboratory for Biomarkers of Occupational and Environmental Hazard, Medical College of CAPF, Tianjin 300162, China, National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China, and Graduate University of Chinese Academy of Sciences, Beijing 100049, China
|
212
|
Ban HJ, Heo JY, Oh KS, Park KJ. Identification of type 2 diabetes-associated combination of SNPs using support vector machine. BMC Genet 2010; 11:26. [PMID: 20416077 PMCID: PMC2875201 DOI: 10.1186/1471-2156-11-26] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Received: 06/23/2009] [Accepted: 04/23/2010] [Indexed: 12/25/2022]
Abstract
BACKGROUND Type 2 diabetes mellitus (T2D), a metabolic disorder characterized by insulin resistance and relative insulin deficiency, is a complex disease of major public health importance. Its incidence is rapidly increasing in the developed countries. Complex diseases are caused by interactions between multiple genes and environmental factors. Most association studies aim to identify individual susceptibility single markers using a simple disease model. Recent studies are trying to estimate the effects of multiple genes and multi-locus in genome-wide association. However, estimating the effects of association is very difficult. We aim to assess the rules for classifying diseased and normal subjects by evaluating potential gene-gene interactions in the same or distinct biological pathways. RESULTS We analyzed the importance of gene-gene interactions in T2D susceptibility by investigating 408 single nucleotide polymorphisms (SNPs) in 87 genes involved in major T2D-related pathways in 462 T2D patients and 456 healthy controls from the Korean cohort studies. We evaluated the support vector machine (SVM) method to differentiate between cases and controls using SNP information in a 10-fold cross-validation test. We achieved a 65.3% prediction rate with a combination of 14 SNPs in 12 genes by using the radial basis function (RBF)-kernel SVM. Similarly, we investigated subpopulation data sets of men and women and identified different SNP combinations with the prediction rates of 70.9% and 70.6%, respectively. As the high-throughput technology for genome-wide SNPs improves, it is likely that a much higher prediction rate with biologically more interesting combination of SNPs can be acquired by using this method. CONCLUSIONS Support Vector Machine based feature selection method in this research found novel association between combinations of SNPs and T2D in a Korean population.
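The 10-fold cross-validation test mentioned in the Results partitions the subjects into ten disjoint folds and holds each fold out once for testing. The sketch below shows only the index bookkeeping, using the combined cohort size from the abstract (462 patients plus 456 controls); the function name, the shuffling seed, and the absence of stratification by case/control status are assumptions, and the SVM itself is left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


n_subjects = 462 + 456  # T2D patients + healthy controls
for fold_number, (train, test) in enumerate(k_fold_indices(n_subjects), 1):
    print(fold_number, len(train), len(test))
```

A prediction rate such as the reported 65.3% would then be obtained by training on each `train` set, scoring on the matching `test` set, and averaging the ten accuracies.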
Affiliation(s)
- Hyo-Jeong Ban
- Division of Bio-Medical Informatics, Center for Genome Science, National Institute of Health, Korea Center for Disease Control and Prevention, 194, Tongil-Lo, Eunpyung-Gu, Seoul 122-701, Republic of Korea
|
213
|
RASCAL is a new human cytomegalovirus-encoded protein that localizes to the nuclear lamina and in cytoplasmic vesicles at late times postinfection. J Virol 2010; 84:6483-96. [PMID: 20392852 DOI: 10.1128/jvi.02462-09] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/23/2023]
Abstract
The products of numerous open reading frames (ORFs) present in the genome of human cytomegalovirus (CMV) have not been characterized. Here, we describe the identification of a new CMV protein localizing to the nuclear envelope and in cytoplasmic vesicles at late times postinfection. Based on this distinctive localization pattern, we called this new protein nuclear rim-associated cytomegaloviral protein, or RASCAL. Two RASCAL isoforms exist, a short version of 97 amino acids encoded by the majority of CMV strains and a longer version of 176 amino acids encoded by the Towne, Toledo, HAN20, and HAN38 strains. Both isoforms colocalize with lamin B in deep intranuclear invaginations of the inner nuclear membrane (INM) and in novel cytoplasmic vesicular structures possibly derived from the nuclear envelope. INM infoldings have been previously described as sites of nucleocapsid egress, which is mediated by the localized disruption of the nuclear lamina, promoted by the activities of viral and cellular kinases recruited by the lamina-associated proteins UL50 and UL53. RASCAL accumulation at the nuclear membrane required the presence of UL50 but not of UL53. RASCAL and UL50 also appeared to specifically interact, suggesting that RASCAL is a new component of the nuclear egress complex (NEC) and possibly involved in mediating nucleocapsid egress from the nucleus. Finally, the presence of RASCAL within cytoplasmic vesicles raises the intriguing possibility that this protein might participate in additional steps of virion maturation occurring after capsid release from the nucleus.
|
214
|
Goudenège D, Avner S, Lucchetti-Miganeh C, Barloy-Hubler F. CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources. BMC Microbiol 2010; 10:88. [PMID: 20331850 PMCID: PMC2850352 DOI: 10.1186/1471-2180-10-88] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2009] [Accepted: 03/23/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The functions of proteins are strongly related to their localization in cell compartments (for example the cytoplasm or membranes) but the experimental determination of the sub-cellular localization of proteomes is laborious and expensive. A fast and low-cost alternative approach is in silico prediction, based on features of the protein primary sequences. However, biologists are confronted with a very large number of computational tools that use different methods and address various localization features with diverse specificities and sensitivities. As a result, exploiting these computer resources to predict protein localization accurately involves querying all tools and comparing every prediction output; this is a painstaking task. Therefore, we developed a comprehensive database, called CoBaltDB, that gathers all prediction outputs concerning complete prokaryotic proteomes. DESCRIPTION The current version of CoBaltDB integrates the results of 43 localization predictors for 784 complete bacterial and archaeal proteomes (2,548,292 proteins in total). CoBaltDB supplies a simple user-friendly interface for retrieving and exploring relevant information about predicted features (such as signal peptide cleavage sites and transmembrane segments). Data are organized into three work-sets ("specialized tools", "meta-tools" and "additional tools"). The database can be queried using the organism name, a locus tag or a list of locus tags and may be browsed using numerous graphical and text displays. CONCLUSIONS With its new functionalities, CoBaltDB is a novel powerful platform that provides easy access to the results of multiple localization tools and support for predicting prokaryotic protein localizations with higher confidence than previously possible. CoBaltDB is available at http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software/cobalten.
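As a rough illustration of why collecting many predictors' outputs in one place is useful, the sketch below tallies the localization calls of several tools for one protein and reports the most frequent label together with the fraction of tools supporting it. The tool names and labels are hypothetical placeholders, and this simple vote is an assumption made for illustration; it is not CoBaltDB's own method.

```python
from collections import Counter

def consensus_localization(predictions):
    """Return (label, support) for the most common prediction.

    predictions: dict mapping a tool name to its predicted localization.
    support is the fraction of tools that agree with the winning label.
    Ties go to the label seen first in the dict (an arbitrary choice).
    """
    if not predictions:
        raise ValueError("no predictions supplied")
    counts = Counter(predictions.values())
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)

# Hypothetical outputs of four tools for a single locus tag.
example = {
    "tool_a": "cytoplasm",
    "tool_b": "inner membrane",
    "tool_c": "inner membrane",
    "tool_d": "inner membrane",
}
label, support = consensus_localization(example)
print(label, support)
```

A low support value flags proteins on which the tools disagree, which is where inspecting the individual outputs side by side is most informative.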
Affiliation(s)
- David Goudenège
- CNRS UMR 6026, ICM, Equipe B@SIC, Université de Rennes 1, Campus de Beaulieu, 35042 Rennes, France
|
215
|
Briesemeister S, Rahnenführer J, Kohlbacher O. Going from where to why--interpretable prediction of protein subcellular localization. Bioinformatics 2010; 26:1232-8. [PMID: 20299325 PMCID: PMC2859129 DOI: 10.1093/bioinformatics/btq115] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
MOTIVATION Protein subcellular localization is pivotal in understanding a protein's function. Computational prediction of subcellular localization has become a viable alternative to experimental approaches. While current machine learning-based methods yield good prediction accuracy, most of them suffer from two key problems: lack of interpretability and difficulty in dealing with multiple locations. RESULTS We present YLoc, a novel method for predicting protein subcellular localization that addresses these issues. Due to its simple architecture, YLoc can identify the relevant features of a protein sequence contributing to its subcellular localization, e.g. localization signals or motifs relevant to protein sorting. We present several example applications where YLoc identifies the sequence features responsible for protein localization, and thus reveals not only to which location a protein is transported, but also why it is transported there. YLoc also provides a confidence estimate for the prediction. Thus, the user can decide what level of error is acceptable for a prediction. Due to a probabilistic approach and the use of several thousands of dual-targeted proteins, YLoc is able to predict multiple locations per protein. YLoc was benchmarked using several independent datasets for protein subcellular localization and performs on par with other state-of-the-art predictors. Disregarding low-confidence predictions, YLoc can achieve prediction accuracies of over 90%. Moreover, we show that YLoc is able to reliably predict multiple locations and outperforms the best predictors in this area. AVAILABILITY www.multiloc.org/YLoc.
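The trade-off behind "disregarding low-confidence predictions" can be sketched as follows: raising a confidence cutoff usually raises accuracy on the predictions that remain, at the cost of leaving more proteins unpredicted. The function name, the cutoff values and the toy data are assumptions made for illustration; they are not taken from YLoc or its benchmarks.

```python
def accuracy_at_threshold(results, threshold):
    """Accuracy and coverage after discarding low-confidence predictions.

    results: list of (confidence, is_correct) pairs.
    Returns (accuracy, coverage) over predictions whose confidence is at
    least `threshold`; accuracy is None if no prediction passes the filter.
    """
    kept = [ok for conf, ok in results if conf >= threshold]
    coverage = len(kept) / len(results) if results else 0.0
    accuracy = sum(kept) / len(kept) if kept else None
    return accuracy, coverage

# Made-up (confidence, correct?) pairs, for illustration only.
toy = [(0.95, True), (0.90, True), (0.80, True),
       (0.60, False), (0.55, True), (0.40, False)]

print(accuracy_at_threshold(toy, 0.0))  # every prediction kept
print(accuracy_at_threshold(toy, 0.8))  # only the confident half kept
```

Reporting coverage next to accuracy makes it explicit how many proteins were set aside to reach a given accuracy.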
|
216
|
|
217
|
KAWAI K, TAKAHASHI Y. Virtual Screening of Antihypertensive Drugs Using Support Vector Machines. JOURNAL OF COMPUTER CHEMISTRY-JAPAN 2010. [DOI: 10.2477/jccj.h2137] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
218
|
Hu Y, Lehrach H, Janitz M. Comparative analysis of an experimental subcellular protein localization assay and in silico prediction methods. J Mol Histol 2009; 40:343-52. [PMID: 20033263 PMCID: PMC2834777 DOI: 10.1007/s10735-009-9247-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2009] [Accepted: 12/01/2009] [Indexed: 12/12/2022]
Abstract
The subcellular localization of a protein can provide important information about its function within the cell. As eukaryotic cells and particularly mammalian cells are characterized by a high degree of compartmentalization, most protein activities can be assigned to particular cellular compartments. The categorization of proteins by their subcellular localization is therefore one of the essential goals of the functional annotation of the human genome. We previously performed a subcellular localization screen of 52 proteins encoded on human chromosome 21. In the current study, we compared the experimental localization data to the in silico results generated by nine leading software packages with different prediction resolutions. The comparison revealed striking differences between the programs in the accuracy of their subcellular protein localization predictions. Our results strongly suggest that the recently developed predictors utilizing multiple prediction methods tend to provide significantly better performance than purely sequence-based or homology-based predictions.
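A comparison of this kind reduces to scoring each program against the experimentally observed compartment. The sketch below is a minimal version under stated assumptions: protein and tool names are hypothetical, each protein has a single observed localization, and a missing prediction counts as a miss. It does not reproduce the scoring scheme used in the study.

```python
def per_tool_accuracy(experimental, predicted_by_tool):
    """Fraction of proteins for which each tool matches the experiment.

    experimental: dict protein -> observed localization.
    predicted_by_tool: dict tool -> dict protein -> predicted localization.
    Proteins a tool did not predict count as misses.
    """
    scores = {}
    for tool, preds in predicted_by_tool.items():
        hits = sum(1 for protein, loc in experimental.items()
                   if preds.get(protein) == loc)
        scores[tool] = hits / len(experimental)
    return scores

# Toy data: four proteins, two hypothetical predictors.
observed = {"p1": "nucleus", "p2": "cytoplasm",
            "p3": "mitochondrion", "p4": "nucleus"}
predictions = {
    "tool_x": {"p1": "nucleus", "p2": "cytoplasm",
               "p3": "mitochondrion", "p4": "cytoplasm"},
    "tool_y": {"p1": "nucleus", "p2": "nucleus", "p3": "cytoplasm"},
}
print(per_tool_accuracy(observed, predictions))
```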
Affiliation(s)
- Yuhui Hu
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 73, 14195 Berlin, Germany
- Max Delbrück Center for Molecular Medicine (MDC) in der Helmholtz-Gemeinschaft, The Berlin Institute for Medical Systems Biology, 13125 Berlin-Buch, Germany
- Hans Lehrach
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 73, 14195 Berlin, Germany
- Michal Janitz
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 73, 14195 Berlin, Germany
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052 Australia
|
219
|
Huang WL, Tung CW, Huang HL, Ho SY. Predicting protein subnuclear localization using GO-amino-acid composition features. Biosystems 2009; 98:73-9. [DOI: 10.1016/j.biosystems.2009.06.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2009] [Revised: 06/10/2009] [Accepted: 06/26/2009] [Indexed: 10/20/2022]
|
220
|
Using auto covariance method for functional discrimination of membrane proteins based on evolution information. Amino Acids 2009; 38:1497-503. [DOI: 10.1007/s00726-009-0362-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2009] [Accepted: 09/24/2009] [Indexed: 11/29/2022]
|
221
|
Briesemeister S, Blum T, Brady S, Lam Y, Kohlbacher O, Shatkay H. SherLoc2: A High-Accuracy Hybrid Method for Predicting Subcellular Localization of Proteins. J Proteome Res 2009; 8:5363-6. [PMID: 19764776 DOI: 10.1021/pr900665y] [Citation(s) in RCA: 106] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Sebastian Briesemeister
- Division for Simulation of Biological Systems, Center for Bioinformatics Tübingen, Eberhard-Karls-Universität Tübingen, Germany, and School of Computing, Queen’s University, Kingston, Ontario, Canada
- Torsten Blum
- Division for Simulation of Biological Systems, Center for Bioinformatics Tübingen, Eberhard-Karls-Universität Tübingen, Germany, and School of Computing, Queen’s University, Kingston, Ontario, Canada
- Scott Brady
- Division for Simulation of Biological Systems, Center for Bioinformatics Tübingen, Eberhard-Karls-Universität Tübingen, Germany, and School of Computing, Queen’s University, Kingston, Ontario, Canada
- Yin Lam
- Division for Simulation of Biological Systems, Center for Bioinformatics Tübingen, Eberhard-Karls-Universität Tübingen, Germany, and School of Computing, Queen’s University, Kingston, Ontario, Canada
- Oliver Kohlbacher
- Division for Simulation of Biological Systems, Center for Bioinformatics Tübingen, Eberhard-Karls-Universität Tübingen, Germany, and School of Computing, Queen’s University, Kingston, Ontario, Canada
- Hagit Shatkay
- Division for Simulation of Biological Systems, Center for Bioinformatics Tübingen, Eberhard-Karls-Universität Tübingen, Germany, and School of Computing, Queen’s University, Kingston, Ontario, Canada
|
222
|
Keerthikumar S, Bhadra S, Kandasamy K, Raju R, Ramachandra YL, Bhattacharyya C, Imai K, Ohara O, Mohan S, Pandey A. Prediction of candidate primary immunodeficiency disease genes using a support vector machine learning approach. DNA Res 2009; 16:345-51. [PMID: 19801557 PMCID: PMC2780952 DOI: 10.1093/dnares/dsp019] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023] Open
Abstract
Screening and early identification of primary immunodeficiency disease (PID) genes is a major challenge for physicians. Many resources have catalogued molecular alterations in known PID genes along with their associated clinical and immunological phenotypes. However, these resources do not assist in identifying candidate PID genes. We have recently developed a platform designated Resource of Asian PIDs, which hosts information pertaining to molecular alterations, protein-protein interaction networks, mouse studies and microarray gene expression profiling of all known PID genes. Using this resource as a discovery tool, we describe the development of an algorithm for prediction of candidate PID genes. Using a support vector machine learning approach, we have predicted 1442 candidate PID genes using 69 binary features of 148 known PID genes and 3162 non-PID genes as a training data set. The power of this approach is illustrated by the fact that six of the predicted genes have recently been experimentally confirmed to be PID genes. The remaining genes in this predicted data set represent attractive candidates for testing in patients where the etiology cannot be ascribed to any of the known PID genes.
|
223
|
Matre P, Meyer C, Lillo C. Diversity in subcellular targeting of the PP2A B'eta subfamily members. PLANTA 2009; 230:935-45. [PMID: 19672620 DOI: 10.1007/s00425-009-0998-z] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/09/2009] [Accepted: 07/22/2009] [Indexed: 05/20/2023]
Abstract
Protein phosphatase 2A (PP2A) is a serine/threonine-specific phosphatase comprising a catalytic subunit (C), a scaffolding subunit (A), and a regulatory subunit (B). The B subunits are believed to be responsible for substrate specificity and localization of the PP2A complex. In plants, three families of B subunits exist, i.e. B (B55), B', and B''. Here, we report differential subcellular targeting within the Arabidopsis B'eta subfamily, which consists of the close homologs B'eta, B'theta, B'gamma and B'zeta. Phenotypes of corresponding knockouts were observed, and particularly revealed delayed flowering for the B'eta knockout. The B' subunits were linked to fluorescent tags and transiently expressed in various tissues of onion, tobacco and Arabidopsis. B'eta and B'gamma targeted the cytosol and nucleus. B'zeta localized to the cytoplasm and partly co-localized with mitochondrial markers when the N-terminus was free. Provided its C-terminus was free, the B'theta subunit targeted peroxisomes. The importance of the C-terminal end for peroxisomal targeting was further confirmed by truncation of the C-terminus. The results revealed that the closely related B' subunits are targeting different organelles in plants, and exemplify the usage of the peptide serine-serine-leucine as a PTS1 peroxisomal signaling peptide.
Affiliation(s)
- Polina Matre
- Faculty of Science and Technology, University of Stavanger, Centre for Organelle Research, 4036 Stavanger, Norway
|
224
|
Blum T, Briesemeister S, Kohlbacher O. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction. BMC Bioinformatics 2009; 10:274. [PMID: 19723330 PMCID: PMC2745392 DOI: 10.1186/1471-2105-10-274] [Citation(s) in RCA: 212] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2009] [Accepted: 09/01/2009] [Indexed: 11/10/2022] Open
Abstract
Background Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms, MultiLoc2 yields higher accuracies compared to its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly and free web service at: .
Affiliation(s)
- Torsten Blum
- Division for Simulation of Biological Systems, ZBIT/WSI, Eberhard-Karls-Universität Tübingen, Germany.
|
225
|
SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm. J Theor Biol 2009; 261:330-5. [PMID: 19679138 DOI: 10.1016/j.jtbi.2009.08.004] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2009] [Revised: 07/30/2009] [Accepted: 08/01/2009] [Indexed: 11/23/2022]
Abstract
The chloroplast is a type of plant-specific subcellular organelle. It is of central importance in several biological processes such as photosynthesis and amino acid biosynthesis. Thus, understanding the function of chloroplast proteins is of significant value. Since the function of chloroplast proteins correlates with their subchloroplast locations, knowledge of these locations can be very helpful in understanding their role in biological processes. In the current paper, by introducing the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm, we developed a method for predicting protein subchloroplast locations. This is the first algorithm for predicting protein subchloroplast locations. We have implemented our algorithm as an online service, SubChlo (http://bioinfo.au.tsinghua.edu.cn/subchlo). This service may be useful for chloroplast proteome research.
|
226
|
Li J, Cai T, Wu P, Cui Z, Chen X, Hou J, Xie Z, Xue P, Shi L, Liu P, Yates JR, Yang F. Proteomic analysis of mitochondria from Caenorhabditis elegans. Proteomics 2009; 9:4539-53. [DOI: 10.1002/pmic.200900101] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
|
227
|
Cai YD, Lu L, Chen L, He JF. Predicting subcellular location of proteins using integrated-algorithm method. Mol Divers 2009; 14:551-8. [DOI: 10.1007/s11030-009-9182-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2009] [Accepted: 07/11/2009] [Indexed: 12/11/2022]
|
228
|
Nucleotide's bilinear indices: novel bio-macromolecular descriptors for bioinformatics studies of nucleic acids. I. Prediction of paromomycin's affinity constant with HIV-1 Psi-RNA packaging region. J Theor Biol 2009; 259:229-41. [PMID: 19272394 DOI: 10.1016/j.jtbi.2009.02.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2008] [Revised: 02/24/2009] [Accepted: 02/25/2009] [Indexed: 02/03/2023]
Abstract
A new set of nucleotide-based bio-macromolecular descriptors is presented. This novel approach to bio-macromolecular design from a linear algebra point of view is relevant to quantitative structure-activity relationship (QSAR) studies of nucleic acids. These bio-macromolecular indices are based on the calculus of bilinear maps on $\mathbb{R}^n$ [$b_{mk}(x_m, y_m): \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$] in the canonical basis. The nucleic acid bilinear indices are calculated from the kth power of the non-stochastic and stochastic graph-theoretic electronic-contact matrices of the nucleotides, $M_m^k$ and ${}^{s}M_m^k$, respectively. That is, the kth non-stochastic and stochastic nucleic acid bilinear indices are calculated using $M_m^k$ and ${}^{s}M_m^k$ as matrix operators of bilinear transformations. Moreover, biochemical information is codified by using different pair combinations of nucleotide-base properties as weightings: the experimental molar absorption coefficient $\varepsilon_{260}$ at 260 nm and pH 7.0, the first ($\Delta E_1$) and second ($\Delta E_2$) single excitation energies in eV, and the first ($f_1$) and second ($f_2$) oscillator strength values (of the first singlet excitation energies) of the DNA-RNA nucleotide bases. As an example of this approach, an interaction study of the antibiotic paromomycin with the packaging region of the HIV-1 Psi-RNA was performed, and several linear models were obtained to predict the interaction strength. The best linear model obtained using non-stochastic bilinear indices explains about 91% of the variance of the experimental Log K ($R = 0.95$ and $s = 0.08 \times 10^{-4}\,\mathrm{M}^{-1}$), whereas the best equation based on stochastic bilinear indices accounts for 93% of the Log K variance ($R = 0.97$ and $s = 0.07 \times 10^{-4}\,\mathrm{M}^{-1}$). The leave-one-out (LOO) press statistics evidenced the high predictive ability of both models ($q^2 = 0.86$ and $s_{cv} = 0.09 \times 10^{-4}\,\mathrm{M}^{-1}$ for non-stochastic and $q^2 = 0.91$ and $s_{cv} = 0.08 \times 10^{-4}\,\mathrm{M}^{-1}$ for stochastic bilinear indices). The models based on nucleic acid bilinear indices compared favorably with other nucleic acid index-based approaches reported to date. These models also permit the interpretation of the driving forces of the interaction process. In this sense, the developed equations involve short-reaching ($k \le 3$), middle-reaching ($4 < k < 9$), and far-reaching ($k = 10$ or greater) nucleotide bilinear indices. This points to control of the stability profile of paromomycin-RNA complexes by electronic and topologic interactions of the nucleotide backbone. Consequently, the present approach represents a novel and rather promising route for theoretical-biology studies.
|
229
|
Qiu P, Cai XY, Ding W, Zhang Q, Norris ED, Greene JR. HCV genotyping using statistical classification approach. J Biomed Sci 2009; 16:62. [PMID: 19586537 PMCID: PMC2720937 DOI: 10.1186/1423-0127-16-62] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2009] [Accepted: 07/08/2009] [Indexed: 01/24/2023] Open
Abstract
The genotype of Hepatitis C Virus (HCV) strains is an important determinant of the severity and aggressiveness of liver infection as well as patient response to antiviral therapy. Fast and accurate determination of viral genotype could provide direction in the clinical management of patients with chronic HCV infections. Using publicly available HCV nucleotide sequences, we built a global Position Weight Matrix (PWM) for the HCV genome. Based on the PWM, a set of genotype-specific nucleotide sequence "signatures" was selected from the 5' NCR, CORE, E1, and NS5B regions of the HCV genome. We evaluated the predictive power of these signatures for predicting the most common HCV genotypes and subtypes. We observed that nucleotide sequence signatures selected from the NS5B and E1 regions generally demonstrated stronger discriminant power in differentiating major HCV genotypes and subtypes than those from the 5' NCR and CORE regions. Two discriminant methods were used to build predictive models. Through 10-fold cross-validation, over 99% prediction accuracy was achieved using both support vector machine (SVM) and random forest-based classification methods in a dataset of 1134 sequences for NS5B and 947 sequences for E1. Prediction accuracy for each genotype is also reported.
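For readers unfamiliar with position weight matrices, the sketch below builds a log-odds PWM from a small set of aligned, equal-length DNA sequences and scores a query against it. The pseudocount, the uniform background frequency and the toy sequences are assumptions chosen for illustration; the abstract does not give the authors' exact construction, and the signature selection and SVM/random forest steps are not covered here.

```python
import math

ALPHABET = "ACGT"

def build_pwm(aligned, pseudocount=1.0, background=0.25):
    """Log2-odds position weight matrix from equal-length DNA sequences.

    Returns one dict per alignment column, mapping base -> log2 of the
    smoothed column frequency divided by the background frequency.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must have equal length")
    pwm = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        total = len(column) + pseudocount * len(ALPHABET)
        pwm.append({
            base: math.log2(
                ((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds for seq (same length as the PWM)."""
    return sum(col[base] for col, base in zip(pwm, seq))

# Toy alignment, not real HCV sequence data.
signature_set = ["ACGT", "ACGA", "ACGT", "TCGT"]
pwm = build_pwm(signature_set)
print(score(pwm, "ACGT") > score(pwm, "TGCA"))
```

Sequences resembling the alignment receive higher scores than unrelated ones, which is the property a classifier can then exploit.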
Affiliation(s)
- Ping Qiu
- Molecular Design and Informatics, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, NJ 07033, USA.
|
230
|
González-Díaz H, Dea-Ayuela MA, Pérez-Montoto LG, Prado-Prado FJ, Agüero-Chapín G, Bolas-Fernández F, Vazquez-Padrón RI, Ubeira FM. QSAR for RNases and theoretic-experimental study of molecular diversity on peptide mass fingerprints of a new Leishmania infantum protein. Mol Divers 2009; 14:349-69. [PMID: 19578942 PMCID: PMC7088557 DOI: 10.1007/s11030-009-9178-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2009] [Accepted: 06/13/2009] [Indexed: 11/29/2022]
Abstract
The toxicity and low success of current treatments for Leishmaniosis motivate the search for new peptide drugs and/or molecular targets in Leishmania pathogen species (L. infantum and L. major). For example, ribonucleases (RNases) are enzymes relevant to several biological processes; theoretical and experimental study of the molecular diversity of Peptide Mass Fingerprints (PMFs) of RNases is therefore useful for drug design. This study introduces a methodology that combines QSAR models, 2D-Electrophoresis (2D-E), MALDI-TOF Mass Spectroscopy (MS), BLAST alignment, and Molecular Dynamics (MD) to explore PMFs of RNases. We illustrate this approach by investigating for the first time the PMFs of a new protein of L. infantum. Here we report and compare new versus old predictive models for RNases based on Topological Indices (TIs) of Markov Pseudo-Folding Lattices. This group of indices, called Pseudo-folding Lattice 2D-TIs, includes spectral moments $\pi_k(x,y)$, mean electrostatic potentials $\xi_k(x,y)$, and entropy measures $\theta_k(x,y)$. The accuracy of the models (training/cross-validation) was as follows: $\xi_k(x,y)$-model (96.0%/91.7%) > $\pi_k(x,y)$-model (84.7%/83.3%) > $\theta_k(x,y)$-model (66.0%/66.7%). We also carried out a 2D-E analysis of biological samples of L. infantum promastigotes, focusing on a 2D-E gel spot of one unknown protein with M < 20,100 and pI < 7. A MASCOT search identified 20 proteins with a Mowse score > 30, but none > 52 (the threshold value); the highest score, 42, was for a probable DNA-directed RNA polymerase. However, we determined experimentally the sequence of more than 140 peptides. We used QSAR models to predict RNase scores for these peptides and BLAST alignment to confirm some results. We also calculated 3D-folding TIs based on MD experiments and compared 2D versus 3D-TIs in a molecular phylogenetic analysis of the molecular diversity of these peptides. This combined strategy may be of interest in drug development or target identification.
Affiliation(s)
- Humberto González-Díaz
- Department of Microbiology and Parasitology, and Department of Organic Chemistry, Faculty of Pharmacy, USC, 15782, Santiago de Compostela, Spain.
|
231
|
Mitsuda N, Ohme-Takagi M. Functional analysis of transcription factors in Arabidopsis. PLANT & CELL PHYSIOLOGY 2009; 50:1232-48. [PMID: 19478073 PMCID: PMC2709548 DOI: 10.1093/pcp/pcp075] [Citation(s) in RCA: 184] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/08/2009] [Accepted: 05/26/2009] [Indexed: 05/17/2023]
Abstract
Transcription factors (TFs) regulate the expression of genes at the transcriptional level. Modification of TF activity dynamically alters the transcriptome, which leads to metabolic and phenotypic changes. Thus, functional analysis of TFs using 'omics-based' methodologies is one of the most important areas of the post-genome era. In this mini-review, we present an overview of Arabidopsis TFs and introduce strategies for the functional analysis of plant TFs, which include both traditional and recently developed technologies. These strategies can be assigned to five categories: bioinformatic analysis; analysis of molecular function; expression analysis; phenotype analysis; and network analysis for the description of entire transcriptional regulatory networks.
Affiliation(s)
- Masaru Ohme-Takagi
- Research Institute of Genome-Based Biofactory, National Institute of Advanced Industrial Science and Technology (AIST), Central 4, Higashi 1-1-1, Tsukuba, 305-8562 Japan
|
232
|
Qiu JD, Huang JH, Liang RP, Lu XQ. Prediction of G-protein-coupled receptor classes based on the concept of Chou’s pseudo amino acid composition: An approach from discrete wavelet transform. Anal Biochem 2009; 390:68-73. [DOI: 10.1016/j.ab.2009.04.009] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2009] [Revised: 03/27/2009] [Accepted: 04/06/2009] [Indexed: 10/20/2022]
|
233
|
Abstract
This chapter outlines key considerations for constructing and implementing an EST database. Instead of showing the technological details step by step, emphasis is put on the design of an EST database suited to the specific needs of EST projects and how to choose the most suitable tools. Using TBestDB as an example, we illustrate the essential factors to be considered for database construction and the steps for data population and annotation. This process employs technologies such as PostgreSQL, Perl, and PHP to build the database and interface, and tools such as AutoFACT for data processing and annotation. We discuss these in comparison to other available technologies and tools, and explain the reasons for our choices.
|
234
|
A novel representation for apoptosis protein subcellular localization prediction using support vector machine. J Theor Biol 2009; 259:361-5. [DOI: 10.1016/j.jtbi.2009.03.025] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2008] [Revised: 03/13/2009] [Accepted: 03/13/2009] [Indexed: 11/21/2022]
|
235
|
Dumas E, Desvaux M, Chambon C, Hébraud M. Insight into the core and variant exoproteomes of Listeria monocytogenes species by comparative subproteomic analysis. Proteomics 2009; 9:3136-55. [DOI: 10.1002/pmic.200800765] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
236
|
Kaundal R, Raghava GPS. RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information. Proteomics 2009; 9:2324-42. [DOI: 10.1002/pmic.200700597] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
237
|
Liu ZJ, Shao FX, Tang GY, Shan L, Bi YP. [Cloning and characterization of a transcription factor ZmNAC1 in maize (Zea mays)]. YI CHUAN = HEREDITAS 2009; 31:199-205. [PMID: 19273429 DOI: 10.3724/sp.j.1005.2009.00199] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
NAC transcription factors are a family of functionally diverse proteins. They are unique to plants and play an important role in regulation of plant growth and development, hormone regulation and responses to various stresses. A cDNA encoding the NAC-like gene homologue was isolated from maize (Zea mays L.) by RT-PCR and designated ZmNAC1 (GenBank Accession No. EU224278). Sequence analysis showed that the cDNA of ZmNAC1 was 1,029 bp long and contained a single open reading frame (ORF, 26-907 bp). The predicted ZmNAC1 protein has 293 amino acids with an estimated molecular mass of 32.3 kDa and an isoelectric point of 8.65. RT-PCR analysis showed that the expression of ZmNAC1 was induced by low temperature, PEG, salt, and ABA. These results suggest that ZmNAC1 may play important roles in biotic and abiotic resistance pathways. This is the first NAC-like gene reported in maize.
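The reported ORF coordinates, protein length and molecular mass can be cross-checked with simple arithmetic. The sketch below assumes the ORF span is 1-based and inclusive and ends with one stop codon, and it uses the common rule of thumb of about 110 Da per residue; both are assumptions made for illustration, not statements from the paper.

```python
AVERAGE_RESIDUE_MASS_KDA = 0.110  # rule-of-thumb average, an assumption

def orf_summary(start, end):
    """Length in nucleotides and number of encoded residues for an ORF.

    start, end: 1-based inclusive coordinates; the span is assumed to
    include exactly one terminal stop codon.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return length, length // 3 - 1

def approximate_mass_kda(residues):
    """Rough protein mass estimate from the residue count."""
    return residues * AVERAGE_RESIDUE_MASS_KDA

length, residues = orf_summary(26, 907)
print(length, residues)                          # 882 293
print(round(approximate_mass_kda(residues), 1))  # 32.2
```

The 293 residues match the abstract, and the rough mass estimate is close to the reported 32.3 kDa.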
Affiliation(s)
- Zhan-Ji Liu
- Hi-Tech Research Centre, Shandong Academy of Agricultural Sciences; Key Laboratory for Genetic Improvement of Crop, Animal and Poultry of Shandong Province, Jinan 250100, China.
|
238
|
Keller B, Meier M, Adamski J. Comparison of predicted and experimental subcellular localization of two putative rat steroid dehydrogenases from the short-chain dehydrogenase/reductase protein superfamily. Mol Cell Endocrinol 2009; 301:43-6. [PMID: 18775470 DOI: 10.1016/j.mce.2008.07.022] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/03/2008] [Revised: 07/24/2008] [Accepted: 07/24/2008] [Indexed: 10/21/2022]
Abstract
In the characterization of newly identified proteins, subcellular localization studies can provide important hints about the proteins' metabolic functions. Depending on the biochemical task of an enzyme, certain conditions of the subcellular environment, such as pH or the availability of cofactors and substrates, have to be fulfilled. Consequently, misdirected proteins often cannot carry out the proper chemical reaction. This study aims to detect differences between bioinformatic predictions and wet-lab experiments and to present ways of analysing subcellular localization reliably. On a set of ten enzymes from the short-chain dehydrogenase/reductase (SDR) superfamily, we performed predictions and experimental analyses of subcellular localization. As examples, we show the localization studies on the rat short-chain dehydrogenases/reductases dhrs7b and dhrs8. We demonstrate in particular that all of the prediction algorithms tested failed to assign the SDR enzymes to an experimentally verified subcellular compartment.
Affiliation(s)
- Brigitte Keller
- Helmholtz Zentrum München, German Research Center for Environmental Health, Institute for Experimental Genetics, Genome Analysis Center, Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany
|
239
|
Carrie C, Kühn K, Murcha MW, Duncan O, Small ID, O'Toole N, Whelan J. Approaches to defining dual-targeted proteins in Arabidopsis. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2009; 57:1128-39. [PMID: 19036033 DOI: 10.1111/j.1365-313x.2008.03745.x] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.5] [Indexed: 05/19/2023]
Abstract
A variety of approaches were used to predict dual-targeted proteins in Arabidopsis thaliana. These predictions were experimentally tested using GFP fusions. Twelve new dual-targeted proteins were identified: five that were dual-targeted to mitochondria and plastids, six that were dual-targeted to mitochondria and peroxisomes, and one that was dual-targeted to mitochondria and the nucleus. Two methods to predict dual-targeted proteins had a high success rate: (1) combining the AraPerox database with a variety of subcellular prediction programs to identify mitochondrial- and peroxisomal-targeted proteins, and (2) using a variety of prediction programs on a biochemical pathway or process known to contain at least one dual-targeted protein. Several technical parameters need to be taken into account before assigning subcellular localization using GFP fusion proteins. The position of GFP with respect to the tagged polypeptide, the tissue or cells used to detect subcellular localization, and the portion of a candidate protein fused to GFP are all relevant to the expression and targeting of a fusion protein. Testing all gene models for a chromosomal locus is required if more than one model exists.
Affiliation(s)
- Chris Carrie
- ARC Centre of Excellence in Plant Energy Biology, University of Western Australia, 35 Stirling Highway, Crawley 6009, WA, Australia
|
240
|
Vicentini R, Menossi M. The predicted subcellular localisation of the sugarcane proteome. FUNCTIONAL PLANT BIOLOGY : FPB 2009; 36:242-250. [PMID: 32688643 DOI: 10.1071/fp08252] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/30/2008] [Accepted: 01/12/2009] [Indexed: 06/11/2023]
Abstract
Plant cells are highly organised, and many biological processes are associated with specialised subcellular structures. Subcellular localisation is a key feature of proteins, since it is related to biological function. The subcellular localisation of such proteins can be predicted, providing information that is particularly relevant to those proteins with unknown or putative function. We performed the first in silico genome-wide subcellular localisation analysis for the sugarcane transcriptome (with 11 882 predicted proteins) and found that most of the proteins were localised in four compartments: nucleus (44%), cytosol (19%), mitochondria (12%) and secretory destinations (11%). We also showed that ~19% of the proteins were localised in multiple compartments. Other results allowed identification of a potential set of sugarcane proteins that could show dual targeting by the use of N-truncated forms that started from the nearest downstream in-frame AUG codons. This study was a first step in increasing knowledge about the subcellular localisation of the sugarcane proteome.
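The N-truncated forms mentioned in this abstract start at the nearest downstream in-frame AUG codon. A minimal Python sketch of that step follows; the function names and the toy sequences are illustrative assumptions and not taken from the study, which does not describe its implementation here.

```python
from typing import List, Optional


def downstream_inframe_starts(cds: str) -> List[int]:
    """Return 0-based offsets of in-frame ATG codons after the annotated start.

    `cds` is assumed to begin at the annotated start codon. RNA input
    (AUG) is accepted and converted to DNA letters.
    """
    cds = cds.upper().replace("U", "T")
    starts = []
    # Step one codon at a time from the second codon, so only codons in
    # the original reading frame are inspected.
    for pos in range(3, len(cds) - 2, 3):
        if cds[pos:pos + 3] == "ATG":
            starts.append(pos)
    return starts


def n_truncated_form(cds: str) -> Optional[str]:
    """Return the coding sequence restarted at the nearest downstream
    in-frame ATG, or None if no such codon exists."""
    starts = downstream_inframe_starts(cds)
    if not starts:
        return None
    return cds.upper().replace("U", "T")[starts[0]:]
```

The truncated sequence would then be translated and passed to the same localisation predictors as the full-length form; a different predicted compartment for the two forms would flag a candidate for dual targeting.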
Affiliation(s)
- Renato Vicentini
- Departamento de Genética e Evolução, Laboratório de Genoma Funcional, Instituto de Biologia, CP 6109, Universidade Estadual de Campinas - UNICAMP, 13083-970, Campinas, SP, Brazil
- Marcelo Menossi
- Departamento de Genética e Evolução, Laboratório de Genoma Funcional, Instituto de Biologia, CP 6109, Universidade Estadual de Campinas - UNICAMP, 13083-970, Campinas, SP, Brazil
|
241
|
Bhasin M, Reinherz EL, Reche PA. Recognition and classification of histones using support vector machine. J Comput Biol 2009; 13:102-12. [PMID: 16472024 DOI: 10.1089/cmb.2006.13.102] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 12/23/2022] Open
Abstract
Histones are DNA-binding proteins found in the chromatin of all eukaryotic cells. They are highly conserved and can be grouped into five major classes: H1/H5, H2A, H2B, H3, and H4. Two copies of H2A, H2B, H3, and H4 bind to about 160 base pairs of DNA forming the core of the nucleosome (the repeating structure of chromatin) and H1/H5 bind to its DNA linker sequence. Overall, histones have a high arginine/lysine content that is optimal for interaction with DNA. This sequence bias can make the classification of histones difficult using standard sequence similarity approaches. Therefore, in this paper, we applied support vector machine (SVM) to recognize and classify histones on the basis of their amino acid and dipeptide composition. On evaluation through a five-fold cross-validation, the SVM-based method was able to distinguish histones from nonhistones (nuclear proteins) with an accuracy around 98%. Similarly, we obtained an overall >95% accuracy in discriminating the five classes of histones through the application of 1-versus-rest (1-v-r) SVM. Finally, we have applied this SVM-based method to the detection of histones from whole proteomes and found a comparable sensitivity to that accomplished by hidden Markov motifs (HMM) profiles.
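The amino acid and dipeptide composition features named in this abstract can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, non-standard residues are simply skipped, and the SVM training step is left out.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 values)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair in the sequence (400 values).

    Pairs containing a non-standard letter are ignored rather than
    bridged, so no artificial dipeptides are created.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

Each protein then becomes a fixed-length vector regardless of sequence length, which is what allows an SVM to be trained on it; the five histone classes would be handled by training one such classifier per class against the rest.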
Affiliation(s)
- Manoj Bhasin
- Laboratory of Immunobiology and Department of Medical Oncology, Dana-Farber Cancer Institute and Department of Medicine, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
|
242
|
Tian J, Wu N, Guo J, Fan Y. Prediction of amyloid fibril-forming segments based on a support vector machine. BMC Bioinformatics 2009; 10 Suppl 1:S45. [PMID: 19208147 PMCID: PMC2648769 DOI: 10.1186/1471-2105-10-s1-s45] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Amyloid fibrillar aggregates of proteins or polypeptides are known to be associated with many human diseases. Recent studies suggest that short protein regions trigger this aggregation. Thus, identifying these short peptides is critical for understanding diseases and finding potential therapeutic targets. RESULTS We propose a method, named Pafig (Prediction of amyloid fibril-forming segments), based on support vector machines, to identify the hexapeptides associated with amyloid fibrillar aggregates. The features of Pafig were obtained by a two-round selection from AAindex. In a 10-fold cross-validation test on the Hexpepset dataset, Pafig performed well, with an overall accuracy of 81% and a Matthews correlation coefficient of 0.63. Pafig was used to predict the potential fibril-forming hexapeptides among all 64,000,000 possible hexapeptides. As a result, approximately 5.08% of hexapeptides showed a high aggregation propensity. In the predicted fibril-forming hexapeptides, the amino acids alanine, phenylalanine, isoleucine, leucine and valine occurred at higher frequencies, and the amino acids aspartic acid, glutamic acid, histidine, lysine, arginine and proline appeared at lower frequencies. CONCLUSION The performance of Pafig indicates that it is a powerful tool for identifying the hexapeptides associated with fibrillar aggregates and will be useful for large-scale analysis of proteomic data.
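The two figures quoted in this abstract can be checked with a few lines of Python. The sketch below shows the standard definitions of overall accuracy and the Matthews correlation coefficient for a binary classifier, plus the size of the hexapeptide space; the confusion-matrix counts passed in are whatever the caller supplies and are not data from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Overall accuracy: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Six positions with 20 possible residues each.
N_HEXAPEPTIDES = 20 ** 6

# About 5.08% of that space, i.e. roughly 3.25 million peptides.
N_HIGH_PROPENSITY = round(0.0508 * N_HEXAPEPTIDES)
```

Unlike accuracy, the Matthews coefficient stays near zero for a classifier that guesses the majority class, which is why the two are usually reported together on unbalanced peptide datasets.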
Affiliation(s)
- Jian Tian
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, PR China.
|
243
|
Tung TQ, Lee D. A method to improve protein subcellular localization prediction by integrating various biological data sources. BMC Bioinformatics 2009; 10 Suppl 1:S43. [PMID: 19208145 PMCID: PMC2648781 DOI: 10.1186/1471-2105-10-s1-s43] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022] Open
Abstract
Background Protein subcellular localization is crucial information for elucidating protein functions. Owing to the need for large-scale genome analysis, computational methods that predict protein subcellular localization efficiently are greatly needed. Although much previous work has addressed this task, the problem remains challenging for several reasons: the number of subcellular locations in practice is large; the distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs markedly; and many proteins are located in multiple locations. Thus it is necessary to explore new features and appropriate classification methods to improve prediction performance. Results In this paper we propose a new prediction method that combines two key ideas: 1) information on neighbouring proteins in a probabilistic gene network is integrated to enrich the prediction features; 2) fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located at multiple sites. Experiments were conducted on a dataset of budding yeast proteins covering 22 locations, and a significant improvement was observed. Conclusion Our results suggest that neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and is complementary to, other available prediction methods.
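The abstract does not spell out the classifier, so the sketch below follows the classic fuzzy k-NN formulation, in which each of the k nearest training proteins votes for its locations with a weight that falls off with distance. It may differ from the authors' variant. The feature vectors, the location names, `k`, the fuzzifier `m`, and the reporting threshold are all illustrative assumptions, and the network-derived features are not modelled.

```python
import math


def fuzzy_knn(query, train_x, train_u, k=3, m=2.0):
    """Return a dict mapping each location to a membership in [0, 1].

    train_x: list of feature vectors (sequences of floats).
    train_u: list of dicts, one per training protein, mapping a location
             name to that protein's membership in the location.
    m:       fuzzifier (> 1); larger values flatten the distance weighting.
    """
    # Pick the k training points closest to the query.
    neighbours = sorted(
        zip(train_x, train_u), key=lambda pair: math.dist(query, pair[0])
    )[:k]
    # Inverse-distance weights; the floor avoids division by zero when
    # the query coincides with a training point.
    weights = [
        1.0 / max(math.dist(query, x), 1e-12) ** (2.0 / (m - 1.0))
        for x, _ in neighbours
    ]
    total = sum(weights)
    scores = {}
    for weight, (_, memberships) in zip(weights, neighbours):
        for location, mu in memberships.items():
            scores[location] = scores.get(location, 0.0) + weight * mu
    return {location: s / total for location, s in scores.items()}


def assign_locations(memberships, threshold=0.3):
    """Report every location whose membership reaches the threshold, so a
    protein can be assigned to more than one site."""
    return sorted(loc for loc, mu in memberships.items() if mu >= threshold)
```

Because the output is a membership per location rather than a single label, a protein annotated at several sites in the training data can propagate all of them to its neighbours, which is what makes the approach suitable for multi-location proteins.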
Affiliation(s)
- Thai Quang Tung
- Department of Bio & Brain Engineering, KAIST, Daejeon City, Republic of Korea.
|
244
|
Abstract
BACKGROUND Protein subcellular localization prediction is concerned with predicting the location of a protein within a cell using computational methods. The location information can indicate key functionalities of proteins. Accurate predictions of the subcellular localization of proteins can aid the prediction of protein function and genome annotation, as well as the identification of drug targets. Computational methods based on machine learning, such as support vector machine approaches, have already been widely used in the prediction of protein subcellular localization. However, a major drawback of these machine learning-based approaches is that a large amount of data must be labeled for the prediction system to learn a classifier with good generalization ability. In real-world cases, it is laborious, expensive and time-consuming to determine the subcellular localization of a protein experimentally and to prepare instances of labeled data. RESULTS In this paper, we present an approach based on a new learning framework, semi-supervised learning, which can use far fewer labeled instances to construct a high-quality prediction model. We first construct an initial classifier using a small set of labeled examples, and then use unlabeled instances to refine the classifier for future predictions. CONCLUSION Experimental results show that our methods can effectively reduce the workload of labeling data by using the unlabeled data. Our method is shown to enhance the state-of-the-art prediction results of SVM classifiers by more than 10%.
Affiliation(s)
- Qian Xu
- Program of Bioengineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
|
245
|
Gao QB, Jin ZC, Ye XF, Wu C, He J. Prediction of nuclear receptors with optimal pseudo amino acid composition. Anal Biochem 2009; 387:54-9. [PMID: 19454254 DOI: 10.1016/j.ab.2009.01.018] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 09/06/2008] [Revised: 12/04/2008] [Accepted: 01/09/2009] [Indexed: 10/21/2022]
Abstract
Nuclear receptors are involved in multiple cellular signaling pathways that affect and regulate processes such as organ development and maintenance, ion transport, homeostasis, and apoptosis. In this article, an optimal pseudo amino acid composition based on physicochemical properties of amino acids is suggested to represent proteins for predicting the subfamilies of nuclear receptors. Six physicochemical properties of amino acids were adopted to generate the protein sequence features via the web server PseAAC. The optimal values of the rank of the correlation factor and of the weighting factor for PseAAC were determined to obtain the protein descriptor that leads to the best performance. A nonredundant dataset of nuclear receptors in four subfamilies was constructed to evaluate the method using support vector machines. An overall accuracy of 99.6% was achieved in the fivefold cross-validation test as well as the jackknife test, and an overall accuracy of 98.4% was reached in a blind dataset test. The performance is very competitive with that of some previous methods.
Affiliation(s)
- Qing-Bin Gao
- Department of Health Statistics, Second Military Medical University, Shanghai 200433, China
|
246
|
Improving Protein Localization Prediction Using Amino Acid Group Based Physichemical Encoding. 2009. [DOI: 10.1007/978-3-642-00727-9_24] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/25/2023]
|
247
|
Elstner M, Andreoli C, Klopstock T, Meitinger T, Prokisch H. The mitochondrial proteome database: MitoP2. Methods Enzymol 2009; 457:3-20. [PMID: 19426859 DOI: 10.1016/s0076-6879(09)05001-0] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Indexed: 01/05/2023]
Abstract
Defining the mitochondrial proteome is a prerequisite for fully understanding the organelle's function as well as the mechanisms underlying mitochondrial pathology. The core functions of mitochondria include oxidative phosphorylation, amino acid metabolism, fatty acid oxidation, and ion homeostasis. In addition to these well-known functions, many crucial properties in cell signaling, cell differentiation and cell death are only now being elucidated, and with them the proteins involved. With the wealth of information arriving from single protein studies and sophisticated genome-wide approaches, MitoP2 was designed and is maintained to consolidate knowledge on mitochondrial proteins in one comprehensive database, thus making all pertinent data readily accessible (http://www.mitop2.de). Although the identification of the human mitochondrial proteome is ultimately the prime objective, other species, including Saccharomyces cerevisiae, mouse, Arabidopsis thaliana, and Neurospora crassa, are integrated so that orthology between these species can be interrogated. Data from genome-wide studies can be individually retrieved and are also processed by a support vector machine (SVM) to generate a score that indicates the likelihood of a candidate protein having a mitochondrial location. Manually validated proteins constitute the reference set of the database, which contains over 590 yeast, 920 human, and 1020 mouse entries and is used for benchmarking the SVM score. Multiple search options allow for the interrogation of the reference set, candidates, disease-related proteins, and chromosome locations, as well as the availability of mouse models. Taken together, MitoP2 is a valuable tool for basic scientists, geneticists, and clinicians who are investigating mitochondrial physiology and dysfunction.
Affiliation(s)
- M Elstner
- Institute of Human Genetics, Helmholtz Zentrum Munich-German Research Center for Environmental Health, Neuherberg, Germany
|
248
|
Wichadakul D, McDermott J, Samudrala R. Prediction and integration of regulatory and protein-protein interactions. Methods Mol Biol 2009; 541:101-43. [PMID: 19381527 DOI: 10.1007/978-1-59745-243-4_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/18/2023]
Abstract
Knowledge of transcriptional regulatory interactions (TRIs) is essential for exploring functional genomics and systems biology in any organism. While several results from genome-wide analysis of transcriptional regulatory networks are available, they are limited to model organisms such as yeast (1) and worm (2). Beyond these networks, experiments on TRIs study only individual genes and proteins of specific interest. In this chapter, we present a method for the integration of various data sets to predict TRIs for 54 organisms in the Bioverse (3). We describe how to compile and handle various formats and identifiers of data sets from different sources and how to predict TRIs using a homology-based approach, utilizing the compiled data sets. Integrated data sets include experimentally verified TRIs, binding sites of transcription factors, promoter sequences, protein subcellular localization, and protein families. Predicted TRIs expand the networks of gene regulation for a large number of organisms. The integration of experimentally verified and predicted TRIs with other known protein-protein interactions (PPIs) gives insight into specific pathways, network motifs, and the topological dynamics of an integrated network with gene expression under different conditions, essential for exploring functional genomics and systems biology.
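The core idea of a homology-based prediction can be sketched in a few lines: a verified interaction in one organism is transferred to another organism when both the transcription factor and its target have homologues there. This is a simplified illustration only. The function name, the gene identifiers, and the plain dictionary of homologues are assumptions, and the additional evidence the chapter lists (binding sites, promoter sequences, localization, protein families) is not modelled.

```python
def transfer_tris(known_tris, homologs):
    """Predict TRIs in a target organism from verified TRIs in a source organism.

    known_tris: iterable of (transcription_factor, target_gene) pairs
                verified in the source organism.
    homologs:   dict mapping a source-organism gene to the set of its
                homologues in the target organism.
    Returns the set of predicted (transcription_factor, target_gene)
    pairs in the target organism.
    """
    predicted = set()
    for tf, target in known_tris:
        # An interaction is transferred only when both partners map to
        # the target organism; every homologue pair is a candidate.
        for tf_homolog in homologs.get(tf, ()):
            for target_homolog in homologs.get(target, ()):
                predicted.add((tf_homolog, target_homolog))
    return predicted
```

In practice each predicted pair would carry a confidence derived from the strength of the two homology relationships, and filters such as shared subcellular localization could be applied before the pair is added to the network.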
|
249
|
|
250
|
Urban A, Behm-Ansmant I, Branlant C, Motorin Y. RNA sequence and two-dimensional structure features required for efficient substrate modification by the Saccharomyces cerevisiae RNA:Ψ-synthase Pus7p. J Biol Chem 2008; 284:5845-58. [PMID: 19114708 DOI: 10.1074/jbc.m807986200] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022] Open
Abstract
The RNA:pseudouridine (Ψ) synthase Pus7p of Saccharomyces cerevisiae is a multisite-specific enzyme that is able to modify U(13) in several yeast tRNAs, U(35) in the pre-tRNA(Tyr) (GΨA), U(35) in U2 small nuclear RNA, and U(50) in 5 S rRNA. Pus7p belongs to the universally conserved TruD-like family of RNA:Ψ-synthases found in bacteria, archaea, and eukarya. Although several RNA substrates for yeast Pus7p have been identified, specificity of their recognition and modification has not been studied. However, conservation of a 7-nt-long sequence, including the modified U residue, in all natural Pus7p substrates suggested the importance of these nucleotides for Pus7p recognition and/or catalysis. Using site-directed mutagenesis, we designed a set of RNA variants derived from the yeast tRNA(Asp)(GUC), pre-tRNA(Tyr)(GΨA), and U2 small nuclear RNA and tested their ability to be modified by Pus7p in vitro. We demonstrated that the highly conserved U(-2) and A(+1) residues (nucleotide numbers refer to target U(0)) are crucial identity elements for efficient modification by Pus7p. Nucleotide substitutions at other surrounding positions (-4, -3, +2, +3) have only a moderate effect. Surprisingly, the identity of the nucleotide immediately 5' to the target U(0) residue (position -1) is not important for efficient modification. Alteration of tRNA three-dimensional structure had no detectable effect on Pus7p activity at position 13. However, our results suggest that the presence of at least one stem-loop structure including or close to the target U nucleotide is required for Pus7p-catalyzed modification.
Affiliation(s)
- Alan Urban
- Laboratoire de Maturation des ARN et Enzymologie Moléculaire, UMR 7567, CNRS-UHP Nancy I, Nancy Université, 54506 Vandoeuvre-les-Nancy Cedex, France
|