Reference Citation Analysis

For: Zhang T, Zhang H, Chen K, Shen S, Ruan J, Kurgan L. Accurate sequence-based prediction of catalytic residues. Bioinformatics 2008;24:2329-38. [PMID: 18710875 DOI: 10.1093/bioinformatics/btn433] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Indexed: 11/12/2022]



Cited by Other Article(s)

Xiong Y, Xia J, Zhang W, Liu J. Exploiting a reduced set of weighted average features to improve prediction of DNA-binding residues from 3D structures. PLoS One 2011;6:e28440. [PMID: 22174808 PMCID: PMC3234263 DOI: 10.1371/journal.pone.0028440] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 09/22/2011] [Accepted: 11/08/2011] [Indexed: 01/29/2023]

Gaussian kernel optimization: Complex problem and a simple solution. Neurocomputing 2011. [DOI: 10.1016/j.neucom.2011.07.017] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]

Jeon J, Nam HJ, Choi YS, Yang JS, Hwang J, Kim S. Molecular evolution of protein conformational changes revealed by a network of evolutionarily coupled residues. Mol Biol Evol 2011;28:2675-85. [PMID: 21470969 DOI: 10.1093/molbev/msr094] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 12/25/2022]

Dou Y, Geng X, Gao H, Yang J, Zheng X, Wang J. Sequence Conservation in the Prediction of Catalytic Sites. Protein J 2011;30:229-39. [PMID: 21465136 DOI: 10.1007/s10930-011-9324-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]

Novel feature for catalytic protein residues reflecting interactions with other residues. PLoS One 2011;6:e16932. [PMID: 21468322 PMCID: PMC3066176 DOI: 10.1371/journal.pone.0016932] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/26/2010] [Accepted: 01/10/2011] [Indexed: 11/29/2022]

Prymula K, Jadczyk T, Roterman I. Catalytic residues in hydrolases: analysis of methods designed for ligand-binding site prediction. J Comput Aided Mol Des 2010;25:117-33. [PMID: 21104192 PMCID: PMC3032897 DOI: 10.1007/s10822-010-9402-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 08/05/2010] [Accepted: 11/08/2010] [Indexed: 11/26/2022]

Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification. PLoS Comput Biol 2010;6:e1000978. [PMID: 21079665 PMCID: PMC2973806 DOI: 10.1371/journal.pcbi.1000978] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/19/2010] [Accepted: 09/27/2010] [Indexed: 11/19/2022]

Abstract

Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the Catalytic Site Atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
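
As a rough illustration of the per-column scores named in this abstract, the Python sketch below computes the Shannon entropy of an alignment column, a Kullback-Leibler conservation score against a background distribution, and the mutual information between two columns. The toy alignment, the uniform background, and all function names are assumptions made for illustration. The pMI, pC and Cls scores also depend on structural neighbourhoods and on details the abstract does not give, so this is not the authors' implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column):
    """Relative frequency of each residue in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return {aa: n / total for aa, n in counts.items()}


def shannon_entropy(column):
    """Shannon entropy in bits; 0 means the column is fully conserved."""
    return -sum(p * math.log2(p) for p in column_frequencies(column).values())


def kl_conservation(column, background):
    """Kullback-Leibler divergence (bits) of the column from a background."""
    freqs = column_frequencies(column)
    return sum(p * math.log2(p / background[aa]) for aa, p in freqs.items())


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two columns of the same alignment."""
    n = len(col_a)
    pa = column_frequencies(col_a)
    pb = column_frequencies(col_b)
    joint = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / (pa[a] * pb[b]))
    return mi


# Hypothetical toy alignment: gap-free, four sequences, three positions.
alignment = ["HDK", "HER", "HDK", "HER"]
cols = list(zip(*alignment))

# Assumed uniform background; real work would use observed frequencies.
background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

for i, col in enumerate(cols):
    print(i, shannon_entropy(col), kl_conservation(col, background))
print("MI(1, 2) =", mutual_information(cols[1], cols[2]))
```

In the toy data, position 0 is invariant while positions 1 and 2 vary together, so conservation alone highlights position 0 and mutual information highlights the coupled pair. This is the kind of complementary signal the abstract describes.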


Zhu L, Yang J, Song JN, Chou KC, Shen HB. Improving the accuracy of predicting disulfide connectivity by feature selection. J Comput Chem 2010;31:1478-85. [PMID: 20127740 DOI: 10.1002/jcc.21433] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 01/23/2023]

Horst JA, Samudrala R. A protein sequence meta-functional signature for calcium binding residue prediction. Pattern Recognit Lett 2010;31:2103-2112. [PMID: 20824111 PMCID: PMC2932634 DOI: 10.1016/j.patrec.2010.04.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]

Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T. Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010;11:365. [PMID: 20594344 PMCID: PMC2909224 DOI: 10.1186/1471-2105-11-365] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/16/2010] [Accepted: 07/01/2010] [Indexed: 11/16/2022]

Abstract

Background

The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large scale binding site prediction with greater emphasis on their biological validity.

Results

We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation onto protein sequences without experimental data available. To ensure biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones.
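
To make the PSSM step in the paragraph above concrete, here is a minimal Python sketch that builds a log-odds position specific score matrix from a set of aligned binding-site residues and uses it to score a candidate site. The example sites, the pseudocount, the uniform background, and the function names `build_pssm` and `score_site` are all assumptions for illustration. The abstract does not describe how the authors construct or weight their PSSMs, and their ranking also uses other measures not shown here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sites, pseudocount=1.0):
    """Log-odds PSSM (one dict per position) from equal-length site strings."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    total = len(aligned_sites) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(aligned_sites[0])):
        counts = Counter(site[pos] for site in aligned_sites)
        row = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(row)
    return pssm


def score_site(pssm, query_site):
    """Sum of per-position log-odds scores for a candidate site."""
    return sum(row[aa] for row, aa in zip(pssm, query_site))


# Hypothetical cluster of aligned binding-site residues from homologs.
cluster = ["GKS", "GKT", "GKS", "AKS"]
pssm = build_pssm(cluster)

# Candidates that resemble the cluster should rank above unrelated ones.
candidates = ["GKS", "AKT", "WWW"]
ranked = sorted(candidates, key=lambda s: score_site(pssm, s), reverse=True)
print(ranked)
```

The pseudocount keeps unseen residues from producing an infinite penalty, which matters when a cluster contains only a few homologous sites.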

Conclusions

A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.


Hung SS, Wasmuth J, Sanford C, Parkinson J. DETECT--a density estimation tool for enzyme classification and its application to Plasmodium falciparum. Bioinformatics 2010;26:1690-8. [PMID: 20513663 DOI: 10.1093/bioinformatics/btq266] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]

Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids 2010;39:1353-61. [DOI: 10.1007/s00726-010-0587-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 10/18/2009] [Accepted: 03/27/2010] [Indexed: 10/19/2022]

Cilia E, Passerini A. Automatic prediction of catalytic residues by modeling residue structural neighborhood. BMC Bioinformatics 2010;11:115. [PMID: 20199672 PMCID: PMC2844391 DOI: 10.1186/1471-2105-11-115] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 02/06/2009] [Accepted: 03/03/2010] [Indexed: 02/05/2023]

Ko S, Lee H. Integrative approaches to the prediction of protein functions based on the feature selection. BMC Bioinformatics 2009;10:455. [PMID: 20043848 PMCID: PMC2813249 DOI: 10.1186/1471-2105-10-455] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/28/2009] [Accepted: 12/31/2009] [Indexed: 01/30/2023]

Abstract

Background

Protein function prediction has been one of the most important issues in functional genomics. With the current availability of various genomic data sets, many researchers have attempted to develop integration models that combine all available genomic data for protein function prediction. These efforts have resulted in the improvement of prediction quality and the extension of prediction coverage. However, it has also been observed that integrating more data sources does not always increase the prediction quality. Therefore, selecting data sources that highly contribute to the protein function prediction has become an important issue.

Results

We present systematic feature selection methods that assess the contribution of genome-wide data sets to predict protein functions and then investigate the relationship between genomic data sources and protein functions. In this study, we use ten different genomic data sources in Mus musculus, including: protein-domains, protein-protein interactions, gene expressions, phenotype ontology, phylogenetic profiles and disease data sources to predict protein functions that are labelled with Gene Ontology (GO) terms. We then apply two approaches to feature selection: exhaustive search feature selection using a kernel based logistic regression (KLR), and a kernel based L1-norm regularized logistic regression (KL1LR). In the first approach, we exhaustively measure the contribution of each data set for each function based on its prediction quality. In the second approach, we use the estimated coefficients of features as measures of contribution of data sources. Our results show that the proposed methods improve the prediction quality compared to the full integration of all data sources and other filter-based feature selection methods. We also show that contributing data sources can differ depending on the protein function. Furthermore, we observe that highly contributing data sets can be similar among a group of protein functions that have the same parent in the GO hierarchy.
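
The second approach above reads feature contributions off the coefficients of an L1-regularized model. The Python sketch below illustrates that idea with a plain (non-kernel) logistic regression fitted by proximal gradient descent, which is a simpler stand-in for the kernel-based KL1LR named in the abstract. The toy data, the hyperparameters, the omission of a bias term, and the function names are assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, amount):
    """Proximal operator of the L1 penalty: shrink towards zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, lr=0.5, steps=500):
    """L1-regularized logistic regression without bias, via proximal gradient."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - label
            for row, label in zip(X, y)
        ]
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                for j in range(d)]
        # Gradient step on the logistic loss, then L1 shrinkage.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


# Hypothetical data: feature 0 tracks the label, feature 1 is unrelated to it.
X = [(1, 1), (1, -1), (1, 1), (1, -1), (-1, 1), (-1, -1), (-1, 1), (-1, -1)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

The L1 penalty drives the coefficient of the uninformative feature to exactly zero, so the surviving nonzero coefficients act as a selection of contributing data sources.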

Conclusions

In contrast to previous integration methods, our approaches not only increase the prediction quality but also gather information about highly contributing data sources for each protein function. This information can help researchers collect relevant data sources for annotating protein functions.


Liu B, Wang X, Lin L, Tang B, Dong Q, Wang X. Prediction of protein binding sites in protein structures using hidden Markov support vector machine. BMC Bioinformatics 2009;10:381. [PMID: 19925685 PMCID: PMC2785799 DOI: 10.1186/1471-2105-10-381] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 03/08/2009] [Accepted: 11/20/2009] [Indexed: 01/08/2023]

Abstract

Background

Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines and conditional random fields. However, the prediction performance is still too low to be used in practice. It is necessary to explore new algorithms, theories and features to further improve the performance.

Results

In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods.

Conclusion

The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to the following three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous to this field. Thirdly, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples when the cutting-plane algorithm is used.


Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J. SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009;10:379. [PMID: 19922660 PMCID: PMC2783165 DOI: 10.1186/1471-2105-10-379] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 06/05/2009] [Accepted: 11/18/2009] [Indexed: 01/31/2023]

Schmidt T, Haas J, Cassarino TG, Schwede T. Assessment of ligand binding residue predictions in CASP8. Proteins 2009;77 Suppl 9:138-46. [PMID: 19714771 PMCID: PMC3204792 DOI: 10.1002/prot.22557] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/07/2022]