51
Xiong Y, Xia J, Zhang W, Liu J. Exploiting a reduced set of weighted average features to improve prediction of DNA-binding residues from 3D structures. PLoS One 2011; 6:e28440. [PMID: 22174808 PMCID: PMC3234263 DOI: 10.1371/journal.pone.0028440] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 09/22/2011] [Accepted: 11/08/2011] [Indexed: 01/29/2023]
Abstract
Predicting DNA-binding residues from a protein three-dimensional structure is a key task of computational structural proteomics. In the present study, based on machine learning technology, we aim to explore a reduced set of weighted average features for improving prediction of DNA-binding residues on protein surfaces. By constructing the spatial environment around a DNA-binding residue, we first propose a novel weighting factor to quantify the distance-dependent contribution of each neighboring residue in determining the location of a binding residue. Then, a weighted average scheme is introduced to represent the surface patch of the residue under consideration. Finally, the classifier is trained on the reduced set of these weighted average features, consisting of evolutionary profile, interface propensity, betweenness centrality and solvent surface area of the side chain. Experimental results on 5-fold cross-validation and independent tests indicate that the new feature set is effective in describing DNA-binding residues and that our approach performs significantly better than two previous methods. Furthermore, a brief case study suggests that the weighted average features are powerful for identifying DNA-binding residues and are promising for further study of the protein structure-function relationship. The source code and datasets are available upon request.
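The abstract does not state the weighting factor itself, so the following Python sketch only illustrates the general idea of a distance-weighted average over a surface patch. The linear decay to zero at a cutoff radius, the 10 Å default cutoff and the function name `weighted_average_feature` are all hypothetical choices, not the authors' formula.

```python
import math


def weighted_average_feature(center, neighbors, cutoff=10.0):
    """Distance-weighted average of one per-residue feature over a surface patch.

    center:    (x, y, z) coordinate of the residue under consideration.
    neighbors: iterable of ((x, y, z), feature_value) pairs; may include the
               center residue itself at distance 0.
    cutoff:    hypothetical patch radius in angstroms.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for coord, value in neighbors:
        d = math.dist(center, coord)
        if d >= cutoff:
            continue  # outside the patch (weight would be <= 0)
        # Hypothetical weight: 1 at the center, decaying linearly to 0 at the cutoff.
        w = 1.0 - d / cutoff
        total_weight += w
        weighted_sum += w * value
    return weighted_sum / total_weight if total_weight > 0 else 0.0


# The neighbor 5 Å away counts half as much as the center; the one 20 Å away is ignored.
patch = [((0.0, 0.0, 0.0), 1.0), ((5.0, 0.0, 0.0), 3.0), ((20.0, 0.0, 0.0), 100.0)]
print(weighted_average_feature((0.0, 0.0, 0.0), patch))
```

Any monotonically decreasing weight could be substituted; the point is only that closer residues contribute more to the patch descriptor.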
Affiliation(s)
- Yi Xiong
- School of Computer, Wuhan University, Wuhan, China
- Junfeng Xia
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Wen Zhang
- School of Computer, Wuhan University, Wuhan, China
- Juan Liu
- School of Computer, Wuhan University, Wuhan, China
52
53
Jeon J, Nam HJ, Choi YS, Yang JS, Hwang J, Kim S. Molecular evolution of protein conformational changes revealed by a network of evolutionarily coupled residues. Mol Biol Evol 2011; 28:2675-85. [PMID: 21470969 DOI: 10.1093/molbev/msr094] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 12/25/2022]
Abstract
An improved understanding of protein conformational changes has broad implications for elucidating the mechanisms of various biological processes and for the design of protein engineering experiments. Understanding rearrangements of residue interactions is a key component in the challenge of describing structural transitions. Evolutionary properties of protein sequences and structures have been extensively studied; however, the evolution of protein motions, especially with respect to interaction rearrangements, has yet to be explored. Here, we investigated the relationship between sequence evolution and protein conformational changes and discovered that structural transitions are encoded in amino acid sequences as coevolving residue pairs. Furthermore, we found that highly coevolving residues are clustered in the flexible regions of proteins and facilitate structural transitions by forming and disrupting their interactions cooperatively. Our results provide insight into the evolution of protein conformational changes and help to identify residues important for structural transitions.
Affiliation(s)
- Jouhyun Jeon
- Division of Molecular and Life Science, Pohang University of Science and Technology, Pohang, Korea
54
Dou Y, Geng X, Gao H, Yang J, Zheng X, Wang J. Sequence Conservation in the Prediction of Catalytic Sites. Protein J 2011; 30:229-39. [PMID: 21465136 DOI: 10.1007/s10930-011-9324-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
55
Novel feature for catalytic protein residues reflecting interactions with other residues. PLoS One 2011; 6:e16932. [PMID: 21468322 PMCID: PMC3066176 DOI: 10.1371/journal.pone.0016932] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/26/2010] [Accepted: 01/10/2011] [Indexed: 11/29/2022]
Abstract
Owing to their potential for systematic analysis, complex networks have been widely used in proteomics. Representing a protein structure as a topological network provides novel insight into protein folding mechanisms, stability and function. Here, we develop a new feature that reveals correlations between residues using a protein structure network. In an original attempt to quantify the effects of several key residues on catalytic residues, a power function was used to model interactions between residues. The results indicate that focusing on a few residues is a feasible approach to identifying catalytic residues. The spatial environment surrounding a catalytic residue was analyzed in a layered manner. We present evidence that correlation between residues is related to their distance apart, that (i) most environmental parameters of the outer layer make a smaller contribution to prediction, and that (ii) catalytic residues tend to be located near key positions in enzyme folds. Feature analysis revealed satisfactory performance for our features, which were combined with several conventional features in a prediction model for catalytic residues using a comprehensive data set from the Catalytic Site Atlas. Values of 88.6% for sensitivity and 88.4% for specificity were obtained by 10-fold cross-validation. These results suggest that these features reveal the mutual dependence of residues and are promising for further study of the structure-function relationship.
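Sensitivity and specificity are the fractions of catalytic and non-catalytic residues, respectively, that a classifier labels correctly. The sketch below computes both from binary labels and shows one simple way to split a data set into k folds for cross-validation. The helper names are illustrative, and the contiguous, unshuffled folds are a simplifying assumption; real evaluations usually shuffle or stratify first.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = catalytic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Each fold serves once as the test set while the remaining folds are used for training.
print(sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1]))
print([len(fold) for fold in k_fold_indices(23, 10)])
```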
56
Prymula K, Jadczyk T, Roterman I. Catalytic residues in hydrolases: analysis of methods designed for ligand-binding site prediction. J Comput Aided Mol Des 2010; 25:117-33. [PMID: 21104192 PMCID: PMC3032897 DOI: 10.1007/s10822-010-9402-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 08/05/2010] [Accepted: 11/08/2010] [Indexed: 11/26/2022]
Abstract
The comparison of eight tools applicable to ligand-binding site prediction is presented. The methods examined cover three types of approaches: the geometrical (CASTp, PASS, Pocket-Finder), the physicochemical (Q-SiteFinder, FOD) and the knowledge-based (ConSurf, SuMo, WebFEATURE). The accuracy of predictions was measured with reference to the catalytic residues documented in the Catalytic Site Atlas. The test was performed on a set comprising selected chains of hydrolases. The results were analysed with regard to the size, polarity, secondary structure and solvent accessible area of the predicted sites, as well as parameters commonly used in machine learning (F-measure, MCC). The relative accuracies of predictions are presented in the ROC space, allowing determination of the optimal methods by means of the ROC convex hull. Additionally, a minimum expected cost analysis was performed. Both advantages and disadvantages of the eight methods are presented. A characterization of protein chains with respect to the level of difficulty of active site prediction is introduced. The main reasons for failures are discussed. Overall, SuMo offers the best performance, followed by FOD, while Pocket-Finder is the best method among the geometrical approaches.
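For reference, the F-measure, MCC and ROC-space coordinates mentioned above all follow from the four confusion-matrix counts. A minimal Python sketch (the function name `classification_metrics` is illustrative, not taken from the paper):

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """F-measure, Matthews correlation coefficient (MCC) and ROC point (FPR, TPR)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # = TPR (sensitivity)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0  # = 1 - specificity
    return {"f_measure": f_measure, "mcc": mcc, "roc_point": (fpr, recall)}


print(classification_metrics(tp=2, fp=1, tn=3, fn=1))
```

Each method evaluated at one threshold yields one `roc_point`; the ROC convex hull is then taken over the points of all methods.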
Affiliation(s)
- Katarzyna Prymula
- Faculty of Chemistry, Jagiellonian University, 3 Ingardena Street, 30-060 Krakow, Poland
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, 7E Kopernika Street, 31-034 Krakow, Poland
- Tomasz Jadczyk
- Department of Electronics, AGH University of Science and Technology, 30 Mickiewicza Avenue, 30-059 Krakow, Poland
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, 16 Lazarza Street, 31-530 Krakow, Poland
57
Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification. PLoS Comput Biol 2010; 6:e1000978. [PMID: 21079665 PMCID: PMC2973806 DOI: 10.1371/journal.pcbi.1000978] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/19/2010] [Accepted: 09/27/2010] [Indexed: 11/19/2022]
Abstract
Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close-proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from non-functional conserved residues. Using a data set of 434 Pfam families included in the Catalytic Site Atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measure was shown to significantly outperform both the Shannon entropy and maximal frequency measures. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor of CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and shown to carry information distinct from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
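As an illustration of the conservation and coupling measures named above, the following sketch computes Shannon entropy, a KL-divergence conservation score and mutual information for columns of a multiple sequence alignment. The uniform background distribution, the pseudocount of 1 for the KL score and the skipping of gaps are assumptions made here; the paper's exact estimators (and its pMI and Cls scores) may differ.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=0.0):
    """Amino acid frequencies in one alignment column; gaps and unknowns are skipped."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def shannon_entropy(column):
    """Shannon entropy in bits; lower values mean a more conserved column."""
    freqs = column_frequencies(column)
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def kl_conservation(column, background=None, pseudocount=1.0):
    """KL divergence (bits) of column frequencies from a background distribution;
    higher values mean a more conserved column."""
    if background is None:
        # Assumption: uniform background over the 20 standard amino acids.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    freqs = column_frequencies(column, pseudocount)
    return sum(p * math.log2(p / background[aa]) for aa, p in freqs.items() if p > 0)


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two columns, over rows with no gap in either."""
    pairs = [(a, b) for a, b in zip(col_i, col_j)
             if a in AMINO_ACIDS and b in AMINO_ACIDS]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum((c / n) * math.log2((c / n) / ((left[a] / n) * (right[b] / n)))
               for (a, b), c in joint.items())


# Columns are read top to bottom across the aligned sequences.
print(shannon_entropy("AAAA"), kl_conservation("AAAA"), mutual_information("AACC", "DDEE"))
```

Two perfectly co-varying columns (such as `"AACC"` and `"DDEE"`) share the maximum MI for their composition, while independent columns score near zero.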
58
Zhu L, Yang J, Song JN, Chou KC, Shen HB. Improving the accuracy of predicting disulfide connectivity by feature selection. J Comput Chem 2010; 31:1478-85. [PMID: 20127740 DOI: 10.1002/jcc.21433] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 01/23/2023]
Abstract
Disulfide bonds are primary covalent cross-links formed between two cysteine residues in the same or different protein polypeptide chains, and they play important roles in the folding and stability of proteins. However, computational prediction of disulfide connectivity directly from protein primary sequences is challenging because of the nonlocal nature of disulfide bonds in the context of sequences, and because the number of possible disulfide patterns grows exponentially as the number of cysteine residues increases. In previous studies, disulfide connectivity prediction was usually performed in a high-dimensional feature space, which can cause a variety of problems in statistical learning, such as the curse of dimensionality, overfitting and feature redundancy. In this study, we propose an efficient feature selection technique for analyzing the importance of each feature component. On the basis of this approach, we selected the most important features for predicting the connectivity pattern of intra-chain disulfide bonds. Our results show that the high-dimensional features contain redundant information, and that the prediction performance can be further improved when these features are reduced to a lower-dimensional, more compact space. Our results also indicate that global protein features contribute little to the formation and prediction of disulfide bonds, whereas local sequential and structural information plays an important role. These findings provide important insights for structural studies of disulfide-rich proteins.
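To make the growth in possible patterns concrete: assuming all 2n cysteines in a chain form n intra-chain bonds, the number of distinct connectivity patterns equals the number of perfect matchings on 2n points, (2n − 1)!! = 1 × 3 × 5 × … × (2n − 1) = (2n)! / (2^n n!). A short Python check (the function name is illustrative):

```python
import math


def num_disulfide_patterns(num_bonds):
    """Ways to pair 2*num_bonds cysteines into num_bonds disulfide bonds:
    the double factorial (2n - 1)!! = 1 * 3 * 5 * ... * (2n - 1)."""
    count = 1
    for k in range(1, 2 * num_bonds, 2):
        count *= k
    return count


for n in range(1, 6):
    # Cross-check against the closed form (2n)! / (2^n * n!).
    closed_form = math.factorial(2 * n) // (2 ** n * math.factorial(n))
    print(n, num_disulfide_patterns(n), closed_form)
```

For 1 to 5 bonds this gives 1, 3, 15, 105 and 945 candidate patterns, which is why exhaustive scoring of every pattern quickly becomes expensive.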
Affiliation(s)
- Lin Zhu
- Department of Bioinformatics, Institute of Image Processing & Pattern Recognition, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China
59
Horst JA, Samudrala R. A protein sequence meta-functional signature for calcium binding residue prediction. Pattern Recognit Lett 2010; 31:2103-2112. [PMID: 20824111 PMCID: PMC2932634 DOI: 10.1016/j.patrec.2010.04.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]
Abstract
The diversity of characterized protein functions found amongst experimentally interrogated proteins suggests that a vast array of unknown functions remains undiscovered. These protein functions are imparted by specific geometric distributions of amino acid residue chemical moieties, each contributing a functional interaction. We hypothesize that individual residue function contributions are predictable through sequence-analytic, knowledge-based algorithms, and that they can be recombined to understand composite protein function by predicting spatial relations in tertiary structure. We assess the former by training a meta-functional signature algorithm to specifically predict calcium ion binding residues from protein sequence. We estimate the latter by testing for a match between the predictive contribution of positions in predicted secondary structures and patterns of side chain proximity forced by secondary structure moieties. Specific training for calcium binding results in 83% area under the receiver operating characteristic curve added value over random (AUCoR) and p < 10^-300 significance, as measured by Kendall's τ in ten-fold cross-validation, for parallel sets of 811 residues in 336 proteins and 696 residues in 299 proteins. Training for generalized function results in 63% AUCoR and p ≅ 10^-221 for the same tests. Including inference of side chain proximity consistently improves predictive ability by 2% AUCoR. The results demonstrate that protein meta-functional signatures can be trained to predict specific protein functions by considering amino acid identity and structural features accessible from sequence, laying the groundwork for composite sequence-based function site prediction.
Affiliation(s)
- Jeremy A Horst
- Department of Oral Biology, School of Dentistry, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
- Department of Microbiology, School of Medicine, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
- Ram Samudrala
- Department of Oral Biology, School of Dentistry, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
- Department of Microbiology, School of Medicine, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
60
Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T. Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010; 11:365. [PMID: 20594344 PMCID: PMC2909224 DOI: 10.1186/1471-2105-11-365] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/16/2010] [Accepted: 07/01/2010] [Indexed: 11/16/2022]
Abstract
Background The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large-scale binding site prediction with greater emphasis on their biological validity. Results We have developed a new method for the annotation of protein-small molecule binding sites, using inference by homology, which allows us to extend annotation to protein sequences for which no experimental data are available. To ensure the biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites which appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are subsequently used to rank binding sites to assess how well they match the query and to better gauge their biological relevance. The method also facilitates a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison to other binding site prediction methods and to a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate those sites that bind biological small molecules from non-biological ones.
Conclusions A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and for small molecule virtual screening. The method can be applied even for a query sequence without structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
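The abstract does not detail how the PSSMs are built. As a generic illustration only, a textbook log-odds PSSM can be derived from aligned binding-site residues as below; the pseudocount, the uniform background frequencies and the function names `build_pssm` and `score_site` are assumptions, not the IBIS implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sites, pseudocount=1.0, background=None):
    """Build a log-odds PSSM (bits) from equal-length aligned binding-site strings."""
    if not aligned_sites:
        raise ValueError("need at least one aligned site")
    if background is None:
        # Assumption: uniform background frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(aligned_sites[0])
    if any(len(s) != length for s in aligned_sites):
        raise ValueError("all aligned sites must have the same length")
    pssm = []
    for pos in range(length):
        column = [s[pos] for s in aligned_sites if s[pos] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        # Log-odds of the smoothed observed frequency versus the background.
        pssm.append({aa: math.log2(((column.count(aa) + pseudocount) / total) / background[aa])
                     for aa in AMINO_ACIDS})
    return pssm


def score_site(pssm, site):
    """Sum of position-specific log-odds scores; gaps and unknown residues score 0."""
    return sum(row.get(aa, 0.0) for row, aa in zip(pssm, site))


pssm = build_pssm(["HDS", "HDS", "HES"])
print(round(score_site(pssm, "HDS"), 3), round(score_site(pssm, "AAA"), 3))
```

A query site that resembles the aligned sites scores higher than an unrelated one, which is the property used when ranking candidate binding sites.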
Affiliation(s)
- Ratna R Thangudu
- National Center for Biotechnology Information, 8600 Rockville Pike, Building 38A, Bethesda, MD 20894, USA
61
Hung SS, Wasmuth J, Sanford C, Parkinson J. DETECT--a density estimation tool for enzyme classification and its application to Plasmodium falciparum. Bioinformatics 2010; 26:1690-8. [PMID: 20513663 DOI: 10.1093/bioinformatics/btq266] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
MOTIVATION A major challenge in genomics is the accurate annotation of component genes. Enzymes are typically predicted using homology-based search methods, where the membership of a protein to an enzyme family is based on single-sequence comparisons. As such, these methods are often error-prone and lack useful measures of reliability for the prediction. RESULTS Here, we present DETECT, a probabilistic method for enzyme prediction that accounts for the sequence diversity across enzyme families. By comparing the global alignment scores of an unknown protein to those of all known enzymes, an integrated likelihood score can be readily calculated, ranking the reaction classes relevant for that protein. Comparisons to BLAST reveal significant improvements in enzyme annotation accuracy. Applied to Plasmodium falciparum, we identify potential annotation errors and predict novel enzymes of therapeutic interest. AVAILABILITY A standalone application is available from the website: http://www.compsysbio.org/projects/DETECT/
Affiliation(s)
- Stacy S Hung
- Program in Molecular Structure and Function, Hospital for Sick Children, 15-704 MaRS TMDT East, 101 College Street, Toronto, ON M5G 1L7, Canada
62
Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids 2010; 39:1353-61. [DOI: 10.1007/s00726-010-0587-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 10/18/2009] [Accepted: 03/27/2010] [Indexed: 10/19/2022]
63
Cilia E, Passerini A. Automatic prediction of catalytic residues by modeling residue structural neighborhood. BMC Bioinformatics 2010; 11:115. [PMID: 20199672 PMCID: PMC2844391 DOI: 10.1186/1471-2105-11-115] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 02/06/2009] [Accepted: 03/03/2010] [Indexed: 02/05/2023]
Abstract
BACKGROUND Prediction of catalytic residues is a major step in characterizing the function of enzymes. In its simplest formulation, the problem can be cast as a binary classification task at the residue level, predicting whether a residue is directly involved in the catalytic process. The task is hard even when structural information is available, owing to the wide range of roles a functional residue can play and to the large imbalance between the numbers of catalytic and non-catalytic residues. RESULTS We developed an effective representation of structural information by modeling spherical regions around candidate residues and extracting statistics on the properties of their content, such as physico-chemical properties, atomic density, flexibility and the presence of water molecules. We trained an SVM classifier combining our features with sequence-based information and previously developed 3D features, and compared its performance with the most recent state-of-the-art approaches on different benchmark datasets. We further analyzed the discriminant power of the information provided by the presence of heterogens in the residue neighborhood. CONCLUSIONS Our structure-based method achieves consistent improvements on all tested datasets over both sequence-based and structure-based state-of-the-art approaches. Structural neighborhood information is shown to be responsible for these results, and predicting the presence of nearby heterogens seems to be a promising direction for further improvements.
Affiliation(s)
- Elisa Cilia
- Information Engineering and Computer Science Department, via Sommarive 14 - I38100 (Povo) Trento, Italy
- Andrea Passerini
- Information Engineering and Computer Science Department, via Sommarive 14 - I38100 (Povo) Trento, Italy
64
Ko S, Lee H. Integrative approaches to the prediction of protein functions based on the feature selection. BMC Bioinformatics 2009; 10:455. [PMID: 20043848 PMCID: PMC2813249 DOI: 10.1186/1471-2105-10-455] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/28/2009] [Accepted: 12/31/2009] [Indexed: 01/30/2023]
Abstract
Background Protein function prediction has been one of the most important issues in functional genomics. With the current availability of various genomic data sets, many researchers have attempted to develop integration models that combine all available genomic data for protein function prediction. These efforts have resulted in the improvement of prediction quality and the extension of prediction coverage. However, it has also been observed that integrating more data sources does not always increase the prediction quality. Therefore, selecting data sources that highly contribute to the protein function prediction has become an important issue. Results We present systematic feature selection methods that assess the contribution of genome-wide data sets to predict protein functions and then investigate the relationship between genomic data sources and protein functions. In this study, we use ten different genomic data sources in Mus musculus, including: protein-domains, protein-protein interactions, gene expressions, phenotype ontology, phylogenetic profiles and disease data sources to predict protein functions that are labelled with Gene Ontology (GO) terms. We then apply two approaches to feature selection: exhaustive search feature selection using a kernel based logistic regression (KLR), and a kernel based L1-norm regularized logistic regression (KL1LR). In the first approach, we exhaustively measure the contribution of each data set for each function based on its prediction quality. In the second approach, we use the estimated coefficients of features as measures of contribution of data sources. Our results show that the proposed methods improve the prediction quality compared to the full integration of all data sources and other filter-based feature selection methods. We also show that contributing data sources can differ depending on the protein function. 
Furthermore, we observe that highly contributing data sets can be similar among a group of protein functions that have the same parent in the GO hierarchy. Conclusions In contrast to previous integration methods, our approaches not only increase the prediction quality but also gather information about highly contributing data sources for each protein function. This information can help researchers collect relevant data sources for annotating protein functions.
Affiliation(s)
- Seokha Ko
- Department of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea.
65
Liu B, Wang X, Lin L, Tang B, Dong Q, Wang X. Prediction of protein binding sites in protein structures using hidden Markov support vector machine. BMC Bioinformatics 2009; 10:381. [PMID: 19925685 PMCID: PMC2785799 DOI: 10.1186/1471-2105-10-381] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Received: 03/08/2009] [Accepted: 11/20/2009] [Indexed: 01/08/2023]
Abstract
Background Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been mainly based on widely known machine learning techniques, such as artificial neural networks, support vector machines and conditional random fields. However, prediction performance is still too low for practical use. It is necessary to explore new algorithms, theories and features to further improve performance. Results In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including the protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine shows better performance than several state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods. Conclusion The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to three factors. First, the relation between the labels of neighbouring residues is useful for protein binding site prediction. Second, the kernel trick is very advantageous in this field. Third, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples when the cutting-plane algorithm is used.
Affiliation(s)
- Bin Liu
- Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, PR China.
66
Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J. SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009; 10:379. [PMID: 19922660 PMCID: PMC2783165 DOI: 10.1186/1471-2105-10-379] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 06/05/2009] [Accepted: 11/18/2009] [Indexed: 01/31/2023]
Abstract
Background The rate at which protein structures are deposited in the Protein Data Bank surpasses the capacity to characterise them experimentally, and therefore computational methods to analyse these structures have become increasingly important. Identifying the region of the protein most likely to be involved in function is useful in order to gain information about its potential role. There are many available approaches to predict functional sites, but many are not made available via a publicly accessible application. Results Here we present a functional site prediction tool (SitesIdentify), based on combining sequence conservation information with geometry-based cleft identification, that is freely available via a web server. We have shown that SitesIdentify compares favourably to other functional site prediction tools in a comparison of seven methods on a non-redundant set of 237 enzymes with annotated active sites. Conclusion SitesIdentify predicts functional sites with accuracy comparable to its closest available counterpart, and in addition achieves improved accuracy for proteins with few characterised homologues. SitesIdentify is available via a webserver at http://www.manchester.ac.uk/bioinformatics/sitesidentify/
Affiliation(s)
- Tracey Bray
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK.
67
Abstract
Here we detail the assessment process for the binding site prediction category of the eighth Critical Assessment of Protein Structure Prediction experiment (CASP8). Predictions were only evaluated for those targets that bound biologically relevant ligands and were assessed using the Matthews Correlation Coefficient. The results of the analysis clearly demonstrate that three predictors from two groups (Lee and Sternberg) stand out from the rest. A further two groups perform well over subsets of metal binding or nonmetal ligand binding targets. The best methods were able to make consistently reliable predictions based on model structures, though it was noticeable that the two targets that were not well predicted were also the hardest targets. The number of predictors that submitted new methods in this category was highly encouraging and suggests that current technology is at the level that experimental biochemists and structural biologists could benefit from what is clearly a growing field.
Affiliation(s)
- Tobias Schmidt
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jürgen Haas
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Tiziano Gallo Cassarino
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland