Reference Citation Analysis

For: Innis CA, Anand AP, Sowdhamini R. Prediction of functional sites in proteins using conserved functional group analysis. J Mol Biol 2004;337:1053-68. [PMID: 15033369 DOI: 10.1016/j.jmb.2004.01.053] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.9] [Received: 08/26/2003] [Revised: 01/20/2004] [Accepted: 01/28/2004] [Indexed: 11/21/2022]

Cited by Other Article(s)

Chivot L, Mathieux N, Cosson A, Bridier-Nahmias A, Favennec L, Gelly JC, Clain J, Coppée R. CONSTRUCT: an algorithmic tool for identifying functional or structurally important regions in protein tertiary structure. Bioinformatics 2025;41:btaf166. [PMID: 40220324 PMCID: PMC12034385 DOI: 10.1093/bioinformatics/btaf166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Revised: 04/02/2025] [Accepted: 04/10/2025] [Indexed: 04/14/2025]

Snoeck S, Lee HK, Schmid MW, Bender KW, Neeracher MJ, Fernández-Fernández AD, Santiago J, Zipfel C. Leveraging coevolutionary insights and AI-based structural modeling to unravel receptor-peptide ligand-binding mechanisms. Proc Natl Acad Sci U S A 2024;121:e2400862121. [PMID: 39106311 PMCID: PMC11331138 DOI: 10.1073/pnas.2400862121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 07/05/2024] [Indexed: 08/09/2024]

Abstract

Secreted signaling peptides are central regulators of growth, development, and stress responses, but specific steps in the evolution of these peptides and their receptors are not well understood. Also, the molecular mechanisms of peptide-receptor binding are known for only a few examples, primarily because protein structure determination capabilities are available to only a few laboratories worldwide. Plants have evolved a multitude of secreted signaling peptides and corresponding transmembrane receptors. Stress-responsive SERINE RICH ENDOGENOUS PEPTIDES (SCOOPs) were recently identified. Bioactive SCOOPs are proteolytically processed by subtilases and are perceived by the leucine-rich repeat receptor kinase MALE DISCOVERER 1-INTERACTING RECEPTOR-LIKE KINASE 2 (MIK2) in the model plant Arabidopsis thaliana. How SCOOPs and MIK2 have (co)evolved, and how SCOOPs bind to MIK2, are unknown. Using in silico analysis of 350 plant genomes and subsequent functional testing, we revealed the conservation of MIK2 as a SCOOP receptor within the plant order Brassicales. We then leveraged AI-based structural modeling and comparative genomics to identify two conserved putative SCOOP-MIK2 binding pockets across Brassicales MIK2 homologues predicted to interact with the "SxS" motif of otherwise sequence-divergent SCOOPs. Mutagenesis of both predicted binding pockets compromised SCOOP binding to MIK2, SCOOP-induced complex formation between MIK2 and its coreceptor BRASSINOSTEROID INSENSITIVE 1-ASSOCIATED KINASE 1, and SCOOP-induced reactive oxygen species production, thus confirming our in silico predictions. Collectively, in addition to revealing the elusive SCOOP-MIK2 binding mechanism, our analytic pipeline combining phylogenomics, AI-based structural predictions, and experimental biochemical and physiological validation provides a blueprint for the elucidation of peptide ligand-receptor perception mechanisms.

Wang L, Guo S, Zeng B, Wang S, Chen Y, Cheng S, Liu B, Wang C, Wang Y, Meng Q. Draft Genome Assembly and Annotation for Cutaneotrichosporon dermatis NICC30027, an Oleaginous Yeast Capable of Simultaneous Glucose and Xylose Assimilation. Mycobiology 2022;50:69-81. [PMID: 35291590 PMCID: PMC8890563 DOI: 10.1080/12298093.2022.2038844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Revised: 01/10/2022] [Accepted: 02/02/2022] [Indexed: 06/14/2023]

Affiliation(s)

Laiyou Wang, Shuxian Guo, Bo Zeng, Shanshan Wang, Yan Chen, Shuang Cheng, Bingbing Liu, Chunyan Wang: School of Biological and Chemical Engineering, Nanyang Institute of Technology, Nanyang, China; Henan Key Laboratory of Industrial Microbial Resources and Fermentation Technology, Nanyang Institute of Technology, Nanyang, China
Yu Wang: College of Biological Science and Engineering, Jiangxi Agricultural University, Nanchang, China
Qingshan Meng: State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China

Das S, Scholes HM, Sen N, Orengo C. CATH functional families predict functional sites in proteins. Bioinformatics 2021;37:1099-1106. [PMID: 33135053 PMCID: PMC8150129 DOI: 10.1093/bioinformatics/btaa937] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/05/2020] [Revised: 09/30/2020] [Accepted: 10/27/2020] [Indexed: 01/12/2023]

Aubailly S, Piazza F. Cutoff lensing: predicting catalytic sites in enzymes. Sci Rep 2015;5:14874. [PMID: 26445900 PMCID: PMC4597221 DOI: 10.1038/srep14874] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/28/2015] [Accepted: 09/10/2015] [Indexed: 01/12/2023]

Pappalardo M, Wass MN. VarMod: modelling the functional effects of non-synonymous variants. Nucleic Acids Res 2014;42:W331-6. [PMID: 24906884 PMCID: PMC4086131 DOI: 10.1093/nar/gku483] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 02/01/2023]

Phylogenetic Gaussian process model for the inference of functionally important regions in protein tertiary structures. PLoS Comput Biol 2014;10:e1003429. [PMID: 24453956 PMCID: PMC3894161 DOI: 10.1371/journal.pcbi.1003429] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/31/2013] [Accepted: 11/22/2013] [Indexed: 11/30/2022]

Abstract

A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins, and extraordinarily low substitution rates have been used as evidence of function. Most of the existing tools, e.g. Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since functionally important sites tend to cluster together to form functional patches. We have developed a new model, GP4Rate, which combines a Gaussian process model with the standard phylogenetic model to identify slowly evolving regions in protein tertiary structures. GP4Rate uses the Gaussian process to define a nonparametric prior distribution of site-specific substitution rates, which naturally captures the spatial correlation of substitution rates. Simulations suggest that GP4Rate can potentially estimate site-specific substitution rates with much higher accuracy than Rate4Site and tends to report slowly evolving regions rather than individual sites. In addition, GP4Rate can estimate the strength of the spatial correlation of substitution rates from the data. By applying GP4Rate to a set of mammalian B7-1 genes, we found a highly conserved region that coincides with experimental evidence. GP4Rate may be a useful tool for the in silico prediction of functionally important regions in proteins with known structures.

To understand how a protein functions, a critical step is to know which regions of its tertiary structure may be functionally important. Functionally important protein regions are typically more conserved than other regions because mutations in these regions are more likely to be deleterious. A number of phylogenetic models have been developed to identify conserved sites or regions in proteins by comparing protein sequences from multiple species. However, most of these methods treat amino acid sites independently and do not consider the spatial clustering of conserved sites in the protein tertiary structure, which limits their power to identify functional protein regions. We develop a new statistical model, GP4Rate, which combines information from the protein sequences and the protein tertiary structure to infer conserved regions. Through simulations and a case study of B7-1 genes, we demonstrate that GP4Rate outperforms Rate4Site, the most widely used phylogenetic software for inferring functional amino acid sites. GP4Rate is a potentially useful tool for guiding mutagenesis experiments or providing insights into the relationship between protein structure and function.
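To make the idea of a spatially correlated prior on site-specific rates concrete, here is a minimal sketch. It is not GP4Rate: it assumes a squared-exponential kernel on hypothetical Cα coordinates, hypothetical hyperparameters `sigma` and `length_scale`, and a Gaussian process placed on log-rates so that rates stay positive. It builds the covariance matrix, factorizes it with a hand-written Cholesky decomposition, and draws one set of rates from the prior. The phylogenetic likelihood that GP4Rate couples to such a prior is omitted.

```python
import math
import random


def sq_exp_kernel(coords, sigma=1.0, length_scale=8.0, jitter=1e-6):
    """Covariance K[i][j] = sigma^2 * exp(-d_ij^2 / (2 * length_scale^2)).

    coords: list of (x, y, z) positions; using C-alpha atoms is an assumption.
    Kernel form and default hyperparameters are illustrative assumptions.
    A small jitter on the diagonal keeps K numerically positive definite.
    """
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            K[i][j] = sigma ** 2 * math.exp(-d2 / (2.0 * length_scale ** 2))
        K[i][i] += jitter
    return K


def cholesky(K):
    """Lower-triangular L with L @ L^T == K (K must be positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def sample_site_rates(coords, rng, **kernel_args):
    """Draw rates r_i = exp(f_i) with f ~ GP(0, K) (log-rate prior is an assumption)."""
    L = cholesky(sq_exp_kernel(coords, **kernel_args))
    z = [rng.gauss(0.0, 1.0) for _ in coords]
    f = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(coords))]
    return [math.exp(v) for v in f]


# Hypothetical residues: two neighbours 3.8 Å apart and one 30 Å away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (30.0, 0.0, 0.0)]
K = sq_exp_kernel(coords)
print(K[0][1] > K[0][2])  # prints True: nearby residues are more strongly correlated
print(sample_site_rates(coords, random.Random(0)))
```

Because nearby residues share a large covariance, a draw from this prior tends to give them similar rates, which is the property the abstract describes as capturing the spatial correlation of substitution rates.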

Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013;8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022]

Manoharan M, Sankar K, Offmann B, Ramanathan S. Association of Putative Members to Family of Mosquito Odorant Binding Proteins: Scoring Scheme Using Fuzzy Functional Templates and Cys Residue Positions. Bioinform Biol Insights 2013;7:231-51. [PMID: 23908587 PMCID: PMC3728099 DOI: 10.4137/bbi.s11096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/22/2022]

Nemoto W, Toh H. Functional region prediction with a set of appropriate homologous sequences--an index for sequence selection by integrating structure and sequence information with spatial statistics. BMC Struct Biol 2012;12:11. [PMID: 22643026 PMCID: PMC3533907 DOI: 10.1186/1472-6807-12-11] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 08/03/2011] [Accepted: 04/19/2012] [Indexed: 11/17/2022]

Abstract

Background

The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions.

Results

We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods.

Conclusions

Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems.
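The abstract does not spell out the index itself. As a hedged illustration of the general idea of scoring how strongly conserved residues cluster on the tertiary structure, the sketch below uses Moran's I, a standard spatial autocorrelation statistic that is swapped in here and is not necessarily the authors' index. The binary contact weights and the `cutoff` value (in Å) are assumptions, and the coordinates and conservation values in the example are hypothetical.

```python
import math


def contact_weights(coords, cutoff=8.0):
    """Binary spatial weights: w_ij = 1 if residues i != j lie within cutoff (assumed scheme)."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0.0
             for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Moran's I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x).

    Positive values: similar scores (e.g. conservation) sit next to each other in space.
    Negative values: spatial neighbours tend to have dissimilar scores.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / w_total) * num / den


# Hypothetical residues on a line: two groups of three, about 8 Å apart.
coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (12, 0, 0)]
w = contact_weights(coords, cutoff=1.5)
print(morans_i([1, 1, 1, 0, 0, 0], w))  # prints 1.0: conserved residues clustered
print(morans_i([1, 0, 1, 0, 1, 0], w))  # prints -1.0: conserved residues interleaved
```

Under the selection rule the abstract describes, one would compute such a score for each candidate set of homologous sequences (using the conservation values derived from that set) and keep the set with the highest score.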

Dou Y, Wang J, Yang J, Zhang C. L1pred: a sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier. PLoS One 2012;7:e35666. [PMID: 22558194 PMCID: PMC3338704 DOI: 10.1371/journal.pone.0035666] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 01/04/2012] [Accepted: 03/19/2012] [Indexed: 12/01/2022]

LRR conservation mapping to predict functional sites within protein leucine-rich repeat domains. PLoS One 2011;6:e21614. [PMID: 21789174 PMCID: PMC3138743 DOI: 10.1371/journal.pone.0021614] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 03/11/2011] [Accepted: 06/03/2011] [Indexed: 11/19/2022]

Dou Y, Geng X, Gao H, Yang J, Zheng X, Wang J. Sequence Conservation in the Prediction of Catalytic Sites. Protein J 2011;30:229-39. [PMID: 21465136 DOI: 10.1007/s10930-011-9324-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]

Sonavane S, Chakrabarti P. Prediction of active site cleft using support vector machines. J Chem Inf Model 2010;50:2266-73. [PMID: 21080689 DOI: 10.1021/ci1002922] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/28/2023]

Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification. PLoS Comput Biol 2010;6:e1000978. [PMID: 21079665 PMCID: PMC2973806 DOI: 10.1371/journal.pcbi.1000978] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Received: 03/19/2010] [Accepted: 09/27/2010] [Indexed: 11/19/2022]

Abstract

Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to carry out their function. However, many non-catalytic residues are highly conserved, and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close-proximity residues with high mutual information (MI), and that this signature can be used to distinguish functional conserved residues from non-functional ones. Using a data set of 434 Pfam families included in the Catalytic Site Atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
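The three column-level quantities named in this abstract (Shannon entropy, Kullback-Leibler conservation, and mutual information between alignment columns) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes gap-free alignment columns given as strings, uses natural logarithms, and uses a uniform background over the 20 amino acids for the KL score, whereas published conservation measures usually rely on database-derived background frequencies. The proximity-averaged pC and pMI scores and the combined Cls score are not reproduced, and the example columns are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background over the 20 amino acids (illustrative only).
UNIFORM_BG = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_freqs(column):
    """Observed residue frequencies in one gap-free alignment column (a string)."""
    counts = Counter(column)
    return {aa: c / len(column) for aa, c in counts.items()}


def shannon_entropy(column):
    """H = -sum_a p_a ln p_a; 0 for a fully conserved column."""
    return sum(-p * math.log(p) for p in column_freqs(column).values())


def kl_conservation(column, background=UNIFORM_BG):
    """KL divergence sum_a p_a ln(p_a / q_a) from background q; larger = more conserved."""
    return sum(p * math.log(p / background[aa]) for aa, p in column_freqs(column).items())


def mutual_information(col_a, col_b):
    """MI between two columns of the same alignment (sequences in the same order)."""
    n = len(col_a)
    pa, pb = column_freqs(col_a), column_freqs(col_b)
    mi = 0.0
    for (a, b), count in Counter(zip(col_a, col_b)).items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / (pa[a] * pb[b]))
    return mi


# Hypothetical columns from a five-sequence alignment.
conserved, variable = "HHHHH", "AGSTN"
col_x, col_y = "DDEEK", "RRKKE"  # col_y changes whenever col_x changes
print(shannon_entropy(conserved), shannon_entropy(variable))
print(kl_conservation(conserved), kl_conservation(variable))
print(mutual_information(col_x, col_y))
```

With a non-uniform background, a column fixed on a rare residue scores higher than one fixed on a common residue, which is one reason a KL-type score can behave differently from plain entropy.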

Nagao C, Nagano N, Mizuguchi K. Relationships between functional subclasses and information contained in active-site and ligand-binding residues in diverse superfamilies. Proteins 2010;78:2369-84. [PMID: 20544971 DOI: 10.1002/prot.22750] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]

Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids 2010;39:1353-61. [DOI: 10.1007/s00726-010-0587-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 10/18/2009] [Accepted: 03/27/2010] [Indexed: 10/19/2022]

Sankararaman S, Sha F, Kirsch JF, Jordan MI, Sjölander K. Active site prediction using evolutionary and structural information. Bioinformatics 2010;26:617-24. [PMID: 20080507 PMCID: PMC2828116 DOI: 10.1093/bioinformatics/btq008] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Indexed: 11/17/2022]

Dou Y, Zheng X, Wang J. Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 2009;262:317-22. [PMID: 19808039 DOI: 10.1016/j.jtbi.2009.09.030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/01/2009] [Revised: 09/25/2009] [Accepted: 09/25/2009] [Indexed: 11/25/2022]

Nimrod G, Schushan M, Steinberg DM, Ben-Tal N. Detection of functionally important regions in "hypothetical proteins" of known structure. Structure 2009;16:1755-63. [PMID: 19081051 DOI: 10.1016/j.str.2008.10.017] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Received: 04/16/2008] [Revised: 10/16/2008] [Accepted: 10/19/2008] [Indexed: 10/21/2022]

Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Bridging protein local structures and protein functions. Amino Acids 2008;35:627-50. [PMID: 18421562 PMCID: PMC7088341 DOI: 10.1007/s00726-008-0088-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/21/2008] [Accepted: 03/10/2008] [Indexed: 12/11/2022]

Manikandan K, Pal D, Ramakumar S, Brener NE, Iyengar SS, Seetharaman G. Functionally important segments in proteins dissected using Gene Ontology and geometric clustering of peptide fragments. Genome Biol 2008;9:R52. [PMID: 18331637 PMCID: PMC2397504 DOI: 10.1186/gb-2008-9-3-r52] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 11/30/2007] [Revised: 02/24/2008] [Accepted: 03/10/2008] [Indexed: 11/25/2022]

Tong W, Williams RJ, Wei Y, Murga LF, Ko J, Ondrechen MJ. Enhanced performance in prediction of protein active sites with THEMATICS and support vector machines. Protein Sci 2007;17:333-41. [PMID: 18096640 DOI: 10.1110/ps.073213608] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 10/22/2022]

Dunning FM, Sun W, Jansen KL, Helft L, Bent AF. Identification and mutational analysis of Arabidopsis FLS2 leucine-rich repeat domain residues that contribute to flagellin perception. Plant Cell 2007;19:3297-313. [PMID: 17933906 PMCID: PMC2174712 DOI: 10.1105/tpc.106.048801] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Received: 11/04/2006] [Revised: 09/13/2007] [Accepted: 09/19/2007] [Indexed: 05/19/2023]

Innis CA. siteFiNDER|3D: a web-based tool for predicting the location of functional sites in proteins. Nucleic Acids Res 2007;35:W489-94. [PMID: 17553829 PMCID: PMC1933183 DOI: 10.1093/nar/gkm422] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]

How accurate and statistically robust are catalytic site predictions based on closeness centrality? BMC Bioinformatics 2007;8:153. [PMID: 17498304 PMCID: PMC1876251 DOI: 10.1186/1471-2105-8-153] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Received: 12/11/2006] [Accepted: 05/11/2007] [Indexed: 11/25/2022]

Abstract

Background

We examine the accuracy of enzyme catalytic residue predictions from a network representation of protein structure. In this model, amino acid α-carbons specify vertices within a graph and edges connect vertices that are proximal in structure. Closeness centrality, which has shown promise in previous investigations, is used to identify important positions within the network. Closeness centrality, a global measure of network centrality, is calculated as the reciprocal of the average distance between vertex i and all other vertices.

Results

We benchmark the approach against 283 structurally unique proteins within the Catalytic Site Atlas. Our results, which are in line with previous investigations of smaller datasets, indicate that closeness centrality predictions are statistically significant. However, unlike previous approaches, we specifically focus on residues with the very best scores. Over the top five closeness centrality scores, we observe an average true to false positive rate ratio of 6.8 to 1. As demonstrated previously, adding a solvent accessibility filter significantly improves predictive power; the average ratio increases to 15.3 to 1. We also demonstrate (for the first time) that filtering the predictions by residue identity improves the results even more than accessibility filtering. Here, we simply eliminate from consideration residues with physicochemical properties unlikely to be compatible with catalytic requirements. Residue identity filtering improves the average true to false positive rate ratio to 26.3 to 1. Combining the two filters has little effect on the results. Calculated p-values for the three prediction schemes range from 2.7E-9 to less than 8.8E-134. Finally, the sensitivity of the predictions to structure choice and slight perturbations is examined.

Conclusion

Our results resolutely confirm that closeness centrality is a viable prediction scheme whose predictions are statistically significant. Simple filtering schemes substantially improve the method's predictive power. Moreover, no clear effect on performance is observed when comparing ligated and unligated structures. Similarly, the closeness centrality (CC) prediction results are robust to slight structural perturbations from molecular dynamics simulation.
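The network model described in the Background section can be sketched in a few lines. This is a hedged illustration, not the authors' code: it assumes Cα coordinates as input, an unweighted contact graph with a hypothetical distance `cutoff` (in Å), breadth-first shortest paths, and closeness computed over reachable vertices only. The optional `allowed` set mimics the idea of residue-identity filtering; the residue types used in the example are hypothetical and are not the paper's filter.

```python
import math
from collections import deque


def contact_graph(coords, cutoff=7.0):
    """Adjacency lists: an edge joins residues whose C-alpha atoms lie within cutoff (assumed)."""
    n = len(coords)
    return [[j for j in range(n) if j != i and math.dist(coords[i], coords[j]) <= cutoff]
            for i in range(n)]


def closeness(adj):
    """Closeness of vertex i = 1 / (mean shortest-path length from i to the other vertices)."""
    scores = []
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives hop counts in an unweighted graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        others = [d for node, d in dist.items() if node != source]
        scores.append(len(others) / sum(others) if others else 0.0)
    return scores


def top_k(scores, residue_types, k=5, allowed=None):
    """Indices of the k best-scoring residues, optionally keeping only allowed residue types."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if allowed is not None:
        ranked = [i for i in ranked if residue_types[i] in allowed]
    return ranked[:k]


# Hypothetical star-shaped toy structure: residue 0 contacts every other residue.
coords = [(0, 0, 0), (5, 0, 0), (-5, 0, 0), (0, 5, 0), (0, -5, 0)]
types = ["HIS", "GLY", "ASP", "ALA", "SER"]
cc = closeness(contact_graph(coords, cutoff=6.0))
print(cc)  # centre: 1.0, each leaf: 4/7
print(top_k(cc, types, k=2, allowed={"HIS", "ASP", "SER"}))  # prints [0, 2]
```

Ranking by closeness and then dropping disallowed residue types corresponds to the "filter by residue identity" step only in spirit; the paper's actual filter and cutoff should be taken from the article itself.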

Relating destabilizing regions to known functional sites in proteins. BMC Bioinformatics 2007;8:141. [PMID: 17470296 PMCID: PMC1890302 DOI: 10.1186/1471-2105-8-141] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 11/02/2006] [Accepted: 04/30/2007] [Indexed: 11/10/2022]

Abstract

Background

Most methods for predicting functional sites in protein 3D structures rely on information on related proteins and cannot be applied to proteins with no known relatives. Another limitation of these methods is the lack of a well-annotated set of functional sites to use as a benchmark for validating their predictions. Experimental findings and theoretical considerations suggest that residues involved in function often contribute unfavorably to the native state stability. We examine the possibility of systematically exploiting this intrinsic property to identify functional sites using an original procedure that detects destabilizing regions in protein structures. In addition, to relate destabilizing regions to known functional sites, we derive a novel benchmark consisting of a diverse set of hand-curated protein functional sites.

Results

A procedure for detecting clusters of destabilizing residues in protein structures is presented. Individual residue contributions to protein stability are evaluated using detailed atomic models and a force-field successfully applied in computational protein design. The most destabilizing residues, and some of their closest neighbours, are clustered into destabilizing regions following a rigorous protocol. Our procedure is applied to high quality apo-structures of 63 unrelated proteins. The biologically relevant binding sites of these proteins were annotated using all available information, including structural data and literature curation, resulting in the largest hand-curated dataset of binding sites in proteins available to date. Comparing the destabilizing regions with the annotated binding sites in these proteins, we find that the overlap is on average limited, but significantly better than random. Results depend on the type of bound ligand. Significant overlap is obtained for most polysaccharide- and small ligand-binding sites, whereas no overlap is observed for most nucleic acid binding sites. These differences are rationalised in terms of the geometry and energetics of the binding site.

Conclusion

We find that although destabilizing regions as detected here cannot in general be used to predict binding sites in protein structures, they can provide useful information, particularly on the location of functional sites that bind polysaccharides and small ligands. This information can be exploited in methods for predicting function in protein structures with no known relatives. Our publicly available benchmark of hand-curated functional sites in proteins should help other workers derive and validate new prediction methods.
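The Results section reports that the overlap between destabilizing regions and annotated binding sites is limited but significantly better than random. One simple way to frame such a comparison is a randomization test, sketched below under stated assumptions. Overlap is measured here as the fraction of annotated site residues covered by the predicted region, and the null model draws random residue sets of the same size as the prediction. The paper's own overlap measure and significance test may differ, and the residue indices in the example are hypothetical.

```python
import random


def site_coverage(predicted, site):
    """Fraction of annotated site residues that fall inside the predicted region (assumed measure)."""
    return len(set(predicted) & set(site)) / len(site)


def random_overlap_pvalue(predicted, site, n_residues, trials=1000, seed=0):
    """Empirical p-value: how often a random residue set of the same size
    covers the site at least as well as the prediction (assumed null model)."""
    rng = random.Random(seed)
    observed = site_coverage(predicted, site)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(n_residues), len(predicted))
        if site_coverage(sample, site) >= observed:
            hits += 1
    # Add-one correction avoids reporting an impossible p = 0.
    return observed, (hits + 1) / (trials + 1)


site = [10, 11, 12, 40, 41]         # hypothetical annotated binding site
region = [10, 11, 12, 13, 14, 40]   # hypothetical destabilizing region
coverage, p = random_overlap_pvalue(region, site, n_residues=150)
print(coverage, p)
```

Drawing random sets of the same size matters: a larger predicted region would cover more of any site by chance, so the null distribution has to control for region size.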

Selective prediction of interaction sites in protein structures with THEMATICS. BMC Bioinformatics 2007;8:119. [PMID: 17419878 PMCID: PMC1877815 DOI: 10.1186/1471-2105-8-119] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Received: 11/01/2006] [Accepted: 04/09/2007] [Indexed: 11/10/2022]

Abstract

Background

Methods are now available for the prediction of interaction sites in protein 3D structures. While many of these methods report high success rates for site prediction, often these predictions are not very selective and have low precision. Precision in site prediction is addressed using Theoretical Microscopic Titration Curves (THEMATICS), a simple computational method for the identification of active sites in enzymes. Recall and precision are measured and compared with other methods for the prediction of catalytic sites.

Results

Using a test set of 169 enzymes from the original Catalytic Residue Dataset (CatRes) it is shown that THEMATICS can deliver precise, localised site predictions. Furthermore, adjustment of the cut-off criteria can improve the recall rates for catalytic residues with only a small sacrifice in precision. Recall rates for CatRes/CSA annotated catalytic residues are 41.1%, 50.4%, and 54.2% for Z score cut-off values of 1.00, 0.99, and 0.98, respectively. The corresponding precision rates are 19.4%, 17.9%, and 16.4%. The success rate for catalytic sites is higher, with correct or partially correct predictions for 77.5%, 85.8%, and 88.2% of the enzymes in the test set, corresponding to the same respective Z score cut-offs, if only the CatRes annotations are used as the reference set. Incorporation of additional literature annotations into the reference set gives total success rates of 89.9%, 92.9%, and 94.1%, again for corresponding cut-off values of 1.00, 0.99, and 0.98. False positive rates for a 75-protein test set are 1.95%, 2.60%, and 3.12% for Z score cut-offs of 1.00, 0.99, and 0.98, respectively.

CONCLUSION

With a preferred cut-off value of 0.99, THEMATICS achieves a high success rate of interaction site prediction, about 86% correct or partially correct using CatRes/CSA annotations only and about 93% with an expanded reference set. Success rates for catalytic residue prediction are similar to those of other structure-based methods, but with substantially better precision and lower false positive rates. THEMATICS performs well across the spectrum of E.C. classes. The method requires only the structure of the query protein as input. THEMATICS predictions may be obtained via the web from structures in PDB format at: http://pfweb.chem.neu.edu/thematics/submit.html.
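
The recall, precision, and false positive rates reported in this abstract follow the usual residue-level definitions. A minimal Python sketch of those definitions for a single protein is given below, assuming predicted and annotated residues are represented as sets of labels and assuming the conventional FP / (FP + TN) definition of false positive rate. The residue labels and the residue count are hypothetical, and how THEMATICS pools counts across its test set is not stated here.

```python
from typing import Set, Tuple


def site_metrics(predicted: Set[str], annotated: Set[str], n_residues: int) -> Tuple[float, float, float]:
    """Return (recall, precision, false_positive_rate) for one protein.

    recall    = TP / (TP + FN): fraction of annotated residues that were predicted
    precision = TP / (TP + FP): fraction of predicted residues that are annotated
    FPR       = FP / (FP + TN): fraction of non-annotated residues that were predicted
    """
    tp = len(predicted & annotated)   # predicted and annotated
    fp = len(predicted - annotated)   # predicted but not annotated
    fn = len(annotated - predicted)   # annotated but missed
    tn = n_residues - tp - fp - fn    # all remaining residues
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, precision, fpr


# Hypothetical residue labels (chain:residue:number), not taken from any CatRes entry.
annotated = {"A:HIS:57", "A:ASP:102", "A:SER:195"}
predicted = {"A:HIS:57", "A:ASP:102", "A:GLU:70", "A:LYS:99", "A:TYR:94"}

recall, precision, fpr = site_metrics(predicted, annotated, n_residues=20)
print(f"recall={recall:.3f} precision={precision:.3f} fpr={fpr:.3f}")
# prints: recall=0.667 precision=0.400 fpr=0.176
```

The example shows the trade-off described in the abstract: flagging more residues (a looser cut-off) can only raise or keep recall, but each extra non-annotated residue lowers precision and raises the false positive rate.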


Chakrabarti S, Lanczycki CJ. Analysis and prediction of functionally important sites in proteins. Protein Sci 2007;16:4-13. [PMID: 17192586 PMCID: PMC2222836 DOI: 10.1110/ps.062506407] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 10/23/2022]

Srinivasan N. Computational Biology and Bioinformatics: a tinge of Indian spice. Bioinformation 2006;1:105-9. [PMID: 17611616 PMCID: PMC1904514 DOI: 10.6026/97320630001105] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]

Mayer KM, McCorkle SR, Shanklin J. Linking enzyme sequence to function using Conserved Property Difference Locator to identify and annotate positions likely to control specific functionality. BMC Bioinformatics 2005;6:284. [PMID: 16318626 PMCID: PMC1326233 DOI: 10.1186/1471-2105-6-284] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Received: 05/23/2005] [Accepted: 11/30/2005] [Indexed: 11/21/2022]

Abstract

Background

Families of homologous enzymes evolved from common progenitors. The availability of multiple sequences representing each activity presents an opportunity to extract information specifying the functionality of individual homologs. We present a straightforward method for identifying residues likely to determine class-specific functionality, in which multiple sequence alignments are converted into an annotated graphical form by the Conserved Property Difference Locator (CPDL) program.

Results

Three test cases, each comprising two groups of functionally distinct homologs, are presented. Of these, one is a membrane enzyme family and two are soluble enzyme families. The desaturase/hydroxylase data were used to design and test the CPDL algorithm because a comparative sequence approach had been successfully applied to manipulate the specificity of these enzymes. The other two cases, ATP/GTP cyclases and MurD/MurE synthases, were chosen because they are well characterized structurally and biochemically. For the desaturase/hydroxylase enzymes, the ATP/GTP cyclases, and the MurD/MurE synthases, groups of 8 (of ~400), 4 (of ~150), and 10 (of >400) residues of interest, respectively, were identified that contain empirically defined specificity-determining positions.

Conclusion

CPDL consistently identifies positions near enzyme active sites, including those predicted from structural and/or biochemical studies to be important for specificity and/or function. This suggests that CPDL will have broad utility for identifying potential class-determining residues based on multiple sequence analysis of groups of homologous proteins. Because the method is sequence-based rather than structure-based, it is equally well suited for designing structure-function experiments to investigate membrane and soluble proteins.
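
The general idea summarized in this abstract, flagging alignment positions where each group of homologs is conserved but the two groups differ in a residue property, can be illustrated with a simplified sketch. The Python below is not the CPDL program: the property classes, the strict "every member shares the class" conservation rule, the gap handling, and the toy alignment are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical, coarse property classes chosen for illustration only;
# these are not the property definitions used by CPDL.
PROPERTY_CLASS: Dict[str, str] = {
    **{aa: "hydrophobic" for aa in "AVLIMFWC"},
    **{aa: "polar" for aa in "STNQY"},
    **{aa: "positive" for aa in "KRH"},
    **{aa: "negative" for aa in "DE"},
    **{aa: "special" for aa in "GP"},
}


def conserved_class(column: List[str]) -> Optional[str]:
    """Return the property class shared by every residue in the column, else None.

    Gaps or unknown symbols map to None, so any such column is never 'conserved'.
    """
    classes = {PROPERTY_CLASS.get(aa) for aa in column}
    if len(classes) == 1 and None not in classes:
        return classes.pop()
    return None


def property_difference_positions(group_a: List[str], group_b: List[str]) -> List[int]:
    """Return 0-based alignment columns conserved within each group but different between groups."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one aligned sequence")
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        class_a = conserved_class([seq[i] for seq in group_a])
        class_b = conserved_class([seq[i] for seq in group_b])
        if class_a and class_b and class_a != class_b:
            hits.append(i)
    return hits


# Toy alignment (hypothetical sequences, not from the desaturase, cyclase, or Mur families).
group_a = ["MKDLV", "MRDIV"]
group_b = ["MESLA", "MDT-S"]
print(property_difference_positions(group_a, group_b))  # prints: [1, 2]
```

Column 1 is flagged because group A is conserved as positive (K/R) while group B is conserved as negative (E/D); column 3 is skipped because of the gap, and column 4 because group B mixes classes. Because only the alignment is needed, this kind of check applies equally to membrane and soluble proteins, which is the point the abstract makes about sequence-based analysis.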


Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM. A method for localizing ligand binding pockets in protein structures. Proteins 2005;62:479-88. [PMID: 16304646 DOI: 10.1002/prot.20769] [Citation(s) in RCA: 141] [Impact Index Per Article: 7.1] [Indexed: 11/09/2022]

Pei J, Cai W, Kinch LN, Grishin NV. Prediction of functional specificity determinants from protein sequences using log-likelihood ratios. Bioinformatics 2005;22:164-71. [PMID: 16278237 DOI: 10.1093/bioinformatics/bti766] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022]

Minshull J, Ness JE, Gustafsson C, Govindarajan S. Predicting enzyme function from protein sequence. Curr Opin Chem Biol 2005;9:202-9. [PMID: 15811806 DOI: 10.1016/j.cbpa.2005.02.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]

Watson JD, Laskowski RA, Thornton JM. Predicting protein function from sequence and structural data. Curr Opin Struct Biol 2005;15:275-84. [PMID: 15963890 DOI: 10.1016/j.sbi.2005.04.003] [Citation(s) in RCA: 203] [Impact Index Per Article: 10.2] [Received: 02/04/2005] [Revised: 02/04/2005] [Accepted: 04/18/2005] [Indexed: 10/25/2022]

Greaves R, Warwicker J. Active site identification through geometry-based and sequence profile-based calculations: burial of catalytic clefts. J Mol Biol 2005;349:547-57. [PMID: 15882869 DOI: 10.1016/j.jmb.2005.04.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 01/14/2005] [Revised: 03/30/2005] [Accepted: 04/08/2005] [Indexed: 12/30/2022]

Varrazzo D, Bernini A, Spiga O, Ciutti A, Chiellini S, Venditti V, Bracci L, Niccolai N. Three-dimensional computation of atom depth in complex molecular structures. Bioinformatics 2005;21:2856-60. [PMID: 15827080 DOI: 10.1093/bioinformatics/bti444] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]

Ko J, Murga LF, André P, Yang H, Ondrechen MJ, Williams RJ, Agunwamba A, Budil DE. Statistical criteria for the identification of protein active sites using theoretical microscopic titration curves. Proteins 2005;59:183-95. [PMID: 15739204 DOI: 10.1002/prot.20418] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.4] [Indexed: 11/07/2022]

Pazos F, Sternberg MJE. Automated prediction of protein function and detection of functional sites from structure. Proc Natl Acad Sci U S A 2004;101:14754-9. [PMID: 15456910 PMCID: PMC522026 DOI: 10.1073/pnas.0404569101] [Citation(s) in RCA: 122] [Impact Index Per Article: 5.8] [Received: 06/25/2004] [Indexed: 11/18/2022]