Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Capra JA, Singh M. Characterization and prediction of residues determining protein functional specificity. ACTA ACUST UNITED AC 2008;24:1473-80. [PMID: 18450811 PMCID: PMC2718669 DOI: 10.1093/bioinformatics/btn214] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

For:	Capra JA, Singh M. Characterization and prediction of residues determining protein functional specificity. ACTA ACUST UNITED AC 2008;24:1473-80. [PMID: 18450811 PMCID: PMC2718669 DOI: 10.1093/bioinformatics/btn214] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Number

Cited by Other Article(s)

Suplatov D, Kirilin E, Arbatsky M, Takhaveev V, Svedas V. pocketZebra: a web-server for automated selection and classification of subfamily-specific binding sites by bioinformatic analysis of diverse protein families. Nucleic Acids Res 2014;42:W344-9. [PMID: 24852248 PMCID: PMC4086101 DOI: 10.1093/nar/gku448] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open

Chakraborty A, Chakrabarti S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 2014;16:71-88. [DOI: 10.1093/bib/bbt092] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Nagao C, Nagano N, Mizuguchi K. Prediction of detailed enzyme functions and identification of specificity determining residues by random forests. PLoS One 2014;9:e84623. [PMID: 24416252 PMCID: PMC3885575 DOI: 10.1371/journal.pone.0084623] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2013] [Accepted: 11/15/2013] [Indexed: 12/03/2022] Open

Zhang ZH, Khoo AA, Mihalek I. Cube - an online tool for comparison and contrasting of protein sequences. PLoS One 2013;8:e79480. [PMID: 24363790 PMCID: PMC3867285 DOI: 10.1371/journal.pone.0079480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2013] [Accepted: 09/23/2013] [Indexed: 01/10/2023] Open

Gaston D, Roger AJ. Functional divergence and convergent evolution in the plastid-targeted glyceraldehyde-3-phosphate dehydrogenases of diverse eukaryotic algae. PLoS One 2013;8:e70396. [PMID: 23936198 PMCID: PMC3728087 DOI: 10.1371/journal.pone.0070396] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2013] [Accepted: 06/18/2013] [Indexed: 11/19/2022] Open

Abstract

Background

Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is a key enzyme of the glycolytic pathway, reversibly catalyzing the sixth step of glycolysis and concurrently reducing the coenzyme NAD⁺ to NADH. In photosynthetic organisms a GAPDH paralog (Gap2 in Cyanobacteria, GapA in most photosynthetic eukaryotes) functions in the Calvin cycle, performing the reverse of the glycolytic reaction and using the coenzyme NADPH preferentially. In a number of photosynthetic eukaryotes that acquired their plastid by the secondary endosymbiosis of a eukaryotic red alga (Alveolates, haptophytes, cryptomonads and stramenopiles) GapA has been apparently replaced with a paralog of the host’s own cytosolic GAPDH (GapC1). Plastid GapC1 and GapA therefore represent two independent cases of functional divergence and adaptations to the Calvin cycle entailing a shift in subcellular targeting and a shift in binding preference from NAD⁺ to NADPH.

Methods

We used the programs FunDi, GroupSim, and Difference Evolutionary-Trace to detect sites involved in the functional divergence of these two groups of GAPDH sequences and to identify potential cases of convergent evolution in the Calvin-cycle adapted GapA and GapC1 families. Sites identified as being functionally divergent by all or some of these programs were then investigated with respect to their possible roles in the structure and function of both glycolytic and plastid-targeted GAPDH isoforms.

Conclusions

In this work we found substantial evidence for convergent evolution in GapA/B and GapC1. In many cases sites in GAPDHs of these groups converged on identical amino acid residues in specific positions of the protein known to play a role in the function and regulation of plastid-functioning enzymes relative to their cytosolic counterparts. In addition, we demonstrate that bioinformatic software like FunDi are important tools for the generation of meaningful biological hypotheses that can then be tested with direct experimental techniques.

Collapse

Jessen LE, Hoof I, Lund O, Nielsen M. SigniSite: Identification of residue-level genotype-phenotype correlations in protein multiple sequence alignments. Nucleic Acids Res 2013;41:W286-91. [PMID: 23761454 PMCID: PMC3692133 DOI: 10.1093/nar/gkt497] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open

Gopavajhula VR, Chaitanya KV, Akbar Ali Khan P, Shaik JP, Reddy PN, Alanazi M. Modeling and analysis of soybean (Glycine max. L) Cu/Zn, Mn and Fe superoxide dismutases. Genet Mol Biol 2013;36:225-36. [PMID: 23885205 PMCID: PMC3715289 DOI: 10.1590/s1415-47572013005000023] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2012] [Accepted: 02/10/2013] [Indexed: 01/05/2023] Open

Gu X, Zou Y, Su Z, Huang W, Zhou Z, Arendsee Z, Zeng Y. An update of DIVERGE software for functional divergence analysis of protein family. Mol Biol Evol 2013;30:1713-9. [PMID: 23589455 DOI: 10.1093/molbev/mst069] [Citation(s) in RCA: 146] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open

Addington TA, Mertz RW, Siegel JB, Thompson JM, Fisher AJ, Filkov V, Fleischman NM, Suen AA, Zhang C, Toney MD. Janus: prediction and ranking of mutations required for functional interconversion of enzymes. J Mol Biol 2013;425:1378-89. [PMID: 23396064 DOI: 10.1016/j.jmb.2013.01.034] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2012] [Revised: 01/27/2013] [Accepted: 01/30/2013] [Indexed: 10/27/2022]

Suplatov D, Shalaeva D, Kirilin E, Arzhanik V, Švedas V. Bioinformatic analysis of protein families for identification of variable amino acid residues responsible for functional diversity. J Biomol Struct Dyn 2013;32:75-87. [DOI: 10.1080/07391102.2012.750249] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013;449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Teppa E, Wilkins AD, Nielsen M, Buslje CM. Disentangling evolutionary signals: conservation, specificity determining positions and coevolution. Implication for catalytic residue prediction. BMC Bioinformatics 2012;13:235. [PMID: 22978315 PMCID: PMC3515339 DOI: 10.1186/1471-2105-13-235] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2012] [Accepted: 09/05/2012] [Indexed: 11/11/2022] Open

Abstract

Background

A large panel of methods exists that aim to identify residues with critical impact on protein function based on evolutionary signals, sequence and structure information. However, it is not clear to what extent these different methods overlap, and if any of the methods have higher predictive potential compared to others when it comes to, in particular, the identification of catalytic residues (CR) in proteins. Using a large set of enzymatic protein families and measures based on different evolutionary signals, we sought to break up the different components of the information content within a multiple sequence alignment to investigate their predictive potential and degree of overlap.

Results

Our results demonstrate that the different methods included in the benchmark in general can be divided into three groups with a limited mutual overlap. One group containing real-value Evolutionary Trace (rvET) methods and conservation, another containing mutual information (MI) methods, and the last containing methods designed explicitly for the identification of specificity determining positions (SDPs): integer-value Evolutionary Trace (ivET), SDPfox, and XDET. In terms of prediction of CR, we find using a proximity score integrating structural information (as the sum of the scores of residues located within a given distance of the residue in question) that only the methods from the first two groups displayed a reliable performance. Next, we investigated to what degree proximity scores for conservation, rvET and cumulative MI (cMI) provide complementary information capable of improving the performance for CR identification. We found that integrating conservation with proximity scores for rvET and cMI achieved the highest performance. The proximity conservation score contained no complementary information when integrated with proximity rvET. Moreover, the signal from rvET provided only a limited gain in predictive performance when integrated with mutual information and conservation proximity scores. Combined, these observations demonstrate that the rvET and cMI scores add complementary information to the prediction system.

Conclusions

This work contributes to the understanding of the different signals of evolution and also shows that it is possible to improve the detection of catalytic residues by integrating structural and higher order sequence evolutionary information with sequence conservation.

Collapse

Nagao C, Izako N, Soga S, Khan SH, Kawabata S, Shirai H, Mizuguchi K. Computational design, construction, and characterization of a set of specificity determining residues in protein-protein interactions. Proteins 2012;80:2426-36. [PMID: 22674858 DOI: 10.1002/prot.24127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2011] [Revised: 04/26/2012] [Accepted: 05/28/2012] [Indexed: 01/18/2023]

Neuwald AF, Lanczycki CJ, Marchler-Bauer A. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures. BMC Bioinformatics 2012;13:144. [PMID: 22726767 PMCID: PMC3599474 DOI: 10.1186/1471-2105-13-144] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2012] [Accepted: 06/09/2012] [Indexed: 11/17/2022] Open

Abstract

Background

The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches.

Results

Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day.

Conclusions

This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups.

Collapse

Chakraborty A, Mandloi S, Lanczycki CJ, Panchenko AR, Chakrabarti S. SPEER-SERVER: a web server for prediction of protein specificity determining sites. Nucleic Acids Res 2012;40:W242-8. [PMID: 22689646 PMCID: PMC3394334 DOI: 10.1093/nar/gks559] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open

Padawer T, Leighty RE, Wang D. Duplicate gene enrichment and expression pattern diversification in multicellularity. Nucleic Acids Res 2012;40:7597-605. [PMID: 22645319 PMCID: PMC3439886 DOI: 10.1093/nar/gks464] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open

Abstract

The enrichment of duplicate genes, and therefore paralogs (proteins coded by duplicate genes), in multicellular versus unicellular organisms enhances genomic functional innovation. This study quantitatively examined relationships among paralog enrichment, expression pattern diversification and multicellularity, aiming to better understand genomic basis of multicellularity. Paralog abundance in specific cells was compared with those in unicellular proteomes and the whole proteomes of multicellular organisms. The budding yeast, Saccharomyces cerevisiae and the nematode, Caenorhabditis elegans, for which the gene sets expressed in specific cells are available, were used as uni and multicellular models, respectively. Paralog count (K) distributions [P_(k)] follow a power-law relationship [P_(k) ∝ k^−α] in the whole proteomes of both species and in specific C. elegans cells. The value of the constant α can be used as a gauge of paralog abundance; the higher the value, the lower the paralog abundance. The α-value is indeed lower in the whole proteome of C. elegans (1.74) than in S. cerevisiae (2.34), quantifying the enrichment of paralogs in multicellular species. We also found that the power-law relationship applies to the proteomes of specific C. elegans cells. Strikingly, values of α in specific cells are higher and comparable to that in S. cerevisiae. Thus, paralog abundance in specific cells is lower and comparable to that in unicellular species. Furthermore, how much the expression level of a gene fluctuates across different C. elegans cells correlates positively with its paralog count, which is further confirmed by human gene-expression patterns across different tissues. Taken together, these results quantitatively and mechanistically establish enrichment of paralogs with diversifying expression patterns as genomic and evolutionary basis of multicellularity.

Collapse

Zhang ZH, Bharatham K, Chee SMQ, Mihalek I. Cube-DB: detection of functional divergence in human protein families. Nucleic Acids Res 2012;40:D490-4. [PMID: 22139934 PMCID: PMC3245124 DOI: 10.1093/nar/gkr1129] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2011] [Revised: 11/08/2011] [Accepted: 11/08/2011] [Indexed: 12/11/2022] Open

Zhang Y, Zagnitko O, Rodionova I, Osterman A, Godzik A. The FGGY carbohydrate kinase family: insights into the evolution of functional specificities. PLoS Comput Biol 2011;7:e1002318. [PMID: 22215998 PMCID: PMC3245297 DOI: 10.1371/journal.pcbi.1002318] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2011] [Accepted: 11/06/2011] [Indexed: 12/29/2022] Open

Abstract

Function diversification in large protein families is a major mechanism driving expansion of cellular networks, providing organisms with new metabolic capabilities and thus adding to their evolutionary success. However, our understanding of the evolutionary mechanisms of functional diversity in such families is very limited, which, among many other reasons, is due to the lack of functionally well-characterized sets of proteins. Here, using the FGGY carbohydrate kinase family as an example, we built a confidently annotated reference set (CARS) of proteins by propagating experimentally verified functional assignments to a limited number of homologous proteins that are supported by their genomic and functional contexts. Then, we analyzed, on both the phylogenetic and the molecular levels, the evolution of different functional specificities in this family. The results show that the different functions (substrate specificities) encoded by FGGY kinases have emerged only once in the evolutionary history following an apparently simple divergent evolutionary model. At the same time, on the molecular level, one isofunctional group (L-ribulokinase, AraB) evolved at least two independent solutions that employed distinct specificity-determining residues for the recognition of a same substrate (L-ribulose). Our analysis provides a detailed model of the evolution of the FGGY kinase family. It also shows that only combined molecular and phylogenetic approaches can help reconstruct a full picture of functional diversifications in such diverse families.

The protein universe is under constant expansion and is reshaping through multiple duplication, gene losses, lateral gene transfers, and speciation events. Large and functionally heterogeneous protein families that evolve through these processes contain conserved motifs and structural scaffolds, yet their individual members often perform diverse functions. For this reason, the exact functional annotation for their individual members is difficult without detailed analysis of the family. In our study, we performed such a detailed analysis of a particularly heterogeneous FGGY kinase family through the integration of several computational approaches. The combination of phylogenetic and molecular approaches allowed us to precisely assign function to hundreds of proteins, thus reconstructing carbohydrate utilization pathways in almost 200 bacterial species. This analysis also showed that different molecular mechanisms could evolve within a group of isofunctional proteins. Moreover, based on our experience with this specific protein family of FGGY kinases, we believe that our approach can be generally adapted for the analyses of other protein families and that the accumulation of evolutionary models for various families would lead to a better understanding of the protein universe.

Collapse

Barrantes-Reynolds R, Wallace SS, Bond JP. Using shifts in amino acid frequency and substitution rate to identify latent structural characters in base-excision repair enzymes. PLoS One 2011;6:e25246. [PMID: 21998646 PMCID: PMC3188539 DOI: 10.1371/journal.pone.0025246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2010] [Accepted: 08/30/2011] [Indexed: 12/30/2022] Open

Abstract

Protein evolution includes the birth and death of structural motifs. For example, a zinc finger or a salt bridge may be present in some, but not all, members of a protein family. We propose that such transitions are manifest in sequence phylogenies as concerted shifts in substitution rates of amino acids that are neighbors in a representative structure. First, we identified rate shifts in a quartet from the Fpg/Nei family of base excision repair enzymes using a method developed by Xun Gu and coworkers. We found the shifts to be spatially correlated, more precisely, associated with a flexible loop involved in bacterial Fpg substrate specificity. Consistent with our result, sequences and structures provide convincing evidence that this loop plays a very different role in other family members. Second, then, we developed a method for identifying latent protein structural characters (LSC) given a set of homologous sequences based on Gu's method and proximity in a high-resolution structure. Third, we identified LSC and assigned states of LSC to clades within the Fpg/Nei family of base excision repair enzymes. We describe seven LSC; an accompanying Proteopedia page (http://proteopedia.org/wiki/index.php/Fpg_Nei_Protein_Family) describes these in greater detail and facilitates 3D viewing. The LSC we found provided a surprisingly complete picture of the interaction of the protein with the DNA capturing familiar examples, such as a Zn finger, as well as more subtle interactions. Their preponderance is consistent with an important role as phylogenetic characters. Phylogenetic inference based on LSC provided convincing evidence of independent losses of Zn fingers. Structural motifs may serve as important phylogenetic characters and modeling transitions involving structural motifs may provide a much deeper understanding of protein evolution.

Collapse

Determinants, discriminants, conserved residues--a heuristic approach to detection of functional divergence in protein families. PLoS One 2011;6:e24382. [PMID: 21931701 PMCID: PMC3171465 DOI: 10.1371/journal.pone.0024382] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2011] [Accepted: 08/08/2011] [Indexed: 11/19/2022] Open

Gaston D, Susko E, Roger AJ. A phylogenetic mixture model for the identification of functionally divergent protein residues. ACTA ACUST UNITED AC 2011;27:2655-63. [PMID: 21840876 DOI: 10.1093/bioinformatics/btr470] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Abstract

MOTIVATION

To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples.

RESULTS

We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions.

AVAILABILITY

http://rogerlab.biochem.dal.ca/Software

CONTACT

andrew.roger@dal.ca

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms. Stat Appl Genet Mol Biol 2011;10:Article 36. [PMID: 22331370 DOI: 10.2202/1544-6115.1666] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Carroll SM, Ortlund EA, Thornton JW. Mechanisms for the evolution of a derived function in the ancestral glucocorticoid receptor. PLoS Genet 2011;7:e1002117. [PMID: 21698144 PMCID: PMC3116920 DOI: 10.1371/journal.pgen.1002117] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2011] [Accepted: 04/19/2011] [Indexed: 11/19/2022] Open

Abstract

Understanding the genetic, structural, and biophysical mechanisms that caused protein functions to evolve is a central goal of molecular evolutionary studies. Ancestral sequence reconstruction (ASR) offers an experimental approach to these questions. Here we use ASR to shed light on the earliest functions and evolution of the glucocorticoid receptor (GR), a steroid-activated transcription factor that plays a key role in the regulation of vertebrate physiology. Prior work showed that GR and its paralog, the mineralocorticoid receptor (MR), duplicated from a common ancestor roughly 450 million years ago; the ancestral functions were largely conserved in the MR lineage, but the functions of GRs-reduced sensitivity to all hormones and increased selectivity for glucocorticoids-are derived. Although the mechanisms for the evolution of glucocorticoid specificity have been identified, how reduced sensitivity evolved has not yet been studied. Here we report on the reconstruction of the deepest ancestor in the GR lineage (AncGR1) and demonstrate that GR's reduced sensitivity evolved before the acquisition of restricted hormone specificity, shortly after the GR-MR split. Using site-directed mutagenesis, X-ray crystallography, and computational analyses of protein stability to recapitulate and determine the effects of historical mutations, we show that AncGR1's reduced ligand sensitivity evolved primarily due to three key substitutions. Two large-effect mutations weakened hydrogen bonds and van der Waals interactions within the ancestral protein, reducing its stability. The degenerative effect of these two mutations is extremely strong, but a third permissive substitution, which has no apparent effect on function in the ancestral background and is likely to have occurred first, buffered the effects of the destabilizing mutations. Taken together, our results highlight the potentially creative role of substitutions that partially degrade protein structure and function and reinforce the importance of permissive mutations in protein evolution.

Collapse

García-Torres I, Cabrera N, Torres-Larios A, Rodríguez-Bolaños M, Díaz-Mazariegos S, Gómez-Puyou A, Perez-Montfort R. Identification of amino acids that account for long-range interactions in two triosephosphate isomerases from pathogenic trypanosomes. PLoS One 2011;6:e18791. [PMID: 21533154 PMCID: PMC3078909 DOI: 10.1371/journal.pone.0018791] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2010] [Accepted: 03/18/2011] [Indexed: 11/18/2022] Open

González Montoro A, Chumpen Ramirez S, Quiroga R, Valdez Taubas J. Specificity of transmembrane protein palmitoylation in yeast. PLoS One 2011;6:e16969. [PMID: 21383992 PMCID: PMC3044718 DOI: 10.1371/journal.pone.0016969] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2010] [Accepted: 01/11/2011] [Indexed: 11/29/2022] Open

Abstract

Many proteins are modified after their synthesis, by the addition of a lipid molecule to one or more cysteine residues, through a thioester bond. This modification is called S-acylation, and more commonly palmitoylation. This reaction is carried out by a family of enzymes, called palmitoyltransferases (PATs), characterized by the presence of a conserved 50- aminoacids domain called “Asp-His-His-Cys- Cysteine Rich Domain” (DHHC-CRD). There are 7 members of this family in the yeast Saccharomyces cerevisiae, and each of these proteins is thought to be responsible for the palmitoylation of a subset of substrates. Substrate specificity of PATs, however, is not yet fully understood. Several yeast PATs seem to have overlapping specificity, and it has been proposed that the machinery responsible for palmitoylating peripheral membrane proteins in mammalian cells, lacks specificity altogether.

Here we investigate the specificity of transmembrane protein palmitoylation in S. cerevisiae, which is carried out predominantly by two PATs, Swf1 and Pfa4. We show that palmitoylation of transmembrane substrates requires dedicated PATs, since other yeast PATs are mostly unable to perform Swf1 or Pfa4 functions, even when overexpressed. Furthermore, we find that Swf1 is highly specific for its substrates, as it is unable to substitute for other PATs. To identify where Swf1 specificity lies, we carried out a bioinformatics survey to identify amino acids responsible for the determination of specificity or Specificity Determination Positions (SDPs) and showed experimentally, that mutation of the two best SDP candidates, A145 and K148, results in complete and partial loss of function, respectively. These residues are located within the conserved catalytic DHHC domain suggesting that it could also be involved in the determination of specificity. Finally, we show that modifying the position of the cysteines in Tlg1, a Swf1 substrate, results in lack of palmitoylation, as expected for a highly specific enzymatic reaction.

Collapse

Neuwald AF. Bayesian classification of residues associated with protein functional divergence: Arf and Arf-like GTPases. Biol Direct 2010;5:66. [PMID: 21129209 PMCID: PMC3012027 DOI: 10.1186/1745-6150-5-66] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2010] [Accepted: 12/03/2010] [Indexed: 11/22/2022] Open

Abstract

Background

Certain residues within proteins are highly conserved across very distantly related organisms, yet their (presumably critical) structural or mechanistic roles are completely unknown. To obtain clues regarding such residues within Arf and Arf-like (Arf/Arl) GTPases--which function as on/off switches regulating vesicle trafficking, phospholipid metabolism and cytoskeletal remodeling--I apply a new sampling procedure for comparative sequence analysis, termed multiple category Bayesian Partitioning with Pattern Selection (mcBPPS).

Results

The mcBPPS sampler classified sequences within the entire P-loop GTPase class into multiple categories by identifying those evolutionarily-divergent residues most likely to be responsible for functional specialization. Here I focus on categories of residues that most distinguish various Arf/Arl GTPases from other GTPases. This identified residues whose specific roles have been previously proposed (and in some cases corroborated experimentally and that thus serve as positive controls), as well as several categories of co-conserved residues whose possible roles are first hinted at here. For example, Arf/Arl/Sar GTPases are most distinguished from other GTPases by a conserved aspartate residue within the phosphate binding loop (P-loop) and by co-conserved residues nearby that, together, can form a network of salt-bridge and hydrogen bond interactions centered on the GTPase active site. Residues corresponding to an N-[VI] motif that is conserved within Arf/Arl GTPases may play a role in the interswitch toggle characteristic of the Arf family, whereas other, co-conserved residues may modulate the flexibility of the guanine binding loop. Arl8 GTPases conserve residues that strikingly diverge from those typically found in other Arf/Arl GTPases and that form structural interactions suggestive of a novel interswitch toggle mechanism.

Conclusions

This analysis suggests specific mutagenesis experiments to explore mechanisms underlying GTP hydrolysis, nucleotide exchange and interswitch toggling within Arf/Arl GTPases. More generally, it illustrates how the mcBPPS sampler can complement traditional evolutionary analyses by providing an objective, quantitative and statistically rigorous way to explore protein functional-divergence in molecular detail. Because the sampler classifies the input sequences at the same time, it can be used to generate subgroup profiles, in which functionally-divergent categories of residues are annotated automatically.

Reviewers

This article was reviewed by Frank Eisenhaber, L Aravind and Daniel Gaston (nominated by Eric Bapteste). For the full reviews, go to the Reviewers' comments section.

Collapse

Dessailly BH, Redfern OC, Cuff AL, Orengo CA. Detailed analysis of function divergence in a large and diverse domain superfamily: toward a refined protocol of function classification. Structure 2010;18:1522-35. [PMID: 21070951 PMCID: PMC3023962 DOI: 10.1016/j.str.2010.08.017] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2010] [Revised: 08/06/2010] [Accepted: 08/13/2010] [Indexed: 10/18/2022]

Slama P, Geman D. Identification of family-determining residues in PHD fingers. Nucleic Acids Res 2010;39:1666-79. [PMID: 21059680 PMCID: PMC3061080 DOI: 10.1093/nar/gkq947] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

de Melo-Minardi RC, Bastard K, Artiguenave F. Identification of subfamily-specific sites based on active sites modeling and clustering. ACTA ACUST UNITED AC 2010;26:3075-82. [PMID: 20980272 DOI: 10.1093/bioinformatics/btq595] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Abstract

MOTIVATION

Current computational approaches to function prediction are mostly based on protein sequence classification and transfer of annotation from known proteins to their closest homologous sequences relying on the orthology concept of function conservation. This approach suffers a major weakness: annotation reliability depends on global sequence similarity to known proteins and is poorly efficient for enzyme superfamilies that catalyze different reactions. Structural biology offers a different strategy to overcome the problem of annotation by adding information about protein 3D structures. This information can be used to identify amino acids located in active sites, focusing on detection of functional polymorphisms residues in an enzyme superfamily. Structural genomics programs are providing more and more novel protein structures at a high-throughput rate. However, there is still a huge gap between the number of sequences and available structures. Computational methods, such as homology modeling provides reliable approaches to bridge this gap and could be a new precise tool to annotate protein functions.

RESULTS

Here, we present Active Sites Modeling and Clustering (ASMC) method, a novel unsupervised method to classify sequences using structural information of protein pockets. ASMC combines homology modeling of family members, structural alignment of modeled active sites and a subsequent hierarchical conceptual classification. Comparison of profiles obtained from computed clusters allows the identification of residues correlated to subfamily function divergence, called specificity determining positions. ASMC method has been validated on a benchmark of 42 Pfam families for which previous resolved holo-structures were available. ASMC was also applied to several families containing known protein structures and comprehensive functional annotations. We will discuss how ASMC improves annotation and understanding of protein families functions by giving some specific illustrative examples on nucleotidyl cyclases, protein kinases and serine proteases.

AVAILABILITY

http://www.genoscope.fr/ASMC/.

Collapse

Mondal S, Nagao C, Mizuguchi K. Detecting subtle functional differences in ketopantoate reductase and related enzymes using a rule-based approach with sequence-structure homology recognition scores. Protein Eng Des Sel 2010;23:859-69. [PMID: 20876192 DOI: 10.1093/protein/gzq062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open

Fromer M, Linial M. Exposing the co-adaptive potential of protein-protein interfaces through computational sequence design. ACTA ACUST UNITED AC 2010;26:2266-72. [PMID: 20679332 DOI: 10.1093/bioinformatics/btq412] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]

Mazin PV, Gelfand MS, Mironov AA, Rakhmaninova AB, Rubinov AR, Russell RB, Kalinina OV. An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies. Algorithms Mol Biol 2010;5:29. [PMID: 20633297 PMCID: PMC2914642 DOI: 10.1186/1748-7188-5-29] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2009] [Accepted: 07/15/2010] [Indexed: 11/30/2022] Open

Brandt BW, Feenstra KA, Heringa J. Multi-Harmony: detecting functional specificity from sequence alignment. Nucleic Acids Res 2010;38:W35-40. [PMID: 20525785 PMCID: PMC2896201 DOI: 10.1093/nar/gkq415] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Harms MJ, Thornton JW. Analyzing protein structure and function using ancestral gene reconstruction. Curr Opin Struct Biol 2010;20:360-6. [PMID: 20413295 PMCID: PMC2916957 DOI: 10.1016/j.sbi.2010.03.005] [Citation(s) in RCA: 161] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2010] [Accepted: 03/22/2010] [Indexed: 01/06/2023]

Wass MN, Kelley LA, Sternberg MJE. 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res 2010;38:W469-73. [PMID: 20513649 PMCID: PMC2896164 DOI: 10.1093/nar/gkq406] [Citation(s) in RCA: 474] [Impact Index Per Article: 31.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022] Open

Horan K, Shelton CR, Girke T. Predicting conserved protein motifs with Sub-HMMs. BMC Bioinformatics 2010;11:205. [PMID: 20420695 PMCID: PMC2879284 DOI: 10.1186/1471-2105-11-205] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2009] [Accepted: 04/26/2010] [Indexed: 11/16/2022] Open

Field SF, Matz MV. Retracing evolution of red fluorescence in GFP-like proteins from Faviina corals. Mol Biol Evol 2010;27:225-33. [PMID: 19793832 PMCID: PMC2877551 DOI: 10.1093/molbev/msp230] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Röttig M, Rausch C, Kohlbacher O. Combining structure and sequence information allows automated prediction of substrate specificities within enzyme families. PLoS Comput Biol 2010;6:e1000636. [PMID: 20072606 PMCID: PMC2796266 DOI: 10.1371/journal.pcbi.1000636] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2009] [Accepted: 12/08/2009] [Indexed: 12/25/2022] Open

Abstract

An important aspect of the functional annotation of enzymes is not only the type of reaction catalysed by an enzyme, but also the substrate specificity, which can vary widely within the same family. In many cases, prediction of family membership and even substrate specificity is possible from enzyme sequence alone, using a nearest neighbour classification rule. However, the combination of structural information and sequence information can improve the interpretability and accuracy of predictive models. The method presented here, Active Site Classification (ASC), automatically extracts the residues lining the active site from one representative three-dimensional structure and the corresponding residues from sequences of other members of the family. From a set of representatives with known substrate specificity, a Support Vector Machine (SVM) can then learn a model of substrate specificity. Applied to a sequence of unknown specificity, the SVM can then predict the most likely substrate. The models can also be analysed to reveal the underlying structural reasons determining substrate specificities and thus yield valuable insights into mechanisms of enzyme specificity. We illustrate the high prediction accuracy achieved on two benchmark data sets and the structural insights gained from ASC by a detailed analysis of the family of decarboxylating dehydrogenases. The ASC web service is available at http://asc.informatik.uni-tuebingen.de/.

Prediction of enzymatic function of experimentally uncharacterised sequences is an important task in annotation of sequence databases. While all the information on an enzyme's specificity is necessarily contained in its sequence, prediction methods using sequence alone often do not perform all that well. Obviously, structural information – if available – will yield precious hints on the function and relative importance of specific sequence positions with respect to substrate specificity. We propose a novel method (Active Site Classification, ASC) for enzyme classification bringing together structural information and sequence information. Our ASC web server allows users to build predictive models in an automated way focused on relevant enzyme residues and furthermore to interpret the models to gain insights into the mechanism of enzyme substrate specificity.

Collapse

Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol 2009;5:e1000585. [PMID: 19997483 PMCID: PMC2777313 DOI: 10.1371/journal.pcbi.1000585] [Citation(s) in RCA: 302] [Impact Index Per Article: 18.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2009] [Accepted: 10/30/2009] [Indexed: 11/20/2022] Open

Abstract

Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/).

Protein molecules are ubiquitous in the cell; they perform thousands of functions crucial for life. Proteins accomplish nearly all of these functions by interacting with other molecules. These interactions are mediated by specific amino acid positions in the proteins. Knowledge of these “functional sites” is crucial for understanding the molecular mechanisms by which proteins carry out their functions; however, functional sites have not been identified in the vast majority of proteins. Here, we present ConCavity, a computational method that predicts small molecule binding sites in proteins by combining analysis of evolutionary sequence conservation and protein 3D structure. ConCavity provides significant improvement over previous approaches, especially on large, multi-chain proteins. In contrast to earlier methods which only predict entire binding sites, ConCavity makes specific predictions of positions in space that are likely to overlap ligand atoms and of residues that are likely to contact bound ligands. These predictions can be used to aid computational function prediction, to guide experimental protein analysis, and to focus computationally intensive techniques used in drug discovery.

Collapse

Kapoor K, Rehan M, Kaushiki A, Pasrija R, Lynn AM, Prasad R. Rational mutational analysis of a multidrug MFS transporter CaMdr1p of Candida albicans by employing a membrane environment based computational approach. PLoS Comput Biol 2009;5:e1000624. [PMID: 20041202 PMCID: PMC2789324 DOI: 10.1371/journal.pcbi.1000624] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2009] [Accepted: 11/20/2009] [Indexed: 01/31/2023] Open

Dou Y, Zheng X, Wang J. Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 2009;262:317-22. [PMID: 19808039 DOI: 10.1016/j.jtbi.2009.09.030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2009] [Revised: 09/25/2009] [Accepted: 09/25/2009] [Indexed: 11/25/2022]

Chakrabarti S, Panchenko AR. Ensemble approach to predict specificity determinants: benchmarking and validation. BMC Bioinformatics 2009;10:207. [PMID: 19573245 PMCID: PMC2716344 DOI: 10.1186/1471-2105-10-207] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2009] [Accepted: 07/02/2009] [Indexed: 11/29/2022] Open

Kalinina OV, Gelfand MS, Russell RB. Combining specificity determining and conserved residues improves functional site prediction. BMC Bioinformatics 2009;10:174. [PMID: 19508719 PMCID: PMC2709924 DOI: 10.1186/1471-2105-10-174] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2009] [Accepted: 06/09/2009] [Indexed: 11/16/2022] Open

Abstract

Background

Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities.

Results

Here we present a novel method for functional site prediction based on identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs), as those occupied by conserved residues within sub-groups of proteins in a family having a common specificity, but differ between groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins containing known three-dimensional structures, but lacking clear functional annotations, and discusse several illustrative examples.

Conclusion

The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.

Collapse

Dessailly BH, Redfern OC, Cuff A, Orengo CA. Exploiting structural classifications for function prediction: towards a domain grammar for protein function. Curr Opin Struct Biol 2009;19:349-56. [PMID: 19398323 PMCID: PMC2920418 DOI: 10.1016/j.sbi.2009.03.009] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2009] [Revised: 02/17/2009] [Accepted: 03/16/2009] [Indexed: 12/28/2022]

Chakrabarti S, Panchenko AR. Coevolution in defining the functional specificity. Proteins 2009;75:231-40. [PMID: 18831050 PMCID: PMC2649964 DOI: 10.1002/prot.22239] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]

Sankararaman S, Sjölander K. INTREPID--INformation-theoretic TREe traversal for Protein functional site IDentification. ACTA ACUST UNITED AC 2008;24:2445-52. [PMID: 18776193 PMCID: PMC2572704 DOI: 10.1093/bioinformatics/btn474] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Abstract

Motivation: Identification of functionally important residues in proteins plays a significant role in biological discovery. Here, we present INTREPID—an information–theoretic approach for functional site identification that exploits the information in large diverse multiple sequence alignments (MSAs). INTREPID uses a traversal of the phylogeny in combination with a positional conservation score, based on Jensen–Shannon divergence, to rank positions in an MSA. While knowledge of protein 3D structure can significantly improve the accuracy of functional site identification, since structural information is not available for a majority of proteins, INTREPID relies solely on sequence information. We evaluated INTREPID on two tasks: predicting catalytic residues and predicting specificity determinants.

Results: In catalytic residue prediction, INTREPID provides significant improvements over Evolutionary Trace, ConSurf as well as over a baseline global conservation method on a set of 100 manually curated enzymes from the Catalytic Site Atlas. In particular, INTREPID is able to better predict catalytic positions that are not globally conserved and hence, attains improved sensitivity at high values of specificity. We also investigated the performance of INTREPID as a function of the evolutionary divergence of the protein family. We found that INTREPID is better able to exploit the diversity in such families and that accuracy improves when homologs with very low sequence identity are included in an alignment. In specificity determinant prediction, when subtype information is known, INTREPID-SPEC, a variant of INTREPID, attains accuracies that are competitive with other approaches for this task.

Availability: INTREPID is available for 16919 families in the PhyloFacts resource (http://phylogenomics.berkeley.edu/phylofacts).

Contact:sriram_s@cs.berkeley.edu

Supplementary information: Relevant online supplementary material is available at http://phylogenomics.berkeley.edu/INTREPID.

Collapse