51. Suplatov D, Kirilin E, Arbatsky M, Takhaveev V, Svedas V. pocketZebra: a web-server for automated selection and classification of subfamily-specific binding sites by bioinformatic analysis of diverse protein families. Nucleic Acids Res 2014; 42:W344-9. [PMID: 24852248 PMCID: PMC4086101 DOI: 10.1093/nar/gku448] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]
Abstract
The new web-server pocketZebra combines bioinformatic and geometry-based structural approaches to identify subfamily-specific binding sites in proteins, rank them by functional significance, and select the particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of subfamily-specific positions in diverse protein families. The pocketZebra web-server offers multiple input modes to meet the needs of users with different levels of experience in bioinformatics. The server provides on-site visualization of the results, as well as an off-line version of the output in annotated text format and as PyMOL sessions ready for structural analysis. pocketZebra can be used to study structure–function relationships and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins of unknown function. The server can also be used to engineer ligand-binding sites and allosteric regulation of enzymes, or in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra, with no login required.
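To make the idea of ranking binding sites by their subfamily-specific positions concrete, here is a toy sketch. It is a hypothetical illustration, not pocketZebra's actual scoring function: it assumes each residue already carries a subfamily-specificity score (for example from an SDP predictor), that each pocket is given as the set of residue numbers lining it, and that a pocket is scored by summing the scores of its residues above a hypothetical `threshold`.

```python
from typing import Dict, List, Set, Tuple


def rank_pockets(
    pockets: Dict[str, Set[int]],
    residue_scores: Dict[int, float],
    threshold: float = 0.5,
) -> List[Tuple[str, float, int]]:
    """Rank binding pockets by the subfamily-specific residues that line them.

    pockets: pocket name -> residue numbers lining the pocket
    residue_scores: residue number -> subfamily-specificity score (assumed given)
    threshold: hypothetical cutoff above which a residue counts as specific

    Returns (name, summed_score, n_specific) tuples, best pocket first.
    Residues missing from residue_scores are treated as scoring 0.
    """
    ranked = []
    for name, residues in pockets.items():
        specific = [
            residue_scores.get(r, 0.0)
            for r in residues
            if residue_scores.get(r, 0.0) >= threshold
        ]
        ranked.append((name, sum(specific), len(specific)))
    # Highest summed score first; ties broken by the number of specific residues.
    ranked.sort(key=lambda item: (item[1], item[2]), reverse=True)
    return ranked


if __name__ == "__main__":
    pockets = {"P1": {10, 11, 45}, "P2": {72, 73, 120, 121}, "P3": {200}}
    scores = {10: 0.9, 45: 0.7, 72: 0.6, 120: 0.2, 200: 0.1}
    for name, total, n in rank_pockets(pockets, scores):
        print(name, round(total, 2), n)
```

Summing rather than averaging favors pockets lined by several specific residues; a real scoring function would likely also weigh pocket size and geometry.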
Affiliation(s)
- Dmitry Suplatov
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Vorobjev hills 1-73, Moscow 119991, Russia
- Eugeny Kirilin
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Vorobjev hills 1-73, Moscow 119991, Russia
- Mikhail Arbatsky
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Vorobjev hills 1-73, Moscow 119991, Russia
- Vakil Takhaveev
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Vorobjev hills 1-73, Moscow 119991, Russia
- Vytas Svedas
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Vorobjev hills 1-73, Moscow 119991, Russia
52. Chakraborty A, Chakrabarti S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 2014; 16:71-88. [DOI: 10.1093/bib/bbt092] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Indexed: 11/13/2022]
53. Nagao C, Nagano N, Mizuguchi K. Prediction of detailed enzyme functions and identification of specificity determining residues by random forests. PLoS One 2014; 9:e84623. [PMID: 24416252 PMCID: PMC3885575 DOI: 10.1371/journal.pone.0084623] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 06/27/2013] [Accepted: 11/15/2013] [Indexed: 12/03/2022]
Abstract
Determining enzyme functions is essential for a thorough understanding of cellular processes. Although many prediction methods have been developed, it remains a significant challenge to predict enzyme functions at the fourth-digit level of the Enzyme Commission numbers. Functional specificity of enzymes often changes drastically through mutations of a small number of residues; therefore, information about these critical residues can potentially help discriminate detailed functions. However, because these residues must be identified by mutagenesis experiments, the available information is limited, and the lack of experimentally verified specificity determining residues (SDRs) has hindered the development of detailed function prediction methods and computational identification of SDRs. Here we present a novel method for predicting enzyme functions by random forests, EFPrf, along with a set of putative SDRs, the random-forest-derived SDRs (rf-SDRs). EFPrf consists of a set of binary predictors for enzymes in each CATH superfamily, and the rf-SDRs are the residue positions corresponding to the most highly contributing attributes obtained from each predictor. EFPrf showed a precision of 0.98 and a recall of 0.89 in a cross-validated benchmark assessment. The rf-SDRs included many residues whose importance for specificity had been validated experimentally. Analysis of the rf-SDRs revealed both a general tendency for functionally diverged superfamilies to include more active-site residues in their rf-SDRs than less diverged superfamilies, and superfamily-specific conservation patterns of each functional residue. EFPrf and the rf-SDRs will be effective tools for annotating enzyme functions and for understanding how enzyme functions have diverged within each superfamily.
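The two benchmark figures quoted for EFPrf can be folded into a single number for comparison with other methods. The F1 score (harmonic mean of precision and recall) used below is a standard choice but is not reported in the abstract; this sketch only combines the two quoted values.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall quoted for EFPrf's cross-validated benchmark.
    print(round(f1_score(0.98, 0.89), 3))  # 0.933
```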
Affiliation(s)
- Chioko Nagao
- National Institute of Biomedical Innovation, Ibaraki, Osaka, Japan
- Nozomi Nagano
- Computational Biology Research Center, AIST, Koto-ku, Tokyo, Japan
- Kenji Mizuguchi
- National Institute of Biomedical Innovation, Ibaraki, Osaka, Japan
54. Zhang ZH, Khoo AA, Mihalek I. Cube - an online tool for comparison and contrasting of protein sequences. PLoS One 2013; 8:e79480. [PMID: 24363790 PMCID: PMC3867285 DOI: 10.1371/journal.pone.0079480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/08/2013] [Accepted: 09/23/2013] [Indexed: 01/10/2023]
Abstract
When comparing sequences of similar proteins, two kinds of questions can be asked, and two related kinds of inference made: first, to what degree the sequences are similar, and second, how they differ. In the first case, one may tentatively conclude that the conserved elements common to all sequences are of central and common importance to the protein's function. In the second case, the regions of specialization may be discriminative of the function or binding partners across subfamilies of related proteins. Experimental efforts (mutagenesis or pharmacological intervention) can then be directed either way, depending on the context of the study. Cube simplifies this process for users who already have their favorite sets of sequences, and helps them collate the information by visualizing the conservation and specialization scores on the sequence and on the structure, and by spreadsheet tabulation. All information can be visualized on the spot, or downloaded for reference and later inspection. Server homepage: http://eopsf.org/cube
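Column conservation, one of the two scores Cube visualizes, is commonly derived from the residue distribution in each alignment column. The sketch below uses normalized Shannon entropy as an illustrative stand-in; the abstract does not specify Cube's own conservation or specialization functions, and treating the gap `-` as a 21st symbol is an assumption.

```python
import math
from collections import Counter
from typing import List


def column_conservation(alignment: List[str]) -> List[float]:
    """Return a conservation score in [0, 1] for each alignment column.

    1.0 means the column is fully conserved; 0.0 means all 21 symbols
    (20 amino acids plus the gap '-') occur equally often.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log(21)
    scores = []
    for col in range(n_cols):
        counts = Counter(seq[col] for seq in alignment)
        total = sum(counts.values())
        # Shannon entropy (natural log) of the residue distribution in this column.
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


if __name__ == "__main__":
    msa = ["ACDK", "ACEK", "ACDR", "GCEK"]
    print([round(s, 2) for s in column_conservation(msa)])
```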
Affiliation(s)
- Zong Hong Zhang
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore
- Aik Aun Khoo
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore
- Ivana Mihalek
- Bioinformatics Institute, Agency for Science, Technology and Research, Singapore
55. Gaston D, Roger AJ. Functional divergence and convergent evolution in the plastid-targeted glyceraldehyde-3-phosphate dehydrogenases of diverse eukaryotic algae. PLoS One 2013; 8:e70396. [PMID: 23936198 PMCID: PMC3728087 DOI: 10.1371/journal.pone.0070396] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/05/2013] [Accepted: 06/18/2013] [Indexed: 11/19/2022]
Abstract
Background: Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is a key enzyme of the glycolytic pathway, reversibly catalyzing the sixth step of glycolysis and concurrently reducing the coenzyme NAD+ to NADH. In photosynthetic organisms a GAPDH paralog (Gap2 in Cyanobacteria, GapA in most photosynthetic eukaryotes) functions in the Calvin cycle, performing the reverse of the glycolytic reaction and preferentially using the coenzyme NADPH. In a number of photosynthetic eukaryotes that acquired their plastid by secondary endosymbiosis of a eukaryotic red alga (alveolates, haptophytes, cryptomonads and stramenopiles), GapA has apparently been replaced with a paralog of the host's own cytosolic GAPDH (GapC1). Plastid GapC1 and GapA therefore represent two independent cases of functional divergence and adaptation to the Calvin cycle, entailing a shift in subcellular targeting and a shift in binding preference from NAD+ to NADPH.
Methods: We used the programs FunDi, GroupSim, and Difference Evolutionary-Trace to detect sites involved in the functional divergence of these two groups of GAPDH sequences and to identify potential cases of convergent evolution in the Calvin-cycle-adapted GapA and GapC1 families. Sites identified as functionally divergent by all or some of these programs were then investigated with respect to their possible roles in the structure and function of both glycolytic and plastid-targeted GAPDH isoforms.
Conclusions: In this work we found substantial evidence for convergent evolution in GapA/B and GapC1. In many cases, sites in GAPDHs of these groups converged on identical amino acid residues at specific positions known to play a role in the function and regulation of plastid-functioning enzymes relative to their cytosolic counterparts. In addition, we demonstrate that bioinformatic tools such as FunDi are important for generating meaningful biological hypotheses that can then be tested with direct experimental techniques.
Affiliation(s)
- Daniel Gaston
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J. Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
56. Jessen LE, Hoof I, Lund O, Nielsen M. SigniSite: Identification of residue-level genotype-phenotype correlations in protein multiple sequence alignments. Nucleic Acids Res 2013; 41:W286-91. [PMID: 23761454 PMCID: PMC3692133 DOI: 10.1093/nar/gkt497] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 02/04/2023]
Abstract
Identifying which mutation(s) within a given genotype is responsible for an observable phenotype is important in many aspects of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype-phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. Input is a set of protein sequences where each sequence has an associated real number, quantifying a given phenotype. SigniSite will then identify which amino acid residues are significantly associated with the data set phenotype. As output, SigniSite displays a sequence logo, depicting the strength of the phenotype association of each residue and a heat-map identifying 'hot' or 'cold' regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity determining positions (SDP) using a set of human immunodeficiency virus protease-inhibitor genotype-phenotype data and corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a data set of protein families with experimentally annotated SDPs. For both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/.
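The subgroup-free idea of testing whether sequences carrying a given residue have systematically higher or lower phenotype values can be sketched with a rank-sum (Mann-Whitney) z-score per (column, residue) pair. This is an assumption-laden illustration in that spirit, not SigniSite's published statistic: it uses the normal approximation without tie correction and applies no multiple-testing correction.

```python
import math
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def residue_phenotype_z(seqs: List[str], phenotype: List[float]) -> Dict[Tuple[int, str], float]:
    """Rank-sum z-score for every (column, residue) pair in an alignment.

    Positive z: sequences with this residue tend to have higher phenotype values.
    """
    n = len(seqs)
    ranks = average_ranks(phenotype)
    result = {}
    for col in range(len(seqs[0])):
        for aa in sorted({s[col] for s in seqs}):
            group = [ranks[i] for i in range(n) if seqs[i][col] == aa]
            n1, n2 = len(group), n - len(group)
            if n2 == 0:
                continue  # residue present in every sequence: nothing to compare
            u = sum(group) - n1 * (n1 + 1) / 2
            mean = n1 * n2 / 2
            sd = math.sqrt(n1 * n2 * (n + 1) / 12)  # no tie correction, for brevity
            result[(col, aa)] = (u - mean) / sd
    return result


if __name__ == "__main__":
    seqs = ["AK", "AK", "AR", "VR", "VR", "VK"]
    phenotype = [0.1, 0.3, 0.2, 2.5, 3.0, 2.0]
    for (col, aa), z in sorted(residue_phenotype_z(seqs, phenotype).items()):
        print(col, aa, round(z, 2))
```

Because only ranks are used, the phenotype can be any real-valued measurement without a binary split, which is the property the abstract highlights.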
Affiliation(s)
- Leon Eyrich Jessen
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kemitorvet, Building 208, DK-2800 Lyngby, Denmark.
57. Gopavajhula VR, Chaitanya KV, Akbar Ali Khan P, Shaik JP, Reddy PN, Alanazi M. Modeling and analysis of soybean (Glycine max. L) Cu/Zn, Mn and Fe superoxide dismutases. Genet Mol Biol 2013; 36:225-36. [PMID: 23885205 PMCID: PMC3715289 DOI: 10.1590/s1415-47572013005000023] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 05/21/2012] [Accepted: 02/10/2013] [Indexed: 01/05/2023]
Abstract
Superoxide dismutase (SOD, EC 1.15.1.1) is an important metal-containing antioxidant enzyme that provides the first line of defense against toxic superoxide radicals by catalyzing their dismutation to oxygen and hydrogen peroxide. SOD is classified into four metalloprotein isoforms, namely Cu/Zn SOD, Mn SOD, Ni SOD and Fe SOD. The three-dimensional structures of soybean SOD isoforms have not yet been solved. In this study, we describe structural models for soybean Cu/Zn SOD, Mn SOD and Fe SOD and provide insights into the molecular function of this metal-binding enzyme in improving tolerance to oxidative stress in plants.
58. Gu X, Zou Y, Su Z, Huang W, Zhou Z, Arendsee Z, Zeng Y. An update of DIVERGE software for functional divergence analysis of protein family. Mol Biol Evol 2013; 30:1713-9. [PMID: 23589455 DOI: 10.1093/molbev/mst069] [Citation(s) in RCA: 146] [Impact Index Per Article: 12.2] [Indexed: 12/19/2022]
Abstract
DIVERGE is a software system for phylogeny-based analyses of protein family evolution and functional divergence. It provides a suite of statistical tools for selection and prioritization of the amino acid sites that are responsible for the functional divergence of a gene family. The synergistic efforts of DIVERGE and other methods have convincingly demonstrated that the pattern of rate change at a particular amino acid site may contain insightful information about the underlying functional divergence following gene duplication. These predicted sites may be used as candidates for further experiments. We are now releasing an updated version of DIVERGE with the following improvements: 1) a feasible approach to examining functional divergence in nearly complete sequences by including deletions and insertions (indels); 2) the calculation of the false discovery rate of functionally diverging sites; 3) estimation of the effective number of functional divergence-related sites that is reliable and insensitive to cutoffs; 4) a statistical test for asymmetric functional divergence; and 5) a new method to infer functional divergence specific to a given duplicate cluster. In addition, we have made efforts to improve software design and produce a well-written software manual for the general user.
Affiliation(s)
- Xun Gu
- State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China.
59. Addington TA, Mertz RW, Siegel JB, Thompson JM, Fisher AJ, Filkov V, Fleischman NM, Suen AA, Zhang C, Toney MD. Janus: prediction and ranking of mutations required for functional interconversion of enzymes. J Mol Biol 2013; 425:1378-89. [PMID: 23396064 DOI: 10.1016/j.jmb.2013.01.034] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 10/10/2012] [Revised: 01/27/2013] [Accepted: 01/30/2013] [Indexed: 10/27/2022]
Abstract
Identification of residues responsible for functional specificity in enzymes is a challenging and important problem in protein chemistry. Active-site residues are generally easy to identify, but residues outside the active site are also important to catalysis and their identities and roles are more difficult to determine. We report a method based on analysis of multiple sequence alignments, embodied in our program Janus, for predicting mutations required to interconvert structurally related but functionally distinct enzymes. Conversion of aspartate aminotransferase into tyrosine aminotransferase is demonstrated and compared to previous efforts. Incorporation of 35 predicted mutations resulted in an enzyme with the desired substrate specificity but low catalytic activity. A single round of DNA back-shuffling with wild-type aspartate aminotransferase on this variant generated mutants with tyrosine aminotransferase activities better than those previously realized from rational design or directed evolution. Methods such as this, coupled with computational modeling, may prove invaluable in furthering our understanding of enzyme catalysis and engineering.
60. Suplatov D, Shalaeva D, Kirilin E, Arzhanik V, Švedas V. Bioinformatic analysis of protein families for identification of variable amino acid residues responsible for functional diversity. J Biomol Struct Dyn 2013; 32:75-87. [DOI: 10.1080/07391102.2012.750249] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
61. Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 10.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
62. Teppa E, Wilkins AD, Nielsen M, Buslje CM. Disentangling evolutionary signals: conservation, specificity determining positions and coevolution. Implication for catalytic residue prediction. BMC Bioinformatics 2012; 13:235. [PMID: 22978315 PMCID: PMC3515339 DOI: 10.1186/1471-2105-13-235] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 05/08/2012] [Accepted: 09/05/2012] [Indexed: 11/11/2022]
Abstract
Background: A large panel of methods exists that aim to identify residues with critical impact on protein function based on evolutionary signals, sequence and structure information. However, it is not clear to what extent these methods overlap, and whether any of them has higher predictive potential than the others, in particular for the identification of catalytic residues (CR) in proteins. Using a large set of enzymatic protein families and measures based on different evolutionary signals, we sought to break up the different components of the information content within a multiple sequence alignment to investigate their predictive potential and degree of overlap.
Results: Our results demonstrate that the methods included in the benchmark can in general be divided into three groups with limited mutual overlap: one group containing real-value Evolutionary Trace (rvET) methods and conservation, another containing mutual information (MI) methods, and the last containing methods designed explicitly for the identification of specificity determining positions (SDPs), namely integer-value Evolutionary Trace (ivET), SDPfox, and XDET. For CR prediction, using a proximity score that integrates structural information (the sum of the scores of residues located within a given distance of the residue in question), we find that only the methods from the first two groups displayed reliable performance. Next, we investigated to what degree proximity scores for conservation, rvET and cumulative MI (cMI) provide complementary information capable of improving the performance of CR identification. We found that integrating conservation with proximity scores for rvET and cMI achieved the highest performance. The proximity conservation score contained no complementary information when integrated with proximity rvET. Moreover, the signal from rvET provided only a limited gain in predictive performance when integrated with mutual information and conservation proximity scores. Combined, these observations demonstrate that the rvET and cMI scores add complementary information to the prediction system.
Conclusions: This work contributes to the understanding of the different signals of evolution and also shows that it is possible to improve the detection of catalytic residues by integrating structural and higher-order sequence evolutionary information with sequence conservation.
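The proximity score described in the Results (the sum of the scores of residues located within a given distance of the residue in question) is straightforward to compute once per-residue scores and coordinates are available. In this sketch, using one representative coordinate per residue (for example C-alpha), the 6.0 Å default `cutoff`, and counting the residue itself as a neighbour are all assumptions, not values taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def proximity_scores(
    coords: Sequence[Coord],
    scores: Sequence[float],
    cutoff: float = 6.0,        # hypothetical distance cutoff in angstroms
    include_self: bool = True,  # assumption: a residue counts as its own neighbour
) -> List[float]:
    """For each residue, sum the scores of all residues within `cutoff` of it."""
    if len(coords) != len(scores):
        raise ValueError("coords and scores must have the same length")
    result = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i == j and not include_self:
                continue
            if math.dist(ci, cj) <= cutoff:
                total += scores[j]
        result.append(total)
    return result


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(proximity_scores(coords, [1.0, 0.5, 2.0]))
```

The double loop is quadratic in the number of residues, which is acceptable for a single protein; a k-d tree would be the usual choice for larger structures.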
Affiliation(s)
- Elin Teppa
- Fundación Instituto Leloir, Avda. Patricias Argentinas 435, CABA, C1405BWE, Argentina
63. Nagao C, Izako N, Soga S, Khan SH, Kawabata S, Shirai H, Mizuguchi K. Computational design, construction, and characterization of a set of specificity determining residues in protein-protein interactions. Proteins 2012; 80:2426-36. [PMID: 22674858 DOI: 10.1002/prot.24127] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/27/2011] [Revised: 04/26/2012] [Accepted: 05/28/2012] [Indexed: 01/18/2023]
Abstract
Proteins interact with different partners to perform different functions and it is important to elucidate the determinants of partner specificity in protein complex formation. Although methods for detecting specificity determining positions have been developed previously, direct experimental evidence for these amino acid residues is scarce, and the lack of information has prevented further computational studies. In this article, we constructed a dataset that is likely to exhibit specificity in protein complex formation, based on available crystal structures and several intuitive ideas about interaction profiles and functional subclasses. We then defined a "structure-based specificity determining position (sbSDP)" as a set of equivalent residues in a protein family showing a large variation in their interaction energy with different partners. We investigated sequence and structural features of sbSDPs and demonstrated that their amino acid propensities significantly differed from those of other interacting residues and that the importance of many of these residues for determining specificity had been verified experimentally.
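An sbSDP, as defined above, is a position whose equivalent residues show a large variation in interaction energy across partners. The sketch below ranks positions by the population standard deviation of those energies; the spread measure, the `min_partners` filter and the input layout are assumptions for illustration, and the paper's exact criterion is not reproduced here.

```python
import statistics
from typing import Dict, List, Tuple


def rank_positions_by_energy_spread(
    energies: Dict[int, List[float]],
    min_partners: int = 2,
) -> List[Tuple[int, float]]:
    """Rank alignment positions by the spread of their interaction energies.

    `energies` maps an alignment position to the interaction energies of the
    equivalent residue in complexes with different partners. Positions seen
    with fewer than `min_partners` partners are skipped.
    """
    spread = [
        (pos, statistics.pstdev(values))
        for pos, values in energies.items()
        if len(values) >= min_partners
    ]
    # Largest variation across partners first: the strongest sbSDP candidates.
    return sorted(spread, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example = {12: [-2.1, -2.0, -2.2], 37: [-4.0, -0.5, -3.1], 58: [-1.0]}
    for pos, sd in rank_positions_by_energy_spread(example):
        print(pos, round(sd, 2))
```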
Affiliation(s)
- Chioko Nagao
- National Institute of Biomedical Innovation, Ibaraki, Osaka, Japan.
64. Neuwald AF, Lanczycki CJ, Marchler-Bauer A. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures. BMC Bioinformatics 2012; 13:144. [PMID: 22726767 PMCID: PMC3599474 DOI: 10.1186/1471-2105-13-144] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 02/06/2012] [Accepted: 06/09/2012] [Indexed: 11/17/2022]
Abstract
Background: The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches.
Results: Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures, starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional specialization. At the same time, this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related; in such cases standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates nearly linear time complexity, so that even for the extremely large Rossmann fold protein class, results were obtained in about a day.
Conclusions: This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time-consuming aspects of conserved domain database curation. At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups.
Affiliation(s)
- Andrew F Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, BioPark II, Room 617, 801 West Baltimore St, Baltimore, MD 21201, USA.
65. Chakraborty A, Mandloi S, Lanczycki CJ, Panchenko AR, Chakrabarti S. SPEER-SERVER: a web server for prediction of protein specificity determining sites. Nucleic Acids Res 2012; 40:W242-8. [PMID: 22689646 PMCID: PMC3394334 DOI: 10.1093/nar/gks559] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Abstract
Sites that show specific conservation patterns within subsets of proteins in a protein family are likely to be involved in the development of functional specificity. These sites, generally termed specificity determining sites (SDS), might play a crucial role in binding to a specific substrate or proteins. Identification of SDS through experimental techniques is a slow, difficult and tedious job. Hence, it is very important to develop efficient computational methods that can more expediently identify SDS. Herein, we present Specificity prediction using amino acids’ Properties, Entropy and Evolution Rate (SPEER)-SERVER, a web server that predicts SDS by analyzing quantitative measures of the conservation patterns of protein sites based on their physico-chemical properties and the heterogeneity of evolutionary changes between and within the protein subfamilies. This web server provides an improved representation of results, adds useful input and output options and integrates a wide range of analysis and data visualization tools when compared with the original standalone version of the SPEER algorithm. Extensive benchmarking finds that SPEER-SERVER exhibits sensitivity and precision performance that, on average, meets or exceeds that of other currently available methods. SPEER-SERVER is available at http://www.hpppi.iicb.res.in/ss/.
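The core intuition behind SDS prediction (residues conserved within subfamilies but differing between them) can be illustrated with an entropy-only score: the column entropy minus the size-weighted within-subfamily entropy, which equals the mutual information between the residue and the subfamily label. This is a deliberately simplified sketch, not the SPEER algorithm, which additionally uses physico-chemical properties and evolutionary rates.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def entropy(symbols: List[str]) -> float:
    """Shannon entropy (natural log) of a list of symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def subfamily_specificity(alignment: List[str], labels: List[str]) -> List[float]:
    """Score each column by H(column) minus the weighted within-subfamily entropy.

    High when each subfamily is internally conserved but different subfamilies
    prefer different residues; zero when the subfamily label tells us nothing.
    """
    if len(alignment) != len(labels):
        raise ValueError("need exactly one subfamily label per sequence")
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    n = len(alignment)
    scores = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        # Size-weighted average of the entropy inside each subfamily.
        within = sum(
            len(members) / n * entropy([column[i] for i in members])
            for members in groups.values()
        )
        scores.append(entropy(column) - within)
    return scores


if __name__ == "__main__":
    msa = ["AKD", "AKE", "ARD", "ARE"]
    print(subfamily_specificity(msa, ["s1", "s1", "s2", "s2"]))
```

In the example, only the middle column (K in one subfamily, R in the other) scores above zero: the first column is conserved everywhere and the third varies within both subfamilies.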
Affiliation(s)
- Abhijit Chakraborty
- Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR)-Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
66. Padawer T, Leighty RE, Wang D. Duplicate gene enrichment and expression pattern diversification in multicellularity. Nucleic Acids Res 2012; 40:7597-605. [PMID: 22645319 PMCID: PMC3439886 DOI: 10.1093/nar/gks464] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 12/05/2022]
Abstract
The enrichment of duplicate genes, and therefore paralogs (proteins coded by duplicate genes), in multicellular versus unicellular organisms enhances genomic functional innovation. This study quantitatively examined relationships among paralog enrichment, expression pattern diversification and multicellularity, aiming to better understand the genomic basis of multicellularity. Paralog abundance in specific cells was compared with that in unicellular proteomes and in the whole proteomes of multicellular organisms. The budding yeast Saccharomyces cerevisiae and the nematode Caenorhabditis elegans, for which the gene sets expressed in specific cells are available, were used as unicellular and multicellular models, respectively. Paralog count ($k$) distributions $P(k)$ follow a power-law relationship, $P(k) \propto k^{-\alpha}$, in the whole proteomes of both species and in specific C. elegans cells. The value of the constant $\alpha$ can be used as a gauge of paralog abundance: the higher the value, the lower the paralog abundance. The $\alpha$ value is indeed lower in the whole proteome of C. elegans (1.74) than in S. cerevisiae (2.34), quantifying the enrichment of paralogs in multicellular species. We also found that the power-law relationship applies to the proteomes of specific C. elegans cells. Strikingly, values of $\alpha$ in specific cells are higher and comparable to that in S. cerevisiae. Thus, paralog abundance in specific cells is lower and comparable to that in unicellular species. Furthermore, the extent to which the expression level of a gene fluctuates across different C. elegans cells correlates positively with its paralog count, which is further confirmed by human gene-expression patterns across different tissues. Taken together, these results quantitatively and mechanistically establish the enrichment of paralogs with diversifying expression patterns as a genomic and evolutionary basis of multicellularity.
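The exponent $\alpha$ in $P(k) \propto k^{-\alpha}$ can be estimated directly from a list of paralog counts. The abstract does not say how $\alpha$ was fitted; the sketch below uses the discrete maximum-likelihood approximation of Clauset, Shalizi and Newman (2009), $\hat{\alpha} \approx 1 + n \left[ \sum_{i} \ln \frac{k_i}{k_{\min} - 1/2} \right]^{-1}$, as one common choice, and the example counts are made up.

```python
import math
from typing import Sequence


def estimate_alpha(counts: Sequence[int], k_min: int = 1) -> float:
    """Approximate MLE of a discrete power-law exponent.

    alpha ~= 1 + n / sum(ln(k / (k_min - 0.5))), over all counts k >= k_min.
    """
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in counts if k >= k_min]
    if not tail:
        raise ValueError("no counts at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    heavy_tail = [1, 1, 1, 1, 2, 2, 3, 5, 10]  # made-up: more large paralog families
    light_tail = [1, 1, 1, 1, 1, 1, 2, 2, 3]   # made-up: mostly singletons
    print(round(estimate_alpha(heavy_tail), 2))  # 1.71
    print(round(estimate_alpha(light_tail), 2))  # 2.03
```

The heavier-tailed list (more genes with many paralogs) gives the smaller exponent, matching the abstract's reading that a lower $\alpha$ means higher paralog abundance.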
Affiliation(s)
- Timothy Padawer
- Department of Cell Biology, Microbiology and Molecular Biology, University of South Florida, BSF218, Tampa, FL 33620, USA
67
Zhang ZH, Bharatham K, Chee SMQ, Mihalek I. Cube-DB: detection of functional divergence in human protein families. Nucleic Acids Res 2012; 40:D490-4. [PMID: 22139934 PMCID: PMC3245124 DOI: 10.1093/nar/gkr1129] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/17/2011] [Revised: 11/08/2011] [Accepted: 11/08/2011] [Indexed: 12/11/2022]
Abstract
Cube-DB is a database of pre-evaluated results for detection of functional divergence in human/vertebrate protein families. The analysis is organized around the nomenclature associated with the human proteins, but based on all currently available vertebrate genomes. Using full genomes enables us, through a mutual-best-hit strategy, to construct comparable taxonomical samples for all paralogues under consideration. Functional specialization is scored at the residue level according to two models of behavior after divergence: heterotachy and homotachy. In the first case, positions in the protein sequence are scored highly if they are conserved in the reference group of orthologs and overlap poorly with the residue types chosen in the paralog groups (such positions are termed functional determinants). The second model additionally requires conservation within each group of paralogs (functional discriminants). The scoring functions are phylogeny-independent but sensitive to residue-type similarity. The results are presented as a table of per-residue scores and mapped onto a related structure (when available) via a browser-embedded visualization tool. They can also be downloaded as a spreadsheet table and as sessions for two additional molecular visualization tools. The database interface is available at http://epsf.bmad.bii.a-star.edu.sg/cube/db/html/home.html.
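To make the difference between the two models concrete, here is a minimal Python sketch for a single alignment column. It assumes an identity-based conservation (frequency of the most common residue) and defines overlap as the fraction of paralog residues whose type also occurs in the reference ortholog group. Cube-DB's actual scoring functions also account for residue-type similarity, so these formulas, the function names and the example columns are hypothetical illustrations only.

```python
from collections import Counter


def conservation(residues: str) -> float:
    """Fraction of sequences carrying the most common residue (1.0 = invariant)."""
    counts = Counter(residues)
    return counts.most_common(1)[0][1] / len(residues)


def overlap(reference: str, paralogs: str) -> float:
    """Fraction of paralog residues whose type also appears in the reference group."""
    ref_types = set(reference)
    return sum(1 for aa in paralogs if aa in ref_types) / len(paralogs)


def determinant_score(reference: str, paralog_groups: list[str]) -> float:
    """Hypothetical heterotachy-style score: conserved in the reference orthologs
    and poorly overlapping with every paralog group."""
    worst_overlap = max(overlap(reference, g) for g in paralog_groups)
    return conservation(reference) * (1.0 - worst_overlap)


def discriminant_score(reference: str, paralog_groups: list[str]) -> float:
    """Hypothetical homotachy-style score: additionally requires each paralog
    group to be conserved in its own right."""
    within = min(conservation(g) for g in paralog_groups)
    return determinant_score(reference, paralog_groups) * within


if __name__ == "__main__":
    ref = "KKKKK"                      # one column of the reference ortholog group
    para_conserved = ["EEEE", "DDDD"]  # each paralog group fixed on a different residue
    para_variable = ["EDQS", "DNTA"]   # paralog groups that vary freely
    print(determinant_score(ref, para_conserved), discriminant_score(ref, para_conserved))
    print(determinant_score(ref, para_variable), discriminant_score(ref, para_variable))
```

Both example columns score fully as determinants, but only the first, in which every paralog group is itself conserved, also scores fully as a discriminant.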
Affiliation(s)
- Zong Hong Zhang
- Bioinformatics Institute 30 Biopolis Street, #07-01 Matrix, Singapore 138671 and School of Biological Sciences, Nanyang Technological University, 50 Nanyang Avenue, Singapore 63979
- Kavitha Bharatham
- Bioinformatics Institute 30 Biopolis Street, #07-01 Matrix, Singapore 138671 and School of Biological Sciences, Nanyang Technological University, 50 Nanyang Avenue, Singapore 63979
- Sharon M. Q. Chee
- Bioinformatics Institute 30 Biopolis Street, #07-01 Matrix, Singapore 138671 and School of Biological Sciences, Nanyang Technological University, 50 Nanyang Avenue, Singapore 63979
- Ivana Mihalek
- Bioinformatics Institute 30 Biopolis Street, #07-01 Matrix, Singapore 138671 and School of Biological Sciences, Nanyang Technological University, 50 Nanyang Avenue, Singapore 63979
68
Zhang Y, Zagnitko O, Rodionova I, Osterman A, Godzik A. The FGGY carbohydrate kinase family: insights into the evolution of functional specificities. PLoS Comput Biol 2011; 7:e1002318. [PMID: 22215998 PMCID: PMC3245297 DOI: 10.1371/journal.pcbi.1002318] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 07/13/2011] [Accepted: 11/06/2011] [Indexed: 12/29/2022]
Abstract
Function diversification in large protein families is a major mechanism driving expansion of cellular networks, providing organisms with new metabolic capabilities and thus adding to their evolutionary success. However, our understanding of the evolutionary mechanisms of functional diversity in such families is very limited, owing, among many other reasons, to the lack of functionally well-characterized sets of proteins. Here, using the FGGY carbohydrate kinase family as an example, we built a confidently annotated reference set (CARS) of proteins by propagating experimentally verified functional assignments to a limited number of homologous proteins that are supported by their genomic and functional contexts. Then, we analyzed, on both the phylogenetic and the molecular levels, the evolution of different functional specificities in this family. The results show that the different functions (substrate specificities) encoded by FGGY kinases have emerged only once in the evolutionary history following an apparently simple divergent evolutionary model. At the same time, on the molecular level, one isofunctional group (L-ribulokinase, AraB) evolved at least two independent solutions that employed distinct specificity-determining residues for the recognition of the same substrate (L-ribulose). Our analysis provides a detailed model of the evolution of the FGGY kinase family. It also shows that only combined molecular and phylogenetic approaches can help reconstruct a full picture of functional diversifications in such diverse families.
The protein universe is under constant expansion and is being reshaped through multiple duplications, gene losses, lateral gene transfers, and speciation events. Large and functionally heterogeneous protein families that evolve through these processes contain conserved motifs and structural scaffolds, yet their individual members often perform diverse functions. For this reason, exact functional annotation of their individual members is difficult without detailed analysis of the family. In our study, we performed such a detailed analysis of the particularly heterogeneous FGGY kinase family through the integration of several computational approaches. The combination of phylogenetic and molecular approaches allowed us to precisely assign function to hundreds of proteins, thus reconstructing carbohydrate utilization pathways in almost 200 bacterial species. This analysis also showed that different molecular mechanisms could evolve within a group of isofunctional proteins. Moreover, based on our experience with the FGGY kinase family, we believe that our approach can be generally adapted for the analysis of other protein families and that the accumulation of evolutionary models for various families would lead to a better understanding of the protein universe.
Affiliation(s)
- Ying Zhang
- Graduate School of Biomedical Sciences, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Olga Zagnitko
- Fellowship for Interpretation of Genomes, Burr Ridge, Illinois, United States of America
- Irina Rodionova
- Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Andrei Osterman
- Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Adam Godzik
- Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
69
Barrantes-Reynolds R, Wallace SS, Bond JP. Using shifts in amino acid frequency and substitution rate to identify latent structural characters in base-excision repair enzymes. PLoS One 2011; 6:e25246. [PMID: 21998646 PMCID: PMC3188539 DOI: 10.1371/journal.pone.0025246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/19/2010] [Accepted: 08/30/2011] [Indexed: 12/30/2022]
Abstract
Protein evolution includes the birth and death of structural motifs. For example, a zinc finger or a salt bridge may be present in some, but not all, members of a protein family. We propose that such transitions are manifest in sequence phylogenies as concerted shifts in substitution rates of amino acids that are neighbors in a representative structure. First, we identified rate shifts in a quartet from the Fpg/Nei family of base excision repair enzymes using a method developed by Xun Gu and coworkers. We found the shifts to be spatially correlated; more precisely, they were associated with a flexible loop involved in bacterial Fpg substrate specificity. Consistent with this result, sequences and structures provide convincing evidence that this loop plays a very different role in other family members. Second, we developed a method, based on Gu's method and on proximity in a high-resolution structure, for identifying latent protein structural characters (LSC) in a set of homologous sequences. Third, we identified LSC and assigned states of LSC to clades within the Fpg/Nei family of base excision repair enzymes. We describe seven LSC; an accompanying Proteopedia page (http://proteopedia.org/wiki/index.php/Fpg_Nei_Protein_Family) describes these in greater detail and facilitates 3D viewing. The LSC we found provided a surprisingly complete picture of the interaction of the protein with DNA, capturing familiar examples, such as a Zn finger, as well as more subtle interactions. Their preponderance is consistent with an important role as phylogenetic characters. Phylogenetic inference based on LSC provided convincing evidence of independent losses of Zn fingers. Structural motifs may serve as important phylogenetic characters, and modeling transitions involving structural motifs may provide a much deeper understanding of protein evolution.
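The LSC method combines two ingredients: sites with rate shifts (from Gu's method) and proximity in a structure. The Python sketch below illustrates only the second step, grouping already-flagged sites into spatially contiguous clusters. It assumes that "proximity" means a coordinate distance below a hypothetical 8 Å cutoff and uses single-linkage grouping; both choices, the helper name and the coordinates are illustrative assumptions, not the authors' criterion.

```python
import math

Coordinate = tuple[float, float, float]


def spatial_clusters(sites: dict[int, Coordinate], cutoff: float = 8.0) -> list[list[int]]:
    """Hypothetical helper: single-linkage grouping of residue positions. Two positions
    share a cluster if a chain of neighbours, each pair within `cutoff`, connects them.
    The 8.0 default is an illustrative choice, not a value from the paper."""
    unvisited = set(sites)
    clusters = []
    for start in sorted(sites):
        if start not in unvisited:
            continue
        unvisited.remove(start)
        stack, members = [start], [start]
        while stack:  # depth-first expansion of the current cluster
            current = stack.pop()
            neighbours = [o for o in unvisited if math.dist(sites[current], sites[o]) <= cutoff]
            for other in neighbours:
                unvisited.remove(other)
                stack.append(other)
                members.append(other)
        clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Hypothetical C-alpha coordinates of positions flagged by a rate-shift test
    shifted = {
        12: (0.0, 0.0, 0.0),
        15: (5.0, 0.0, 0.0),
        18: (10.0, 0.0, 0.0),
        140: (40.0, 0.0, 0.0),
        143: (44.0, 3.0, 0.0),
    }
    for cluster in spatial_clusters(shifted):
        print("candidate latent structural character:", cluster)
```

Positions 12 and 18 are 10 Å apart but end up in one cluster because position 15 links them, which is the behavior a single-linkage rule is chosen for.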
Affiliation(s)
- Ramiro Barrantes-Reynolds
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
- Susan S. Wallace
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
- Jeffrey P. Bond
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont, United States of America
70
Determinants, discriminants, conserved residues--a heuristic approach to detection of functional divergence in protein families. PLoS One 2011; 6:e24382. [PMID: 21931701 PMCID: PMC3171465 DOI: 10.1371/journal.pone.0024382] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 04/15/2011] [Accepted: 08/08/2011] [Indexed: 11/19/2022]
Abstract
In this work, belonging to the field of comparative analysis of protein sequences, we focus on detection of functional specialization at the residue level. As the input, we take a set of sequences divided into groups of orthologues, each group known to be responsible for a different function. This provides two independent pieces of information: within-group conservation and overlap in amino acid type across groups. We build our discussion around a set of scoring functions that keep the two separate and make the signal easy to trace back to its source. We propose a heuristic description of functional divergence that includes residue type exchangeability, both in the conservation and in the overlap measure, and does not make any assumptions about the rate of evolution in the groups other than the one under consideration. Residue types acceptable at a certain position within an orthologous group are described as a distribution which evolves in time, starting from a single ancestral type, and is subject to constraints that can be inferred only indirectly. To estimate the strength of the constraints, we compare the observed degrees of conservation and overlap with those expected in the hypothetical case of a freely evolving distribution. Our description matches the experiment well, but we also conclude that any attempt to capture the evolutionary behavior of specificity-determining residues in terms of a scalar function will be tentative, because no single model can cover the variety of evolutionary behavior such residues exhibit. In particular, models expecting the same type of evolutionary behavior across functionally divergent groups tend to miss a portion of information otherwise retrievable by the conservation and overlap measures they use.
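The comparison between observed conservation and the value expected for a freely evolving distribution can be illustrated with a Monte Carlo estimate. The Python sketch below assumes a much simpler null than the paper's time-evolving distribution: each of the n residues in a column is drawn independently and uniformly from the 20 amino acids. The null model, the conservation measure (frequency of the most common residue) and all helper names are hypothetical.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def top_frequency(column: str) -> float:
    """Observed conservation: frequency of the most common residue type."""
    return Counter(column).most_common(1)[0][1] / len(column)


def expected_top_frequency(n: int, trials: int = 10_000, seed: int = 1) -> float:
    """Mean top frequency when n residues are drawn independently and uniformly
    from the 20 amino acids (a simplified, hypothetical 'free evolution' null)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        column = "".join(rng.choice(AMINO_ACIDS) for _ in range(n))
        total += top_frequency(column)
    return total / trials


def excess_conservation(column: str) -> float:
    """Observed minus expected conservation; larger values suggest a stronger constraint."""
    return top_frequency(column) - expected_top_frequency(len(column))


if __name__ == "__main__":
    # Hypothetical column from one orthologous group of 10 sequences
    print(f"excess conservation: {excess_conservation('KKKKKKKKKR'):.2f}")
```

Under this null, a group of 10 sequences typically shows a top frequency of only about 0.2, so a column with 9 identical residues stands well above chance.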
71
Gaston D, Susko E, Roger AJ. A phylogenetic mixture model for the identification of functionally divergent protein residues. Bioinformatics 2011; 27:2655-63. [PMID: 21840876 DOI: 10.1093/bioinformatics/btr470] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples. RESULTS We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions. AVAILABILITY http://rogerlab.biochem.dal.ca/Software CONTACT andrew.roger@dal.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
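In a two-component mixture model, classifying a site ultimately reduces to a posterior probability obtained with Bayes' rule. The Python sketch below shows only that final step. It assumes that per-site log-likelihoods under a functional-divergence component and a null component, and the mixing weight, have already been estimated; the numbers and names are hypothetical, and this is not FunDi's code or its likelihood computation.

```python
import math


def posterior_fd(loglik_fd: float, loglik_null: float, weight_fd: float) -> float:
    """Posterior probability that a site belongs to the functional-divergence
    component of a two-component mixture:
        w * L_fd / (w * L_fd + (1 - w) * L_null)
    evaluated in log space so that very small likelihoods do not underflow."""
    if not 0.0 < weight_fd < 1.0:
        raise ValueError("weight_fd must lie strictly between 0 and 1")
    log_a = math.log(weight_fd) + loglik_fd
    log_b = math.log1p(-weight_fd) + loglik_null
    top = max(log_a, log_b)
    a, b = math.exp(log_a - top), math.exp(log_b - top)
    return a / (a + b)


if __name__ == "__main__":
    # Hypothetical per-site values: (site, log L under FD model, log L under null model)
    sites = [(41, -812.4, -815.9), (97, -640.2, -638.7), (133, -903.0, -910.5)]
    prior_fd = 0.1  # hypothetical mixing weight of the FD component
    for site, ll_fd, ll_null in sites:
        p = posterior_fd(ll_fd, ll_null, prior_fd)
        print(site, round(p, 3), "FD" if p > 0.5 else "not FD")
```

Working with log-likelihoods matters here because whole-alignment likelihoods such as e^(−900) underflow to zero in double precision, whereas their differences remain well behaved.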
Affiliation(s)
- Daniel Gaston
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Canada, B3H 1X5
72
Surveying the manifold divergence of an entire protein class for statistical clues to underlying biochemical mechanisms. Stat Appl Genet Mol Biol 2011; 10:Article 36. [PMID: 22331370 DOI: 10.2202/1544-6115.1666] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
Certain residues have no known function yet are co-conserved across distantly related protein families and diverse organisms, suggesting that they perform critical roles associated with as-yet-unidentified molecular properties and mechanisms. This raises the question of how to obtain additional clues regarding these mysterious biochemical phenomena with a view to formulating experimentally testable hypotheses. One approach is to access the implicit biochemical information encoded within the vast amount of genomic sequence data now becoming available. Here, a new Gibbs sampling strategy is formulated and implemented that can partition hundreds of thousands of sequences within a major protein class into multiple, functionally divergent categories based on those pattern residues that best discriminate between categories. The sampler precisely defines the partition and pattern for each category by explicitly modeling unrelated, non-functional and related-yet-divergent proteins that would otherwise obscure the analysis. To aid biological interpretation, auxiliary routines can characterize pattern residues within available crystal structures and identify those structures most likely to shed light on the roles of pattern residues. This approach can be used to automatically define and annotate subgroup-specific conserved domain profiles based on statistically rigorous empirical criteria rather than on the subjective and labor-intensive process of manual curation. Incorporating such profiles into domain database search sites (such as the NCBI BLAST site) will provide biologists with previously inaccessible molecular information useful for hypothesis generation and experimental design. Analyses of P-loop GTPases and of AAA+ ATPases illustrate the sampler's ability to obtain such information.
73
Carroll SM, Ortlund EA, Thornton JW. Mechanisms for the evolution of a derived function in the ancestral glucocorticoid receptor. PLoS Genet 2011; 7:e1002117. [PMID: 21698144 PMCID: PMC3116920 DOI: 10.1371/journal.pgen.1002117] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 02/11/2011] [Accepted: 04/19/2011] [Indexed: 11/19/2022]
Abstract
Understanding the genetic, structural, and biophysical mechanisms that caused protein functions to evolve is a central goal of molecular evolutionary studies. Ancestral sequence reconstruction (ASR) offers an experimental approach to these questions. Here we use ASR to shed light on the earliest functions and evolution of the glucocorticoid receptor (GR), a steroid-activated transcription factor that plays a key role in the regulation of vertebrate physiology. Prior work showed that GR and its paralog, the mineralocorticoid receptor (MR), duplicated from a common ancestor roughly 450 million years ago; the ancestral functions were largely conserved in the MR lineage, but the functions of GRs (reduced sensitivity to all hormones and increased selectivity for glucocorticoids) are derived. Although the mechanisms for the evolution of glucocorticoid specificity have been identified, how reduced sensitivity evolved has not yet been studied. Here we report on the reconstruction of the deepest ancestor in the GR lineage (AncGR1) and demonstrate that GR's reduced sensitivity evolved before the acquisition of restricted hormone specificity, shortly after the GR-MR split. Using site-directed mutagenesis, X-ray crystallography, and computational analyses of protein stability to recapitulate and determine the effects of historical mutations, we show that AncGR1's reduced ligand sensitivity evolved primarily due to three key substitutions. Two large-effect mutations weakened hydrogen bonds and van der Waals interactions within the ancestral protein, reducing its stability. The degenerative effect of these two mutations is extremely strong, but a third permissive substitution, which has no apparent effect on function in the ancestral background and is likely to have occurred first, buffered the effects of the destabilizing mutations.
Taken together, our results highlight the potentially creative role of substitutions that partially degrade protein structure and function and reinforce the importance of permissive mutations in protein evolution.
Affiliation(s)
- Sean Michael Carroll
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eric A. Ortlund
- Department of Biochemistry, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Joseph W. Thornton
- Howard Hughes Medical Institute, Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America
74
García-Torres I, Cabrera N, Torres-Larios A, Rodríguez-Bolaños M, Díaz-Mazariegos S, Gómez-Puyou A, Perez-Montfort R. Identification of amino acids that account for long-range interactions in two triosephosphate isomerases from pathogenic trypanosomes. PLoS One 2011; 6:e18791. [PMID: 21533154 PMCID: PMC3078909 DOI: 10.1371/journal.pone.0018791] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 09/25/2010] [Accepted: 03/18/2011] [Indexed: 11/18/2022]
Abstract
For a better comprehension of the structure-function relationship in proteins, it is necessary to identify the amino acids that are relevant for measurable protein functions. Because of the numerous contacts that amino acids establish within proteins and the cooperative nature of their interactions, this goal is difficult to achieve. Thus, the study of protein-ligand interactions is usually focused on local environmental structural differences. Here, using a pair of triosephosphate isomerase enzymes with extremely high homology from two different organisms, we demonstrate that a seventy-fold difference in the reactivity of the interface cysteine toward the sulfhydryl reagent methylmethane sulfonate is controlled by several amino acids from two structurally unrelated regions that contact neither this cysteine nor the residues in its immediate vicinity. The change in reactivity is due to an increase in the apparent pKa of the interface cysteine produced by the mutated residues. Our work, which involved systematically grafting portions of one protein into the other, revealed unsuspected, multisite long-range interactions that modulate the properties of the interface cysteines and has general implications for future studies on protein structure-function relationships.
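The link between a higher apparent pKa and lower reactivity can be made quantitative with the Henderson-Hasselbalch equation, f = 1 / (1 + 10^(pKa − pH)), for the fraction of cysteine in the thiolate form. The Python sketch below assumes that reactivity is proportional to this thiolate fraction; the pH and pKa values are illustrative, not the paper's measurements, and the function names are hypothetical.

```python
import math


def thiolate_fraction(pka: float, ph: float) -> float:
    """Fraction of a cysteine in the deprotonated (thiolate) form,
    from the Henderson-Hasselbalch equation."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def reactivity_ratio(pka_low: float, pka_high: float, ph: float) -> float:
    """Predicted reactivity ratio of two cysteines, assuming rate ∝ thiolate fraction."""
    return thiolate_fraction(pka_low, ph) / thiolate_fraction(pka_high, ph)


def pka_shift_for_ratio(ratio: float) -> float:
    """When pH is well below both pKa values, f ≈ 10^(pH - pKa), so a reactivity
    ratio corresponds to a pKa difference of about log10(ratio)."""
    return math.log10(ratio)


if __name__ == "__main__":
    shift = pka_shift_for_ratio(70)  # the seventy-fold difference quoted in the abstract
    print(f"pKa shift ≈ {shift:.2f} units")
    # Illustrative pKa and pH values only; no measured values from the paper are used.
    print(f"check: {reactivity_ratio(9.0, 9.0 + shift, 6.0):.1f}-fold at pH 6.0")
```

Under these assumptions, a seventy-fold drop in reactivity corresponds to an apparent pKa increase of roughly 1.8 units. Near or above the pKa the relation is no longer log-linear, which is why the sketch also evaluates the full expression as a check.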
Affiliation(s)
- Itzhel García-Torres
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Circuito Exterior S/N, Ciudad Universitaria, México DF, Mexico
- Nallely Cabrera
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Circuito Exterior S/N, Ciudad Universitaria, México DF, Mexico
- Alfredo Torres-Larios
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Circuito Exterior S/N, Ciudad Universitaria, México DF, Mexico
- Mónica Rodríguez-Bolaños
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Circuito Exterior S/N, Ciudad Universitaria, México DF, Mexico
- Selma Díaz-Mazariegos
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Circuito Exterior S/N, Ciudad Universitaria, México DF, Mexico
- Armando Gómez-Puyou
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Circuito Exterior S/N, Ciudad Universitaria, México DF, Mexico
- Ruy Perez-Montfort
- Departamento de Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Circuito Exterior S/N, Ciudad Universitaria, México DF, Mexico
75
González Montoro A, Chumpen Ramirez S, Quiroga R, Valdez Taubas J. Specificity of transmembrane protein palmitoylation in yeast. PLoS One 2011; 6:e16969. [PMID: 21383992 PMCID: PMC3044718 DOI: 10.1371/journal.pone.0016969] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 10/18/2010] [Accepted: 01/11/2011] [Indexed: 11/29/2022]
Abstract
Many proteins are modified after their synthesis by the addition of a lipid molecule to one or more cysteine residues through a thioester bond. This modification is called S-acylation, or more commonly palmitoylation. This reaction is carried out by a family of enzymes called palmitoyltransferases (PATs), characterized by the presence of a conserved 50-amino-acid domain called the “Asp-His-His-Cys cysteine-rich domain” (DHHC-CRD). There are 7 members of this family in the yeast Saccharomyces cerevisiae, and each of these proteins is thought to be responsible for the palmitoylation of a subset of substrates. Substrate specificity of PATs, however, is not yet fully understood. Several yeast PATs seem to have overlapping specificity, and it has been proposed that the machinery responsible for palmitoylating peripheral membrane proteins in mammalian cells lacks specificity altogether. Here we investigate the specificity of transmembrane protein palmitoylation in S. cerevisiae, which is carried out predominantly by two PATs, Swf1 and Pfa4. We show that palmitoylation of transmembrane substrates requires dedicated PATs, since other yeast PATs are mostly unable to perform Swf1 or Pfa4 functions, even when overexpressed. Furthermore, we find that Swf1 is highly specific for its substrates, as it is unable to substitute for other PATs. To identify where Swf1 specificity lies, we carried out a bioinformatics survey to identify amino acids responsible for the determination of specificity, or Specificity Determination Positions (SDPs), and showed experimentally that mutation of the two best SDP candidates, A145 and K148, results in complete and partial loss of function, respectively. These residues are located within the conserved catalytic DHHC domain, suggesting that this domain could also be involved in the determination of specificity.
Finally, we show that modifying the position of the cysteines in Tlg1, a Swf1 substrate, results in lack of palmitoylation, as expected for a highly specific enzymatic reaction.
Affiliation(s)
- Ayelén González Montoro
- Centro de Investigaciones en Química Biológica de Córdoba, CIQUIBIC (UNC-CONICET), Departamento de Química Biológica, Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Córdoba, Argentina
- Sabrina Chumpen Ramirez
- Centro de Investigaciones en Química Biológica de Córdoba, CIQUIBIC (UNC-CONICET), Departamento de Química Biológica, Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Córdoba, Argentina
- Rodrigo Quiroga
- Centro de Investigaciones en Química Biológica de Córdoba, CIQUIBIC (UNC-CONICET), Departamento de Química Biológica, Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Córdoba, Argentina
- Javier Valdez Taubas
- Centro de Investigaciones en Química Biológica de Córdoba, CIQUIBIC (UNC-CONICET), Departamento de Química Biológica, Facultad de Ciencias Químicas, Universidad Nacional de Córdoba, Córdoba, Argentina
76
Neuwald AF. Bayesian classification of residues associated with protein functional divergence: Arf and Arf-like GTPases. Biol Direct 2010; 5:66. [PMID: 21129209 PMCID: PMC3012027 DOI: 10.1186/1745-6150-5-66] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/07/2010] [Accepted: 12/03/2010] [Indexed: 11/22/2022]
Abstract
Background Certain residues within proteins are highly conserved across very distantly related organisms, yet their (presumably critical) structural or mechanistic roles are completely unknown. To obtain clues regarding such residues within Arf and Arf-like (Arf/Arl) GTPases--which function as on/off switches regulating vesicle trafficking, phospholipid metabolism and cytoskeletal remodeling--I apply a new sampling procedure for comparative sequence analysis, termed multiple category Bayesian Partitioning with Pattern Selection (mcBPPS). Results The mcBPPS sampler classified sequences within the entire P-loop GTPase class into multiple categories by identifying those evolutionarily-divergent residues most likely to be responsible for functional specialization. Here I focus on categories of residues that most distinguish various Arf/Arl GTPases from other GTPases. This identified residues whose specific roles have been previously proposed (and in some cases corroborated experimentally and that thus serve as positive controls), as well as several categories of co-conserved residues whose possible roles are first hinted at here. For example, Arf/Arl/Sar GTPases are most distinguished from other GTPases by a conserved aspartate residue within the phosphate binding loop (P-loop) and by co-conserved residues nearby that, together, can form a network of salt-bridge and hydrogen bond interactions centered on the GTPase active site. Residues corresponding to an N-[VI] motif that is conserved within Arf/Arl GTPases may play a role in the interswitch toggle characteristic of the Arf family, whereas other, co-conserved residues may modulate the flexibility of the guanine binding loop. Arl8 GTPases conserve residues that strikingly diverge from those typically found in other Arf/Arl GTPases and that form structural interactions suggestive of a novel interswitch toggle mechanism. 
Conclusions This analysis suggests specific mutagenesis experiments to explore mechanisms underlying GTP hydrolysis, nucleotide exchange and interswitch toggling within Arf/Arl GTPases. More generally, it illustrates how the mcBPPS sampler can complement traditional evolutionary analyses by providing an objective, quantitative and statistically rigorous way to explore protein functional-divergence in molecular detail. Because the sampler classifies the input sequences at the same time, it can be used to generate subgroup profiles, in which functionally-divergent categories of residues are annotated automatically. Reviewers This article was reviewed by Frank Eisenhaber, L Aravind and Daniel Gaston (nominated by Eric Bapteste). For the full reviews, go to the Reviewers' comments section.
Affiliation(s)
- Andrew F Neuwald
- Department of Biochemistry & Molecular Biology, Institute for Genome Sciences, University of Maryland School of Medicine, BioPark II, Room 617, 801 West Baltimore St, Baltimore, MD 21201, USA.
77
Dessailly BH, Redfern OC, Cuff AL, Orengo CA. Detailed analysis of function divergence in a large and diverse domain superfamily: toward a refined protocol of function classification. Structure 2010; 18:1522-35. [PMID: 21070951 PMCID: PMC3023962 DOI: 10.1016/j.str.2010.08.017] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 01/19/2010] [Revised: 08/06/2010] [Accepted: 08/13/2010] [Indexed: 10/18/2022]
Abstract
Some superfamilies contain large numbers of protein domains with very different functions. The ability to refine the functional classification of domains within these superfamilies is necessary for better understanding the evolution of functions and to guide function prediction of new relatives. To achieve this, a suitable starting point is the detailed analysis of functional divisions and mechanisms of functional divergence in a single superfamily. Here, we present such a detailed analysis in the superfamily of HUP domains. A biologically meaningful functional classification of HUP domains is obtained manually. Mechanisms of function diversification are investigated in detail using this classification. We observe that structural motifs play an important role in shaping broad functional divergence, whereas residue-level changes shape diversity at a more specific level. In parallel we examine the ability of an automated protocol to capture the biologically meaningful classification, with a view to automatically extending this classification in the future.
Affiliation(s)
- Benoit H Dessailly
- Department of Structural and Molecular Biology, University College of London, Gower Street, London WC1E6BT, UK.
78
Slama P, Geman D. Identification of family-determining residues in PHD fingers. Nucleic Acids Res 2010; 39:1666-79. [PMID: 21059680 PMCID: PMC3061080 DOI: 10.1093/nar/gkq947] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
Histone modifications are fundamental to chromatin structure and transcriptional regulation, and are recognized by a limited number of protein folds. Among these folds are PHD fingers, which are present in most chromatin modification complexes. To date, about 15 PHD finger domains have been structurally characterized, whereas hundreds of different sequences have been identified. Consequently, an important open problem is to predict structural features of a PHD finger knowing only its sequence. Here, we classify PHD fingers into different groups based on the analysis of residue–residue co-evolution in their sequences. We measure the degree to which fixing the amino acid type at one position modifies the frequencies of amino acids at other positions. We then detect those position/amino acid combinations, or ‘conditions’, which have the strongest impact on other sequence positions. Clustering these strong conditions yields four families, providing informative labels for PHD finger sequences. Existing experimental results, as well as docking calculations performed here, reveal that these families indeed show discrepancies at the functional level. Our method should facilitate the functional characterization of new PHD fingers, as well as other protein families, solely based on sequence information.
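The co-evolution measure described above (how much fixing amino acid a at position i shifts the amino-acid frequencies at other positions) can be illustrated with a minimal sketch. Using total variation distance as the size of the shift, and summing it over all other columns to rank conditions, are our assumptions; the paper's exact statistic and its clustering of strong conditions into four families are not reproduced here, and the function names are hypothetical.

```python
from collections import Counter


def column(alignment, j):
    """Residues at alignment column j."""
    return [seq[j] for seq in alignment]


def frequencies(residues):
    """Relative amino-acid frequencies of a list of residues."""
    counts = Counter(residues)
    total = len(residues)
    return {aa: n / total for aa, n in counts.items()}


def condition_effect(alignment, i, aa, j):
    """Total variation distance between column j's frequencies in the full
    alignment and in the sub-alignment whose position i holds `aa`."""
    subset = [seq for seq in alignment if seq[i] == aa]
    if not subset:
        return 0.0
    full = frequencies(column(alignment, j))
    cond = frequencies(column(subset, j))
    keys = set(full) | set(cond)
    return 0.5 * sum(abs(full.get(k, 0.0) - cond.get(k, 0.0)) for k in keys)


def strongest_conditions(alignment, top=3):
    """Rank (position, amino acid) conditions by summed effect on all other columns."""
    length = len(alignment[0])
    scores = {}
    for i in range(length):
        for aa in set(column(alignment, i)):
            scores[(i, aa)] = sum(
                condition_effect(alignment, i, aa, j) for j in range(length) if j != i
            )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

On a toy alignment such as `["ACD", "ACD", "GKD", "GKD"]`, fixing A at position 0 fully determines position 1 (score 0.5), while the invariant D at position 2 has no effect on any other column.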
Affiliation(s)
- Patrick Slama
- Institute for Computational Medicine and Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA.
79
de Melo-Minardi RC, Bastard K, Artiguenave F. Identification of subfamily-specific sites based on active sites modeling and clustering. Bioinformatics 2010; 26:3075-82. [PMID: 20980272 DOI: 10.1093/bioinformatics/btq595] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION Current computational approaches to function prediction are mostly based on protein sequence classification and transfer of annotation from known proteins to their closest homologous sequences, relying on the orthology concept of function conservation. This approach suffers from a major weakness: annotation reliability depends on global sequence similarity to known proteins, and the approach performs poorly for enzyme superfamilies that catalyze different reactions. Structural biology offers a different strategy to overcome the problem of annotation by adding information about protein 3D structures. This information can be used to identify amino acids located in active sites, focusing on detection of functionally polymorphic residues in an enzyme superfamily. Structural genomics programs are providing more and more novel protein structures at a high-throughput rate. However, there is still a huge gap between the number of sequences and the number of available structures. Computational methods such as homology modeling provide reliable approaches to bridge this gap and could be a new precise tool to annotate protein functions. RESULTS Here, we present the Active Sites Modeling and Clustering (ASMC) method, a novel unsupervised method to classify sequences using structural information of protein pockets. ASMC combines homology modeling of family members, structural alignment of modeled active sites and a subsequent hierarchical conceptual classification. Comparison of profiles obtained from computed clusters allows the identification of residues correlated to subfamily function divergence, called specificity determining positions. The ASMC method has been validated on a benchmark of 42 Pfam families for which previously resolved holo-structures were available. ASMC was also applied to several families containing known protein structures and comprehensive functional annotations.
We will discuss how ASMC improves annotation and understanding of protein family functions, giving specific illustrative examples on nucleotidyl cyclases, protein kinases and serine proteases. AVAILABILITY http://www.genoscope.fr/ASMC/.
80
Mondal S, Nagao C, Mizuguchi K. Detecting subtle functional differences in ketopantoate reductase and related enzymes using a rule-based approach with sequence-structure homology recognition scores. Protein Eng Des Sel 2010; 23:859-69. [PMID: 20876192 DOI: 10.1093/protein/gzq062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/27/2022]
Abstract
Ketopantoate reductase (KPR) is the second enzyme in the pantothenate (vitamin B5) biosynthesis pathway, an essential metabolic pathway identified as a potential target for new antimicrobials. The sequence similarity among putative KPRs is limited, and KPR itself belongs to a large superfamily of 6-phosphogluconate dehydrogenases. Therefore, it is necessary to discriminate between true KPRs and other enzymes. In this paper, we describe a systematic analysis of putative KPRs in the context of this superfamily. Detailed structural analysis allowed us to define key residues for KPR activity, and we classified eight structural genomics structures of the KPR family into four functional subclasses. We proposed a semi-automatic protocol, using sequence-structure homology recognition scores, for assigning KPR and related proteins to these subclasses and applied it to a representative set of 103 completely sequenced bacterial genomes. A similar approach can be applied to other enzyme families, which would aid the correct identification of drug targets and help design novel specific inhibitors.
Affiliation(s)
- Sukanta Mondal
- National Institute of Biomedical Innovation, 7-6-8 Saito-Asagi, Ibaraki, Osaka, Japan
81
Fromer M, Linial M. Exposing the co-adaptive potential of protein-protein interfaces through computational sequence design. Bioinformatics 2010; 26:2266-72. [PMID: 20679332 DOI: 10.1093/bioinformatics/btq412] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/27/2022]
Abstract
MOTIVATION In nature, protein-protein interactions are constantly evolving under various selective pressures. Nonetheless, it is expected that crucial interactions are maintained through compensatory mutations between interacting proteins. Thus, many studies have used evolutionary sequence data to extract such occurrences of correlated mutation. However, this research is confounded by other evolutionary pressures that contribute to sequence covariance, such as common ancestry. RESULTS Here, we focus exclusively on the compensatory mutations deriving from physical protein interactions, by performing large-scale computational mutagenesis experiments for >260 protein-protein interfaces. We investigate the potential for co-adaptability present in protein pairs that are always found together in nature (obligate) and those that are occasionally in complex (transient). By modeling each complex both in bound and unbound forms, we find that naturally transient complexes possess greater relative capacity for correlated mutation than obligate complexes, even when differences in interface size are taken into account.
Affiliation(s)
- Menachem Fromer
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
82
Mazin PV, Gelfand MS, Mironov AA, Rakhmaninova AB, Rubinov AR, Russell RB, Kalinina OV. An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies. Algorithms Mol Biol 2010; 5:29. [PMID: 20633297 PMCID: PMC2914642 DOI: 10.1186/1748-7188-5-29] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 11/30/2009] [Accepted: 07/15/2010] [Indexed: 11/30/2022]
Abstract
Background Recent progress in sequencing and 3D structure determination techniques has stimulated the development of approaches aimed at more precise annotation of proteins, that is, prediction of exact specificity to a ligand or, more broadly, to a binding partner of any kind. Results We present a method, SDPclust, for identification of protein functional subfamilies coupled with prediction of specificity-determining positions (SDPs). SDPclust predicts specificity in a phylogeny-independent stochastic manner, which allows for the correct identification of the specificity for proteins that are separated on a phylogenetic tree but still bind the same ligand. SDPclust is implemented as a Web-server http://bioinf.fbb.msu.ru/SDPfoxWeb/ and a stand-alone Java application available from the website. Conclusions SDPclust performs a simultaneous identification of specificity determinants and specificity groups in a statistically robust and phylogeny-independent manner.
83
Brandt BW, Feenstra KA, Heringa J. Multi-Harmony: detecting functional specificity from sequence alignment. Nucleic Acids Res 2010; 38:W35-40. [PMID: 20525785 PMCID: PMC2896201 DOI: 10.1093/nar/gkq415] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help in finding targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww.
Affiliation(s)
- Bernd W Brandt
- Centre for Integrative Bioinformatics, VU University Amsterdam, De Boelelaan 1081A, 1081HV Amsterdam, The Netherlands
84
Harms MJ, Thornton JW. Analyzing protein structure and function using ancestral gene reconstruction. Curr Opin Struct Biol 2010; 20:360-6. [PMID: 20413295 PMCID: PMC2916957 DOI: 10.1016/j.sbi.2010.03.005] [Citation(s) in RCA: 161] [Impact Index Per Article: 10.7] [Received: 02/12/2010] [Accepted: 03/22/2010] [Indexed: 01/06/2023]
Abstract
Protein families with functionally diverse members can illuminate the structural determinants of protein function and the process by which protein structure and function evolve. To identify the key amino acid changes that differentiate one family member from another, most studies have taken a horizontal approach, swapping candidate residues between present-day family members. This approach has often been stymied, however, by the fact that shifts in function often require multiple interacting mutations; chimeric proteins are often nonfunctional, either because one lineage has amassed mutations that are incompatible with key residues that conferred a new function on other lineages, or because it lacks mutations required to support those key residues. These difficulties can be overcome by using a vertical strategy, which reconstructs ancestral genes and uses them as the appropriate background in which to study the effects of historical mutations on functional diversification. In this review, we discuss the advantages of the vertical strategy and highlight several exemplary studies that have used ancestral gene reconstruction to reveal the molecular underpinnings of protein structure, function, and evolution.
Affiliation(s)
- Michael J. Harms
- Howard Hughes Medical Institute, Center for Ecology and Evolutionary Biology, 335 Pacific Hall, 5289 University of Oregon, Eugene, OR 97403, 541-346-0328 (ph), 541-346-2364 (fax)
- Joseph W. Thornton
- Howard Hughes Medical Institute, Center for Ecology and Evolutionary Biology, 335 Pacific Hall, 5289 University of Oregon, Eugene, OR 97403, 541-346-0328 (ph), 541-346-2364 (fax)
85
Wass MN, Kelley LA, Sternberg MJE. 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res 2010; 38:W469-73. [PMID: 20513649 PMCID: PMC2896164 DOI: 10.1093/nar/gkq406] [Citation(s) in RCA: 474] [Impact Index Per Article: 31.6] [Indexed: 12/01/2022]
Abstract
3DLigandSite is a web server for the prediction of ligand-binding sites. It is based upon successful manual methods used in the eighth round of the Critical Assessment of techniques for protein Structure Prediction (CASP8). 3DLigandSite uses protein-structure prediction to provide structural models for proteins whose structures have not been solved. Ligands bound to structures similar to the query are superimposed onto the model and used to predict the binding site. In benchmarking against the CASP8 targets, 3DLigandSite obtains a Matthews correlation coefficient (MCC) of 0.64, and coverage and accuracy of 71 and 60%, respectively, similar to our manual performance in CASP8. In further benchmarking using a large set of protein structures, 3DLigandSite obtains an MCC of 0.68. The web server enables users to submit either a query sequence or structure. Predictions are visually displayed via an interactive Jmol applet. 3DLigandSite is available for use at http://www.sbg.bio.ic.ac.uk/3dligandsite.
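The MCC, coverage and accuracy figures quoted above come from treating each residue as a binary prediction (in or out of the binding site). A minimal sketch of how such scores can be computed, assuming the usual definitions coverage = TP/(TP+FN) and accuracy = TP/(TP+FP); the function name and residue numbering are illustrative, not taken from the 3DLigandSite benchmark.

```python
import math


def binding_site_metrics(predicted, actual, all_residues):
    """Score a binding-site prediction as a per-residue binary classification."""
    predicted, actual, universe = set(predicted), set(actual), set(all_residues)
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(universe - predicted - actual)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    coverage = tp / (tp + fn) if tp + fn else 0.0  # fraction of true site residues found
    accuracy = tp / (tp + fp) if tp + fp else 0.0  # fraction of predicted residues that are correct
    return {"mcc": mcc, "coverage": coverage, "accuracy": accuracy}
```

For a 10-residue toy protein with true site {1, 2, 3, 4} and prediction {2, 3, 4, 5}, this gives TP = 3, FP = 1, FN = 1, TN = 5, so coverage and accuracy are both 0.75 and MCC is 14/24.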
Affiliation(s)
- Mark N Wass
- Structural Bioinformatics Group, Centre for Bioinformatics, Imperial College London, London, SW7 2AZ, UK
86
Horan K, Shelton CR, Girke T. Predicting conserved protein motifs with Sub-HMMs. BMC Bioinformatics 2010; 11:205. [PMID: 20420695 PMCID: PMC2879284 DOI: 10.1186/1471-2105-11-205] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/17/2009] [Accepted: 04/26/2010] [Indexed: 11/16/2022]
Abstract
Background Profile HMMs (hidden Markov models) provide effective methods for modeling the conserved regions of protein families. A limitation of the resulting domain models is the difficulty of pinpointing their much shorter functional sub-features, such as catalytically relevant sequence motifs in enzymes or ligand binding signatures of receptor proteins. Results To identify these conserved motifs efficiently, we propose a method for extracting the most information-rich regions in protein families from their profile HMMs. The method was used here to predict a comprehensive set of sub-HMMs from the Pfam domain database. Cross-validations with the PROSITE and CSA databases confirmed the efficiency of the method in predicting most of the known functionally relevant motifs and residues. At the same time, 46,768 novel conserved regions could be predicted. The data set also allowed us to link at least 461 Pfam domains of known and unknown function by their common sub-HMMs. Finally, the sub-HMM method showed very promising results as an alternative search method for identifying proteins that share only short sequence similarities. Conclusions Sub-HMMs extend the application spectrum of profile HMMs to motif discovery. Their most interesting utility is the identification of the functionally relevant residues in proteins of known and unknown function. Additionally, sub-HMMs can be used for highly localized sequence similarity searches that focus on shorter conserved features rather than entire domains or global similarities. The motif data generated by this study are a valuable knowledge resource for characterizing protein functions in the future.
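As a rough illustration of what "information-rich regions" of a profile HMM can mean, the sketch below scores each match state by the relative entropy of its emission distribution against a uniform background, then picks the fixed-width window with the highest mean score. The uniform background, the fixed window width and the function names are our assumptions; the abstract does not describe the authors' actual sub-HMM extraction procedure.

```python
import math


def column_information(emissions, background=None):
    """Relative entropy (bits) of one match-state emission distribution vs. a background.

    emissions: mapping residue -> probability, listing every residue of the alphabet.
    """
    if background is None:
        background = {aa: 1 / len(emissions) for aa in emissions}  # uniform background
    return sum(p * math.log2(p / background[aa]) for aa, p in emissions.items() if p > 0)


def most_informative_window(match_states, width):
    """Return (start, mean_bits) of the contiguous window with the highest mean information."""
    scores = [column_information(state) for state in match_states]
    best_start, best_mean = 0, float("-inf")
    for start in range(len(scores) - width + 1):
        mean = sum(scores[start:start + width]) / width
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean
```

With a toy 4-letter alphabet, a uniform state carries 0 bits and a state that always emits one residue carries 2 bits, so two adjacent peaked states form the selected window.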
Affiliation(s)
- Kevin Horan
- Department of Computer Science and Engineering, University of California Riverside, Riverside, California, USA
87
Field SF, Matz MV. Retracing evolution of red fluorescence in GFP-like proteins from Faviina corals. Mol Biol Evol 2010; 27:225-33. [PMID: 19793832 PMCID: PMC2877551 DOI: 10.1093/molbev/msp230] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Indexed: 11/14/2022]
Abstract
Proteins of the green fluorescent protein family represent a convenient experimental model to study evolution of novelty at the molecular level. Here, we focus on the origin of Kaede-like red fluorescent proteins characteristic of the corals of the Faviina suborder. We demonstrate, using an original approach involving resurrection and analysis of the library of possible evolutionary intermediates, that it takes on the order of 12 mutations, some of which strongly interact epistatically, to fully recapitulate the evolution of a red fluorescent phenotype from the ancestral green. Five of the identified mutations would not have been found without the help of ancestral reconstruction, because the corresponding site states are shared between extant red and green proteins due to their recent descent from a dual-function common ancestor. Seven of the 12 mutations affect residues that are not in close contact with the chromophore and thus must exert their effect indirectly through adjustments of the overall protein fold; the relevance of these mutations could not have been anticipated from a purely theoretical analysis of the protein's structure. Our results introduce a powerful experimental approach for comparative analysis of functional specificity in protein families even in cases of pronounced epistasis, provide a foundation for detailed studies of evolutionary trajectories leading to novelty and complexity, and will help the rational modification of existing fluorescent labels.
Affiliation(s)
- Mikhail V. Matz
- Section of Integrative Biology, University of Texas at Austin
88
Röttig M, Rausch C, Kohlbacher O. Combining structure and sequence information allows automated prediction of substrate specificities within enzyme families. PLoS Comput Biol 2010; 6:e1000636. [PMID: 20072606 PMCID: PMC2796266 DOI: 10.1371/journal.pcbi.1000636] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 06/11/2009] [Accepted: 12/08/2009] [Indexed: 12/25/2022]
Abstract
An important aspect of the functional annotation of enzymes is not only the type of reaction catalysed by an enzyme, but also the substrate specificity, which can vary widely within the same family. In many cases, prediction of family membership and even substrate specificity is possible from enzyme sequence alone, using a nearest neighbour classification rule. However, the combination of structural information and sequence information can improve the interpretability and accuracy of predictive models. The method presented here, Active Site Classification (ASC), automatically extracts the residues lining the active site from one representative three-dimensional structure and the corresponding residues from sequences of other members of the family. From a set of representatives with known substrate specificity, a Support Vector Machine (SVM) can then learn a model of substrate specificity. Applied to a sequence of unknown specificity, the SVM can then predict the most likely substrate. The models can also be analysed to reveal the underlying structural reasons determining substrate specificities and thus yield valuable insights into mechanisms of enzyme specificity. We illustrate the high prediction accuracy achieved on two benchmark data sets and the structural insights gained from ASC by a detailed analysis of the family of decarboxylating dehydrogenases. The ASC web service is available at http://asc.informatik.uni-tuebingen.de/. Prediction of enzymatic function of experimentally uncharacterised sequences is an important task in annotation of sequence databases. While all the information on an enzyme's specificity is necessarily contained in its sequence, prediction methods using sequence alone often do not perform all that well. Obviously, structural information – if available – will yield precious hints on the function and relative importance of specific sequence positions with respect to substrate specificity. 
We propose a novel method (Active Site Classification, ASC) for enzyme classification bringing together structural information and sequence information. Our ASC web server allows users to build predictive models in an automated way focused on relevant enzyme residues and furthermore to interpret the models to gain insights into the mechanism of enzyme substrate specificity.
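The core idea of restricting a classifier to active-site residues can be sketched with the nearest-neighbour rule that the abstract mentions as the sequence-only baseline. This is not ASC itself, which trains a Support Vector Machine; the column indices, substrate labels and Hamming distance below are illustrative assumptions.

```python
def active_site_signature(aligned_seq, site_columns):
    """Residues at the alignment columns assumed to line the active site."""
    return "".join(aligned_seq[c] for c in site_columns)


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def predict_specificity(query, training, site_columns):
    """1-nearest-neighbour prediction over active-site signatures only.

    training: list of (aligned_sequence, substrate_label) pairs with known specificity.
    """
    q = active_site_signature(query, site_columns)
    best = min(
        training,
        key=lambda item: hamming(q, active_site_signature(item[0], site_columns)),
    )
    return best[1]
```

Because only the chosen columns are compared, a query that differs from a training sequence everywhere except at the active site is still assigned that sequence's label; this focus on relevant positions is what the structure-derived residue selection buys.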
Affiliation(s)
- Marc Röttig
- Center for Bioinformatics Tübingen, Eberhard-Karls-Universität Tübingen, Tübingen, Germany.
89
Capra JA, Laskowski RA, Thornton JM, Singh M, Funkhouser TA. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput Biol 2009; 5:e1000585. [PMID: 19997483 PMCID: PMC2777313 DOI: 10.1371/journal.pcbi.1000585] [Citation(s) in RCA: 302] [Impact Index Per Article: 18.9] [Received: 05/11/2009] [Accepted: 10/30/2009] [Indexed: 11/20/2022]
Abstract
Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/). Protein molecules are ubiquitous in the cell; they perform thousands of functions crucial for life. Proteins accomplish nearly all of these functions by interacting with other molecules. These interactions are mediated by specific amino acid positions in the proteins. Knowledge of these “functional sites” is crucial for understanding the molecular mechanisms by which proteins carry out their functions; however, functional sites have not been identified in the vast majority of proteins. 
Here, we present ConCavity, a computational method that predicts small molecule binding sites in proteins by combining analysis of evolutionary sequence conservation and protein 3D structure. ConCavity provides significant improvement over previous approaches, especially on large, multi-chain proteins. In contrast to earlier methods which only predict entire binding sites, ConCavity makes specific predictions of positions in space that are likely to overlap ligand atoms and of residues that are likely to contact bound ligands. These predictions can be used to aid computational function prediction, to guide experimental protein analysis, and to focus computationally intensive techniques used in drug discovery.
Affiliation(s)
- John A. Capra
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Roman A. Laskowski
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Janet M. Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Thomas A. Funkhouser
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
90
Kapoor K, Rehan M, Kaushiki A, Pasrija R, Lynn AM, Prasad R. Rational mutational analysis of a multidrug MFS transporter CaMdr1p of Candida albicans by employing a membrane environment based computational approach. PLoS Comput Biol 2009; 5:e1000624. [PMID: 20041202 PMCID: PMC2789324 DOI: 10.1371/journal.pcbi.1000624] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 06/15/2009] [Accepted: 11/20/2009] [Indexed: 01/31/2023]
Abstract
CaMdr1p is a multidrug MFS transporter of pathogenic Candida albicans. Over-expression of the gene encoding this protein is linked to clinically encountered azole resistance. In-depth knowledge of the structure and function of CaMdr1p is necessary for an effective design of modulators or inhibitors of this efflux transporter. Towards this goal, in this study, we have employed a membrane environment based computational approach to predict the functionally critical residues of CaMdr1p. For this, information theoretic scores which are variants of Relative Entropy (Modified Relative Entropy, RE(M)) were calculated from a Multiple Sequence Alignment (MSA) by separately considering the distinct physico-chemical properties of transmembrane (TM) and inter-TM regions. The residues of CaMdr1p with high RE(M) which were predicted to be significantly important were subjected to site-directed mutational analysis. Interestingly, the heterologous host Saccharomyces cerevisiae over-expressing mutant variants of CaMdr1p, in which these high RE(M) residues were replaced by either alanine or leucine, showed increased susceptibility to the tested drugs. The hypersensitivity to drugs was supported by abrogated substrate efflux mediated by the mutant variant proteins and was not attributable to their poor expression or surface localization. Additionally, by employing a distance plot from a 3D deduced model of CaMdr1p, we could also predict the role of these functionally critical residues in maintaining apparent inter-helical interactions that provide the desired fold for the proper functioning of CaMdr1p. Residues predicted to be critical for function across the family were also found to be vital in other previously published studies, suggesting that the approach could be applied more widely to other membrane protein families.
Affiliation(s)
- Khyati Kapoor
- School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
- Mohd Rehan
- School of Information Technology, Jawaharlal Nehru University, New Delhi, India
- Ajeeta Kaushiki
- School of Information Technology, Jawaharlal Nehru University, New Delhi, India
- Ritu Pasrija
- School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
- Andrew M. Lynn
- School of Information Technology, Jawaharlal Nehru University, New Delhi, India
- Rajendra Prasad
- School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
91
Dou Y, Zheng X, Wang J. Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 2009; 262:317-22. [PMID: 19808039 DOI: 10.1016/j.jtbi.2009.09.030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/01/2009] [Revised: 09/25/2009] [Accepted: 09/25/2009] [Indexed: 11/25/2022]
Abstract
Amino acid background distribution is an important factor for entropy-based methods which extract sequence conservation information from protein multiple sequence alignments (MSAs). However, MSAs are usually not large enough to yield a reliable observed background distribution. In this paper, we propose two new estimations of background distribution. One is an integration of the observed background distribution and the position-specific residue distribution, and the other is a normalized square root of the observed background frequency. To validate these new background distributions, they are applied to the relative entropy model to find catalytic sites and ligand binding sites from protein MSAs. Experimental results show that they are superior to the observed background distribution in predicting functionally important residues.
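The relative entropy model and the second proposed background (a normalized square root of the observed background frequencies) can be written out in a few lines. A minimal sketch, assuming log base 2 and a background pooled over all residues of the alignment; the first proposed background, which mixes observed and position-specific distributions, is not reproduced because the abstract does not give its weighting. Function names are illustrative.

```python
import math
from collections import Counter


def frequencies(residues):
    """Relative frequencies of residues in an iterable (e.g. a column or a whole MSA)."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def sqrt_background(observed):
    """Normalized square root of observed background frequencies (flattens the distribution)."""
    roots = {aa: math.sqrt(f) for aa, f in observed.items()}
    norm = sum(roots.values())
    return {aa: r / norm for aa, r in roots.items()}


def relative_entropy(column, background):
    """RE = sum_a p_a * log2(p_a / q_a) for one alignment column.

    The background must contain every residue that occurs in the column.
    """
    return sum(p * math.log2(p / background[aa]) for aa, p in frequencies(column).items())
```

Usage: `observed = frequencies("".join(alignment))` pools all residues of an alignment, and `relative_entropy(col, sqrt_background(observed))` scores a column against the flattened background. The square root pulls frequent residues down and rare ones up, so a column rich in a common residue is scored less leniently than under the raw observed background.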
Affiliation(s)
- Yongchao Dou
- School of Mathematical Science, Dalian University of Technology, Dalian 116024, PR China
92
Chakrabarti S, Panchenko AR. Ensemble approach to predict specificity determinants: benchmarking and validation. BMC Bioinformatics 2009; 10:207. [PMID: 19573245 PMCID: PMC2716344 DOI: 10.1186/1471-2105-10-207] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 02/02/2009] [Accepted: 07/02/2009] [Indexed: 11/29/2022]
Abstract
Background It is extremely important and challenging to identify the sites that are responsible for functional specification or diversification in protein families. In this study, a rigorous comparative benchmarking protocol was employed to provide a reliable evaluation of methods which predict the specificity determining sites. Subsequently, the three best-performing methods were applied to identify new potential specificity determining sites through an ensemble approach and the common agreement of their prediction results. Results It was shown that the analysis of structural characteristics of predicted specificity determining sites might provide the means to validate their prediction accuracy. For example, we found that for smaller distances it holds true that the more reliable the prediction method is, the closer the predicted specificity determining sites are to each other and to the ligand. Conclusion We observed certain similarities of structural features between predicted and actual subsites which might point to their functional relevance. We speculate that the majority of the identified potential specificity determining sites might be indirectly involved in specific interactions and could be ideal targets for mutagenesis experiments.
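The "common agreement" step can be read as a vote across methods. A minimal sketch, assuming each method returns a set of alignment positions and a site is kept when at least a chosen number of methods agree; the method names and positions are placeholders, and the agreement threshold used in the study is not stated in the abstract.

```python
from collections import Counter


def consensus_sites(predictions, min_votes):
    """Return sites predicted by at least `min_votes` methods, sorted by position.

    predictions: mapping of (hypothetical) method name -> iterable of predicted positions.
    Duplicates within one method's list count only once.
    """
    votes = Counter(site for sites in predictions.values() for site in set(sites))
    return sorted(site for site, n in votes.items() if n >= min_votes)
```

For example, with three methods predicting {10, 25, 40}, {25, 40, 57} and {40, 88}, a two-vote threshold keeps positions 25 and 40, while requiring unanimity keeps only 40; raising the threshold trades coverage for confidence.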
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.
93
Kalinina OV, Gelfand MS, Russell RB. Combining specificity determining and conserved residues improves functional site prediction. BMC Bioinformatics 2009; 10:174. [PMID: 19508719 PMCID: PMC2709924 DOI: 10.1186/1471-2105-10-174] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 01/23/2009] [Accepted: 06/09/2009] [Indexed: 11/16/2022]
Abstract
Background: Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often, these approaches do not consider the many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often bind only a common ligand sub-structure or perform general catalytic activities. Results: Here we present a novel method for functional site prediction based on the identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs) as those occupied by residues that are conserved within sub-groups of proteins in a family having a common specificity but that differ between groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families, residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins with known three-dimensional structures but lacking clear functional annotations, and discuss several illustrative examples. Conclusion: The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
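The SDP definition above can be made concrete with a toy rule in Python. A column is labeled:

- "SDP" when each sub-group has a dominant residue at or above a frequency threshold and those dominant residues are not all the same;
- "conserved" when every sub-group shares the same dominant residue.

The threshold `min_freq` and the function names are hypothetical. This is not the scoring used by SDPsite; it only illustrates the distinction between conserved positions and SDPs.

```python
from collections import Counter


def group_consensus(column_residues, min_freq):
    """Dominant residue of one sub-group's column, or None if it is a gap or too rare."""
    res, n = Counter(column_residues).most_common(1)[0]
    return res if res != "-" and n / len(column_residues) >= min_freq else None


def classify_position(groups, col, min_freq=0.8):
    """Label column `col` as 'conserved', 'SDP', or None.

    `groups` is a list of sub-groups, each a list of aligned sequences
    sharing a common specificity (assumed to be known in advance).
    """
    consensi = [group_consensus([seq[col] for seq in group], min_freq)
                for group in groups]
    if any(c is None for c in consensi):
        return None            # not conserved within at least one sub-group
    if len(set(consensi)) == 1:
        return "conserved"     # same residue across the whole family
    return "SDP"               # conserved within groups, different between them


groups = [["AKL", "AKL", "AKM"], ["ARL", "ARL", "ARV"]]
labels = [classify_position(groups, c) for c in range(3)]
```

In this toy alignment, column 0 is family-wide conserved. Column 1 (K in one group, R in the other) is an SDP. Column 2 falls below the 0.8 threshold in both groups.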
94
Dessailly BH, Redfern OC, Cuff A, Orengo CA. Exploiting structural classifications for function prediction: towards a domain grammar for protein function. Curr Opin Struct Biol 2009; 19:349-56. [PMID: 19398323 PMCID: PMC2920418 DOI: 10.1016/j.sbi.2009.03.009] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 01/16/2009] [Revised: 02/17/2009] [Accepted: 03/16/2009] [Indexed: 12/28/2022]
Abstract
The ability to assign function to proteins has become a major bottleneck for comprehensively understanding cellular mechanisms at the molecular level. Here we discuss the extent to which structural domain classifications can help in deciphering the complex relationship between the functions of proteins and their sequences and structures. Structural classifications are particularly helpful in understanding the mosaic manner in which new proteins and functions emerge through evolution. This is partly because they provide reliable and concrete domain definitions and enable the detection of very remote structural similarities and homologies. It is also because structural data can illuminate more clearly the mechanisms by which a broader functional repertoire can emerge during evolution.
Affiliation(s)
- Benoît H. Dessailly
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
- Oliver C. Redfern
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
- Alison Cuff
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
- Christine A. Orengo
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
95
Abstract
Covariation between sites can arise from a common evolutionary history. At the same time, protein structure and function play a significant role in the evolvability of sites, in ways not directly connected with common ancestry. The nature of the forces that cause residues to coevolve is still not thoroughly understood; in particular, it is unclear how coevolutionary processes relate to functional diversification within protein families. We analyzed both functional and structural factors that might cause covariation of specificity determinants and showed that specificity determinants participate in coevolutionary relationships, with each other and with other sites, more often than functional sites and sites that are not under strong functional constraints do. We also found that protein sites with a higher number of coevolutionary connections to other sites tend to evolve more slowly. Our results indicate that in some cases coevolutionary connections exist between specificity sites that are located far apart in space but are under similar functional constraints. Such correlated changes and compensations can be realized through stepwise coevolutionary processes, which in turn can shed light on the mechanisms of functional diversification.
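One common way to quantify covariation between two alignment columns is mutual information. The abstract does not say which measure the authors used, so the Python below is an assumed illustration. It also counts, for each column, how many other columns it is connected to above a hypothetical `threshold`, mirroring the notion of the number of coevolutionary connections. Raw mutual information is not corrected for shared ancestry, which, as noted above, can itself produce covariation.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """Mutual information (nats) between two aligned columns of equal length."""
    n = len(col_i)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def connection_counts(msa, threshold):
    """Per-column number of partner columns with MI above `threshold` (hypothetical cutoff)."""
    ncols = len(msa[0])
    columns = [[seq[k] for seq in msa] for k in range(ncols)]
    counts = [0] * ncols
    for i, j in combinations(range(ncols), 2):
        if mutual_information(columns[i], columns[j]) > threshold:
            counts[i] += 1
            counts[j] += 1
    return counts


msa = ["AKL", "AKV", "GRL", "GRV"]
links = connection_counts(msa, threshold=0.5)
```

In this toy alignment, columns 0 and 1 change together (A with K, G with R), while column 2 varies independently of both.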
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
96
Sankararaman S, Sjölander K. INTREPID--INformation-theoretic TREe traversal for Protein functional site IDentification. Bioinformatics 2008; 24:2445-52. [PMID: 18776193 PMCID: PMC2572704 DOI: 10.1093/bioinformatics/btn474] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Indexed: 11/13/2022]
Abstract
Motivation: Identification of functionally important residues in proteins plays a significant role in biological discovery. Here, we present INTREPID, an information-theoretic approach to functional site identification that exploits the information in large, diverse multiple sequence alignments (MSAs). INTREPID uses a traversal of the phylogeny in combination with a positional conservation score, based on Jensen-Shannon divergence, to rank positions in an MSA. Although knowledge of protein 3D structure can significantly improve the accuracy of functional site identification, structural information is not available for the majority of proteins, so INTREPID relies solely on sequence information. We evaluated INTREPID on two tasks: predicting catalytic residues and predicting specificity determinants. Results: In catalytic residue prediction, INTREPID provides significant improvements over Evolutionary Trace, ConSurf, and a baseline global conservation method on a set of 100 manually curated enzymes from the Catalytic Site Atlas. In particular, INTREPID is better able to predict catalytic positions that are not globally conserved and hence attains improved sensitivity at high values of specificity. We also investigated the performance of INTREPID as a function of the evolutionary divergence of the protein family. We found that INTREPID is better able to exploit the diversity in such families and that accuracy improves when homologs with very low sequence identity are included in an alignment. In specificity determinant prediction, when subtype information is known, INTREPID-SPEC, a variant of INTREPID, attains accuracies that are competitive with other approaches for this task. Availability: INTREPID is available for 16919 families in the PhyloFacts resource (http://phylogenomics.berkeley.edu/phylofacts).
Contact: sriram_s@cs.berkeley.edu Supplementary information: Relevant online supplementary material is available at http://phylogenomics.berkeley.edu/INTREPID.
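The Jensen-Shannon part of INTREPID's positional score can be sketched as below. Each column's residue distribution is compared with a background distribution, and columns are ranked by divergence. This is a simplified illustration under assumptions:

- It omits the phylogenetic tree traversal entirely.
- It does not reproduce INTREPID's actual scoring details.
- The function names, the toy background, and the choice of base-2 logarithms (which bound the divergence between 0 and 1) are illustrative.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """D(p || q) in bits; assumes q[a] > 0 wherever p[a] > 0."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two distributions given as dicts (0..1 in bits)."""
    keys = set(p) | set(q)
    p = {a: p.get(a, 0.0) for a in keys}
    q = {a: q.get(a, 0.0) for a in keys}
    m = {a: 0.5 * (p[a] + q[a]) for a in keys}  # the mixture is positive wherever p or q is
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def column_distribution(column):
    """Residue frequencies in one MSA column, skipping gaps."""
    residues = [r for r in column if r != "-"]
    counts = Counter(residues)
    return {a: n / len(residues) for a, n in counts.items()}


def rank_positions(msa, background):
    """Column indices sorted from highest to lowest divergence from the background."""
    scores = [jensen_shannon(column_distribution([seq[k] for seq in msa]), background)
              for k in range(len(msa[0]))]
    return sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)


# Toy background over four residues (assumption, not a real amino acid background)
background = {"A": 0.25, "K": 0.25, "L": 0.25, "V": 0.25}
msa = ["AK", "AL", "AV", "AA"]
ranking = rank_positions(msa, background)
```

Unlike relative entropy, the Jensen-Shannon divergence stays finite even when the background assigns zero probability to a residue seen in the column, because both distributions are compared with their mixture.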
Affiliation(s)
- Sriram Sankararaman
- Department of Electrical Engineering & Computer Science and Department of Bioengineering, University of California, Berkeley, USA.