1
Abstract
Allosteric transition, defined as conformational changes induced by ligand binding, is one of the fundamental properties of proteins. Allostery has been observed and characterized in many proteins, and has been recently utilized to control protein function via regulation of protein activity. Here, we review the physical and evolutionary origin of protein allostery, as well as its importance to protein regulation, drug discovery, and biological processes in living systems. We describe recently developed approaches to identify allosteric pathways, connected sets of pairwise interactions that are responsible for propagation of conformational change from the ligand-binding site to a distal functional site. We then present experimental and computational protein engineering approaches for control of protein function by modulation of allosteric sites. As an example of application of these approaches, we describe a synergistic computational and experimental approach to rescue the cystic-fibrosis-associated protein cystic fibrosis transmembrane conductance regulator, which upon deletion of a single residue misfolds and causes disease. This example demonstrates the power of allosteric manipulation in proteins to both elucidate mechanisms of molecular function and to develop therapeutic strategies that rescue those functions. Allosteric control of proteins provides a tool to shine a light on the complex cascades of cellular processes and facilitate unprecedented interrogation of biological systems.
Affiliation(s)
- Nikolay V Dokholyan
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599, United States
2
Sandhya S, Mudgal R, Kumar G, Sowdhamini R, Srinivasan N. Protein sequence design and its applications. Curr Opin Struct Biol 2016; 37:71-80. [PMID: 26773478 DOI: 10.1016/j.sbi.2015.12.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/28/2015] [Revised: 12/07/2015] [Accepted: 12/15/2015] [Indexed: 01/14/2023]
Abstract
Design of proteins has far-reaching potentials in diverse areas that span repurposing of the protein scaffold for reactions and substrates that they were not naturally meant for, to catching a glimpse of the ephemeral proteins that nature might have sampled during evolution. These non-natural proteins, either in synthesized or virtual form have opened the scope for the design of entities that not only rival their natural counterparts but also offer a chance to visualize the protein space continuum that might help to relate proteins and understand their associations. Here, we review the recent advances in protein engineering and design, in multiple areas, with a view to drawing attention to their future potential.
Affiliation(s)
- Sankaran Sandhya
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
- Richa Mudgal
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India; IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560 012, India
- Gayatri Kumar
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
- Ramanathan Sowdhamini
  - National Centre for Biological Sciences-TIFR, UAS-GKVK Campus, Bangalore 560065, India
3
Mudgal R, Sandhya S, Kumar G, Sowdhamini R, Chandra NR, Srinivasan N. NrichD database: sequence databases enriched with computationally designed protein-like sequences aid in remote homology detection. Nucleic Acids Res 2014; 43:D300-5. [PMID: 25262355 PMCID: PMC4384005 DOI: 10.1093/nar/gku888] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022] Open
Abstract
NrichD (http://proline.biochem.iisc.ernet.in/NRICHD/) is a database of computationally designed protein-like sequences, augmented into natural sequence databases that can perform hops in protein sequence space to assist in the detection of remote relationships. Establishing protein relationships in the absence of structural evidence or natural ‘intermediately related sequences’ is a challenging task. Recently, we have demonstrated that the computational design of artificial intermediary sequences/linkers is an effective approach to fill naturally occurring voids in protein sequence space. Through a large-scale assessment we have demonstrated that such sequences can be plugged into commonly employed search databases to improve the performance of routinely used sequence search methods in detecting remote relationships. Since it is anticipated that such data sets will be employed to establish protein relationships, two databases that have already captured these relationships at the structural and functional domain level, namely, the SCOP database and the Pfam database, have been ‘enriched’ with these artificial intermediary sequences. NrichD database currently contains 3 611 010 artificial sequences that have been generated between 27 882 pairs of families from 374 SCOP folds. The data sets are freely available for download. Additional features include the design of artificial sequences between any two protein families of interest to the user.
Affiliation(s)
- Richa Mudgal
  - IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560 012, Karnataka, India
- Sankaran Sandhya
  - Department of Biochemistry, Indian Institute of Science, Bangalore 560 012, Karnataka, India
- Gayatri Kumar
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, Karnataka, India
- Ramanathan Sowdhamini
  - National Centre for Biological Sciences, Gandhi Krishi Vignan Kendra Campus, Bellary Road, Bangalore 560 065, Karnataka, India
- Nagasuma R Chandra
  - Department of Biochemistry, Indian Institute of Science, Bangalore 560 012, Karnataka, India
4
Mudgal R, Sowdhamini R, Chandra N, Srinivasan N, Sandhya S. Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability. J Mol Biol 2013; 426:962-79. [PMID: 24316367 DOI: 10.1016/j.jmb.2013.11.026] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 07/17/2013] [Revised: 11/23/2013] [Accepted: 11/26/2013] [Indexed: 12/11/2022]
Abstract
Protein functional annotation relies on the identification of accurate relationships, sequence divergence being a key factor. This is especially evident when distant protein relationships are demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another one additionally containing designed sequences. Our results showed that augmented database searches showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged-into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our richly statistically supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold.
Affiliation(s)
- Richa Mudgal
  - IISc Mathematics Initiative, Indian Institute of Science, Bangalore 560 012, India
- Ramanathan Sowdhamini
  - National Centre for Biological Sciences, University of Agricultural Sciences Gandhi Krishi Vignan Kendra Campus, Bangalore 560 065, India
- Nagasuma Chandra
  - Department of Biochemistry, Indian Institute of Science, Bangalore 560 012, India
- Sankaran Sandhya
  - Department of Biochemistry, Indian Institute of Science, Bangalore 560 012, India
5
Sandhya S, Mudgal R, Jayadev C, Abhinandan KR, Sowdhamini R, Srinivasan N. Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins. Mol Biosyst 2012; 8:2076-84. [PMID: 22692068 DOI: 10.1039/c2mb25113b] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for the generation of 'protein-like' sequences that serves to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point in this algorithm. The observed amino acid propensity and the selection of a random number dictate the selection of a residue for each position in the sequence. In a systematic manner, and by applying a 'roulette-wheel' selection approach at each position, we generate parent family-like sequences and thus facilitate an enlargement of sequence space around the family. When generated for a large number of families, we demonstrate that they expand the utility of natural intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that the success of including designed sequences in a database positively sensitized methods such as PSI-BLAST and Cascade PSI-BLAST and is a promising opportunity for enormously improved remote homology recognition using sequence information alone.
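The roulette-wheel step described in this abstract can be illustrated with a minimal Python sketch. It assumes each alignment position is summarized as a dictionary of residue propensities. The names `roulette_wheel_pick`, `design_sequence`, and `toy_profile`, and the toy frequencies, are hypothetical and not taken from the paper, whose implementation starts from a position-specific scoring matrix and may differ in detail.

```python
import random

def roulette_wheel_pick(propensities, rng):
    """Pick one residue with probability proportional to its propensity."""
    total = sum(propensities.values())
    threshold = rng.random() * total  # the random number that "spins the wheel"
    cumulative = 0.0
    for residue, weight in propensities.items():
        cumulative += weight
        if threshold < cumulative:
            return residue
    return residue  # guard against floating-point round-off on the last slot

def design_sequence(profile, rng):
    """Sample one 'protein-like' sequence, one residue per profile column."""
    return "".join(roulette_wheel_pick(column, rng) for column in profile)

# Hypothetical three-column profile (toy frequencies, for illustration only).
toy_profile = [
    {"A": 0.7, "G": 0.2, "S": 0.1},
    {"L": 0.5, "I": 0.3, "V": 0.2},
    {"D": 1.0},
]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
designed = [design_sequence(toy_profile, rng) for _ in range(5)]
```

Each sampled sequence follows the per-column propensities, so an invariant column (here the third) is always reproduced while variable columns are diversified.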
Affiliation(s)
- S Sandhya
  - National Centre for Biological Sciences, UAS-GKVK Campus, Bangalore 560065, India
6
Bondugula R, Wallqvist A, Lee MS. Can computationally designed protein sequences improve secondary structure prediction? Protein Eng Des Sel 2011; 24:455-61. [DOI: 10.1093/protein/gzr003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
7
Chubb D, Jefferys BR, Sternberg MJE, Kelley LA. Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. Bioinformatics 2010; 26:2664-71. [PMID: 20843957 DOI: 10.1093/bioinformatics/btq527] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe. RESULTS: We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications.
Affiliation(s)
- Daniel Chubb
  - Department of Life Science, Imperial College London, London, UK.
8
Martínez-Castilla LP, Rodríguez-Sotres R. A score of the ability of a three-dimensional protein model to retrieve its own sequence as a quantitative measure of its quality and appropriateness. PLoS One 2010; 5:e12483. [PMID: 20830209 PMCID: PMC2935356 DOI: 10.1371/journal.pone.0012483] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/26/2010] [Accepted: 08/03/2010] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. PRINCIPAL FINDINGS: The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449-460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence-sets showed no conservation of amino acids at active sites, or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the models retrieved from the database sequences belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details on the folding patterns, that seem to be tightly linked to the fitness of a structural framework for a specific biological function. CONCLUSION: Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. However, Rd.HMM is very sensitive to the conformational features of the models' backbone.
Affiliation(s)
- León P. Martínez-Castilla
  - Departamento de Bioquímica–Facultad de Química, Universidad Nacional Autónoma de México, Ciudad de México, Distrito Federal, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, Distrito Federal, Mexico
- Rogelio Rodríguez-Sotres
  - Departamento de Bioquímica–Facultad de Química, Universidad Nacional Autónoma de México, Ciudad de México, Distrito Federal, Mexico
9
Dai L, Yang Y, Kim HR, Zhou Y. Improving computational protein design by using structure-derived sequence profile. Proteins 2010; 78:2338-48. [PMID: 20544969 PMCID: PMC3058783 DOI: 10.1002/prot.22746] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]
Abstract
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful, computational designs in the last decade resulted from improved understanding of hydrophobic and polar interactions between side chains of amino acid residues in stabilizing protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduced a term to reduce low complexity regions of designed sequences. These two terms together with optimized reference states for amino-acid residues were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, makes a 12% increase (from 34 to 46%) in fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences. Meanwhile, it reduces 8% (from 22% to 14%) to the number of designed sequences that are not homologous to any known protein sequences according to psi-blast. More importantly, the sequences designed by RosettaDesign-SR have 2-3% more polar residues at the surface and core regions of proteins and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures due to more specific polar interactions.
Affiliation(s)
- Liang Dai
  - School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202
- Yuedong Yang
  - School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202
- Hyung Rae Kim
  - School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202
- Yaoqi Zhou
  - School of Informatics, Indiana University Purdue University, Indianapolis, Indiana
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202
10
Fromer M, Yanover C, Linial M. Design of multispecific protein sequences using probabilistic graphical modeling. Proteins 2010; 78:530-47. [PMID: 19842166 DOI: 10.1002/prot.22575] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
In nature, proteins partake in numerous protein-protein interactions that mediate their functions. Moreover, proteins have been shown to be physically stable in multiple structures, induced by cellular conditions, small ligands, or covalent modifications. Understanding how protein sequences achieve this structural promiscuity at the atomic level is a fundamental step in the drug design pipeline and a critical question in protein physics. One way to investigate this subject is to computationally predict protein sequences that are compatible with multiple states, i.e., multiple target structures or binding to distinct partners. The goal of engineering such proteins has been termed multispecific protein design. We develop a novel computational framework to efficiently and accurately perform multispecific protein design. This framework utilizes recent advances in probabilistic graphical modeling to predict sequences with low energies in multiple target states. Furthermore, it is also geared to specifically yield positional amino acid probability profiles compatible with these target states. Such profiles can be used as input to randomly bias high-throughput experimental sequence screening techniques, such as phage display, thus providing an alternative avenue for elucidating the multispecificity of natural proteins and the synthesis of novel proteins with specific functionalities. We prove the utility of such multispecific design techniques in better recovering amino acid sequence diversities similar to those resulting from millions of years of evolution. We then compare the approaches of prediction of low energy ensembles and of amino acid profiles and demonstrate their complementarity in providing more robust predictions for protein design.
Affiliation(s)
- Menachem Fromer
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.
11
Backbone flexibility in computational protein design. Curr Opin Biotechnol 2009; 20:420-8. [DOI: 10.1016/j.copbio.2009.07.006] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Received: 05/14/2009] [Revised: 07/17/2009] [Accepted: 07/25/2009] [Indexed: 11/22/2022]
12
Wang K, Horst JA, Cheng G, Nickle DC, Samudrala R. Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information. PLoS Comput Biol 2008; 4:e1000181. [PMID: 18818722 PMCID: PMC2526173 DOI: 10.1371/journal.pcbi.1000181] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 04/15/2008] [Accepted: 08/07/2008] [Indexed: 11/19/2022] Open
Abstract
Protein function is mediated by different amino acid residues, both their positions and types, in a protein sequence. Some amino acids are responsible for the stability or overall shape of the protein, playing an indirect role in protein function. Others play a functionally important role as part of active or binding sites of the protein. For a given protein sequence, the residues and their degree of functional importance can be thought of as a signature representing the function of the protein. We have developed a combination of knowledge- and biophysics-based function prediction approaches to elucidate the relationships between the structural and the functional roles of individual residues and positions. Such a meta-functional signature (MFS), which is a collection of continuous values representing the functional significance of each residue in a protein, may be used to study proteins of known function in greater detail and to aid in experimental characterization of proteins of unknown function. We demonstrate the superior performance of MFS in predicting protein functional sites and also present four real-world examples to apply MFS in a wide range of settings to elucidate protein sequence-structure-function relationships. Our results indicate that the MFS approach, which can combine multiple sources of information and also give biological interpretation to each component, greatly facilitates the understanding and characterization of protein function.
MESH Headings
- Amino Acid Sequence
- Amino Acids/chemistry
- Bacterial Proteins/chemistry
- Bacterial Proteins/genetics
- Bacterial Proteins/physiology
- Binding Sites
- Cellulose 1,4-beta-Cellobiosidase/chemistry
- Cellulose 1,4-beta-Cellobiosidase/genetics
- Cellulose 1,4-beta-Cellobiosidase/physiology
- Computational Biology/methods
- Computer Simulation
- Conserved Sequence
- Databases, Protein/statistics & numerical data
- Evolution, Molecular
- Internet
- Models, Chemical
- Models, Genetic
- Models, Molecular
- Molecular Structure
- Mutagenesis, Site-Directed
- Ornithine Decarboxylase/chemistry
- Ornithine Decarboxylase/genetics
- Ornithine Decarboxylase/physiology
- Protein Interaction Domains and Motifs
- Protein Structure, Tertiary
- Proteins/chemistry
- Proteins/genetics
- Proteins/physiology
- Regression Analysis
- Sequence Alignment/statistics & numerical data
- Thermodynamics
Affiliation(s)
- Kai Wang
  - Computational Genomics Group, Department of Microbiology, University of Washington, Seattle, Washington, United States of America
  - Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Jeremy A. Horst
  - Computational Genomics Group, Department of Microbiology, University of Washington, Seattle, Washington, United States of America
  - Department of Oral Biology, University of Washington, Seattle, Washington, United States of America
- Gong Cheng
  - Computational Genomics Group, Department of Microbiology, University of Washington, Seattle, Washington, United States of America
  - Department of Biochemistry, University of Washington, Seattle, Washington, United States of America
- David C. Nickle
  - Computational Genomics Group, Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- Ram Samudrala
  - Computational Genomics Group, Department of Microbiology, University of Washington, Seattle, Washington, United States of America
  - Department of Oral Biology, University of Washington, Seattle, Washington, United States of America
13
Larrea AA, Pedroso IM, Malhotra A, Myers RS. Identification of two conserved aspartic acid residues required for DNA digestion by a novel thermophilic Exonuclease VII in Thermotoga maritima. Nucleic Acids Res 2008; 36:5992-6003. [PMID: 18812402 PMCID: PMC2566859 DOI: 10.1093/nar/gkn588] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 01/08/2023] Open
Abstract
Exonuclease VII was first identified in 1974 as a DNA exonuclease that did not require any divalent cations for activity. Indeed, Escherichia coli ExoVII was identified in partially purified extracts in the presence of EDTA. ExoVII is comprised of two subunits (XseA and XseB) that are highly conserved and present in most sequenced prokaryotic genomes, but are not seen in eukaryotes. To better understand this exonuclease family, we have characterized an ExoVII homolog from Thermotoga maritima. Thermotoga maritima XseA/B homologs TM1768 and TM1769 were co-expressed and purified, and show robust nuclease activity at 80°C. This activity is magnesium dependent and is inhibited by phosphate ions, which distinguish it from E. coli ExoVII. Nevertheless, both E. coli and T. maritima ExoVII share a similar putative active site motif with two conserved aspartate residues in the large (XseA/TM1768) subunit. We show that these residues, Asp235 and Asp240, are essential for the nuclease activity of T. maritima ExoVII. We hypothesize that the ExoVII family of nucleases can be sub-divided into two sub-families based on EDTA resistance and that T. maritima ExoVII is the first member of the branch that is characterized by EDTA sensitivity and inhibition by phosphate.
Affiliation(s)
- Andres A Larrea
  - Department of Biochemistry and Molecular Biology, University of Miami Miller School of Medicine, Miami, FL 33136, USA
14
Dukka BKC, Livesay DR. Improving position-specific predictions of protein functional sites using phylogenetic motifs. Bioinformatics 2008; 24:2308-16. [PMID: 18723520 DOI: 10.1093/bioinformatics/btn454] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Accurate computational prediction of protein functional sites is critical to maximizing the utility of recent high-throughput sequencing efforts. Among the available approaches, position-specific conservation scores remain among the most popular due to their accuracy and ease of computation. Unfortunately, high false positive rates remain a limiting factor. Using phylogenetic motifs (PMs), we have developed two combined (conservation + PMs) prediction schemes that significantly improve prediction accuracy. RESULTS: Our first approach, called position-specific MINER (psMINER), rank orders alignment columns by conservation. Subsequently, positions that are also not identified as PMs are excluded from the prediction set. This approach improves prediction accuracy, in a statistically significant way, compared to the underlying conservation scores. Increased accuracy is a general result, meaning improvement is observed over several different conservation scores that span a continuum of complexity. In addition, a hybrid MINER (hMINER) that quantitatively considers both scoring regimes provides further improvement. More importantly, it provides critical insight into the relative importance of phylogeny versus alignment conservation. Both methods outperform other common prediction algorithms that also utilize phylogenetic concepts. Finally, we demonstrate that the presented results are critically sensitive to functional site definition, thus highlighting the need for more complete benchmarks within the prediction community.
Affiliation(s)
- Bahadur K C Dukka
  - Department of Computer Science and Bioinformatics Research Center, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
16
Wang K, Samudrala R. Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics 2006; 7:385. [PMID: 16916457 PMCID: PMC1562451 DOI: 10.1186/1471-2105-7-385] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Received: 06/14/2006] [Accepted: 08/17/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Several entropy-based methods have been developed for scoring sequence conservation in protein multiple sequence alignments. High scoring amino acid positions may correlate with structurally or functionally important residues. However, amino acid background frequencies are usually not taken into account in these entropy-based scoring schemes. RESULTS: We demonstrate that using a relative entropy measure that incorporates amino acid background frequency results in improved performance in identifying functional sites from protein multiple sequence alignments. CONCLUSION: Our results suggest that the application of appropriate background frequency information may lead to more biologically relevant results in many areas of bioinformatics.
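As an illustration of the idea in this abstract, the following Python sketch scores an alignment column by relative entropy against a background distribution. It uses the standard Kullback-Leibler form, sum over residues of p * log(p / q); the abstract does not give the paper's exact formula, so this form, the pseudocount, the three-letter toy alphabet, and the names `relative_entropy` and `background` are all assumptions made for the example.

```python
import math
from collections import Counter

def relative_entropy(column, background, pseudocount=1e-6):
    """Kullback-Leibler divergence of a column's residue distribution
    from the background distribution (higher means more conserved)."""
    residues = [aa for aa in column if aa in background]  # skips gaps/unknowns
    counts = Counter(residues)
    n = len(residues)
    score = 0.0
    for aa, q in background.items():
        # The pseudocount keeps p > 0 so the logarithm is always defined.
        p = (counts[aa] + pseudocount) / (n + pseudocount * len(background))
        score += p * math.log(p / q)
    return score

# Hypothetical background frequencies over a toy three-letter alphabet.
background = {"A": 0.5, "L": 0.4, "W": 0.1}

conserved_rare = relative_entropy("WWWWWWWWWW", background)    # invariant, rare residue
conserved_common = relative_entropy("AAAAAAAAAA", background)  # invariant, common residue
like_background = relative_entropy("AAAAALLLLW", background)   # mirrors the background
```

Both invariant columns have zero Shannon entropy, yet the column fixed on the rare residue scores higher, which is the kind of distinction a background-aware measure can make and a plain entropy score cannot.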
Affiliation(s)
- Kai Wang
- Computational Genomics Group, Department of Microbiology, University of Washington, USA
- Ram Samudrala
- Computational Genomics Group, Department of Microbiology, University of Washington, USA
|
17
|
Li J, Wang W. Detailed assessment of homology detection using different substitution matrices. CHINESE SCIENCE BULLETIN-CHINESE 2006. [DOI: 10.1007/s11434-006-1538-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
18
|
Liu Y, Kuhlman B. RosettaDesign server for protein design. Nucleic Acids Res 2006; 34:W235-8. [PMID: 16845000 PMCID: PMC1538902 DOI: 10.1093/nar/gkl163] [Citation(s) in RCA: 166] [Impact Index Per Article: 8.7] [Received: 01/30/2006] [Revised: 02/22/2006] [Accepted: 03/20/2006] [Indexed: 11/14/2022]
Abstract
The RosettaDesign server identifies low energy amino acid sequences for target protein structures (http://rosettadesign.med.unc.edu). The client provides the backbone coordinates of the target structure and specifies which residues to design. The server returns to the client the sequences, coordinates and energies of the designed proteins. The simulations are performed using the design module of the Rosetta program (RosettaDesign). RosettaDesign uses Monte Carlo optimization with simulated annealing to search for amino acids that pack well on the target structure and satisfy hydrogen bonding potential. RosettaDesign has been experimentally validated and has been used previously to stabilize naturally occurring proteins and design a novel protein structure.
Affiliation(s)
- Yi Liu
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599, USA
- Brian Kuhlman
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599, USA
|
19
|
Sandhya S, Chakrabarti S, Abhinandan KR, Sowdhamini R, Srinivasan N. Assessment of a rigorous transitive profile based search method to detect remotely similar proteins. J Biomol Struct Dyn 2005; 23:283-98. [PMID: 16218755 DOI: 10.1080/07391102.2005.10507066] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 10/28/2022]
Abstract
Profile-based sequence search procedures are commonly employed to detect remote relationships between proteins. We provide an assessment of a Cascade PSI-BLAST protocol that rigorously employs intermediate sequences in detecting remote relationships between proteins. In this approach we detect using PSI-BLAST, which involves multiple rounds of iteration, an initial set of homologues for a protein in a 'first generation' search by querying a database. We propagate a 'second generation' search in the database, involving multiple runs of PSI-BLAST using each of the homologues identified in the previous generation as queries to recognize homologues not detected earlier. This non-directed search process can be viewed as an iteration of iterations that is continued to detect further homologues until no new hits are detectable. We present an assessment of the coverage of this 'cascaded' intermediate sequence search on diverse folds and find that searches for up to three generations detect most known homologues of a query. Our assessments show that this approach appears to perform better than the traditional use of PSI-BLAST by detecting 15% more relationships within a family and 35% more relationships within a superfamily. We show that such searches can be performed on generalized sequence databases and non-trivial relationships between proteins can be detected effectively. Such a propagation of searches maximizes the chances of detecting distant homologies by effectively scanning protein "fold space".
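The generation-by-generation propagation described in this abstract can be sketched as a breadth-first traversal. This is a minimal sketch, not the authors' protocol: `search` is a hypothetical callable standing in for a single PSI-BLAST run, and the toy hit table uses made-up identifiers.

```python
def cascade_search(query, search, max_generations=3):
    """Propagate searches through intermediate sequences.

    `search` is a hypothetical callable that maps a sequence identifier to
    an iterable of hit identifiers (a stand-in for one PSI-BLAST run).
    Each generation re-queries with the hits newly found in the previous
    generation, and the cascade stops early when no new hits appear.
    """
    found = {query}
    frontier = [query]
    for _ in range(max_generations):
        next_frontier = []
        for seq_id in frontier:
            for hit in search(seq_id):
                if hit not in found:
                    found.add(hit)
                    next_frontier.append(hit)
        if not next_frontier:  # no new hits: the cascade has converged
            break
        frontier = next_frontier
    found.discard(query)
    return found


# Toy hit table standing in for a sequence database (made-up identifiers).
TOY_HITS = {
    "q": ["a", "b"],
    "a": ["q", "c"],
    "b": ["a"],
    "c": ["d"],
    "d": [],
}


def toy_search(seq_id):
    """Look up hits for one query in the toy table."""
    return TOY_HITS.get(seq_id, [])
```

In the toy table, "c" and "d" are reachable only through intermediates, so a single-generation search from "q" misses them while later generations recover them.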
Affiliation(s)
- S Sandhya
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
|
20
|
Cheng G, Qian B, Samudrala R, Baker D. Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design. Nucleic Acids Res 2005; 33:5861-7. [PMID: 16224101 PMCID: PMC1258172 DOI: 10.1093/nar/gki894] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.0] [Indexed: 12/05/2022]
Abstract
The prediction of functional sites in newly solved protein structures is a challenge for computational structural biology. Most methods for approaching this problem use evolutionary conservation as the primary indicator of the location of functional sites. However, sequence conservation reflects not only evolutionary selection at functional sites to maintain protein function, but also selection throughout the protein to maintain the stability of the folded state. To disentangle sequence conservation due to protein functional constraints from sequence conservation due to protein structural constraints, we use all atom computational protein design methodology to predict sequence profiles expected under solely structural constraints, and to compute the free energy difference between the naturally occurring amino acid and the lowest free energy amino acid at each position. We show that functional sites are more likely than non-functional sites to have computed sequence profiles which differ significantly from the naturally occurring sequence profiles and to have residues with sub-optimal free energies, and that incorporation of these two measures improves sequence based prediction of protein functional sites. The combined sequence and structure based functional site prediction method has been implemented in a publicly available web server.
Affiliation(s)
- Gong Cheng
- Department of Biochemistry, University of Washington, Seattle, Washington, USA
- Biomolecular Structure and Design Program, University of Washington, Seattle, Washington, USA
- Bin Qian
- Department of Biochemistry, University of Washington, Seattle, Washington, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA
- Ram Samudrala
- Department of Microbiology, University of Washington, Seattle, Washington, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, Washington, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA
- To whom correspondence should be addressed. Tel: +1 206 543 1295; Fax: +1 206 685 1792;
|
21
|
Greaves R, Warwicker J. Active site identification through geometry-based and sequence profile-based calculations: burial of catalytic clefts. J Mol Biol 2005; 349:547-57. [PMID: 15882869 DOI: 10.1016/j.jmb.2005.04.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 01/14/2005] [Revised: 03/30/2005] [Accepted: 04/08/2005] [Indexed: 12/30/2022]
Abstract
Electrostatics calculations with proteins that are uniformly charged over volume can aid enzyme/non-enzyme discrimination. For known enzymes, such methods locate active sites to within 5% on the enzyme surface, in 77% of a test set. We now report that removing the dielectric boundary improves active site location to 80%, with optimal discrimination between enzymes and non-enzymes of around 80% specificity and 80% sensitivity. This calculation quantifies burial of solvent-accessible regions. Many of the true enzymes incorrectly assigned as non-enzymes have active sites at subunit boundaries. These are missed in monomer-based calculations. Catalytic and non-catalytic antibodies are studied in this context of active/binding site burial. Whilst catalytic antibodies, on average, have marginally higher active site burial than non-catalytic antibodies, these values are generally smaller than for non-antibody enzymes, possibly contributing to their relatively low turnover. Prediction of active site location improves further when sequence profile-based weights replace the uniform charge distribution, so that a combination of burial and amino acid conservation is assessed. Accuracy rises to 93% of active sites to within 5%, in the test set, for the optimal profile weights scheme. The equivalent value in a separate validation set is 89% to within 5%. Enzyme/non-enzyme and enzyme functional site predictions are made for structural genomics proteins, suggesting that a substantial majority of these are non-enzymes.
Affiliation(s)
- Richard Greaves
- Faculty of Life Sciences, Jackson's Mill, University of Manchester, P.O. Box 88, Sackville Street, Manchester M60 1QD, UK
|
22
|
Saunders CT, Baker D. Recapitulation of protein family divergence using flexible backbone protein design. J Mol Biol 2005; 346:631-44. [PMID: 15670610 DOI: 10.1016/j.jmb.2004.11.062] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Received: 06/17/2004] [Revised: 11/18/2004] [Accepted: 11/22/2004] [Indexed: 11/30/2022]
Abstract
We use flexible backbone protein design to explore the sequence and structure neighborhoods of naturally occurring proteins. The method samples sequence and structure space in the vicinity of a known sequence and structure by alternately optimizing the sequence for a fixed protein backbone using rotamer based sequence search, and optimizing the backbone for a fixed amino acid sequence using atomic-resolution structure prediction. We find that such a flexible backbone design method better recapitulates protein family sequence variation than sequence optimization on fixed backbones or randomly perturbed backbone ensembles for ten diverse protein structures. For the SH3 domain, the backbone structure variation in the family is also better recapitulated than in randomly perturbed backbones. The potential application of this method as a model of protein family evolution is highlighted by a concerted transition to the amino acid sequence in the structural core of one SH3 domain starting from the backbone coordinates of an homologous structure.
Affiliation(s)
- Christopher T Saunders
- Department of Genome Sciences, University of Washington, Box 357730, Seattle, WA 98195, USA
|
23
|
Parisi G, Echave J. The structurally constrained protein evolution model accounts for sequence patterns of the LbetaH superfamily. BMC Evol Biol 2004; 4:41. [PMID: 15500694 PMCID: PMC538250 DOI: 10.1186/1471-2148-4-41] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 12/23/2003] [Accepted: 10/22/2004] [Indexed: 11/24/2022]
Abstract
Background Structure conservation constrains evolutionary sequence divergence, resulting in observable sequence patterns. Most current models of protein evolution do not take structure into account explicitly, being unsuitable for investigating the effects of structure conservation on sequence divergence. To this end, we recently developed the Structurally Constrained Protein Evolution (SCPE) model. The model starts with the coding sequence of a protein with known three-dimensional structure. At each evolutionary time-step of an SCPE simulation, a trial sequence is generated by introducing a random point mutation in the current coding DNA sequence. Then, a "score" for the trial sequence is calculated and the mutation is accepted only if its score is under a given cutoff, λ. The SCPE score measures the distance between the trial sequence and a given reference sequence, given the structure. In our first brief report we used a "global score", in which the same reference sequence, the ancestral one, was used at each evolutionary step. Here, we introduce a new scoring function, the "local score", in which the sequence accepted at the previous evolutionary time-step is used as the reference. We assess the model on the UDP-N-acetylglucosamine acyltransferase (LPXA) family, as in our previous report, and we extend this study to all other members of the left-handed parallel beta helix fold (LβH) superfamily whose structure has been determined. Results We studied site-dependent entropies, amino acid probability distributions, and substitution matrices predicted by SCPE and compared with experimental data for several members of the LβH superfamily. We also evaluated structure conservation during simulations. Overall, SCPE outperforms JTT in the description of sequence patterns observed in structurally constrained sites. Maximum Likelihood calculations show that the local-score and global-score SCPE substitution matrices obtained for LPXA outperform the JTT model for the LPXA family and for the structurally constrained sites of class i of other members within the LβH superfamily. Conclusion We extended the SCPE model by introducing a new scoring function, the local score. We performed a thorough assessment of the SCPE model on the LPXA family and extended it to all other members of known structure of the LβH superfamily.
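The mutate, score, accept loop described in this abstract, and the difference between the global and local reference, can be sketched as follows. This is a toy illustration under stated assumptions: `score` is a hypothetical callable, and the Hamming distance used in the example is only a placeholder for the structure-based SCPE score, which the abstract does not specify.

```python
import random


def scpe_simulate(ancestor, score, cutoff, steps, mode="global", rng=None):
    """Toy version of the accept/reject loop.

    `score(trial, reference)` is a hypothetical distance function. In
    "global" mode the reference is always the ancestral sequence; in
    "local" mode it is the sequence accepted at the previous step.
    """
    rng = rng or random.Random()
    current = ancestor
    for _ in range(steps):
        # Propose a random point mutation in the current DNA sequence.
        pos = rng.randrange(len(current))
        new_base = rng.choice([b for b in "ACGT" if b != current[pos]])
        trial = current[:pos] + new_base + current[pos + 1:]
        reference = ancestor if mode == "global" else current
        # Accept the mutation only if its score is under the cutoff.
        if score(trial, reference) < cutoff:
            current = trial
    return current


def hamming(a, b):
    """Placeholder score (assumption): number of mismatched positions."""
    return sum(x != y for x, y in zip(a, b))
```

With this placeholder score, the global mode keeps the evolving sequence within a fixed distance of the ancestor, whereas the local mode only constrains each individual step and therefore allows the sequence to drift over many steps.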
Affiliation(s)
- Gustavo Parisi
- Centro de Estudios e Investigaciones, Universidad Nacional de Quilmes, Roque Saenz Peña 180, B1876BXD Bernal, Argentina
- Julián Echave
- Centro de Estudios e Investigaciones, Universidad Nacional de Quilmes, Roque Saenz Peña 180, B1876BXD Bernal, Argentina
|
24
|
Cai W, Pei J, Grishin NV. Reconstruction of ancestral protein sequences and its applications. BMC Evol Biol 2004; 4:33. [PMID: 15377393 PMCID: PMC522809 DOI: 10.1186/1471-2148-4-33] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.1] [Received: 06/04/2004] [Accepted: 09/17/2004] [Indexed: 11/16/2022]
Abstract
Background Modern-day proteins were selected during long evolutionary history as descendants of ancient life forms. In silico reconstruction of such ancestral protein sequences facilitates our understanding of evolutionary processes, protein classification and biological function. Additionally, reconstructed ancestral protein sequences could serve to fill in sequence space thus aiding remote homology inference. Results We developed ANCESCON, a package for distance-based phylogenetic inference and reconstruction of ancestral protein sequences that takes into account the observed variation of evolutionary rates between positions that more precisely describes the evolution of protein families. To improve the accuracy of evolutionary distance estimation and ancestral sequence reconstruction, two approaches are proposed to estimate position-specific evolutionary rates. Comparisons show that at large evolutionary distances our method gives more accurate ancestral sequence reconstruction than PAML, PHYLIP and PAUP*. We apply the reconstructed ancestral sequences to homology inference and functional site prediction. We show that the usage of hypothetical ancestors together with the present day sequences improves profile-based sequence similarity searches; and that ancestral sequence reconstruction methods can be used to predict positions with functional specificity. Conclusions As a computational tool to reconstruct ancestral protein sequences from a given multiple sequence alignment, ANCESCON shows high accuracy in tests and helps detection of remote homologs and prediction of functional sites. ANCESCON is freely available for non-commercial use. Pre-compiled versions for several platforms can be downloaded from .
Affiliation(s)
- Wei Cai
- Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd., Dallas, TX. 75390-9050, USA
- Jimin Pei
- Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd., Dallas, TX. 75390-9050, USA
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd., Dallas, TX. 75390-9050, USA
- Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd., Dallas, TX. 75390-9050, USA
|
25
|
Bate P, Warwicker J. Enzyme/non-enzyme discrimination and prediction of enzyme active site location using charge-based methods. J Mol Biol 2004; 340:263-76. [PMID: 15201051 DOI: 10.1016/j.jmb.2004.04.070] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Received: 01/19/2004] [Revised: 04/29/2004] [Accepted: 04/29/2004] [Indexed: 11/27/2022]
Abstract
Calculations of charge interactions complement analysis of a characterised active site, rationalising pH-dependence of activity and transition state stabilisation. Prediction of active site location through large ΔpKa values or electrostatic strain is relevant for structural genomics. We report a study of ionisable groups in a set of 20 enzymes, finding that false positives obscure predictive potential. In a larger set of 156 enzymes, peaks in solvent-space electrostatic properties are calculated. Both electric field and potential match well to active site location. The best correlation is found with electrostatic potential calculated from uniform charge density over enzyme volume, rather than from assignment of a standard atom-specific charge set. Studying a shell around each molecule, for 77% of enzymes the potential peak is within that 5% of the shell closest to the active site centre, and 86% within 10%. Active site identification by largest cleft, also with projection onto a shell, gives 58% of enzymes for which the centre of the largest cleft lies within 5% of the active site, and 70% within 10%. Dielectric boundary conditions emphasise clefts in the uniform charge density method, which is suited to recognition of binding pockets embedded within larger clefts. The variation of peak potential with distance from active site, and comparison between enzyme and non-enzyme sets, gives an optimal threshold distinguishing enzyme from non-enzyme. We find that 87% of the enzyme set exceeds the threshold as compared to 29% of the non-enzyme set. Enzyme/non-enzyme homologues, "structural genomics" annotated proteins and catalytic/non-catalytic RNAs are studied in this context.
Affiliation(s)
- Paul Bate
- Biomolecular Sciences Department, University of Manchester Institute of Science and Technology, Sackville Street, Manchester M60 1QD, UK
|