For: Pei J, Dokholyan NV, Shakhnovich EI, Grishin NV. Using protein design for homology detection and active site searches. Proc Natl Acad Sci U S A 2003;100:11361-6. [PMID: 12975528 PMCID: PMC208762 DOI: 10.1073/pnas.2034878100] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Received: 03/24/2003] [Indexed: 11/18/2022]

Cited by Other Article(s)

Dokholyan NV. Controlling Allosteric Networks in Proteins. Chem Rev 2016;116:6463-87. [PMID: 26894745 DOI: 10.1021/acs.chemrev.5b00544] [Citation(s) in RCA: 189] [Impact Index Per Article: 21.0] [Indexed: 12/12/2022]

Sandhya S, Mudgal R, Kumar G, Sowdhamini R, Srinivasan N. Protein sequence design and its applications. Curr Opin Struct Biol 2016;37:71-80. [PMID: 26773478 DOI: 10.1016/j.sbi.2015.12.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/28/2015] [Revised: 12/07/2015] [Accepted: 12/15/2015] [Indexed: 01/14/2023]

Mudgal R, Sandhya S, Kumar G, Sowdhamini R, Chandra NR, Srinivasan N. NrichD database: sequence databases enriched with computationally designed protein-like sequences aid in remote homology detection. Nucleic Acids Res 2014;43:D300-5. [PMID: 25262355 PMCID: PMC4384005 DOI: 10.1093/nar/gku888] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]

Mudgal R, Sowdhamini R, Chandra N, Srinivasan N, Sandhya S. Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability. J Mol Biol 2013;426:962-79. [PMID: 24316367 DOI: 10.1016/j.jmb.2013.11.026] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 07/17/2013] [Revised: 11/23/2013] [Accepted: 11/26/2013] [Indexed: 12/11/2022]

Abstract

Protein functional annotation relies on the identification of accurate relationships, with sequence divergence being a key factor. This is especially evident when distant protein relationships can be demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another additionally containing designed sequences. Augmented database searches showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum needed to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our statistically well-supported findings show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold.
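The abstract above names a roulette wheel-based design step without giving details. The following is only a minimal sketch of roulette-wheel (fitness-proportionate) sampling applied to a profile, not the authors' implementation: the profile layout (one dictionary of residue weights per aligned column), the function names `roulette_pick` and `design_sequence`, and the toy profile are all assumptions made for illustration.

```python
import random
from itertools import accumulate


def roulette_pick(weights, rng):
    """Pick one key with probability proportional to its weight."""
    residues = list(weights)
    cumulative = list(accumulate(weights[r] for r in residues))
    spin = rng.random() * cumulative[-1]  # where the "wheel" stops
    for residue, bound in zip(residues, cumulative):
        if spin < bound:
            return residue
    return residues[-1]  # guard against floating-point edge cases


def design_sequence(profile, rng):
    """Sample one residue per profile column to build a sequence."""
    return "".join(roulette_pick(column, rng) for column in profile)


# Hypothetical three-column profile (residue -> relative weight).
toy_profile = [
    {"A": 0.7, "G": 0.3},
    {"L": 0.5, "I": 0.4, "V": 0.1},
    {"K": 1.0},
]

rng = random.Random(0)  # fixed seed so the run is repeatable
designed = [design_sequence(toy_profile, rng) for _ in range(5)]
print(designed)
```

Residues with larger weights in a column are drawn more often, so repeated sampling yields many distinct sequences that still follow the column preferences of the profile.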

Sandhya S, Mudgal R, Jayadev C, Abhinandan KR, Sowdhamini R, Srinivasan N. Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins. Mol Biosyst 2012;8:2076-84. [PMID: 22692068 DOI: 10.1039/c2mb25113b] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]

Bondugula R, Wallqvist A, Lee MS. Can computationally designed protein sequences improve secondary structure prediction? Protein Eng Des Sel 2011;24:455-61. [DOI: 10.1093/protein/gzr003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]

Chubb D, Jefferys BR, Sternberg MJE, Kelley LA. Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. Bioinformatics 2010;26:2664-71. [PMID: 20843957 DOI: 10.1093/bioinformatics/btq527] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]

Martínez-Castilla LP, Rodríguez-Sotres R. A score of the ability of a three-dimensional protein model to retrieve its own sequence as a quantitative measure of its quality and appropriateness. PLoS One 2010;5:e12483. [PMID: 20830209 PMCID: PMC2935356 DOI: 10.1371/journal.pone.0012483] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 04/26/2010] [Accepted: 08/03/2010] [Indexed: 11/18/2022]

Abstract

BACKGROUND

Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function, remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel.

PRINCIPAL FINDINGS

The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449-460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence sets showed no conservation of amino acids at active sites or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the sequences that the models retrieved from the database belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details in the folding patterns that seem to be tightly linked to the fitness of a structural framework for a specific biological function.

CONCLUSION

Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. However, Rd.HMM is very sensitive to the conformational features of the models' backbone.

Dai L, Yang Y, Kim HR, Zhou Y. Improving computational protein design by using structure-derived sequence profile. Proteins 2010;78:2338-48. [PMID: 20544969 PMCID: PMC3058783 DOI: 10.1002/prot.22746] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 12/13/2022]

Fromer M, Yanover C, Linial M. Design of multispecific protein sequences using probabilistic graphical modeling. Proteins 2010;78:530-47. [PMID: 19842166 DOI: 10.1002/prot.22575] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]

Backbone flexibility in computational protein design. Curr Opin Biotechnol 2009;20:420-8. [DOI: 10.1016/j.copbio.2009.07.006] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Received: 05/14/2009] [Revised: 07/17/2009] [Accepted: 07/25/2009] [Indexed: 11/22/2022]

Wang K, Horst JA, Cheng G, Nickle DC, Samudrala R. Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information. PLoS Comput Biol 2008;4:e1000181. [PMID: 18818722 PMCID: PMC2526173 DOI: 10.1371/journal.pcbi.1000181] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 04/15/2008] [Accepted: 08/07/2008] [Indexed: 11/19/2022]

Larrea AA, Pedroso IM, Malhotra A, Myers RS. Identification of two conserved aspartic acid residues required for DNA digestion by a novel thermophilic Exonuclease VII in Thermotoga maritima. Nucleic Acids Res 2008;36:5992-6003. [PMID: 18812402 PMCID: PMC2566859 DOI: 10.1093/nar/gkn588] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 01/08/2023]

Dukka BKC, Livesay DR. Improving position-specific predictions of protein functional sites using phylogenetic motifs. Bioinformatics 2008;24:2308-16. [PMID: 18723520 DOI: 10.1093/bioinformatics/btn454] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]

Wang K, Samudrala R. Incorporating background frequency improves entropy-based residue conservation measures. BMC Bioinformatics 2006;7:385. [PMID: 16916457 PMCID: PMC1562451 DOI: 10.1186/1471-2105-7-385] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Received: 06/14/2006] [Accepted: 08/17/2006] [Indexed: 11/10/2022]

Li J, Wang W. Detailed assessment of homology detection using different substitution matrices. Chin Sci Bull 2006. [DOI: 10.1007/s11434-006-1538-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]

Liu Y, Kuhlman B. RosettaDesign server for protein design. Nucleic Acids Res 2006;34:W235-8. [PMID: 16845000 PMCID: PMC1538902 DOI: 10.1093/nar/gkl163] [Citation(s) in RCA: 166] [Impact Index Per Article: 8.7] [Received: 01/30/2006] [Revised: 02/22/2006] [Accepted: 03/20/2006] [Indexed: 11/14/2022]

Sandhya S, Chakrabarti S, Abhinandan KR, Sowdhamini R, Srinivasan N. Assessment of a rigorous transitive profile based search method to detect remotely similar proteins. J Biomol Struct Dyn 2005;23:283-98. [PMID: 16218755 DOI: 10.1080/07391102.2005.10507066] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 10/28/2022]

Cheng G, Qian B, Samudrala R, Baker D. Improvement in protein functional site prediction by distinguishing structural and functional constraints on protein family evolution using computational design. Nucleic Acids Res 2005;33:5861-7. [PMID: 16224101 PMCID: PMC1258172 DOI: 10.1093/nar/gki894] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.0] [Indexed: 12/05/2022]

Greaves R, Warwicker J. Active site identification through geometry-based and sequence profile-based calculations: burial of catalytic clefts. J Mol Biol 2005;349:547-57. [PMID: 15882869 DOI: 10.1016/j.jmb.2005.04.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 01/14/2005] [Revised: 03/30/2005] [Accepted: 04/08/2005] [Indexed: 12/30/2022]

Saunders CT, Baker D. Recapitulation of protein family divergence using flexible backbone protein design. J Mol Biol 2005;346:631-44. [PMID: 15670610 DOI: 10.1016/j.jmb.2004.11.062] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Received: 06/17/2004] [Revised: 11/18/2004] [Accepted: 11/22/2004] [Indexed: 11/30/2022]

Parisi G, Echave J. The structurally constrained protein evolution model accounts for sequence patterns of the LbetaH superfamily. BMC Evol Biol 2004;4:41. [PMID: 15500694 PMCID: PMC538250 DOI: 10.1186/1471-2148-4-41] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 12/23/2003] [Accepted: 10/22/2004] [Indexed: 11/24/2022]

Abstract

Background

Structure conservation constrains evolutionary sequence divergence, resulting in observable sequence patterns. Most current models of protein evolution do not take structure into account explicitly and are therefore unsuitable for investigating the effects of structure conservation on sequence divergence. To this end, we recently developed the Structurally Constrained Protein Evolution (SCPE) model. The model starts with the coding sequence of a protein with known three-dimensional structure. At each evolutionary time-step of an SCPE simulation, a trial sequence is generated by introducing a random point mutation in the current coding DNA sequence. Then, a "score" for the trial sequence is calculated and the mutation is accepted only if its score is under a given cutoff, λ. The SCPE score measures the distance between the trial sequence and a given reference sequence, given the structure. In our first brief report we used a "global score", in which the same reference sequence, the ancestral one, was used at each evolutionary step. Here, we introduce a new scoring function, the "local score", in which the sequence accepted at the previous evolutionary time-step is used as the reference. We assess the model on the UDP-N-acetylglucosamine acyltransferase (LPXA) family, as in our previous report, and we extend this study to all other members of the left-handed parallel beta helix fold (LβH) superfamily whose structure has been determined.
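The mutate, score, accept-or-reject loop described above, and the difference between the global and local reference, can be sketched as follows. This is an illustration under stated assumptions rather than the SCPE implementation: the real SCPE score is structure-based, whereas `toy_score` here is a placeholder (the fraction of positions that differ from the reference DNA), and all function names, the ancestor sequence, and the cutoff value are hypothetical.

```python
import random

BASES = "ACGT"


def toy_score(trial, reference):
    """Placeholder distance: fraction of positions that differ.

    Assumption: the real SCPE score depends on the protein structure;
    this stand-in only makes the control flow runnable.
    """
    differing = sum(a != b for a, b in zip(trial, reference))
    return differing / len(reference)


def point_mutate(dna, rng):
    """Return a copy of dna with one random position changed."""
    pos = rng.randrange(len(dna))
    new_base = rng.choice([b for b in BASES if b != dna[pos]])
    return dna[:pos] + new_base + dna[pos + 1:]


def simulate(ancestor, steps, cutoff, mode, rng):
    """Run the accept/reject loop with a 'global' or 'local' reference."""
    current = ancestor
    for _ in range(steps):
        trial = point_mutate(current, rng)
        # Global score: always compare with the ancestral sequence.
        # Local score: compare with the previously accepted sequence.
        reference = ancestor if mode == "global" else current
        if toy_score(trial, reference) < cutoff:  # cutoff plays the role of λ
            current = trial
    return current


ancestor = "ATGGCTAAGCTG"  # hypothetical 12-base coding sequence
global_end = simulate(ancestor, 200, 0.25, "global", random.Random(1))
local_end = simulate(ancestor, 200, 0.25, "local", random.Random(1))
print(global_end, toy_score(global_end, ancestor))
print(local_end, toy_score(local_end, ancestor))
```

With this toy score, the global mode keeps every accepted sequence within the cutoff of the ancestor, while the local mode only limits the size of each individual step, so the sequence is free to drift further over many steps.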

Results

We studied site-dependent entropies, amino acid probability distributions, and substitution matrices predicted by SCPE and compared them with experimental data for several members of the LβH superfamily. We also evaluated structure conservation during simulations. Overall, SCPE outperforms JTT in the description of sequence patterns observed in structurally constrained sites. Maximum Likelihood calculations show that the local-score and global-score SCPE substitution matrices obtained for LPXA outperform the JTT model for the LPXA family and for the structurally constrained sites of class i of other members within the LβH superfamily.
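As a rough illustration of what a site-dependent entropy measures, the sketch below computes the Shannon entropy of each column of a small alignment. It assumes an ungapped alignment of equal-length sequences and plain observed frequencies with no background correction or pseudocounts; the function name `site_entropies` and the toy alignment are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Shannon entropy (in bits) of each alignment column.

    Assumes all sequences have the same length and contain no gaps.
    """
    entropies = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        entropies.append(entropy)
    return entropies


# Hypothetical four-sequence, three-column alignment.
toy_alignment = ["ACD", "ACE", "AGF", "AGH"]
entropies = site_entropies(toy_alignment)
print(entropies)
```

A fully conserved column has an entropy of 0 bits, and a column in which four residues are equally frequent has an entropy of 2 bits, so lower values mark more constrained sites.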

Conclusion

We extended the SCPE model by introducing a new scoring function, the local score. We performed a thorough assessment of the SCPE model on the LPXA family and extended it to all other members of known structure of the LβH superfamily.

Cai W, Pei J, Grishin NV. Reconstruction of ancestral protein sequences and its applications. BMC Evol Biol 2004;4:33. [PMID: 15377393 PMCID: PMC522809 DOI: 10.1186/1471-2148-4-33] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.1] [Received: 06/04/2004] [Accepted: 09/17/2004] [Indexed: 11/16/2022]

Bate P, Warwicker J. Enzyme/non-enzyme discrimination and prediction of enzyme active site location using charge-based methods. J Mol Biol 2004;340:263-76. [PMID: 15201051 DOI: 10.1016/j.jmb.2004.04.070] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Received: 01/19/2004] [Revised: 04/29/2004] [Accepted: 04/29/2004] [Indexed: 11/27/2022]