Reference Citation Analysis

For: Ofran Y, Mysore V, Rost B. Prediction of DNA-binding residues from sequence. Bioinformatics 2007;23:i347-53. [PMID: 17646316 DOI: 10.1093/bioinformatics/btm174] [Citation(s) in RCA: 117] [Impact Index Per Article: 6.5] [Indexed: 11/13/2022]

Number

Cited by Other Article(s)

101

Identification and Analysis of Binding Site Residues in Protein Complexes: Energy Based Approach. 2010. [DOI: 10.1007/978-3-642-14922-1_78] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2]

102

Rangwala H, Kauffman C, Karypis G. svmPRAT: SVM-based protein residue annotation toolkit. BMC Bioinformatics 2009;10:439. [PMID: 20028521 PMCID: PMC2805646 DOI: 10.1186/1471-2105-10-439] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 06/15/2009] [Accepted: 12/22/2009] [Indexed: 11/10/2022] Open

103

Huang YF, Huang CC, Liu YC, Oyang YJ, Huang CK. DNA-binding residues and binding mode prediction with binding-mechanism concerned models. BMC Genomics 2009;10 Suppl 3:S23. [PMID: 19958487 PMCID: PMC2788376 DOI: 10.1186/1471-2164-10-s3-s23] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022] Open

Abstract

Background

Protein-DNA interactions are essential for fundamental biological activities including DNA transcription, replication, packaging, repair and rearrangement. Proteins interacting with DNA can be classified into two categories of binding mechanisms: sequence-specific and non-specific binding. Sequence-specific protein-DNA binding provides a mechanism for recognizing the correct nucleotide base pairs. Non-specific protein-DNA binding is a sequence-independent interaction with the DNA backbone that accelerates targeting. Both sequence-specific and non-specific binding residues contribute to the interaction.

Results

The proposed framework has two predictor stages: DNA-binding residue prediction and binding mode prediction. In the first stage, DNA-binding residue prediction, the predictor for sequence-specific DNA-binding residues achieves 96.45% accuracy with 50.14% sensitivity, 99.31% specificity, 81.70% precision, and 62.15% F-measure. The predictor for non-specific DNA-binding residues achieves 89.14% accuracy with 53.06% sensitivity, 95.25% specificity, 65.47% precision, and 58.62% F-measure. When the prediction results for sequence-specific and non-specific binding residues are combined with an OR operation, the predictor achieves 89.26% accuracy with 56.86% sensitivity, 95.63% specificity, 71.92% precision, and 63.51% F-measure. In the second stage, protein-DNA binding mode prediction achieves 75.83% accuracy using a support vector machine with multi-class prediction.

Conclusion

This article presents the design of a sequence-based predictor that aims to identify sequence-specific and non-specific binding residues in a transcription factor while taking the DNA-binding mechanism into account. Protein-DNA binding mode prediction was introduced to help improve DNA-binding residue prediction. In addition, the results of this study will help with the design of binding-mechanism concerned predictors for other families of proteins interacting with DNA.
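The percentages reported in the Results above follow from standard confusion-matrix definitions. The following is a minimal sketch of those definitions and of an OR combination of two per-residue predictions. It is not the authors' code: the function names and the toy label vectors are hypothetical, and only the reported precision and sensitivity values are taken from the abstract.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f_measure(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return safe_div(2 * precision * sensitivity, precision + sensitivity)


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, precision and F-measure."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": safe_div(tn, tn + fp),
        "precision": precision,
        "f_measure": f_measure(precision, sensitivity),
    }


def combine_or(pred_a, pred_b):
    """A residue is called binding if either predictor calls it binding."""
    return [bool(a) or bool(b) for a, b in zip(pred_a, pred_b)]


# Hypothetical toy data: 1 = DNA-binding residue, 0 = non-binding residue.
truth = [1, 1, 0, 0, 1, 0, 0, 0]
specific_pred = [1, 0, 0, 0, 0, 0, 0, 0]
nonspecific_pred = [0, 1, 0, 1, 0, 0, 0, 0]

combined = combine_or(specific_pred, nonspecific_pred)
toy_metrics = binary_metrics(truth, combined)
print(toy_metrics)

# Recompute the F-measure from the precision and sensitivity quoted above
# for the OR-combined predictor (71.92% and 56.86%).
print(round(100 * f_measure(0.7192, 0.5686), 2))
```

Because most residues in a protein do not contact DNA, accuracy and specificity can be high even when sensitivity is near 50%, which is why the F-measure is the more informative single number in this setting.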

104

Gao M, Skolnick J. A threading-based method for the prediction of DNA-binding proteins with application to the human genome. PLoS Comput Biol 2009;5:e1000567. [PMID: 19911048 PMCID: PMC2770119 DOI: 10.1371/journal.pcbi.1000567] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Received: 05/26/2009] [Accepted: 10/16/2009] [Indexed: 11/18/2022] Open

Abstract

Diverse mechanisms for DNA-protein recognition have been elucidated in numerous atomic complex structures from various protein families. These structural data provide an invaluable knowledge base not only for understanding DNA-protein interactions, but also for developing specialized methods that predict the DNA-binding function from protein structure. While such methods are useful, a major limitation is that they require an experimental structure of the target as input. To overcome this obstacle, we develop a threading-based method, DNA-Binding-Domain-Threader (DBD-Threader), for the prediction of DNA-binding domains and associated DNA-binding protein residues. Our method, which uses a template library composed of DNA-protein complex structures, requires only the target protein's sequence. In our approach, fold similarity and DNA-binding propensity are employed as two functional discriminating properties. In benchmark tests on 179 DNA-binding and 3,797 non-DNA-binding proteins, using templates whose sequence identity is less than 30% to the target, DBD-Threader achieves a sensitivity/precision of 56%/86%. This performance is considerably better than the standard sequence comparison method PSI-BLAST and is comparable to DBD-Hunter, which requires an experimental structure as input. Moreover, for over 70% of predicted DNA-binding domains, the backbone Root Mean Square Deviations (RMSDs) of the top-ranked structural models are within 6.5 Å of their experimental structures, with their associated DNA-binding sites identified at satisfactory accuracy. Additionally, DBD-Threader correctly assigned the SCOP superfamily for most predicted domains. To demonstrate that DBD-Threader is useful for automatic function annotation on a large scale, DBD-Threader was applied to 18,631 protein sequences from the human genome; 1,654 proteins are predicted to have DNA-binding function. Comparison with existing Gene Ontology (GO) annotations suggests that approximately 30% of our predictions are new. Finally, we present some interesting predictions in detail. In particular, it is estimated that approximately 20% of classic zinc finger domains play a functional role not related to direct DNA-binding.

105

Song J, Tan H, Mahmood K, Law RHP, Buckle AM, Webb GI, Akutsu T, Whisstock JC. Prodepth: predict residue depth by support vector regression approach from protein sequences only. PLoS One 2009;4:e7072. [PMID: 19759917 PMCID: PMC2742725 DOI: 10.1371/journal.pone.0007072] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 04/14/2009] [Accepted: 08/20/2009] [Indexed: 11/24/2022] Open

Abstract

Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes to what extent a residue is buried in the protein structure space. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid types. Accurate prediction of RD has many potentially important applications in the field of structural bioinformatics, for example, facilitating the identification of functionally important residues, or residues in the folding nucleus, or enzyme active sites from sequence information. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes including both local and global sequence characteristics and examined their respective prediction performances. For the objective evaluation of our approach, we used 5-fold cross-validation to assess the prediction accuracies and showed that the overall best performance could be achieved with a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, after incorporating the relevant multiple sequence features. The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features could influence the prediction performance marginally. We highlight two examples as a comparison in order to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter in the field of protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis.
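The two evaluation measures quoted above, the correlation coefficient (CC) and the root mean square error (RMSE) between observed and predicted residue depth, can be sketched as follows. This is an illustrative sketch that assumes the CC is the Pearson correlation; the function names and the toy depth values are hypothetical and are not taken from the Prodepth paper.

```python
import math


def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical residue depths for a five-residue stretch (arbitrary units).
observed_depth = [3.2, 4.8, 7.1, 5.5, 3.9]
predicted_depth = [3.6, 4.1, 6.4, 6.0, 4.3]

print(round(pearson_cc(observed_depth, predicted_depth), 3))
print(round(rmse(observed_depth, predicted_depth), 3))
```

The two measures are complementary: the CC captures whether predicted depths rise and fall with the observed ones, while the RMSE captures how far off the predictions are in absolute terms.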

106

Kim R, Guo JT. PDA: an automatic and comprehensive analysis program for protein-DNA complex structures. BMC Genomics 2009;10 Suppl 1:S13. [PMID: 19594872 PMCID: PMC2709256 DOI: 10.1186/1471-2164-10-s1-s13] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Knowledge of protein-DNA interactions at the structural level can provide insights into the mechanisms of protein-DNA recognition and gene regulation. Although over 1400 protein-DNA complex structures have been deposited into the Protein Data Bank (PDB), the structural details of protein-DNA interactions are generally not available. In addition, current approaches to comparing protein-DNA complexes are mainly based on protein sequence similarity, while the DNA sequences are not taken into account. With the number of experimentally determined protein-DNA complex structures increasing, there is a need for an automatic program to analyze the protein-DNA complex structures and to provide comprehensive structural information for the benefit of the whole research community.

RESULTS

We developed an automatic and comprehensive protein-DNA complex structure analysis program, PDA (for protein-DNA complex structure analyzer). PDA takes PDB files as inputs and performs structural analysis that includes 1) whole protein-DNA complex structure restoration, especially the reconstruction of double-stranded DNA structures; 2) an efficient new approach for DNA base-pair detection; 3) systematic annotation of protein-DNA interactions; and 4) extraction of DNA subsequences involved in protein-DNA interactions and identification of protein-DNA binding units. Protein-DNA complex structures in the current PDB were processed and analyzed with our PDA program, and the analysis results were stored in a database. A dataset useful for studying protein-DNA interactions involved in gene regulation was generated using both protein and DNA sequences as well as the contact information of the complexes. WebPDA was developed to provide a web interface for using PDA and for data retrieval.

CONCLUSION

PDA is a computational tool for structural annotations of protein-DNA complexes. It provides a useful resource for investigating protein-DNA interactions. Data from the PDA analysis can also facilitate the classification of protein-DNA complexes and provide insights into rational design of benchmarks. The PDA program is freely available at http://bioinfozen.uncc.edu/webpda.

107

Chu WY, Huang YF, Huang CC, Cheng YS, Huang CK, Oyang YJ. ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors. Nucleic Acids Res 2009;37:W396-401. [PMID: 19483101 PMCID: PMC2703882 DOI: 10.1093/nar/gkp449] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022] Open

108

Andrabi M, Mizuguchi K, Sarai A, Ahmad S. Prediction of mono- and di-nucleotide-specific DNA-binding sites in proteins using neural networks. BMC Structural Biology 2009;9:30. [PMID: 19439068 PMCID: PMC2693520 DOI: 10.1186/1472-6807-9-30] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 12/12/2008] [Accepted: 05/13/2009] [Indexed: 11/18/2022]

Abstract

Background

DNA recognition by proteins is one of the most important processes in living systems. Therefore, understanding the recognition process in general, and identifying mutual recognition sites in proteins and DNA in particular, carries great significance. The sequence and structural dependence of DNA-binding sites in proteins has led to the development of successful machine learning methods for their prediction. However, all existing machine learning methods predict DNA-binding sites, irrespective of their target sequence and hence, none of them is helpful in identifying specific protein-DNA contacts. In this work, we formulate the problem of predicting specific DNA-binding sites in terms of contacts between the residue environments of proteins and the identity of a mononucleotide or a dinucleotide step in DNA. The aim of this work is to take a protein sequence or structural features as inputs and predict for each amino acid residue if it binds to DNA at locations identified by one of the four possible mononucleotides or one of the 10 unique dinucleotide steps. Contact predictions are made at various levels of resolution viz. in terms of side chain, backbone and major or minor groove atoms of DNA.

Results

Significant differences in residue preferences for specific contacts are observed, which, combined with other features, lead to promising levels of prediction. In general, PSSM-based predictions, supported by secondary structure and solvent accessibility, achieve a good predictability of ~70–80%, measured by the area under the curve (AUC) of ROC graphs. The major and minor groove contact predictions stood out in terms of their poor predictability from sequences or PSSM, which was very strongly (>20 percentage points) compensated by the addition of secondary structure and solvent accessibility information, revealing a predominant role of local protein structure in major/minor groove DNA recognition. Following a detailed analysis of results, a web server to predict mononucleotide and dinucleotide-step contacts using PSSM was developed and made available online.

Conclusion

Most residue-nucleotide contacts can be predicted with high accuracy using only sequence and evolutionary information. Major and minor groove contacts, however, depend profoundly on the local structure. Overall, this study takes us a step closer to the ultimate goal of predicting mutual recognition sites in protein and DNA sequences.
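The predictability figures in this abstract are reported as the area under the ROC curve (AUC). As a rough illustration of what that number means, the sketch below computes the AUC as the probability that a randomly chosen contacting residue receives a higher score than a randomly chosen non-contacting one, with ties counted as one half. The function name and the toy scores are hypothetical, and this pairwise formulation is a generic one rather than the evaluation code used in the paper.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: sequence of 0/1 values (1 = residue contacts DNA).
    scores: predictor outputs, higher meaning more likely to contact DNA.
    """
    positives = [s for l, s in zip(labels, scores) if l]
    negatives = [s for l, s in zip(labels, scores) if not l]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical per-residue scores from a contact predictor.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values of roughly 0.7 to 0.8 indicate a useful but imperfect ranking of residues.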

109

Nimrod G, Szilágyi A, Leslie C, Ben-Tal N. Identification of DNA-binding proteins using structural, electrostatic and evolutionary features. J Mol Biol 2009;387:1040-53. [PMID: 19233205 PMCID: PMC2726711 DOI: 10.1016/j.jmb.2009.02.023] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Received: 12/08/2008] [Revised: 02/12/2009] [Accepted: 02/12/2009] [Indexed: 11/22/2022]

110

Wrzeszczynski KO, Rost B. Cell cycle kinases predicted from conserved biophysical properties. Proteins 2009;74:655-68. [PMID: 18704950 PMCID: PMC2629806 DOI: 10.1002/prot.22181] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]

111

A Kernel Framework for Protein Residue Annotation. 2009. [DOI: 10.1007/978-3-642-01307-2_40] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]

112

Slama P, Filippis I, Lappe M. Detection of protein catalytic residues at high precision using local network properties. BMC Bioinformatics 2008;9:517. [PMID: 19055796 PMCID: PMC2632678 DOI: 10.1186/1471-2105-9-517] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 07/24/2008] [Accepted: 12/04/2008] [Indexed: 12/02/2022] Open

Abstract

Background

Identifying the active site of an enzyme is a crucial step in functional studies. While protein sequences and structures can be experimentally characterized, determining which residues build up an active site is not a straightforward process. In the present study a new method for the detection of protein active sites is introduced. This method uses local network descriptors derived from protein three-dimensional structures to determine whether a residue is part of an active site. It thus does not involve any sequence alignment or structure similarity to other proteins. A scoring function is elaborated over a set of more than 220 proteins having different structures and functions, in order to detect protein catalytic sites with a high precision, i.e. with a minimal rate of false positives.

Results

The scoring function was based on the counts of first-neighbours on side-chain contacts, third-neighbours and residue type. Precision of the detection using this function was 28.1%, which represents a more than three-fold increase compared to combining closeness centrality with residue surface accessibility, a function which was proposed in recent years. The performance of the scoring function was also analysed in detail over a smaller set of eight proteins. For the detection of 'functional' residues, which were involved either directly in catalytic activity or in the binding of substrates, precision reached a value of 72.7% on this second set. These results suggested that our scoring function was effective at detecting not only catalytic residues, but also any residue that is part of the functional site of a protein.

Conclusion

Having been validated on the majority of known structural families, this method should prove useful for the detection of active sites in any protein with unknown function, and for direct application to the design of site-directed mutagenesis experiments.

113

Wu J, Liu H, Duan X, Ding Y, Wu H, Bai Y, Sun X. Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature. Bioinformatics 2008;25:30-5. [PMID: 19008251 PMCID: PMC2638931 DOI: 10.1093/bioinformatics/btn583] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.5] [Indexed: 01/29/2023]

114

Punta M, Ofran Y. The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS Comput Biol 2008;4:e1000160. [PMID: 18974821 PMCID: PMC2518264 DOI: 10.1371/journal.pcbi.1000160] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Indexed: 12/22/2022] Open

115

Ahmad S. Sequence-dependence and prediction of nucleotide solvent accessibility in double stranded DNA. Gene 2008;428:25-30. [PMID: 18955120 DOI: 10.1016/j.gene.2008.09.031] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 07/08/2008] [Revised: 09/06/2008] [Accepted: 09/30/2008] [Indexed: 10/21/2022]

116

Ahmad S, Keskin O, Sarai A, Nussinov R. Protein-DNA interactions: structural, thermodynamic and clustering patterns of conserved residues in DNA-binding proteins. Nucleic Acids Res 2008;36:5922-32. [PMID: 18801847 PMCID: PMC2566867 DOI: 10.1093/nar/gkn573] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Indexed: 12/16/2022] Open

117

Song J, Tan H, Takemoto K, Akutsu T. HSEpred: predict half-sphere exposure from protein sequences. Bioinformatics 2008;24:1489-97. [DOI: 10.1093/bioinformatics/btn222] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Indexed: 11/12/2022] Open

118

An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins. Amino Acids 2008;36:167-75. [DOI: 10.1007/s00726-008-0044-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 11/30/2007] [Accepted: 02/07/2008] [Indexed: 10/22/2022]