1
|
Pal A, Chakrabarti P, Dey S. ProDFace: A web-tool for the dissection of protein-DNA interfaces. Front Mol Biosci 2022; 9:978310. [PMID: 36148013 PMCID: PMC9486321 DOI: 10.3389/fmolb.2022.978310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2022] [Accepted: 08/09/2022] [Indexed: 11/30/2022] Open
Abstract
Protein-DNA interactions play a crucial role in gene expression and regulation. Identifying the DNA binding surface of proteins has long been a challenge–in comparison to protein-protein interactions, limited progress has been made in the development of efficient DNA binding site prediction and protein-DNA docking methods. Here we present ProDFace, a web tool that characterizes the binding region of a protein-DNA complex based on amino acid propensity, hydrogen bond (HB) donor capacity (number of solvent accessible HB donor groups), sequence conservation at the interface core and rim region, and geometry. The program takes as input the structure of a protein-DNA complex in PDB (Protein Data Bank) format, and outputs various physicochemical and geometric parameters of the interface, as well as conservation of the interface residues in the protein component. Values are provided for the whole interface, and after dissecting it into core and rim regions. Details of water mediated HBs between protein and DNA, potential HB donor groups present at the binding surface of protein, and conserved interface residues are also provided as downloadable text files. These parameters can be useful in evaluating and validating protein-DNA docking solutions, structures derived from simulation as well as solutions from the available prediction tools, and facilitate the development of more efficient prediction methods. The web-tool is freely available at structbioinfo.iitj.ac.in/resources/bioinfo/pd_interface.
Collapse
Affiliation(s)
- Arumay Pal
- School of Bioengineering, Vellore Institute of Technology, Bhopal, India
| | | | - Sucharita Dey
- Department of Bioscience and Bioengineering, Indian Institute of Technology Jodhpur, Karwar, India
- *Correspondence: Sucharita Dey,
| |
Collapse
|
2
|
杨 爽. Analysis of Residue Interface Preference in Protein-DNA Complexes and Its Application in Recognition of Binding Interface. Biophysics (Nagoya-shi) 2022. [DOI: 10.12677/biphy.2022.104006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
|
3
|
Satheesh D, Rajendran A, Chithra K. Protein-ligand binding interactions of imidazolium salts with SARS CoV-2. Heliyon 2020; 6:e05544. [PMID: 33230487 PMCID: PMC7674018 DOI: 10.1016/j.heliyon.2020.e05544] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2020] [Revised: 09/06/2020] [Accepted: 11/16/2020] [Indexed: 12/17/2022] Open
Abstract
The disease called severe acute respiratory syndrome (SARS) is a lifestyle intimidating viral contamination affected by a positive, single stranded novel RNA virus (COVID-2019) from the enveloped coronaviruse family. The COVID-2019 virus has affected many people, scattering promptly, and researchers are attempting to find out medicines for its effectual cure in all over the globe. Chloroquine (ChQ) and its derivatives, an older drug used for the cure of malaria, is exposed to encompass a perceptible feasibility and commendable well-being in opposition to SARS CoV-2 associated pneumonia clinical trials conducted in China. Later on, a few investigations have been directed to find and present SARS CoV-2 antiviral medications. The aim of this present work deals with the potential binding interactions of some imidazolium salts with Nsp9 (Nonstructural protein 9) RNA binding protein of SARS CoV-2.
Collapse
Affiliation(s)
- Dhurairaj Satheesh
- Research and Development Centre, Bharathiar University, Coimbatore 641 046, Tamilnadu, India
- PG and Research Department of Chemistry, Loganatha Narayanaswamy Government College (Autonomous), Ponneri 601 204, Tamilnadu, India
| | - Annamalai Rajendran
- Department of Chemistry, Sir Theagaraya College, Chennai 600 021, Tamilnadu, India
| | - Kasi Chithra
- PG and Research Department of Chemistry, Loganatha Narayanaswamy Government College (Autonomous), Ponneri 601 204, Tamilnadu, India
| |
Collapse
|
4
|
Qiu L, Zou X. Scoring Functions for Protein-RNA Complex Structure Prediction: Advances, Applications, and Future Directions. COMMUNICATIONS IN INFORMATION AND SYSTEMS 2020; 20:1-22. [PMID: 33867869 DOI: 10.4310/cis.2020.v20.n1.a1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Protein-RNA interaction is among the most essential of biological events in living cells, being involved in protein synthesizing, RNA processing and transport, DNA transcription, and regulation of gene expression, and many other critical bio-molecular activities. A thorough understanding of this interaction is of paramount importance in fundamental study of a variety of vital cellular processes and therapeutic application for remedy of a broad range of diseases. Experimental high-resolution 3D structure determination is the primary source of knowledge for protein-RNA complexes. However, due to technical limitations, the existing techniques for experimental structure determination couldn't match the demand from fast growing interest in academia and industry. This problem necessitates the alternative high-throughput computational method for protein-RNA complex structure prediction. Similar to the in silico methods used for protein-protein and protein-DNA interactions, a reliable prediction of protein-RNA complex structure requires a scoring function with commensurate discriminatory power. Derived from determined structures and purposed to predict the to-be-determined structures, the scoring function is not only a predictive tool but also a gauge of our knowledge of protein-RNA interaction. In this review, we present an overview of the status of existing scoring functions and the scientific principle behind their constructions as well as their strengths and limitations. Finally, we will discuss about future directions of the scoring function development for protein-RNA structure prediction.
Collapse
Affiliation(s)
- Liming Qiu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri 65211
| | - Xiaoqin Zou
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri 65211.,Department of Physics & Astronomy, University of Missouri, Columbia, Missouri 65211.,Department of Biochemistry, University of Missouri, Columbia, Missouri 65211.,Informatics Institute, University of Missouri, Columbia, Missouri 65211
| |
Collapse
|
5
|
Lapillo M, Tuccinardi T, Martinelli A, Macchia M, Giordano A, Poli G. Extensive Reliability Evaluation of Docking-Based Target-Fishing Strategies. Int J Mol Sci 2019; 20:ijms20051023. [PMID: 30818741 PMCID: PMC6429110 DOI: 10.3390/ijms20051023] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2019] [Revised: 02/21/2019] [Accepted: 02/22/2019] [Indexed: 01/03/2023] Open
Abstract
The development of target-fishing approaches, aimed at identifying the possible protein targets of a small molecule, represents a hot topic in medicinal chemistry. A successful target-fishing approach would allow for the elucidation of the mechanism of action of all therapeutically interesting compounds for which the actual target is still unknown. Moreover, target-fishing would be essential for preventing adverse effects of drug candidates, by predicting their potential off-targets, and it would speed up drug repurposing campaigns. However, due to the huge number of possible protein targets that a small-molecule might interact with, experimental target-fishing approaches are out of reach. In silico target-fishing represents a valuable alternative, and examples of receptor-based approaches, exploiting the large number of crystallographic protein structures determined to date, have been reported in the literature. To the best of our knowledge, no proper evaluation of such approaches is, however, reported yet. In the present work, we extensively assessed the reliability of docking-based target-fishing strategies. For this purpose, a set of X-ray structures belonging to different targets was selected, and a dataset of compounds, including 10 experimentally active ligands for each target, was created. A target-fishing benchmark database was then obtained, and used to assess the performance of 13 different docking procedures, in identifying the correct target of the dataset ligands. Moreover, a consensus docking-based target-fishing strategy was developed and evaluated. The analysis highlighted that specific features of the target proteins could affect the reliability of the protocol, which however, proved to represent a valuable tool in the proper applicability domain. Our study represents the first extensive performance assessment of docking-based target-fishing approaches, paving the way for the development of novel efficient receptor-based target fishing strategies.
Collapse
Affiliation(s)
| | | | | | - Marco Macchia
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy.
| | - Antonio Giordano
- Sbarro Institute for Cancer Research and Molecular Medicine, Center for Biotechnology, College of Science and Technology, Temple University, Philadelphia, PA 19122, USA.
- Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy.
| | - Giulio Poli
- Department of Pharmacy, University of Pisa, 56126 Pisa, Italy.
| |
Collapse
|
6
|
Smolinska K, Pacholczyk M. EMQIT: a machine learning approach for energy based PWM matrix quality improvement. Biol Direct 2017; 12:17. [PMID: 28764727 PMCID: PMC5539975 DOI: 10.1186/s13062-017-0189-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2017] [Accepted: 07/17/2017] [Indexed: 11/10/2022] Open
Abstract
Background Transcription factor binding affinities to DNA play a key role for the gene regulation. Learning the specificity of the mechanisms of binding TFs to DNA is important both to experimentalists and theoreticians. With the development of high-throughput methods such as, e.g., ChiP-seq the need to provide unbiased models of binding events has been made apparent. We present EMQIT a modification to the approach introduced by Alamanova et al. and later implemented as 3DTF server. We observed that tuning of Boltzmann factor weights, used for conversion of calculated energies to nucleotide probabilities, has a significant impact on the quality of the associated PWM matrix. Results Consequently, we proposed to use receiver operator characteristics curves and the 10-fold cross-validation to learn best weights using experimentally verified data from TRANSFAC database. We applied our method to data available for various TFs. We verified the efficiency of detecting TF binding sites by the 3DTF matrices improved with our technique using experimental data from the TRANSFAC database. The comparison showed a significant similarity and comparable performance between the improved and the experimental matrices (TRANSFAC). Improved 3DTF matrices achieved significantly higher AUC values than the original 3DTF matrices (at least by 0.1) and, at the same time, detected notably more experimentally verified TFBSs. Conclusions The resulting new improved PWM matrices for analyzed factors show similarity to TRANSFAC matrices. Matrices had comparable predictive capabilities. Moreover, improved PWMs achieve better results than matrices downloaded from 3DTF server. Presented approach is general and applicable to any energy-based matrices. EMQIT is available online at http://biosolvers.polsl.pl:3838/emqit. Reviewers This article was reviewed by Oliviero Carugo, Marek Kimmel and István Simon. Electronic supplementary material The online version of this article (doi:10.1186/s13062-017-0189-y) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Karolina Smolinska
- Institute of Automatic Control, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
| | - Marcin Pacholczyk
- Institute of Automatic Control, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland.
| |
Collapse
|
7
|
Farrel A, Murphy J, Guo JT. Structure-based prediction of transcription factor binding specificity using an integrative energy function. Bioinformatics 2017; 32:i306-i313. [PMID: 27307632 PMCID: PMC4908348 DOI: 10.1093/bioinformatics/btw264] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
UNLABELLED Transcription factors (TFs) regulate gene expression through binding to specific target DNA sites. Accurate annotation of transcription factor binding sites (TFBSs) at genome scale represents an essential step toward our understanding of gene regulation networks. In this article, we present a structure-based method for computational prediction of TFBSs using a novel, integrative energy (IE) function. The new energy function combines a multibody (MB) knowledge-based potential and two atomic energy terms (hydrogen bond and π interaction) that might not be accurately captured by the knowledge-based potential owing to the mean force nature and low count problem. We applied the new energy function to the TFBS prediction using a non-redundant dataset that consists of TFs from 12 different families. Our results show that the new IE function improves the prediction accuracy over the knowledge-based, statistical potentials, especially for homeodomain TFs, the second largest TF family in mammals. CONTACT jguo4@uncc.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Alvin Farrel
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Jonathan Murphy
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | - Jun-Tao Guo
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| |
Collapse
|
8
|
Farrel A, Guo JT. An efficient algorithm for improving structure-based prediction of transcription factor binding sites. BMC Bioinformatics 2017; 18:342. [PMID: 28715997 PMCID: PMC5514533 DOI: 10.1186/s12859-017-1755-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2017] [Accepted: 07/12/2017] [Indexed: 01/07/2023] Open
Abstract
Background Gene expression is regulated by transcription factors binding to specific target DNA sites. Understanding how and where transcription factors bind at genome scale represents an essential step toward our understanding of gene regulation networks. Previously we developed a structure-based method for prediction of transcription factor binding sites using an integrative energy function that combines a knowledge-based multibody potential and two atomic energy terms. While the method performs well, it is not computationally efficient due to the exponential increase in the number of binding sequences to be evaluated for longer binding sites. In this paper, we present an efficient pentamer algorithm by splitting DNA binding sequences into overlapping fragments along with a simplified integrative energy function for transcription factor binding site prediction. Results A DNA binding sequence is split into overlapping pentamers (5 base pairs) for calculating transcription factor-pentamer interaction energy. To combine the results from overlapping pentamer scores, we developed two methods, Kmer-Sum and PWM (Position Weight Matrix) stacking, for full-length binding motif prediction. Our results show that both Kmer-Sum and PWM stacking in the new pentamer approach along with a simplified integrative energy function improved transcription factor binding site prediction accuracy and dramatically reduced computation time, especially for longer binding sites. Conclusion Our new fragment-based pentamer algorithm and simplified energy function improve both efficiency and accuracy. To our knowledge, this is the first fragment-based method for structure-based transcription factor binding sites prediction. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1755-0) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Alvin Farrel
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA
| | - Jun-Tao Guo
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC, 28223, USA.
| |
Collapse
|
9
|
Qin W, Zhao G, Carson M, Jia C, Lu H. Knowledge-based three-body potential for transcription factor binding site prediction. IET Syst Biol 2016; 10:23-9. [PMID: 26816396 DOI: 10.1049/iet-syb.2014.0066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Abstract
A structure-based statistical potential is developed for transcription factor binding site (TFBS) prediction. Besides the direct contact between amino acids from TFs and DNA bases, the authors also considered the influence of the neighbouring base. This three-body potential showed better discriminate powers than the two-body potential. They validate the performance of the potential in TFBS identification, binding energy prediction and binding mutation prediction.
Collapse
Affiliation(s)
- Wenyi Qin
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA
| | - Guijun Zhao
- Key Laboratory of Molecular Embryology, Ministry of Health & Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, People's Republic of China
| | - Matthew Carson
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA
| | - Caiyan Jia
- School of Computer and Information Technology & Beijing Key Lab of Traffic Data Analysis, Beijing Jiaotong University, Beijing, People's Republic of China
| | - Hui Lu
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, USA.
| |
Collapse
|
10
|
Jou JD, Jain S, Georgiev IS, Donald BR. BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design. J Comput Biol 2016; 23:413-24. [PMID: 26744898 DOI: 10.1089/cmb.2015.0194] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
Collapse
Affiliation(s)
- Jonathan D Jou
- 1 Department of Computer Science, Duke University , Durham, North Carolina
| | - Swati Jain
- 1 Department of Computer Science, Duke University , Durham, North Carolina.,2 Department of Biochemistry, Duke University Medical Center , Durham, North Carolina.,3 Department of Computational Biology and Bioinformatics Program, Duke University , Durham, North Carolina
| | - Ivelin S Georgiev
- 1 Department of Computer Science, Duke University , Durham, North Carolina
| | - Bruce R Donald
- 1 Department of Computer Science, Duke University , Durham, North Carolina.,2 Department of Biochemistry, Duke University Medical Center , Durham, North Carolina.,4 Department of Chemistry, Duke University , Durham, North Carolina
| |
Collapse
|
11
|
Tuszynska I, Magnus M, Jonak K, Dawson W, Bujnicki JM. NPDock: a web server for protein-nucleic acid docking. Nucleic Acids Res 2015; 43:W425-30. [PMID: 25977296 PMCID: PMC4489298 DOI: 10.1093/nar/gkv493] [Citation(s) in RCA: 173] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2015] [Accepted: 05/02/2015] [Indexed: 01/03/2023] Open
Abstract
Protein–RNA and protein–DNA interactions play fundamental roles in many biological processes. A detailed understanding of these interactions requires knowledge about protein–nucleic acid complex structures. Because the experimental determination of these complexes is time-consuming and perhaps futile in some instances, we have focused on computational docking methods starting from the separate structures. Docking methods are widely employed to study protein–protein interactions; however, only a few methods have been made available to model protein–nucleic acid complexes. Here, we describe NPDock (Nucleic acid–Protein Docking); a novel web server for predicting complexes of protein–nucleic acid structures which implements a computational workflow that includes docking, scoring of poses, clustering of the best-scored models and refinement of the most promising solutions. The NPDock server provides a user-friendly interface and 3D visualization of the results. The smallest set of input data consists of a protein structure and a DNA or RNA structure in PDB format. Advanced options are available to control specific details of the docking process and obtain intermediate results. The web server is available at http://genesilico.pl/NPDock.
Collapse
Affiliation(s)
- Irina Tuszynska
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland Institute of Informatics, University of Warsaw, Banacha 2, PL-02-097 Warsaw, Poland
| | - Marcin Magnus
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
| | - Katarzyna Jonak
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
| | - Wayne Dawson
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland
| | - Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, ul. Ks. Trojdena 4, PL-02-109 Warsaw, Poland Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, ul. Umultowska 89, PL-61-614 Poznan, Poland
| |
Collapse
|
12
|
Jou JD, Jain S, Georgiev I, Donald BR. BWM*: A Novel, Provable, Ensemble-Based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design. LECTURE NOTES IN COMPUTER SCIENCE 2015. [DOI: 10.1007/978-3-319-16706-0_16] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/11/2023]
|
13
|
Pujato M, Kieken F, Skiles AA, Tapinos N, Fiser A. Prediction of DNA binding motifs from 3D models of transcription factors; identifying TLX3 regulated genes. Nucleic Acids Res 2014; 42:13500-12. [PMID: 25428367 PMCID: PMC4267649 DOI: 10.1093/nar/gku1228] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Proper cell functioning depends on the precise spatio-temporal expression of its genetic material. Gene expression is controlled to a great extent by sequence-specific transcription factors (TFs). Our current knowledge on where and how TFs bind and associate to regulate gene expression is incomplete. A structure-based computational algorithm (TF2DNA) is developed to identify binding specificities of TFs. The method constructs homology models of TFs bound to DNA and assesses the relative binding affinity for all possible DNA sequences using a knowledge-based potential, after optimization in a molecular mechanics force field. TF2DNA predictions were benchmarked against experimentally determined binding motifs. Success rates range from 45% to 81% and primarily depend on the sequence identity of aligned target sequences and template structures, TF2DNA was used to predict 1321 motifs for 1825 putative human TF proteins, facilitating the reconstruction of most of the human gene regulatory network. As an illustration, the predicted DNA binding site for the poorly characterized T-cell leukemia homeobox 3 (TLX3) TF was confirmed with gel shift assay experiments. TLX3 motif searches in human promoter regions identified a group of genes enriched in functions relating to hematopoiesis, tissue morphology, endocrine system and connective tissue development and function.
Collapse
Affiliation(s)
- Mario Pujato
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA
| | - Fabien Kieken
- Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA Macromolecular Therapeutics Development, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA
| | - Amanda A Skiles
- Molecular Neuroscience Laboratory, Geisinger Clinic, 100 North Academy Avenue, Danville, PA 17822, USA
| | - Nikos Tapinos
- Molecular Neuroscience Laboratory, Geisinger Clinic, 100 North Academy Avenue, Danville, PA 17822, USA
| | - Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA
| |
Collapse
|
14
|
Ashworth J, Plaisier CL, Lo FY, Reiss DJ, Baliga NS. Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction. PLoS One 2014; 9:e107863. [PMID: 25255272 PMCID: PMC4177876 DOI: 10.1371/journal.pone.0107863] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2014] [Accepted: 08/16/2014] [Indexed: 11/18/2022] Open
Abstract
Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer.
Collapse
Affiliation(s)
- Justin Ashworth
- Institute for Systems Biology, Seattle, Washington, United States of America
- * E-mail: (JA); (NB)
| | | | - Fang Yin Lo
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - David J. Reiss
- Institute for Systems Biology, Seattle, Washington, United States of America
| | - Nitin S. Baliga
- Institute for Systems Biology, Seattle, Washington, United States of America
- Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- * E-mail: (JA); (NB)
| |
Collapse
|
15
|
On the use of knowledge-based potentials for the evaluation of models of protein-protein, protein-DNA, and protein-RNA interactions. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 94:77-120. [PMID: 24629186 DOI: 10.1016/b978-0-12-800168-4.00004-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Proteins are the bricks and mortar of cells, playing structural and functional roles. In order to perform their function, they interact with each other as well as with other biomolecules such as DNA or RNA. Therefore, to fathom the function of a protein, we require knowing its partners and the atomic details of its interactions (i.e., the structure of the complex). However, the amount of protein interactions with an experimentally determined three-dimensional structure is scarce. Therefore, computational techniques such as homology modeling are foremost to fill this gap. Protein interactions can be modeled using as templates the interactions of homologous proteins, if the structure of the complex is known, or using docking methods. In both approaches, the estimation of the quality of models is essential. There are several ways to address this problem. In this review, we focus on the use of knowledge-based potentials for the analysis of protein interactions. We describe the procedure to derive statistical potentials and split them into different energetic terms that can be used for different purposes. We extensively discuss the fields where knowledge-based potentials have been successfully applied to (1) model protein-protein, protein-DNA, and protein-RNA interactions and (2) predict binding sites (in the protein and in the DNA). Moreover, we provide ready-to-use resources for docking and benchmarking protein interactions.
Collapse
|
16
|
Computational structure analysis of biomacromolecule complexes by interface geometry. Comput Biol Chem 2013; 47:16-23. [DOI: 10.1016/j.compbiolchem.2013.06.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2012] [Revised: 06/11/2013] [Accepted: 06/12/2013] [Indexed: 11/18/2022]
|
17
|
Zhu Y, Zhou W, Dai DQ, Yan H. Identification of DNA-binding and protein-binding proteins using enhanced graph wavelet features. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:1017-1031. [PMID: 24334394 DOI: 10.1109/tcbb.2013.117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Interactions between biomolecules play an essential role in various biological processes. For predicting DNA-binding or protein-binding proteins, many machine-learning-based techniques have used various types of features to represent the interface of the complexes, but they only deal with the properties of a single atom in the interface and do not take into account the information of neighborhood atoms directly. This paper proposes a new feature representation method for biomolecular interfaces based on the theory of graph wavelet. The enhanced graph wavelet features (EGWF) provides an effective way to characterize interface feature through adding physicochemical features and exploiting a graph wavelet formulation. Particularly, graph wavelet condenses the information around the center atom, and thus enhances the discrimination of features of biomolecule binding proteins in the feature space. Experiment results show that EGWF performs effectively for predicting DNA-binding and protein-binding proteins in terms of Matthew's correlation coefficient (MCC) score and the area value under the receiver operating characteristic curve (AUC).
Collapse
Affiliation(s)
- Yuan Zhu
- Guangdong University of Finance and Economics, Guangzhou and Sun Yat-Sen University, Guangzhou
| | | | | | - Hong Yan
- City University of Hong Kong, Hong Kong and University of Sydney, Sydney
| |
Collapse
|
18
|
Abstract
Predicting binding sites of a transcription factor in the genome is an important, but challenging, issue in studying gene regulation. In the past decade, a large number of protein–DNA co-crystallized structures available in the Protein Data Bank have facilitated the understanding of interacting mechanisms between transcription factors and their binding sites. Recent studies have shown that both physics-based and knowledge-based potential functions can be applied to protein–DNA complex structures to deliver position weight matrices (PWMs) that are consistent with the experimental data. To further use the available structural models, the proposed Web server, PiDNA, aims at first constructing reliable PWMs by applying an atomic-level knowledge-based scoring function on numerous in silico mutated complex structures, and then using the PWM constructed by the structure models with small energy changes to predict the interaction between proteins and DNA sequences. With PiDNA, the users can easily predict the relative preference of all the DNA sequences with limited mutations from the native sequence co-crystallized in the model in a single run. More predictions on sequences with unlimited mutations can be realized by additional requests or file uploading. Three types of information can be downloaded after prediction: (i) the ranked list of mutated sequences, (ii) the PWM constructed by the favourable mutated structures, and (iii) any mutated protein–DNA complex structure models specified by the user. This study first shows that the constructed PWMs are similar to the annotated PWMs collected from databases or literature. Second, the prediction accuracy of PiDNA in detecting relatively high-specificity sites is evaluated by comparing the ranked lists against in vitro experiments from protein-binding microarrays. Finally, PiDNA is shown to be able to select the experimentally validated binding sites from 10 000 random sites with high accuracy. With PiDNA, the users can design biological experiments based on the predicted sequence specificity and/or request mutated structure models for further protein design. As well, it is expected that PiDNA can be incorporated with chromatin immunoprecipitation data to refine large-scale inference of in vivo protein–DNA interactions. PiDNA is available at: http://dna.bime.ntu.edu.tw/pidna.
Collapse
Affiliation(s)
- Chih-Kang Lin
- Center for Systems Biology, National Taiwan University, Taipei 106, Taiwan
| | | |
Collapse
|
19
|
Xu B, Schones DE, Wang Y, Liang H, Li G. A structural-based strategy for recognition of transcription factor binding sites. PLoS One 2013; 8:e52460. [PMID: 23320072 PMCID: PMC3540023 DOI: 10.1371/journal.pone.0052460] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2012] [Accepted: 11/19/2012] [Indexed: 12/30/2022] Open
Abstract
Scanning through genomes for potential transcription factor binding sites (TFBSs) is becoming increasingly important in this post-genomic era. The position weight matrix (PWM) is the standard representation of TFBSs utilized when scanning through sequences for potential binding sites. However, many transcription factor (TF) motifs are short and highly degenerate, and methods utilizing PWMs to scan for sites are plagued by false positives. Furthermore, many important TFs do not have well-characterized PWMs, making identification of potential binding sites even more difficult. One approach to the identification of sites for these TFs has been to use the 3D structure of the TF to predict the DNA structure around the TF and then to generate a PWM from the predicted 3D complex structure. However, this approach is dependent on the similarity of the predicted structure to the native structure. We introduce here a novel approach to identify TFBSs utilizing structure information that can be applied to TFs without characterized PWMs, as long as a 3D complex structure (TF/DNA) exists. This approach utilizes an energy function that is uniquely trained on each structure. Our approach leads to increased prediction accuracy and robustness compared with those using a more general energy function. The software is freely available upon request.
Collapse
Affiliation(s)
- Beisi Xu
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, The Chinese Academy of Sciences, Dalian, Liaoning, China
- Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Dustin E. Schones
- Department of Cancer Biology, Beckman Research Institute, City of Hope, Duarte, California, United States of America
| | - Yongmei Wang
- Department of Chemistry, University of Memphis, Memphis, Tennessee, United States of America
| | - Haojun Liang
- Department of Polymer Science and Engineering, University of Science and Technology of China, Hefei, Anhui, China
| | - Guohui Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, The Chinese Academy of Sciences, Dalian, Liaoning, China
- * E-mail:
| |
Collapse
|
20
|
Pujato M, MacCarthy T, Fiser A, Bergman A. The underlying molecular and network level mechanisms in the evolution of robustness in gene regulatory networks. PLoS Comput Biol 2013; 9:e1002865. [PMID: 23300434 PMCID: PMC3536627 DOI: 10.1371/journal.pcbi.1002865] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2012] [Accepted: 11/13/2012] [Indexed: 11/18/2022] Open
Abstract
Gene regulatory networks show robustness to perturbations. Previous works identified robustness as an emergent property of gene network evolution but the underlying molecular mechanisms are poorly understood. We used a multi-tier modeling approach that integrates molecular sequence and structure information with network architecture and population dynamics. Structural models of transcription factor-DNA complexes are used to estimate relative binding specificities. In this model, mutations in the DNA cause changes on two levels: (a) at the sequence level in individual binding sites (modulating binding specificity), and (b) at the network level (creating and destroying binding sites). We used this model to dissect the underlying mechanisms responsible for the evolution of robustness in gene regulatory networks. Results suggest that in sparse architectures (represented by short promoters), a mixture of local-sequence and network-architecture level changes are exploited. At the local-sequence level, robustness evolves by decreasing the probabilities of both the destruction of existent and generation of new binding sites. Meanwhile, in highly interconnected architectures (represented by long promoters), robustness evolves almost entirely via network level changes, deleting and creating binding sites that modify the network architecture.
Collapse
Affiliation(s)
- Mario Pujato
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Thomas MacCarthy
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Applied Mathematics and Statistics, SUNY, Stony Brook, New York, United States of America
| | - Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - Aviv Bergman
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, United States of America
| |
Collapse
|
21
|
Takeda T, Corona RI, Guo JT. A knowledge-based orientation potential for transcription factor-DNA docking. Bioinformatics 2012; 29:322-30. [DOI: 10.1093/bioinformatics/bts699] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
|
22
|
Zhou W, Yan H. Alpha shape and Delaunay triangulation in studies of protein-related interactions. Brief Bioinform 2012. [PMID: 23193202 DOI: 10.1093/bib/bbs077] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
In recent years, more 3D protein structures have become available, which has made the analysis of large molecular structures much easier. There is a strong demand for geometric models for the study of protein-related interactions. Alpha shape and Delaunay triangulation are powerful tools to represent protein structures and have advantages in characterizing the surface curvature and atom contacts. This review presents state-of-the-art applications of alpha shape and Delaunay triangulation in the studies on protein-DNA, protein-protein, protein-ligand interactions and protein structure analysis.
Collapse
Affiliation(s)
- Weiqiang Zhou
- Department of Electronic Engineering, City University of Hong Kong, Tat Chee Avenue 83, Hong Kong.
| | | |
Collapse
|
23
|
Turner D, Kim R, Guo JT. TFinDit: transcription factor-DNA interaction data depository. BMC Bioinformatics 2012; 13:220. [PMID: 22943312 PMCID: PMC3483241 DOI: 10.1186/1471-2105-13-220] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2012] [Accepted: 08/23/2012] [Indexed: 11/28/2022] Open
Abstract
Background One of the crucial steps in regulation of gene expression is the binding of transcription factor(s) to specific DNA sequences. Knowledge of the binding affinity and specificity at a structural level between transcription factors and their target sites has important implications in our understanding of the mechanism of gene regulation. Due to their unique functions and binding specificity, there is a need for a transcription factor-specific, structure-based database and corresponding web service to facilitate structural bioinformatics studies of transcription factor-DNA interactions, such as development of knowledge-based interaction potential, transcription factor-DNA docking, binding induced conformational changes, and the thermodynamics of protein-DNA interactions. Description TFinDit is a relational database and a web search tool for studying transcription factor-DNA interactions. The database contains annotated transcription factor-DNA complex structures and related data, such as unbound protein structures, thermodynamic data, and binding sequences for the corresponding transcription factors in the complex structures. TFinDit also provides a user-friendly interface and allows users to either query individual entries or generate datasets through culling the database based on one or more search criteria. Conclusions TFinDit is a specialized structural database with annotated transcription factor-DNA complex structures and other preprocessed data. We believe that this database/web service can facilitate the development and testing of TF-DNA interaction potentials and TF-DNA docking algorithms, and the study of protein-DNA recognition mechanisms.
Collapse
Affiliation(s)
- Daniel Turner
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | | | | |
Collapse
|
24
|
Wu J, Hong B, Takeda T, Guo JT. High performance transcription factor-DNA docking with GPU computing. Proteome Sci 2012; 10 Suppl 1:S17. [PMID: 22759575 PMCID: PMC3380734 DOI: 10.1186/1477-5956-10-s1-s17] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem.
Collapse
Affiliation(s)
- Jiadong Wu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, 30332, USA.
| | | | | | | |
Collapse
|
25
|
Gabdoulline R, Eckweiler D, Kel A, Stegmaier P. 3DTF: a web server for predicting transcription factor PWMs using 3D structure-based energy calculations. Nucleic Acids Res 2012; 40:W180-5. [PMID: 22693215 PMCID: PMC3394331 DOI: 10.1093/nar/gks551] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
We present the webserver 3D transcription factor (3DTF) to compute position-specific weight matrices (PWMs) of transcription factors using a knowledge-based statistical potential derived from crystallographic data on protein–DNA complexes. Analysis of available structures that can be used to construct PWMs shows that there are hundreds of 3D structures from which PWMs could be derived, as well as thousands of proteins homologous to these. Therefore, we created 3DTF, which delivers binding matrices given the experimental or modeled protein–DNA complex. The webserver can be used by biologists to derive novel PWMs for transcription factors lacking known binding sites and is freely accessible at http://www.gene-regulation.com/pub/programs/3dtf/.
Collapse
Affiliation(s)
- R Gabdoulline
- Heinrich-Heine University of Duesseldorf, Universitaetstr. 1, 40225 Duesseldorf, Germany
| | | | | | | |
Collapse
|
26
|
Seddon G, Lounnas V, McGuire R, van den Bergh T, Bywater RP, Oliveira L, Vriend G. Drug design for ever, from hype to hope. J Comput Aided Mol Des 2012; 26:137-50. [PMID: 22252446 PMCID: PMC3268973 DOI: 10.1007/s10822-011-9519-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2011] [Accepted: 12/05/2011] [Indexed: 01/28/2023]
Abstract
In its first 25 years JCAMD has been disseminating a large number of techniques aimed at finding better medicines faster. These include genetic algorithms, COMFA, QSAR, structure based techniques, homology modelling, high throughput screening, combichem, and dozens more that were a hype in their time and that now are just a useful addition to the drug-designers toolbox. Despite massive efforts throughout academic and industrial drug design research departments, the number of FDA-approved new molecular entities per year stagnates, and the pharmaceutical industry is reorganising accordingly. The recent spate of industrial consolidations and the concomitant move towards outsourcing of research activities requires better integration of all activities along the chain from bench to bedside. The next 25 years will undoubtedly show a series of translational science activities that are aimed at a better communication between all parties involved, from quantum chemistry to bedside and from academia to industry. This will above all include understanding the underlying biological problem and optimal use of all available data.
Collapse
Affiliation(s)
| | - V. Lounnas
- CMBI, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26–28, 6525 GA Nijmegen, The Netherlands
| | - R. McGuire
- BioAxis Research, Bergse Heihoek 56, Berghem, 5351 SL The Netherlands
| | - T. van den Bergh
- Bio-Prodict, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
| | | | - L. Oliveira
- Sao Paulo Federal University (UNIFESP), Sao Paulo, Brazil
| | - G. Vriend
- CMBI, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26–28, 6525 GA Nijmegen, The Netherlands
| |
Collapse
|
27
|
Zhou W, Yan H. Prediction of DNA-binding protein based on statistical and geometric features and support vector machines. Proteome Sci 2011; 9 Suppl 1:S1. [PMID: 22166014 PMCID: PMC3289070 DOI: 10.1186/1477-5956-9-s1-s1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Background Previous studies on protein-DNA interaction mostly focused on the bound structure of DNA-binding proteins but few paid enough attention to the unbound structures. As more new proteins are discovered, it is useful and imperative to develop algorithms for the functional prediction of unbound proteins. In our work, we apply an alpha shape model to represent the surface structure of the protein-DNA complex and extract useful statistical and geometric features, and use structural alignment and support vector machines for the prediction of unbound DNA-binding proteins. Results The performance of our method is evaluated by discriminating a set of 104 DNA-binding proteins from 401 non-DNA-binding proteins. In the same test, the proposed method outperforms the other method using conditional probability. The results achieved by our proposed method for; precision, 83.33%; accuracy, 86.53%; and MCC, 0.5368 demonstrate its good performance. Conclusions In this study we develop an effective method for the prediction of protein-DNA interactions based on statistical and geometric features and support vector machines. Our results show that interface surface features play an important role in protein-DNA interaction. Our technique is able to predict unbound DNA-binding protein and discriminatory DNA-binding proteins from proteins that bind with other molecules.
Collapse
Affiliation(s)
- Weiqiang Zhou
- Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong.
| | | |
Collapse
|
28
|
Yang X, Yan Y. Statistical investigation of position-specific deformation pattern of nucleosome DNA based on multiple conformational properties. Bioinformation 2011; 7:120-4. [PMID: 22125381 PMCID: PMC3218313 DOI: 10.6026/97320630007120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2011] [Accepted: 09/11/2011] [Indexed: 11/23/2022] Open
Abstract
The histone octamer induced bending of DNA into the super-helix structure in nucleosome core particle, is very unique and vital for DNA packing into chromatin. We collected 48 nucleosome crystal structures from PDB and applied a multivariate analysis on the nucleosome structural data. Based on the anisotropic nature of DNA structure, a principal conformational subspace (PCS) is derived from multiple properties to represent the most significant variances of nucleosome DNA structures. The coupling of base pair-oriented parameters with sugar phosphate backbone parameters presented in principal dimensionalities reveals two main deformation modes that have supplemented the existing physical model. By using sequence alignment-based statistics, a positiondependent conformational map for the super-helical DNA path is established. The result shows that the crystal structures of nucleosome DNA have much consistency in position-specific structural variations and certain periodicity is found to exist in these variations. Thus, the positions with obvious deformation patterns along the DNA path in nucleosome core particle are relatively conservative from the perspective of statistics.
Collapse
Affiliation(s)
- Xi Yang
- Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
| | - Yan Yan
- Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
- School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia
| |
Collapse
|
29
|
Hall BM, Vaughn EE, Begaye AR, Cordes MHJ. Reengineering Cro protein functional specificity with an evolutionary code. J Mol Biol 2011; 413:914-28. [PMID: 21945527 DOI: 10.1016/j.jmb.2011.08.056] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2011] [Revised: 08/13/2011] [Accepted: 08/29/2011] [Indexed: 11/17/2022]
Abstract
Cro proteins from different lambdoid bacteriophages are extremely variable in their target consensus DNA sequences and constitute an excellent model for evolution of transcription factor specificity. We experimentally tested a bioinformatically derived evolutionary code relating switches between pairs of amino acids at three recognition helix sites in Cro proteins to switches between pairs of nucleotide bases in the cognate consensus DNA half-sites. We generated all eight possible code variants of bacteriophage λ Cro and used electrophoretic mobility shift assays to compare binding of each variant to its own putative cognate site and to the wild-type cognate site; we also tested the wild-type protein against all eight DNA sites. Each code variant showed stronger binding to its putative cognate site than to the wild-type site, except some variants containing proline at position 27; each also bound its cognate site better than wild-type Cro bound the same site. Most code variants, however, displayed poorer affinity and specificity than wild-type λ Cro. Fluorescence anisotropy assays on λ Cro and the triple code variant (PSQ) against the two cognate sites confirmed the switch in specificity and showed larger apparent effects on binding affinity and specificity. Bacterial one-hybrid assays of λ Cro and PSQ against libraries of sequences with a single randomized half-site showed the expected switches in specificity at two of three coded positions and no clear switches in specificity at noncoded positions. With a few caveats, these results confirm that the proposed Cro evolutionary code can be used to reengineer Cro specificity.
Collapse
Affiliation(s)
- Branwen M Hall
- Department of Chemistry and Biochemistry, University of Arizona, Tucson, AZ 85721, USA
| | | | | | | |
Collapse
|
30
|
Shazman S, Elber G, Mandel-Gutfreund Y. From face to interface recognition: a differential geometric approach to distinguish DNA from RNA binding surfaces. Nucleic Acids Res 2011; 39:7390-9. [PMID: 21693557 PMCID: PMC3177183 DOI: 10.1093/nar/gkr395] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
Protein nucleic acid interactions play a critical role in all steps of the gene expression pathway. Nucleic acid (NA) binding proteins interact with their partners, DNA or RNA, via distinct regions on their surface that are characterized by an ensemble of chemical, physical and geometrical properties. In this study, we introduce a novel methodology based on differential geometry, commonly used in face recognition, to characterize and predict NA binding surfaces on proteins. Applying the method on experimentally solved three-dimensional structures of proteins we successfully classify double-stranded DNA (dsDNA) from single-stranded RNA (ssRNA) binding proteins, with 83% accuracy. We show that the method is insensitive to conformational changes that occur upon binding and can be applicable for de novo protein-function prediction. Remarkably, when concentrating on the zinc finger motif, we distinguish successfully between RNA and DNA binding interfaces possessing the same binding motif even within the same protein, as demonstrated for the RNA polymerase transcription-factor, TFIIIA. In conclusion, we present a novel methodology to characterize protein surfaces, which can accurately tell apart dsDNA from an ssRNA binding interfaces. The strength of our method in recognizing fine-tuned differences on NA binding interfaces make it applicable for many other molecular recognition problems, with potential implications for drug design.
Collapse
Affiliation(s)
- Shula Shazman
- Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel
| | | | | |
Collapse
|
31
|
Zhou W, Yan H. A discriminatory function for prediction of protein-DNA interactions based on alpha shape modeling. Bioinformatics 2010; 26:2541-8. [DOI: 10.1093/bioinformatics/btq478] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
32
|
Alamanova D, Stegmaier P, Kel A. Creating PWMs of transcription factors using 3D structure-based computation of protein-DNA free binding energies. BMC Bioinformatics 2010; 11:225. [PMID: 20438625 PMCID: PMC2879287 DOI: 10.1186/1471-2105-11-225] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2009] [Accepted: 05/03/2010] [Indexed: 12/03/2022] Open
Abstract
Background Knowledge of transcription factor-DNA binding patterns is crucial for understanding gene transcription. Numerous DNA-binding proteins are annotated as transcription factors in the literature, however, for many of them the corresponding DNA-binding motifs remain uncharacterized. Results The position weight matrices (PWMs) of transcription factors from different structural classes have been determined using a knowledge-based statistical potential. The scoring function calibrated against crystallographic data on protein-DNA contacts recovered PWMs of various members of widely studied transcription factor families such as p53 and NF-κB. Where it was possible, extensive comparison to experimental binding affinity data and other physical models was made. Although the p50p50, p50RelB, and p50p65 dimers belong to the same family, particular differences in their PWMs were detected, thereby suggesting possibly different in vivo binding modes. The PWMs of p63 and p73 were computed on the basis of homology modeling and their performance was studied using upstream sequences of 85 p53/p73-regulated human genes. Interestingly, about half of the p63 and p73 hits reported by the Match algorithm in the altogether 126 promoters lay more than 2 kb upstream of the corresponding transcription start sites, which deviates from the common assumption that most regulatory sites are located more proximal to the TSS. The fact that in most of the cases the binding sites of p63 and p73 did not overlap with the p53 sites suggests that p63 and p73 could influence the p53 transcriptional activity cooperatively. The newly computed p50p50 PWM recovered 5 more experimental binding sites than the corresponding TRANSFAC matrix, while both PWMs showed comparable receiver operator characteristics. Conclusions A novel algorithm was developed to calculate position weight matrices from protein-DNA complex structures. The proposed algorithm was extensively validated against experimental data. The method was further combined with Homology Modeling to obtain PWMs of factors for which crystallographic complexes with DNA are not yet available. The performance of PWMs obtained in this work in comparison to traditionally constructed matrices demonstrates that the structure-based approach presents a promising alternative to experimental determination of transcription factor binding properties.
Collapse
Affiliation(s)
- Denitsa Alamanova
- BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbuettel, Germany.
| | | | | |
Collapse
|
33
|
Sherer EC. Antibiotics Targeting the Ribosome: Structure-Based Design and the Nobel Prize. ACTA ACUST UNITED AC 2010. [DOI: 10.1016/s1574-1400(10)06009-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/06/2023]
|
34
|
Xu B, Yang Y, Liang H, Zhou Y. An all-atom knowledge-based energy function for protein-DNA threading, docking decoy discrimination, and prediction of transcription-factor binding profiles. Proteins 2009; 76:718-30. [PMID: 19274740 PMCID: PMC2743280 DOI: 10.1002/prot.22384] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
How to make an accurate representation of protein-DNA interaction by an energy function is a long-standing unsolved problem in structural biology. Here, we modified a statistical potential based on the distance-scaled, finite ideal-gas reference state so that it is optimized for protein-DNA interactions. The changes include a volume-fraction correction to account for unmixable atom types in proteins and DNA in addition to the usage of a low-count correction, residue/base-specific atom types, and a shorter cutoff distance for protein-DNA interactions. The new statistical energy functions are tested in threading and docking decoy discriminations and prediction of protein-DNA binding affinities and transcription-factor binding profiles. The results indicate that new proposed energy functions are among the best in existing energy functions for protein-DNA interactions. The new energy functions are available as a web-server called DDNA 2.0 at http://sparks.informatics.iupui.edu. The server version was trained by the entire 212 protein-DNA complexes.
Collapse
Affiliation(s)
- Beisi Xu
- Department of Polymer Science and Engineering, University of Science and Technology of China, Hefei, Anhui, China
| | | | | | | |
Collapse
|
35
|
Cohen M, Potapov V, Schreiber G. Four distances between pairs of amino acids provide a precise description of their interaction. PLoS Comput Biol 2009; 5:e1000470. [PMID: 19680437 PMCID: PMC2715887 DOI: 10.1371/journal.pcbi.1000470] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2009] [Accepted: 07/15/2009] [Indexed: 11/18/2022] Open
Abstract
The three-dimensional structures of proteins are stabilized by the interactions between amino acid residues. Here we report a method where four distances are calculated between any two side chains to provide an exact spatial definition of their bonds. The data were binned into a four-dimensional grid and compared to a random model, from which the preference for specific four-distances was calculated. A clear relation between the quality of the experimental data and the tightness of the distance distribution was observed, with crystal structure data providing far tighter distance distributions than NMR data. Since the four-distance data have higher information content than classical bond descriptions, we were able to identify many unique inter-residue features not found previously in proteins. For example, we found that the side chains of Arg, Glu, Val and Leu are not symmetrical in respect to the interactions of their head groups. The described method may be developed into a function, which computationally models accurately protein structures.
Collapse
Affiliation(s)
- Mati Cohen
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
| | - Vladimir Potapov
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
| | - Gideon Schreiber
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
| |
Collapse
|
36
|
Kim R, Guo JT. PDA: an automatic and comprehensive analysis program for protein-DNA complex structures. BMC Genomics 2009; 10 Suppl 1:S13. [PMID: 19594872 PMCID: PMC2709256 DOI: 10.1186/1471-2164-10-s1-s13] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Knowledge of protein-DNA interactions at the structural-level can provide insights into the mechanisms of protein-DNA recognition and gene regulation. Although over 1400 protein-DNA complex structures have been deposited into Protein Data Bank (PDB), the structural details of protein-DNA interactions are generally not available. In addition, current approaches to comparison of protein-DNA complexes are mainly based on protein sequence similarity while the DNA sequences are not taken into account. With the number of experimentally-determined protein-DNA complex structures increasing, there is a need for an automatic program to analyze the protein-DNA complex structures and to provide comprehensive structural information for the benefit of the whole research community. RESULTS We developed an automatic and comprehensive protein-DNA complex structure analysis program, PDA (for protein-DNA complex structure analyzer). PDA takes PDB files as inputs and performs structural analysis that includes 1) whole protein-DNA complex structure restoration, especially the reconstruction of double-stranded DNA structures; 2) an efficient new approach for DNA base-pair detection; 3) systematic annotation of protein-DNA interactions; and 4) extraction of DNA subsequences involved in protein-DNA interactions and identification of protein-DNA binding units. Protein-DNA complex structures in current PDB were processed and analyzed with our PDA program and the analysis results were stored in a database. A dataset useful for studying protein-DNA interactions involved in gene regulation was generated using both protein and DNA sequences as well as the contact information of the complexes. WebPDA was developed to provide a web interface for using PDA and for data retrieval. CONCLUSION PDA is a computational tool for structural annotations of protein-DNA complexes. It provides a useful resource for investigating protein-DNA interactions. Data from the PDA analysis can also facilitate the classification of protein-DNA complexes and provide insights into rational design of benchmarks. The PDA program is freely available at http://bioinfozen.uncc.edu/webpda.
Collapse
Affiliation(s)
- RyangGuk Kim
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC 28223 USA.
| | | |
Collapse
|
37
|
Abstract
Several novel and established knowledge-based discriminatory function formulations and reference state derivations have been evaluated to identify parameter sets capable of distinguishing native and near-native biomolecular interactions from incorrect ones. We developed the r.m.r function, a novel atomic level radial distribution function with mean reference state that averages over all pairwise atom types from a reduced atom type composition, using experimentally determined intermolecular complexes in the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB) as the information sources. We demonstrate that r.m.r had the best discriminatory accuracy and power for protein-small molecule and protein-DNA interactions, regardless of whether the native complex was included or excluded, from the test set. The superior performance of the r.m.r discriminatory function compared with seventeen alternative functions evaluated on publicly available test sets for protein-small molecule and protein-DNA interactions indicated that the function was not over optimized through back testing on a single class of biomolecular interactions. The initial success of the reduced composition and superior performance with the CSD as the distribution set over the PDB implies that further improvements and generality of the function are possible by deriving probabilities from subsets of the CSD, using structures that consist of only the atom types to be considered for given biomolecular interactions. The method is available as a web server module at http://protinfo.compbio.washington.edu.
Collapse
Affiliation(s)
- Brady Bernard
- Department of Bioengineering, University of Washington, Seattle, WA 98195, USA
| | - Ram Samudrala
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
| |
Collapse
|
38
|
Zhang S, Xu M, Li S, Su Z. Genome-wide de novo prediction of cis-regulatory binding sites in prokaryotes. Nucleic Acids Res 2009; 37:e72. [PMID: 19383880 PMCID: PMC2691844 DOI: 10.1093/nar/gkp248] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
Although cis-regulatory binding sites (CRBSs) are at least as important as the coding sequences in a genome, our general understanding of them in most sequenced genomes is very limited due to the lack of efficient and accurate experimental and computational methods for their characterization, which has largely hindered our understanding of many important biological processes. In this article, we describe a novel algorithm for genome-wide de novo prediction of CRBSs with high accuracy. We designed our algorithm to circumvent three identified difficulties for CRBS prediction using comparative genomics principles based on a new method for the selection of reference genomes, a new metric for measuring the similarity of CRBSs, and a new graph clustering procedure. When operon structures are correctly predicted, our algorithm can predict 81% of known individual binding sites belonging to 94% of known cis-regulatory motifs in the Escherichia coli K12 genome, while achieving high prediction specificity. Our algorithm has also achieved similar prediction accuracy in the Bacillus subtilis genome, suggesting that it is very robust, and thus can be applied to any other sequenced prokaryotic genome. When compared with the prior state-of-the-art algorithms, our algorithm outperforms them in both prediction sensitivity and specificity.
Collapse
Affiliation(s)
- Shaoqiang Zhang
- Department of Bioinformatics and Genomics, Bioinformatics Research Center, the University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| | | | | | | |
Collapse
|
39
|
Gao M, Skolnick J. From nonspecific DNA-protein encounter complexes to the prediction of DNA-protein interactions. PLoS Comput Biol 2009; 5:e1000341. [PMID: 19343221 PMCID: PMC2659451 DOI: 10.1371/journal.pcbi.1000341] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2008] [Accepted: 02/26/2009] [Indexed: 11/19/2022] Open
Abstract
DNA–protein interactions are involved in many essential biological
activities. Because there is no simple mapping code between DNA base pairs and
protein amino acids, the prediction of DNA–protein interactions is a
challenging problem. Here, we present a novel computational approach for
predicting DNA-binding protein residues and DNA–protein interaction
modes without knowing its specific DNA target sequence. Given the structure of a
DNA-binding protein, the method first generates an ensemble of complex
structures obtained by rigid-body docking with a nonspecific canonical B-DNA.
Representative models are subsequently selected through clustering and ranking
by their DNA–protein interfacial energy. Analysis of these encounter
complex models suggests that the recognition sites for specific DNA binding are
usually favorable interaction sites for the nonspecific DNA probe and that
nonspecific DNA–protein interaction modes exhibit some similarity to
specific DNA–protein binding modes. Although the method requires as
input the knowledge that the protein binds DNA, in benchmark tests, it achieves
better performance in identifying DNA-binding sites than three previously
established methods, which are based on sophisticated machine-learning
techniques. We further apply our method to protein structures predicted through
modeling and demonstrate that our method performs satisfactorily on protein
models whose root-mean-square Cα deviation from native is up to 5
Å from their native structures. This study provides valuable
structural insights into how a specific DNA-binding protein interacts with a
nonspecific DNA sequence. The similarity between the specific
DNA–protein interaction mode and nonspecific interaction modes may
reflect an important sampling step in search of its specific DNA targets by a
DNA-binding protein. Many essential biological activities require interactions between DNA and
proteins. These proteins usually use certain amino acids, called DNA-binding
sites, to recognize their specific DNA targets. To facilitate the search of its
specific DNA targets, a DNA-binding protein often associates with nonspecific
DNA and then diffuses along the DNA. Due to the weak interactions between
nonspecific DNA and the protein, structural characterization of nonspecific
DNA–protein complexes is experimentally challenging. This paper
describes a computational modeling study on nonspecific DNA–protein
complexes and comparative analysis with respect to specific
DNA–protein complexes. The study found that the specific DNA-binding
sites on a protein are typically favorable for nonspecific DNA and that
nonspecific and specific DNA–protein interaction modes are quite
similar. This similarity may reflect an important sampling step in the search
for the specific DNA target sequence by a DNA-binding protein. On the basis of
these observations, a novel method was proposed for predicting DNA-binding sites
and binding modes of a DNA-binding protein without knowing its specific DNA
target sequence. Ultimately, the combination of this method and protein
structure prediction may lead the way to high throughput modeling of
DNA–protein interactions.
Collapse
Affiliation(s)
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia
Institute of Technology, Atlanta, Georgia, United States of America
| | - Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia
Institute of Technology, Atlanta, Georgia, United States of America
- * E-mail:
| |
Collapse
|
40
|
Liu Z, Guo JT, Li T, Xu Y. Structure-based prediction of transcription factor binding sites using a protein-DNA docking approach. Proteins 2009; 72:1114-24. [PMID: 18320590 DOI: 10.1002/prot.22002] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Accurate identification of transcription factor binding sites is critical to our understanding of transcriptional regulatory networks. To overcome the issue of high false-positive predictions that trouble the sequence-based prediction techniques, we have developed a structure-based prediction method that takes into consideration of interactions between the amino acids of a transcription factor and the nucleotides of its DNA binding sequence at structural level, along with an efficient protein-DNA docking algorithm. The docked structures between a protein and a DNA are evaluated using a knowledge-based energy function, in conjunction with van der Waals energy. Our docking algorithm supports quasi-flexible docking, overcoming a number of limiting issues faced by similar docking algorithms. Our rigid-body docking algorithm is tested on a dataset of 141 nonredundant transcription factor-DNA complex structures. The test results show that 63.1% of the 141 complex structures are reconstructed with accuracies better than 1.0 A RMSDs (root mean square deviation) and 79.4% of the complexes are predicted with accuracies better than 3.0 A RMSDs when using the native DNA structures. Our quasi-flexible docking algorithm, assuming that the DNA structures are not known, is tested on a separate set of 45 transcription factor-DNA complexes, of which 57.8% of the docked complex conformations achieve better than 1.0 A RMSDs while 71.1% of the complexes have RMSDs less than 3.0 A. We have also applied our method to predict the binding motifs of the ferric uptake regulator in E. coli and showed that most of the experimentally identified sites can be predicted accurately.
Collapse
Affiliation(s)
- Zhijie Liu
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602, USA
| | | | | | | |
Collapse
|
41
|
Angarica VE, Pérez AG, Vasconcelos AT, Collado-Vides J, Contreras-Moreira B. Prediction of TF target sites based on atomistic models of protein-DNA complexes. BMC Bioinformatics 2008; 9:436. [PMID: 18922190 PMCID: PMC2585596 DOI: 10.1186/1471-2105-9-436] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2008] [Accepted: 10/16/2008] [Indexed: 11/10/2022] Open
Abstract
Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition.
Collapse
Affiliation(s)
- Vladimir Espinosa Angarica
- Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, España.
| | | | | | | | | |
Collapse
|
42
|
Gao M, Skolnick J. DBD-Hunter: a knowledge-based method for the prediction of DNA-protein interactions. Nucleic Acids Res 2008; 36:3978-92. [PMID: 18515839 PMCID: PMC2475642 DOI: 10.1093/nar/gkn332] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
The structures of DNA–protein complexes have illuminated the diversity of DNA–protein binding mechanisms shown by different protein families. This lack of generality could pose a great challenge for predicting DNA–protein interactions. To address this issue, we have developed a knowledge-based method, DNA-binding Domain Hunter (DBD-Hunter), for identifying DNA-binding proteins and associated binding sites. The method combines structural comparison and the evaluation of a statistical potential, which we derive to describe interactions between DNA base pairs and protein residues. We demonstrate that DBD-Hunter is an accurate method for predicting DNA-binding function of proteins, and that DNA-binding protein residues can be reliably inferred from the corresponding templates if identified. In benchmark tests on ∼4000 proteins, our method achieved an accuracy of 98% and a precision of 84%, which significantly outperforms three previous methods. We further validate the method on DNA-binding protein structures determined in DNA-free (apo) state. We show that the accuracy of our method is only slightly affected on apo-structures compared to the performance on holo-structures cocrystallized with DNA. Finally, we apply the method to ∼1700 structural genomics targets and predict that 37 targets with previously unknown function are likely to be DNA-binding proteins. DBD-Hunter is freely available at http://cssb.biology.gatech.edu/skolnick/webservice/DBD-Hunter/.
Collapse
Affiliation(s)
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
| | | |
Collapse
|
43
|
Zheng S, Robertson TA, Varani G. A knowledge-based potential function predicts the specificity and relative binding energy of RNA-binding proteins. FEBS J 2007; 274:6378-91. [PMID: 18005254 DOI: 10.1111/j.1742-4658.2007.06155.x] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
RNA-protein interactions are fundamental to gene expression. Thus, the molecular basis for the sequence dependence of protein-RNA recognition has been extensively studied experimentally. However, there have been very few computational studies of this problem, and no sustained attempt has been made towards using computational methods to predict or alter the sequence-specificity of these proteins. In the present study, we provide a distance-dependent statistical potential function derived from our previous work on protein-DNA interactions. This potential function discriminates native structures from decoys, successfully predicts the native sequences recognized by sequence-specific RNA-binding proteins, and recapitulates experimentally determined relative changes in binding energy due to mutations of individual amino acids at protein-RNA interfaces. Thus, this work demonstrates that statistical models allow the quantitative analysis of protein-RNA recognition based on their structure and can be applied to modeling protein-RNA interfaces for prediction and design purposes.
Collapse
Affiliation(s)
- Suxin Zheng
- Department of Chemistry, University of Washington, Seattle, WA 98195, USA
| | | | | |
Collapse
|