1
Wacquiez A, Coste F, Kut E, Gaudon V, Trapp S, Castaing B, Marc D. Structure and Sequence Determinants Governing the Interactions of RNAs with Influenza A Virus Non-Structural Protein NS1. Viruses 2020; 12:E947. [PMID: 32867106 PMCID: PMC7552008 DOI: 10.3390/v12090947] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/13/2020] [Revised: 08/17/2020] [Accepted: 08/25/2020] [Indexed: 11/16/2022] Open
Abstract
The non-structural protein NS1 of influenza A viruses is an RNA-binding protein whose activities in the infected cell contribute to the success of the viral cycle, notably through interferon antagonism. We have previously shown that NS1 strongly binds RNA aptamers harbouring virus-specific sequence motifs (Marc et al., Nucleic Acids Res. 41, 434-449). Here, we first investigated the putative role of one particular virus-specific motif through the phenotypic characterization of mutant viruses that were genetically engineered from the parental strain WSN. Unexpectedly, our data provided no evidence of biological importance for the putative binding of NS1 to this specific motif (UGAUUGAAG) in the 3'-untranslated region of its own mRNA. Next, we sought to identify specificity determinants in the NS1-RNA interaction through in vitro interaction assays with several RNA ligands and by solving, by X-ray diffraction, the 3D structures of several complexes associating NS1's RBD with RNAs of various affinities. Our data show that the RBD binds the GUAAC motif within double-stranded RNA helices with an apparent specificity that may rely on the sequence-encoded ability of the RNA to bend its axis. On the other hand, we showed that the RBD binds to the virus-specific AGCAAAAG motif when it is exposed in the apical loop of a high-affinity RNA aptamer, probably through a distinct mode of interaction that still requires structural characterization. Our data are consistent with more than one mode of interaction of NS1's RBD with RNAs, recognizing both structure and sequence determinants.
MESH Headings
- 3' Untranslated Regions
- Animals
- Aptamers, Nucleotide/chemistry
- Aptamers, Nucleotide/metabolism
- Base Sequence
- Cell Line
- Humans
- Influenza A Virus, H1N1 Subtype/chemistry
- Influenza A Virus, H7N1 Subtype/chemistry
- Models, Molecular
- Nucleic Acid Conformation
- Protein Binding
- Protein Domains
- RNA/chemistry
- RNA/metabolism
- RNA, Double-Stranded/chemistry
- RNA, Double-Stranded/metabolism
- RNA, Messenger/chemistry
- RNA, Messenger/metabolism
- RNA, Viral/chemistry
- RNA, Viral/metabolism
- RNA-Binding Proteins/chemistry
- RNA-Binding Proteins/metabolism
- SELEX Aptamer Technique
- Viral Nonstructural Proteins/chemistry
- Viral Nonstructural Proteins/metabolism
Affiliation(s)
- Alan Wacquiez
- Equipe 3IMo, UMR1282 Infectiologie et Santé Publique, INRAE, F-37380 Nouzilly, France; (A.W.); (E.K.); (S.T.)
- UMR1282 Infectiologie et Santé Publique, Université de Tours, F-37000 Tours, France
- Centre de Biophysique Moléculaire, UPR4301 CNRS, rue Charles Sadron, CEDEX 02, 45071 Orléans, France; (F.C.); (V.G.)
- Franck Coste
- Centre de Biophysique Moléculaire, UPR4301 CNRS, rue Charles Sadron, CEDEX 02, 45071 Orléans, France; (F.C.); (V.G.)
- Emmanuel Kut
- Equipe 3IMo, UMR1282 Infectiologie et Santé Publique, INRAE, F-37380 Nouzilly, France; (A.W.); (E.K.); (S.T.)
- UMR1282 Infectiologie et Santé Publique, Université de Tours, F-37000 Tours, France
- Virginie Gaudon
- Centre de Biophysique Moléculaire, UPR4301 CNRS, rue Charles Sadron, CEDEX 02, 45071 Orléans, France; (F.C.); (V.G.)
- Sascha Trapp
- Equipe 3IMo, UMR1282 Infectiologie et Santé Publique, INRAE, F-37380 Nouzilly, France; (A.W.); (E.K.); (S.T.)
- UMR1282 Infectiologie et Santé Publique, Université de Tours, F-37000 Tours, France
- Bertrand Castaing
- Centre de Biophysique Moléculaire, UPR4301 CNRS, rue Charles Sadron, CEDEX 02, 45071 Orléans, France; (F.C.); (V.G.)
- Daniel Marc
- Equipe 3IMo, UMR1282 Infectiologie et Santé Publique, INRAE, F-37380 Nouzilly, France; (A.W.); (E.K.); (S.T.)
- UMR1282 Infectiologie et Santé Publique, Université de Tours, F-37000 Tours, France
2
Torres Montaguth OE, Bervoets I, Peeters E, Charlier D. Competitive Repression of the artPIQM Operon for Arginine and Ornithine Transport by Arginine Repressor and Leucine-Responsive Regulatory Protein in Escherichia coli. Front Microbiol 2019; 10:1563. [PMID: 31354664 PMCID: PMC6640053 DOI: 10.3389/fmicb.2019.01563] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/26/2019] [Accepted: 06/21/2019] [Indexed: 11/20/2022] Open
Abstract
Two out of the three major uptake systems for arginine in Escherichia coli are encoded by the artJ-artPIQM gene cluster. ArtJ is the high-affinity periplasmic arginine-specific binding protein (ArgBP-I), whereas artI encodes the arginine and ornithine periplasmic binding protein (AO). Both ArtJ and ArtI are supposed to combine with the inner membrane-associated ArtQMP2 transport complex of the ATP-binding cassette-type (ABC). Transcription of artJ is repressed by arginine repressor (ArgR) and the artPIQM operon is regulated by the transcriptional regulators ArgR and Leucine-responsive regulatory protein (Lrp). Whereas repression by ArgR requires arginine as corepressor, repression of PartP by Lrp is partially counteracted by leucine, its major effector molecule. We demonstrate that binding of dimeric Lrp to the artP control region generates four complexes with a distinct migration velocity, and that leucine has an effect on both global binding affinity and cooperativity in the binding. We identify the binding sites for Lrp in the artP control region, reveal interferences in the binding of ArgR and Lrp in vitro and demonstrate that the two transcription factors act as competitive repressors in vivo, each one being a more potent regulator in the absence of the other. This competitive behavior may be explained by the partial steric overlap of their respective binding sites. Furthermore, we demonstrate ArgR binding to an unusual position in the control region of the lrp gene, downstream of the transcription initiation site. From this unusual position for an ArgR-specific operator, ArgR has little direct effect on lrp expression, but interferes with the negative leucine-sensitive autoregulation exerted by Lrp. Direct arginine and ArgR-dependent repression of lrp could be observed with a 25-bp deletion mutant, in which the ArgR binding site was artificially moved to a position immediately downstream of the lrp transcription initiation site. 
This finding is reminiscent of a previous observation made for the carAB operon encoding carbamoylphosphate synthase, where ArgR bound in overlap with the downstream promoter P2 does not block transcription initiated 67 bp upstream at the P1 promoter, and further supports the hypothesis that ArgR does not act as an efficient roadblock.
Affiliation(s)
- Oscar E Torres Montaguth
- Research Group of Microbiology, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Indra Bervoets
- Research Group of Microbiology, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Eveline Peeters
- Research Group of Microbiology, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
- Daniel Charlier
- Research Group of Microbiology, Department of Bioengineering Sciences, Vrije Universiteit Brussel, Brussels, Belgium
3
O'Neill PK, Erill I. Parametric bootstrapping for biological sequence motifs. BMC Bioinformatics 2016; 17:406. [PMID: 27716039 PMCID: PMC5052923 DOI: 10.1186/s12859-016-1246-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/20/2016] [Accepted: 09/08/2016] [Indexed: 11/10/2022] Open
Abstract
Background Biological sequence motifs drive the specific interactions of proteins and nucleic acids. Accordingly, the effective computational discovery and analysis of such motifs is a central theme in bioinformatics. Many practical questions about the properties of motifs can be recast as random sampling problems. In this light, the task is to determine for a given motif whether a certain feature of interest is statistically unusual among relevantly similar alternatives. Despite the generality of this framework, its use has been frustrated by the difficulties of defining an appropriate reference class of motifs for comparison and of sampling from it effectively. Results We define two distributions over the space of all motifs of given dimension. The first is the maximum entropy distribution subject to mean information content, and the second is the truncated uniform distribution over all motifs having information content within a given interval. We derive exact sampling algorithms for each. As a proof of concept, we employ these sampling methods to analyze a broad collection of prokaryotic and eukaryotic transcription factor binding site motifs. In addition to positional information content, we consider the informational Gini coefficient of the motif, a measure of the degree to which information is evenly distributed throughout a motif’s positions. We find that both prokaryotic and eukaryotic motifs tend to exhibit higher informational Gini coefficients (IGC) than would be expected by chance under either reference distribution. As a second application, we apply maximum entropy sampling to the motif p-value problem and use it to give elementary derivations of two new estimators. Conclusions Despite the historical centrality of biological sequence motif analysis, this study constitutes to our knowledge the first use of principled null hypotheses for sequence motifs given information content. 
Through their use, we are able to characterize for the first time differences in global motif statistics between biological motifs and their null distributions. In particular, we observe that biological sequence motifs show an unusual distribution of IGC, presumably due to biochemical constraints on the mechanisms of direct read-out. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1246-8) contains supplementary material, which is available to authorized users.
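The two motif statistics named in this abstract, positional information content and the informational Gini coefficient, can be illustrated with a minimal Python sketch. It assumes a uniform background, applies no small-sample correction, and uses the textbook Gini formula on the per-position information values; the function names and the four example sites are hypothetical, and the paper's exact definitions may differ in detail.

```python
import math

def column_information(column, alphabet="ACGT"):
    """Information content (bits) of one aligned column: log2(|alphabet|) - entropy."""
    n = len(column)
    entropy = 0.0
    for base in alphabet:
        p = column.count(base) / n
        if p > 0:
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy

def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # Standard formula on sorted data: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n

def informational_gini(sites):
    """Gini coefficient of the per-position information of aligned sites."""
    columns = zip(*sites)
    return gini([column_information(col) for col in columns])

# Hypothetical aligned binding sites, for illustration only.
sites = ["ACGT", "ACGA", "ACTC", "ACAG"]
print([column_information(col) for col in zip(*sites)])  # [2.0, 2.0, 0.5, 0.0]
print(informational_gini(sites))
```

A value near 0 means information is spread evenly over the positions; a value near its maximum means a few positions carry almost all of it.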
Affiliation(s)
- Patrick K O'Neill
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, 21250, US
- Ivan Erill
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, 21250, US.
4
Shirley BC, Mucaki EJ, Whitehead T, Costea PI, Akan P, Rogan PK. Interpretation, stratification and evidence for sequence variants affecting mRNA splicing in complete human genome sequences. Genomics Proteomics & Bioinformatics 2013; 11:77-85. [PMID: 23499923 PMCID: PMC4357664 DOI: 10.1016/j.gpb.2013.01.008] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 12/06/2012] [Revised: 01/16/2013] [Accepted: 01/21/2013] [Indexed: 11/29/2022]
Abstract
Information theory-based methods have been shown to be sensitive and specific for predicting and quantifying the effects of non-coding mutations in Mendelian diseases. We present the Shannon pipeline software for genome-scale mutation analysis and provide evidence that the software predicts variants affecting mRNA splicing. Individual information contents (in bits) of reference and variant splice sites are compared and significant differences are annotated and prioritized. The software has been implemented for CLC-Bio Genomics platform. Annotation indicates the context of novel mutations as well as common and rare SNPs with splicing effects. Potential natural and cryptic mRNA splicing variants are identified, and null mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in genomes of three cancer cell lines (U2OS, U251 and A431), which were supported by expression analyses. After filtering, tractable numbers of potentially deleterious variants are predicted by the software, suitable for further laboratory investigation. In these cell lines, novel functional variants comprised 6–17 inactivating mutations, 1–5 leaky mutations and 6–13 cryptic splicing mutations. Predicted effects were validated by RNA-seq analysis of the three aforementioned cancer cell lines, and expression microarray analysis of SNPs in HapMap cell lines.
Affiliation(s)
- Ben C Shirley
- Department of Computer Science, Middlesex College, The University of Western Ontario, London, ON N6A 5B7, Canada
5
Aittokallio T, Kurki M, Nevalainen O, Nikula T, West A, Lahesmaa R. Computational Strategies for Analyzing Data in Gene Expression Microarray Experiments. J Bioinform Comput Biol 2012; 1:541-86. [PMID: 15290769 DOI: 10.1142/s0219720003000319] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 02/13/2003] [Revised: 07/02/2003] [Indexed: 11/18/2022]
Abstract
Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of approaches are available for computational analysis, but no general consensus exists on a standard protocol for microarray data analysis. Consequently, the choice of data analysis technique is a crucial element depending both on the data and on the goals of the experiment. Therefore, a basic understanding of bioinformatics is required for optimal experimental design and meaningful interpretation of the results. This review summarizes some of the common themes in DNA microarray data analysis, including data normalization and detection of differential expression. Algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies, along with their relative merits, are reviewed, and potential areas for additional research are discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to the implementation of more efficient computational protocols for the data obtained through microarray experiments.
Affiliation(s)
- Tero Aittokallio
- Department of Computational Biology, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-Shi, Chiba 277-8562, Japan.
6
Oshchepkov DY, Levitsky VG. In silico prediction of transcriptional factor-binding sites. Methods Mol Biol 2011; 760:251-67. [PMID: 21780002 DOI: 10.1007/978-1-61779-176-5_16] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/17/2023]
Abstract
The recognition of transcription factor binding sites (TFBSs) is the first step on the way to deciphering the DNA regulatory code. A large variety of computational approaches and corresponding in silico tools for TFBS recognition are available, each having their own advantages and shortcomings. This chapter provides a brief tutorial to assist end users in the application of these tools for functional characterization of genes.
Affiliation(s)
- Dmitry Y Oshchepkov
- Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.
7
Abstract
The idea that we could build molecular communications systems can be advanced by investigating how actual molecules from living organisms function. Information theory provides tools for such an investigation. This review describes how we can compute the average information in the DNA binding sites of any genetic control protein and how this can be extended to analyze its individual sites. A formula equivalent to Claude Shannon's channel capacity can be applied to molecular systems and used to compute the efficiency of protein binding. This efficiency is often 70% and a brief explanation for that is given. The results imply that biological systems have evolved to function at channel capacity, which means that we should be able to build molecular communications that are just as robust as our macroscopic ones.
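The average and individual information measures mentioned in this abstract can be sketched as follows. This is a simplified illustration, assuming equiprobable bases (a 2-bit maximum per position) and omitting the small-sample correction used in the published method; the function names and example sites are hypothetical.

```python
import math

def frequency_matrix(sites, alphabet="ACGT"):
    """Per-position base frequencies of equally long aligned sites."""
    n = len(sites)
    return [{b: col.count(b) / n for b in alphabet} for col in zip(*sites)]

def r_sequence(freqs):
    """Average information of the site set (bits): sum over positions of 2 - H."""
    total = 0.0
    for col in freqs:
        entropy = -sum(p * math.log2(p) for p in col.values() if p > 0)
        total += 2.0 - entropy
    return total

def r_individual(site, freqs):
    """Information of one site (bits): sum over positions of 2 + log2 f(base, position)."""
    total = 0.0
    for base, col in zip(site, freqs):
        if col[base] == 0:
            return float("-inf")  # base never observed at this position
        total += 2.0 + math.log2(col[base])
    return total

# Hypothetical aligned binding sites, for illustration only.
sites = ["ACGT", "ACGA", "ACTC", "ACAG"]
freqs = frequency_matrix(sites)
print(r_sequence(freqs))                         # 4.5
print([r_individual(s, freqs) for s in sites])   # [5.0, 5.0, 4.0, 4.0]
```

Without the small-sample correction, the mean of the individual values over the sites used to build the model equals the average information, which is what ties the two measures to a common scale.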
Affiliation(s)
- Thomas D. Schneider
- National Institutes of Health, National Cancer Institute at Frederick, P.O. Box B, Frederick, MD 21702-1201, United States
8
Shultzaberger RK, Malashock DS, Kirsch JF, Eisen MB. The fitness landscapes of cis-acting binding sites in different promoter and environmental contexts. PLoS Genet 2010; 6:e1001042. [PMID: 20686658 PMCID: PMC2912393 DOI: 10.1371/journal.pgen.1001042] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 01/24/2010] [Accepted: 06/29/2010] [Indexed: 11/18/2022] Open
Abstract
The biophysical nature of the interaction between a transcription factor and its target sequences in vitro is sufficiently well understood to allow for the effects of DNA sequence alterations on affinity to be predicted. But even in relatively simple in vivo systems, the complexities of promoter organization and activity have made it difficult to predict how altering specific interactions between a transcription factor and DNA will affect promoter output. To better understand this, we measured the relative fitness of nearly all Escherichia coli sigma(70) -35 binding sites in different promoter and environmental contexts by competing four randomized -35 promoter libraries controlling the expression of the tetracycline resistance gene (tet) against each other in increasing concentrations of drug. We sequenced populations after competition to determine the relative enrichment of each -35 sequence. We observed a consistent relationship between the frequency of recovery of each -35 binding site and its predicted affinity for sigma(70) that varied depending on the sequence context of the promoter and drug concentration. Overall, the relative fitness of each promoter could be predicted by a simple thermodynamic model of transcriptional regulation, in which the rate of transcriptional initiation (and hence fitness) is dependent upon the overall stability of the initiation complex, which in turn is dependent upon the energetic contributions of all sites within the complex. As implied by this model, a decrease in the free energy of association at one site could be compensated for by an increase in the binding energy at another to produce a similar output. Furthermore, these data show that a large and continuous range of transcriptional outputs can be accessed by merely changing the -35, suggesting that evolved or engineered mutations at this site could allow for subtle and precise control over gene expression.
Affiliation(s)
- Ryan K. Shultzaberger
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Daniel S. Malashock
- Graduate Group in Comparative Biochemistry, University of California Berkeley, Berkeley, California, United States of America
- Jack F. Kirsch
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Department of Chemistry, University of California Berkeley, Berkeley, California, United States of America
- Michael B. Eisen
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
- California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Genomics Division, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
9
Ponomarenko PM, Suslov VV, Savinkova LK, Ponomarenko MP, Kolchanov NA. A precise equation of equilibrium of four steps of TBP binding with the TATA box for prognosis of phenotypic manifestation of mutations. Biophysics (Nagoya-shi) 2010. [DOI: 10.1134/s0006350910030036] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022] Open
10
Data Compression Concepts and Algorithms and their Applications to Bioinformatics. Entropy 2009; 12:34. [PMID: 20157640 DOI: 10.3390/e12010034] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 01/24/2023]
Abstract
Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences.
11
Peeters E, Nguyen Le Minh P, Foulquié-Moreno M, Charlier D. Competitive activation of the Escherichia coli argO gene coding for an arginine exporter by the transcriptional regulators Lrp and ArgP. Mol Microbiol 2009; 74:1513-26. [DOI: 10.1111/j.1365-2958.2009.06950.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]
12
Zhang J, Li E, Olsen GJ. Protein-coding gene promoters in Methanocaldococcus (Methanococcus) jannaschii. Nucleic Acids Res 2009; 37:3588-601. [PMID: 19359364 PMCID: PMC2699501 DOI: 10.1093/nar/gkp213] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
Abstract
Although Methanocaldococcus (Methanococcus) jannaschii was the first archaeon to have its genome sequenced, little is known about the promoters of its protein-coding genes. To expand our knowledge, we have experimentally identified 131 promoters for 107 protein-coding genes in this genome by mapping their transcription start sites. Compared to previously identified promoters, more than half of which are from genes for stable RNAs, the protein-coding gene promoters are qualitatively similar in overall sequence pattern, but statistically different at several positions due to greater variation among their sequences. Relative binding affinity for general transcription factors was measured for 12 of these promoters by competition electrophoretic mobility shift assays. These promoters bind the factors less tightly than do most tRNA gene promoters. When a position weight matrix (PWM) was constructed from the protein gene promoters, factor binding affinities correlated with corresponding promoter PWM scores. We show that the PWM based on our data more accurately predicts promoters in the genome and transcription start sites than could be done with the previously available data. We also introduce a PWM logo, which visually displays the implications of observing a given base at a position in a sequence.
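How a position weight matrix turns aligned promoter sequences into scores can be sketched in a few lines of Python. This is a generic log-odds PWM with a pseudocount and a uniform background, not the authors' specific construction; the function names, the pseudocount value and the TATA-like example sites are assumptions made for illustration.

```python
import math

def build_pwm(sites, alphabet="ACGT", pseudocount=1.0, background=None):
    """Log-odds PWM (bits) from equally long aligned sites."""
    background = background or {b: 1.0 / len(alphabet) for b in alphabet}
    denom = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for col in zip(*sites):
        pwm.append({
            b: math.log2(((col.count(b) + pseudocount) / denom) / background[b])
            for b in alphabet
        })
    return pwm

def score(window, pwm):
    """Sum of per-position log-odds for a window as wide as the PWM."""
    return sum(col[base] for base, col in zip(window, pwm))

def best_hit(sequence, pwm):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score(sequence[i:i + width], pwm), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical TATA-like sites and a hypothetical upstream sequence.
sites = ["TATAAA", "TATATA", "TTTAAA", "TATAAT"]
pwm = build_pwm(sites)
print(best_hit("GGCGTATAAAGGC", pwm))
```

Scanning a genome with such a matrix and ranking windows by score is the usual way a PWM is used to predict candidate sites; the pseudocount keeps unobserved bases from producing infinite penalties.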
Affiliation(s)
- Jian Zhang
- Department of Microbiology, University of Illinois at Urbana-Champaign, 601 South Goodwin Avenue, Urbana, IL 61801, USA
13
Lyakhov IG, Krishnamachari A, Schneider TD. Discovery of novel tumor suppressor p53 response elements using information theory. Nucleic Acids Res 2008; 36:3828-33. [PMID: 18495754 PMCID: PMC2441790 DOI: 10.1093/nar/gkn189] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022] Open
Abstract
An accurate method for locating genes under tumor suppressor p53 control that is based on a well-established mathematical theory and built using naturally occurring, experimentally proven p53 sites is essential in understanding the complete p53 network. We used a molecular information theory approach to create a flexible model for p53 binding. By searching around transcription start sites in human chromosomes 1 and 2, we predicted 16 novel p53 binding sites and experimentally demonstrated that 15 of the 16 (94%) sites were bound by p53. Some were also bound by the related proteins p63 and p73. Thirteen of the adjacent genes were controlled by at least one of the proteins. Eleven of the 16 sites (69%) had not been identified previously. This molecular information theory approach can be extended to any genetic system to predict new sites for DNA-binding proteins.
Affiliation(s)
- Ilya G Lyakhov
- Basic Research Program, SAIC-Frederick, Inc., NCI at Frederick, Frederick, MD, USA
14
Abstract
Systematic evolution of ligands by exponential enrichment (SELEX) is a new combinatorial chemistry methodology for in vitro selection of specific aptamers. Aptamers are artificial oligonucleotide ligands that bind target molecules with high affinity. They are isolated from combinatorial libraries of synthetic oligonucleotides by an iterative process of affinity selection, recovery and amplification. Several properties of aptamers, such as convenient affinity selection and high affinity and specificity, have led to their wide use. Their affinity and specificity for a given protein are superior to those of antibodies and make it possible to isolate a matching ligand and adjust its bioactivity. This article reviews the development and potential clinical application of aptamers targeting hepatitis C virus.
15
Quatrini R, Lefimil C, Veloso FA, Pedroso I, Holmes DS, Jedlicki E. Bioinformatic prediction and experimental verification of Fur-regulated genes in the extreme acidophile Acidithiobacillus ferrooxidans. Nucleic Acids Res 2007; 35:2153-66. [PMID: 17355989 PMCID: PMC1874648 DOI: 10.1093/nar/gkm068] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Received: 12/29/2006] [Revised: 01/16/2007] [Accepted: 01/22/2007] [Indexed: 01/12/2023] Open
Abstract
The gamma-proteobacterium Acidithiobacillus ferrooxidans lives in extremely acidic conditions (pH 2) and, unlike most organisms, is confronted with an abundant supply of soluble iron. It is also unusual in that it oxidizes iron as an energy source. Consequently, it faces the challenging dual problems of (i) maintaining intracellular iron homeostasis when confronted with extremely high environmental loads of iron and (ii) of regulating the use of iron both as an energy source and as a metabolic micronutrient. A combined bioinformatic and experimental approach was undertaken to identify Fur regulatory sites in the genome of A. ferrooxidans and to gain insight into the constitution of its Fur regulon. Fur regulatory targets associated with a variety of cellular functions including metal trafficking (e.g. feoPABC, tdr, tonBexbBD, copB, cdf), utilization (e.g. fdx, nif), transcriptional regulation (e.g. phoB, irr, iscR) and redox balance (grx, trx, gst) were identified. Selected predicted Fur regulatory sites were confirmed by FURTA, EMSA and in vitro transcription analyses. This study provides the first model for a Fur-binding site consensus sequence in an acidophilic iron-oxidizing microorganism and lays the foundation for future studies aimed at deepening our understanding of the regulatory networks that control iron uptake, homeostasis and oxidation in extreme acidophiles.
Affiliation(s)
- Raquel Quatrini
- Center for Bioinformatics and Genome Biology, MIFAB, Life Science Foundation and Andrés Bello University, Santiago, Chile.
16
Abstract
Information theory was used to build a promoter model that accounts for the -10, the -35 and the uncertainty of the gap between them on a common scale. Helical face assignment indicated that base -7, rather than -11, of the -10 may be flipping to initiate transcription. We found that the sequence conservation of sigma70 binding sites is 6.5 +/- 0.1 bits. Some promoters lack a -35 region, but have a 6.7 +/- 0.2 bit extended -10, almost the same information as the bipartite promoter. These results and similarities between the contacts in the extended -10 binding and the -35 suggest that the flexible bipartite sigma factor evolved from a simpler polymerase. Binding predicted by the bipartite model is enriched around 35 bases upstream of the translational start. This distance is the smallest 5' mRNA leader necessary for ribosome binding, suggesting that selective pressure minimizes transcript length. The promoter model was combined with models of the transcription factors Fur and Lrp to locate new promoters, to quantify promoter strengths, and to predict activation and repression. Finally, the DNA-bending proteins Fis, H-NS and IHF frequently have sites within one DNA persistence length from the -35, so bending allows distal activators to reach the polymerase.
Affiliation(s)
- Thomas D. Schneider
- To whom correspondence should be addressed. Tel: +1 301 846 5581; Fax: +1 301 846 5598;
17
Hasan S, Schreiber M. Recovering motifs from biased genomes: application of signal correction. Nucleic Acids Res 2006; 34:5124-32. [PMID: 16990246 PMCID: PMC1636444 DOI: 10.1093/nar/gkl676] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022] Open
Abstract
A significant problem in biological motif analysis arises when the background symbol distribution is biased (e.g. high/low GC content in the case of DNA sequences). This can lead to overestimation of the amount of information encoded in a motif. A motif can be depicted as a signal using information theory (IT). We apply two concepts from IT, distortion and patterned interference (a type of noise), to model genomic and codon bias respectively. This modeling approach allows us to correct a raw signal to recover signals that are weakened by compositional bias. The corrected signal is more likely to be discriminated from a biased background by a macromolecule. We apply this correction technique to recover ribosome-binding site (RBS) signals from available sequenced and annotated prokaryotic genomes having diverse compositional biases. We observed that linear correction was sufficient for recovering signals even at the extremes of these biases. Further comparative genomics studies were made possible upon correction of these signals. We find that the average Euclidean distance between RBS signal frequency matrices of different genomes can be significantly reduced by using the correction technique. Within this reduced average distance, we can find examples of class-specific RBS signals. Our results have implications for motif-based prediction, particularly with regard to the estimation of reliable inter-genomic model parameters.
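The overestimation described at the start of this abstract can be made concrete with a small Python sketch. It does not implement the distortion and interference correction of the paper; it only contrasts information computed against an assumed uniform background with the standard relative entropy against the actual background. The function names and the AT-rich frequencies are hypothetical.

```python
import math

def information_uniform(column):
    """Information (bits) assuming equiprobable symbols: log2(k) - H."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy

def information_relative(column, background):
    """Relative entropy (bits) of a column against a background distribution."""
    return sum(
        p * math.log2(p / background[b]) for b, p in column.items() if p > 0
    )

# Hypothetical AT-rich genome, and a motif column that merely mirrors it.
background = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
column = dict(background)

print(information_uniform(column))               # positive: looks conserved
print(information_relative(column, background))  # 0.0: nothing beyond background
```

Against a uniform background the two measures coincide, so the gap between them is a direct view of how much apparent signal is owed to genome composition alone.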
Collapse
Affiliation(s)
- Mark Schreiber
- To whom correspondence should be addressed. Tel: +65 6722 2900; Fax: +65 6722 2910;
18
Mercey R, Lantier I, Maurel MC, Grosclaude J, Lantier F, Marc D. Fast, reversible interaction of prion protein with RNA aptamers containing specific sequence patterns. Arch Virol 2006; 151:2197-214. [PMID: 16799875 DOI: 10.1007/s00705-006-0790-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 03/31/2006] [Accepted: 04/20/2006] [Indexed: 02/06/2023]
Abstract
One of the unsolved problems in prion diseases relates to the physiological function of cellular prion protein (PrP), of which a misfolded isoform is the major component of the transmissible spongiform encephalopathies agent. Knowledge of the PrP-binding molecules may help in elucidating its role and understanding the pathological events underlying prion diseases. Because nucleic acids are known to bind PrP, we attempted to identify the preferred RNA sequences that bind to the ovine recombinant PrP. An in vitro selection approach (SELEX) was applied to a pool of 80-nucleotide(nt)-long RNAs containing a randomised 40-nt central region. The most frequently isolated aptamer, RM312, was also the best ligand (20 nM KD value), according to both surface plasmon resonance and filter binding assays. The fast rates of association and dissociation of RM312 with immobilized PrP, which are reminiscent of biologically relevant interactions, could point to a physiological function of PrP towards cellular nucleic acids. The minimal sequence that we found necessary for binding of RM312 to PrP presents a striking similarity with one previously described PrP aptamer of comparable affinity. In addition, we here identify the two lysine clusters contained in the N-terminal part of PrP as its main nucleic-acid binding sites.
Affiliation(s)
- R Mercey
- Infectiologie Animale et Santé Publique, Institut National de la Recherche Agronomique, Centre de Tours, Nouzilly, France
19
Schneider TD. Claude Shannon: biologist. The founder of information theory used biology to formulate the channel capacity. IEEE Eng Med Biol Mag 2006; 25:30-3. [PMID: 16485389 PMCID: PMC1538977 DOI: 10.1109/memb.2006.1578661] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Affiliation(s)
- Thomas D Schneider
- National Cancer Institute, Center for Cancer Research Nanobiology Program, Molecular Information Theory Group, Frederick, Maryland 21702-1201, USA.
20
Kaplan T, Friedman N, Margalit H. Ab initio prediction of transcription factor targets using structural knowledge. PLoS Comput Biol 2005; 1:e1. [PMID: 16103898 PMCID: PMC1183507 DOI: 10.1371/journal.pcbi.0010001] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.5] [Received: 01/10/2005] [Accepted: 02/11/2005] [Indexed: 12/02/2022]
Abstract
Current approaches for identification and detection of transcription factor binding sites rely on an extensive set of known target genes. Here we describe a novel structure-based approach applicable to transcription factors with no prior binding data. Our approach combines sequence data and structural information to infer context-specific amino acid–nucleotide recognition preferences. These are used to predict binding sites for novel transcription factors from the same structural family. We demonstrate our approach on the Cys2His2 Zinc Finger protein family, and show that the learned DNA-recognition preferences are compatible with experimental results. We use these preferences to perform a genome-wide scan for direct targets of Drosophila melanogaster Cys2His2 transcription factors. By analyzing the predicted targets along with gene annotation and expression data we infer the function and activity of these proteins.
Cells respond to dynamic changes in their environment by invoking various cellular processes, coordinated by a complex regulatory program. A main component of this program is the regulation of transcription, which is mainly accomplished by transcription factors that bind the DNA in the vicinity of genes. To better understand transcriptional regulation, advanced computational approaches are needed for linking between transcription factors and their targets. The authors describe a novel approach by which the binding site of a given transcription factor can be characterized without previous experimental binding data. This approach involves learning a set of context-specific amino acid–nucleotide recognition preferences that, when combined with the sequence and structure of the protein, can predict its specific binding preferences. Applying this approach to the Cys2His2 Zinc Finger protein family demonstrated its genome-wide potential by automatically predicting the direct targets of 29 regulators in the genome of the fruit fly Drosophila melanogaster.
At present, with the availability of many genome sequences, there are numerous proteins annotated as transcription factors based on their sequence alone. This approach offers a promising direction for revealing the targets of these factors and for understanding their roles in the cellular network.
Affiliation(s)
- Tommy Kaplan
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
- Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University, Jerusalem, Israel
- Nir Friedman
- School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
- *To whom correspondence should be addressed. E-mail: (NF), (HM)
- Hanah Margalit
- Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University, Jerusalem, Israel
- *To whom correspondence should be addressed. E-mail: (NF), (HM)
21
Vyhlidal CA, Rogan PK, Leeder JS. Development and refinement of pregnane X receptor (PXR) DNA binding site model using information theory: insights into PXR-mediated gene regulation. J Biol Chem 2004; 279:46779-86. [PMID: 15316010 DOI: 10.1074/jbc.m408395200] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 01/13/2023]
Abstract
The pregnane X receptor (PXR) acts as a receptor to induce gene expression in response to structurally diverse xenobiotics through binding as a heterodimer with the 9-cis retinoic acid receptor (RXR) to enhancers in target gene promoters. We identified and estimated the affinities of novel PXR/RXR binding sites in regulated genes and additional genomic targets of PXR with an information theory-based model of the PXR/RXR binding site. Our initial PXR/RXR model, the result of the alignment of 15 previously characterized binding sites, was used to scan the promoters of known PXR target genes. Sites from these genes, with information contents of >8 bits bound by PXR/RXR in vitro, were used to revise the information weight matrix; this procedure was repeated by screening for progressively weaker binding sites. After three iterations of refinement, the model was based on 48 validated PXR/RXR binding sites and has an average information content (Rsequence) of 14.43 ± 3.21 bits. A scan of the human genome predicted novel PXR/RXR binding sites in the promoters of UGT1A3 (19.78 bits at -8040 and 16.37 bits at -6930) and UGT1A6 (12.74 bits at -9216), both of which were identified previously as targets for PXR. These sites were subsequently demonstrated to specifically bind PXR/RXR in competition electrophoretic mobility shift assays. A strong PXR site was also predicted upstream of the CASP10 gene (18.69 bits at -7872) and was validated by binding studies and reporter assays as a PXR responsive element. This suggests that the PXR-mediated response extends beyond genes involved in drug biotransformation and transport.
Affiliation(s)
- Carrie A Vyhlidal
- Section of Developmental Pharmacology and Experimental Therapeutics, Division of Pediatric Clinical Pharmacology and Medical Toxicology and Laboratory of Human Molecular Genetics, Children's Mercy Hospital and Clinics, Kansas City, Missouri 64108, USA
22
Wasserman WW, Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 2004; 5:276-87. [PMID: 15131651 DOI: 10.1038/nrg1315] [Citation(s) in RCA: 803] [Impact Index Per Article: 38.2] [Indexed: 11/08/2022]
Affiliation(s)
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics and British Columbia Women's and Children's Hospitals, and Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V5Z 4H4, Canada
23
Gadiraju S, Vyhlidal CA, Leeder JS, Rogan PK. Genome-wide prediction, display and refinement of binding sites with information theory-based models. BMC Bioinformatics 2003; 4:38. [PMID: 12962546 PMCID: PMC200970 DOI: 10.1186/1471-2105-4-38] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Received: 05/07/2003] [Accepted: 09/08/2003] [Indexed: 11/10/2022]
Abstract
BACKGROUND: We present Delila-genome, a software system for identification, visualization and analysis of protein binding sites in complete genome sequences. Binding sites are predicted by scanning genomic sequences with information theory-based (or user-defined) weight matrices. Matrices are refined by adding experimentally-defined binding sites to published binding sites. Delila-Genome was used to examine the accuracy of individual information contents of binding sites detected with refined matrices as a measure of the strengths of the corresponding protein-nucleic acid interactions. The software can then be used to predict novel sites by rescanning the genome with the refined matrices.
RESULTS: Parameters for genome scans are entered using a Java-based GUI interface and backend scripts in Perl. Multi-processor CPU load-sharing minimized the average response time for scans of different chromosomes. Scans of human genome assemblies required 4-6 hours for transcription factor binding sites and 10-19 hours for splice sites, respectively, on 24- and 3-node Mosix and Beowulf clusters. Individual binding sites are displayed either as high-resolution sequence walkers or in low-resolution custom tracks in the UCSC genome browser. For large datasets, we applied a data reduction strategy that limited displays of binding sites exceeding a threshold information content to specific chromosomal regions within or adjacent to genes. An HTML document is produced listing binding sites ranked by binding site strength or chromosomal location hyperlinked to the UCSC custom track, other annotation databases and binding site sequences. Post-genome scan tools parse binding site annotations of selected chromosome intervals and compare the results of genome scans using different weight matrices. Comparisons of multiple genome scans can display binding sites that are unique to each scan and identify sites with significantly altered binding strengths.
CONCLUSIONS: Delila-Genome was used to scan the human genome sequence with information weight matrices of transcription factor binding sites, including PXR/RXRalpha, AHR and NF-kappaB p50/p65, and matrices for RNA binding sites including splice donor, acceptor, and SC35 recognition sites. Comparisons of genome scans with the original and refined PXR/RXRalpha information weight matrices indicate that the refined model more accurately predicts the strengths of known binding sites and is more sensitive for detection of novel binding sites.
Affiliation(s)
- Sashidhar Gadiraju
- Laboratory of Human Molecular Genetics, Children's Mercy Hospital and Clinics, School of Medicine
- School of Interdisciplinary Computer Science and Engineering, University of Missouri-Kansas City, Kansas City MO 64108 USA
- Carrie A Vyhlidal
- Section of Developmental and Experimental Pharmacology and Therapeutics, Children's Mercy Hospital and Clinics, School of Medicine
- J Steven Leeder
- Section of Developmental and Experimental Pharmacology and Therapeutics, Children's Mercy Hospital and Clinics, School of Medicine
- Peter K Rogan
- Laboratory of Human Molecular Genetics, Children's Mercy Hospital and Clinics, School of Medicine
- School of Interdisciplinary Computer Science and Engineering, University of Missouri-Kansas City, Kansas City MO 64108 USA
24
Mirny LA, Gelfand MS. Structural analysis of conserved base pairs in protein-DNA complexes. Nucleic Acids Res 2002; 30:1704-11. [PMID: 11917033 PMCID: PMC101836 DOI: 10.1093/nar/30.7.1704] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
Understanding of protein-DNA interactions is crucial for prediction of DNA-binding specificity of transcription factors and design of novel DNA-binding proteins. In this paper we develop a novel approach to analysis of protein-DNA interactions. We bring together two sources of information: (i) structures of protein-DNA complexes (PDB/NDB database) and (ii) experimentally obtained sites recognized by DNA-binding proteins. Sites are used to compute conservation (information content) of each base pair, which indicates relative importance of the base pair in specific recognition. The main result of this study is that conservation of base pairs in a site exhibits significant correlation with the number of contacts the base pairs have with the protein. In particular, base pairs that have more contacts with the protein are more conserved in evolution. As natural as it is, this result has never been reported before. We also observe that for most of the studied proteins, hydrogen bonds and hydrophobic interactions alone cannot explain the pattern of evolutionary conservation in the binding site suggesting cumulative contribution of different types of interactions to specific recognition. Implications for prediction of the DNA-binding specificity are discussed.
Affiliation(s)
- Leonid A Mirny
- Harvard-MIT Division of Health Sciences and Technology, Room 16-343D, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.
25
Ponomarenko JV, Orlova GV, Frolov AS, Gelfand MS, Ponomarenko MP. SELEX_DB: a database on in vitro selected oligomers adapted for recognizing natural sites and for analyzing both SNPs and site-directed mutagenesis data. Nucleic Acids Res 2002; 30:195-9. [PMID: 11752291 PMCID: PMC99084 DOI: 10.1093/nar/30.1.195] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
SELEX_DB is an online resource containing both the experimental data on in vitro selected DNA/RNA oligomers (aptamers) and the applets for recognition of these oligomers. Since in vitro experimental data are evidently system-dependent, the new release of the SELEX_DB has been supplemented by the database SYSTEM storing the experimental design. In addition, the recognition applet package, SELEX_TOOLS, applying in vitro selected data to annotation of the genome DNA, is accompanied by the cross-validation test database CROSS_TEST discriminating the sites (natural or other) related to in vitro selected sites out of random DNA. By cross-validation testing, we have unexpectedly observed that the recognition accuracy increases with the growth of homology between the training and test sets of protein binding sequences. For natural sites, the recognition accuracy was lower than that for the nearest protein homologs and higher than that for distant homologs and non-homologous proteins binding the common site. The current SELEX_DB release is available at http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/.
Affiliation(s)
- Julia V Ponomarenko
- Institute of Cytology and Genetics, 10 Lavrentyev Avenue, Novosibirsk 630090, Russia and Integrated Genomics, Moscow Branch, PO Box 348, Moscow 117333, Russia.
26
Schneider TD. Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation. Nucleic Acids Res 2001; 29:4881-91. [PMID: 11726698 PMCID: PMC96701 DOI: 10.1093/nar/29.23.4881] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.4] [Indexed: 11/13/2022]
Abstract
The sequence logo for DNA binding sites of the bacteriophage P1 replication protein RepA shows unusually high sequence conservation (approximately 2 bits) at a minor groove that faces RepA. However, B-form DNA can support only 1 bit of sequence conservation via contacts into the minor groove. The high conservation in RepA sites therefore implies a distorted DNA helix with direct or indirect contacts to the protein. Here I show that a high minor groove conservation signature also appears in sequence logos of sites for other replication origin binding proteins (Rts1, DnaA, P4 α, EBNA1, ORC) and promoter binding proteins (σ70, σD factors). This finding implies that DNA binding proteins generally use non-B-form DNA distortion such as base flipping to initiate replication and transcription.
Affiliation(s)
- T D Schneider
- National Cancer Institute at Frederick, Laboratory of Experimental and Computational Biology, Building 469, PO Box B, Frederick, MD 21702-1201, USA.
27
Shultzaberger RK, Bucheimer RE, Rudd KE, Schneider TD. Anatomy of Escherichia coli ribosome binding sites. J Mol Biol 2001; 313:215-28. [PMID: 11601857 DOI: 10.1006/jmbi.2001.5040] [Citation(s) in RCA: 104] [Impact Index Per Article: 4.3] [Indexed: 11/22/2022]
Abstract
During translational initiation in prokaryotes, the 3' end of the 16S rRNA binds to a region just upstream of the initiation codon. The relationship between this Shine-Dalgarno (SD) region and the binding of ribosomes to translation start-points has been well studied, but a unified mathematical connection between the SD, the initiation codon and the spacing between them has been lacking. Using information theory, we constructed a model that treats these three components uniformly by assigning to the SD and the initiation region (IR) conservations in bits of information, and by assigning to the spacing an uncertainty, also in bits. To build the model, we first aligned the SD region by maximizing the information content there. The ease of this process confirmed the existence of the SD pattern within a set of 4122 reviewed and revised Escherichia coli gene starts. This large data set allowed us to show graphically, by sequence logos, that the spacing between the SD and the initiation region affects both the SD site conservation and its pattern. We used the aligned SD, the spacing, and the initiation region to model ribosome binding and to identify gene starts that do not conform to the ribosome binding site model. A total of 569 experimentally proven starts are more conserved (have higher information content) than the full set of revised starts, which probably reflects an experimental bias against the detection of gene products that have inefficient ribosome binding sites. Models were refined cyclically by removing non-conforming weak sites. After this procedure, models derived from either the original or the revised gene start annotation were similar. Therefore, this information theory-based technique provides a method for easily constructing biologically sensible ribosome binding site models. Such models should be useful for refining gene-start predictions of any sequenced bacterial genome.
MESH Headings
- Base Sequence
- Binding Sites
- Codon, Initiator/genetics
- Databases as Topic
- Escherichia coli/genetics
- Escherichia coli Proteins/chemistry
- Escherichia coli Proteins/genetics
- Escherichia coli Proteins/metabolism
- Genes, Bacterial/genetics
- Information Theory
- Models, Biological
- Nucleic Acid Conformation
- Peptide Chain Initiation, Translational/genetics
- Pliability
- Protein Binding
- RNA Stability
- RNA, Bacterial/chemistry
- RNA, Bacterial/genetics
- RNA, Bacterial/metabolism
- RNA, Messenger/chemistry
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- RNA-Binding Proteins/chemistry
- RNA-Binding Proteins/genetics
- RNA-Binding Proteins/metabolism
- Regulatory Sequences, Nucleic Acid/genetics
- Ribosomes/chemistry
- Ribosomes/genetics
- Ribosomes/metabolism
28
Ouhammouch M, Geiduschek EP. A thermostable platform for transcriptional regulation: the DNA-binding properties of two Lrp homologs from the hyperthermophilic archaeon Methanococcus jannaschii. EMBO J 2001; 20:146-56. [PMID: 11226165 PMCID: PMC140199 DOI: 10.1093/emboj/20.1.146] [Citation(s) in RCA: 45] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
The hyperthermophilic archaeon Methanococcus jannaschii encodes two putative transcription regulators, Ptr1 and Ptr2, related to the bacterial Lrp/AsnC family of transcriptional regulators. We show that these two small helix-turn-helix proteins are specific DNA-binding proteins recognizing sites in their respective promoter regions. In vitro selection at high temperature has been used to isolate sets of high-affinity DNA sites that define a palindromic consensus binding sequence for each protein. Ptr1 and Ptr2 bind these cognate sites from one side of the DNA helix, as dimers, with each protein monomer making base-specific contacts in the major groove. As the first archaeal DNA-binding proteins with clearly defined specificities, Ptr1 and Ptr2 provide a thermostable DNA-binding platform for analysis of effector interactions with the core archaeal transcription apparatus; a platform allowing manipulation of promoter structure and examination of mechanisms of action at heterologous promoters.
Affiliation(s)
- M Ouhammouch
- Division of Biology and Center for Molecular Genetics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0634, USA.
29
Abstract
How do genetic systems gain information by evolutionary processes? Answering this question precisely requires a robust, quantitative measure of information. Fortunately, 50 years ago Claude Shannon defined information as a decrease in the uncertainty of a receiver. For molecular systems, uncertainty is closely related to entropy and hence has clear connections to the Second Law of Thermodynamics. These aspects of information theory have allowed the development of a straightforward and practical method of measuring information in genetic control systems. Here this method is used to observe information gain in the binding sites for an artificial 'protein' in a computer simulation of evolution. The simulation begins with zero information and, as in naturally occurring genetic systems, the information measured in the fully evolved binding sites is close to that needed to locate the sites in the genome. The transition is rapid, demonstrating that information gain can occur by punctuated equilibrium.
Affiliation(s)
- T D Schneider
- National Cancer Institute, Frederick Cancer Research and Development Center, Laboratory of Experimental and Computational Biology, PO Box B, Frederick, MD 21702-1201, USA.
30
Roulet E, Bucher P, Schneider R, Wingender E, Dusserre Y, Werner T, Mermod N. Experimental analysis and computer prediction of CTF/NFI transcription factor DNA binding sites. J Mol Biol 2000; 297:833-48. [PMID: 10736221 DOI: 10.1006/jmbi.2000.3614] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.4] [Indexed: 11/22/2022]
Abstract
Accurate prediction of transcription factor binding sites is needed to unravel the function and regulation of genes discovered in genome sequencing projects. To evaluate current computer prediction tools, we have begun a systematic study of the sequence-specific DNA-binding of a transcription factor belonging to the CTF/NFI family. Using a systematic collection of rationally designed oligonucleotides combined with an in vitro DNA binding assay, we found that the sequence specificity of this protein cannot be represented by a simple consensus sequence or weight matrix. For instance, CTF/NFI uses a flexible DNA binding mode that allows for variations of the binding site length. From the experimental data, we derived a novel prediction method using a generalised profile as a binding site predictor. Experimental evaluation of the generalised profile indicated that it accurately predicts the binding affinity of the transcription factor to natural or synthetic DNA sequences. Furthermore, the in vitro measured binding affinities of a subset of oligonucleotides were found to correlate with their transcriptional activities in transfected cells. The combined computational-experimental approach exemplified in this work thus resulted in an accurate prediction method for CTF/NFI binding sites potentially functioning as regulatory regions in vivo.
Affiliation(s)
- E Roulet
- Laboratory of Molecular Biotechnology, Centre for Biotechnology UNIL-EPFL and Institute of Animal Biology University of Lausanne, Lausanne, CH-1015, Switzerland
31
Vijesurier RM, Carlock L, Blumenthal RM, Dunbar JC. Role and mechanism of action of C.PvuII, a regulatory protein conserved among restriction-modification systems. J Bacteriol 2000; 182:477-87. [PMID: 10629196 PMCID: PMC94299 DOI: 10.1128/jb.182.2.477-487.2000] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.2] [Received: 04/29/1999] [Accepted: 10/27/1999] [Indexed: 11/20/2022]
Abstract
The PvuII restriction-modification system is a type II system, which means that its restriction endonuclease and modification methyltransferase are independently active proteins. The PvuII system is carried on a plasmid, and its movement into a new host cell is expected to be followed initially by expression of the methyltransferase gene alone so that the new host's DNA is protected before endonuclease activity appears. Previous studies have identified a regulatory gene (pvuIIC) between the divergently oriented genes for the restriction endonuclease (pvuIIR) and modification methyltransferase (pvuIIM), with pvuIIC in the same orientation as and partially overlapping pvuIIR. The product of pvuIIC, C.PvuII, was found to act in trans and to be required for expression of pvuIIR. In this study we demonstrate that premature expression of pvuIIC prevents establishment of the PvuII genes, consistent with the model that requiring C.PvuII for pvuIIR expression provides a timing delay essential for protection of the new host's DNA. We find that the opposing pvuIIC and pvuIIM transcripts overlap by over 60 nucleotides at their 5' ends, raising the possibility that their hybridization might play a regulatory role. We furthermore characterize the action of C.PvuII, demonstrating that it is a sequence-specific DNA-binding protein that binds to the pvuIIC promoter and stimulates transcription of both pvuIIC and pvuIIR into a polycistronic mRNA. The apparent location of C.PvuII binding, overlapping the -10 promoter hexamer and the pvuIICR transcriptional starting points, is highly unusual for transcriptional activators.
Affiliation(s)
- R M Vijesurier
- Center for Molecular Medicine, Wayne State University School of Medicine, Detroit, Michigan 48201, USA
32
33
Abstract
Availability of complete bacterial genomes opens the way to the comparative approach to the recognition of transcription regulatory sites. Assumption of regulon conservation in conjunction with profile analysis provides two lines of independent evidence making it possible to make highly specific predictions. Recently this approach was used to analyze several regulons in eubacteria and archaebacteria. The present review covers recent advances in the comparative analysis of transcriptional regulation in prokaryotes and phylogenetic fingerprinting techniques in eukaryotes, and describes the emerging patterns of the evolution of regulatory systems.
Affiliation(s)
- M S Gelfand
- State Scientific Center for Biotechnology 'NIIGenetika', Moscow, Russia.
34
Abstract
Microbial genome sequencing is driven by the need to understand and control pathogens and to exploit extremophiles and their enzymes in bioremediation and industry. It is hard for the traditional bacteriologist to grasp the scale and pace of the venture. Around two dozen microbial genomes have now been completed and, within a decade, genomes from every significant species of bacterial pathogen of humans, animals and plants will have been sequenced. Indeed, we will often have more than one sequence from a species or genus; for example, we already have sequences from two strains of Helicobacter pylori, from two strains of Mycobacterium tuberculosis and from three species of Pyrococcus. However, genome sequencing risks becoming expensive molecular stamp-collecting without the tools to mine the data and fuel hypothesis-driven laboratory-based research. Bioinformatics, twinned with the new experimental approaches forming 'functional genomics', provides some of the needed tools. Nonetheless, there will be an increasing need for us to explore the detailed implications of genomic findings. Microbial genome sequencing thus represents not a threat, but an exciting opportunity for molecular microbiologists.
Affiliation(s)
- M J Pallen
- Department of Medical Microbiology, St Bartholomew's and the Royal London School of Medicine and Dentistry, West Smithfield, London, UK.