51
Sugihara F, Kasahara K, Kokubo T. Highly redundant function of multiple AT-rich sequences as core promoter elements in the TATA-less RPS5 promoter of Saccharomyces cerevisiae. Nucleic Acids Res 2010; 39:59-75. [PMID: 20805245 PMCID: PMC3017598 DOI: 10.1093/nar/gkq741] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 12/26/2022]
Abstract
In eukaryotes, protein-coding genes are transcribed by RNA polymerase II (pol II) together with general transcription factors (GTFs). TFIID, the largest GTF composed of TATA element-binding protein (TBP) and 14 TBP-associated factors (TAFs), plays a critical role in transcription from TATA-less promoters. In metazoans, several core promoter elements other than the TATA element are thought to be recognition sites for TFIID. However, it is unclear whether functionally homologous elements also exist in TATA-less promoters in Saccharomyces cerevisiae. Here, we identify the cis-elements required to support normal levels of transcription and accurate initiation from sites within the TATA-less and TFIID-dependent RPS5 core promoter. Systematic mutational analyses show that multiple AT-rich sequences are required for these activities and appear to function as recognition sites for TFIID. A single copy of these sequences can support accurate initiation from the endogenous promoter, indicating that they carry highly redundant functions. These results show a novel architecture of yeast TATA-less promoters and support a model in which pol II scans DNA downstream from a recruited site, while searching for appropriate initiation site(s).
Affiliation(s)
- Fuminori Sugihara
- Division of Molecular and Cellular Biology, Graduate School of Nanobioscience, Yokohama City University, Yokohama, Kanagawa, Japan
52
Warnatz HJ, Querfurth R, Guerasimova A, Cheng X, Haas SA, Hufton AL, Manke T, Vanhecke D, Nietfeld W, Vingron M, Janitz M, Lehrach H, Yaspo ML. Functional analysis and identification of cis-regulatory elements of human chromosome 21 gene promoters. Nucleic Acids Res 2010; 38:6112-23. [PMID: 20494980 PMCID: PMC2952857 DOI: 10.1093/nar/gkq402] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]
Abstract
Given the inherent limitations of in silico studies relying solely on DNA sequence analysis, the functional characterization of mammalian promoters and associated cis-regulatory elements requires experimental support, which demands cloning and analysis of putative promoter regions. Focusing on human chromosome 21, we cloned 182 gene promoters of 2500 bp in length and conducted reporter gene assays on transfected-cell arrays. We found 56 promoters that were active in HEK293 cells, while another 49 promoters could be activated by treatment of cells with Trichostatin A or depletion of serum. We observed high correlations between promoter activities and endogenous transcript levels, RNA polymerase II occupancy, CpG islands and core promoter elements. Truncation of a subset of 62 promoters to ∼500 bp revealed that truncation rarely resulted in loss of activity, but rather in loss of responses to external stimuli, suggesting the presence of cis-regulatory response elements within distal promoter regions. In these regions, we found a strong enrichment of transcription factor binding sites that could potentially activate gene expression in the presence of stimuli. This study illustrates the modular functional architecture of chromosome 21 promoters and helps to reveal the complex mechanisms governing transcriptional regulation.
Affiliation(s)
- Hans-Jörg Warnatz
- Department for Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany.
53
Walker DJ, Suetterlin P, Reisenberg M, Williams G, Doherty P. Down-regulation of diacylglycerol lipase-alpha during neural stem cell differentiation: identification of elements that regulate transcription. J Neurosci Res 2010; 88:735-45. [PMID: 19798744 DOI: 10.1002/jnr.22251] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 01/30/2023]
Abstract
The diacylglycerol lipases (DAGLalpha and DAGLbeta) synthesize 2-arachidonoylglycerol (2-AG), a full agonist at cannabinoid receptors. Dynamic regulation of DAGL expression underpins its role in axonal growth and guidance during development, retrograde synaptic signalling at mature synapses, and maintenance of adult neurogenesis. We show here that DAGLalpha expression is dramatically down-regulated when neural stem (NS) cells are differentiated toward a gamma-aminobutyric acidergic neuronal phenotype. To understand how DAGLalpha expression might be controlled, we sought to identify the core promoter region and regulatory elements within it. The core promoter was identified and shown to contain both an enhancer and a suppressor region. Deletion analysis identified two elements, including a GC-box, that specifically promote expression in NS cells. Bioinformatic analysis identified three candidate transcription factors that might regulate DAGLalpha expression in NS cells by binding to the GC box; these were specificity protein 1 (Sp1), early growth response element 1 (EGR1), and zinc finger DNA-binding protein 89 (ZBP-89). However, Sp1 was the only factor that could bind to the GC-box. A specific mutation within the GC-box that inhibited Sp1 binding reduced DAGLalpha promoter activity in NS cells. Likewise, a dominant negative Sp1 was shown to bind to the GC-box and to suppress DAGLalpha promoter activity specifically in NS cells. Finally, like DAGLalpha, Sp1 was down-regulated during neuronal differentiation. A full characterization of the DAGLalpha promoter will help to elucidate the upstream pathways that regulate DAGLalpha expression in NS cells and their progeny.
Affiliation(s)
- Deborah J Walker
- Wolfson Centre for Age-Related Diseases, King's College London, London, United Kingdom
54
Albert TK, Grote K, Boeing S, Meisterernst M. Basal core promoters control the equilibrium between negative cofactor 2 and preinitiation complexes in human cells. Genome Biol 2010; 11:R33. [PMID: 20230619 PMCID: PMC2864573 DOI: 10.1186/gb-2010-11-3-r33] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 10/17/2009] [Revised: 02/22/2010] [Accepted: 03/15/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The general transcription factor TFIIB and its antagonist negative cofactor 2 (NC2) are hallmarks of RNA polymerase II (RNAPII) transcription. Both factors bind TATA box-binding protein (TBP) at promoters in a mutually exclusive manner. Dissociation of NC2 is thought to be followed by TFIIB association and subsequent preinitiation complex formation. TFIIB dissociates upon RNAPII promoter clearance, thereby providing a specific measure for steady-state preinitiation complex levels. As yet, genome-scale promoter mapping of human TFIIB has not been reported. It thus remains elusive how human core promoters contribute to preinitiation complex formation in vivo. RESULTS We compare target genes of TFIIB and NC2 in human B cells and analyze associated core promoter architectures. TFIIB occupancy is positively correlated with gene expression, with the vast majority of promoters being GC-rich and lacking defined core promoter elements. TATA elements, but not the previously in vitro defined TFIIB recognition elements, are enriched in some 4 to 5% of the genes. NC2 binds to a highly related target gene set. Nonetheless, subpopulations show strong variations in factor ratios: whereas high TFIIB/NC2 ratios select for promoters with focused start sites and conserved core elements, high NC2/TFIIB ratios correlate to multiple start-site promoters lacking defined core elements. CONCLUSIONS TFIIB and NC2 are global players that occupy active genes. Preinitiation complex formation is independent of core elements at the majority of genes. TATA and TATA-like elements dictate TFIIB occupancy at a subset of genes. Biochemical data support a model in which preinitiation complex but not TBP-NC2 complex formation is regulated.
Affiliation(s)
- Thomas K Albert
- Institute of Molecular Tumor Biology (IMTB), University of Muenster, Robert-Koch-Str. 43, 48149 Muenster, Germany
- Korbinian Grote
- Genomatix Software GmbH, Bayerstr. 85a, 80335 Munich, Germany
- Stefan Boeing
- Institute of Molecular Tumor Biology (IMTB), University of Muenster, Robert-Koch-Str. 43, 48149 Muenster, Germany
- Michael Meisterernst
- Institute of Molecular Tumor Biology (IMTB), University of Muenster, Robert-Koch-Str. 43, 48149 Muenster, Germany
55
Riethoven JJM. Regulatory regions in DNA: promoters, enhancers, silencers, and insulators. Methods Mol Biol 2010; 674:33-42. [PMID: 20827584 DOI: 10.1007/978-1-60761-854-6_3] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Indexed: 12/12/2022]
Abstract
One of the mechanisms through which protein levels in the cell are controlled is through transcriptional regulation. Certain regions, called cis-regulatory elements, on the DNA are footprints for the trans-acting proteins involved in transcription, either for the positioning of the basic transcriptional machinery or for the regulation - in simple terms turn on or turn off - thereof. The basic transcriptional machinery is DNA-dependent RNA polymerase (RNAP) which synthesizes various types of RNA and core promoters on the DNA are used to position the RNAP. Other nearby regions will regulate the transcription: in prokaryotic organisms operators are involved; in eukaryotic organisms, proximal promoter regions, enhancers, silencers, and insulators are present. This chapter will describe the various DNA regions involved in transcription and transcriptional regulation.
Affiliation(s)
- Jean-Jack M Riethoven
- Bioinformatics Core Research Facility, Center for Biotechnology and School for Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE, USA.
56
Characterization of The Promoter Region and Upstream Regulation Region of Human and Mouse SCN3A Gene*. PROG BIOCHEM BIOPHYS 2009. [DOI: 10.3724/sp.j.1206.2008.00450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
57
Gazit K, Moshonov S, Elfakess R, Sharon M, Mengus G, Davidson I, Dikstein R. TAF4/4b x TAF12 displays a unique mode of DNA binding and is required for core promoter function of a subset of genes. J Biol Chem 2009; 284:26286-96. [PMID: 19635797 DOI: 10.1074/jbc.m109.011486] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 01/27/2023]
Abstract
The major core promoter-binding factor in polymerase II transcription machinery is TFIID, a complex consisting of TBP, the TATA box-binding protein, and 13 to 14 TBP-associated factors (TAFs). Previously we found that the histone H2A-like TAF paralogs TAF4 and TAF4b possess DNA-binding activity. Whether TAF4/TAF4b DNA binding directs TFIID to a specific core promoter element or facilitates TFIID binding to established core promoter elements is not known. Here we analyzed the mode of TAF4b.TAF12 DNA binding and show that this complex binds DNA with high affinity. The DNA length required for optimal binding is approximately 70 bp. Although the complex displays a weak sequence preference, the nucleotide composition is less important than the length of the DNA for high affinity binding. Comparative expression profiling of wild-type and a DNA-binding mutant of TAF4 revealed common core promoter features in the down-regulated genes that include a TATA-box and an Initiator. Further examination of the PEL98 gene from this group showed diminished Initiator activity and TFIID occupancy in TAF4 DNA-binding mutant cells. These findings suggest that DNA binding by TAF4/4b-TAF12 facilitates the association of TFIID with the core promoter of a subset of genes.
Affiliation(s)
- Kfir Gazit
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
58
Zeng J, Zhu S, Yan H. Towards accurate human promoter recognition: a review of currently used sequence features and classification methods. Brief Bioinform 2009; 10:498-508. [PMID: 19531545 DOI: 10.1093/bib/bbp027] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
This review describes important advances that have been made during the past decade for genome-wide human promoter recognition. Interest in promoter recognition algorithms on a genome-wide scale is worldwide and touches on a number of practical systems that are important in analysis of gene regulation and in genome annotation without experimental support of ESTs, cDNAs or mRNAs. The main focus of this review is on feature extraction and model selection for accurate human promoter recognition, with descriptions of what they are, what has been accomplished, and what remains to be done.
Affiliation(s)
- Jia Zeng
- Department of Computer Science, Hong Kong Baptist University, Kowloon, Hong Kong.
59
Pitulescu ME, Teichmann M, Luo L, Kessel M. TIPT2 and geminin interact with basal transcription factors to synergize in transcriptional regulation. BMC BIOCHEMISTRY 2009; 10:16. [PMID: 19515240 PMCID: PMC2702275 DOI: 10.1186/1471-2091-10-16] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 01/06/2009] [Accepted: 06/10/2009] [Indexed: 12/20/2022]
Abstract
BACKGROUND The re-replication inhibitor Geminin binds to several transcription factors including homeodomain proteins, and to members of the polycomb and the SWI/SNF complexes. RESULTS Here we describe the TATA-binding protein-like factor-interacting protein (TIPT) isoform 2 as a strong binding partner of Geminin. TIPT2 is widely expressed in mouse embryonic and adult tissues, residing in both the cytoplasm and the nucleoplasm, and enriched in the nucleolus. Like Geminin, TIPT2 also interacts with several polycomb factors, with the general transcription factor TBP (TATA box binding protein), and with the related protein TBPL1 (TRF2). TIPT2 synergizes with Geminin and TBP in the activation of TATA box-containing promoters, and with TBPL1 and Geminin in the activation of the TATA-less NF1 promoter. Geminin and TIPT2 were detected in the chromatin near TBP/TBPL1 binding sites. CONCLUSION Together, our study introduces a novel transcriptional regulator and its function in cooperation with chromatin-associated factors and the basal transcription machinery.
Affiliation(s)
- Mara E Pitulescu
- Department of Molecular Cell Biology, Max Planck Institute for Biophysical Chemistry, 37077 Göttingen, Germany.
60
Savinkova LK, Ponomarenko MP, Ponomarenko PM, Drachkova IA, Lysova MV, Arshinova TV, Kolchanov NA. TATA box polymorphisms in human gene promoters and associated hereditary pathologies. BIOCHEMISTRY (MOSCOW) 2009; 74:117-29. [PMID: 19267666 DOI: 10.1134/s0006297909020011] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Indexed: 11/23/2022]
Abstract
TATA-binding protein (TBP) is the first basal factor that recognizes and binds a TATA box on TATA-containing gene promoters transcribed by RNA polymerase II. Data available in the literature are indicative of admissible variability of the TATA box. The TATA box flanking sequences can influence TBP affinity as well as the level of basal and activated transcription. The possibility of mediated involvement in in vivo gene expression regulation of the TBP interactions with variant TATA boxes is supported by data on TATA box polymorphisms and associated human hereditary pathologies. A table containing data on TATA element polymorphisms in human gene promoters (about 40 mutations have been described), associated with particular pathologies, their short functional characteristics, and manifestation mechanisms of TATA-box SNPs is presented. Four classes of polymorphisms are considered: TATA box polymorphisms that weaken and enhance promoter, polymorphisms causing TATA box emergence and disappearance, and human virus TATA box polymorphisms. The described examples are indicative of the polymorphism-associated severe pathologies like thalassemia, the increased risk of hepatocellular carcinoma, sensitivity to H. pylori infection, oral cavity and lung cancers, arterial hypertension, etc.
Affiliation(s)
- L K Savinkova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.
61
Yarden G, Elfakess R, Gazit K, Dikstein R. Characterization of sINR, a strict version of the Initiator core promoter element. Nucleic Acids Res 2009; 37:4234-46. [PMID: 19443449 PMCID: PMC2715227 DOI: 10.1093/nar/gkp315] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 12/27/2022]
Abstract
The proximal promoter consists of binding sites for transcription regulators and a core promoter. We identified an overrepresented motif in the proximal promoter of human genes with an Initiator (INR) positional bias. The core of the motif fits the INR consensus, but its sequence is stricter and is flanked by additional conserved sequences. This strict INR (sINR) is enriched in TATA-less genes that belong to specific functional categories. Analysis of the sINR-containing DHX9 and ATP5F1 genes showed that the entire sINR sequence, including the strict core and the conserved flanking sequences, is important for transcription. A conventional INR sequence could not substitute for the DHX9 sINR, whereas sINR could replace a conventional INR. The minimal region required to create the major TSS of the DHX9 promoter includes the sINR and an upstream Sp1 site. In a heterologous context, sINR substituted for the TATA box when positioned downstream of several Sp1 sites. Consistent with this, the majority of sINR promoters contain at least one Sp1 site. Thus, sINR is a TATA-less-specific INR that functions in cooperation with Sp1. These findings support the idea that the INR is a family of related core promoter motifs.
Affiliation(s)
- Ganit Yarden
- Department of Biological Chemistry, The Weizmann Institute of Science, Rehovot 76100, Israel
62
Ozsolak F, Poling LL, Wang Z, Liu H, Liu XS, Roeder RG, Zhang X, Song JS, Fisher DE. Chromatin structure analyses identify miRNA promoters. Genes Dev 2009; 22:3172-83. [PMID: 19056895 DOI: 10.1101/gad.1706508] [Citation(s) in RCA: 488] [Impact Index Per Article: 30.5] [Indexed: 12/24/2022]
Abstract
Although microRNAs (miRNAs) are key regulators of gene expression in normal human physiology and disease, transcriptional regulation of miRNAs is poorly understood, because most miRNA promoters have not yet been characterized. We identified the proximal promoters of 175 human miRNAs by combining nucleosome mapping with chromatin signatures for promoters. We observe that one-third of intronic miRNAs have transcription initiation regions independent from their host promoters and present a list of RNA polymerase II- and III-occupied miRNAs. Nucleosome mapping and linker sequence analyses in miRNA promoters permitted accurate prediction of transcription factors regulating miRNA expression, thus identifying nine miRNAs regulated by the MITF transcription factor/oncoprotein in melanoma cells. Furthermore, DNA sequences encoding mature miRNAs were found to be preferentially occupied by positioned-nucleosomes, and the 3' end sites of known genes exhibited nucleosome depletion. The high-throughput identification of miRNA promoter and enhancer regulatory elements sheds light on evolution of miRNA transcription and permits rapid identification of transcriptional networks of miRNAs.
Affiliation(s)
- Fatih Ozsolak
- Department of Dermatology and Cutaneous Biology Research Center, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
63
Distinct modes of gene regulation by a cell-specific transcriptional activator. Proc Natl Acad Sci U S A 2009; 106:4213-8. [PMID: 19251649 DOI: 10.1073/pnas.0808347106] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022]
Abstract
The architectural layout of a eukaryotic RNA polymerase II core promoter plays a role in general transcriptional activation. However, its role in tissue-specific expression is not known. For example, differing modes of its recognition by general transcription machinery can provide an additional layer of control within which a single tissue-restricted transcription factor may operate. Erythroid Kruppel-like factor (EKLF) is a hematopoietic-specific transcription factor that is critical for the activation of a subset of erythroid genes. We find that EKLF interacts with TATA binding protein-associated factor 9 (TAF9), which leads to important consequences for expression of adult beta-globin. First, TAF9 functionally supports EKLF activity by enhancing its ability to activate the beta-globin gene. Second, TAF9 interacts with a conserved beta-globin downstream promoter element, and ablation of this interaction by beta-thalassemia-causing mutations decreases its promoter activity and disables superactivation. Third, depletion of EKLF prevents recruitment of TAF9 to the beta-globin promoter, whereas depletion of TAF9 drastically impairs beta-promoter activity. However, a TAF9-independent mode of EKLF transcriptional activation is exhibited by the alpha-hemoglobin-stabilizing protein (AHSP) gene, which does not contain a discernible downstream promoter element. In this case, TAF9 does not enhance EKLF activity and depletion of TAF9 has no effect on AHSP promoter activation. These studies demonstrate that EKLF directs different modes of tissue-specific transcriptional activation depending on the architecture of its target core promoter.
64
TFIIB recognition elements control the TFIIA-NC2 axis in transcriptional regulation. Mol Cell Biol 2008; 29:1389-400. [PMID: 19114554 DOI: 10.1128/mcb.01346-08] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 01/22/2023]
Abstract
TFIIB recognizes DNA sequence-specific motifs that can flank the TATA elements of the promoters of protein-encoding genes. The TFIIB recognition elements (BRE(u) and BRE(d)) can have positive or negative effects on transcription in a promoter context-dependent manner. Here we show that the BREs direct the selective recruitment of TFIIA and NC2 to the promoter. We find that TFIIA preferentially associates with BRE-containing promoters while NC2 is recruited to promoters that lack consensus BREs. The functional relevance of the BRE-dependent recruitment of TFIIA and NC2 was determined by small interfering RNA-mediated knockdown of TFIIA and NC2, both of which elicited BRE-dependent effects on transcription. Our results confirm the established functional reciprocity of TFIIA and NC2. However, our findings show that TFIIA assembly at BRE-containing promoters results in reduced transcriptional activity, while NC2 acts as a positive factor at promoters that lack functional BREs. Taken together, our results provide a basis for the selective recruitment of TFIIA and NC2 to the promoter and give new insights into the functional relationship between core promoter elements and general transcription factor activity.
65
Brick K, Watanabe J, Pizzi E. Core promoters are predicted by their distinct physicochemical properties in the genome of Plasmodium falciparum. Genome Biol 2008; 9:R178. [PMID: 19094208 PMCID: PMC2646282 DOI: 10.1186/gb-2008-9-12-r178] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 08/26/2008] [Revised: 11/03/2008] [Accepted: 12/18/2008] [Indexed: 11/23/2022]
Abstract
A method is presented to computationally identify core promoters in the Plasmodium falciparum genome using only DNA physicochemical properties. Little is known about the structure and distinguishing features of core promoters in Plasmodium falciparum. In this work, we describe the first method to computationally identify core promoters in this AT-rich genome. This prediction algorithm uses solely DNA physicochemical properties as descriptors. Our results add to a growing body of evidence that a physicochemical code for eukaryotic genomes plays a crucial role in core promoter recognition.
Affiliation(s)
- Kevin Brick
- Dipartimento di Malattie Infettive, Parassitarie ed Immunomediate - Istituto Superiore di Sanità, Viale Regina Elena, 299, 00161 Rome, Italy.
66
Anwar F, Baker SM, Jabid T, Mehedi Hasan M, Shoyaib M, Khan H, Walshe R. Pol II promoter prediction using characteristic 4-mer motifs: a machine learning approach. BMC Bioinformatics 2008; 9:414. [PMID: 18834544 PMCID: PMC2575220 DOI: 10.1186/1471-2105-9-414] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 01/07/2008] [Accepted: 10/04/2008] [Indexed: 01/03/2023]
Abstract
BACKGROUND Eukaryotic promoter prediction using computational analysis techniques is one of the most difficult tasks in computational genomics and is essential for constructing and understanding genetic regulatory networks. The increased availability of sequence data for various eukaryotic organisms in recent years has necessitated better tools and techniques for the prediction and analysis of promoters in eukaryotic sequences. Many promoter prediction methods and tools have been developed to date, but they have yet to provide acceptable predictive performance. One obvious way to improve on current methods is to devise a better system for selecting appropriate features of promoters that distinguish them from non-promoters. Secondly, improved performance can be achieved by enhancing the predictive ability of the machine learning algorithms used. RESULTS In this paper, a novel approach is presented in which 128 4-mer motifs in conjunction with a non-linear machine-learning algorithm utilising a Support Vector Machine (SVM) are used to distinguish between promoter and non-promoter DNA sequences. By applying this approach to plant, Drosophila, human, mouse and rat sequences, the classification model has shown 7-fold cross-validation percentage accuracies of 83.81%, 94.82%, 91.25%, 90.77% and 82.35% respectively. The high sensitivity and specificity values of 0.86 and 0.90 for plant; 0.96 and 0.92 for Drosophila; 0.88 and 0.92 for human; 0.78 and 0.84 for mouse and 0.82 and 0.80 for rat demonstrate that this technique is less prone to false positive results and exhibits better performance than many other tools. Moreover, this model successfully identifies the location of the promoter using a TATA weight matrix. CONCLUSION The high sensitivity and specificity indicate that 4-mer frequencies in conjunction with supervised machine-learning methods can be beneficial in the identification of RNA pol II promoters compared with other methods. This approach can be extended to identify promoters in sequences for other eukaryotic genomes.
Affiliation(s)
- Firoz Anwar
- Department of Computer Science and Engineering, East West University, Bangladesh.
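The 4-mer feature idea described in the abstract above can be illustrated with a minimal sketch. It computes the relative frequency of every possible 4-mer in a DNA sequence, which is the kind of vector a classifier such as an SVM could take as input. The function name `kmer_frequencies` and the choice to use all 256 4-mers are assumptions made for illustration; the abstract does not say how the 128 motifs were selected, and no classifier is included here.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
K = 4
# All 4**4 = 256 possible 4-mers in a fixed order (illustrative choice;
# the paper's 128 selected motifs are not reproduced here).
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]


def kmer_frequencies(seq):
    """Return relative frequencies of all 4-mers in seq, ordered as KMERS.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped rather than counted.
    """
    seq = seq.upper()
    counts = Counter()
    total = 0
    for i in range(len(seq) - K + 1):
        window = seq[i:i + K]
        if all(base in ALPHABET for base in window):
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence too short or fully ambiguous: return an all-zero vector.
        return [0.0] * len(KMERS)
    return [counts[kmer] / total for kmer in KMERS]


if __name__ == "__main__":
    features = kmer_frequencies("TATAAAAG")
    print(len(features))                      # length of the feature vector
    print(features[KMERS.index("TATA")])      # frequency of the TATA 4-mer
```

Vectors of this form, computed for labelled promoter and non-promoter sequences, would then be passed to whatever supervised learner is chosen; that step is left out because its settings are not given in the abstract.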
67
Juven-Gershon T, Hsu JY, Theisen JW, Kadonaga JT. The RNA polymerase II core promoter - the gateway to transcription. Curr Opin Cell Biol 2008; 20:253-9. [PMID: 18436437 DOI: 10.1016/j.ceb.2008.03.003] [Citation(s) in RCA: 269] [Impact Index Per Article: 15.8] [Received: 01/14/2008] [Accepted: 03/11/2008] [Indexed: 10/22/2022]
Abstract
The RNA polymerase II core promoter is generally defined to be the sequence that directs the initiation of transcription. This simple definition belies a diverse and complex transcriptional module. There are two major types of core promoters - focused and dispersed. Focused promoters contain either a single transcription start site or a distinct cluster of start sites over several nucleotides, whereas dispersed promoters contain several start sites over 50-100 nucleotides and are typically found in CpG islands in vertebrates. Focused promoters are more ancient and widespread throughout nature than dispersed promoters; however, in vertebrates, dispersed promoters are more common than focused promoters. In addition, core promoters may contain many different sequence motifs, such as the TATA box, BRE, Inr, MTE, DPE, DCE, and XCPE1, that specify different mechanisms of transcription and responses to enhancers. Thus, the core promoter is a sophisticated gateway to transcription that determines which signals will lead to transcription initiation.
Affiliation(s)
- Tamar Juven-Gershon
- Section of Molecular Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0347, USA
68
Maston GA, Evans SK, Green MR. Transcriptional regulatory elements in the human genome. Annu Rev Genomics Hum Genet 2008; 7:29-59. [PMID: 16719718 DOI: 10.1146/annurev.genom.7.080505.115623] [Citation(s) in RCA: 567] [Impact Index Per Article: 33.4] [Indexed: 11/09/2022]
Abstract
The faithful execution of biological processes requires a precise and carefully orchestrated set of steps that depend on the proper spatial and temporal expression of genes. Here we review the various classes of transcriptional regulatory elements (core promoters, proximal promoters, distal enhancers, silencers, insulators/boundary elements, and locus control regions) and the molecular machinery (general transcription factors, activators, and coactivators) that interacts with the regulatory elements to mediate precisely controlled patterns of gene expression. The biological importance of transcriptional regulation is highlighted by examples of how alterations in these transcriptional components can lead to disease. Finally, we discuss the methods currently used to identify transcriptional regulatory elements, and the ability of these methods to be scaled up for the purpose of annotating the entire human genome.
Affiliation(s)
- Glenn A Maston
- Howard Hughes Medical Institute, Programs in Gene Function and Expression and Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
|
69
|
Abstract
We showed previously that anharmonic DNA dynamical features correlate with transcriptional activity in selected viral promoters, and hypothesized that areas of DNA softness may represent loci of functional significance. The nine known promoters from human adenovirus type 5 were analyzed for inherent DNA softness using the Peyrard-Bishop-Dauxois model and a statistical mechanics approach, using a transfer integral operator. We found a loosely defined pattern of softness peaks distributed both upstream and downstream of the transcriptional start sites, and that early transcriptional regions tended to be softer than late promoter regions. When reported transcription factor binding sites were superimposed on our calculated softness profiles, we observed a close correspondence in many cases, which suggests that DNA duplex breathing dynamics may play a role in protein recognition of specific nucleotide sequences and protein-DNA binding. These results suggest that genetic information is stored not only in explicit codon sequences, but also may be encoded into local dynamic and structural features, and that it may be possible to access this obscured information using DNA dynamics calculations.
|
70
|
Moshonov S, Elfakess R, Golan-Mashiach M, Sinvani H, Dikstein R. Links between core promoter and basic gene features influence gene expression. BMC Genomics 2008; 9:92. [PMID: 18298820 PMCID: PMC2279122 DOI: 10.1186/1471-2164-9-92] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 10/09/2007] [Accepted: 02/25/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Diversity in rates of gene expression is essential for basic cell functions and is controlled by a variety of intricate mechanisms. Revealing general mechanisms that control gene expression is important for understanding normal and pathological cell functions and for improving the design of expression systems. Here we analyzed the relationship between general features of genes and their contribution to expression levels. RESULTS Genes were divided into four groups according to their core promoter type and their characteristics analyzed statistically. Surprisingly we found that small variations in the TATA box are linked to large differences in gene length. Genes containing canonical TATA are generally short whereas long genes are associated with either non-canonical TATA or TATA-less promoters. These differences in gene length are primarily determined by the size and number of introns. Generally, gene expression was found to be tightly correlated with the strength of the TATA-box. However significant reduction in gene expression levels were linked with long TATA-containing genes (canonical and non-canonical) whereas intron length hardly affected the expression of TATA-less genes. Interestingly, features associated with high translation are prevalent in TATA-containing genes suggesting that their protein production is also more efficient. CONCLUSION Our results suggest that interplay between core promoter type and gene size can generate significant diversity in gene expression.
Affiliation(s)
- Sandra Moshonov
- Department of Biological Chemistry, The Weizmann Institute of Science, Rehovot 76100, Israel.
|
71
|
Frith MC, Valen E, Krogh A, Hayashizaki Y, Carninci P, Sandelin A. A code for transcription initiation in mammalian genomes. Genome Res 2008; 18:1-12. [PMID: 18032727 PMCID: PMC2134772 DOI: 10.1101/gr.6831208] [Citation(s) in RCA: 189] [Impact Index Per Article: 11.1] [Received: 01/21/2007] [Accepted: 10/14/2007] [Indexed: 11/24/2022]
Abstract
Genome-wide detection of transcription start sites (TSSs) has revealed that RNA Polymerase II transcription initiates at millions of positions in mammalian genomes. Most core promoters do not have a single TSS, but an array of closely located TSSs with different rates of initiation. As a rule, genes have more than one such core promoter; however, defining the boundaries between core promoters is not trivial. These discoveries prompt a re-evaluation of our models for transcription initiation. We describe a new framework for understanding the organization of transcription initiation. We show that initiation events are clustered on the chromosomes at multiple scales (clusters within clusters), indicating multiple regulatory processes. Within the smallest of such clusters, which can be interpreted as core promoters, the local DNA sequence predicts the relative transcription start usage of each nucleotide with a remarkable 91% accuracy, implying the existence of a DNA code that determines TSS selection. Conversely, the total expression strength of such clusters is only partially determined by the local DNA sequence. Thus, the overall control of transcription can be understood as a combination of large- and small-scale effects; the selection of transcription start sites is largely governed by the local DNA sequence, whereas the transcriptional activity of a locus is regulated at a different level; it is affected by distal features or events such as enhancers and chromatin remodeling.
Affiliation(s)
- Martin C. Frith
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- ARC Centre in Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Brisbane, Qld 4072, Australia
- Eivind Valen
- The Bioinformatics Centre, Department of Molecular Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 København N, Denmark
- Anders Krogh
- The Bioinformatics Centre, Department of Molecular Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 København N, Denmark
- Yoshihide Hayashizaki
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Piero Carninci
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Albin Sandelin
- The Bioinformatics Centre, Department of Molecular Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 København N, Denmark
|
72
|
DNA sequence and structural properties as predictors of human and mouse promoters. Gene 2007; 410:165-76. [PMID: 18234453 PMCID: PMC2672154 DOI: 10.1016/j.gene.2007.12.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 05/23/2007] [Revised: 11/30/2007] [Accepted: 12/05/2007] [Indexed: 11/21/2022]
Abstract
Promoters play a central role in gene regulation, yet our power to discriminate them from non-promoter sequences in higher eukaryotes is mainly restricted to those associated with CpG islands. Here, we examined in silico the promoters of 30,954 human and 18,083 mouse transcripts in the DBTSS database, to assess the impact of particular sequence and structural features (propeller twist, bendability and nucleosome positioning preference) on promoter classification and prediction. Our analysis showed that a stricter-than-traditional definition of CpG islands captures low and high CpG count promoter classes more accurately than the traditional one. We observed that both human and mouse promoter sequences are flexible with the exception of the TATA box and TSS, which are rigid regions irrespective of association with a CpG island. Therefore varying levels of structural flexibility in promoters may affect their accessibility to proteins, and hence their specificity. For all features investigated, averaged values across core promoters discriminated CpG island associated promoters from background, whereas the same did not hold for promoters without a CpG island. However, local changes around -34 to -23 (expected position of TATA box) and the TSS were informative in discriminating promoters (both classes) from non-promoter sequences. Additionally, we investigated ATG deserts and observed that they occur in all promoter sets except those with a TATA-box and without a CpG island in human. Interestingly, all mouse promoter sets showed ATG codon depletion irrespective of the presence of a TATA-box, possibly reflecting a weaker contribution to TSS specificity in mouse.
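The abstract above contrasts a traditional and a stricter CpG island definition without stating either. As a rough sketch, such definitions are usually expressed as thresholds on length, GC fraction and the observed/expected CpG ratio. The default and "strict" threshold values below are commonly quoted ones and are assumptions here, not values taken from the abstract; the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Threshold test; the defaults are assumed 'traditional' values."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


region = "CG" * 100  # toy 200-nt sequence, maximally CpG-rich
print(looks_like_cpg_island(region))
# Assumed stricter variant: longer minimum length, higher GC and ratio.
print(looks_like_cpg_island(region, min_len=500, min_gc=0.55, min_obs_exp=0.65))
```

In practice the test is applied in a sliding window along the genome rather than to one fixed region.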
|
73
|
Diss JKJ, Calissano M, Gascoyne D, Djamgoz MBA, Latchman DS. Identification and characterization of the promoter region of the Nav1.7 voltage-gated sodium channel gene (SCN9A). Mol Cell Neurosci 2007; 37:537-47. [PMID: 18249135 DOI: 10.1016/j.mcn.2007.12.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Received: 05/30/2007] [Revised: 11/15/2007] [Accepted: 12/06/2007] [Indexed: 12/19/2022]
Abstract
The Nav1.7 sodium channel plays an important role in pain and is also upregulated in prostate cancer. To investigate the mechanisms regulating physiological and pathophysiological Nav1.7 expression we identified the core promoter of this gene (SCN9A) in the human genome. In silico genomic analysis revealed a putative SCN9A 5' non-coding exon approximately 64,000 nucleotides from the translation start site, expression of which commenced at three very closely-positioned transcription initiation sites (TISs), as determined by 5' RACE experiments. The genomic region around these TISs possesses numerous core elements of a TATA-less promoter within a well-defined CpG island. Importantly, it acted as a promoter when inserted upstream of luciferase in a fusion construct. Moreover, the activity of the promoter-luciferase construct ostensibly paralleled endogenous Nav1.7 mRNA levels in vitro, with both increased in a quantitatively and qualitatively similar manner by numerous factors (including NGF, phorbol esters, retinoic acid, and Brn-3a transcription factor over-expression).
Affiliation(s)
- James K J Diss
- Medical Molecular Biology Unit, Institute of Child Health, University College London, Guilford Street, London WC1N 1EH, UK.
|
74
|
Isogai Y, Keles S, Prestel M, Hochheimer A, Tjian R. Transcription of histone gene cluster by differential core-promoter factors. Genes Dev 2007; 21:2936-49. [PMID: 17978101 PMCID: PMC2049195 DOI: 10.1101/gad.1608807] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.6] [Received: 08/24/2007] [Accepted: 09/21/2007] [Indexed: 12/16/2022]
Abstract
The 100 copies of tandemly arrayed Drosophila linker (H1) and core (H2A/B and H3/H4) histone gene cluster are coordinately regulated during the cell cycle. However, the molecular mechanisms that must allow differential transcription of linker versus core histones prevalent during development remain elusive. Here, we used fluorescence imaging, biochemistry, and genetics to show that TBP (TATA-box-binding protein)-related factor 2 (TRF2) selectively regulates the TATA-less Histone H1 gene promoter, while TBP/TFIID targets core histone transcription. Importantly, TRF2-depleted polytene chromosomes display severe chromosomal structural defects. This selective usage of TRF2 and TBP provides a novel mechanism to differentially direct transcription within the histone cluster. Moreover, genome-wide chromatin immunoprecipitation (ChIP)-on-chip analyses coupled with RNA interference (RNAi)-mediated functional studies revealed that TRF2 targets several classes of TATA-less promoters of >1000 genes including those driving transcription of essential chromatin organization and protein synthesis genes. Our studies establish that TRF2 promoter recognition complexes play a significantly more central role in governing metazoan transcription than previously appreciated.
Affiliation(s)
- Yoh Isogai
- Department of Molecular and Cell Biology, University of California at Berkeley, Berkeley, California 94720, USA
- Sündüz Keles
- Department of Statistics, Department of Biostatistics, and Department of Medical Informatics, University of Wisconsin at Madison, Madison, Wisconsin 53706, USA
- Matthias Prestel
- Adolf-Butenandt-Institut, Molekularbiologie, 80336 Munich, Germany
- Robert Tjian
- Department of Molecular and Cell Biology, University of California at Berkeley, Berkeley, California 94720, USA
- Howard Hughes Medical Institute, University of California at Berkeley, Berkeley, California 94720, USA
- Li Ka-Shing Center for Biomedical and Health Sciences, University of California at Berkeley, Berkeley, California 94720, USA
|
75
|
Zhao X, Xuan Z, Zhang MQ. Boosting with stumps for predicting transcription start sites. Genome Biol 2007; 8:R17. [PMID: 17274821 PMCID: PMC1852414 DOI: 10.1186/gb-2007-8-2-r17] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Received: 10/03/2006] [Revised: 12/01/2006] [Accepted: 02/02/2007] [Indexed: 12/05/2022]
Abstract
CoreBoost applies a boosting technique to select important features for predicting core promoters with diverse patterns. Promoter prediction is a difficult but important problem in gene finding, and it is critical for elucidating the regulation of gene expression. We introduce a new promoter prediction program, CoreBoost, which applies a boosting technique with stumps to select important small-scale as well as large-scale features. CoreBoost improves greatly on locating transcription start sites. We also demonstrate that by further utilizing some tissue-specific information, better accuracy can be achieved.
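For readers unfamiliar with "boosting with stumps", the sketch below shows plain discrete AdaBoost over one-feature threshold classifiers. It is a generic textbook illustration, not CoreBoost itself: the abstract does not say which boosting variant or features the program uses, and the two toy feature columns here are hypothetical.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """A decision stump: one threshold on one feature, output +1 or -1."""
    return polarity if row[feature] >= threshold else -polarity


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost(X, y, rounds=5):
    """Return a list of (alpha, feature, threshold, polarity) weak learners."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, feature, threshold, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, feature, threshold, polarity))
        # Up-weight misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in model)
    return 1 if score >= 0 else -1


# Toy data: two hypothetical feature scores per candidate site; +1 = promoter.
X = [[0.9, 0.2], [0.8, 0.7], [0.2, 0.6], [0.1, 0.1]]
y = [1, 1, -1, -1]
model = adaboost(X, y)
print([predict(model, row) for row in X])
```

Each round selects one feature, which is how boosting with stumps doubles as a feature-selection procedure.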
Affiliation(s)
- Xiaoyue Zhao
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Zhenyu Xuan
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Michael Q Zhang
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA
|
76
|
Deng W, Roberts SGE. TFIIB and the regulation of transcription by RNA polymerase II. Chromosoma 2007; 116:417-29. [PMID: 17593382 DOI: 10.1007/s00412-007-0113-9] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Received: 02/09/2007] [Revised: 05/21/2007] [Accepted: 05/21/2007] [Indexed: 02/01/2023]
Abstract
Accurate transcription of a gene by RNA polymerase II requires the assembly of a group of general transcription factors at the promoter. The general transcription factor TFIIB plays a central role in preinitiation complex assembly, providing a bridge between promoter-bound TFIID and RNA polymerase II. TFIIB makes extensive contact with the core promoter via two independent DNA-recognition modules. In addition to interacting with other general transcription factors, TFIIB directly modulates the catalytic center of RNA polymerase II in the transcription complex. Moreover, TFIIB has been proposed as a target of transcriptional activator proteins that act to stimulate preinitiation complex assembly. In this review, we will discuss our current understanding of these activities of TFIIB.
Affiliation(s)
- Wensheng Deng
- Faculty of Life Sciences, University of Manchester, The Michael Smith Building, Oxford Road, Manchester, M13 9PT, UK
|
77
|
Malecová B, Gross P, Boyer-Guittaut M, Yavuz S, Oelgeschläger T. The initiator core promoter element antagonizes repression of TATA-directed transcription by negative cofactor NC2. J Biol Chem 2007; 282:24767-76. [PMID: 17584739 DOI: 10.1074/jbc.m702776200] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
Abstract
Core promoter regions of protein-coding genes in metazoan genomes are structurally highly diverse and can contain several distinct core promoter elements, which direct accurate transcription initiation and determine basal promoter strength. Diversity in core promoter structure is an important aspect of transcription regulation in metazoans as it provides a basis for gene-selective function of activators and repressors. The basal activity of TATA box-containing promoters is dramatically enhanced by the initiator element (INR), which can function in concert with the TATA box in a synergistic manner. Here we report that a functional INR provides resistance to NC2 (Dr1/DRAP1), a general repressor of TATA promoters. INR-mediated resistance to NC2 is established during transcription initiation complex assembly and requires TBP-associated factors (TAFs) and TAF- and INR-dependent cofactor activity. Remarkably, the INR appears to stimulate TATA-dependent transcription similar to activators by strongly enhancing recruitment of TFIIA and TFIIB and, at the same time, by compromising NC2 binding.
Affiliation(s)
- Barbora Malecová
- Transcription Laboratory, Marie Curie Research Institute, The Chart, Oxted, Surrey RH8 0TL, United Kingdom
|
78
|
Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, Hume DA. Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet 2007; 8:424-36. [PMID: 17486122 DOI: 10.1038/nrg2026] [Citation(s) in RCA: 367] [Impact Index Per Article: 20.4] [Indexed: 12/12/2022]
Abstract
The identification and characterization of mammalian core promoters and transcription start sites is a prerequisite to understanding how RNA polymerase II transcription is controlled. New experimental technologies have enabled genome-wide discovery and characterization of core promoters, revealing that most mammalian genes do not conform to the simple model in which a TATA box directs transcription from a single defined nucleotide position. In fact, most genes have multiple promoters, within which there are multiple start sites, and alternative promoter usage generates diversity and complexity in the mammalian transcriptome and proteome. Promoters can be described by their start site usage distribution, which is coupled to the occurrence of cis-regulatory elements, gene function and evolutionary constraints. A comprehensive survey of mammalian promoters is a major step towards describing and understanding transcriptional control networks.
Affiliation(s)
- Albin Sandelin
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
|
79
|
Vardhanabhuti S, Wang J, Hannenhalli S. Position and distance specificity are important determinants of cis-regulatory motifs in addition to evolutionary conservation. Nucleic Acids Res 2007; 35:3203-13. [PMID: 17452354 PMCID: PMC1904283 DOI: 10.1093/nar/gkm201] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Indexed: 01/12/2023]
Abstract
Computational discovery of cis-regulatory elements remains challenging. To cope with the high false positives, evolutionary conservation is routinely used. However, conservation is only one of the attributes of cis-regulatory elements and is neither necessary nor sufficient. Here, we assess two additional attributes—positional and inter-motif distance specificity—that are critical for interactions between transcription factors. We first show that for a greater than expected fraction of known motifs, the genes that contain the motifs in their promoters in a position-specific or distance-specific manner are related, both in function and/or in expression pattern. We then use the position and distance specificity to discover novel motifs. Our work highlights the importance of distance and position specificity, in addition to the evolutionary conservation, in discovering cis-regulatory motifs.
Affiliation(s)
- Sridhar Hannenhalli
- *To whom correspondence should be addressed. Tel: +215 746 8683; Fax: +215 573 3111;
|
80
|
Juven-Gershon T, Hsu JY, Kadonaga JT. Perspectives on the RNA polymerase II core promoter. Biochem Soc Trans 2007; 34:1047-50. [PMID: 17073747 DOI: 10.1042/bst0341047] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.2] [Indexed: 11/17/2022]
Abstract
The RNA polymerase II core promoter is a critical yet often overlooked component in the transcription process. The core promoter is defined as the stretch of DNA, which encompasses the RNA start site and is typically approx. 40-50 nt in length, that directs the initiation of gene transcription. In the past, it has been generally presumed that core promoters are general in function and that transcription initiation occurs via a common shared mechanism. Recent studies have revealed, however, that there is considerable diversity in core promoter structure and function. There are a number of DNA elements that contribute to core promoter activity, and the specific properties of a given core promoter are dictated by the presence or absence of these core promoter motifs. The known core promoter elements include the TATA box, Inr (initiator), BRE(u) {BRE [TFIIB (transcription factor for RNA polymerase IIB) recognition element] upstream of the TATA box} and BRE(d) (BRE downstream of the TATA box), MTE (motif ten element), DCE (downstream core element) and DPE (downstream core promoter element). In this paper, we will provide some perspectives on current and future issues that pertain to the RNA polymerase II core promoter.
Affiliation(s)
- T Juven-Gershon
- Section of Molecular Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0347, USA
|
81
|
Abstract
The general transcription factor TFIIB (transcription factor IIB) plays a critical role in the assembly of the RNA polymerase II pre-initiation complex. TFIIB can make sequence-specific DNA contacts both upstream and downstream of the TATA box. This has led to the definition of two core promoter BREs (TFIIB-recognition elements), one upstream [BRE(u) (upstream BRE)] and one downstream of TATA box [BRE(d) (downstream BRE)]. TFIIB-BRE(u) and TFIIB-BRE(d) contacts are mediated by two independent DNA-recognition motifs within the core domain of TFIIB. Both the BRE(u) and the BRE(d) modulate the transcriptional potency of a promoter. However, the net effect of the BREs on promoter activity is dependent on the specific blend of elements present within a core promoter.
Affiliation(s)
- W Deng
- Faculty of Life Sciences, The Michael Smith Building, The University of Manchester, Oxford Road, Manchester M13 9PT, UK
|
82
|
Juven-Gershon T, Cheng S, Kadonaga JT. Rational design of a super core promoter that enhances gene expression. Nat Methods 2007; 3:917-22. [PMID: 17124735 DOI: 10.1038/nmeth937] [Citation(s) in RCA: 157] [Impact Index Per Article: 8.7] [Indexed: 11/09/2022]
Abstract
Transcription is a critical component in the expression of genes. Here we describe the design and analysis of a potent core promoter, termed super core promoter 1 (SCP1), which directs high amounts of transcription by RNA polymerase II in metazoans. SCP1 contains four core promoter motifs, namely the TATA box, initiator (Inr), motif ten element (MTE) and downstream promoter element (DPE), in a single promoter, and is distinctly stronger than the cytomegalovirus (CMV) IE1 and adenovirus major late (AdML) core promoters both in vitro and in vivo. Each of the four core promoter motifs is needed for full SCP1 activity. SCP1 is bound efficiently by TFIID and exhibits a high propensity to form productive transcription complexes. SCP1 and related super core promoters (SCPs) with multiple core promoter motifs will be useful for the biophysical analysis of TFIID binding to DNA, the biochemical investigation of the transcription process and the enhancement of gene expression in cells.
Affiliation(s)
- Tamar Juven-Gershon
- Section of Molecular Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
|
83
|
Cis-motifs upstream of the transcription and translation initiation sites are effectively revealed by their positional disequilibrium in eukaryote genomes using frequency distribution curves. BMC Bioinformatics 2006; 7:522. [PMID: 17137509 PMCID: PMC1698937 DOI: 10.1186/1471-2105-7-522] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Received: 05/23/2006] [Accepted: 11/30/2006] [Indexed: 11/24/2022]
Abstract
Background The discovery of cis-regulatory motifs still remains a challenging task even though the number of sequenced genomes is constantly growing. Computational analyses using pattern search algorithms have been valuable in phylogenetic footprinting approaches as have expression profile experiments to predict co-occurring motifs. Surprisingly little is known about the nature of cis-regulatory element (CRE) distribution in promoters. Results In this paper we used the Motif Mapper open-source collection of visual basic scripts for the analysis of motifs in any aligned set of DNA sequences. We focused on promoter motif distribution curves to identify positional over-representation of DNA motifs. Using differentially aligned datasets from the model species Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster and Saccharomyces cerevisiae, we convincingly demonstrated the importance of the position and orientation for motif discovery. Analysis with known CREs and all possible hexanucleotides showed that some functional elements gather close to the transcription and translation initiation sites and that elements other than the TATA-box motif are conserved between eukaryote promoters. While a high background frequency usually decreases the effectiveness of such an enumerative investigation, we improved our analysis by conducting motif distribution maps using large datasets. Conclusion This is the first study to reveal positional over-representation of CREs and promoter motifs in a cross-species approach. CREs and motifs shared between eukaryotic promoters support the observation that an eukaryotic promoter structure has been conserved throughout evolutionary time. Furthermore, with the information on positional enrichment of a motif or a known functional CRE, it is possible to get a more detailed insight into where an element appears to function. 
This in turn might accelerate the in depth examination of known and yet unknown cis-regulatory sequences in the laboratory.
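The idea of a motif distribution curve described in this abstract can be sketched as a position histogram over sequences aligned at a common reference point (for example the transcription start site). This is an illustrative Python sketch only; it is not the Motif Mapper tool, which the abstract describes as a collection of Visual Basic scripts, and the function name and toy sequences are assumptions.

```python
from collections import Counter


def motif_position_counts(sequences, motif):
    """Count, per alignment column, how often `motif` starts there.

    Sequences are assumed to be pre-aligned so that equal indices
    correspond to equal distances from the reference point.
    """
    counts = Counter()
    motif = motif.upper()
    for seq in sequences:
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            counts[start] += 1
            start = seq.find(motif, start + 1)  # step by 1 to allow overlaps
    return counts


# Toy aligned promoter fragments (hypothetical data).
promoters = ["GGTATAAAGG", "CCTATAAACC", "TATAAAGGGG"]
for position, n in sorted(motif_position_counts(promoters, "TATAAA").items()):
    print(position, n)
```

A peak in these counts at one column, relative to a flat background, is the kind of positional over-representation the abstract refers to.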
|
84
|
Yang C, Bolotin E, Jiang T, Sladek FM, Martinez E. Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters. Gene 2006; 389:52-65. [PMID: 17123746 PMCID: PMC1955227 DOI: 10.1016/j.gene.2006.09.029] [Citation(s) in RCA: 256] [Impact Index Per Article: 13.5] [Received: 06/24/2006] [Revised: 09/12/2006] [Accepted: 09/22/2006] [Indexed: 10/24/2022]
Abstract
The core promoter of eukaryotic genes is the minimal DNA region that recruits the basal transcription machinery to direct efficient and accurate transcription initiation. The fraction of human and yeast genes that contain specific core promoter elements such as the TATA box and the initiator (INR) remains unclear and core promoter motifs specific for TATA-less genes remain to be identified. Here, we present genome-scale computational analyses indicating that approximately 76% of human core promoters lack TATA-like elements, have a high GC content, and are enriched in Sp1-binding sites. We further identify two motifs - M3 (SCGGAAGY) and M22 (TGCGCANK) - that occur preferentially in human TATA-less core promoters. About 24% of human genes have a TATA-like element and their promoters are generally AT-rich; however, only approximately 10% of these TATA-containing promoters have the canonical TATA box (TATAWAWR). In contrast, approximately 46% of human core promoters contain the consensus INR (YYANWYY) and approximately 30% are INR-containing TATA-less genes. Significantly, approximately 46% of human promoters lack both TATA-like and consensus INR elements. Surprisingly, mammalian-type INR sequences are present - and tend to cluster - in the transcription start site (TSS) region of approximately 40% of yeast core promoters and the frequency of specific core promoter types appears to be conserved in yeast and human genomes. Gene Ontology analyses reveal that TATA-less genes in humans, as in yeast, are frequently involved in basic "housekeeping" processes, while TATA-containing genes are more often highly regulated, such as by biotic or stress stimuli. These results reveal unexpected similarities in the occurrence of specific core promoter types and in their associated biological processes in yeast and humans and point to novel vertebrate-specific DNA motifs that might play a selective role in TATA-independent transcription.
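The consensus motifs quoted in this abstract (TATAWAWR, YYANWYY, SCGGAAGY, TGCGCANK) are written in IUPAC nucleotide codes. A minimal sketch of how such a consensus could be matched against a sequence is shown below; the helper names and the toy sequences are assumptions, only the forward strand is scanned, and the paper's actual scanning method is not described in the abstract.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC consensus; the lookahead permits overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(seq, motif):
    """Return 0-based start positions of every forward-strand match."""
    return [m.start() for m in iupac_to_regex(motif).finditer(seq.upper())]


print(find_motif("GGTATAAAAGCC", "TATAWAWR"))  # canonical TATA box consensus
print(find_motif("CCTCAGTCTT", "YYANWYY"))     # consensus INR
print(find_motif("GCGGAAGT", "SCGGAAGY"))      # motif M3
```

Scanning the reverse complement as well would be needed for orientation-independent elements.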
Affiliation(s)
- Chuhu Yang
- Genetics Genomics and Bioinformatics Graduate Program, University of California, Riverside, CA 92521, USA
|
85
|
Abstract
In eukaryotes, the core promoter serves as a platform for the assembly of transcription preinitiation complex (PIC) that includes TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH, and RNA polymerase II (pol II), which function collectively to specify the transcription start site. PIC formation usually begins with TFIID binding to the TATA box, initiator, and/or downstream promoter element (DPE) found in most core promoters, followed by the entry of other general transcription factors (GTFs) and pol II through either a sequential assembly or a preassembled pol II holoenzyme pathway. Formation of this promoter-bound complex is sufficient for a basal level of transcription. However, for activator-dependent (or regulated) transcription, general cofactors are often required to transmit regulatory signals between gene-specific activators and the general transcription machinery. Three classes of general cofactors, including TBP-associated factors (TAFs), Mediator, and upstream stimulatory activity (USA)-derived positive cofactors (PC1/PARP-1, PC2, PC3/DNA topoisomerase I, and PC4) and negative cofactor 1 (NC1/HMGB1), normally function independently or in combination to fine-tune the promoter activity in a gene-specific or cell-type-specific manner. In addition, other cofactors, such as TAF1, BTAF1, and negative cofactor 2 (NC2), can also modulate TBP or TFIID binding to the core promoter. In general, these cofactors are capable of repressing basal transcription when activators are absent and stimulating transcription in the presence of activators. Here we review the roles of these cofactors and GTFs, as well as TBP-related factors (TRFs), TAF-containing complexes (TFTC, SAGA, SLIK/SALSA, STAGA, and PRC1) and TAF variants, in pol II-mediated transcription, with emphasis on the events occurring after the chromatin has been remodeled but prior to the formation of the first phosphodiester bond.
Affiliation(s)
- Mary C Thomas
- Department of Biochemistry, Case Western Reserve University School of Medicine, Cleveland, OH 44106-4935, USA
|
86
|
Stewart JJ, Fischbeck JA, Chen X, Stargell LA. Non-optimal TATA Elements Exhibit Diverse Mechanistic Consequences. J Biol Chem 2006; 281:22665-73. [PMID: 16772290 DOI: 10.1074/jbc.m603237200] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/06/2022]
Abstract
To reveal mechanistic differences in transcription initiation between variant TATA elements, in vivo and in vitro assays of the functional activity of 14 different sequences were compared. Variant elements exhibited particular degrees of activation in vivo but universally were unable to support the -fold activation observed for an element consisting of TATAAA. Each element was classified by its functional activity for in vitro interaction with TATA-binding protein (TBP), TFIIA, and TFIIB. Certain off-consensus TATA elements form poor binding sites for TBP and this compromised interaction interferes with higher order complex formation with TFIIA and/or TFIIB. Other elements are only modestly decreased for TBP binding but dramatically affected for higher order complex formation. Another distinct category is comprised of two elements (CATAAA and TATAAG), which are not affected in the initial formation of the TBP, TFIIA-TBP, or TFIIB-TBP complexes. However, CATAAA and TATAAG are unable to form a stable TFIIA-TBP-DNA complex in vitro. Moreover, fusion of TFIIA to TBP specifically restores activity from these two elements in vivo. Taken together, these results indicate that the interplay between the sequence of the TATA element and the components of the general transcription machinery can lead to variations in the formation of functional complexes and/or the stability of these complexes. These differences offer distinct opportunities for an organism to exploit diverse steps in the regulation of gene expression depending on the precise TATA element sequence at a given gene.
Affiliation(s)
- Jennifer J Stewart
- Department of Biochemistry and Molecular Biology, Colorado State University, Fort Collins, Colorado 80523-1870, USA
|
87
|
Luo Q, Yang W, Liu P. Promoter recognition based on the Interpolated Markov Chains optimized via simulated annealing and genetic algorithm. Pattern Recognit Lett 2006. [DOI: 10.1016/j.patrec.2005.11.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
|
88
|
Gershenzon NI, Trifonov EN, Ioshikhes IP. The features of Drosophila core promoters revealed by statistical analysis. BMC Genomics 2006; 7:161. [PMID: 16790048 PMCID: PMC1538597 DOI: 10.1186/1471-2164-7-161] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Received: 04/04/2006] [Accepted: 06/21/2006] [Indexed: 11/19/2022]
Abstract
BACKGROUND: Experimental investigation of transcription is still a very labor- and time-consuming process. Only a few transcription initiation scenarios have been studied in detail. The mechanism of interaction between basal machinery and promoter, in particular core promoter elements, is not known for the majority of identified promoters. In this study, we reveal various transcription initiation mechanisms by statistical analysis of 3393 nonredundant Drosophila promoters. RESULTS: Using Drosophila-specific position-weight matrices, we identified promoters containing TATA box, Initiator, Downstream Promoter Element (DPE), and Motif Ten Element (MTE), as well as core elements discovered in Human (TFIIB Recognition Element (BRE) and Downstream Core Element (DCE)). Promoters utilizing known synergetic combinations of two core elements (TATA_Inr, Inr_MTE, Inr_DPE, and DPE_MTE) were identified. We also establish the existence of promoters with potentially novel synergetic combinations: TATA_DPE and TATA_MTE. Our analysis revealed several motifs with the features of promoter elements, including possible novel core promoter element(s). Comparison of Human and Drosophila showed consistent percentages of promoters with TATA, Inr, DPE, and synergetic combinations thereof, as well as most of the same functional and mutual positions of the core elements. No statistical evidence of MTE utilization in Human was found. Distinct nucleosome positioning in particular promoter classes was revealed. CONCLUSION: We present lists of promoters that potentially utilize the aforementioned elements/combinations. The number of these promoters is two orders of magnitude larger than the number of promoters in which transcription initiation was experimentally studied. The sequences are ready to be experimentally tested or used for further statistical analysis. The developed approach may be utilized for other species.
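One way to picture the search for element combinations described in this abstract is to compare how often two elements appear together with how often they would be expected to under independence. The sketch below is only an illustration under that assumption; the function name `cooccurrence_enrichment`, the toy promoter annotations, and the observed/expected ratio are hypothetical and are not taken from the paper, which may use a different statistic.

```python
def cooccurrence_enrichment(promoters, elem_a, elem_b):
    """Compare observed co-occurrence of two elements with the
    count expected if they occurred independently.

    promoters: list of sets, each holding the element names found
               in one promoter (hypothetical annotation format).
    Returns (observed, expected, ratio).
    """
    n = len(promoters)
    n_a = sum(elem_a in p for p in promoters)
    n_b = sum(elem_b in p for p in promoters)
    observed = sum(elem_a in p and elem_b in p for p in promoters)
    expected = n_a * n_b / n if n else 0.0
    ratio = observed / expected if expected else float("nan")
    return observed, expected, ratio


# Toy annotations (invented for illustration, not real Drosophila data).
toy_promoters = [
    {"TATA", "Inr"},
    {"TATA", "Inr"},
    {"Inr", "DPE"},
    {"DPE"},
    {"TATA", "Inr"},
    set(),
]

observed, expected, ratio = cooccurrence_enrichment(toy_promoters, "TATA", "Inr")
print(observed, expected, ratio)
```

A ratio above 1 only suggests that a pair co-occurs more often than chance in the sample; a real analysis would also need a significance test and positional constraints between the elements.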
Affiliation(s)
- Naum I Gershenzon
- Department of Biomedical Informatics, The Ohio State University, 333 West 10th Avenue, Columbus OH 43210, USA
- Department of Physics, Wright State University, Dayton OH 45435, USA
- Edward N Trifonov
- Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel
- Ilya P Ioshikhes
- Department of Biomedical Informatics, The Ohio State University, 333 West 10th Avenue, Columbus OH 43210, USA
|
89
|
Jin VX, Singer GAC, Agosto-Pérez FJ, Liyanarachchi S, Davuluri RV. Genome-wide analysis of core promoter elements from conserved human and mouse orthologous pairs. BMC Bioinformatics 2006; 7:114. [PMID: 16522199 PMCID: PMC1475891 DOI: 10.1186/1471-2105-7-114] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.0] [Received: 08/25/2005] [Accepted: 03/07/2006] [Indexed: 01/20/2023]
Abstract
Background: The canonical core promoter elements consist of the TATA box, initiator (Inr), downstream core promoter element (DPE), TFIIB recognition element (BRE) and the newly-discovered motif 10 element (MTE). The motifs for these core promoter elements are highly degenerate, which tends to lead to a high false discovery rate when attempting to detect them in promoter sequences. Results: In this study, we have performed the first analysis of these core promoter elements in orthologous mouse and human promoters with experimentally-supported transcription start sites. We have identified these various elements using a combination of positional weight matrices (PWMs) and the degree of conservation of orthologous mouse and human sequences – a procedure that significantly reduces the false positive rate of motif discovery. Our analysis of 9,010 orthologous mouse-human promoter pairs revealed two combinations of three-way synergistic effects, TATA-Inr-MTE and BRE-Inr-MTE. The former has previously been putatively identified in human, but the latter represents a novel synergistic relationship. Conclusion: Our results demonstrate that DNA sequence conservation can greatly improve the identification of functional core promoter elements in the human genome. The data also underscore the importance of synergistic occurrence of two or more core promoter elements. Furthermore, the sequence data and results presented here can help build better computational models for predicting the transcription start sites in the promoter regions, which remains one of the most challenging problems.
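The idea of combining PWM matches with orthologous conservation can be sketched as a simple post-filter: a candidate motif hit in the human promoter is kept only if the aligned mouse window is sufficiently similar. The code below is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the gap-free pairwise alignment, the 0.8 identity cutoff, and the toy sequences and scores are all hypothetical.

```python
def window_identity(human, mouse, start, width):
    """Fraction of identical, non-gap positions in an aligned window."""
    pairs = zip(human[start:start + width], mouse[start:start + width])
    matches = sum(h == m and h != "-" for h, m in pairs)
    return matches / width


def conserved_hits(human, mouse, hits, width, min_identity=0.8):
    """Keep only PWM hits whose aligned mouse window is conserved.

    human, mouse: aligned sequences of equal length (assumed input).
    hits: list of (start, score) pairs from any PWM scanner.
    """
    return [
        (start, score)
        for start, score in hits
        if window_identity(human, mouse, start, width) >= min_identity
    ]


# Toy aligned promoter fragments (invented for illustration).
human_seq = "CCTATAAAAGGCTATAAATG"
mouse_seq = "CCTATAAAAGGCGGTCCATG"

# Hypothetical PWM hits in the human sequence: (start, score).
candidate_hits = [(2, 6.1), (12, 5.8)]

kept = conserved_hits(human_seq, mouse_seq, candidate_hits, width=7)
print(kept)
```

In this toy case the first hit lies in a window that is identical in both species and survives, while the second lies in a diverged window and is discarded, which is how a conservation filter can lower the false positive rate of a degenerate motif.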
Affiliation(s)
- Victor X Jin
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
- Gregory AC Singer
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
- Francisco J Agosto-Pérez
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
- Sandya Liyanarachchi
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
- Ramana V Davuluri
- Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology, and Medical Genetics, The Ohio State University, Columbus, OH 43210, USA
|
90
|
Lee DH, Gershenzon N, Gupta M, Ioshikhes IP, Reinberg D, Lewis BA. Functional characterization of core promoter elements: the downstream core element is recognized by TAF1. Mol Cell Biol 2005; 25:9674-86. [PMID: 16227614 PMCID: PMC1265815 DOI: 10.1128/mcb.25.21.9674-9686.2005] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.4] [Indexed: 11/20/2022]
Abstract
Downstream elements are a newly appreciated class of core promoter elements of RNA polymerase II-transcribed genes. The downstream core element (DCE) was discovered in the human beta-globin promoter, and its sequence composition is distinct from that of the downstream promoter element (DPE). We show here that the DCE is a bona fide core promoter element present in a large number of promoters and with high incidence in promoters containing a TATA motif. Database analysis indicates that the DCE is found in diverse promoters, supporting its functional relevance in a variety of promoter contexts. The DCE consists of three subelements, and DCE function is recapitulated in a TFIID-dependent manner. Subelement 3 can function independently of the other two and shows a TFIID requirement as well. UV photo-cross-linking results demonstrate that TAF1/TAF(II)250 interacts with the DCE subelement DNA in a sequence-dependent manner. These data show that downstream elements consist of at least two types, those of the DPE class and those of the DCE class; they function via different DNA sequences and interact with different transcription activation factors. Finally, these data argue that TFIID is, in fact, a core promoter recognition complex.
Affiliation(s)
- Dong-Hoon Lee
- Department of Biochemistry, Robert Wood Johnson Medical School, 683 Hoes Lane, Piscataway, NJ 08854, USA
|
91
|
Buckland PR. The importance and identification of regulatory polymorphisms and their mechanisms of action. Biochim Biophys Acta Mol Basis Dis 2005; 1762:17-28. [PMID: 16297602 DOI: 10.1016/j.bbadis.2005.10.004] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.0] [Received: 09/05/2005] [Revised: 10/11/2005] [Accepted: 10/11/2005] [Indexed: 01/16/2023]
Abstract
The search for the genetic variations underlying all human phenotypes is in its infancy but must be one of the long term goals of the scientific community. There is evidence that most, if not all, human phenotypes, including illnesses, are influenced by the genetic makeup of the individual. There are an estimated 11 million human genetic polymorphisms with a minor allele frequency >1% and possibly many times that number of rare sequence variants. The proportion of these sequence variants which have any functional effect is unknown but it is likely that the majority of those which influence illness lie outside of the amino acid coding regions of genes, and affect the regulation of gene expression--these are called rSNPs. Recent research suggests that about 50% of genes have one or more common rSNPs associated with them and probably most if not all genes have an rSNP within the human population. In the long term, determining which polymorphisms are potentially functional must be done bio-informatically using algorithms based upon experimental data. However, at the current time, the limited data that has been obtained does not allow the creation of such an algorithm. In vitro studies suggest that a large proportion of rSNPs lie within the core and proximal promoter regions of genes but it is not clear how the majority of these influence transcription, as they do not appear to be within any known transcription factor binding sites. However, promoter regions possess a number of sequence-dependent characteristics which make them distinct from the rest of the genome, namely stability, curvature and flexibility. Subtle changes to these features may underlie the mechanisms by which many polymorphisms exert their function.
Affiliation(s)
- Paul R Buckland
- Department of Psychological Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK.
|
92
|
Gershenzon NI, Stormo GD, Ioshikhes IP. Computational technique for improvement of the position-weight matrices for the DNA/protein binding sites. Nucleic Acids Res 2005; 33:2290-301. [PMID: 15849315 PMCID: PMC1084321 DOI: 10.1093/nar/gki519] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.2] [Indexed: 11/23/2022]
Abstract
Position-weight matrices (PWMs) are broadly used to locate transcription factor binding sites in DNA sequences. The majority of existing PWMs provide a low level of both sensitivity and specificity. We present a new computational algorithm, a modification of the Staden–Bucher approach, that improves the PWM. We applied the proposed technique to the PWM of the GC-box, binding site for Sp1. The comparison of old and new PWMs shows that the latter increase both sensitivity and specificity. The statistical parameters of GC-box distribution in promoter regions and in the human genome, as well as in each chromosome, are presented. The majority of commonly used PWMs are the 4-row mononucleotide matrices, although 16-row dinucleotide matrices are known to be more informative. The algorithm efficiently determines the 16-row matrices and preliminary results show that such matrices provide better results than 4-row matrices.
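For readers unfamiliar with how a 4-row mononucleotide PWM is used to locate binding sites, the following is a minimal generic sketch of building a log-odds matrix from aligned sites and scanning a sequence with it. It is not the modified Staden–Bucher algorithm described in the abstract; the function names, the uniform background of 0.25, the pseudocount, the threshold, and the toy GC-box-like sites are all assumptions made for illustration.

```python
import math


def build_log_odds(sites, background=0.25, pseudocount=1.0):
    """Build a 4-row log-odds PWM from aligned, equal-length sites.

    Returns a list with one dict per position, mapping base -> score.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(
                ((column.count(base) + pseudocount) / total) / background
            )
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, score) for each window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy aligned sites (hypothetical, loosely GC-box-like; not real Sp1 data).
toy_sites = ["GGGCGG", "GGGCGG", "GGGTGG", "GGGCGT"]
pwm = build_log_odds(toy_sites)

# Scan an uppercase A/C/G/T sequence with an arbitrary threshold.
hits = scan("ATATGGGCGGATAT", pwm, threshold=5.0)
print(hits)
```

A 16-row dinucleotide matrix would follow the same pattern with columns indexed by adjacent base pairs instead of single bases, which lets the score capture dependencies between neighbouring positions.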
Affiliation(s)
- Naum I Gershenzon
- Department of Biomedical Informatics, The Ohio State University 3184 Graves Hall, 333 W. 10th Avenue, Columbus, OH 43210, USA.
|