1. Kapun M, Mitchell ED, Kawecki TJ, Schmidt P, Flatt T. An Ancestral Balanced Inversion Polymorphism Confers Global Adaptation. Mol Biol Evol 2023; 40:msad118. [PMID: 37220650; PMCID: PMC10234209; DOI: 10.1093/molbev/msad118] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/01/2023] [Revised: 04/17/2023] [Accepted: 05/19/2023] [Indexed: 05/25/2023]
Abstract
Since the pioneering work of Dobzhansky in the 1930s and 1940s, many chromosomal inversions have been identified, but how they contribute to adaptation remains poorly understood. In Drosophila melanogaster, the widespread inversion polymorphism In(3R)Payne underpins latitudinal clines in fitness traits on multiple continents. Here, we use single-individual whole-genome sequencing, transcriptomics, and published sequencing data to study the population genomics of this inversion on four continents: in its ancestral African range and in derived populations in Europe, North America, and Australia. Our results confirm that this inversion originated in sub-Saharan Africa and subsequently became cosmopolitan; we observe marked monophyletic divergence of inverted and noninverted karyotypes, with some substructure among inverted chromosomes between continents. Despite divergent evolution of this inversion since its out-of-Africa migration, derived non-African populations exhibit similar patterns of long-range linkage disequilibrium between the inversion breakpoints and major peaks of divergence in its center, consistent with balancing selection and suggesting that the inversion harbors alleles that are maintained by selection on several continents. Using RNA-sequencing, we identify overlap between inversion-linked single-nucleotide polymorphisms and loci that are differentially expressed between inverted and noninverted chromosomes. Expression levels are higher for inverted chromosomes at low temperature, suggesting loss of buffering or compensatory plasticity and consistent with higher inversion frequency in warm climates. Our results suggest that this ancestrally tropical balanced polymorphism spread around the world and became latitudinally assorted along similar but independent climatic gradients, always being frequent in subtropical/tropical areas but rare or absent in temperate climates.
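Long-range linkage disequilibrium (LD) is commonly summarized as the squared allelic correlation r² between two biallelic sites. An example is LD between the In(3R)Payne breakpoints and loci in the center of the inversion. The sketch below is a generic textbook calculation, not the authors' pipeline. It assumes phased haplotypes with alleles coded 0/1, and the example haplotypes are made up.

```python
from typing import Sequence


def r_squared(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """Squared allelic correlation r^2 between two biallelic sites.

    site_a and site_b hold the allele (0 or 1) carried by each phased
    haplotype at the two sites, with haplotypes listed in the same order.
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n  # frequency of allele 1 at site A
    p_b = sum(site_b) / n  # frequency of allele 1 at site B
    # frequency of haplotypes carrying allele 1 at both sites
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom
```

With these toy inputs, `r_squared([1, 1, 0, 0], [1, 1, 0, 0])` returns 1.0 (complete LD) and `r_squared([1, 1, 0, 0], [1, 0, 1, 0])` returns 0.0 (no LD).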
Affiliation(s)
- Martin Kapun
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Division of Cell and Developmental Biology, Medical University of Vienna, Vienna, Austria
  - Natural History Museum Vienna, Zentrale Forschungslaboratorien, Vienna, Austria
- Esra Durmaz Mitchell
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Tadeusz J Kawecki
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Paul Schmidt
  - Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Thomas Flatt
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
2. Bu L, Cripps RM. Promoter architecture of Drosophila genes regulated by Myocyte enhancer factor-2. PLoS One 2022; 17:e0271554. [PMID: 35862472; PMCID: PMC9302807; DOI: 10.1371/journal.pone.0271554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2021] [Accepted: 07/01/2022] [Indexed: 11/18/2022]
Abstract
To gain insight into the mechanisms of transcriptional activation of muscle genes, we sought to determine whether genes targeted by the myogenic transcription factor Myocyte enhancer factor-2 (MEF2) were enriched for specific core promoter elements. We identified 330 known MEF2 target promoters in Drosophila and analyzed them for the presence and location of 17 known consensus promoter sequences. As a control, we also searched all Drosophila RNA polymerase II-dependent promoters for the same sequences. We found that promoter motifs were readily detected in the MEF2 target dataset, and that many of them were slightly enriched in frequency compared to the control dataset. A prominent sequence over-represented in the MEF2 target genes was NDM2, which appeared in over 50% of MEF2 target genes and was 2.5-fold over-represented in MEF2 targets compared to background. To test the functional significance of NDM2, we identified two promoters containing a single copy of NDM2 plus an upstream MEF2 site, and tested the activity of these promoters in vivo. Both the sticks and stones and Kahuli fragments showed strong skeletal myoblast-specific expression of a lacZ reporter in embryos. However, the timing and level of reporter expression were unaffected when the NDM2 site in either element was mutated. These studies identify variations in promoter architecture for a set of regulated genes compared to all RNA polymerase II-dependent genes, and underline the potential redundancy in the activities of some core promoter elements.
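Over-representation figures like those above (a motif present in some fraction of target promoters, and a fold enrichment relative to all promoters) reduce to a ratio of two frequencies. The sketch below shows one plausible way to compute that ratio. It does not necessarily match how the authors scored positional windows. It assumes promoters are plain DNA strings, searches the forward strand only, and uses a TATA-like consensus `TATAWAW` as a placeholder; this is not the NDM2 sequence.

```python
import re
from typing import Iterable

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus string into a regex (forward strand only)."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def fraction_with_motif(promoters: Iterable[str], pattern: re.Pattern) -> float:
    """Fraction of promoter sequences containing at least one motif match."""
    seqs = [s.upper() for s in promoters]
    if not seqs:
        raise ValueError("no promoter sequences given")
    hits = sum(1 for s in seqs if pattern.search(s))
    return hits / len(seqs)


def fold_enrichment(targets: Iterable[str], background: Iterable[str],
                    consensus: str) -> float:
    """Motif frequency in target promoters divided by its background frequency."""
    pattern = consensus_to_regex(consensus)
    bg = fraction_with_motif(background, pattern)
    if bg == 0:
        raise ValueError("motif absent from background; ratio undefined")
    return fraction_with_motif(targets, pattern) / bg
```

For instance, if the motif occurs in 3 of 4 target promoters and in 1 of 4 background promoters, `fold_enrichment` returns 3.0.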
Affiliation(s)
- Lijing Bu
  - Department of Biology and Center for Evolutionary and Theoretical Immunology, University of New Mexico, Albuquerque, NM, United States of America
- Richard M. Cripps
  - Department of Biology, San Diego State University, San Diego, CA, United States of America
3. Ray M, Larschan E. Getting started: altering promoter choice as a mechanism for cell type differentiation. Genes Dev 2020; 34:619-620. [PMID: 32358039; PMCID: PMC7197355; DOI: 10.1101/gad.338723.120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
Abstract
In this issue of Genes & Development, Lu and colleagues (pp. 663-677) have discovered a key new mechanism of alternative promoter choice that is involved in the differentiation of spermatocytes. Promoter choice has strong potential as a mechanism for the differentiation of many different cell types.
Affiliation(s)
- Mukulika Ray
  - Department of Molecular Biology, Cellular Biology, and Biochemistry, Brown University, Providence, Rhode Island 02912, USA
- Erica Larschan
  - Department of Molecular Biology, Cellular Biology, and Biochemistry, Brown University, Providence, Rhode Island 02912, USA
4. Castro-Mondragon JA, Jaeger S, Thieffry D, Thomas-Chollier M, van Helden J. RSAT matrix-clustering: dynamic exploration and redundancy reduction of transcription factor binding motif collections. Nucleic Acids Res 2017; 45:e119. [PMID: 28591841; PMCID: PMC5737723; DOI: 10.1093/nar/gkx314] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Received: 07/19/2016] [Accepted: 06/04/2017] [Indexed: 01/08/2023]
Abstract
Transcription factor (TF) databases contain multitudes of binding motifs (TFBMs) from various sources, from which non-redundant collections are derived by manual curation. The advent of high-throughput methods stimulated the production of novel collections with increasing numbers of motifs. Meta-databases, built by merging these collections, contain redundant versions, because available tools are not suited to automatically identify and explore biologically relevant clusters among thousands of motifs. Motif discovery from genome-scale data sets (e.g. ChIP-seq) also produces redundant motifs, hampering the interpretation of results. We present matrix-clustering, a versatile tool that clusters similar TFBMs into multiple trees and automatically creates non-redundant TFBM collections. Features unique to matrix-clustering are its dynamic visualisation of aligned TFBMs and its capability to simultaneously treat multiple collections from various sources. We demonstrate that matrix-clustering considerably simplifies the interpretation of combined results from multiple motif discovery tools, and highlights biologically relevant variations of similar motifs. We also ran a large-scale application to cluster ∼11 000 motifs from 24 entire databases, showing that matrix-clustering correctly groups motifs belonging to the same TF families and drastically reduces motif redundancy. matrix-clustering is integrated within the RSAT suite (http://rsat.eu/) and is accessible through a user-friendly web interface or via the command line for integration into pipelines.
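matrix-clustering groups motifs by pairwise similarity and builds trees from those scores. The simplified sketch below illustrates only the general idea and is not RSAT's implementation. It scores two count matrices by the best Pearson correlation of their overlapping columns across ungapped offsets, ignoring reverse complements. It then merges motifs by average linkage until the best inter-cluster similarity drops below a threshold. The `min_overlap` and `threshold` defaults are arbitrary assumptions, not RSAT parameters.

```python
import math
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]  # rows = motif positions, columns = counts for A, C, G, T


def to_freqs(counts: Matrix) -> Matrix:
    """Convert a count matrix into per-position base frequencies."""
    return [[c / sum(row) for c in row] for row in counts]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def motif_similarity(m1: Matrix, m2: Matrix, min_overlap: int = 4) -> float:
    """Best correlation of flattened overlapping columns over all ungapped
    offsets of m2 relative to m1 (forward orientation only)."""
    f1, f2 = to_freqs(m1), to_freqs(m2)
    best = -1.0
    for offset in range(-(len(f2) - 1), len(f1)):
        # position i of m1 is aligned with position i - offset of m2
        pairs = [(f1[i], f2[i - offset]) for i in range(len(f1))
                 if 0 <= i - offset < len(f2)]
        if len(pairs) < min_overlap:
            continue
        x = [v for col, _ in pairs for v in col]
        y = [v for _, col in pairs for v in col]
        best = max(best, pearson(x, y))
    return best


def cluster_motifs(motifs: List[Matrix], threshold: float = 0.75) -> List[List[int]]:
    """Average-linkage agglomerative clustering of motif indices; merging stops
    once the best average similarity between two clusters is below threshold."""
    sim = {(i, j): motif_similarity(motifs[i], motifs[j])
           for i, j in combinations(range(len(motifs)), 2)}
    clusters = [[i] for i in range(len(motifs))]
    while len(clusters) > 1:
        best_score, best_pair = -math.inf, None
        for a, b in combinations(range(len(clusters)), 2):
            links = [sim[min(i, j), max(i, j)]
                     for i in clusters[a] for j in clusters[b]]
            score = sum(links) / len(links)
            if score > best_score:
                best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters
```

The usage below involves three toy 4-column motifs: two nearly identical ACGT-like matrices and one TGCA-like matrix. Given these, `cluster_motifs` returns `[[0, 1], [2]]`.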
Affiliation(s)
- Denis Thieffry
  - IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
- Morgane Thomas-Chollier
  - IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, F-75005 Paris, France
- Jacques van Helden
  - Aix Marseille Univ, INSERM, TAGC, Theory and Approaches of Genomic Complexity, UMR_S 1090, Marseille, France
5. Spanier KI, Jansen M, Decaestecker E, Hulselmans G, Becker D, Colbourne JK, Orsini L, De Meester L, Aerts S. Conserved Transcription Factors Steer Growth-Related Genomic Programs in Daphnia. Genome Biol Evol 2017; 9:1821-1842. [PMID: 28854641; PMCID: PMC5569996; DOI: 10.1093/gbe/evx127] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Accepted: 07/11/2017] [Indexed: 02/06/2023]
Abstract
Ecological genomics aims to understand the functional association between environmental gradients and the genes underlying adaptive traits. Many genes that are identified by genome-wide screening in ecologically relevant species lack functional annotations. Although gene functions can be inferred from sequence homology, such approaches have limited power. Here, we introduce ecological regulatory genomics by presenting an ontology-free gene prioritization method. Specifically, our method combines transcriptome profiling with high-throughput cis-regulatory sequence analysis in the water fleas Daphnia pulex and Daphnia magna. It screens coexpressed genes for overrepresented DNA motifs that serve as transcription factor binding sites, thereby providing insight into conserved transcription factors and gene regulatory networks shaping the expression profile. We first validated our method, called Daphnia-cisTarget, on a D. pulex heat shock data set, which revealed a network driven by the heat shock factor. Next, we performed RNA-Seq in D. magna exposed to the cyanobacterium Microcystis aeruginosa. Daphnia-cisTarget identified coregulated gene networks that associate with the moulting cycle and potentially regulate life history changes in growth rate and age at maturity. These networks are predicted to be regulated by evolutionarily conserved transcription factors such as the homologues of Drosophila Shavenbaby and Grainyhead, nuclear receptors, and a GATA family member. In conclusion, our approach allows candidate genes in Daphnia to be prioritised without bias towards prior knowledge of functional gene annotation, and represents an important step towards exploring the molecular mechanisms of ecological responses in organisms with poorly annotated genomes.
Affiliation(s)
- Katina I. Spanier
  - Department of Biology, Laboratory of Aquatic Ecology, Evolution and Conservation, KU Leuven, Belgium
  - Department of Human Genetics, Laboratory of Computational Biology, KU Leuven, Belgium
  - VIB Center for Brain and Disease Research, KU Leuven, Belgium
- Mieke Jansen
  - Department of Biology, Laboratory of Aquatic Ecology, Evolution and Conservation, KU Leuven, Belgium
- Ellen Decaestecker
  - Department of Biology, Laboratory of Aquatic Biology, Science and Technology, KU Leuven Campus Kulak, Kortrijk, Belgium
- Gert Hulselmans
  - Department of Human Genetics, Laboratory of Computational Biology, KU Leuven, Belgium
  - VIB Center for Brain and Disease Research, KU Leuven, Belgium
- Dörthe Becker
  - Environmental Genomics Group, School of Biosciences, College of Life and Environmental Sciences, University of Birmingham, United Kingdom
  - Department of Animal and Plant Sciences, University of Sheffield, Western Bank, United Kingdom
- John K. Colbourne
  - Environmental Genomics Group, School of Biosciences, College of Life and Environmental Sciences, University of Birmingham, United Kingdom
- Luisa Orsini
  - Environmental Genomics Group, School of Biosciences, College of Life and Environmental Sciences, University of Birmingham, United Kingdom
- Luc De Meester
  - Department of Biology, Laboratory of Aquatic Ecology, Evolution and Conservation, KU Leuven, Belgium
- Stein Aerts
  - Department of Human Genetics, Laboratory of Computational Biology, KU Leuven, Belgium
  - VIB Center for Brain and Disease Research, KU Leuven, Belgium
6. Pascual-Garcia P, Debo B, Aleman JR, Talamas JA, Lan Y, Nguyen NH, Won KJ, Capelson M. Metazoan Nuclear Pores Provide a Scaffold for Poised Genes and Mediate Induced Enhancer-Promoter Contacts. Mol Cell 2017; 66:63-76.e6. [PMID: 28366641; DOI: 10.1016/j.molcel.2017.02.020] [Citation(s) in RCA: 99] [Impact Index Per Article: 12.4] [Received: 06/22/2016] [Revised: 01/19/2017] [Accepted: 02/17/2017] [Indexed: 01/09/2023]
Abstract
Nuclear pore complex components (Nups) have been implicated in transcriptional regulation, yet which regulatory steps metazoan Nups control remains unclear. We identified the presence of multiple Nups at promoters, enhancers, and insulators in the Drosophila genome. In line with this binding, we uncovered a functional role for Nup98 in mediating enhancer-promoter looping at ecdysone-inducible genes. These genes were found to be stably associated with nuclear pores before and after activation. Although changing levels of Nup98 disrupted enhancer-promoter contacts, it did not affect ongoing transcription but instead compromised subsequent transcriptional activation or transcriptional memory. In support of the enhancer-looping role, we found Nup98 to gain and retain physical interactions with architectural proteins upon stimulation with ecdysone. Together, our data identify Nups as a class of architectural proteins for enhancers and support a model in which animal genomes use the nuclear pore as an organizing scaffold for inducible poised genes.
Affiliation(s)
- Pau Pascual-Garcia
  - Department of Cell and Developmental Biology, Epigenetics Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brian Debo
  - Department of Cell and Developmental Biology, Epigenetics Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jennifer R Aleman
  - Department of Cell and Developmental Biology, Epigenetics Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jessica A Talamas
  - Department of Cell and Developmental Biology, Epigenetics Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yemin Lan
  - Department of Cell and Developmental Biology, Epigenetics Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Nha H Nguyen
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kyoung J Won
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Maya Capelson
  - Department of Cell and Developmental Biology, Epigenetics Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
7. Schor IE, Degner JF, Harnett D, Cannavò E, Casale FP, Shim H, Garfield DA, Birney E, Stephens M, Stegle O, Furlong EEM. Promoter shape varies across populations and affects promoter evolution and expression noise. Nat Genet 2017; 49:550-558. [PMID: 28191888; DOI: 10.1038/ng.3791] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 09/14/2016] [Accepted: 01/20/2017] [Indexed: 12/29/2022]
Abstract
Animal promoters initiate transcription either at precise positions (narrow promoters) or across dispersed regions (broad promoters), a distinction referred to as promoter shape. Although promoter shape is highly conserved, the functional properties of promoters with different shapes and the genetic basis of their evolution remain unclear. Here we used natural genetic variation across a panel of 81 Drosophila lines to measure changes in transcriptional start site (TSS) usage, identifying thousands of genetic variants affecting transcript levels (strength) or the distribution of TSSs within a promoter (shape). Our results identify promoter shape as a molecular trait that can evolve independently of promoter strength. Broad promoters typically harbor shape-associated variants, with signatures of adaptive selection. Single-cell measurements demonstrate that variants modulating promoter shape often increase expression noise, whereas heteroallelic interactions with other promoter variants alleviate these effects. These results uncover new functional properties of natural promoters and suggest the minimization of expression noise as an important factor in promoter evolution.
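Promoter shape can be quantified from the distribution of TSS read counts within a promoter. One widely used measure is the shape index (SI) introduced by Hoskins et al. (2011), SI = 2 + Σ p_i log2 p_i, where p_i is the fraction of a promoter's reads at position i. SI equals 2 for a single-position (narrow) promoter and decreases as initiation spreads out. Whether this exact statistic was used in the study above is an assumption. The sketch only illustrates the narrow/broad distinction.

```python
import math
from typing import Sequence


def shape_index(tss_counts: Sequence[float]) -> float:
    """Shape index SI = 2 + sum(p_i * log2(p_i)) over positions with reads.

    tss_counts: read counts at each candidate TSS position in one promoter.
    SI is 2 when all reads fall at a single position and decreases as the
    reads spread over more positions.
    """
    total = sum(tss_counts)
    if total <= 0:
        raise ValueError("promoter has no reads")
    return 2 + sum((c / total) * math.log2(c / total)
                   for c in tss_counts if c > 0)
```

Here `shape_index([50])` returns 2.0 and `shape_index([5, 5, 5, 5])` returns 0.0. Any cutoff separating narrow from broad promoters has to be chosen by the analyst.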
Affiliation(s)
- Ignacio E Schor
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Jacob F Degner
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Dermot Harnett
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Enrico Cannavò
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Francesco P Casale
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Heejung Shim
  - Department of Statistics, Purdue University, West Lafayette, Indiana, USA
- David A Garfield
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
- Ewan Birney
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Matthew Stephens
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
- Oliver Stegle
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Eileen E M Furlong
  - European Molecular Biology Laboratory (EMBL) Genome Biology Unit, Heidelberg, Germany
8. Raborn RT, Spitze K, Brendel VP, Lynch M. Promoter Architecture and Sex-Specific Gene Expression in Daphnia pulex. Genetics 2016; 204:593-612. [PMID: 27585846; PMCID: PMC5068849; DOI: 10.1534/genetics.116.193334] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 07/02/2016] [Accepted: 07/29/2016] [Indexed: 11/18/2022]
Abstract
Large-scale transcription start site (TSS) profiling produces a high-resolution, quantitative picture of transcription initiation and core promoter locations within a genome. However, TSS profiling has so far been applied largely to a small set of prominent model systems. We sought to characterize the cis-regulatory landscape of the water flea Daphnia pulex, an emerging model arthropod that reproduces both asexually (via parthenogenesis) and sexually (via meiosis). We performed Cap Analysis of Gene Expression (CAGE) with RNA isolated from D. pulex in three developmental states: sexual females, asexual females, and males. Identified TSSs were used to generate a "Daphnia Promoter Atlas," i.e., a catalog of active promoters across the surveyed states. Analysis of the distribution of promoters revealed evidence for widespread alternative promoter usage in D. pulex, in addition to a prominent fraction of compactly arranged promoters in divergent orientations. We carried out de novo motif discovery using CAGE-defined TSSs and identified eight candidate core promoter motifs; this collection includes canonical promoter elements (e.g., TATA and Initiator) in addition to others lacking obvious orthologs. A comparison of promoter activities revealed considerable differential gene expression between states. Our work represents the first global definition of transcription initiation and promoter architecture in crustaceans. The Daphnia Promoter Atlas presented here provides a valuable resource for comparative study of cis-regulatory regions in metazoans, as well as for investigations into the circuitries that underpin meiosis and parthenogenesis.
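Compactly arranged divergent promoters can be found by pairing TSSs on opposite strands that point away from each other. In such a pair, a minus-strand TSS lies at or upstream of a plus-strand TSS on the same chromosome, within some maximum distance. The sketch below is a hypothetical illustration, not the authors' method. The `TSS` record, the single shared coordinate convention, and the 1,000 bp default for `max_gap` are all assumptions.

```python
from typing import List, NamedTuple, Tuple


class TSS(NamedTuple):
    """Hypothetical TSS record; positions must use one coordinate convention."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def divergent_pairs(tss_list: List[TSS], max_gap: int = 1000) -> List[Tuple[TSS, TSS]]:
    """Head-to-head pairs: a minus-strand TSS at or upstream of a plus-strand
    TSS on the same chromosome, at most max_gap bp apart.

    Minus-strand transcription runs toward lower coordinates and plus-strand
    transcription toward higher ones, so such a pair points away from each other.
    """
    minus = [t for t in tss_list if t.strand == "-"]
    plus = [t for t in tss_list if t.strand == "+"]
    pairs = []
    for p in plus:  # simple O(n * m) scan; fine for a sketch
        for m in minus:
            if m.chrom == p.chrom and 0 <= p.pos - m.pos <= max_gap:
                pairs.append((m, p))
    return pairs
```

A convergent arrangement, where the plus-strand TSS lies upstream of the minus-strand TSS, is deliberately excluded by the `0 <= p.pos - m.pos` condition.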
Affiliation(s)
- R Taylor Raborn
  - Department of Biology, Indiana University, Bloomington, Indiana 47405
  - School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405
- Ken Spitze
  - Department of Biology, Indiana University, Bloomington, Indiana 47405
- Volker P Brendel
  - Department of Biology, Indiana University, Bloomington, Indiana 47405
  - School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405
- Michael Lynch
  - Department of Biology, Indiana University, Bloomington, Indiana 47405
9. Shlyueva D, Meireles-Filho ACA, Pagani M, Stark A. Genome-Wide Ultrabithorax Binding Analysis Reveals Highly Targeted Genomic Loci at Developmental Regulators and a Potential Connection to Polycomb-Mediated Regulation. PLoS One 2016; 11:e0161997. [PMID: 27575958; PMCID: PMC5004984; DOI: 10.1371/journal.pone.0161997] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/25/2016] [Accepted: 08/16/2016] [Indexed: 12/22/2022]
Abstract
Hox homeodomain transcription factors are key regulators of animal development. They specify the identity of segments along the anterior-posterior body axis in metazoans by controlling the expression of diverse downstream targets, including transcription factors and signaling pathway components. The Drosophila melanogaster Hox factor Ultrabithorax (Ubx) directs the development of thoracic and abdominal segments and appendages. Loss of Ubx function can lead, for example, to the transformation of third thoracic segment appendages (e.g. halteres) into second thoracic segment appendages (e.g. wings), resulting in a characteristic four-wing phenotype. Here we present a Drosophila melanogaster strain with a V5-epitope tagged Ubx allele, which we employed to obtain a high-quality genome-wide map of Ubx binding sites using ChIP-seq. We confirm the sensitivity of the V5 ChIP-seq by recovering 7/8 of well-studied Ubx-dependent cis-regulatory regions. Moreover, comparison with a genome-scale resource of in vivo tested enhancer candidates shows that Ubx binding is predictive of enhancer activity. We observed densely clustered Ubx binding sites at 12 extended genomic loci that included ANTP-C, BX-C, Polycomb complex genes, and other regulators, and these clustered binding sites were frequently active enhancers. Furthermore, Ubx binding was detected at known Polycomb response elements (PREs) and, in contrast to the binding sites of other developmental TFs, was associated with significant enrichments of Pc and Pho ChIP signals. Together, our results show that Ubx targets developmental regulators via strongly clustered binding sites. They also allow us to hypothesize that regulation by Ubx might involve Polycomb group proteins to maintain specific regulatory states in a cooperative or mutually exclusive fashion. This is an attractive model that combines two groups of proteins with prominent gene regulatory roles during animal development.
Affiliation(s)
- Daria Shlyueva
  - Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
- Michaela Pagani
  - Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
- Alexander Stark
  - Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
10. Cis regulatory motifs and antisense transcriptional control in the apicomplexan Theileria parva. BMC Genomics 2016; 17:128. [PMID: 26896950; PMCID: PMC4761415; DOI: 10.1186/s12864-016-2444-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 09/17/2015] [Accepted: 02/08/2016] [Indexed: 11/23/2022]
Abstract
Background: Theileria parva is an intracellular parasite that causes a lymphoproliferative disease in cattle. It does so by inducing cancer-like phenotypes in the host cells it infects, although the molecular and regulatory mechanisms involved remain poorly understood. RNAseq data, and the resulting updated genome annotation now available for this parasite, offer an unprecedented opportunity to characterize the genomic features associated with gene regulation in this species. Our previous analyses revealed a T. parva genome even more gene-dense than previously thought, with many adjacent loci overlapping each other. These overlaps occur not only in untranslated regions (UTRs) but even in coding sequences.
Results: Despite this compactness, Theileria intergenic regions show a pattern of size distribution indicative of monocistronic gene transcription. Three previously described motifs are conserved among Theileria species and highly prevalent in promoter regions near or at the transcription start sites. We found novel motifs at many transcription termination sites, as well as upstream of parasite genes thought to be critical for host transformation. Some adjacent genes could be regulated by antisense transcription from an overlapping transcriptional unit. Such gene pairs are syntenic between T. parva and P. falciparum at a frequency higher than expected by chance, suggesting the presence of common, evolutionarily old regulatory mechanisms in the phylum Apicomplexa.
Conclusions: We propose a model of transcription with conserved sense and antisense transcription from a few taxonomically ubiquitous and several species-specific promoter motifs. Interestingly, the gene networks regulated by conserved promoters are themselves, in most cases, not conserved between species or genera.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2444-5) contains supplementary material, which is available to authorized users.
11. FootprintDB: Analysis of Plant Cis-Regulatory Elements, Transcription Factors, and Binding Interfaces. Methods Mol Biol 2016; 1482:259-77. [PMID: 27557773; DOI: 10.1007/978-1-4939-6396-6_17] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022]
Abstract
FootprintDB is a database and search engine that compiles regulatory sequences from open access libraries of curated DNA cis-elements and motifs, and their associated transcription factors (TFs). It systematically annotates the binding interfaces of the TFs by exploiting protein-DNA complexes deposited in the Protein Data Bank. Each entry in footprintDB is thus a DNA motif linked to the protein sequence of the TF(s) known to recognize it, and in most cases, the set of predicted interface residues involved in specific recognition. This chapter explains step-by-step how to search for DNA motifs and protein sequences in footprintDB and how to focus the search to a particular organism. Two real-world examples are shown where this software was used to analyze transcriptional regulation in plants. Results are described with the aim of guiding users on their interpretation, and special attention is given to the choices users might face when performing similar analyses.
12. AlQuraishi M, Tang S, Xia X. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system. BMC Bioinformatics 2015; 16:390. [PMID: 26586237; PMCID: PMC4653904; DOI: 10.1186/s12859-015-0819-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/04/2015] [Accepted: 11/11/2015] [Indexed: 11/28/2022]
Abstract
Background: Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions.
Description: We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualizing, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis.
Conclusions: This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
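Position weight matrices, which this database uses to summarize binding affinities, are typically applied by converting counts to log-odds weights and scoring each window of a sequence. The sketch below is a generic illustration and not the database's own code or file format. The pseudocount of 0.5 and the uniform background are assumptions, and the toy `TGA` counts are made up.

```python
import math
from typing import Dict, List, Optional

BASES = "ACGT"


def log_odds_pwm(counts: List[Dict[str, float]], pseudocount: float = 0.5,
                 background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2-odds weights.

    Each column's counts (plus a pseudocount per base) become probabilities,
    which are divided by the background probability of that base.
    """
    bg = background or {b: 0.25 for b in BASES}  # uniform background by default
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / bg[b])
                    for b in BASES})
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]]) -> List[float]:
    """Score every window of len(pwm) bases on the forward strand.

    Assumes the sequence contains only A, C, G and T.
    """
    width = len(pwm)
    seq = sequence.upper()
    return [sum(pwm[k][seq[i + k]] for k in range(width))
            for i in range(len(seq) - width + 1)]
```

For example, `scan("CTGAC", log_odds_pwm([{"T": 10}, {"G": 10}, {"A": 10}]))` gives three window scores, and the highest one is at index 1, the `TGA` window.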
Affiliation(s)
- Mohammed AlQuraishi
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
  - HMS Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
- Shengdong Tang
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
  - HMS Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
- Xide Xia
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
  - HMS Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Avenue, Boston, MA 02115, USA
13
iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput Biol 2014; 10:e1003731. [PMID: 25058159 PMCID: PMC4109854 DOI: 10.1371/journal.pcbi.1003731] [Citation(s) in RCA: 648] [Impact Index Per Article: 58.9] [Received: 02/13/2014] [Accepted: 05/27/2014] [Indexed: 01/17/2023]
Abstract
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and identify an extensive up-regulated network controlled directly by p53. Similarly, we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.
Gene regulatory networks control developmental, homeostatic, and disease processes by governing precise levels and spatio-temporal patterns of gene expression. Determining their topology can provide mechanistic insight into these processes. Gene regulatory networks consist of interactions between transcription factors and their direct target genes. Each regulatory interaction represents the binding of the transcription factor to a specific DNA binding site near its target gene. Here we present a computational method, called iRegulon, to identify master regulators and direct target genes in a human gene signature, i.e. a set of co-expressed genes. iRegulon relies on the analysis of the regulatory sequences around each gene in the gene set to detect enriched TF motifs or ChIP-seq peaks, using databases of nearly 10,000 TF motifs and 1000 ChIP-seq data sets or “tracks”. Next, it associates enriched motifs and tracks with candidate transcription factors and determines the optimal subset of direct target genes. We validate iRegulon on ENCODE data, and use it in combination with RNA-seq and ChIP-seq data to map a p53 downstream network with new predicted co-factors and targets. iRegulon is available as a Cytoscape plugin, supporting human, mouse, and Drosophila genes, and provides access to hundreds of cancer-related TF-target subnetworks or “regulons”.
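In a ranking-and-recovery approach, each motif ranks all genes genome-wide. Enrichment then measures how quickly the input gene set is recovered near the top of that ranking. The minimal sketch below follows the recovery-curve AUC and normalized enrichment score (NES) idea of the cisTarget family of methods. The top_n cut-off, the toy rankings and the motif names are hypothetical, not iRegulon's actual settings.

```python
import statistics


def recovery_auc(ranking, gene_set, top_n):
    """Area under the recovery curve within the top_n ranks of one motif ranking.

    At each rank the curve counts how many genes of gene_set have been seen so far."""
    recovered, area = 0, 0
    for gene in ranking[:top_n]:
        if gene in gene_set:
            recovered += 1
        area += recovered
    # Divide by top_n * |gene_set| so that the value lies in [0, 1].
    return area / (top_n * len(gene_set))


def normalized_enrichment_scores(rankings, gene_set, top_n):
    """NES per motif: (AUC - mean AUC) / standard deviation of AUC over all motifs."""
    aucs = {motif: recovery_auc(r, gene_set, top_n) for motif, r in rankings.items()}
    values = list(aucs.values())
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return {motif: (auc - mu) / sd for motif, auc in aucs.items()}


# Toy genome of 100 genes; each hypothetical motif ranks them by a different rotation.
genes = [f"g{i}" for i in range(100)]
rankings = {f"m{k}": genes[k:] + genes[:k] for k in range(0, 100, 10)}
nes = normalized_enrichment_scores(rankings, {"g0", "g1", "g2"}, top_n=20)
print(max(nes, key=nes.get))
```

Motifs with a high NES would then be linked to candidate TFs. That motif2TF step is not sketched here.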
14
Arnold CD, Gerlach D, Spies D, Matts JA, Sytnikova YA, Pagani M, Lau NC, Stark A. Quantitative genome-wide enhancer activity maps for five Drosophila species show functional enhancer conservation and turnover during cis-regulatory evolution. Nat Genet 2014; 46:685-92. [PMID: 24908250 PMCID: PMC4250274 DOI: 10.1038/ng.3009] [Citation(s) in RCA: 132] [Impact Index Per Article: 12.0] [Received: 02/03/2014] [Accepted: 05/15/2014] [Indexed: 12/14/2022]
Abstract
Phenotypic differences between closely related species are thought to arise primarily from changes in gene expression due to mutations in cis-regulatory sequences (enhancers). However, it has remained unclear how frequently mutations alter enhancer activity or create functional enhancers de novo. Here we use STARR-seq, a recently developed quantitative enhancer assay, to determine genome-wide enhancer activity profiles for five Drosophila species in the constant trans-regulatory environment of Drosophila melanogaster S2 cells. We find that the functions of a large fraction of D. melanogaster enhancers are conserved for their orthologous sequences owing to selection and stabilizing turnover of transcription factor motifs. Moreover, hundreds of enhancers have been gained since the D. melanogaster-Drosophila yakuba split about 11 million years ago without apparent adaptive selection and can contribute to changes in gene expression in vivo. Our finding that enhancer activity is often deeply conserved and frequently gained provides functional insights into regulatory evolution.
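STARR-seq quantifies enhancer activity for each candidate region as the enrichment of STARR-seq reads over input reads. The minimal sketch below computes that library-size-normalized ratio. It also shows one way to ask whether activity is conserved at an orthologous region. The activity threshold of 3, the pseudocount and the region names are hypothetical. The original study's peak calling and cut-offs are not reproduced.

```python
def starr_enrichment(starr_counts, input_counts, pseudocount=1.0):
    """Per-region enrichment of STARR-seq reads over input reads, normalized by library size."""
    starr_total = sum(starr_counts.values())
    input_total = sum(input_counts.values())
    enrichment = {}
    for region, n_input in input_counts.items():
        starr_frac = (starr_counts.get(region, 0) + pseudocount) / starr_total
        input_frac = (n_input + pseudocount) / input_total
        enrichment[region] = starr_frac / input_frac
    return enrichment


def conserved_fraction(mel_enrichment, ortholog_enrichment, orthologs, threshold=3.0):
    """Fraction of active D. melanogaster regions whose orthologous region is also active.

    orthologs maps a D. melanogaster region to its orthologous region in another species;
    threshold is a hypothetical activity cut-off on the enrichment value."""
    active = [r for r, e in mel_enrichment.items() if e >= threshold and r in orthologs]
    if not active:
        return 0.0
    kept = sum(1 for r in active if ortholog_enrichment.get(orthologs[r], 0.0) >= threshold)
    return kept / len(active)


mel = starr_enrichment({"r1": 900, "r2": 50, "r3": 50}, {"r1": 100, "r2": 450, "r3": 450})
yak = starr_enrichment({"y1": 800, "y3": 200}, {"y1": 150, "y3": 850})
print(conserved_fraction(mel, yak, {"r1": "y1", "r3": "y3"}))
```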
Affiliation(s)
- Cosmas D Arnold
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria.
- Daniel Gerlach
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria.
- Daniel Spies
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
- Jessica A Matts
- Department of Biology, Brandeis University, Waltham, Massachusetts, USA; Rosenstiel Basic Medical Science Research Center at Brandeis University, Waltham, Massachusetts, USA.
- Yuliya A Sytnikova
- Department of Biology, Brandeis University, Waltham, Massachusetts, USA; Rosenstiel Basic Medical Science Research Center at Brandeis University, Waltham, Massachusetts, USA
- Michaela Pagani
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
- Nelson C Lau
- Department of Biology, Brandeis University, Waltham, Massachusetts, USA; Rosenstiel Basic Medical Science Research Center at Brandeis University, Waltham, Massachusetts, USA
- Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
15
Dubos C, Kelemen Z, Sebastian A, Bülow L, Huep G, Xu W, Grain D, Salsac F, Brousse C, Lepiniec L, Weisshaar B, Contreras-Moreira B, Hehl R. Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes. BMC Genomics 2014; 15:317. [PMID: 24773781 PMCID: PMC4234446 DOI: 10.1186/1471-2164-15-317] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 11/08/2013] [Accepted: 04/16/2014] [Indexed: 11/22/2022]
Abstract
Background Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes. Results Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions. Conclusions The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes.
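Classifying motifs into families, as done here with STAMP, requires a motif-to-motif similarity. The minimal sketch below uses one common measure: the mean Pearson correlation of aligned columns, maximized over ungapped offsets and both strands. Tools such as STAMP use Pearson correlation among their column metrics, but this is not STAMP's implementation. The from_consensus helper, min_overlap and the toy motifs are hypothetical.

```python
import math


def column_pcc(a, b):
    """Pearson correlation between two (A, C, G, T) probability columns."""
    ma, mb = sum(a) / 4, sum(b) / 4
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def reverse_complement(motif):
    """Reverse the columns and swap A<->T, C<->G (i.e. reverse each (A, C, G, T) column)."""
    return [col[::-1] for col in reversed(motif)]


def motif_similarity(m1, m2, min_overlap=4):
    """Best mean column PCC over ungapped offsets and both strands of m2."""
    best = -1.0
    for cand in (m2, reverse_complement(m2)):
        for offset in range(-(len(cand) - min_overlap), len(m1) - min_overlap + 1):
            # Pair column i of m1 with column i - offset of the candidate, where both exist.
            pairs = [(m1[i], cand[i - offset]) for i in range(len(m1))
                     if 0 <= i - offset < len(cand)]
            if len(pairs) >= min_overlap:
                best = max(best, sum(column_pcc(x, y) for x, y in pairs) / len(pairs))
    return best


def from_consensus(seq, p=0.97):
    """Hypothetical helper: turn a consensus string into a near-one-hot probability motif."""
    q = (1 - p) / 3
    return [[p if base == b else q for b in "ACGT"] for base in seq]


m1 = from_consensus("TTGACC")
print(motif_similarity(m1, from_consensus("GGTCAA")))  # GGTCAA is the reverse complement
print(motif_similarity(m1, from_consensus("GGGGGG")))
```

Clustering motifs into families would then group motifs whose pairwise similarity exceeds some cut-off.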
Affiliation(s)
- Reinhard Hehl
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, 38106 Braunschweig, Germany.
16
Müller F, Tora L. Chromatin and DNA sequences in defining promoters for transcription initiation. Biochim Biophys Acta 2013; 1839:118-28. [PMID: 24275614 DOI: 10.1016/j.bbagrm.2013.11.003] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 06/26/2013] [Revised: 11/11/2013] [Accepted: 11/11/2013] [Indexed: 01/29/2023]
Abstract
One of the key events in eukaryotic gene regulation and consequent transcription is the assembly of general transcription factors and RNA polymerase II into a functional pre-initiation complex at core promoters. An emerging view of complexity arising from a variety of promoter-associated DNA motifs, their binding factors and recent discoveries in characterising promoter-associated chromatin properties brings an old question back into the limelight: how is a promoter defined? In addition to position-dependent DNA sequence motifs, accumulating evidence suggests that several parallel-acting mechanisms are involved in orchestrating a pattern marked by the state of chromatin and general transcription factor binding in preparation for defining transcription start sites. In this review, we attempt to summarise these promoter features and discuss the available evidence pointing to their interactions in defining transcription initiation in developmental contexts. This article is part of a Special Issue entitled: Chromatin and epigenetic regulation of animal development.
Affiliation(s)
- Ferenc Müller
- School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, B15 2TT Edgbaston, Birmingham, UK.
- László Tora
- Cellular Signaling and Nuclear Dynamics Program, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), UMR 7104 CNRS, UdS, INSERM U964, BP 10142, F-67404 Illkirch Cedex, CU de Strasbourg, France; School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore.
17
Sebastian A, Contreras-Moreira B. footprintDB: a database of transcription factors with annotated cis elements and binding interfaces. Bioinformatics 2013; 30:258-65. [PMID: 24234003 DOI: 10.1093/bioinformatics/btt663] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 12/23/2022]
Abstract
MOTIVATION Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases. RESULTS FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families was performed together with the proprietary database TRANSFAC, finding that footprintDB has a similar coverage of multicellular organisms while also containing bacterial regulatory data. A search engine has been designed that drives the prediction of DNA motifs for input TFs, or conversely of TF sequences that might recognize input regulatory sequences, by comparison with database entries. Such predictions can also be extended to a single proteome chosen by the user, and results are ranked in terms of interface similarity. Benchmark experiments with bacterial, plant and human data were performed to measure the predictive power of footprintDB searches, which correctly recovered 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had a higher interface similarity than the average, confirming its diagnostic value. AVAILABILITY AND IMPLEMENTATION Web site implemented in PHP, Perl, MySQL and Apache. Freely available from http://floresta.eead.csic.es/footprintdb.
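footprintDB ranks predictions by interface similarity between a query TF and database TFs with annotated DNA-binding interfaces. The minimal sketch below uses one plausible similarity: the fraction of identical residues at the interface columns of a precomputed alignment. The scoring rule, the aligned toy sequences, the TF names and the interface column indices are all assumptions, not footprintDB's implementation.

```python
def interface_identity(query_aln, db_aln, interface_cols):
    """Fraction of DNA-contacting alignment columns where the two residues are identical.

    Gap characters ('-') count as mismatches."""
    if len(query_aln) != len(db_aln):
        raise ValueError("sequences must come from the same alignment")
    same = sum(1 for c in interface_cols
               if query_aln[c] == db_aln[c] and query_aln[c] != "-")
    return same / len(interface_cols)


def rank_by_interface(query_aln, database):
    """Rank entries of database (name -> (aligned sequence, interface columns)) by identity."""
    scored = [(name, interface_identity(query_aln, seq, cols))
              for name, (seq, cols) in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical aligned sequences with hypothetical interface columns.
database = {
    "TF_A": ("RKRGRQTYTR", [0, 2, 5, 8]),
    "TF_B": ("KNRGKHSYAE", [0, 2, 5, 8]),
}
print(rank_by_interface("RKRGRQTYSR", database))
```

The DNA motifs of the top-ranked entries would then be offered as predicted binding specificities for the query.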
Affiliation(s)
- Alvaro Sebastian
- Laboratory of Computational Biology, Department of Genetics and Plant Production, Estación Experimental de Aula Dei/CSIC, Av. Montañana 1005, Zaragoza (http://www.eead.csic.es/compbio) and Fundación ARAID, Paseo María Agustín 36, Zaragoza, Spain
18
Kumari S, Ware D. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots. PLoS One 2013; 8:e79011. [PMID: 24205361 PMCID: PMC3812177 DOI: 10.1371/journal.pone.0079011] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 02/11/2013] [Accepted: 09/18/2013] [Indexed: 01/22/2023]
Abstract
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large-scale genome-wide report on the computational prediction of CPEs across eight plant genomes, intended to help better understand the assembly of the transcription initiation complex. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to the transcription start site (TSS) exhibited positional conservation within monocots and dicots, with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from those of the non-regulatory genome sequence. The profiles also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future genome annotation projects and can inspire research aimed at better understanding the regulatory mechanisms of transcription.
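The positional distribution of a CPE relative to the TSS comes down to scanning aligned promoter sequences and tallying motif offsets. The minimal sketch below does this for a TATA box written as TATAWAWR (W = A/T, R = A/G). That is a commonly used consensus, but it is an assumption here, since the abstract does not give the study's thirteen CPE definitions. The toy promoter and TSS index are also hypothetical.

```python
import re
from collections import Counter

# TATAWAWR (W = A/T, R = A/G) as a regex; the lookahead lets overlapping hits be found.
TATA = re.compile(r"(?=(TATA[AT]A[AT][AG]))")


def tata_positions(promoters, tss_index):
    """Counter of TATA-box start offsets relative to the TSS (negative = upstream).

    Every promoter is assumed to be aligned so that its TSS sits at index tss_index."""
    hist = Counter()
    for seq in promoters:
        for match in TATA.finditer(seq.upper()):
            hist[match.start() - tss_index] += 1
    return hist


promoter = "G" * 20 + "TATAAAAG" + "G" * 30  # toy sequence, TSS assumed at index 50
print(tata_positions([promoter], tss_index=50))
```

Pooling the counts over all promoters of a genome gives the kind of positional distribution that the abstract compares between monocots and dicots.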
Affiliation(s)
- Sunita Kumari
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- United States Department of Agriculture-Agriculture Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, New York, United States of America
19
Aitken S, Akman OE. Nested sampling for parameter inference in systems biology: application to an exemplar circadian model. BMC Syst Biol 2013; 7:72. [PMID: 23899119 PMCID: PMC3735395 DOI: 10.1186/1752-0509-7-72] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 10/23/2012] [Accepted: 07/29/2013] [Indexed: 01/04/2023]
Abstract
Background Model selection and parameter inference are complex problems that have yet to be fully addressed in systems biology. In contrast with parameter optimisation, parameter inference computes both the parameter means and their standard deviations (or full posterior distributions), thus yielding important information on the extent to which the data and the model topology constrain the inferred parameter values. Results We report on the application of nested sampling, a statistical approach to computing the Bayesian evidence Z, to the inference of parameters and the estimation of log Z in an established model of circadian rhythms. A ten-fold difference in the coefficient of variation between degradation and transcription parameters is demonstrated. We further show that the uncertainty remaining in the parameter values is reduced by the analysis of increasing numbers of circadian cycles of data, up to 4 cycles, but is unaffected by sampling the data more frequently. Novel algorithms for calculating the likelihood of a model and a characterisation of the performance of the nested sampling algorithm are also reported. The methods we develop considerably improve the computational efficiency of the likelihood calculation and of the exploratory step within nested sampling. Conclusions We have demonstrated in an exemplar circadian model that the estimates of posterior parameter densities (as summarised by parameter means and standard deviations) are influenced predominantly by the length of the time series, becoming more narrowly constrained as the number of circadian cycles considered increases. We have also shown the utility of the coefficient of variation for discriminating between highly constrained and less well constrained parameters.
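Nested sampling estimates the evidence Z = ∫ L(θ) π(θ) dθ as follows. It repeatedly discards the lowest-likelihood live point, and the enclosed prior volume shrinks roughly as X_i ≈ exp(-i/N) for N live points. Weighted by L_i (X_{i-1} - X_i), the discarded points also yield posterior means, standard deviations and hence the coefficient of variation (CV). The minimal sketch below applies the basic algorithm to a toy one-parameter problem with a uniform prior on [0, 1] and a Gaussian likelihood. This toy stands in for the circadian model and is purely an assumption. New points are drawn by plain rejection sampling, which is the simplest and least efficient choice, not the improved exploratory step described in the paper.

```python
import math
import random

SIGMA, MU = 0.1, 0.5
LOG_NORM = -math.log(SIGMA * math.sqrt(2 * math.pi))


def log_likelihood(x):
    """Toy stand-in for a model likelihood: normalized Gaussian density in one parameter."""
    return LOG_NORM - 0.5 * ((x - MU) / SIGMA) ** 2


def logaddexp(a, b):
    """log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling(n_live=100, tol=1e-2, seed=1):
    """Estimate log Z and the posterior mean and sd for a uniform prior on [0, 1]."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    live_ll = [log_likelihood(x) for x in live]
    log_z, log_x_prev, i = -math.inf, 0.0, 0
    samples = []  # (parameter value, log of likelihood * prior-volume weight)
    while True:
        i += 1
        worst = min(range(n_live), key=live_ll.__getitem__)
        log_x = -i / n_live  # expected log prior volume after i shrinkage steps
        log_w = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))  # log(X_{i-1} - X_i)
        log_z = logaddexp(log_z, live_ll[worst] + log_w)
        samples.append((live[worst], live_ll[worst] + log_w))
        # Replace the worst point with a prior draw above its likelihood (rejection sampling).
        threshold = live_ll[worst]
        while True:
            x = rng.random()
            if log_likelihood(x) > threshold:
                break
        live[worst], live_ll[worst] = x, log_likelihood(x)
        log_x_prev = log_x
        # Stop once max(L) * X_i, a rough bound on the remaining evidence, is below tol * Z.
        if max(live_ll) + log_x < log_z + math.log(tol):
            break
    # The remaining live points share the final prior volume X_i equally.
    for x, ll in zip(live, live_ll):
        lw = ll + log_x - math.log(n_live)
        log_z = logaddexp(log_z, lw)
        samples.append((x, lw))
    weights = [math.exp(lw - log_z) for _, lw in samples]  # posterior weights, sum to 1
    mean = sum(w * x for w, (x, _) in zip(weights, samples))
    sd = math.sqrt(sum(w * (x - mean) ** 2 for w, (x, _) in zip(weights, samples)))
    return log_z, mean, sd


log_z, mean, sd = nested_sampling()
print(f"log Z = {log_z:.3f}, mean = {mean:.3f}, sd = {sd:.3f}, CV = {sd / mean:.3f}")
```

For this toy problem the exact evidence is very close to 1 (log Z ≈ 0), so the printed estimate can be checked directly. A small CV, as here, marks a well-constrained parameter.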
Affiliation(s)
- Stuart Aitken
- MRC Human Genetics Unit, IGMM, University of Edinburgh, Edinburgh EH4 2XU, UK.
20
Juvenile hormone and its receptor, methoprene-tolerant, control the dynamics of mosquito gene expression. Proc Natl Acad Sci U S A 2013; 110:E2173-81. [PMID: 23633570 DOI: 10.1073/pnas.1305293110] [Citation(s) in RCA: 103] [Impact Index Per Article: 8.6] [Indexed: 01/05/2023]
Abstract
Juvenile hormone III (JH) plays a key role in regulating the reproduction of female mosquitoes. Microarray time-course analysis revealed dynamic changes in gene expression during posteclosion (PE) development in the fat body of female Aedes aegypti. Hierarchical clustering identified three major gene clusters: 1,843 early-PE (EPE) genes maximally expressed at 6 h PE, 457 mid-PE (MPE) genes at 24 h PE, and 1,815 late-PE (LPE) genes at 66 h PE. The RNAi microarray screen for the JH receptor Methoprene-tolerant (Met) showed that 27% of EPE and 40% of MPE genes were up-regulated whereas 36% of LPE genes were down-regulated in the absence of this receptor. Met repression of EPE and MPE and activation of LPE genes were validated by an in vitro fat-body culture experiment using Met RNAi. Sequence motif analysis revealed the consensus for a 9-mer Met-binding motif, CACG(C/T)G(A/G)(T/A)G. Met-binding motif variants were overrepresented within the first 300 bases of the promoters of Met RNAi-down-regulated (LPE) genes but not in Met RNAi-up-regulated (EPE) genes. EMSAs using a combination of mutational and anti-Met antibody supershift analyses confirmed the binding properties of the Met consensus motif variants. There was a striking temporal separation of expression profiles among major functional gene groups, with carbohydrate, lipid, and xenobiotics metabolism belonging to the EPE and MPE clusters and transcription and translation to the LPE cluster. This study represents a significant advancement in the understanding of the regulation of gene expression by JH and its receptor Met during female mosquito reproduction.
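Testing whether motif variants are overrepresented near the TSS of one gene group involves two steps: scanning a fixed promoter window on both strands, then comparing hit counts against the genome. The minimal sketch below reads the 9-mer consensus as CACG[C/T]G[A/G][T/A]G. That reading is our interpretation of the abstract's alternative-base notation, an assumption. The sketch uses a one-sided hypergeometric test. The window convention (promoter written 5'->3' and ending at the TSS), the toy sequence and the test choice are assumptions, not the study's exact procedure.

```python
import math
import re

# Consensus read as CACG[C/T]G[A/G][T/A]G; the lookahead allows overlapping hits.
MET_MOTIF = re.compile(r"(?=(CACG[CT]G[AG][AT]G))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_sites(promoter, window=300):
    """Count motif hits on both strands in the last `window` bases of a promoter.

    Assumes the promoter is written 5'->3' and ends at the transcription start site."""
    region = promoter[-window:].upper()
    reverse = region.translate(COMPLEMENT)[::-1]
    return len(MET_MOTIF.findall(region)) + len(MET_MOTIF.findall(reverse))


def hypergeom_upper(k, n_set, n_with_motif, n_total):
    """P(X >= k): chance of at least k motif-bearing genes in a set of n_set genes
    drawn from n_total genes, n_with_motif of which carry the motif."""
    denom = math.comb(n_total, n_set)
    return sum(math.comb(n_with_motif, i) * math.comb(n_total - n_with_motif, n_set - i)
               for i in range(k, min(n_set, n_with_motif) + 1)) / denom


print(count_sites("AAACACGTGAAGTTT"))
print(hypergeom_upper(5, 5, 5, 10))
```

Because the motif has odd length, no single site can match on both strands, so counting both strands does not double count a palindrome.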
21
Jankovic BR, Archer JAC, Chowdhary R, Schaefer U, Bajic VB. Promoter Structures Conserved between Homo sapiens, Mus musculus and Drosophila melanogaster. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Some of the key processes in living organisms remain essentially unchanged even in evolutionarily very distant species. Transcriptional regulation is one such fundamental process that is essential for cell survival. Transcriptional control exerts a large part of its effects at the level of transcription initiation, mediated through protein-DNA interactions mainly at promoters but also at other control regions. In this chapter, the authors identify conserved families of motifs of promoter regulatory structures between Homo sapiens, Mus musculus and Drosophila melanogaster. By a promoter regulatory structure they mean a combination of motifs from identified motif families. Conservation of promoter structure across these vertebrate and invertebrate genomes suggests the presence of a fundamental promoter architecture and provides the basis for a deeper understanding of the necessary components of the transcription regulation machinery. The authors reveal the existence of families of DNA sequence motifs that are shared across all three species in upstream promoter regions. They further analyze the relevance of their findings for a better understanding of preserved regulatory mechanisms and the associated biological insights.
Affiliation(s)
- Boris R. Jankovic
- King Abdullah University of Science and Technology, Kingdom of Saudi Arabia
- John A. C. Archer
- King Abdullah University of Science and Technology, Kingdom of Saudi Arabia
- Ulf Schaefer
- King Abdullah University of Science and Technology, Kingdom of Saudi Arabia
- Vladimir B. Bajic
- King Abdullah University of Science and Technology, Kingdom of Saudi Arabia
22
Genome-wide prediction and functional validation of promoter motifs regulating gene expression in spore and infection stages of Phytophthora infestans. PLoS Pathog 2013; 9:e1003182. [PMID: 23516354 PMCID: PMC3597505 DOI: 10.1371/journal.ppat.1003182] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 09/08/2012] [Accepted: 12/20/2012] [Indexed: 01/18/2023]
Abstract
Most eukaryotic pathogens have complex life cycles in which gene expression networks orchestrate the formation of cells specialized for dissemination or host colonization. In the oomycete Phytophthora infestans, the potato late blight pathogen, major shifts in mRNA profiles during developmental transitions were identified using microarrays. We used those data with search algorithms to discover about 100 motifs that are over-represented in promoters of genes up-regulated in hyphae, sporangia, sporangia undergoing zoosporogenesis, swimming zoospores, or germinated cysts forming appressoria (infection structures). Most of the putative stage-specific transcription factor binding sites (TFBSs) thus identified had features typical of TFBSs such as position or orientation bias, palindromy, and conservation in related species. Each of six motifs tested in P. infestans transformants using the GUS reporter gene conferred the expected stage-specific expression pattern, and several were shown to bind nuclear proteins in gel-shift assays. Motifs linked to the appressoria-forming stage, including a functionally validated TFBS, were over-represented in promoters of genes encoding effectors and other pathogenesis-related proteins. To understand how promoter and genome architecture influence expression, we also mapped transcription patterns to the P. infestans genome assembly. Adjacent genes were not typically induced in the same stage, including genes transcribed in opposite directions from small intergenic regions, but co-regulated gene pairs occurred more often than expected by chance. These data help illuminate the processes regulating development and pathogenesis, and will enable future attempts to purify the cognate transcription factors.
The genus Phytophthora includes over one hundred species of plant pathogens that have devastating effects worldwide in agriculture and natural environments. Its most notorious member is P. infestans, which causes the late blight diseases of potato and tomato. Their success as pathogens depends on the formation of specialized cells for plant-to-plant transmission and host infection, but little is known about how this is regulated. Recognizing that changes in gene expression drive the formation of these cell types, we used a computational approach to predict the sequences of about one hundred transcription factor binding sites associated with expression in any of five life stages, including several types of spores and infection structures. We then used a functional testing strategy to confirm their biological activity by showing that the DNA motifs enabled the stage-specific expression of a transgene. Our work lays the groundwork for dissecting the molecular mechanisms that regulate life-stage transitions and pathogenesis in Phytophthora. A similar approach should be useful for other plant and animal pathogens.
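The abstract states that co-regulated adjacent gene pairs occur more often than expected by chance. A label-permutation test is one generic way to make that comparison. The minimal sketch below treats the genome as a single ordered list of genes with hypothetical stage labels. This is a simplifying assumption, since a real analysis would respect contig boundaries and could condition on transcription direction. The paper's actual test is not stated here.

```python
import random


def adjacent_same_stage(stages):
    """Count neighbouring genes (in genome order) that are induced in the same life stage.

    stages: one label per gene, or None for genes not induced in any stage."""
    return sum(1 for a, b in zip(stages, stages[1:]) if a is not None and a == b)


def permutation_pvalue(stages, n_perm=10_000, seed=0):
    """Observed count and empirical P(count >= observed) under shuffled stage labels."""
    rng = random.Random(seed)
    observed = adjacent_same_stage(stages)
    shuffled = list(stages)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if adjacent_same_stage(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy gene order with hypothetical stage labels.
stages = ["sporangia"] * 10 + ["cyst"] * 10 + [None] * 20
print(permutation_pvalue(stages, n_perm=2000))
```

Adding one to both numerator and denominator keeps the empirical p-value strictly positive, as is usual for permutation tests.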
23
Wenger AM, Clarke SL, Guturu H, Chen J, Schaar BT, McLean CY, Bejerano G. PRISM offers a comprehensive genomic approach to transcription factor function prediction. Genome Res 2013; 23:889-904. [PMID: 23382538 PMCID: PMC3638144 DOI: 10.1101/gr.139071.112] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 02/07/2023]
Abstract
The human genome encodes 1500–2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
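PRISM reports its predictions at a false discovery rate of 16%, using a harsh null to minimize false positives. One generic way to estimate such an FDR is to count how many predictions a null run (for example, with shuffled motifs) passes at the same score threshold. The minimal sketch below shows only that counting step. The score lists and null_scale are hypothetical, and PRISM's actual null construction is not reproduced.

```python
def empirical_fdr(real_scores, null_scores, threshold, null_scale=1.0):
    """Estimated FDR at a threshold: null calls (expected false positives) / real calls.

    null_scale rescales the null count if the null run is larger or smaller than the real one."""
    n_real = sum(1 for s in real_scores if s >= threshold)
    n_null = null_scale * sum(1 for s in null_scores if s >= threshold)
    return min(1.0, n_null / n_real) if n_real else 0.0


def threshold_for_fdr(real_scores, null_scores, target, null_scale=1.0):
    """Lowest observed score that, used as a threshold, gives an estimated FDR <= target."""
    for t in sorted(set(real_scores)):
        if empirical_fdr(real_scores, null_scores, t, null_scale) <= target:
            return t
    return None


real = list(range(1, 11))  # toy prediction scores
null = [1, 2, 3]           # toy scores from a null run of the same size
print(threshold_for_fdr(real, null, target=0.16))
```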
Affiliation(s)
- Aaron M Wenger
- Department of Computer Science, Stanford University, Stanford, California 94305, USA
24
Ha N, Polychronidou M, Lohmann I. COPS: detecting co-occurrence and spatial arrangement of transcription factor binding motifs in genome-wide datasets. PLoS One 2012; 7:e52055. [PMID: 23272209 PMCID: PMC3525548 DOI: 10.1371/journal.pone.0052055] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 09/19/2012] [Accepted: 11/12/2012] [Indexed: 11/18/2022]
Abstract
In multi-cellular organisms, spatiotemporal activity of cis-regulatory DNA elements depends on their occupancy by different transcription factors (TFs). In recent years, genome-wide ChIP-on-Chip, ChIP-Seq and DamID assays have been extensively used to unravel the combinatorial interaction of TFs with cis-regulatory modules (CRMs) in the genome. Even though genome-wide binding profiles are increasingly becoming available for different TFs, single TF binding profiles are in most cases not sufficient for dissecting complex regulatory networks. Thus, potent computational tools detecting statistically significant and biologically relevant TF-motif co-occurrences in genome-wide datasets are essential for analyzing context-dependent transcriptional regulation. We have developed COPS (Co-Occurrence Pattern Search), a new bioinformatics tool based on a combination of association rules and Markov chain models, which detects co-occurring TF binding sites (BSs) on genomic regions of interest. COPS scans DNA sequences for frequent motif patterns using a Frequent-Pattern tree based data mining approach, which allows efficient performance of the software with respect to both data structure and implementation speed, in particular when mining large datasets. Since transcriptional gene regulation very often relies on the formation of regulatory protein complexes mediated by closely adjoining TF binding sites on CRMs, COPS additionally detects preferred short distances between co-occurring TF motifs. The performance of our software with respect to biological significance was evaluated using three published datasets containing genomic regions that are independently bound by several TFs involved in a defined biological process. In sum, COPS is a fast, efficient and user-friendly tool for mining statistically and biologically significant TFBS co-occurrences and therefore allows the identification of TFs that combinatorially regulate gene expression.
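COPS mines co-occurring TF binding sites with a Frequent-Pattern tree and a Markov chain background, and it also reports preferred short distances between motifs. The brute-force sketch below illustrates only the quantities involved. These are the support and confidence of motif pairs, as in association rules, and the distribution of spacings between two motifs. It does not use an FP-tree or a background model, and the motif names M1 to M3 and the hit positions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_support(regions):
    """Fraction of regions containing each unordered motif pair (association-rule support).

    regions: one list of (motif_name, position) hits per genomic region."""
    counts = Counter()
    for hits in regions:
        names = sorted({name for name, _ in hits})
        counts.update(combinations(names, 2))
    return {pair: n / len(regions) for pair, n in counts.items()}


def confidence(regions, a, b):
    """Confidence of the rule a -> b: fraction of regions with a that also contain b."""
    with_a = [{n for n, _ in hits} for hits in regions if any(n == a for n, _ in hits)]
    return sum(1 for names in with_a if b in names) / len(with_a) if with_a else 0.0


def spacing_histogram(regions, a, b, max_dist=50):
    """Counter of the smallest a-to-b distance per region, keeping distances <= max_dist."""
    hist = Counter()
    for hits in regions:
        pos_a = [p for n, p in hits if n == a]
        pos_b = [p for n, p in hits if n == b]
        if pos_a and pos_b:
            d = min(abs(x - y) for x in pos_a for y in pos_b)
            if d <= max_dist:
                hist[d] += 1
    return hist


regions = [
    [("M1", 10), ("M2", 18)],
    [("M1", 5), ("M2", 13), ("M3", 40)],
    [("M1", 7)],
    [("M2", 3)],
]
print(pair_support(regions)[("M1", "M2")], confidence(regions, "M1", "M2"))
print(spacing_histogram(regions, "M1", "M2"))
```

A peak in the spacing histogram (here a spacing of 8) is the kind of preferred short distance that would hint at cooperative binding.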
Affiliation(s)
- Nati Ha
- Centre for Organismal Studies (COS) Heidelberg, University of Heidelberg, Heidelberg and CellNetworks – Cluster of Excellence Germany, Heidelberg, Germany
- Maria Polychronidou
- Centre for Organismal Studies (COS) Heidelberg, University of Heidelberg, Heidelberg and CellNetworks – Cluster of Excellence Germany, Heidelberg, Germany
- Ingrid Lohmann
- Centre for Organismal Studies (COS) Heidelberg, University of Heidelberg, Heidelberg and CellNetworks – Cluster of Excellence Germany, Heidelberg, Germany
25
Blanco E, Corominas M. CBS: an open platform that integrates predictive methods and epigenetics information to characterize conserved regulatory features in multiple Drosophila genomes. BMC Genomics 2012; 13:688. [PMID: 23228284 PMCID: PMC3564944 DOI: 10.1186/1471-2164-13-688] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/24/2012] [Accepted: 11/28/2012] [Indexed: 12/11/2022]
Abstract
Background Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise required to manipulate them, which may constitute a barrier for the experimental community. Results CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated with transcriptionally active regions and other experimental sources of information. Rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.
Affiliation(s)
- Enrique Blanco
- Departament de Genètica and Institut de Biomedicina (IBUB), Universitat de Barcelona, Av. Diagonal 643, 08028, Barcelona, Spain.
26
AlQuraishi M, McAdams HH. Three enhancements to the inference of statistical protein-DNA potentials. Proteins 2012; 81:426-42. [PMID: 23042633 DOI: 10.1002/prot.24201] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/18/2012] [Revised: 09/17/2012] [Accepted: 10/02/2012] [Indexed: 12/28/2022]
Abstract
The energetics of protein-DNA interactions are often modeled using so-called statistical potentials, that is, energy models derived from the atomic structures of protein-DNA complexes. Many statistical protein-DNA potentials based on differing theoretical assumptions have been investigated, but little attention has been paid to the types of data and the parameter estimation process used in deriving the statistical potentials. We describe three enhancements to statistical potential inference that significantly improve the accuracy of predicted protein-DNA interactions: (i) incorporation of binding energy data of protein-DNA complexes, in conjunction with their X-ray crystal structures, (ii) use of spatially-aware parameter fitting, and (iii) use of ensemble-based parameter fitting. We apply these enhancements to three widely-used statistical potentials and use the resulting enhanced potentials in a structure-based prediction of the DNA binding sites of proteins. These enhancements are directly applicable to all statistical potentials used in protein-DNA modeling, and we show that they can improve the accuracy of predicted DNA binding sites by up to 21%.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, California 94305, USA
27
Todt TJ, Wels M, Bongers RS, Siezen RS, van Hijum SAFT, Kleerebezem M. Genome-wide prediction and validation of sigma70 promoters in Lactobacillus plantarum WCFS1. PLoS One 2012; 7:e45097. [PMID: 23028780 PMCID: PMC3447810 DOI: 10.1371/journal.pone.0045097] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 03/26/2012] [Accepted: 08/14/2012] [Indexed: 11/18/2022]
Abstract
BACKGROUND In prokaryotes, sigma factors are essential for directing the transcription machinery towards promoters. Various sigma factors have been described that recognize and bind to specific DNA sequence motifs in promoter sequences. The canonical sigma factor σ(70) is commonly involved in transcription of the cell's housekeeping genes, which is mediated by the conserved σ(70) promoter sequence motifs. In this study the σ(70)-promoter sequences in Lactobacillus plantarum WCFS1 were predicted using a genome-wide analysis. The accuracy of the transcriptionally active part of this promoter prediction was subsequently evaluated by correlating locations of predicted promoters with transcription start sites inferred from the 5'-ends of transcripts detected by high-resolution tiling array transcriptome datasets. RESULTS To identify σ(70)-related promoter sequences, we performed a genome-wide sequence motif scan of the L. plantarum WCFS1 genome focussing on the regions upstream of protein-encoding genes. We obtained several highly conserved motifs, including those resembling the conserved σ(70)-promoter consensus. Position weight matrix-based models of the recovered σ(70)-promoter sequence motif were employed to identify 3874 motifs with significant similarity (p-value < 10⁻⁴) to the model motif in the L. plantarum genome. Genome-wide transcript information deduced from whole-genome tiling-array transcriptome datasets was used to infer transcription start sites (TSSs) from the 5'-end of transcripts. By this procedure, 1167 putative TSSs were identified that were used to corroborate the transcriptionally active fraction of these predicted promoters. In total, 568 predicted promoters were found in proximity (≤ 40 nucleotides) of the putative TSSs, showing a highly significant co-occurrence of predicted promoter and TSS (p-value < 10⁻²⁶³).
CONCLUSIONS High-resolution tiling arrays provide a suitable source to infer TSSs at a genome-wide level, and allow experimental verification of in silico predicted promoter sequence motifs.
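The prediction-then-validation logic described above can be sketched as a log-odds position weight matrix (PWM) scan followed by a proximity check against inferred TSSs. The count matrix, sequence, score threshold and TSS coordinate below are hypothetical placeholders; a fixed score cutoff stands in for the p-value threshold used in the study, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log2-odds scoring matrix."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def scan(seq, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(w))
        if score >= threshold:
            hits.append((i, round(score, 2)))
    return hits

def near_tss(hits, tss_positions, max_dist=40):
    """Keep hits whose start lies within max_dist nt of any TSS."""
    return [h for h in hits if any(abs(h[0] - t) <= max_dist for t in tss_positions)]

# Hypothetical 4-column count matrix for a short TATA-like motif
counts = [{"T": 9, "A": 1}, {"A": 10}, {"T": 8, "A": 2}, {"A": 9, "T": 1}]
pwm = log_odds_pwm(counts)
seq = "GCGC" + "TATA" + "GC" * 30 + "TATA" + "GC"  # hypothetical sequence
hits = scan(seq, pwm, threshold=5.0)
print("all hits:", hits)
print("hits within 40 nt of a TSS:", near_tss(hits, tss_positions=[14]))
```

Both motif copies pass the score threshold, but only the one near the hypothetical TSS survives the proximity filter, which mirrors how transcript 5'-ends are used to separate transcriptionally active predictions from the rest.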
Affiliation(s)
- Tilman J. Todt
- Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
- HAN University of Applied Sciences, Institute of Applied Sciences, Nijmegen, The Netherlands
- Michiel Wels
- Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
- NIZO food research, Ede, The Netherlands
- TI Food and Nutrition, Wageningen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
- Roger S. Bongers
- NIZO food research, Ede, The Netherlands
- TI Food and Nutrition, Wageningen, The Netherlands
- Roland S. Siezen
- Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
- HAN University of Applied Sciences, Institute of Applied Sciences, Nijmegen, The Netherlands
- NIZO food research, Ede, The Netherlands
- TI Food and Nutrition, Wageningen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
- Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Sacha A. F. T. van Hijum
- Center for Molecular and Biomolecular Informatics, Nijmegen Center for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
- NIZO food research, Ede, The Netherlands
- TI Food and Nutrition, Wageningen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
- Netherlands Bioinformatics Centre, Nijmegen, The Netherlands
- Michiel Kleerebezem
- NIZO food research, Ede, The Netherlands
- TI Food and Nutrition, Wageningen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, Delft, The Netherlands
- Wageningen University, Host Microbe Interactomics Group, Wageningen, The Netherlands
28
Katzenberger RJ, Rach EA, Anderson AK, Ohler U, Wassarman DA. The Drosophila Translational Control Element (TCE) is required for high-level transcription of many genes that are specifically expressed in testes. PLoS One 2012; 7:e45009. [PMID: 22984601 PMCID: PMC3439415 DOI: 10.1371/journal.pone.0045009] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 05/15/2012] [Accepted: 08/11/2012] [Indexed: 11/19/2022]
Abstract
To investigate the importance of core promoter elements for tissue-specific transcription of RNA polymerase II genes, we examined testis-specific transcription in Drosophila melanogaster. Bioinformatic analyses of core promoter sequences from 190 genes that are specifically expressed in testes identified a 10 bp A/T-rich motif that is identical to the translational control element (TCE). The TCE functions in the 5′ untranslated region of Mst(3)CGP mRNAs to repress translation, and it also functions in a heterologous gene to regulate transcription. We found that among genes with focused initiation patterns, the TCE is significantly enriched in core promoters of genes that are specifically expressed in testes but not in core promoters of genes that are specifically expressed in other tissues. The TCE is variably located in core promoters and is conserved in melanogaster subgroup species, but conservation drops dramatically in more distant species. In transgenic flies, short (300–400 bp) genomic regions containing a TCE directed testis-specific transcription of a reporter gene. Mutation of the TCE significantly reduced but did not abolish reporter gene transcription, indicating that the TCE is important but not essential for transcription activation. Finally, mutation of testis-specific TFIID (tTFIID) subunits significantly reduced the transcription of a subset of endogenous TCE-containing but not TCE-lacking genes, suggesting that tTFIID activity is limited to TCE-containing genes but that tTFIID is not an obligatory regulator of TCE-containing genes. Thus, the TCE is a core promoter element in a subset of genes that are specifically expressed in testes. Furthermore, the TCE regulates transcription in the context of short genomic regions, from variable locations in the core promoter, and both dependently and independently of tTFIID.
These findings set the stage for determining the mechanism by which the TCE regulates testis-specific transcription and understanding the dual role of the TCE in translational and transcriptional regulation.
Affiliation(s)
- Rebeccah J. Katzenberger
- University of Wisconsin School of Medicine and Public Health, Department of Cell and Regenerative Biology, Madison, Wisconsin, United States of America
- Elizabeth A. Rach
- Institute for Genome Sciences and Policy, Departments of Biostatistics and Bioinformatics and Computer Science, Duke University, Durham, North Carolina, United States of America
- Ashley K. Anderson
- University of Wisconsin School of Medicine and Public Health, Department of Cell and Regenerative Biology, Madison, Wisconsin, United States of America
- Uwe Ohler
- Institute for Genome Sciences and Policy, Departments of Biostatistics and Bioinformatics and Computer Science, Duke University, Durham, North Carolina, United States of America
- David A. Wassarman
- University of Wisconsin School of Medicine and Public Health, Department of Cell and Regenerative Biology, Madison, Wisconsin, United States of America
29
Wilding CS, Smith I, Lynd A, Yawson AE, Weetman D, Paine MJI, Donnelly MJ. A cis-regulatory sequence driving metabolic insecticide resistance in mosquitoes: functional characterisation and signatures of selection. Insect Biochem Mol Biol 2012; 42:699-707. [PMID: 22732326 DOI: 10.1016/j.ibmb.2012.06.003] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 04/02/2012] [Revised: 06/13/2012] [Accepted: 06/13/2012] [Indexed: 06/01/2023]
Abstract
Although cytochrome P450 (CYP450) enzymes are frequently up-regulated in mosquitoes resistant to insecticides, no regulatory motifs driving these expression differences with relevance to wild populations have been identified. Transposable elements (TEs) are often enriched upstream of those CYP450s involved in insecticide resistance, leading to the assumption that they contribute regulatory motifs that directly underlie the resistance phenotype. A partial CuRE1 (Culex Repetitive Element 1) transposable element is found directly upstream of CYP9M10, a cytochrome P450 implicated previously in larval resistance to permethrin in the ISOP450 strain of Culex quinquefasciatus, but is absent from the equivalent genomic region of a susceptible strain. Via expression of CYP9M10 in Escherichia coli we have now demonstrated time- and NADPH-dependent permethrin metabolism, prerequisites for confirmation of a role in metabolic resistance, and through qPCR shown that CYP9M10 is >20-fold over-expressed in ISOP450 compared to a susceptible strain. In a fluorescent reporter assay the region upstream of CYP9M10 from ISOP450 drove 10× expression compared to the equivalent region (lacking CuRE1) from the susceptible strain. Close correspondence with the gene expression fold-change implicates the upstream region including CuRE1 as a cis-regulatory element involved in resistance. Only a single CuRE1-bearing allele, identical to the CuRE1-bearing allele in the resistant strain, is found throughout sub-Saharan Africa, in contrast to the diversity encountered in non-CuRE1 alleles. This suggests a single origin and subsequent spread due to selective advantage. CuRE1 is detectable using a simple diagnostic. When applied to C. quinquefasciatus larvae from Ghana, we have demonstrated a significant association with permethrin resistance in multiple field sites (mean odds ratio = 3.86), suggesting this marker has relevance to natural populations of vector mosquitoes.
However, when CuRE1 was excised from the allele used in the reporter assay through fusion PCR, expression was unaffected, indicating that the TE has no direct role in resistance and hence that CuRE1 is acting only as a marker of an as yet unidentified regulatory motif in the association analysis. This suggests that a re-evaluation of the assumption that TEs contribute regulatory motifs involved in gene expression may be necessary.
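The field association reported above rests on the odds ratio from a 2×2 table of marker status against resistance phenotype. A minimal sketch with hypothetical counts for a single site, using Woolf's log-based 95% confidence interval; how the per-site estimates were combined into the reported mean is not described here, so no pooling is attempted.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: marker present, resistant    b: marker present, susceptible
    c: marker absent, resistant     d: marker absent, susceptible
    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, (lo, hi)

# Hypothetical counts for one field site
or_, ci = odds_ratio_woolf(a=30, b=10, c=20, d=25)
print(f"OR = {or_:.2f}, 95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
```

An odds ratio above 1 whose interval excludes 1 indicates that resistant larvae carry the marker more often than susceptible ones; as the abstract stresses, such an association alone cannot show that the marker itself is causal.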
Affiliation(s)
- Craig S Wilding
- Vector Group, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK.
30
Herrmann C, Van de Sande B, Potier D, Aerts S. i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules. Nucleic Acids Res 2012; 40:e114. [PMID: 22718975 PMCID: PMC3424583 DOI: 10.1093/nar/gks543] [Citation(s) in RCA: 137] [Impact Index Per Article: 10.5] [Indexed: 01/08/2023]
Abstract
The field of regulatory genomics today is characterized by the generation of high-throughput data sets that capture genome-wide transcription factor (TF) binding, histone modifications, or DNase I hypersensitive regions across many cell types and conditions. In this context, a critical question is how to make optimal use of these publicly available datasets when studying transcriptional regulation. Here, we address this question in Drosophila melanogaster, for which a large number of high-throughput regulatory datasets are available. We developed i-cisTarget (where the 'i' stands for integrative), for the first time enabling the discovery of different types of enriched 'regulatory features' in a set of co-regulated sequences in one analysis, being either TF motifs or 'in vivo' chromatin features, or combinations thereof. We have validated our approach on 15 co-expressed gene sets, 21 ChIP data sets, 628 curated gene sets and multiple individual case studies, and show that meaningful regulatory features can be confidently discovered; that bona fide enhancers can be identified, both by in vivo events and by TF motifs; and that combinations of in vivo events and TF motifs further increase the performance of enhancer prediction.
Affiliation(s)
- Carl Herrmann
- TAGC - Inserm U1090 and Aix-Marseille Université, Campus de Luminy, 13288 Marseille, France.
31
Haslam NJ, Shields DC. Profile-based short linear protein motif discovery. BMC Bioinformatics 2012; 13:104. [PMID: 22607209 PMCID: PMC3534220 DOI: 10.1186/1471-2105-13-104] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/14/2011] [Accepted: 04/04/2012] [Indexed: 01/24/2023]
Abstract
Background Short linear protein motifs are attracting increasing attention as functionally independent sites, typically 3–10 amino acids in length, that are enriched in disordered regions of proteins. Multiple methods have recently been proposed to discover over-represented motifs within a set of proteins based on simple regular expressions. Here, we extend these approaches to profile-based methods, which provide a richer motif representation. Results The profile motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy amongst homologous proteins, and masked out poorly conserved regions of disordered proteins, the performance of MEME was equivalent to that of regular expression methods. Nevertheless, the two approaches returned different subsets within both a benchmark dataset and a more realistic discovery dataset. Conclusions Profile-based motif discovery methods complement regular expression based methods. Whilst profile-based methods are computationally more intensive, they are likely to discover motifs currently overlooked by regular expression methods.
Affiliation(s)
- Niall J Haslam
- Complex and Adaptive Systems Laboratory, University College Dublin, Ireland
32
Potier D, Atak ZK, Sanchez MN, Herrmann C, Aerts S. Using cisTargetX to predict transcriptional targets and networks in Drosophila. Methods Mol Biol 2012; 786:291-314. [PMID: 21938634 DOI: 10.1007/978-1-61779-292-2_18] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/23/2022]
Abstract
Gene expression regulation is a fundamental biological process that drives organismal development by controlling processes such as cell type specification and differentiation. The accuracy of this process is governed by transcription factors (TFs) acting within a complex gene regulatory network. CisTargetX has been developed to enable a user to predict TFs, enhancers, and target genes involved in the regulation of co-expressed genes. It uses a strategy that incorporates the genome-wide prediction of clusters of transcription factor binding sites (TFBSs), starting from a large, unbiased collection of position weight matrices (PWMs), and uses comparative genomics criteria to filter potential TFBSs. In this chapter we describe, step by step, how to use cisTargetX, starting from a set of genes or TF(s), to predict transcriptional targets with their putative binding sites and networks in Drosophila. Next, we illustrate this approach on a particular developmental system, namely sensory organ development, and identify relevant TFs, DNA regions regulating gene expression, and TF/target gene interactions. CisTargetX is available at http://med.kuleuven.be/lcb/cisTargetX .
Affiliation(s)
- Delphine Potier
- TAGC Inserm U928 and Université de la Mediterranée, Marseille, France
33
Gruel J, LeBorgne M, LeMeur N, Théret N. Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns. BMC Bioinformatics 2011; 12:365. [PMID: 21910886 PMCID: PMC3215511 DOI: 10.1186/1471-2105-12-365] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/30/2010] [Accepted: 09/12/2011] [Indexed: 01/07/2023]
Abstract
Background Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
Affiliation(s)
- Jérémy Gruel
- EA 4427 SeRAIC IFR140, Université de Rennes 1, 2 avenue du Pr. Léon Bernard, Rennes 35043, France.
34
Balakirev ES, Anisimova M, Ayala FJ. Complex interplay of evolutionary forces in the ladybird homeobox genes of Drosophila melanogaster. PLoS One 2011; 6:e22613. [PMID: 21799919 PMCID: PMC3142176 DOI: 10.1371/journal.pone.0022613] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 02/07/2011] [Accepted: 06/29/2011] [Indexed: 11/19/2022]
Abstract
Tandemly arranged paralogous genes lbe and lbl are members of the Drosophila NK homeobox family. We analyzed population samples of Drosophila melanogaster from Africa, Europe, North and South America, and single strains of D. sechellia, D. simulans, and D. yakuba within two linked regions encompassing partial sequences of lbe and lbl. The evolution of lbe and lbl is highly constrained due to their important regulatory functions. Despite this, a variety of forces have shaped the patterns of variation in lb genes: recombination, intragenic gene conversion and natural selection strongly influence background variation created by linkage disequilibrium and dimorphic haplotype structure. The two genes exhibited similar levels of nucleotide diversity and positive selection was detected in the noncoding regions of both genes. However, synonymous variability was significantly higher for lbe: no nonsynonymous changes were observed in this gene. We argue that balancing selection impacts some synonymous sites of the lbe gene. Stability of mRNA secondary structure was significantly different between the lbe (but not lbl) haplotype groups and may represent a driving force of balancing selection in epistatically interacting synonymous sites. Balancing selection on synonymous sites may be the first, or one of a few such observations, in Drosophila. In contrast, recurrent positive selection on lbl at the protein level influenced evolution at three codon sites. Transcription factor binding-site profiles were different for lbe and lbl, suggesting that their developmental functions are not redundant. Combined with our previous results on nucleotide variation in esterase and other homeobox genes, these results suggest that interplay of balancing and directional selection may be a general feature of molecular evolution in Drosophila and other eukaryote genomes.
Affiliation(s)
- Evgeniy S Balakirev
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America.
35
He BZ, Holloway AK, Maerkl SJ, Kreitman M. Does positive selection drive transcription factor binding site turnover? A test with Drosophila cis-regulatory modules. PLoS Genet 2011; 7:e1002053. [PMID: 21572512 PMCID: PMC3084208 DOI: 10.1371/journal.pgen.1002053] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 11/22/2010] [Accepted: 03/02/2011] [Indexed: 12/04/2022]
Abstract
Transcription factor binding site (TFBS) gain and loss (i.e., turnover) is a well-documented feature of cis-regulatory module (CRM) evolution, yet little attention has been paid to the evolutionary forces driving this turnover process. The predominant view, motivated by its widespread occurrence, emphasizes the importance of compensatory mutation and genetic drift. Positive selection, in contrast, although it has been invoked in specific instances of adaptive gene expression evolution, has not been considered as a general alternative to neutral compensatory evolution. In this study we evaluate the two hypotheses by analyzing patterns of single nucleotide polymorphism in the TFBSs of well-characterized CRMs in two closely related Drosophila species, Drosophila melanogaster and Drosophila simulans. An important feature of the analysis is the classification of TFBS mutations according to the direction of their predicted effect on binding affinity, which allows gains and losses to be evaluated independently along the two phylogenetic lineages. The observed patterns of polymorphism and divergence are not compatible with neutral evolution for either class of mutations. Instead, multiple lines of evidence are consistent with contributions of positive selection to TFBS gain and loss as well as purifying selection in its maintenance. In the discussion, we propose a model to reconcile the finding of selection driving TFBS turnover with constrained CRM function over long evolutionary time.
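The classification step, assigning each TFBS mutation a direction according to its predicted effect on binding affinity, can be illustrated by scoring the reference and alternative alleles of a site with a log-odds matrix and taking the difference. This is a sketch, not the authors' pipeline: the matrix values, the sites and the neutrality margin `eps` are hypothetical.

```python
# Hypothetical log-odds matrix for a 4-bp site (one dict per position)
PWM = [
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.8, "G": -1.2, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.7, "T": -0.8},
    {"A": -0.5, "C": -1.0, "G": -1.0, "T": 1.4},
]

def site_score(site):
    """Sum the per-position log-odds scores of a site the width of the PWM."""
    if len(site) != len(PWM):
        raise ValueError("site length must match PWM width")
    return sum(col[base] for col, base in zip(PWM, site))

def classify_mutation(ref_site, pos, alt_base, eps=0.1):
    """Label a single-base change as 'gain', 'loss' or 'neutral' by score change."""
    alt_site = ref_site[:pos] + alt_base + ref_site[pos + 1:]
    delta = site_score(alt_site) - site_score(ref_site)
    if delta > eps:
        return "gain", delta
    if delta < -eps:
        return "loss", delta
    return "neutral", delta

print(classify_mutation("ACGT", 2, "T"))  # G->T at a high-weight position
print(classify_mutation("ACTT", 2, "G"))  # the reverse change
```

Tallying such labels separately for polymorphisms and fixed differences on each lineage is what lets gains and losses be compared with neutral expectations; that statistical comparison is not shown here.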
Affiliation(s)
- Bin Z He
- Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois, USA.
36
Murray MJ, Saini HK, van Dongen S, Palmer RD, Muralidhar B, Pett MR, Piipari M, Thornton CM, Nicholson JC, Enright AJ, Coleman N. The two most common histological subtypes of malignant germ cell tumour are distinguished by global microRNA profiles, associated with differential transcription factor expression. Mol Cancer 2010; 9:290. [PMID: 21059207 PMCID: PMC2993676 DOI: 10.1186/1476-4598-9-290] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 04/27/2010] [Accepted: 11/08/2010] [Indexed: 11/29/2022]
Abstract
BACKGROUND We hypothesised that differences in microRNA expression profiles contribute to the contrasting natural history and clinical outcome of the two most common types of malignant germ cell tumour (GCT), yolk sac tumours (YSTs) and germinomas. RESULTS By direct comparison, using microarray data for paediatric GCT samples and published qRT-PCR data for adult samples, we identified microRNAs significantly up-regulated in YSTs (n = 29 paediatric, 26 adult, 11 overlapping) or germinomas (n = 37 paediatric). By Taqman qRT-PCR we confirmed differential expression of 15 of 16 selected microRNAs and further validated six of these (miR-302b, miR-375, miR-200b, miR-200c, miR-122, miR-205) in an independent sample set. Interestingly, the miR-302 cluster, which is over-expressed in all malignant GCTs, showed further over-expression in YSTs versus germinomas, representing six of the top eight microRNAs over-expressed in paediatric YSTs and seven of the top 11 in adult YSTs. To explain this observation, we used mRNA expression profiles of paediatric and adult malignant GCTs to identify 10 transcription factors (TFs) consistently over-expressed in YSTs versus germinomas, followed by linear regression to confirm associations between TF and miR-302 cluster expression levels. Using the sequence motif analysis environment iMotifs, we identified predicted binding sites for four of the 10 TFs (GATA6, GATA3, TCF7L2 and MAF) in the miR-302 cluster promoter region. Finally, we showed that miR-302 family over-expression in YST is likely to be functionally significant, as mRNAs down-regulated in YSTs were enriched for 3' untranslated region sequences complementary to the common seed of miR-302a~miR-302d. Such mRNAs included mediators of key cancer-associated processes, including tumour suppressor genes, apoptosis regulators and TFs. 
CONCLUSIONS Differential microRNA expression is likely to contribute to the relatively aggressive behaviour of YSTs and may enable future improvements in clinical diagnosis and/or treatment.
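The seed-complementarity test mentioned above can be sketched as follows: take the miRNA seed (conventionally nucleotides 2 to 8 from the 5' end), reverse-complement it, and search each 3' UTR for the resulting site. The miRNA and UTR sequences below are made-up placeholders, not miR-302 family or real transcript sequences, and only exact 7-mer matches are counted.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna, start=2, end=8):
    """Return the mRNA site complementary to miRNA positions start..end (1-based)."""
    seed = mirna[start - 1:end]
    # Complement, then reverse, so the site reads 5'->3' on the mRNA.
    return seed.translate(COMPLEMENT)[::-1]

def count_sites(utr, site):
    """Count (possibly overlapping) exact occurrences of site in a 3' UTR."""
    k = len(site)
    return sum(1 for i in range(len(utr) - k + 1) if utr[i:i + k] == site)

mirna = "UAGCUUAGGCAAUCCGAUCGAUU"  # hypothetical miRNA, not a real family member
site = seed_site(mirna)
utrs = {  # hypothetical 3' UTR fragments
    "gene_x": "AAAUCCUAAGCGGGCUAAGCUUU",
    "gene_y": "GGGGCCCCAAAAUUUU",
}
for name, utr in utrs.items():
    print(name, site, count_sites(utr, site))
```

In an enrichment analysis like the one described, the question is whether down-regulated transcripts carry such sites more often than a background set; that comparison is left out of this sketch.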
Affiliation(s)
- Matthew J Murray
- Medical Research Council Cancer Cell Unit, Cambridge, CB2 0XZ, UK
- Harpreet K Saini
- EMBL-European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Stijn van Dongen
- EMBL-European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Roger D Palmer
- Medical Research Council Cancer Cell Unit, Cambridge, CB2 0XZ, UK
- Mark R Pett
- Medical Research Council Cancer Cell Unit, Cambridge, CB2 0XZ, UK
- Matias Piipari
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK
- Claire M Thornton
- Department of Pathology, Royal Group of Hospitals Trust, Belfast, UK
- James C Nicholson
- Department of Paediatric Haematology and Oncology, Addenbrooke's Hospital, Cambridge, CB2 0QQ, UK
- Anton J Enright
- EMBL-European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Nicholas Coleman
- Medical Research Council Cancer Cell Unit, Cambridge, CB2 0XZ, UK
- Department of Pathology, University of Cambridge, CB2 1QP, UK
37
Wilczyński B, Furlong EEM. Dynamic CRM occupancy reflects a temporal map of developmental progression. Mol Syst Biol 2010; 6:383. [PMID: 20571532 PMCID: PMC2913398 DOI: 10.1038/msb.2010.35] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 12/18/2009] [Accepted: 04/30/2010] [Indexed: 02/07/2023]
Abstract
Development is driven by tightly coordinated spatio-temporal patterns of gene expression, which are initiated through the action of transcription factors (TFs) binding to cis-regulatory modules (CRMs). Although many studies have investigated how spatial patterns arise, precise temporal control of gene expression is less well understood. Here, we show that dynamic changes in the timing of CRM occupancy are a prevalent feature common to all TFs examined to date in a developmental ChIP time course. CRMs exhibit complex binding patterns that cannot be explained by the sequence motifs or expression of the TFs themselves. The temporal changes in TF binding are highly correlated with dynamic patterns of target gene expression, which in turn reflect transitions in cellular function during different stages of development. Thus, it is not only the timing of a TF's expression, but also its temporal occupancy in refined time windows, that determines temporal gene expression. Systematic measurement of dynamic CRM occupancy may therefore serve as a powerful method to decode dynamic changes in gene expression driving developmental progression.
Collapse
Affiliation(s)
- Bartek Wilczyński
- Department of Genome Biology, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
38
Bernard F, Krejci A, Housden B, Adryan B, Bray SJ. Specificity of Notch pathway activation: twist controls the transcriptional output in adult muscle progenitors. Development 2010; 137:2633-42. [PMID: 20610485 PMCID: PMC2910383 DOI: 10.1242/dev.053181] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Accepted: 05/27/2010] [Indexed: 11/20/2022]
Abstract
Cell-cell signalling mediated by Notch regulates many different developmental and physiological processes and is involved in a variety of human diseases. Activation of Notch impinges directly on gene expression through the Suppressor of Hairless [Su(H)] DNA-binding protein. A major question that remains to be elucidated is how the same Notch signalling pathway can result in different transcriptional responses depending on the cellular context and environment. Here, we have investigated the factors required to confer this specific response in Drosophila adult myogenic progenitor-related cells. Our analysis identifies Twist (Twi) as a crucial co-operating factor. Enhancers from several direct Notch targets require a combination of Twi and Notch activities for expression in vivo; neither alone is sufficient. Twi is bound at target enhancers prior to Notch activation and enhances Su(H) binding to these regulatory regions. To determine the breadth of the combinatorial regulation we mapped Twi occupancy genome-wide in DmD8 myogenic progenitor-related cells by chromatin immunoprecipitation. Comparing the sites bound by Su(H) and by Twi in these cells revealed a strong association, identifying a large spectrum of co-regulated genes. We conclude that Twi is an essential Notch co-regulator in myogenic progenitor cells and has the potential to confer specificity on Notch signalling at over 170 genes, showing that a single factor can have a profound effect on the output of the pathway.
Affiliation(s)
- Fred Bernard
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Alena Krejci
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Ben Housden
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Boris Adryan
- Cambridge Systems Biology Centre and Department of Genetics, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
- Sarah J. Bray
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
39
Piipari M, Down TA, Hubbard TJ. Metamotifs--a generative model for building families of nucleotide position weight matrices. BMC Bioinformatics 2010; 11:348. [PMID: 20579334 PMCID: PMC2906491 DOI: 10.1186/1471-2105-11-348] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/04/2010] [Accepted: 06/25/2010] [Indexed: 11/25/2022]
Abstract
Background Development of high-throughput methods for measuring DNA interactions of transcription factors, together with computational advances in short motif inference algorithms, is expanding our understanding of transcription factor binding site motifs. The consequent growth of sequence motif data sets makes it important to systematically group and categorise regulatory motifs. It has been shown that there are familial tendencies in DNA sequence motifs that are predictive of the family of factors that binds them. Further development of methods that detect and describe familial motif trends could help in measuring the similarity of novel computational motif predictions to previously known data, and in sensitively detecting regulatory motifs similar to previously known ones in novel sequences. Results We propose a probabilistic model for position weight matrix (PWM) sequence motif families. The model, which we call the 'metamotif', describes recurring familial patterns in a set of motifs. The metamotif framework models variation within a family of sequence motifs. It allows for simultaneous estimation of a series of independent metamotifs from input PWM motif data and does not assume that all input motif columns contribute to a familial pattern. We describe an algorithm for inferring metamotifs from weight matrix data. We then demonstrate the use of the model in two practical tasks: in the Bayesian NestedMICA model inference algorithm as a PWM prior to enhance motif inference sensitivity, and in a motif classification task where motifs are labelled according to their interacting DNA binding domain. Conclusions We show that metamotifs can be used as PWM priors in the NestedMICA motif inference algorithm to dramatically increase the sensitivity of motif inference. Metamotifs were also successfully applied to a motif classification problem where sequence motif features were used to predict the family of protein DNA binding domains that would interact with each motif. The metamotif-based classifier is shown to compare favourably to previous related methods. The metamotif has great potential for further use in machine learning tasks, especially those related to de novo computational sequence motif inference. The metamotif methods presented have been incorporated into the NestedMICA suite.
Affiliation(s)
- Matias Piipari
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.
40
Kulakovskiy IV, Makeev VJ. Discovery of DNA motifs recognized by transcription factors through integration of different experimental sources. Biophysics 2010. [DOI: 10.1134/s0006350909060013] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
41
Abstract
Animal growth and development depend on the precise control of gene expression at the level of transcription. A central role in the regulation of developmental transcription is attributed to transcription factors that bind DNA enhancer elements, which are often located far from gene transcription start sites. Here, we review recent studies that have uncovered significant regulatory functions in developmental transcription for the TFIID basal transcription factors and for the DNA core promoter elements that are located close to transcription start sites.
Affiliation(s)
- Uwe Ohler
- Institute for Genome Sciences & Policy, Departments of Biostatistics & Bioinformatics and Computer Science, Duke University, Durham, NC 27708, USA
42
Piipari M, Down TA, Saini H, Enright A, Hubbard TJP. iMotifs: an integrated sequence motif visualization and analysis environment. Bioinformatics 2010; 26:843-4. [PMID: 20106815 PMCID: PMC2832821 DOI: 10.1093/bioinformatics/btq026] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have recently been developed for detecting regulatory factor binding sites in vivo and in vitro, and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important. iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces. The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided. AVAILABILITY iMotifs is licensed under the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source are available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library, libxms, for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS-formatted annotated sequence motif set files. CONTACT matias.piipari@gmail.com; imotifs@googlegroups.com.
Affiliation(s)
- Matias Piipari
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK.
43
Lusk RW, Eisen MB. Evolutionary mirages: selection on binding site composition creates the illusion of conserved grammars in Drosophila enhancers. PLoS Genet 2010; 6:e1000829. [PMID: 20107516 PMCID: PMC2809757 DOI: 10.1371/journal.pgen.1000829] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Received: 08/27/2009] [Accepted: 12/22/2009] [Indexed: 01/05/2023]
Abstract
The clustering of transcription factor binding sites in developmental enhancers and the apparent preferential conservation of clustered sites have been widely interpreted as proof that spatially constrained physical interactions between transcription factors are required for regulatory function. However, we show here that selection on the composition of enhancers alone, and not their internal structure, leads to the accumulation of clustered sites with evolutionary dynamics that suggest they are preferentially conserved. We simulated the evolution of idealized enhancers from Drosophila melanogaster constrained to contain only a minimum number of binding sites for one or more factors. Under this constraint, mutations that destroy an existing binding site are tolerated only if a compensating site has emerged elsewhere in the enhancer. Overlapping sites, such as those frequently observed for the activator Bicoid and repressor Krüppel, had significantly longer evolutionary half-lives than isolated sites for the same factors. This leads to a substantially higher density of overlapping sites than expected by chance and the appearance that such sites are preferentially conserved. Because D. melanogaster (like many other species) has a bias for deletions over insertions, sites tended to become closer together over time, leading to an overall clustering of sites in the absence of any selection for clustered sites. Since this effect is strongest for the oldest sites, clustered sites also incorrectly appear to be preferentially conserved. Following speciation, sites tend to be closer together in all descendent species than in their common ancestors, violating the common assumption that shared features of species' genomes reflect their ancestral state. Finally, we show that selection on binding site composition alone recapitulates the observed number of overlapping and closely neighboring sites in real D. melanogaster enhancers. 
Thus, this study calls into question the common practice of inferring "cis-regulatory grammars" from the organization and evolutionary dynamics of developmental enhancers.
Affiliation(s)
- Richard W. Lusk
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Michael B. Eisen
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Genomics Division, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- California Institute of Quantitative Biosciences, University of California Berkeley, Berkeley, California, United States of America
- Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
44
Sorourian M, Betrán E. Turnover and lineage-specific broadening of the transcription start site in a testis-specific retrogene. Fly (Austin) 2010; 4:3-11. [PMID: 20160503 PMCID: PMC2855778 DOI: 10.4161/fly.4.1.11136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/13/2023]
Abstract
Proteasomes are large multisubunit complexes responsible for regulated protein degradation. Made of a core particle (20S) and regulatory caps (19S), proteasomal proteins are encoded by at least 33 genes, of which 12 have been shown to have testis-specific isoforms in Drosophila melanogaster. Pros28.1A (also known as Prosalpha4T1), a young retroduplicate copy of Pros28.1 (also known as Prosalpha4), is one of these isoforms. It is present in the D. melanogaster subgroup and was previously shown to be testis-specific in D. melanogaster. Here, we show its testis-specific transcription in all D. melanogaster subgroup species. Due to this conserved pattern of expression in the species harboring this insertion, we initially expected that a regulatory region common to these species evolved prior to the speciation event. We determined that the region driving testis expression in D. melanogaster is not far from the coding region (within 272 bp upstream of the ATG). However, different Transcription Start Sites (TSSs) are used in D. melanogaster and D. simulans, and a "broad" transcription start site is used in D. yakuba. These results suggest one of the following scenarios: (1) there is a conserved motif in the 5' region of the gene that can be used as an upstream or downstream element or at different distance depending on the species; (2) different species evolved diverse regulatory sequences for the same pattern of expression (i.e., "TSS turnover"); or (3) the transcription start site can be broad or narrow depending on the species. This work reveals the difficulties of studying gene regulation in one species and extrapolating those findings to close relatives.
Affiliation(s)
- Mehran Sorourian
- Department of Biology, University of Texas at Arlington, TX, USA
45
Deplancke B. Experimental advances in the characterization of metazoan gene regulatory networks. Brief Funct Genomic Proteomic 2009; 8:12-27. [PMID: 19324929 DOI: 10.1093/bfgp/elp001] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
Abstract
Gene regulatory networks (GRNs) play a vital role in metazoan development and function, and deregulation of these networks is often implicated in disease. GRNs depict the dynamic interactions between genomic and regulatory state components. The genomic components comprise genes and their associated cis-regulatory elements. The regulatory state components consist primarily of transcriptional complexes that bind the latter elements. With the availability of complete genome sequences, several approaches have recently been developed which promise to significantly enhance our ability to identify either the genomic or regulatory state components, or the interactions between these two. In this review, I provide an in-depth overview of these approaches and detail how each contributes to a more comprehensive understanding of GRN composition and function.
Affiliation(s)
- Bart Deplancke
- Ecole Polytechnique Fédérale de Lausanne, School of Life Sciences, Institute of Bioengineering, Lausanne, Switzerland.
46
Kulakovskiy IV, Favorov AV, Makeev VJ. Motif discovery and motif finding from genome-mapped DNase footprint data. Bioinformatics 2009; 25:2318-25. [PMID: 19605419 DOI: 10.1093/bioinformatics/btp434] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION Footprint data are an important source of information on transcription factor recognition motifs. However, a footprinting fragment may contain no sequence similar to known protein recognition sites. Inspection of nearby genome fragments can help to identify missing site positions. RESULTS Genome fragments containing footprints were supplied to a pipeline that constructed a position weight matrix (PWM) for different motif lengths and selected the optimal PWM. Fragments were aligned with the SeSiMCMC sampler and a new heuristic algorithm, Bigfoot. Footprints with missing hits were found for approximately 50% of factors. Adding only 2 bp on both sides of a footprinting fragment recovered most hits. We automatically constructed motifs for 41 Drosophila factors. The new motifs can recognize footprints with greater sensitivity than existing models at the same false positive rate. We also discuss possible overfitting of the constructed motifs. AVAILABILITY Software and the collection of regulatory motifs are freely available at http://line.imb.ac.ru/DMMPMM.
Affiliation(s)
- Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.
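The footprint abstract above (Kulakovskiy et al.) rests on two generic operations: turning aligned binding sites into a position weight matrix, and scanning a footprint fragment, optionally widened by a few flanking base pairs, for matches. The Python sketch below illustrates both under stated assumptions: it uses a plain log2-odds PWM with a uniform background and an arbitrary pseudocount of 0.5, not the SeSiMCMC or Bigfoot alignment procedures or the published DMMPMM pipeline, and the function names `build_pwm`, `score`, `scan` and `scan_footprint` are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PWM from aligned, equal-length DNA sites.

    Assumption: uniform background and a fixed pseudocount; a real
    pipeline would tune both and also choose the motif length.
    """
    if background is None:
        background = {b: 0.25 for b in ALPHABET}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(width):
        # Column counts start at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[i].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in ALPHABET})
    return pwm


def score(pwm, window):
    """Sum the per-column log-odds scores of one window."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))


def scan(pwm, seq, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        s = score(pwm, seq[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


def scan_footprint(pwm, genome, start, end, threshold, flank=0):
    """Scan genome[start:end] widened by `flank` bp per side (hypothetical helper).

    Returned positions are genome coordinates, not fragment offsets.
    """
    lo = max(0, start - flank)
    hi = min(len(genome), end + flank)
    return [(lo + pos, s) for pos, s in scan(pwm, genome[lo:hi], threshold)]


if __name__ == "__main__":
    pwm = build_pwm(["TAATCC", "TAATCC", "TAATCA", "TAATCC"])
    genome = "GGGGTAATCCGGGG"
    # The footprint [5, 12) clips the first base of the site, so nothing passes...
    print(scan_footprint(pwm, genome, 5, 12, threshold=5.0))
    # ...but widening it by 2 bp on each side recovers the site at position 4.
    print(scan_footprint(pwm, genome, 5, 12, threshold=5.0, flank=2))
```

The 2 bp flank mirrors the abstract's observation that adding 2 bp on each side of a fragment recovered most missing hits; the 5.0-bit threshold and the toy sites are illustrative only.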
47
Rach EA, Yuan HY, Majoros WH, Tomancak P, Ohler U. Motif composition, conservation and condition-specificity of single and alternative transcription start sites in the Drosophila genome. Genome Biol 2009; 10:R73. [PMID: 19589141 PMCID: PMC2728527 DOI: 10.1186/gb-2009-10-7-r73] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 12/29/2008] [Revised: 04/21/2009] [Accepted: 07/09/2009] [Indexed: 01/05/2023]
Abstract
A map of transcription start sites across the Drosophila genome, providing insights into initiation patterns and spatiotemporal conditions. Background Transcription initiation is a key component in the regulation of gene expression. mRNA 5' full-length sequencing techniques have enhanced our understanding of mammalian transcription start sites (TSSs), revealing different initiation patterns on a genomic scale. Results To identify TSSs in Drosophila melanogaster, we applied a hierarchical clustering strategy on available 5' expressed sequence tags (ESTs) and identified a high quality set of 5,665 TSSs for approximately 4,000 genes. We distinguished two initiation patterns: 'peaked' TSSs, and 'broad' TSS cluster groups. Peaked promoters were found to contain location-specific sequence elements; conversely, broad promoters were associated with non-location-specific elements. In alignments across other Drosophila genomes, conservation levels of sequence elements exceeded 90% within the melanogaster subgroup, but dropped considerably for distal species. Elements in broad promoters had lower levels of conservation than those in peaked promoters. When characterizing the distributions of ESTs, 64% of TSSs showed distinct associations to one out of eight different spatiotemporal conditions. Available whole-genome tiling array time series data revealed different temporal patterns of embryonic activity across the majority of genes with distinct alternative promoters. Many genes with maternally inherited transcripts were found to have alternative promoters utilized later in development. Core promoters of maternally inherited transcripts showed differences in motif composition compared to zygotically active promoters. Conclusions Our study provides a comprehensive map of Drosophila TSSs and the conditions under which they are utilized. 
Distinct differences in motif associations with initiation pattern and spatiotemporal utilization illustrate the complex regulatory code of transcription initiation.
Affiliation(s)
- Elizabeth A Rach
- Program in Computational Biology and Bioinformatics, Duke University, Science Drive, Durham, NC 27708, USA
48
Yokoyama KD, Ohler U, Wray GA. Measuring spatial preferences at fine-scale resolution identifies known and novel cis-regulatory element candidates and functional motif-pair relationships. Nucleic Acids Res 2009; 37:e92. [PMID: 19483094 PMCID: PMC2715254 DOI: 10.1093/nar/gkp423] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 02/03/2023]
Abstract
Transcriptional regulation is mediated by the collective binding of proteins called transcription factors to cis-regulatory elements. A handful of factors are known to function at particular distances from the transcription start site, although the extent to which this occurs is not well understood. Spatial dependencies can also exist between pairs of binding motifs, facilitating factor-pair interactions. We sought to determine to what extent spatial preferences measured at high-scale resolution could be utilized to predict cis-regulatory elements as well as motif-pairs binding interacting proteins. We introduce the ‘motif positional function’ model which predicts spatial biases using regression analysis, differentiating noise from true position-specific overrepresentation at single-nucleotide resolution. Our method predicts 48 consensus motifs exhibiting positional enrichment within human promoters, including fourteen motifs without known binding partners. We then extend the model to analyze distance preferences between pairs of motifs. We find that motif-pairs binding interacting factors often co-occur preferentially at multiple distances, with intervals between preferred distances often corresponding to the turn of the DNA double-helix. This offers a novel means by which to predict sequence elements with a collective role in gene regulation.
Affiliation(s)
- Ken Daigoro Yokoyama
- Biology Department, Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA
49
Curk T, Petrovic U, Shaulsky G, Zupan B. Rule-based clustering for gene promoter structure discovery. Methods Inf Med 2009; 48:229-35. [PMID: 19387502 PMCID: PMC2746478 DOI: 10.3414/me9225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
BACKGROUND The genetic cellular response to internal and external changes is determined by the sequence and structure of gene-regulatory promoter regions. OBJECTIVES Using data on gene-regulatory elements (i.e., either putative or known transcription factor binding sites) and data on gene expression profiles we can discover structural elements in promoter regions and infer the underlying programs of gene regulation. Such hypotheses obtained in silico can greatly assist us in experiment planning. The principal obstacle for such approaches is the combinatorial explosion in different combinations of promoter elements to be examined. METHODS Stemming from several state-of-the-art machine learning approaches we here propose a heuristic, rule-based clustering method that uses gene expression similarity to guide the search for informative structures in promoters, thus exploring only the most promising parts of the vast and expressively rich rule-space. RESULTS We present the utility of the method in the analysis of gene expression data on budding yeast S. cerevisiae where cells were induced to proliferate peroxisomes. CONCLUSIONS We demonstrate that the proposed approach is able to infer informative relations uncovering relatively complex structures in gene promoter regions that regulate gene expression.
Affiliation(s)
- Tomaz Curk
- University of Ljubljana, Faculty of Computer and Information Science, Trzaska c. 25, 1000 Ljubljana, Slovenia.
50
Complex organizational structure of the genome revealed by genome-wide analysis of single and alternative promoters in Drosophila melanogaster. BMC Genomics 2009; 10:9. [PMID: 19128496 PMCID: PMC2631479 DOI: 10.1186/1471-2164-10-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 10/13/2008] [Accepted: 01/07/2009] [Indexed: 12/31/2022]
Abstract
Background The promoter is a critical and necessary transcriptional cis-regulatory element. In addition to its role as an assembly site for the basal transcriptional apparatus, the promoter plays a key part in mediating temporal and spatial aspects of gene expression through differential binding of transcription factors and selective interaction with distal enhancers. Although many genes have multiple promoters, little attention has been focused on how these relate to one another; nor has much study been directed at relationships between promoters of adjacent genes. Results We have undertaken a systematic investigation of Drosophila promoters. We divided promoters into three groups: unique promoters, first alternative promoters (the most 5' of a gene's multiple promoters), and downstream alternative promoters (the remaining alternative promoters 3' to the first). We observed distinct nucleotide distribution and sequence motif preferences among these three classes. We also investigated the promoters of neighboring genes and found that a greater-than-expected number of adjacent genes have similar sequence motif profiles, which may allow the genes to be regulated in a coordinated fashion. Consistent with this, there is a positive correlation between similar promoter motifs and related gene expression profiles for these genes. Conclusions Our results suggest that different regulatory mechanisms may apply to each of the three promoter classes, and provide a mechanism for "gene expression neighborhoods," local clusters of co-expressed genes. As a whole, our data reveal an unexpected complexity of genomic organization at the promoter level with respect to both alternative and neighboring promoters.