1
Buffry AD, Mendes CC, McGregor AP. The Functionality and Evolution of Eukaryotic Transcriptional Enhancers. ADVANCES IN GENETICS 2016; 96:143-206. [PMID: 27968730 DOI: 10.1016/bs.adgen.2016.08.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/11/2022]
Abstract
Enhancers regulate precise spatial and temporal patterns of gene expression in eukaryotes and, moreover, evolutionary changes in these modular cis-regulatory elements may represent the predominant genetic basis for phenotypic evolution. Here, we review approaches to identify and functionally analyze enhancers and their transcription factor binding sites, including assay for transposase-accessible chromatin using sequencing (ATAC-Seq) and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9, respectively. We also explore enhancer functionality, including how transcription factor binding sites combine to regulate transcription, as well as research on shadow and super enhancers, and how enhancers can act over great distances and even in trans. Finally, we discuss recent theoretical and empirical data on how transcription factor binding sites and enhancers evolve. This includes how the function of enhancers is maintained despite the turnover of transcription factor binding sites as well as reviewing studies where mutations in enhancers have been shown to underlie morphological change.
Affiliation(s)
- A D Buffry
  - Oxford Brookes University, Oxford, United Kingdom
- C C Mendes
  - Oxford Brookes University, Oxford, United Kingdom
- A P McGregor
  - Oxford Brookes University, Oxford, United Kingdom
2
Nadimpalli S, Persikov AV, Singh M. Pervasive variation of transcription factor orthologs contributes to regulatory network evolution. PLoS Genet 2015; 11:e1005011. [PMID: 25748510 PMCID: PMC4351887 DOI: 10.1371/journal.pgen.1005011] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/13/2014] [Accepted: 01/18/2015] [Indexed: 01/17/2023]
Abstract
Differences in transcriptional regulatory networks underlie much of the phenotypic variation observed across organisms. Changes to cis-regulatory elements are widely believed to be the predominant means by which regulatory networks evolve, yet examples of regulatory network divergence due to transcription factor (TF) variation have also been observed. To systematically ascertain the extent to which TFs contribute to regulatory divergence, we analyzed the evolution of the largest class of metazoan TFs, Cys2-His2 zinc finger (C2H2-ZF) TFs, across 12 Drosophila species spanning ~45 million years of evolution. Remarkably, we uncovered that a significant fraction of all C2H2-ZF 1-to-1 orthologs in flies exhibit variations that can affect their DNA-binding specificities. In addition to loss and recruitment of C2H2-ZF domains, we found diverging DNA-contacting residues in ~44% of domains shared between D. melanogaster and the other fly species. These diverging DNA-contacting residues, found in ~70% of the D. melanogaster C2H2-ZF genes in our analysis and corresponding to ~26% of all annotated D. melanogaster TFs, show evidence of functional constraint: they tend to be conserved across phylogenetic clades and evolve slower than other diverging residues. These same variations were rarely found as polymorphisms within a population of D. melanogaster flies, indicating their rapid fixation. The predicted specificities of these dynamic domains gradually change across phylogenetic distances, suggesting stepwise evolutionary trajectories for TF divergence. Further, whereas proteins with conserved C2H2-ZF domains are enriched in developmental functions, those with varying domains exhibit no functional enrichments. Our work suggests that a subset of highly dynamic and largely unstudied TFs are a likely source of regulatory variation in Drosophila and other metazoans.
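To make the idea of "DNA-contacting residues" concrete: in the commonly used model of C2H2-ZF recognition, specificity is dominated by recognition-helix positions -1, 2, 3 and 6. The Python sketch below is not the authors' pipeline. It extracts those four residues from each domain using the canonical C-X(2-4)-C-X(12)-H-X(3-5)-H spacing and flags domains whose residues differ between two orthologs. The function names, the strict 12-residue spacing, and the assumption that domains are already paired one-to-one are illustrative simplifications.

```python
import re

# Canonical C2H2 zinc-finger spacing: C-X(2-4)-C-X(12)-H-X(3-5)-H.
# The captured 12-residue group ends immediately before the first His.
ZF_PATTERN = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")

# Helix positions -1, 2, 3 and 6 as 0-based offsets into the captured group
# (position 6 is the residue just before the first His; there is no position 0).
CONTACT_OFFSETS = {-1: 5, 2: 7, 3: 8, 6: 11}


def contact_residues(protein: str) -> list[str]:
    """Return the -1/2/3/6 residues of each C2H2-ZF domain, N- to C-terminal."""
    signatures = []
    for match in ZF_PATTERN.finditer(protein):
        group = match.group(1)
        signatures.append("".join(group[i] for i in CONTACT_OFFSETS.values()))
    return signatures


def diverging_domains(sig_a: list[str], sig_b: list[str]) -> list[int]:
    """Indices of domains whose contact residues differ between two orthologs.

    Assumes domains are already paired one-to-one by position.
    """
    return [i for i, (a, b) in enumerate(zip(sig_a, sig_b)) if a != b]


if __name__ == "__main__":
    # Fragment modeled on the first two fingers of Zif268 (Egr1).
    fragment = "YACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFSRSDHLTTHIRTH"
    print(contact_residues(fragment))
```

For that fragment the sketch returns `['RDER', 'RDHT']`. A real analysis would need a proper domain model (for example an HMM) rather than a fixed-spacing regular expression, since atypical spacings are missed here.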
Affiliation(s)
- Shilpa Nadimpalli
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Anton V. Persikov
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
3
Wotton KR, Jiménez-Guri E, Crombach A, Janssens H, Alcaine-Colet A, Lemke S, Schmidt-Ott U, Jaeger J. Quantitative system drift compensates for altered maternal inputs to the gap gene network of the scuttle fly Megaselia abdita. eLife 2015; 4:e04785. [PMID: 25560971 PMCID: PMC4337606 DOI: 10.7554/elife.04785] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 09/16/2014] [Accepted: 01/02/2015] [Indexed: 12/20/2022]
Abstract
The segmentation gene network in insects can produce equivalent phenotypic outputs despite differences in upstream regulatory inputs between species. We investigate the mechanistic basis of this phenomenon through a systems-level analysis of the gap gene network in the scuttle fly Megaselia abdita (Phoridae). It combines quantification of gene expression at high spatio-temporal resolution with systematic knock-downs by RNA interference (RNAi). Initiation and dynamics of gap gene expression differ markedly between M. abdita and Drosophila melanogaster, while the output of the system converges to equivalent patterns at the end of the blastoderm stage. Although the qualitative structure of the gap gene network is conserved, there are differences in the strength of regulatory interactions between species. We term such network rewiring 'quantitative system drift'. It provides a mechanistic explanation for the developmental hourglass model in the dipteran lineage. Quantitative system drift is likely to be a widespread mechanism for developmental evolution.
Affiliation(s)
- Karl R Wotton
  - European Molecular Biology Laboratory, CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Eva Jiménez-Guri
  - European Molecular Biology Laboratory, CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Anton Crombach
  - European Molecular Biology Laboratory, CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Hilde Janssens
  - European Molecular Biology Laboratory, CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Anna Alcaine-Colet
  - European Molecular Biology Laboratory, CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
  - Universitat de Barcelona, Barcelona, Spain
- Steffen Lemke
  - Department of Organismal Biology and Anatomy, University of Chicago, Chicago, United States
- Urs Schmidt-Ott
  - Department of Organismal Biology and Anatomy, University of Chicago, Chicago, United States
- Johannes Jaeger
  - European Molecular Biology Laboratory, CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
4
Navarro C, Lopez FJ, Cano C, Garcia-Alcalde F, Blanco A. CisMiner: genome-wide in-silico cis-regulatory module prediction by fuzzy itemset mining. PLoS One 2014; 9:e108065. [PMID: 25268582 PMCID: PMC4182448 DOI: 10.1371/journal.pone.0108065] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/03/2014] [Accepted: 08/25/2014] [Indexed: 01/18/2023]
Abstract
Eukaryotic gene control regions are known to be spread throughout non-coding DNA sequences, which may lie far from the gene promoter. Transcription factors are proteins that coordinately bind to these regions at transcription factor binding sites to regulate gene expression. Several tools can detect significant co-occurrences of closely located binding sites (cis-regulatory modules, CRMs). However, these tools have at least one of the following limitations: 1) their scope is limited to promoter or conserved regions of the genome; 2) they cannot identify combinations involving more than two motifs; 3) they require prior information about target motifs. In this work we present CisMiner, a novel methodology to detect putative CRMs by means of a fuzzy itemset mining approach able to operate at genome-wide scale. CisMiner can perform a blind search for CRMs without any prior information about target CRMs and without a limit on the number of motifs. CisMiner tackles the combinatorial complexity of genome-wide cis-regulatory module extraction by representing motif combinations as itemsets and applying the Top-Down Fuzzy Frequent-Pattern Tree algorithm to identify significant itemsets. Fuzzy technology allows CisMiner to better handle the imprecision and noise inherent to regulatory processes. Results obtained for a set of well-known binding sites in the S. cerevisiae genome show that our method yields highly reliable predictions. Furthermore, CisMiner was also applied to putative in-silico predicted transcription factor binding sites to identify significant combinations in S. cerevisiae and D. melanogaster, showing that our approach can be further applied genome-wide to more complex genomes. CisMiner is freely accessible at: http://genome2.ugr.es/cisminer.
CisMiner can be queried for the results presented in this work and can also perform a customized cis-regulatory module prediction on a query set of transcription factor binding sites provided by the user.
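The idea of treating motif combinations as itemsets can be illustrated with a brute-force Python sketch. Here each fixed-width genomic window is a transaction, a motif's membership degree in a window is taken to be its best site score (assumed to lie in [0, 1]), and the fuzzy support of a combination is the sum over windows of the minimum membership of its motifs (the min t-norm). This is a generic fuzzy frequent-itemset count under those assumptions; it does not reproduce CisMiner's Top-Down Fuzzy Frequent-Pattern Tree algorithm or its significance testing, and the window size, motif names and scores are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def window_memberships(sites, window):
    """Group (position, motif, score) sites into fixed, non-overlapping windows.

    Returns one dict per non-empty window, mapping motif -> membership degree,
    taken here as the best score of that motif's sites in the window.
    """
    windows = defaultdict(dict)
    for pos, motif, score in sites:
        members = windows[pos // window]
        members[motif] = max(score, members.get(motif, 0.0))
    return list(windows.values())


def fuzzy_supports(transactions, min_size=2, min_support=0.0):
    """Fuzzy support of each motif combination: sum over windows of the minimum
    membership of its motifs (min t-norm). Brute force, so only for small inputs."""
    support = defaultdict(float)
    for members in transactions:
        motifs = sorted(members)
        for k in range(min_size, len(motifs) + 1):
            for combo in combinations(motifs, k):
                support[combo] += min(members[m] for m in combo)
    return {combo: s for combo, s in support.items() if s >= min_support}


if __name__ == "__main__":
    # Hypothetical predicted sites: (position in bp, motif name, score in [0, 1]).
    sites = [(10, "A", 0.9), (40, "B", 0.8), (120, "A", 0.6),
             (150, "B", 0.7), (160, "C", 1.0), (300, "C", 0.5)]
    transactions = window_memberships(sites, window=100)
    print(sorted(fuzzy_supports(transactions, min_support=1.0)))
```

With this toy input only the A+B pair reaches a support of 1.0, because it co-occurs with high scores in two windows. Exhaustive enumeration grows combinatorially with the number of motifs per window, which is the problem tree-based miners are designed to avoid.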
Affiliation(s)
- Carmen Navarro
  - Department of Computer Science and AI, University of Granada, Granada, Spain
- Francisco J. Lopez
  - Andalusian Human Genome Sequencing Centre (CASEGH), Medical Genome Project (MGP), Sevilla, Spain
- Carlos Cano
  - Department of Computer Science and AI, University of Granada, Granada, Spain
- Armando Blanco
  - Department of Computer Science and AI, University of Granada, Granada, Spain
5
Abstract
With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.
6
Maeso I, Irimia M, Tena JJ, Casares F, Gómez-Skarmeta JL. Deep conservation of cis-regulatory elements in metazoans. Philos Trans R Soc Lond B Biol Sci 2013; 368:20130020. [PMID: 24218633 DOI: 10.1098/rstb.2013.0020] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 12/20/2022]
Abstract
Despite the vast morphological variation observed across phyla, animals share multiple basic developmental processes orchestrated by a common ancestral gene toolkit. These genes interact with each other to build complex gene regulatory networks (GRNs), which are encoded in the genome by cis-regulatory elements (CREs) that serve as computational units of the network. Although GRN subcircuits involved in ancient developmental processes are expected to be at least partially conserved, identification of CREs that are conserved across phyla has remained elusive. Here, we review recent studies that revealed such deeply conserved CREs do exist, discuss the difficulties associated with their identification and describe new approaches that will facilitate this search.
Affiliation(s)
- Ignacio Maeso
  - Department of Zoology, University of Oxford, Oxford, UK
7
Starr MO, Ho MCW, Gunther EJM, Tu YK, Shur AS, Goetz SE, Borok MJ, Kang V, Drewell RA. Molecular dissection of cis-regulatory modules at the Drosophila bithorax complex reveals critical transcription factor signature motifs. Dev Biol 2011; 359:290-302. [PMID: 21821017 PMCID: PMC3202680 DOI: 10.1016/j.ydbio.2011.07.028] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 02/02/2011] [Revised: 07/17/2011] [Accepted: 07/19/2011] [Indexed: 11/17/2022]
Abstract
At the Drosophila melanogaster bithorax complex (BX-C) over 330 kb of intergenic DNA is responsible for directing the transcription of just three homeotic (Hox) genes during embryonic development. A number of distinct enhancer cis-regulatory modules (CRMs) are responsible for controlling the specific expression patterns of the Hox genes in the BX-C. While it has proven possible to identify orthologs of known BX-C CRMs in different Drosophila species using overall sequence conservation, this approach has not proven sufficiently effective for identifying novel CRMs or defining the key functional sequences within enhancer CRMs. Here we demonstrate that the specific spatial clustering of transcription factor (TF) binding sites is important for BX-C enhancer activity. A bioinformatic search for combinations of putative TF binding sites in the BX-C suggests that simple clustering of binding sites is frequently not indicative of enhancer activity. However, through molecular dissection and evolutionary comparison across the Drosophila genus we discovered that specific TF binding site clustering patterns are an important feature of three known BX-C enhancers. Sub-regions of the defined IAB5 and IAB7b enhancers were both found to contain an evolutionarily conserved signature motif of clustered TF binding sites which is critical for the functional activity of the enhancers. Together, these results indicate that the spatial organization of specific activator and repressor binding sites within BX-C enhancers is of greater importance than overall sequence conservation and is indicative of enhancer functional activity.
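The contrast drawn above between simple clustering and specific clustering patterns can be sketched in Python. `simple_clusters` reports any window that holds at least a given number of sites regardless of factor identity, whereas `signature_matches` requires a particular ordered arrangement of factors with bounded spacing. Both functions are hypothetical illustrations, not the authors' search method; the factor labels, window size and gap limit are placeholders, and site positions are assumed to come from a separate motif scan.

```python
def simple_clusters(sites, window, min_sites):
    """Start positions of windows [start, start + window) that contain at least
    min_sites binding sites of any factor, scanning from each site."""
    positions = sorted(pos for pos, _ in sites)
    hits = []
    end = 0  # index one past the last site inside the current window
    for i, start in enumerate(positions):
        while end < len(positions) and positions[end] < start + window:
            end += 1
        if end - i >= min_sites:
            hits.append(start)
    return hits


def signature_matches(sites, signature, max_gap):
    """Start positions where the factors in `signature` occur as consecutive
    sites, in that order, with every gap between neighbours <= max_gap bp."""
    ordered = sorted(sites)
    n = len(signature)
    hits = []
    for i in range(len(ordered) - n + 1):
        run = ordered[i:i + n]
        same_order = [tf for _, tf in run] == list(signature)
        tight = all(b[0] - a[0] <= max_gap for a, b in zip(run, run[1:]))
        if same_order and tight:
            hits.append(run[0][0])
    return hits


if __name__ == "__main__":
    # Hypothetical sites: (position in bp, factor); "act"/"rep" are placeholders.
    sites = [(100, "act"), (130, "rep"), (150, "act"),
             (600, "act"), (610, "act"), (620, "act"), (900, "rep")]
    print(simple_clusters(sites, window=60, min_sites=3))
    print(signature_matches(sites, ("act", "rep", "act"), max_gap=50))
```

On this toy input the density criterion flags clusters starting at 100 and 600, while the ordered activator-repressor-activator signature matches only at 100, mirroring the point that site density alone over-predicts enhancers.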
Affiliation(s)
- Yen-Kuei Tu
  - Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Andrey S. Shur
  - Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Sara E. Goetz
  - Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Matthew J. Borok
  - Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Victoria Kang
  - Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
- Robert A. Drewell
  - Biology Department, Harvey Mudd College, 301 Platt Boulevard, Claremont, CA 91711, USA
8
Swamy KBS, Chu WY, Wang CY, Tsai HK, Wang D. Evidence of association between nucleosome occupancy and the evolution of transcription factor binding sites in yeast. BMC Evol Biol 2011; 11:150. [PMID: 21627806 PMCID: PMC3124427 DOI: 10.1186/1471-2148-11-150] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 03/30/2011] [Accepted: 05/31/2011] [Indexed: 11/14/2022]
Abstract
Background: Divergence of transcription factor binding sites is considered to be an important source of regulatory evolution. The associations between transcription factor binding sites and phenotypic diversity have been investigated in many model organisms. However, our understanding of other factors that contribute to regulatory evolution is still limited. Recent studies have elucidated the effect of chromatin structure on the molecular evolution of genomic DNA. Though the profound impact of nucleosome positions on gene regulation has been reported, their influence on transcriptional evolution is less explored. With genome-wide nucleosome maps now available for yeast species, it is thus desirable to investigate their impact on transcription factor binding site evolution. Here, we present a comprehensive analysis of the role of nucleosome positioning in the evolution of transcription factor binding sites.
Results: We compared the transcription factor binding site frequency in nucleosome occupied regions and nucleosome depleted regions in promoters of old (orthologs among Saccharomycetaceae) and young (Saccharomyces specific) genes, and in duplicate gene pairs. We demonstrated that nucleosome occupied regions accommodate greater binding site variation than nucleosome depleted regions in young genes and in duplicate genes. This finding was confirmed by measuring the difference in evolutionary rates of binding sites in sensu stricto yeasts at nucleosome occupied regions and nucleosome depleted regions. The binding sites at nucleosome occupied regions exhibited a consistently higher evolutionary rate than those at nucleosome depleted regions, corroborating the difference in the selection constraints at the two regions. Finally, through site-directed mutagenesis experiments, we found that binding site gain or loss events at nucleosome depleted regions may cause greater expression differences than those in nucleosome occupied regions.
Conclusions: Our study indicates that binding sites at nucleosome occupied regions are subject to different selection constraints than those at nucleosome depleted regions. We found that the binding sites have a different rate of evolution at nucleosome occupied and depleted regions. Finally, using transcription factor binding site-directed mutagenesis experiments, we confirmed the difference in the impact of binding site changes on expression at these regions. Thus, our work demonstrates the importance of composite analysis of chromatin and transcriptional evolution.
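The core comparison in this study, partitioning binding sites by whether they fall in nucleosome depleted regions (NDRs) or nucleosome occupied regions (NORs) and asking how often they diverge, can be sketched in Python. The sketch assumes sites have already been called and labeled as conserved or not in a second species, and it uses a crude divergence fraction rather than the evolutionary-rate estimates used in the study; the function names and coordinates are hypothetical.

```python
def region_of(pos, ndr_intervals):
    """'NDR' if pos lies in any half-open [start, end) interval, else 'NOR'."""
    return "NDR" if any(start <= pos < end for start, end in ndr_intervals) else "NOR"


def divergence_by_region(sites, ndr_intervals):
    """Fraction of binding sites not conserved in a second species, split into
    nucleosome depleted (NDR) and nucleosome occupied (NOR) regions.

    sites: iterable of (position, conserved) pairs, conserved being a bool.
    """
    counts = {"NDR": [0, 0], "NOR": [0, 0]}  # region -> [diverged, total]
    for pos, conserved in sites:
        tally = counts[region_of(pos, ndr_intervals)]
        tally[1] += 1
        if not conserved:
            tally[0] += 1
    return {region: (d / t if t else float("nan")) for region, (d, t) in counts.items()}


if __name__ == "__main__":
    ndrs = [(0, 150)]  # hypothetical promoter NDR coordinates
    sites = [(20, True), (60, True), (100, False), (200, False),
             (260, True), (310, False), (400, False)]
    print(divergence_by_region(sites, ndrs))
```

In this toy promoter one of three NDR sites has diverged versus three of four NOR sites, the direction of difference the abstract reports; a real test would also need a significance test and correction for site density.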
Affiliation(s)
- Krishna B S Swamy
  - Institute of Information Science, Academia Sinica, Taipei, 115, Taiwan
9
Soccio RE, Tuteja G, Everett LJ, Li Z, Lazar MA, Kaestner KH. Species-specific strategies underlying conserved functions of metabolic transcription factors. Mol Endocrinol 2011; 25:694-706. [PMID: 21292830 DOI: 10.1210/me.2010-0454] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 12/23/2022]
Abstract
The winged helix protein FOXA2 and the nuclear receptor peroxisome proliferator-activated receptor-γ (PPARγ) are highly conserved, regionally expressed transcription factors (TFs) that regulate networks of genes controlling complex metabolic functions. Cistrome analysis for Foxa2 in mouse liver and PPARγ in mouse adipocytes has previously produced consensus-binding sites that are nearly identical to those used by the corresponding TFs in human cells. We report here that, despite the conservation of the canonical binding motif, the great majority of binding regions for FOXA2 in human liver and for PPARγ in human adipocytes are not in the orthologous locations corresponding to the mouse genome, and vice versa. Of note, TF binding can be absent in one species despite sequence conservation, including motifs that do support binding in the other species, demonstrating a major limitation of in silico binding site prediction. Whereas only approximately 10% of binding sites are conserved, gene-centric analysis reveals that about 50% of genes with nearby TF occupancy are shared across species for both hepatic FOXA2 and adipocyte PPARγ. Remarkably, for both TFs, many of the shared genes function in tissue-specific metabolic pathways, whereas species-unique genes fail to show enrichment for these pathways. Nonetheless, the species-unique genes, like the shared genes, showed the expected transcriptional regulation by the TFs in loss-of-function experiments. Thus, species-specific strategies underlie the biological functions of metabolic TFs that are highly conserved across mammalian species. Analysis of factor binding in multiple species may be necessary to distinguish apparent species-unique noise and reveal functionally relevant information.
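The gap between site-level conservation (about 10%) and gene-level sharing (about 50%) can be expressed with two overlap measures. The Python sketch below assumes binding sites from both species have already been mapped into a shared coordinate frame (for example by liftover) and assigned to nearby genes. It uses the Jaccard index as the overlap measure, which is an illustrative choice and not necessarily the statistic used in the study; all identifiers are hypothetical.

```python
def jaccard(x: set, y: set) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def conservation_summary(sites_a: dict, sites_b: dict) -> tuple[float, float]:
    """Compare TF binding between two species at site level and gene level.

    sites_a, sites_b: map a site id in a shared (lifted-over) coordinate frame
    to the gene that site is assigned to.
    Returns (site-level Jaccard, gene-level Jaccard).
    """
    site_overlap = jaccard(set(sites_a), set(sites_b))
    gene_overlap = jaccard(set(sites_a.values()), set(sites_b.values()))
    return site_overlap, gene_overlap


if __name__ == "__main__":
    # Hypothetical data: only s1 is bound in both species, yet most genes are shared.
    human = {"s1": "g1", "s2": "g1", "s3": "g2", "s4": "g3"}
    mouse = {"s1": "g1", "s5": "g2", "s6": "g4", "s7": "g3"}
    print(conservation_summary(human, mouse))
```

Here one of seven distinct sites is shared but three of four genes are, showing how species-specific sites can still converge on a common gene set.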
Affiliation(s)
- Raymond E Soccio
  - Division of Endocrinology, Diabetes, and Metabolism, Department of Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104-6149, USA
10
Venkataram S, Fay JC. Is transcription factor binding site turnover a sufficient explanation for cis-regulatory sequence divergence? Genome Biol Evol 2010; 2:851-8. [PMID: 21068212 PMCID: PMC2997565 DOI: 10.1093/gbe/evq066] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022]
Abstract
The molecular evolution of cis-regulatory sequences is not well understood. Comparisons of closely related species show that cis-regulatory sequences contain a large number of sites constrained by purifying selection. In contrast, there are a number of examples from distantly related species where cis-regulatory sequences retain little to no sequence similarity but drive similar patterns of gene expression. Binding site turnover, whereby the gain of a redundant binding site enables loss of a previously functional site, is one model by which cis-regulatory sequences can diverge without a concurrent change in function. To determine whether cis-regulatory sequence divergence is consistent with binding site turnover, we examined binding site evolution within orthologous intergenic sequences from 14 yeast species defined by their syntenic relationships with adjacent coding sequences. Both local and global alignments show that nearly all distantly related orthologous cis-regulatory sequences have no significant level of sequence similarity but are enriched for experimentally identified binding sites. Yet, a significant proportion of experimentally identified binding sites that are conserved in closely related species are absent in distantly related species and so cannot be explained by binding site turnover. Depletion of binding sites depends on the transcription factor but is detectable for a quarter of all transcription factors examined. Our results imply that binding site turnover is not a sufficient explanation for cis-regulatory sequence evolution.
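The turnover model can be made explicit with a small Python classifier. For each site in one sequence it asks whether the orthologous position in the other sequence also carries a site (conserved), whether an unmatched site for the same factor exists elsewhere in the region (consistent with turnover), or whether no compensating site remains (lost, the case the authors argue turnover cannot explain). The sketch assumes ungapped, position-aligned orthologous sequences and uses a consensus regular expression; the Gcn4-style consensus `TGA[CG]TCA`, the toy sequences and the function names are illustrative only.

```python
import re


def site_starts(seq: str, motif: str) -> set[int]:
    """Start positions of all (possibly overlapping) matches of a regex motif."""
    return {m.start() for m in re.finditer(f"(?={motif})", seq)}


def classify_sites(seq_a: str, seq_b: str, motif: str, tolerance: int = 0) -> dict[int, str]:
    """Classify each motif site in seq_a against position-aligned, ungapped seq_b.

    conserved: seq_b has a site within `tolerance` bp of the same position.
    turnover:  no site at that position, but seq_b has another site for the
               same motif that is not matched to any seq_a site.
    lost:      neither of the above.
    """
    a_sites = site_starts(seq_a, motif)
    b_sites = site_starts(seq_b, motif)
    unmatched_b = {t for t in b_sites if all(abs(t - s) > tolerance for s in a_sites)}
    labels = {}
    for s in sorted(a_sites):
        if any(abs(s - t) <= tolerance for t in b_sites):
            labels[s] = "conserved"
        elif unmatched_b:
            labels[s] = "turnover"
        else:
            labels[s] = "lost"
    return labels


if __name__ == "__main__":
    motif = "TGA[CG]TCA"  # Gcn4-style consensus, used only as an example
    ancestor_like = "AA" + "TGACTCA" + "G" * 11
    shifted = "A" * 12 + "TGAGTCA" + "G"
    print(classify_sites(ancestor_like, shifted, motif))
```

In the example the site at position 2 has no counterpart at the same position, but a new site appears at position 12, so it is labeled turnover. Distantly related sequences rarely align this cleanly, which is one reason the study relied on syntenic intergenic regions instead.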
11
Spirov AV, Holloway DM. Design of a dynamic model of genes with multiple autonomous regulatory modules by evolutionary computations. PROCEDIA COMPUTER SCIENCE 2010; 1:999-1008. [PMID: 20930945 PMCID: PMC2949972 DOI: 10.1016/j.procs.2010.04.111] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
Abstract
A new approach for designing a dynamic model of genes with multiple autonomous regulatory modules by evolutionary computation is proposed. The approach is based on Genetic Algorithms (GA), with new crossover operators designed especially for this purpose. The new operators use local homology between parental strings to preserve building blocks found by the algorithm. The approach exploits the subbasin-portal architecture of the fitness functions suitable for this kind of evolutionary modeling. This architecture is significant for Royal Road class fitness functions. Two real-life Systems Biology problems with such fitness functions are implemented here: evolution of the bacterial promoter rrnP1 and of the enhancer of the Drosophila even-skipped gene. The effectiveness of the approach compared to a standard GA is demonstrated on several benchmark and real-life tasks.
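For readers unfamiliar with the Royal Road class mentioned above, the classic R1 function of Mitchell, Forrest and Holland scores a bit string by summing a reward for each aligned block that is entirely set to 1, so fitness rises in plateaus as building blocks are completed. The Python sketch below implements R1 with a configurable block size. The `homologous_cut_points` helper is a hypothetical illustration of restricting crossover to locally identical regions of two parents; it is not the authors' operator, whose details are not given here.

```python
def royal_road(bits: str, block: int = 8) -> int:
    """Royal Road R1 fitness: each aligned block of `block` ones scores `block`;
    partially completed blocks score nothing."""
    if len(bits) % block:
        raise ValueError("string length must be a multiple of the block size")
    return sum(
        block
        for i in range(0, len(bits), block)
        if bits[i:i + block] == "1" * block
    )


def homologous_cut_points(parent_a: str, parent_b: str, flank: int) -> list[int]:
    """Hypothetical helper: crossover points i where both parents are identical
    over `flank` bits on each side, i.e. parent_a[i-flank:i+flank] equals parent_b's."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    return [
        i
        for i in range(flank, len(parent_a) - flank + 1)
        if parent_a[i - flank:i + flank] == parent_b[i - flank:i + flank]
    ]


if __name__ == "__main__":
    print(royal_road("1" * 8 + "0" * 56))  # one complete block out of eight
    print(homologous_cut_points("11110000", "11010000", flank=2))
```

The plateaus are what make R1 a useful test case: a single bit flip inside an incomplete block does not change fitness, so selection only "sees" a building block once it is complete, and a crossover that cuts through identical flanks cannot break a block the two parents share.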
Affiliation(s)
- Alexander V. Spirov
  - State University of New York at Stony Brook, Computer Science Department and Center of Excellence in Wireless & Information Technology, Stony Brook University Research & Development Park, 1500 Stony Brook Road, Stony Brook, NY 11794-6040, USA
- David M. Holloway
  - Mathematics Department, British Columbia Institute of Technology, Burnaby, B.C., Canada
  - Biology Department, University of Victoria, B.C., Canada
12
Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species. PLoS Biol 2010; 8:e1000343. [PMID: 20351773 PMCID: PMC2843597 DOI: 10.1371/journal.pbio.1000343] [Citation(s) in RCA: 155] [Impact Index Per Article: 10.3] [Received: 10/26/2009] [Accepted: 02/17/2010] [Indexed: 01/06/2023]
Abstract
Genome-wide comparison of transcription factor binding between related Drosophila species highlights how sequence changes affect the biochemical events that underlie animal development.
Changes in gene expression play an important role in evolution, yet the molecular mechanisms underlying regulatory evolution are poorly understood. Here we compare genome-wide binding of the six transcription factors that initiate segmentation along the anterior-posterior axis in embryos of two closely related species: Drosophila melanogaster and Drosophila yakuba. Where we observe binding by a factor in one species, we almost always observe binding by that factor to the orthologous sequence in the other species. Levels of binding, however, vary considerably. The magnitude and direction of the interspecies differences in binding levels of all six factors are strongly correlated, suggesting a role for chromatin or other factor-independent forces in mediating the divergence of transcription factor binding. Nonetheless, factor-specific quantitative variation in binding is common, and we show that it is driven to a large extent by the gain and loss of cognate recognition sequences for the given factor. We find only a weak correlation between binding variation and regulatory function. These data provide the first genome-wide picture of how modest levels of sequence divergence between highly morphologically similar species affect a system of coordinately acting transcription factors during animal development, and highlight the dominant role of quantitative variation in transcription factor binding over short evolutionary distances.
The differentiation of cells, tissues, and organs during animal development is established by a process in which genes that control cell identity and behavior are turned on and off at specific times and places. This process is choreographed, to a large extent, by a collection of proteins known as transcription factors that bind to specific sequences in DNA and thereby modulate the expression of neighboring genes. Because of the central role that transcription factors play in shaping organismal form and function, they have long been suggested to be major players in phenotypic evolution. However, we have a poor understanding of how changes to DNA affect transcription factor binding in living systems. Here, we use a combination of biochemical and genomic techniques to compare, between two closely related species of fruit flies in the genus Drosophila, the binding of six transcription factors that help establish the characteristic segments that form along the anterior-posterior (head to tail) axis in developing flies. We show that the patterns of transcription factor binding between these closely related species are broadly conserved, consistent with the nearly identical development and appearance of these species. However, we also show that, whereas the DNA changes that have accumulated between these species in the five million years since their divergence (roughly one difference per 10 base pairs) have not altered the locations where these factors bind, they have had a considerable effect on the amount of factor bound at each site across a population of embryos. We can trace these quantitative differences in binding to the gain and loss of the short sequences known to be preferentially recognized by these factors, giving us key insights into the effect that sequence changes have on the biochemical events that underlie animal development.
13
Weirauch MT, Hughes TR. Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends Genet 2010; 26:66-74. [PMID: 20083321 DOI: 10.1016/j.tig.2009.12.002] [Citation(s) in RCA: 126] [Impact Index Per Article: 8.4] [Received: 11/23/2009] [Revised: 12/09/2009] [Accepted: 12/09/2009] [Indexed: 12/28/2022]
Abstract
Regulatory regions with similar transcriptional output often have little overt sequence similarity, both within and between genomes. Although cis- and trans-regulatory changes can contribute to sequence divergence without dramatically altering gene expression outputs, heterologous DNA often functions similarly in organisms that share little regulatory sequence similarity (e.g. human DNA in fish), indicating that trans-regulatory mechanisms tend to diverge more slowly and can accommodate a variety of cis-regulatory configurations. This capacity to 'tinker' with regulatory DNA probably relates to the complexity, robustness and evolvability of regulatory systems, but cause-and-effect relationships among evolutionary processes and properties of regulatory systems remain a topic of debate. The challenge of understanding the concrete mechanisms underlying cis-regulatory evolution, including the conservation of function without the conservation of sequence, relates to the challenge of understanding the function of regulatory systems in general. Currently, we are largely unable to recognize functionally similar regulatory DNA.
Affiliation(s)
- Matthew T Weirauch
- Banting and Best Department of Medical Research and Donnelly Centre for Cellular and Biomolecular Research, Ontario, Canada
14
Sorourian M, Betrán E. Turnover and lineage-specific broadening of the transcription start site in a testis-specific retrogene. Fly (Austin) 2010; 4:3-11. [PMID: 20160503 PMCID: PMC2855778 DOI: 10.4161/fly.4.1.11136] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/13/2023]
Abstract
Proteasomes are large multisubunit complexes responsible for regulated protein degradation. Made of a core particle (20S) and regulatory caps (19S), proteasomal proteins are encoded by at least 33 genes, of which 12 have been shown to have testis-specific isoforms in Drosophila melanogaster. Pros28.1A (also known as Prosalpha4T1), a young retroduplicate copy of Pros28.1 (also known as Prosalpha4), is one of these isoforms. It is present in the D. melanogaster subgroup and was previously shown to be testis-specific in D. melanogaster. Here, we show its testis-specific transcription in all D. melanogaster subgroup species. Due to this conserved pattern of expression in the species harboring this insertion, we initially expected that a regulatory region common to these species evolved prior to the speciation event. We determined that the region driving testis expression in D. melanogaster is not far from the coding region (within 272 bp upstream of the ATG). However, different Transcription Start Sites (TSSs) are used in D. melanogaster and D. simulans, and a "broad" transcription start site is used in D. yakuba. These results suggest one of the following scenarios: (1) there is a conserved motif in the 5' region of the gene that can be used as an upstream or downstream element or at different distances depending on the species; (2) different species evolved diverse regulatory sequences for the same pattern of expression (i.e., "TSS turnover"); or (3) the transcription start site can be broad or narrow depending on the species. This work reveals the difficulties of studying gene regulation in one species and extrapolating those findings to close relatives.
Affiliation(s)
- Mehran Sorourian
- Department of Biology, University of Texas at Arlington, TX, USA
15
Guruceaga E, Segura V, Corrales FJ, Rubio A. Genome-wide proximal promoter analysis and interpretation. Methods Mol Biol 2010; 593:157-174. [PMID: 19957149 DOI: 10.1007/978-1-60327-194-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2025]
Abstract
High-throughput gene expression technologies based on DNA microarrays allow the examination of biological systems. However, the interpretation of the complex molecular descriptions generated by these approaches is still challenging. The development of new methodologies to identify common regulatory mechanisms involved in the control of the expression of a set of co-expressed genes might enhance our capacity to extract functional information from genomic data sets. In this chapter, we describe a method that integrates different sources of information: gene expression data, genome sequence information, described transcription factor binding sites (TFBSs), functional information, and bibliographic data. The starting point of the analysis is the extraction of promoter sequences from a whole genome and the detection of TFBSs in each gene promoter. This information allows the identification of enriched TFBSs in the proximal promoter of differentially expressed genes. The functional and bibliographic interpretation of the results improves our biological insight into the regulatory mechanisms involved in a microarray experiment.
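The step that identifies "enriched TFBSs in the proximal promoter of differentially expressed genes" is an over-representation test. The abstract does not say which statistic the chapter uses; a common choice for this kind of question is a one-sided hypergeometric test, sketched below under that assumption. The function name `tfbs_enrichment_pvalue` and the counts in the usage lines are hypothetical.

```python
from math import comb


def tfbs_enrichment_pvalue(n_genome: int, n_with_site: int,
                           n_selected: int, n_selected_with_site: int) -> float:
    """One-sided hypergeometric P(X >= k) for over-representation of one TFBS.

    n_genome:             promoters scanned genome-wide (population size N)
    n_with_site:          of those, promoters carrying the TFBS (K)
    n_selected:           promoters of the differentially expressed genes (draws n)
    n_selected_with_site: selected promoters carrying the TFBS (observed k)
    """
    N, K, n, k = n_genome, n_with_site, n_selected, n_selected_with_site
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    # Upper tail: chance of drawing k or more site-bearing promoters at random.
    # Exact integer sums; the final division converts the exact ratio to a float.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 20,000 promoters, 2,000 carry the site;
    # 200 differentially expressed genes, 40 of which carry it (20 expected).
    print(tfbs_enrichment_pvalue(20000, 2000, 200, 40))
```

In a genome-wide scan this test would be repeated for every TFBS model, so the resulting p-values would still need a multiple-testing correction before interpretation.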
Affiliation(s)
- Elizabeth Guruceaga
- CEIT, Centro de Estudios e Investigaciones Técnicas de Gipuzkoa, San Sebastian, Spain
16
Kim J, He X, Sinha S. Evolution of regulatory sequences in 12 Drosophila species. PLoS Genet 2009; 5:e1000330. [PMID: 19132088 PMCID: PMC2607023 DOI: 10.1371/journal.pgen.1000330] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Received: 07/19/2008] [Accepted: 12/05/2008] [Indexed: 01/07/2023]
Abstract
Characterization of the evolutionary constraints acting on cis-regulatory sequences is crucial to comparative genomics and provides key insights on the evolution of organismal diversity. We study the relationships among orthologous cis-regulatory modules (CRMs) in 12 Drosophila species, especially with respect to the evolution of transcription factor binding sites, and report statistical evidence in favor of key evolutionary hypotheses. Binding sites are found to have position-specific substitution rates. However, the selective forces at different positions of a site do not act independently, and the evidence suggests that constraints on sites are often based on their exact binding affinities. Binding site loss is seen to conform to a molecular clock hypothesis. The rate of site loss is transcription factor–specific and depends on the strength of binding and, in some cases, the presence of other binding sites in close proximity. Our analysis is based on a novel computational method for aligning orthologous CRMs on a tree, which rigorously accounts for alignment uncertainties and exploits binding site predictions through a unified probabilistic framework. Finally, we report weak purifying selection on short deletions, providing important clues about overall spatial constraints on CRMs. Our results present a complex picture of regulatory sequence evolution, with substantial plasticity that depends on a number of factors. The insights gained in this study will help us to understand the combinatorial control of gene regulation and how it evolves. They will pave the way for theoretical models that are cognizant of the important determinants of regulatory sequence evolution and will be critical in genome-wide identification of non-coding sequences under purifying or positive selection. The spatial–temporal expression pattern of a gene, which is crucial to its function, is controlled by cis-regulatory DNA sequences. 
Forming the basic units of regulatory sequences are transcription factor binding sites, often organized into larger modules that determine gene expression in response to combinatorial environmental signals. Understanding the conservation and change of regulatory sequences is critical to our knowledge of the unity as well as diversity of animal development and phenotypes. In this paper, we study the evolution of sequences involved in the regulation of body patterning in the Drosophila embryo. We find that mutations of nucleotides within a binding site are constrained by evolutionary forces to preserve the site's binding affinity to the cognate transcription factor. Functional binding sites are frequently destroyed during evolution and the rate of loss across evolutionary spans is roughly constant. We also find that the evolutionary fate of a site strongly depends on its context; a pair of interacting sites are more likely to survive mutational forces than isolated sites. Together, these findings provide new insights and pose new challenges to our understanding of cis-regulatory sequences and their evolution.
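The statement that binding-site loss conforms to a molecular clock can be made concrete with a constant-rate model: if each site is lost at rate r per unit of divergence t, the fraction of sites still present is e^(-rt), so -ln(fraction) grows linearly with t. The sketch below fits r by least squares through the origin. The exponential model, the function name `fit_loss_rate`, and the synthetic inputs are illustrative assumptions, not the paper's probabilistic alignment method or its data.

```python
import math
from typing import Sequence


def fit_loss_rate(divergence: Sequence[float],
                  fraction_retained: Sequence[float]) -> float:
    """Least-squares slope through the origin of -ln(fraction) against divergence.

    Under a clock-like loss model, fraction = exp(-r * t), so
    y = -ln(fraction) = r * t and the fitted slope estimates r.
    """
    if len(divergence) != len(fraction_retained) or not divergence:
        raise ValueError("need matching, non-empty inputs")
    ys = [-math.log(f) for f in fraction_retained]
    sxy = sum(t * y for t, y in zip(divergence, ys))
    sxx = sum(t * t for t in divergence)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic check: generate retained fractions from a known rate and recover it.
    true_rate = 0.3
    ts = [0.1, 0.5, 1.0, 2.0]
    fracs = [math.exp(-true_rate * t) for t in ts]
    print(round(fit_loss_rate(ts, fracs), 6))  # 0.3
```

Because the abstract reports that the loss rate is transcription factor-specific, a fit like this would be run separately for each factor's sites rather than on pooled data.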
Affiliation(s)
- Jaebum Kim
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Xin He
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Corresponding author
17
Hauenschild A, Ringrose L, Altmutter C, Paro R, Rehmsmeier M. Evolutionary plasticity of polycomb/trithorax response elements in Drosophila species. PLoS Biol 2008; 6:e261. [PMID: 18959483 PMCID: PMC2573935 DOI: 10.1371/journal.pbio.0060261] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 06/06/2008] [Accepted: 09/15/2008] [Indexed: 12/22/2022]
Abstract
cis-Regulatory DNA elements contain multiple binding sites for activators and repressors of transcription. Among these elements are enhancers, which establish gene expression states, and Polycomb/Trithorax response elements (PREs), which take over from enhancers and maintain transcription states of several hundred developmentally important genes. PREs are essential to the correct identities of both stem cells and differentiated cells. Evolutionary differences in cis-regulatory elements are a rich source of phenotypic diversity, and functional binding sites within regulatory elements turn over rapidly in evolution. However, more radical evolutionary changes that go beyond motif turnover have been difficult to assess. We used a combination of genome-wide bioinformatic prediction and experimental validation at specific loci, to evaluate PRE evolution across four Drosophila species. Our results show that PRE evolution is extraordinarily dynamic. First, we show that the numbers of PREs differ dramatically between species. Second, we demonstrate that functional binding sites within PREs at conserved positions turn over rapidly in evolution, as has been observed for enhancer elements. Finally, although it is theoretically possible that new elements can arise out of nonfunctional sequence, evidence that they do so is lacking. We show here that functional PREs are found at nonorthologous sites in conserved gene loci. By demonstrating that PRE evolution is not limited to the adaptation of preexisting elements, these findings document a novel dimension of cis-regulatory evolution.
Affiliation(s)
- Arne Hauenschild
- Universität Bielefeld, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Leonie Ringrose
- Institute of Molecular Biotechnology (IMBA), Vienna, Austria
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), Heidelberg, Germany
- Corresponding author
- Renato Paro
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), Heidelberg, Germany
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Marc Rehmsmeier
- Universität Bielefeld, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Gregor Mendel Institute of Molecular Plant Biology (GMI), Vienna, Austria
- Corresponding author
18
Ettwiller L, Budd A, Spitz F, Wittbrodt J. Analysis of mammalian gene batteries reveals both stable ancestral cores and highly dynamic regulatory sequences. Genome Biol 2008; 9:R172. [PMID: 19087242 PMCID: PMC2646276 DOI: 10.1186/gb-2008-9-12-r172] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/28/2008] [Revised: 12/01/2008] [Accepted: 12/16/2008] [Indexed: 12/18/2022]
Abstract
Analysis of the evolutionary dynamics of target gene batteries controlled by 16 different transcription factors reveals stable ancestral cores and highly dynamic regulatory sequences. Background Changes in gene regulation are suspected to comprise one of the driving forces for evolution. To address the extent of cis-regulatory changes and how they impact on gene regulatory networks across eukaryotes, we systematically analyzed the evolutionary dynamics of target gene batteries controlled by 16 different transcription factors. Results We found that gene batteries show variable conservation within vertebrates, with slow and fast evolving modules. Hence, while a key gene battery associated with the cell cycle is conserved throughout metazoans, the POU5F1 (Oct4) and SOX2 batteries in embryonic stem cells show strong conservation within mammals, with the striking exception of rodents. Within the genes composing a given gene battery, we could identify a conserved core that likely reflects the ancestral function of the corresponding transcription factor. Interestingly, we show that the association between a transcription factor and its target genes is conserved even when we exclude conserved sequence similarities of their promoter regions from our analysis. This supports the idea that turnover, either of the transcription factor binding site or its direct neighboring sequence, is a pervasive feature of proximal regulatory sequences. Conclusions Our study reveals the dynamics of evolutionary changes within metazoan gene networks, including both the composition of gene batteries and the architecture of target gene promoters. This variation provides the playground required for evolutionary innovation around conserved ancestral core functions.
Affiliation(s)
- Laurence Ettwiller
- Developmental Biology Unit, EMBL-Heidelberg, Meyerhofstrasse 1, Heidelberg, 69117, Germany.
19
Hare EE, Peterson BK, Iyer VN, Meier R, Eisen MB. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet 2008; 4:e1000106. [PMID: 18584029 PMCID: PMC2430619 DOI: 10.1371/journal.pgen.1000106] [Citation(s) in RCA: 221] [Impact Index Per Article: 13.0] [Received: 02/28/2008] [Accepted: 05/22/2008] [Indexed: 12/31/2022]
Abstract
The gene expression pattern specified by an animal regulatory sequence is generally viewed as arising from the particular arrangement of transcription factor binding sites it contains. However, we demonstrate here that regulatory sequences whose binding sites have been almost completely rearranged can still produce identical outputs. We sequenced the even-skipped locus from six species of scavenger flies (Sepsidae) that are highly diverged from the model species Drosophila melanogaster, but share its basic patterns of developmental gene expression. Although there is little sequence similarity between the sepsid eve enhancers and their well-characterized D. melanogaster counterparts, the sepsid and Drosophila enhancers drive nearly identical expression patterns in transgenic D. melanogaster embryos. We conclude that the molecular machinery that connects regulatory sequences to the transcription apparatus is more flexible than previously appreciated. In exploring this diverse collection of sequences to identify the shared features that account for their similar functions, we found a small number of short (20-30 bp) sequences nearly perfectly conserved among the species. These highly conserved sequences are strongly enriched for pairs of overlapping or adjacent binding sites. Together, these observations suggest that the local arrangement of binding sites relative to each other is more important than their overall arrangement into larger units of cis-regulatory function.
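The observation that the conserved 20-30 bp blocks are enriched for overlapping or adjacent binding sites suggests a simple scan over predicted sites: report every pair whose gap is at most a few base pairs. The sketch below assumes half-open [start, end) coordinates, a default maximum gap of 0 bp (overlapping or abutting), and hypothetical factor labels; it illustrates the pairing logic only and is not the authors' analysis pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Site:
    factor: str  # hypothetical transcription factor label
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive (half-open interval)


def close_pairs(sites: List[Site], max_gap: int = 0) -> List[Tuple[Site, Site]]:
    """Pairs of sites that overlap or sit within `max_gap` bp of each other.

    For half-open intervals the gap is max(starts) - min(ends):
    negative means the sites overlap, zero means they abut.
    """
    ordered = sorted(sites, key=lambda s: (s.start, s.end))
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start - a.end > max_gap:
                break  # later sites start no earlier, so none of them can qualify
            if max(a.start, b.start) - min(a.end, b.end) <= max_gap:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    sites = [Site("A", 10, 18), Site("B", 16, 24), Site("C", 24, 30), Site("D", 50, 58)]
    print([(a.factor, b.factor) for a, b in close_pairs(sites)])  # [('A', 'B'), ('B', 'C')]
```

Sorting by start position lets the inner loop stop early, so the scan stays close to linear when sites are sparse along the enhancer.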
Affiliation(s)
- Emily E. Hare
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Brant K. Peterson
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Center for Integrative Genomics, University of California Berkeley, Berkeley, California, United States of America
- Venky N. Iyer
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Rudolf Meier
- Department of Biological Sciences, National University of Singapore, Singapore
- Michael B. Eisen
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Center for Integrative Genomics, University of California Berkeley, Berkeley, California, United States of America
- Genomics Division, Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- California Institute for Quantitative Biosciences, Berkeley, California, United States of America
20
Bai Y, Casola C, Betrán E. Evolutionary origin of regulatory regions of retrogenes in Drosophila. BMC Genomics 2008; 9:241. [PMID: 18498650 PMCID: PMC2413143 DOI: 10.1186/1471-2164-9-241] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 02/19/2008] [Accepted: 05/22/2008] [Indexed: 12/29/2022]
Abstract
Background Retrogenes are processed copies of other genes. This duplication mechanism produces a copy of the parental gene that should not contain introns, and usually does not contain cis-regulatory regions. Here, we computationally address the evolutionary origin of promoter and other cis-regulatory regions in retrogenes using a total of 94 Drosophila retroposition events we recently identified. Previous tissue expression data have revealed that a large fraction of these retrogenes are specifically and/or highly expressed in adult testes of Drosophila. Results In this work, we infer that retrogenes do not generally carry regulatory regions from aberrant upstream or normal transcripts of their parental genes, and that expression patterns of neighboring genes are not consistently shared by retrogenes. Additionally, transposable elements do not appear to substantially provide regulatory regions to retrogenes. Interestingly, we find that there is an excess of retrogenes in male testis neighborhoods that is not explained by insertional biases of the retroelement machinery used for retroposition. Conclusion We conclude that retrogenes' regulatory regions mostly do not represent a random set of existing regulatory regions. On the contrary, our conclusion is that selection is likely to have played an important role in the persistence of autosomal testis-biased retrogenes. Selection in favor of retrogenes inserted in male testis neighborhoods and at the sequence level to produce testis expression is postulated to have occurred.
Affiliation(s)
- Yongsheng Bai
- Department of Biology, University of Texas at Arlington, Arlington, TX, USA.
21
Simpson P, Ayyar S. Chapter 3 Evolution of Cis‐Regulatory Sequences in Drosophila. LONG-RANGE CONTROL OF GENE EXPRESSION 2008; 61:67-106. [DOI: 10.1016/s0065-2660(07)00003-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 12/30/2022]
22
Buckley NJ. Analysis of transcription, chromatin dynamics and epigenetic changes in neural genes. Prog Neurobiol 2007; 83:195-210. [PMID: 17884276 DOI: 10.1016/j.pneurobio.2007.07.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 11/01/2006] [Revised: 06/14/2007] [Accepted: 07/18/2007] [Indexed: 01/08/2023]
Abstract
The ways in which gene transcription is investigated have undergone radical change since the turn of the millennium. Piece-meal approaches focussed upon model genes have increasingly been complemented by genome-wide approaches that allow interrogation of multiple cohorts of genes or even entire genomes. This sea change has been founded upon the increasing availability of whole genome sequences and the attendant evolution of microarray based discovery platforms. Collectively, these approaches are being used to build a global and dynamic perspective of transcription factor occupancy, co-factor recruitment and epigenetic signature. As yet, few of these approaches have been applied to the study of neuronal gene transcription, but this is set to change. Here, I review these key developments and point to their potential application to the study of transcriptional and epigenetic changes in neurons in health and disease.
Affiliation(s)
- Noel J Buckley
- King's College London, Department of Neuroscience, Institute of Psychiatry, Centre for the Cellular Basis of Behaviour, CCBB/CCIB, Room 1-045, 125 Coldharbour Lane, London SE5 9NU, UK.
23
Abstract
This is an introductory review on how genes interact to produce biological functions. Transcriptional interactions involve the binding of proteins to regulatory DNA. Specific binding sites can be identified by genomic analysis, and these undergo a stochastic evolution process governed by selection, mutations, and genetic drift. We focus on the links between the biophysical function and the evolution of regulatory elements. In particular, we infer fitness landscapes of binding sites from genomic data, leading to a quantitative evolutionary picture of regulation.
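One widely used way to turn genomic binding-site data into a fitness landscape, adopted here as an assumption rather than as the exact formulation of this review, is to suppose mutation-selection-drift equilibrium in the weak-mutation limit, where a site sequence s occurs with probability proportional to P0(s) exp(2N F(s)) for background probability P0 and effective population size N. If positions also contribute independently, the scaled fitness 2N F(s) is, up to an additive constant, a sum of per-position log-odds. The sketch below estimates those log-odds from aligned sites; the uniform background, the pseudocount, the function names, and the toy sites are all hypothetical.

```python
import math
from typing import Dict, List, Optional

BASES = "ACGT"


def scaled_fitness_matrix(sites: List[str], pseudocount: float = 0.5,
                          background: Optional[Dict[str, float]] = None
                          ) -> List[Dict[str, float]]:
    """Per-position log-odds ln(p_i(b) / q(b)) from aligned, equal-length sites.

    Under the assumed equilibrium P(s) ~ P0(s) * exp(2N * F(s)) with independent
    positions, these log-odds are additive contributions to 2N*F (up to a constant).
    Sites must use uppercase A, C, G, T only.
    """
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("need non-empty, equal-length sites")
    q = background or {b: 0.25 for b in BASES}
    matrix = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}  # pseudocounts avoid log(0)
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log(counts[b] / total / q[b]) for b in BASES})
    return matrix


def scaled_fitness(matrix: List[Dict[str, float]], seq: str) -> float:
    """Estimated 2N*F(seq) relative to background, up to an additive constant."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match the matrix")
    return sum(col[b] for col, b in zip(matrix, seq))


if __name__ == "__main__":
    toy_sites = ["TAATCC", "TAATCA", "TAATCC", "TGATCC"]  # hypothetical sites
    m = scaled_fitness_matrix(toy_sites)
    print(scaled_fitness(m, "TAATCC"), scaled_fitness(m, "GCGCGA"))
```

The additive form is what makes this a position weight matrix; capturing dependencies between positions would require a richer model than the one assumed here.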
Affiliation(s)
- Michael Lässig
- Institut für Theoretische Physik, Universität zu Köln, Zülpicher Str, 77, 50937 Köln, Germany.
24
Landry CR, Hartl DL, Ranz JM. Genome clashes in hybrids: insights from gene expression. Heredity (Edinb) 2007; 99:483-93. [PMID: 17687247 DOI: 10.1038/sj.hdy.6801045] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Indexed: 11/09/2022]
Abstract
In interspecific hybrids, novel phenotypes often emerge from the interaction of two divergent genomes. Interactions between the two transcriptional networks are assumed to contribute to these unpredicted new phenotypes by inducing novel patterns of gene expression. Here we provide a review of the recent literature on the accumulation of regulatory incompatibilities. We review specific examples of regulatory incompatibilities reported at particular loci as well as genome-scale surveys of gene expression in interspecific hybrids. Finally, we consider and preview novel technologies that could help decipher how divergent transcriptional networks interact in hybrids between species.
Affiliation(s)
- C R Landry
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA.
25
Enikeeva FN, Kotelnikova EA, Gelfand MS, Makeev VJ. A model of evolution with constant selective pressure for regulatory DNA sites. BMC Evol Biol 2007; 7:125. [PMID: 17662135 PMCID: PMC1978210 DOI: 10.1186/1471-2148-7-125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2007] [Accepted: 07/27/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Molecular evolution is usually described assuming a neutral or weakly non-neutral substitution model. Recently, new data have become available on evolution of sequence regions under a selective pressure, e.g. transcription factor binding sites. To reconstruct the evolutionary history of such sequences, one needs evolutionary models that take into account a substantial constant selective pressure. RESULTS We present a simple evolutionary model with a single preferred (consensus) nucleotide and the neutral substitution model adopted for all other nucleotides. This evolutionary model has a rate matrix in which all substitutions that do not involve the consensus nucleotide occur with the same rate. The model has two time scales for achieving a stationary distribution; in the general case only one of the two rate parameters can be evaluated from the stationary distribution. In the middle-time zone, a counterintuitive behavior was observed for some parameter values, with a probability of conservation for a non-consensus nucleotide greater than that for the consensus nucleotide. Such an effect can be observed only in the case of weak preference for the consensus nucleotide, when the probability of observing the consensus nucleotide in the stationary distribution is less than 1/2. If the substitution rate is represented as a product of mutation and fixation, only the fixation can be calculated from the stationary distribution. The exhibited conservation of non-consensus nucleotides does not take place if the elements of mutation matrix are identical, and can be related to the reduced mutation rate between the non-consensus nucleotides. This bias has no effect on the stationary distribution of nucleotide frequencies calculated over the ensemble of multiple alignments, e.g. transcription factor binding sites upstream of different sets of co-regulated orthologous genes.
CONCLUSION The derived model can be used as a null model when analyzing the evolution of orthologous transcription factor binding sites. In particular, our findings show that a nucleotide preferred at some position of a multiple alignment of binding sites for some transcription factor in the same genome is not necessarily the most conserved nucleotide in an alignment of orthologous sites from different species. However, this effect can take place only in the case of a mutation matrix whose elements are not identical.
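The model in this abstract can be written down explicitly. In the notation assumed here (not necessarily the paper's), label the consensus nucleotide C and the other three N; let α be the rate of any N to C substitution, β the rate of C to each non-consensus nucleotide, and γ the common rate between non-consensus nucleotides. Balancing flow into and out of C gives the stationary frequency π_C = α/(α + 3β), in which γ does not appear, matching the statement that only one rate parameter is recoverable from the stationary distribution. The chain relaxes on two time scales set by α + 3β and α + 3γ, giving P_CC(t) = π_C + (1 - π_C)e^(-(α+3β)t) and P_NN(t) = [1 - π_C(1 - e^(-(α+3β)t))]/3 + (2/3)e^(-(α+3γ)t). The sketch below checks these closed forms against a numerical matrix exponential computed by uniformization; parameter values are illustrative, not fitted estimates.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rate_matrix(alpha: float, beta: float, gamma: float) -> Matrix:
    """4x4 substitution generator Q; state 0 is the consensus nucleotide C."""
    q = [[0.0] * 4 for _ in range(4)]
    for j in range(1, 4):
        q[0][j] = beta           # C -> each non-consensus nucleotide
        q[j][0] = alpha          # each non-consensus nucleotide -> C
        for k in range(1, 4):
            if k != j:
                q[j][k] = gamma  # one common rate among non-consensus nucleotides
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def transition_probs(q: Matrix, t: float, terms: int = 200) -> Matrix:
    """P(t) = exp(Qt) by uniformization: sum_k Poisson(k; lam*t) * M^k, M = I + Q/lam."""
    lam = max(-q[i][i] for i in range(4))
    m = [[(1.0 if i == j else 0.0) + q[i][j] / lam for j in range(4)] for i in range(4)]
    power = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # M^0
    weight = math.exp(-lam * t)  # Poisson weight for k = 0
    p = [[weight * power[i][j] for j in range(4)] for i in range(4)]
    for k in range(1, terms):
        power = [[sum(power[i][r] * m[r][j] for r in range(4)) for j in range(4)]
                 for i in range(4)]
        weight *= lam * t / k
        for i in range(4):
            for j in range(4):
                p[i][j] += weight * power[i][j]
    return p


def closed_form(alpha: float, beta: float, gamma: float,
                t: float) -> Tuple[float, float, float]:
    """Return (pi_C, P_CC(t), P_NN(t)) from the analytic solution."""
    pi_c = alpha / (alpha + 3 * beta)               # gamma cancels out
    e_pool = math.exp(-(alpha + 3 * beta) * t)      # C <-> non-consensus pool
    e_within = math.exp(-(alpha + 3 * gamma) * t)   # relaxation within the pool
    p_cc = pi_c + (1 - pi_c) * e_pool
    p_nn = (1 - pi_c * (1 - e_pool)) / 3 + (2 / 3) * e_within
    return pi_c, p_cc, p_nn


if __name__ == "__main__":
    a, b, g, t = 1.0, 1.0, 0.01, 1.0  # weak preference (pi_C = 0.25), small gamma
    p = transition_probs(rate_matrix(a, b, g), t)
    pi_c, p_cc, p_nn = closed_form(a, b, g, t)
    print(f"pi_C={pi_c:.3f}  P_CC={p_cc:.3f} (numeric {p[0][0]:.3f})  "
          f"P_NN={p_nn:.3f} (numeric {p[1][1]:.3f})")
```

With α = β = 1 and γ = 0.01 (so π_C = 0.25), P_CC(1) is about 0.26 while P_NN(1) is about 0.49: a non-consensus nucleotide is more likely to be conserved than the consensus one, the middle-time effect described above. Setting γ = β = α = 1 in this sketch makes the two curves coincide.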
Affiliation(s)
- Farida N Enikeeva
- Institute for Information Transmission Problems (the Kharkevich Institute) of RAS, Bolshoi Karetny pereulok, 19, GSP-4, Moscow, 127994, Russia
- Ekaterina A Kotelnikova
- State Research Institute of Genetics and Selection of Industrial Microorganisms, 1st Dorozhnyj proezd, 1, Moscow, 113535, Russia
- Ariadne Genomics Inc. 9700 Great Seneca Highway, Suite 113, Rockville, MD 20850, USA
- Mikhail S Gelfand
- Institute for Information Transmission Problems (the Kharkevich Institute) of RAS, Bolshoi Karetny pereulok, 19, GSP-4, Moscow, 127994, Russia
- Faculty of Bioengineering and Bioinformatics, Moscow State University, Vorobyevy Gory 1-73, Moscow, 119992, Russia
- Vsevolod J Makeev
- State Research Institute of Genetics and Selection of Industrial Microorganisms, 1st Dorozhnyj proezd, 1, Moscow, 113535, Russia
- Engelgardt Institute of Molecular Biology of RAS, Vavilova 32, Moscow, 119991, Russia
26
Li L, Zhu Q, He X, Sinha S, Halfon MS. Large-scale analysis of transcriptional cis-regulatory modules reveals both common features and distinct subclasses. Genome Biol 2007; 8:R101. [PMID: 17550599 PMCID: PMC2394749 DOI: 10.1186/gb-2007-8-6-r101] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Received: 04/11/2007] [Revised: 05/23/2007] [Accepted: 06/05/2007] [Indexed: 02/01/2023]
Abstract
BACKGROUND Transcriptional cis-regulatory modules (for example, enhancers) play a critical role in regulating gene expression. While many individual regulatory elements have been characterized, they have never been analyzed as a class. RESULTS We have performed the first such large-scale study of cis-regulatory modules in order to determine whether they have common properties that might aid in their identification and contribute to our understanding of the mechanisms by which they function. A total of 280 individual, experimentally verified cis-regulatory modules from Drosophila were analyzed for a range of sequence-level and functional properties. We report here that regulatory modules do indeed share common properties, among them an elevated GC content, an increased level of interspecific sequence conservation, and a tendency to be transcribed into RNA. However, we find that dense clustering of transcription factor binding sites, especially homotypic clustering, which is commonly believed to be a general characteristic of regulatory modules, is rather a feature that belongs chiefly to a specific subclass. This has important implications for current computational approaches, many of which are biased toward this subset. We explore two new strategies to assess binding site clustering and gauge their performances with respect to their ability to detect all 280 modules and various functionally coherent subsets. CONCLUSION Our findings demonstrate that cis-regulatory modules share common features that help to define them as a class and that may lead to new insights into mechanisms of gene regulation. However, these properties alone may not be sufficient to reliably distinguish regulatory from non-regulatory sequences. We also demonstrate that there are distinct subclasses of cis-regulatory modules that are more amenable to in silico detection than others and that these differences must be taken into account when attempting genome-wide regulatory element discovery.
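Two of the properties tested in this study, elevated GC content and homotypic clustering of binding sites, are simple to compute for a candidate sequence. The sketch below reports the GC fraction and the largest number of whole matches to one motif inside any window of fixed length. Exact string matching on one strand, the hypothetical consensus "TAAT", the 50 bp window, and the toy sequence are simplifying assumptions; the two clustering strategies the authors evaluate are not reproduced here.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G or C bases (case-insensitive); 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def motif_starts(seq: str, motif: str) -> List[int]:
    """Ascending start positions of exact, possibly overlapping matches (one strand only)."""
    seq, motif = seq.upper(), motif.upper()
    w = len(motif)
    return [i for i in range(len(seq) - w + 1) if seq[i:i + w] == motif]


def max_sites_in_window(starts: List[int], motif_len: int, window: int) -> int:
    """Most whole sites that fit inside any `window`-bp stretch (two-pointer scan)."""
    if motif_len > window:
        raise ValueError("window must be at least as long as the motif")
    best, left = 0, 0
    for right, s in enumerate(starts):
        # Drop sites from the left until starts[left] .. s + motif_len spans <= window bp.
        while s + motif_len - starts[left] > window:
            left += 1
        best = max(best, right - left + 1)
    return best


if __name__ == "__main__":
    motif = "TAAT"  # hypothetical consensus, exact matching only
    seq = "TAATGGTAATCCTAAT" + "A" * 100 + "TAAT"
    starts = motif_starts(seq, motif)
    print(round(gc_content(seq), 3), starts, max_sites_in_window(starts, len(motif), 50))
    # 0.033 [0, 6, 12, 116] 3
```

A real scan would also search the reverse complement and score sites with a position weight matrix instead of exact matches; the two-pointer window keeps the clustering step linear in the number of sites.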
Affiliation(s)
- Long Li
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14214, USA
- Qianqian Zhu
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14214, USA
- Xin He
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Marc S Halfon
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, NY 14214, USA
- Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY 14214, USA
- New York State Center of Excellence in Bioinformatics and the Life Sciences, Buffalo, NY 14203, USA
- Department of Molecular and Cellular Biology, Roswell Park Cancer Institute, Buffalo, NY 14263, USA
27
Abstract
Before any intelligence can appear, a world endowed with the potential for being experienced as a body of phenomena has to be existent. Indeed, if there is to be an intelligence, there first has to be something intelligible. Hence, when an intelligence is present, "creation" must already have taken place. Nevertheless, biological complexity has been deemed by some to be one of the privileged points of insertion of a supernatural intelligence endowed with temporal and causal primacy. In the course of a critical review, it is pointed out that the spectacle of nature's spontaneous tinkering with the structures and performances of informational macromolecules and with interactive connections among these molecules suggests that intelligence and design are absent from evolution. Nor is intelligent design required for explaining biological complexity, which can increase spontaneously as a byproduct of combinatorial intermolecular gambles and of the restoration of molecular damage wrought by mutations. One of the possible molecular pathways to spontaneous evolutionary increases in complexity is described.
28
Taylor J, Tyekucheva S, King DC, Hardison RC, Miller W, Chiaromonte F. ESPERR: learning strong and weak signals in genomic sequence alignments to identify functional elements. Genome Res 2006; 16:1596-604. [PMID: 17053093 PMCID: PMC1665643 DOI: 10.1101/gr.4537706] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.8] [Indexed: 02/01/2023]
Abstract
Genomic sequence signals - such as base composition, presence of particular motifs, or evolutionary constraint - have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy (approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).
Affiliation(s)
- James Taylor
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Corresponding author. Fax: (814) 863-6699.
- Svitlana Tyekucheva
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- David C. King
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Ross C. Hardison
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Webb Miller
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Francesca Chiaromonte
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Corresponding author. Fax: (814) 863-6699.
29
Ponting CP, Lunter G. Signatures of adaptive evolution within human non-coding sequence. Hum Mol Genet 2006; 15 Spec No 2:R170-5. [PMID: 16987880 DOI: 10.1093/hmg/ddl182]
Abstract
The human genome is often portrayed as consisting of three sequence types, each distinguished by their mode of evolution. Purifying selection is estimated to act on 2.5-5.0% of the genome, whereas virtually all remaining sequence is considered to have evolved neutrally and to be devoid of functionality. The third mode of evolution, positive selection of advantageous changes, is considered rare. Such instances have been inferred only for a handful of sites, and these lie almost exclusively within protein-coding genes. Nevertheless, the majority of positively selected sequence is expected to lie within the wealth of functional 'dark matter' present outside of the coding sequence. Here, we review the evolutionary evidence for the majority of human-conserved DNA lying outside of the protein-coding sequence. We argue that within this non-coding fraction lies at least 1 Mb of functional sequence that has accumulated many beneficial nucleotide replacements. Illuminating the functions of this adaptive dark matter will lead to a better understanding of the sequence changes that have shaped the innovative biology of our species.
Affiliation(s)
- Chris P Ponting
- MRC Functional Genetics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3QX, UK.
30
Moses AM, Pollard DA, Nix DA, Iyer VN, Li XY, Biggin MD, Eisen MB. Large-scale turnover of functional transcription factor binding sites in Drosophila. PLoS Comput Biol 2006; 2:e130. [PMID: 17040121 PMCID: PMC1599766 DOI: 10.1371/journal.pcbi.0020130] [Received: 05/08/2006] [Accepted: 08/21/2006]
Abstract
The gain and loss of functional transcription factor binding sites has been proposed as a major source of evolutionary change in cis-regulatory DNA and gene expression. We have developed an evolutionary model to study binding-site turnover that uses multiple sequence alignments to assess the evolutionary constraint on individual binding sites, and to map gain and loss events along a phylogenetic tree. We apply this model to study the evolutionary dynamics of binding sites of the Drosophila melanogaster transcription factor Zeste, using genome-wide in vivo (ChIP-chip) binding data to identify functional Zeste binding sites, and the genome sequences of D. melanogaster, D. simulans, D. erecta, and D. yakuba to study their evolution. We estimate that more than 5% of functional Zeste binding sites in D. melanogaster were gained along the D. melanogaster lineage or lost along one of the other lineages. We find that Zeste-bound regions have a reduced rate of binding-site loss and an increased rate of binding-site gain relative to flanking sequences. Finally, we show that binding-site gains and losses are asymmetrically distributed with respect to D. melanogaster, consistent with lineage-specific acquisition and loss of Zeste-responsive regulatory elements.
Affiliation(s)
- Alan M Moses
- Graduate Group in Biophysics, University of California Berkeley, Berkeley, California, United States of America
- Daniel A Pollard
- Graduate Group in Biophysics, University of California Berkeley, Berkeley, California, United States of America
- David A Nix
- Department of Genome Sciences, Genomics Division, Ernest Orlando Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Venky N Iyer
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Xiao-Yong Li
- Department of Genome Sciences, Genomics Division, Ernest Orlando Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Mark D Biggin
- Department of Genome Sciences, Genomics Division, Ernest Orlando Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Michael B Eisen
- Graduate Group in Biophysics, University of California Berkeley, Berkeley, California, United States of America
- Department of Genome Sciences, Genomics Division, Ernest Orlando Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America
- Center for Integrative Genomics, University of California Berkeley, Berkeley, California, United States of America
31
Rebeiz M, Stone T, Posakony JW. An ancient transcriptional regulatory linkage. Dev Biol 2005; 281:299-308. [PMID: 15893980 DOI: 10.1016/j.ydbio.2005.03.004] [Received: 02/02/2005] [Revised: 03/08/2005] [Accepted: 03/08/2005]
Abstract
Changes in gene regulatory networks are a major engine for creating developmental novelty during evolution. Conversely, regulatory linkages that survive for very long evolutionary periods might be characteristic of ancient and abstract functions of fundamental utility to all metazoans. The proneural genes, which encode a distinctive family of basic helix-loop-helix (bHLH) transcriptional activators, act to promote neural cell fates in the ectoderm of diverse species. Here we report that these genes have been associated for at least 600-700 million years--since before the cnidarian/bilaterian divergence--with a high-affinity binding site for Hairy/Enhancer of split (Hes) repressor proteins. We suggest that the systematic identification of such ancient and conserved connections will be a powerful means of uncovering the primordial functions of transcription factors and signaling systems.
Affiliation(s)
- Mark Rebeiz
- Division of Biological Sciences, Section of Cell and Developmental Biology, University of California San Diego, La Jolla, CA 92093-0349, USA
32
Pollard DA, Moses AM, Iyer VN, Eisen MB. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments. BMC Bioinformatics 2006; 7:376. [PMID: 16904011 PMCID: PMC1613255 DOI: 10.1186/1471-2105-7-376] [Received: 04/13/2006] [Accepted: 08/14/2006]
Abstract
BACKGROUND Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear. RESULTS Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments. CONCLUSION Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. 
These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.
Affiliation(s)
- Daniel A Pollard
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Alan M Moses
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Venky N Iyer
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Michael B Eisen
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Department of Genome Sciences, Genomics Division, Ernest Orlando Lawrence Berkeley National Lab, Berkeley, CA 94720, USA
- Center for Integrative Genomics, University of California, Berkeley, CA 94720, USA
33
Johnson R, Gamblin RJ, Ooi L, Bruce AW, Donaldson IJ, Westhead DR, Wood IC, Jackson RM, Buckley NJ. Identification of the REST regulon reveals extensive transposable element-mediated binding site duplication. Nucleic Acids Res 2006; 34:3862-77. [PMID: 16899447 PMCID: PMC1557810 DOI: 10.1093/nar/gkl525] [Received: 04/11/2006] [Revised: 06/01/2006] [Accepted: 07/10/2006]
Abstract
The genome-wide mapping of gene-regulatory motifs remains a major goal that will facilitate the modelling of gene-regulatory networks and their evolution. The repressor element 1 (RE1) is a long, conserved transcription factor-binding site which recruits the transcriptional repressor REST to numerous neuron-specific target genes. REST plays important roles in multiple biological processes and disease states. To map RE1 sites and target genes, we created a position-specific scoring matrix representing the RE1 and used it to search the human and mouse genomes. We identified 1301 and 997 RE1s in human and mouse genomes, respectively, of which >40% are novel. By employing an ontological analysis we show that REST target genes are significantly enriched in a number of functional classes. Taking the novel REST target gene CACNA1A as an experimental model, we show that it can be regulated by multiple RE1s of different binding affinities, which are only partially conserved between human and mouse. A novel BLAST methodology indicated that many RE1s belong to closely related families. Most of these sequences are associated with transposable elements, leading us to propose that transposon-mediated duplication and insertion of RE1s has led to the acquisition of novel target genes by REST during evolution.
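The abstract describes building a position-specific scoring matrix (PSSM) for the RE1 and using it to search whole genomes. A minimal sketch of that general technique follows, assuming log2-odds scores with a pseudocount, a uniform 0.25 background and equal-length aligned sites. The function names, the toy sites and the score threshold are hypothetical illustrations, not the authors' RE1 matrix or cutoff.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pssm(sites, pseudocount=0.5):
    """Log2-odds PSSM from equal-length, aligned, upper-case ACGT sites.

    Assumes a uniform background (0.25 per base) for simplicity.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pssm = []
    for pos in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({base: math.log2((counts[base] / total) / 0.25) for base in BASES})
    return pssm


def score(pssm, kmer):
    """Sum of per-position log-odds scores for one k-mer."""
    return sum(column[base] for column, base in zip(pssm, kmer))


def scan(pssm, sequence, threshold):
    """Return (start, strand, score) for every window scoring >= threshold.

    Both strands are scored; start is always the forward-strand coordinate.
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows with N or other ambiguity codes
        reverse_complement = window.translate(COMPLEMENT)[::-1]
        for strand, kmer in (("+", window), ("-", reverse_complement)):
            s = score(pssm, kmer)
            if s >= threshold:
                hits.append((start, strand, s))
    return hits


# Hypothetical toy sites; not the RE1 matrix from the paper.
pssm = build_pssm(["ACGTGA", "ACGTGA", "ACGTCA"])
hits = scan(pssm, "TTACGTGATT", threshold=7.0)
# hits holds a single forward-strand match starting at position 2
```

Scoring both strands matters because a site can be bound in either orientation. In a genome-scale search the threshold would typically need to be calibrated against an estimate of chance hits rather than chosen by hand as it is here.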
Affiliation(s)
- Rory Johnson
- Institute of Membrane and Systems Biology, University of Leeds, Leeds LS2 9JT, UK.
34
Podvinec M, Meyer UA. Prediction of cis-regulatory elements for drug-activated transcription factors in the regulation of drug-metabolising enzymes and drug transporters. Expert Opin Drug Metab Toxicol 2006; 2:367-79. [PMID: 16863440 DOI: 10.1517/17425255.2.3.367]
Abstract
The expression of drug-metabolising enzymes is affected by many endogenous and exogenous factors, including sex, age, diet and exposure to xenobiotics and drugs. To understand fully how the organism metabolises a drug, these alterations in gene expression must be taken into account. The central process, the definition of likely regulatory elements in the genes coding for enzymes and transporters involved in drug disposition, can be vastly accelerated using existing and emerging bioinformatics methods to unravel the regulatory networks causing drug-mediated induction of genes. Here, various approaches to predict transcription factor interactions with regulatory DNA elements are reviewed.
Affiliation(s)
- Michael Podvinec
- Swiss Institute of Bioinformatics and Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland.
35
Abnizova I, Rust AG, Robinson M, Te Boekhorst R, Gilks WR. Transcription binding site prediction using Markov models. J Bioinform Comput Biol 2006; 4:425-41. [PMID: 16819793 DOI: 10.1142/s0219720006001813] [Received: 09/30/2005] [Revised: 12/28/2005] [Accepted: 01/08/2006]
Abstract
One of the main goals of analysing DNA sequences is to understand the temporal and positional information that specifies gene expression. An important step in this process is the recognition of gene expression regulatory elements. Experimental procedures for this are slow and costly. In this paper we present a computational non-supervised algorithm that facilitates the process by statistically identifying the most likely regions within a putative regulatory sequence. A probabilistic technique is presented, based on the approximation of regulatory DNA with a Markov chain, for the location of putative transcription factor binding sites in a single stretch of DNA. To this end, we developed a procedure to approximate the order of the Markov model for a given DNA sequence that circumvents some of the prohibitive assumptions underlying Markov modeling. Application of the algorithm to data from 55 genes in five species shows the high sensitivity of this Markov search algorithm. Our algorithm does not require any prior knowledge in the form of description or cross-genomic comparison; it is context sensitive and takes DNA heterogeneity into account.
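The central ingredient named in this abstract, approximating a stretch of DNA with a Markov chain, can be sketched as follows. This is a generic illustration, not the published algorithm: the order k is fixed by the caller rather than estimated as the authors describe, transition probabilities use add-one smoothing, and the rule of flagging windows with a low average log-probability under the fitted chain is an assumption made here for illustration. The function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_markov(seq, k):
    """Fit an order-k Markov chain to seq; returns P(base | preceding k-mer).

    Add-one (Laplace) smoothing over the four bases keeps unseen
    transitions from having zero probability.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(seq)):
        counts[seq[i - k:i]][seq[i]] += 1

    def prob(context, base):
        row = counts.get(context, {})
        return (row.get(base, 0) + 1) / (sum(row.values()) + 4)

    return prob


def window_scores(seq, k, width):
    """(start, average per-base log-probability) for each window of seq."""
    prob = fit_markov(seq, k)
    scores = []
    # Start at k so every scored base has a full k-base context.
    for start in range(k, len(seq) - width + 1):
        ll = sum(math.log(prob(seq[i - k:i], seq[i])) for i in range(start, start + width))
        scores.append((start, ll / width))
    return scores


# Hypothetical toy input: a short GC block embedded in AT repeats.
seq = "AT" * 50 + "GGGCCC" + "AT" * 50
scores = window_scores(seq, k=1, width=6)
lowest_start, lowest_score = min(scores, key=lambda item: item[1])
# the lowest-scoring window starts at 100, the first base of the GC block
```

Windows that the chain predicts poorly stand out from the sequence's own background, which is one way a single stretch of DNA can be searched without cross-species comparison.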
36
Wratten NS, McGregor AP, Shaw PJ, Dover GA. Evolutionary and functional analysis of the tailless enhancer in Musca domestica and Drosophila melanogaster. Evol Dev 2006; 8:6-15. [PMID: 16409378 DOI: 10.1111/j.1525-142x.2006.05070.x]
Abstract
To further understand the evolutionary dynamics of the regulatory interactions underlying development, we expand on our previous analysis of hunchback and compare the structure and function of the tailless enhancer between Musca domestica and Drosophila melanogaster. Our analysis shows that although the expression patterns and functional protein domains of tll are conserved between Musca and Drosophila, the enhancer sequences are unalignable. Upon closer investigation, we find that these highly diverged enhancer sequences encode the same regulatory information necessary for Bicoid, Dorsal, and the terminal system to drive tll expression. The binding sites for these transcription factors differ in the sequence, number, spacing, and position between the Drosophila and Musca tll enhancers, and we were unable to establish homology between binding sites from each species. This implies that the Musca and Drosophila Bcd-binding sites have evolved de novo in the 100 million years since these species diverged. However, in transgenic Drosophila embryos the Musca tll enhancer is able to drive the same expression pattern as endogenous Drosophila tll. Therefore, during the rapid evolution of enhancer sequences individual binding sites are continually lost and gained, but the transcriptional output is maintained by compensatory mutations in cis and in trans.
Affiliation(s)
- Naomi S Wratten
- Department of Genetics, University of Leicester, Leicester LE1 7RH, UK
37
Abnizova I, Gilks WR. Studying statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the eukaryotic genomes. Brief Bioinform 2006; 7:48-54. [PMID: 16761364 DOI: 10.1093/bib/bbk004]
Abstract
There are no well-known properties in regulatory DNA analogous to those in coding sequences; their spatial location is not regular, the consensus regulatory elements are often degenerate and there are no understandable rules governing their evolution. This makes it difficult to recognize regulatory regions within the genome. We review developments in the statistical characterization of regulatory regions and methods of their recognition in eukaryotic genomes.
38
Abstract
Bioinformatics studies of transcriptional regulation in the metazoa are significantly hindered by the absence of readily available data on large numbers of transcriptional cis-regulatory modules (CRMs). Even the richly annotated Drosophila melanogaster genome lacks extensive CRM information. We therefore present here a database of Drosophila CRMs curated from the literature complete with both DNA sequence and a searchable description of the gene expression pattern regulated by each CRM. This resource should greatly facilitate the development of computational approaches to CRM discovery as well as bioinformatics analyses of regulatory sequence properties and evolution.
Affiliation(s)
- Steven M Gallo
- Center for Computational Research, 140 Farber Hall, State University of New York at Buffalo, 3435 Main Street, Buffalo, NY 14214, USA
39
Vavouri T, Elgar G. Prediction of cis-regulatory elements using binding site matrices--the successes, the failures and the reasons for both. Curr Opin Genet Dev 2005; 15:395-402. [PMID: 15950456 DOI: 10.1016/j.gde.2005.05.002] [Received: 03/07/2005] [Accepted: 05/23/2005]
Abstract
Protein-DNA interactions control many aspects of animal development and cellular responses to the environment. Although profiling of individual transcription factor binding sites is not a reliable guide for predicting the position of cis-regulatory elements in large genomes, modelling the evolution and the organization of regulatory elements has provided enough information to make some successful predictions. For vertebrate genomes, the field is limited by the lack of sufficient experimental data upon which to build reliable models. Nonetheless, a combination of experimental, computational and comparative data is likely to reveal aspects of complex regulatory networks in vertebrates, just as it has already done for simple eukaryotic genomes.
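The point that scanning for individual binding sites is an unreliable guide in large genomes can be made concrete with simple arithmetic. Assuming an idealized genome of independent, uniformly distributed bases (real genomes are not), the expected number of chance exact matches to one fixed k-mer is 2(G - k + 1)/4^k when both strands are counted; for an 8-mer in a 3 Gb genome that is over 90,000 matches. Degenerate motifs and permissive matrix thresholds only raise this number. The helper below is a hypothetical illustration of that estimate.

```python
def expected_random_matches(k, genome_length, both_strands=True):
    """Expected chance occurrences of one fixed k-mer in i.i.d. uniform sequence.

    Assumes each base is A, C, G or T with probability 1/4, independently.
    Counting both strands doubles the estimate (a palindromic k-mer would be
    counted twice at the same position).
    """
    positions = genome_length - k + 1
    strands = 2 if both_strands else 1
    return strands * positions * 0.25 ** k


# Hypothetical example: one exact 8-mer in a 3 Gb genome.
estimate = expected_random_matches(8, 3_000_000_000)
```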
Affiliation(s)
- Tanya Vavouri
- Comparative Genomics Group, MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
40
Doniger SW, Huh J, Fay JC. Identification of functional transcription factor binding sites using closely related Saccharomyces species. Genome Res 2005; 15:701-9. [PMID: 15837806 PMCID: PMC1088298 DOI: 10.1101/gr.3578205]
Abstract
Comparative genomics provides a rapid means of identifying functional DNA elements by their sequence conservation between species. Transcription factor binding sites (TFBSs) may constitute a significant fraction of these conserved sequences, but the annotation of specific TFBSs is complicated by the fact that these short, degenerate sequences may frequently be conserved by chance rather than functional constraint. To identify intergenic sequences that function as TFBSs, we calculated the probability of binding site conservation between Saccharomyces cerevisiae and its two closest relatives under a neutral model of evolution. We found that this probability is <5% for 134 of 163 transcription factor binding motifs, implying that we can reliably annotate binding sites for the majority of these transcription factors by conservation alone. Although our annotation relies on a number of assumptions, mutations in five of five conserved Ume6 binding sites and three of four conserved Ndt80 binding sites show Ume6- and Ndt80-dependent effects on gene expression. We also found that three of five unconserved Ndt80 binding sites show Ndt80-dependent effects on gene expression. Together these data imply that although sequence conservation can be reliably used to predict functional TFBSs, unconserved sequences might also make a significant contribution to a species' biology.
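The idea of asking how likely a binding site is to be conserved by chance under a neutral model can be illustrated with a simplified calculation. The sketch below assumes the Jukes-Cantor (JC69) substitution model, a star phylogeny with one branch per species (including the reference), and independence between positions, and computes the probability that a w-bp neutral site is identical in all species. It is a stand-in for, not a reproduction of, the authors' calculation: requiring exact identity rather than a continued motif match underestimates chance conservation for degenerate motifs. Function names and branch lengths are hypothetical.

```python
import math


def jc_same(t):
    """JC69: probability a base ends in its starting state after branch length t."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)


def jc_to_specific_other(t):
    """JC69: probability a base ends as one particular other base."""
    return 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)


def prob_identical_by_chance(width, branch_lengths):
    """Probability that a width-bp neutral site is identical in every species.

    Star tree: each species hangs off one common ancestor by its own branch.
    Per position, sum over outcomes: every species ends with the ancestral
    base, or every species ends with the same one of the 3 other bases.
    Positions are assumed to evolve independently.
    """
    keep_all = math.prod(jc_same(t) for t in branch_lengths)
    same_change = math.prod(jc_to_specific_other(t) for t in branch_lengths)
    per_base = keep_all + 3 * same_change
    return per_base ** width


# Hypothetical branch lengths (substitutions per site) for three species.
p_chance = prob_identical_by_chance(8, [0.1, 0.1, 0.1])
```

For two species the per-base term reduces to jc_same(t1 + t2), the familiar pairwise JC69 identity, which is a useful sanity check on the star-tree sum.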
Affiliation(s)
- Scott W Doniger
- Computational Biology Program, Washington University School of Medicine, St. Louis, MO 63110, USA
41
Papatsenko D, Levine M. Quantitative analysis of binding motifs mediating diverse spatial readouts of the Dorsal gradient in the Drosophila embryo. Proc Natl Acad Sci U S A 2005; 102:4966-71. [PMID: 15795372 PMCID: PMC555988 DOI: 10.1073/pnas.0409414102] [Received: 12/16/2004]
Abstract
Dorsal is a sequence-specific transcription factor that is distributed in a broad nuclear gradient across the dorsal-ventral (DV) axis of the early Drosophila embryo. It initiates gastrulation by regulating at least 30-50 target genes in a concentration-dependent fashion. Previous studies identified 18 enhancers that are directly regulated by different concentrations of Dorsal. Here, we employ computational methods to determine the basis for these distinct transcriptional outputs. Orthologous enhancers were identified in a variety of divergent Drosophila species, and their comparison revealed several conserved sequence features responsible for DV patterning. In particular, the quality of Dorsal and Twist recognition sequences correlates with the DV coordinates of gene expression relative to the Dorsal gradient. These findings are entirely consistent with a gradient threshold model for DV patterning, whereby the quality of individual Dorsal binding sites determines in vivo occupancy of target enhancers by the Dorsal gradient. Linked Dorsal and Twist binding sites constitute a conserved composite element in certain "type 2" Dorsal target enhancers, which direct gene expression in ventral regions of the neurogenic ectoderm in response to intermediate levels of the Dorsal gradient. Similar motif arrangements were identified in orthologous loci in the distant mosquito genome, Anopheles gambiae. We discuss how Dorsal and Twist work either additively or synergistically to activate different target enhancers.
Affiliation(s)
- Dmitri Papatsenko
- Department of Molecular and Cell Biology, Division of Genetics, Genomics, and Development, Center for Integrative Genomics, University of California, 16 Barker Hall No. 3204, Berkeley, CA 94720-3204, USA.
42
Eddy SR. A model of the statistical power of comparative genome sequence analysis. PLoS Biol 2005; 3:e10. [PMID: 15660152 PMCID: PMC539325 DOI: 10.1371/journal.pbio.0030010] [Received: 06/09/2004] [Accepted: 11/02/2004]
Abstract
Comparative genome sequence analysis is powerful, but sequencing genomes is expensive. It is desirable to be able to predict how many genomes are needed for comparative genomics, and at what evolutionary distances. Here I describe a simple mathematical model for the common problem of identifying conserved sequences. The model leads to some useful rules of thumb. For a given evolutionary distance, the number of comparative genomes needed for a constant level of statistical stringency in identifying conserved regions scales inversely with the size of the conserved feature to be detected. At short evolutionary distances, the number of comparative genomes required also scales inversely with distance. These scaling behaviors provide some intuition for future comparative genome sequencing needs, such as the proposed use of “phylogenetic shadowing” methods using closely related comparative genomes, and the feasibility of high-resolution detection of small conserved features. The mathematical model presented in this work will help to inform comparative genomics strategies for identifying conserved DNA sequences.
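The two scaling rules in this abstract can be reproduced with a deliberately simplified calculation; the simplification is an assumption of this sketch, not Eddy's full model. Treat each of N comparative genomes as an independent comparison with the reference at the same Jukes-Cantor distance d, so a neutral base is identical with probability p = 1/4 + (3/4)e^(-4d/3), and ask how many genomes are needed before a neutral region of length L has probability at most ε of appearing perfectly conserved. From p^(NL) ≤ ε we get N ≥ ln ε / (L ln p), which scales as 1/L; and because 1 - p ≈ d for small d, ln p ≈ -d, so N also scales roughly as 1/d at short distances. The default threshold ε = 1e-3 and the function names are hypothetical.

```python
import math


def neutral_identity(distance):
    """JC69 probability that a neutral base is identical at distance d (subs/site)."""
    return 0.25 + 0.75 * math.exp(-4.0 * distance / 3.0)


def genomes_needed_raw(feature_length, distance, epsilon=1e-3):
    """Real-valued N at which neutral_identity(d) ** (N * L) equals epsilon."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return math.log(epsilon) / (feature_length * math.log(neutral_identity(distance)))


def genomes_needed(feature_length, distance, epsilon=1e-3):
    """Whole number of comparative genomes under the same simplified model."""
    return math.ceil(genomes_needed_raw(feature_length, distance, epsilon))


# Doubling the feature length halves the (real-valued) genome count.
ratio_length = genomes_needed_raw(20, 0.1) / genomes_needed_raw(10, 0.1)
# Halving a short distance roughly doubles it.
ratio_distance = genomes_needed_raw(10, 0.01) / genomes_needed_raw(10, 0.02)
```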
Affiliation(s)
- Sean R Eddy
- Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri, United States of America.
43
Costas J, Pereira PS, Vieira CP, Pinho S, Vieira J, Casares F. Dynamics and function of intron sequences of the wingless gene during the evolution of the Drosophila genus. Evol Dev 2004; 6:325-35. [PMID: 15330865 DOI: 10.1111/j.1525-142x.2004.04040.x]
Abstract
To understand the function and evolution of genes with complex patterns of expression, such as the Drosophila wingless gene, it is essential to know how their transcription is regulated. However, extracting the relevant regulatory information from a genome is still a complex task. We used a combination of comparative genomics and functional approaches to identify putative regulatory sequences in two introns (1 and 3) of the wingless gene and to infer their evolution. Comparison of the sequences obtained from several Drosophila species revealed colinear and well-conserved sequence blocks in both introns. Drosophila willistoni showed a rate of evolution, in both introns, faster than expected from its phylogenetic position. Intron 3 appeared to be composed of two separate modules, one of them lost in the willistoni group. We tested whether sequence conservation in noncoding regions is a reliable indicator of regulatory function and, if this function is conserved, by analyzing D. melanogaster transgenic reporter lines harboring intron 3 sequences from D. melanogaster (Sophophora subgenus) and the species from the Drosophila subgenus presenting the most divergent sequence, D. americana. The analysis indicated that intron 3 contains pupal enhancers conserved during the evolution of the genus, despite the fact that only 30% of the D. melanogaster intron 3 sequences lie in conserved blocks. Additional analysis of D. melanogaster transgenic reporter lines harboring intron 3 sequences from D. willistoni revealed the absence of an abdomen-specific expression pattern, probably due to the above-mentioned loss of a regulatory module in this species.
Affiliation(s)
- J Costas
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Rua do Campo Alegre 823, Porto 4150-180, Portugal
44
Bergman CM, Carlson JW, Celniker SE. Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics 2004; 21:1747-9. [PMID: 15572468 DOI: 10.1093/bioinformatics/bti173]
Abstract
UNLABELLED Despite increasing numbers of computational tools developed to predict cis-regulatory sequences, the availability of high-quality datasets of transcription factor binding sites limits advances in the bioinformatics of gene regulation. Here we present such a dataset based on a systematic literature curation and genome annotation of DNase I footprints for the fruitfly, Drosophila melanogaster. Using the experimental results of 201 primary references, we annotated 1367 binding sites from 87 transcription factors and 101 target genes in the D.melanogaster genome sequence. These data will provide a rich resource for future bioinformatics analyses of transcriptional regulation in Drosophila such as constructing motif models, training cis-regulatory module detectors, benchmarking alignment tools and continued text mining of the extensive literature on transcriptional regulation in this important model organism. AVAILABILITY http://www.flyreg.org/ CONTACT cbergman@gen.cam.ac.uk.
Affiliation(s)
- Casey M Bergman
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK.
45
Adaptive evolution of transcription factor binding sites. BMC Evol Biol 2004; 4:42. [PMID: 15511291 PMCID: PMC535555 DOI: 10.1186/1471-2148-4-42] [Received: 07/20/2004] [Accepted: 10/28/2004]
Abstract
Background: The regulation of a gene depends on the binding of transcription factors to specific sites located in the regulatory region of the gene. The generation of these binding sites and of cooperativity between them are essential building blocks in the evolution of complex regulatory networks. We study a theoretical model for the sequence evolution of binding sites by point mutations. The approach is based on biophysical models for the binding of transcription factors to DNA. Hence we derive empirically grounded fitness landscapes, which enter a population genetics model including mutations, genetic drift, and selection.
Results: We show that the selection for factor binding generically leads to specific correlations between nucleotide frequencies at different positions of a binding site. We demonstrate the possibility of rapid adaptive evolution generating a new binding site for a given transcription factor by point mutations. The evolutionary time required is estimated in terms of the neutral (background) mutation rate, the selection coefficient, and the effective population size.
Conclusions: The efficiency of binding site formation is seen to depend on two joint conditions: the binding site motif must be short enough and the promoter region must be long enough. These constraints on promoter architecture are indeed seen in eukaryotic systems. Furthermore, we analyse the adaptive evolution of genetic switches and of signal integration through binding cooperativity between different sites. Experimental tests of this picture involving the statistics of polymorphisms and phylogenies of sites are discussed.
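The abstract estimates evolutionary time from the neutral mutation rate, the selection coefficient, and the effective population size. A minimal sketch of that kind of estimate, assuming Kimura's diffusion approximation for a new mutant in a diploid Wright-Fisher population whose census size equals its effective size n; this is a standard textbook result and not necessarily the exact model used in the paper. The function names and parameter values are hypothetical.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation for the fixation probability of a new mutant.

    Assumes a diploid population whose census and effective sizes both equal n,
    so the mutant starts at frequency 1 / (2n); s is the selection coefficient.
    Very strongly deleterious mutations (large negative 4ns) can overflow.
    """
    if s == 0:
        return 1.0 / (2 * n)
    # expm1 keeps 1 - exp(-x) accurate for small x; the two sign flips cancel.
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)


def substitution_rate(mu, s, n):
    """Substitutions per generation at one site: 2n*mu new mutants, each fixing with u(s)."""
    return 2 * n * mu * fixation_probability(s, n)


def expected_waiting_time(mu, s, n):
    """Mean number of generations until a given point mutation arises and fixes."""
    return 1.0 / substitution_rate(mu, s, n)


mu, n = 1e-9, 10_000  # hypothetical per-site mutation rate and population size
for s in (0.0, 1e-5, 1e-3):
    print(s, expected_waiting_time(mu, s, n))
```

With s = 0 the substitution rate reduces to the neutral mutation rate mu, a convenient sanity check; when 4ns is much larger than 1 and s is small, it approaches roughly 4ns times mu, so selected binding-site mutations fix much faster than neutral ones.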
46
Berg J, Lässig M. Local graph alignment and motif search in biological networks. Proc Natl Acad Sci U S A 2004; 101:14689-94. [PMID: 15448202 PMCID: PMC522014 DOI: 10.1073/pnas.0305199101] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022]
Abstract
Interaction networks are of central importance in postgenomic molecular biology, with increasing amounts of data becoming available by high-throughput methods. Examples are gene regulatory networks or protein interaction maps. The main challenge in the analysis of these data is to read off biological functions from the topology of the network. Topological motifs, i.e., patterns occurring repeatedly at different positions in the network, have recently been identified as basic modules of molecular information processing. In this article, we discuss motifs derived from families of mutually similar but not necessarily identical patterns. We establish a statistical model for the occurrence of such motifs, from which we derive a scoring function for their statistical significance. Based on this scoring function, we develop a search algorithm for topological motifs called graph alignment, a procedure with some analogies to sequence alignment. The algorithm is applied to the gene regulation network of Escherichia coli.
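Graph alignment as described in the abstract scores families of similar but not identical subgraphs, which is beyond a short example. As a much simpler, swapped-in illustration of what a topological motif is, the sketch below counts exact feed-forward loops (X regulates Y, Y regulates Z, and X also regulates Z) in a directed edge list. It is not the paper's algorithm, and the function name and toy edges are hypothetical.

```python
def count_feed_forward_loops(edges):
    """Count feed-forward loops X->Y, Y->Z, X->Z in a directed regulatory graph.

    edges is an iterable of (regulator, target) pairs; self-loops are ignored
    and X, Y, Z must be three distinct nodes.
    """
    targets = {}
    for regulator, target in edges:
        if regulator != target:
            targets.setdefault(regulator, set()).add(target)
    count = 0
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                # z != y holds because self-loops were dropped; z must also differ from x.
                if z != x and z in x_targets:
                    count += 1
    return count


# Hypothetical toy network: one feed-forward loop plus an extra edge.
toy_edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
print(count_feed_forward_loops(toy_edges))  # prints 1
```

Assessing whether such a count is significant would additionally require comparing it with randomized networks; graph alignment goes further by grouping near-identical patterns into a single motif.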
Affiliation(s)
- Johannes Berg
- Institut für Theoretische Physik, Universität zu Köln, Zülpicherstrasse 77, 50937 Cologne, Germany.
47
Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB. Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics 2004; 5:6. [PMID: 14736341 PMCID: PMC344529 DOI: 10.1186/1471-2105-5-6] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.1] [Received: 11/05/2003] [Accepted: 01/21/2004] [Indexed: 11/10/2022]
Abstract
Background: Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences typically have been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to generate "correct" alignments with which to benchmark alignment tools.
Results: Using rates of noncoding sequence evolution estimated from the genus Drosophila, we simulated alignments over a range of divergence times under varying models incorporating point substitution, insertion/deletion events, and short blocks of constrained sequences such as those found in cis-regulatory regions. We then compared "correct" alignments generated by a modified version of the ROSE simulation platform to alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity to detect constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and specificity for divergence distances in the range of 1.25-3.0 substitutions per site.
Conclusion: For species with genomic properties similar to Drosophila, we conclude that a single pair of optimally diverged species analyzed with a high performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.
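The abstract evaluates tools by comparing their output with simulated "correct" alignments. A minimal sketch, assuming accuracy is scored on aligned residue pairs: sensitivity as the fraction of reference pairs that the predicted alignment recovers and, as a stand-in for specificity, the fraction of predicted pairs that appear in the reference. The paper's exact definitions may differ, and the function names and example alignments are hypothetical.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue-index pairs aligned in a pairwise alignment.

    Rows are equal-length strings using '-' for gaps; i and j count ungapped residues.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs, i, j = set(), 0, 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def alignment_accuracy(reference, predicted):
    """Return (sensitivity, precision) of a predicted alignment against a reference.

    Each alignment is a (row_a, row_b) tuple of gapped strings over the same two sequences.
    """
    ref_pairs = aligned_pairs(*reference)
    pred_pairs = aligned_pairs(*predicted)
    shared = len(ref_pairs & pred_pairs)
    sensitivity = shared / len(ref_pairs) if ref_pairs else 0.0
    precision = shared / len(pred_pairs) if pred_pairs else 0.0
    return sensitivity, precision


reference = ("ACGT", "ACGT")   # hypothetical "correct" alignment
predicted = ("ACGT-", "AC-GT")  # hypothetical tool output for the same two sequences
print(alignment_accuracy(reference, predicted))  # sensitivity 0.5, precision 2/3
```

Restricting both pair sets to columns inside constrained blocks would give the per-block measures the abstract contrasts with whole-sequence accuracy.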
Affiliation(s)
- Daniel A Pollard
- Biophysics Graduate Group, University of California, Berkeley, CA 94720, USA
- Casey M Bergman
- Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Genetics, University of Cambridge, Cambridge, UK CB2 3EH
- Jens Stoye
- Technische Fakultät, Universität Bielefeld, 33594 Bielefeld, Germany
- Susan E Celniker
- Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Michael B Eisen
- Department of Genome Science, Life Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
48
Costas J, Vieira CP, Casares F, Vieira J. Genomic characterization of a repetitive motif strongly associated with developmental genes in Drosophila. BMC Genomics 2003; 4:52. [PMID: 14675495 PMCID: PMC327093 DOI: 10.1186/1471-2164-4-52] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 09/10/2003] [Accepted: 12/16/2003] [Indexed: 12/03/2022]
Abstract
Background: Non-coding DNA represents a high proportion of all metazoan genomes. Although an undetermined fraction of this DNA may be considered devoid of any function, it also contains important information residing in specific cis-regulatory sequences.
Results: We report a 27 bp motif that is overrepresented within the fly genome. This motif does not show any significant similarity with transposon sequences and is strongly associated with genes involved in development and/or signal transduction. The 27 bp motif is preferentially located within introns, and has a tendency to be present in multiple copies around genes. Furthermore, it is often found embedded in known non-coding regulatory regions. The regulatory network defined by this motif is partially shared in D. pseudoobscura.
Conclusion: We have identified a 27 bp cis-regulatory sequence widely distributed within the Drosophila genome in association with developmental genes. This motif may be very useful for the annotation of functional regulatory regions within the Drosophila genome and for the construction of regulatory networks of Drosophila development.
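The abstract reports that a 27 bp motif is overrepresented in the fly genome but does not state the statistic used. A minimal sketch of one common approach compares observed and expected counts of a motif on both strands, assuming an i.i.d. background built from the sequence's own base frequencies. The motif and sequence in the usage lines are hypothetical and are not the 27 bp motif from the paper; the function names are also hypothetical.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_sites(seq, motif):
    """Count start positions where the motif matches on either strand (overlaps allowed)."""
    seq, motif = seq.upper(), motif.upper()
    words = {motif, reverse_complement(motif)}  # a set, so palindromes count once
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in words)


def expected_sites(seq, motif):
    """Expected count under an i.i.d. background using the sequence's own base frequencies."""
    seq, motif = seq.upper(), motif.upper()
    if not seq:
        return 0.0
    freqs = Counter(seq)
    n = len(seq)

    def word_probability(word):
        return math.prod(freqs[base] / n for base in word)

    # Two different words of equal length cannot both match at one position,
    # so their probabilities add; a palindromic motif is counted once.
    words = {motif, reverse_complement(motif)}
    p = sum(word_probability(word) for word in words)
    return max(n - len(motif) + 1, 0) * p


seq = "ACGTTTACG"  # hypothetical sequence
observed, expected = count_sites(seq, "ACG"), expected_sites(seq, "ACG")
print(observed, expected, observed / expected)
```

An i.i.d. background is the simplest choice; higher-order Markov backgrounds or shuffled genomic sequence usually give more conservative enrichment estimates for long motifs.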
Affiliation(s)
- Javier Costas
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Rua do Campo Alegre 823, 4150 Porto, Portugal
- Present address: Unidade de Medicina Molecular, Complexo Hospitalario Universitario de Santiago, rúa Choupana s/n, Edf. Consultas, planta -2, E15706 Santiago de Compostela, Spain
- Cristina P Vieira
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Rua do Campo Alegre 823, 4150 Porto, Portugal
- Fernando Casares
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Rua do Campo Alegre 823, 4150 Porto, Portugal
- Jorge Vieira
- Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Rua do Campo Alegre 823, 4150 Porto, Portugal