51
Ramialison M, Reinhardt R, Henrich T, Wittbrodt B, Kellner T, Lowy CM, Wittbrodt J. Cis-regulatory properties of medaka synexpression groups. Development 2012; 139:917-28. [PMID: 22318626 DOI: 10.1242/dev.071803] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 01/18/2023]
Abstract
During embryogenesis, tissue specification is triggered by the expression of a unique combination of developmental genes and their expression in time and space is crucial for successful development. Synexpression groups are batteries of spatiotemporally co-expressed genes that act in shared biological processes through their coordinated expression. Although several synexpression groups have been described in numerous vertebrate species, the regulatory mechanisms that orchestrate their common complex expression pattern remain to be elucidated. Here we performed a pilot screen on 560 genes of the vertebrate model system medaka (Oryzias latipes) to systematically identify synexpression groups and investigate their regulatory properties by searching for common regulatory cues. We find that synexpression groups share DNA motifs that are arranged in various combinations into cis-regulatory modules that drive co-expression. In contrast to previous assumptions that these genes are located randomly in the genome, we discovered that genes belonging to the same synexpression group frequently occur in synexpression clusters in the genome. This work presents a first repertoire of synexpression group common signatures, a resource that will contribute to deciphering developmental gene regulatory networks.
Affiliation(s)
- Mirana Ramialison
- University of Heidelberg, Centre for Organismal Studies, Heidelberg, Germany.
52
Clustering of DNA words and biological function: A proof of principle. J Theor Biol 2012; 297:127-36. [DOI: 10.1016/j.jtbi.2011.12.024] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 10/17/2011] [Revised: 12/20/2011] [Accepted: 12/21/2011] [Indexed: 02/08/2023]
53
Junion G, Spivakov M, Girardot C, Braun M, Gustafson E, Birney E, Furlong E. A Transcription Factor Collective Defines Cardiac Cell Fate and Reflects Lineage History. Cell 2012; 148:473-86. [DOI: 10.1016/j.cell.2012.01.030] [Citation(s) in RCA: 222] [Impact Index Per Article: 17.1] [Received: 08/17/2010] [Revised: 08/16/2011] [Accepted: 01/17/2012] [Indexed: 11/28/2022]
54
Abstract
The study of cis-regulatory DNAs that control developmental gene expression is integral to the modeling of comprehensive genomic regulatory networks for embryogenesis. Ascidian embryos provide a unique opportunity for the analysis of cis-regulatory DNAs with cellular resolution in the context of a simple but typical chordate body plan. Here, we review landmark studies that have laid the foundations for the study of transcriptional enhancers, among other cis-regulatory DNAs, and their roles in ascidian development. The studies using ascidians of the Ciona genus have capitalized on a unique electroporation technique that permits the simultaneous transfection of hundreds of fertilized eggs, which develop rapidly and express transgenes with little mosaicism. Current studies using the ascidian embryo benefit from extensively annotated genomic resources to characterize transcript models in silico. The search for functional noncoding sequences can be guided by bioinformatic analyses combining evolutionary conservation, gene coexpression, and combinations of overrepresented short-sequence motifs. The power of the transient transfection assays has allowed thorough dissection of numerous cis-regulatory modules, which provided insights into the functional constraints that shape enhancer architecture and diversification. Future studies will benefit from pioneering stable transgenic lines and the analysis of chromatin states. Whole genome expression, functional and DNA binding data are being integrated into comprehensive genomic regulatory network models of early ascidian cell specification with a single-cell resolution that is unique among chordate model systems.
55
Rebeiz M, Castro B, Liu F, Yue F, Posakony JW. Ancestral and conserved cis-regulatory architectures in developmental control genes. Dev Biol 2011; 362:282-94. [PMID: 22185795 DOI: 10.1016/j.ydbio.2011.12.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 07/15/2011] [Revised: 12/01/2011] [Accepted: 12/06/2011] [Indexed: 11/19/2022]
Abstract
Among developmental control genes, transcription factor-target gene "linkages"--the direct connections between target genes and the factors that control their patterns of expression--can show remarkable evolutionary stability. However, the specific binding sites that mediate and define these regulatory connections are themselves often subject to rapid turnover. Here we describe several instances in which particular transcription factor binding motif combinations have evidently been conserved upstream of orthologous target genes for extraordinarily long evolutionary periods. This occurs against a backdrop in which other binding sites for the same factors are coming and going rapidly. Our examples include a particular Dpp Silencer Element upstream of insect brinker genes, in combination with a novel motif we refer to as the Downstream Element; combinations of a Suppressor of Hairless Paired Site (SPS) and a specific proneural protein binding site associated with arthropod Notch pathway target genes; and a three-motif combination, also including an SPS, upstream of deuterostome Hes repressor genes, which are also Notch targets. We propose that these stable motif architectures have been conserved intact from a deep ancestor, in part because they mediate a special mode of regulation that cannot be supplied by the other, unstable motif instances.
Affiliation(s)
- Mark Rebeiz
- Division of Biological Sciences/CDB, University of California San Diego, La Jolla, CA 92093, USA
56
He X, Duque TSPC, Sinha S. Evolutionary origins of transcription factor binding site clusters. Mol Biol Evol 2011; 29:1059-70. [PMID: 22075113 DOI: 10.1093/molbev/msr277] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 12/12/2022]
Abstract
Empirical studies have revealed that regulatory DNA sequences such as enhancers or promoters often harbor multiple binding sites for the same transcription factor. Such "homotypic site clustering" has been hypothesized as arising out of functional requirements of the sequences. Here, we propose an alternative explanation of this phenomenon that multisite enhancers are common because they are favored by evolutionary sampling of the genotype-phenotype landscape. To test this hypothesis, we developed a new computational framework specialized for population genetic simulations of enhancer evolution. It uses a thermodynamics-based model of enhancer function, integrating information from strong as well as weak binding sites, to determine the strength of selection. Using this framework, we found that even when simpler genotypes exist for a desired strength of regulation, relatively complex genotypes (enhancers with more sites) are more readily reached by the simulated evolutionary process. We show that there are more ways to "build" a fit genotype with many weak sites than with a few strong sites, and this is why evolution finds complex genotypes more often. Our claims are consistent with an empirical analysis of binding site content in enhancers characterized in Drosophila melanogaster and their orthologs in other Drosophila species. We also characterized a subtle but significant difference between genotypes likely to be sampled by evolution and equally fit genotypes one would obtain by uniform sampling of the fitness landscape, that is, an "evolutionary signature" in enhancer sequences. Finally, we investigated potential effects of other factors, such as rugged fitness landscapes, short local duplications, and noise characteristics of enhancers, on the emergence of homotypic site clustering. Homotypic site clustering is an important contributor to the complexity and function of cis-regulatory sequences. 
This work provides a simple null hypothesis for its origin, against which alternative adaptationist explanations may be evaluated, and cautions against "evolutionary mirages" present in common features of genomic sequence. The quantitative framework we develop here can be used more generally to understand how mechanisms of enhancer action influence their composition and evolution.
Affiliation(s)
- Xin He
- Department of Biochemistry, University of California at San Francisco, CA, USA
57
Eichenlaub MP, Ettwiller L. De novo genesis of enhancers in vertebrates. PLoS Biol 2011; 9:e1001188. [PMID: 22069375 PMCID: PMC3206014 DOI: 10.1371/journal.pbio.1001188] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 02/10/2011] [Accepted: 09/22/2011] [Indexed: 02/02/2023]
Abstract
Whole genome duplication in teleost fish reveals that a few changes in non-regulatory genomic sequences are a source for generating new enhancers. Evolutionary innovation relies partially on changes in gene regulation. While a growing body of evidence demonstrates that such innovation is generated by functional changes or translocation of regulatory elements via mobile genetic elements, the de novo generation of enhancers from non-regulatory/non-mobile sequences has, to our knowledge, not previously been demonstrated. Here we show evidence for the de novo genesis of enhancers in vertebrates. For this, we took advantage of the massive gene loss following the last whole genome duplication in teleosts to systematically identify regions that have lost their coding capacity but retain sequence conservation with mammals. We found that these regions show enhancer activity while the orthologous coding regions have no regulatory activity. These results demonstrate that these enhancers have been de novo generated in fish. By revealing that minor changes in non-regulatory sequences are sufficient to generate new enhancers, our study highlights an important playground for creating new regulatory variability and evolutionary innovation. The genome of each living organism contains thousands of genes, and the precise control of the timing and location of expression of these genes is key for normal development and homeostasis of each individual. Despite the oftentimes high genetic similarity between organisms, the source of phenotypic differences, for example between human and mouse, is thought to originate mainly from changes in how and when genes are expressed. This is partially determined by enhancers, that contribute to the control of gene expression. For decades, duplication of existing genomic enhancers, mobile elements, and changes in the sequence of existing enhancers were believed to be the major ways of increasing the number and modifying the activity of enhancers. 
In this study, we show that enhancers don't have to be derived from pre-existing ones but can also appear de novo in regions of the genome that were previously not regulating gene expression. We analyzed teleost fish genomes and found three regions for which a limited number of changes in the DNA sequence was sufficient to generate new enhancers. We predict that such a process is frequent in vertebrate genomes, making de novo generation of enhancers an important mechanism for creating variation in gene expression.
Affiliation(s)
- Laurence Ettwiller
- Centre for Organismal Studies, University of Heidelberg, Heidelberg, Germany
58
Liu F, Chang XJ, Ye Y, Xie WB, Wu P, Lian XM. Comprehensive sequence and whole-life-cycle expression profile analysis of the phosphate transporter gene family in rice. Mol Plant 2011; 4:1105-22. [PMID: 21832284 DOI: 10.1093/mp/ssr058] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Indexed: 05/20/2023]
Abstract
Plant phosphate transporter (PT) genes comprise a large family with important roles in various physiological and biochemical processes. In this study, a database search yielded 26 potential PT family genes in rice (Oryza sativa). Analysis of these genes led to identification of eight conserved motifs and 5-12 trans-membrane segments, most of them conserved. A total of 237 putative cis elements were found in the 2-kb upstream region of these genes. Of these, a majority were Pi-response and other stress-related cis regulatory elements, such as PHO-like, TATA-box-like, PHR1, or Helix-loop-helix elements, and WRKY1 and ABRE elements, suggesting gene regulation by these signals. Comprehensive expression analysis of these genes was performed using data from microarrays hybridized with RNA from 27 tissues covering the entire lifecycle from three rice genotypes: Minghui 63, Zhenshan 97, and Shanyou 63. Real-time PCR analysis confirmed that three rice PT genes are preferentially expressed in stamen at 1 d before flowering, two in panicle at the heading stage, and two in flag leaf at 14 d after the heading stage. Hormone-treatment experiments revealed differential up-regulation or down-regulation of 11 rice PT genes in seedlings exposed to five hormones, respectively. These results will be useful for elucidating the roles of these genes in the growth, development, and stress response of the rice plant.
Affiliation(s)
- Fang Liu
- National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan 430070, China
59
Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 2011; 12:628-40. [PMID: 21850043 DOI: 10.1038/nrg3046] [Citation(s) in RCA: 404] [Impact Index Per Article: 28.9] [Indexed: 02/07/2023]
Abstract
Genome and exome sequencing yield extensive catalogues of human genetic variation. However, pinpointing the few phenotypically causal variants among the many variants present in human genomes remains a major challenge, particularly for rare and complex traits wherein genetic information alone is often insufficient. Here, we review approaches to estimate the deleteriousness of single nucleotide variants (SNVs), which can be used to prioritize disease-causal variants. We describe recent advances in comparative and functional genomics that enable systematic annotation of both coding and non-coding variants. Application and optimization of these methods will be essential to find the genetic answers that sequencing promises to hide in plain sight.
60
Barrière A, Gordon KL, Ruvinsky I. Distinct functional constraints partition sequence conservation in a cis-regulatory element. PLoS Genet 2011; 7:e1002095. [PMID: 21655084 PMCID: PMC3107193 DOI: 10.1371/journal.pgen.1002095] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 10/25/2010] [Accepted: 04/07/2011] [Indexed: 11/25/2022]
Abstract
Different functional constraints contribute to different evolutionary rates across genomes. To understand why some sequences evolve faster than others in a single cis-regulatory locus, we investigated function and evolutionary dynamics of the promoter of the Caenorhabditis elegans unc-47 gene. We found that this promoter consists of two distinct domains. The proximal promoter is conserved and is largely sufficient to direct appropriate spatial expression. The distal promoter displays little if any conservation between several closely related nematodes. Despite this divergence, sequences from all species confer robustness of expression, arguing that this function does not require substantial sequence conservation. We showed that even unrelated sequences have the ability to promote robust expression. A prominent feature shared by all of these robustness-promoting sequences is an AT-enriched nucleotide composition consistent with nucleosome depletion. Because general sequence composition can be maintained despite sequence turnover, our results explain how different functional constraints can lead to vastly disparate rates of sequence divergence within a promoter. Comparison between genome sequences of different species is a powerful tool in modern biology because important features are maintained by natural selection and are therefore conserved. However, some important sequences within genomes evolve considerably faster than others. One possible explanation is that they encode little or no function. Alternatively, they may evolve under different constraints that permit sequence turnover while maintaining function. Here we report that the promoter of the unc-47 gene of C. elegans contains two discrete elements. One has a highly conserved sequence that determines the spatial expression pattern. Another shows no sequence conservation, but it makes expression of the gene robust, that is, consistent between individuals and resilient to environmental challenges. 
Remarkably, multiple unrelated sequences are capable of promoting robust expression. Nucleotide composition of these sequences suggests that open chromatin may play a role in conferring robustness of gene expression. Because general sequence composition and therefore expression robustness can be maintained despite sequence turnover, our results offer an explanation of how rapidly diverging promoter elements can nevertheless remain functionally conserved.
Affiliation(s)
- Antoine Barrière
- Department of Ecology and Evolution and Institute for Genomics and Systems Biology, Chicago, Illinois, United States of America
- Kacy L. Gordon
- Department of Organismal Biology and Anatomy, The University of Chicago, Chicago, Illinois, United States of America
- Ilya Ruvinsky
- Department of Ecology and Evolution and Institute for Genomics and Systems Biology, Chicago, Illinois, United States of America
- Department of Organismal Biology and Anatomy, The University of Chicago, Chicago, Illinois, United States of America
61
Abstract
The tunicates, or urochordates, constitute a large group of marine animals whose recent common ancestry with vertebrates is reflected in the tadpole-like larvae of most tunicates. Their diversity and key phylogenetic position are enhanced, from a research viewpoint, by anatomically simple and transparent embryos, compact rapidly evolving genomes, and the availability of powerful experimental and computational tools with which to study these organisms. Tunicates are thus a powerful system for exploring chordate evolution and how extreme variation in genome sequence and gene regulatory network architecture is compatible with the preservation of an ancestral chordate body plan.
Affiliation(s)
- Patrick Lemaire
- Institut de Biologie du Développement de Marseille Luminy (IBDML, UMR 6216, CNRS, Université de la Méditerranée), Parc Scientifique de Luminy Case 907, F-13288, Marseille Cedex 9, France
- Centre de Recherches en Biochimie Macromoléculaire (CRBM, UMR5237, CNRS, Universités Montpellier 1 and 2), 1919 route de Mende, F-34293, Montpellier Cedex 05, France
62
Nguyen TT, Foteinou PT, Calvano SE, Lowry SF, Androulakis IP. Computational identification of transcriptional regulators in human endotoxemia. PLoS One 2011; 6:e18889. [PMID: 21637747 PMCID: PMC3103499 DOI: 10.1371/journal.pone.0018889] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 11/26/2010] [Accepted: 03/23/2011] [Indexed: 12/21/2022]
Abstract
One of the great challenges in the post-genomic era is to decipher the underlying principles governing the dynamics of biological responses. As modulating gene expression levels is among the key regulatory responses of an organism to changes in its environment, identifying biologically relevant transcriptional regulators and their putative regulatory interactions with target genes is an essential step towards studying the complex dynamics of transcriptional regulation. We present an analysis that integrates various computational and biological aspects to explore the transcriptional regulation of systemic inflammatory responses through a human endotoxemia model. Given a high-dimensional transcriptional profiling dataset from human blood leukocytes, an elementary set of temporal dynamic responses which capture the essence of a pro-inflammatory phase, a counter-regulatory response and a dysregulation in leukocyte bioenergetics has been extracted. Upon identification of these expression patterns, fourteen inflammation-specific gene batteries that represent groups of hypothetically ‘coregulated’ genes are proposed. Subsequently, statistically significant cis-regulatory modules (CRMs) are identified and decomposed into a list of critical transcription factors (34) that are validated largely on primary literature. Finally, our analysis further allows for the construction of a dynamic representation of the temporal transcriptional regulatory program across the host, deciphering possible combinatorial interactions among factors under which they might be active. Although much remains to be explored, this study has computationally identified key transcription factors and proposed a putative time-dependent transcriptional regulatory program associated with critical transcriptional inflammatory responses. These results provide a solid foundation for future investigations to elucidate the underlying transcriptional regulatory mechanisms under the host inflammatory response. 
Also, the assumption that coexpressed genes that are functionally relevant are more likely to share some common transcriptional regulatory mechanism seems to be promising, making the proposed framework become essential in unravelling context-specific transcriptional regulatory interactions underlying diverse mammalian biological processes.
Affiliation(s)
- Tung T. Nguyen
- BioMaPS Institute for Quantitative Biology, Rutgers University, Piscataway, New Jersey, United States of America
- Panagiota T. Foteinou
- Department of Biomedical Engineering, Rutgers University, Piscataway, New Jersey, United States of America
- Steven E. Calvano
- Department of Surgery, Robert Wood Johnson Medical School, University of Medicine and Dentistry, New Jersey, New Brunswick, New Jersey, United States of America
- Stephen F. Lowry
- Department of Surgery, Robert Wood Johnson Medical School, University of Medicine and Dentistry, New Jersey, New Brunswick, New Jersey, United States of America
- Ioannis P. Androulakis
- Department of Biomedical Engineering, Rutgers University, Piscataway, New Jersey, United States of America
- Department of Surgery, Robert Wood Johnson Medical School, University of Medicine and Dentistry, New Jersey, New Brunswick, New Jersey, United States of America
63
An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq. PLoS One 2011; 6:e16432. [PMID: 21358819 PMCID: PMC3040171 DOI: 10.1371/journal.pone.0016432] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 08/31/2010] [Accepted: 12/21/2010] [Indexed: 11/19/2022]
Abstract
ChIP-Seq has become the standard method for genome-wide profiling DNA association of transcription factors. To simplify analyzing and interpreting ChIP-Seq data, which typically involves using multiple applications, we describe an integrated, open source, R-based analysis pipeline. The pipeline addresses data input, peak detection, sequence and motif analysis, visualization, and data export, and can readily be extended via other R and Bioconductor packages. Using a standard multicore computer, it can be used with datasets consisting of tens of thousands of enriched regions. We demonstrate its effectiveness on published human ChIP-Seq datasets for FOXA1, ER, CTCF and STAT1, where it detected co-occurring motifs that were consistent with the literature but not detected by other methods. Our pipeline provides the first complete set of Bioconductor tools for sequence and motif analysis of ChIP-Seq and ChIP-chip data.
64
Kugler JE, Gazdoiu S, Oda-Ishii I, Passamaneck YJ, Erives AJ, Di Gregorio A. Temporal regulation of the muscle gene cascade by Macho1 and Tbx6 transcription factors in Ciona intestinalis. J Cell Sci 2010; 123:2453-63. [PMID: 20592183 DOI: 10.1242/jcs.066910] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/20/2022]
Abstract
For over a century, muscle formation in the ascidian embryo has been representative of 'mosaic' development. The molecular basis of muscle-fate predetermination has been partly elucidated with the discovery of Macho1, a maternal zinc-finger transcription factor necessary and sufficient for primary muscle development, and of its transcriptional intermediaries Tbx6b and Tbx6c. However, the molecular mechanisms by which the maternal information is decoded by cis-regulatory modules (CRMs) associated with muscle transcription factor and structural genes, and the ways by which a seamless transition from maternal to zygotic transcription is ensured, are still mostly unclear. By combining misexpression assays with CRM analyses, we have identified the mechanisms through which Ciona Macho1 (Ci-Macho1) initiates expression of Ci-Tbx6b and Ci-Tbx6c, and we have unveiled the cross-regulatory interactions between the latter transcription factors. Knowledge acquired from the analysis of the Ci-Tbx6b CRM facilitated both the identification of a related CRM in the Ci-Tbx6c locus and the characterization of two CRMs associated with the structural muscle gene fibrillar collagen 1 (CiFCol1). We use these representative examples to reconstruct how compact CRMs orchestrate the muscle developmental program from pre-localized ooplasmic determinants to differentiated larval muscle in ascidian embryos.
Affiliation(s)
- Jamie E Kugler
- Department of Cell and Developmental Biology, Weill Medical College of Cornell University, 1300 York Avenue, Box 60, New York, NY 10065, USA
65
Abstract
Development progresses through a sequence of cellular identities which are determined by the activities of networks of transcription factor genes. Alterations in cis-regulatory elements of these genes play a major role in evolutionary change, but little is known about the mechanisms responsible for maintaining conserved patterns of gene expression. We have studied the evolution of cis-regulatory mechanisms controlling the SCL gene, which encodes a key transcriptional regulator of blood, vasculature, and brain development and exhibits conserved function and pattern of expression throughout vertebrate evolution. SCL cis-regulatory elements are conserved between frog and chicken but accrued alterations at an accelerated rate between 310 and 200 million years ago, with subsequent fixation of a new cis-regulatory pattern at the beginning of the mammalian radiation. As a consequence, orthologous elements shared by mammals and lower vertebrates exhibit functional differences and binding site turnover between widely separated cis-regulatory modules. However, the net effect of these alterations is constancy of overall regulatory inputs and of expression pattern. Our data demonstrate remarkable cis-regulatory remodelling across the SCL locus and indicate that stable patterns of expression can mask extensive regulatory change. These insights illuminate our understanding of vertebrate evolution.
66
Rister J, Desplan C. Deciphering the genome's regulatory code: the many languages of DNA. Bioessays 2010; 32:381-4. [PMID: 20394065 DOI: 10.1002/bies.200900197] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]
Abstract
The generation of patterns and the diversity of cell types in a multicellular organism require differential gene regulation. At the heart of this process are enhancers or cis-regulatory modules (CRMs), genomic regions that are bound by transcription factors (TFs) that control spatio-temporal gene expression in developmental networks. To date, only a few CRMs have been studied in detail and the underlying cis-regulatory code is not well understood. Here, we review recent progress on the genome-wide identification of CRMs with chromatin immunoprecipitation of TF-DNA complexes followed by microarrays (ChIP-on-chip). We focus on two computational approaches that have succeeded in predicting the expression pattern driven by a CRM either based on TF binding site preferences and their expression levels, or quantitative analysis of CRM occupancy by key TFs. We also discuss the current limits of these methods and highlight some of the key problems that have to be solved to gain a more complete understanding of the structure and function of CRMs.
Affiliation(s)
- Jens Rister
- Center for Developmental Genetics, Department of Biology, New York University, 1009 Silver Center, New York, NY 10003, USA
67
Landolin JM, Johnson DS, Trinklein ND, Aldred SF, Medina C, Shulha H, Weng Z, Myers RM. Sequence features that drive human promoter function and tissue specificity. Genome Res 2010; 20:890-8. [PMID: 20501695 DOI: 10.1101/gr.100370.109] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Indexed: 01/18/2023]
Abstract
Promoters are important regulatory elements that contain the necessary sequence features for cells to initiate transcription. To functionally characterize a large set of human promoters, we measured the transcriptional activities of 4575 putative promoters across eight cell lines using transient transfection reporter assays. In parallel, we measured gene expression in the same cell lines and observed a significant correlation between promoter activity and endogenous gene expression (r = 0.43). As transient transfection assays directly measure the promoting effect of a defined fragment of DNA sequence, decoupled from epigenetic, chromatin, or long-range regulatory effects, we sought to predict whether a promoter was active using sequence features alone. CG dinucleotide content was highly predictive of ubiquitous promoter activity, necessitating the separation of promoters into two groups: high CG promoters, mostly ubiquitously active, and low CG promoters, mostly cell line-specific. Computational models trained on the binding potential of transcriptional factor (TF) binding motifs could predict promoter activities in both high and low CG groups: average area under the receiver operating characteristic curve (AUC) of the models was 91% and exceeded the AUC of CG content by an average of 23%. Known relationships, for example, between HNF4A and hepatocytes, were recapitulated in the corresponding cell lines, in this case the liver-derived cell line HepG2. Half of the associations between tissue-specific TFs and cell line-specific promoters were new. Our study underscores the importance of collecting functional information from complementary assays and conditions to understand biology in a systematic framework.
Affiliation(s)
- Jane M Landolin
- Division of Life Sciences, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
|
68
|
Khoueiry P, Rothbächer U, Ohtsuka Y, Daian F, Frangulian E, Roure A, Dubchak I, Lemaire P. A cis-regulatory signature in ascidians and flies, independent of transcription factor binding sites. Curr Biol 2010; 20:792-802. [PMID: 20434338 DOI: 10.1016/j.cub.2010.03.063] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 01/14/2010] [Revised: 03/12/2010] [Accepted: 03/23/2010] [Indexed: 12/22/2022]
Abstract
BACKGROUND Transcription initiation is controlled by cis-regulatory modules. Although these modules are usually made of clusters of short transcription factor binding sites, a small minority of such clusters in the genome have cis-regulatory activity. This paradox is currently unsolved. RESULTS To identify what discriminates active from inactive clusters, we focused our attention on short topologically unconstrained clusters of two ETS and two GATA binding sites, similar to the early neural enhancer of Ciona intestinalis Otx. We first computationally identified 55 such clusters, conserved between the two Ciona genomes. In vivo assay of the activity of 19 hits identified three novel early neural enhancers, all located next to genes coexpressed with Otx. Optimization of ETS and GATA binding sites was not always sufficient to confer activity to inactive clusters. Rather, a dinucleotide sequence code associated to nucleosome depletion showed a robust correlation with enhancer potential. Identification of a large collection of Ciona regulatory regions revealed that predicted nucleosome depletion constitutes a general signature of Ciona enhancers, which is conserved between orthologous loci in the two Ciona genomes and which partitions conserved noncoding sequences into a major nucleosome-bound fraction and a minor nucleosome-free fraction with higher cis-regulatory potential. We also found this signature in a large fraction of short Drosophila cis-regulatory modules. CONCLUSION This study indicates that a sequence-based dinucleotide signature, previously associated with nucleosome depletion and independent of transcription factor binding sites, contributes to the definition of a local cis-regulatory potential in two metazoa, Ciona intestinalis and Drosophila melanogaster.
Affiliation(s)
- Pierre Khoueiry
- Institut de Biologie du Développement de Marseille Luminy (IBDML, UMR 6216), CNRS, Université de la Méditerranée, Parc Scientifique de Luminy Case 907, F-13288, Marseille Cedex 9, France.
|
69
|
Kubo A, Suzuki N, Yuan X, Nakai K, Satoh N, Imai KS, Satou Y. Genomic cis-regulatory networks in the early Ciona intestinalis embryo. Development 2010; 137:1613-23. [PMID: 20392745 DOI: 10.1242/dev.046789] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Indexed: 10/19/2022]
Abstract
Precise spatiotemporal gene expression during animal development is achieved through gene regulatory networks, in which sequence-specific transcription factors (TFs) bind to cis-regulatory elements of target genes. Although numerous cis-regulatory elements have been identified in a variety of systems, their global architecture in the gene networks that regulate animal development is not well understood. Here, we determined the structure of the core networks at the cis-regulatory level in early embryos of the chordate Ciona intestinalis by chromatin immunoprecipitation (ChIP) of 11 TFs. The regulatory systems of the 11 TF genes examined were tightly interconnected with one another. By combining analysis of the ChIP data with the results of previous comprehensive analyses of expression profiles and knockdown of regulatory genes, we found that most of the previously determined interactions are direct. We focused on cis-regulatory networks responsible for the Ciona mesodermal tissues by examining how the networks specify these tissues at the level of their cis-regulatory architecture. We also found many interactions that had not been predicted by simple gene knockdown experiments, and we showed that a significant fraction of TF-DNA interactions make major contributions to the regulatory control of target gene expression.
Affiliation(s)
- Atsushi Kubo
- Department of Zoology, Graduate School of Science, Kyoto University, Sakyo-ku, Kyoto, Japan
|
70
|
Amin NM, Shi H, Liu J. The FoxF/FoxC factor LET-381 directly regulates both cell fate specification and cell differentiation in C. elegans mesoderm development. Development 2010; 137:1451-60. [PMID: 20335356 DOI: 10.1242/dev.048496] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 12/18/2022]
Abstract
Forkhead transcription factors play crucial and diverse roles in mesoderm development. In particular, FoxF and FoxC genes are, respectively, involved in the development of visceral/splanchnic mesoderm and non-visceral mesoderm in coelomate animals. Here, we show at single-cell resolution that, in the pseudocoelomate nematode C. elegans, the single FoxF/FoxC transcription factor LET-381 functions in a feed-forward mechanism in the specification and differentiation of the non-muscle mesodermal cells, the coelomocytes (CCs). LET-381/FoxF directly activates the CC specification factor, the Six2 homeodomain protein CEH-34, and functions cooperatively with CEH-34/Six2 to directly activate genes required for CC differentiation. Our results unify a diverse set of studies on the functions of FoxF/FoxC factors and provide a model for how FoxF/FoxC factors function during mesoderm development.
Affiliation(s)
- Nirav M Amin
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
|
71
|
Weirauch MT, Hughes TR. Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends Genet 2010; 26:66-74. [PMID: 20083321 DOI: 10.1016/j.tig.2009.12.002] [Citation(s) in RCA: 126] [Impact Index Per Article: 8.4] [Received: 11/23/2009] [Revised: 12/09/2009] [Accepted: 12/09/2009] [Indexed: 12/28/2022]
Abstract
Regulatory regions with similar transcriptional output often have little overt sequence similarity, both within and between genomes. Although cis- and trans-regulatory changes can contribute to sequence divergence without dramatically altering gene expression outputs, heterologous DNA often functions similarly in organisms that share little regulatory sequence similarities (e.g. human DNA in fish), indicating that trans-regulatory mechanisms tend to diverge more slowly and can accommodate a variety of cis-regulatory configurations. This capacity to 'tinker' with regulatory DNA probably relates to the complexity, robustness and evolvability of regulatory systems, but cause-and-effect relationships among evolutionary processes and properties of regulatory systems remain a topic of debate. The challenge of understanding the concrete mechanisms underlying cis-regulatory evolution - including the conservation of function without the conservation of sequence - relates to the challenge of understanding the function of regulatory systems in general. Currently, we are largely unable to recognize functionally similar regulatory DNA.
Affiliation(s)
- Matthew T Weirauch
- Banting and Best Department of Medical Research and Donnelly Centre for Cellular and Biomolecular Research, Ontario, Canada
|
72
|
Guerrero L, Marco-Ferreres R, Serrano AL, Arredondo JJ, Cervera M. Secondary enhancers synergise with primary enhancers to guarantee fine-tuned muscle gene expression. Dev Biol 2010; 337:16-28. [DOI: 10.1016/j.ydbio.2009.10.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 04/17/2009] [Revised: 09/15/2009] [Accepted: 10/03/2009] [Indexed: 11/27/2022]
|
73
|
Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 2009; 462:65-70. [PMID: 19890324 DOI: 10.1038/nature08531] [Citation(s) in RCA: 299] [Impact Index Per Article: 18.7] [Received: 07/21/2009] [Accepted: 09/22/2009] [Indexed: 11/09/2022]
Abstract
Development requires the establishment of precise patterns of gene expression, which are primarily controlled by transcription factors binding to cis-regulatory modules. Although transcription factor occupancy can now be identified at genome-wide scales, decoding this regulatory landscape remains a daunting challenge. Here we used a novel approach to predict spatio-temporal cis-regulatory activity based only on in vivo transcription factor binding and enhancer activity data. We generated a high-resolution atlas of cis-regulatory modules describing their temporal and combinatorial occupancy during Drosophila mesoderm development. The binding profiles of cis-regulatory modules with characterized expression were used to train support vector machines to predict five spatio-temporal expression patterns. In vivo transgenic reporter assays demonstrate the high accuracy of these predictions and reveal an unanticipated plasticity in transcription factor binding leading to similar expression. This data-driven approach does not require previous knowledge of transcription factor sequence affinity, function or expression, making it widely applicable.
|
74
|
Cameron RA, Davidson EH. Flexibility of transcription factor target site position in conserved cis-regulatory modules. Dev Biol 2009; 336:122-35. [DOI: 10.1016/j.ydbio.2009.09.018] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 07/07/2009] [Revised: 09/09/2009] [Accepted: 09/10/2009] [Indexed: 10/20/2022]
|
75
|
He X, Chen CC, Hong F, Fang F, Sinha S, Ng HH, Zhong S. A biophysical model for analysis of transcription factor interaction and binding site arrangement from genome-wide binding data. PLoS One 2009; 4:e8155. [PMID: 19956545 PMCID: PMC2780727 DOI: 10.1371/journal.pone.0008155] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 09/30/2009] [Accepted: 11/10/2009] [Indexed: 11/19/2022] Open
Abstract
Background How transcription factors (TFs) interact with cis-regulatory sequences and interact with each other is a fundamental, but not well understood, aspect of gene regulation. Methodology/Principal Findings We present a computational method to address this question, relying on the established biophysical principles. This method, STAP (sequence to affinity prediction), takes into account all combinations and configurations of strong and weak binding sites to analyze large scale transcription factor (TF)-DNA binding data to discover cooperative interactions among TFs, infer sequence rules of interaction and predict TF target genes in new conditions with no TF-DNA binding data. The distinctions between STAP and other statistical approaches for analyzing cis-regulatory sequences include the utility of physical principles and the treatment of the DNA binding data as quantitative representation of binding strengths. Applying this method to the ChIP-seq data of 12 TFs in mouse embryonic stem (ES) cells, we found that the strength of TF-DNA binding could be significantly modulated by cooperative interactions among TFs with adjacent binding sites. However, further analysis on five putatively interacting TF pairs suggests that such interactions may be relatively insensitive to the distance and orientation of binding sites. Testing a set of putative Nanog motifs, STAP showed that a novel Nanog motif could better explain the ChIP-seq data than previously published ones. We then experimentally tested and verified the new Nanog motif. A series of comparisons showed that STAP has more predictive power than several state-of-the-art methods for cis-regulatory sequence analysis. We took advantage of this power to study the evolution of TF-target relationship in Drosophila. By learning the TF-DNA interaction models from the ChIP-chip data of D. melanogaster (Mel) and applying them to the genome of D. pseudoobscura (Pse), we found that only about half of the sequences strongly bound by TFs in Mel have high binding affinities in Pse. We show that prediction of functional TF targets from ChIP-chip data can be improved by using the conservation of STAP predicted affinities as an additional filter. Conclusions/Significance STAP is an effective method to analyze binding site arrangements, TF cooperativity, and TF target genes from genome-wide TF-DNA binding data.
Affiliation(s)
- Xin He
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Chieh-Chun Chen
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Feng Hong
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Fang Fang
- Gene Regulation Laboratory, Genome Institute of Singapore, Singapore, Singapore
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Huck-Hui Ng
- Gene Regulation Laboratory, Genome Institute of Singapore, Singapore, Singapore
- Sheng Zhong
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, United States of America
- * E-mail:
|
76
|
Meireles-Filho ACA, Stark A. Comparative genomics of gene regulation-conservation and divergence of cis-regulatory information. Curr Opin Genet Dev 2009; 19:565-70. [PMID: 19913403 DOI: 10.1016/j.gde.2009.10.006] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.3] [Received: 08/24/2009] [Revised: 10/06/2009] [Accepted: 10/06/2009] [Indexed: 01/13/2023]
Abstract
We recently witnessed a tremendous increase in genomics studies on gene regulation and in entirely sequenced genomes from closely related species. This has triggered analyses that suggest a wide range of evolutionary dynamics of gene regulation, from rapid turnover of transcription-factor binding sites to conservation of enhancer function across large evolutionary distances. Many examples show that enhancers can evolve beyond recognizable sequence similarity while retaining function. However, bioinformatics approaches are increasingly able to detect conserved regulatory elements through characteristic evolutionary sequence signatures. Cis-regulatory changes are also a major source of morphological evolution, which might be facilitated by many biochemically functional elements that are selectively neutral and by the buffering function of redundant enhancers and 'shadow' enhancers.
|
77
|
Wilson MD, Odom DT. Evolution of transcriptional control in mammals. Curr Opin Genet Dev 2009; 19:579-85. [PMID: 19913406 DOI: 10.1016/j.gde.2009.10.003] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Received: 07/22/2009] [Revised: 09/07/2009] [Accepted: 10/07/2009] [Indexed: 01/18/2023]
Abstract
Changes in gene expression directed by transcriptional regulators can give rise to new phenotypes. While gene expression profiles can be maintained across large evolutionary distances, transcription factor-DNA interactions diverge rapidly. The application of new genome-wide methodologies has begun refining our global understanding of when and where mammalian transcription factors interact with DNA, thereby providing new insight into the mechanisms of transcriptional evolution. The interplay between cis and trans regulation of gene expression is an increasingly active area of investigation, and recent studies suggest that mutations in cis-regulatory DNA can explain many inter-species differences in gene expression.
Affiliation(s)
- Michael D Wilson
- Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
|
78
|
Chen CH, Chuang TJ, Liao BY, Chen FC. Scanning for the signatures of positive selection for human-specific insertions and deletions. Genome Biol Evol 2009; 1:415-9. [PMID: 20333210 PMCID: PMC2817433 DOI: 10.1093/gbe/evp041] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Accepted: 10/16/2009] [Indexed: 12/03/2022] Open
Abstract
Human-specific small insertions and deletions (HS indels, with lengths <100 bp) are reported to be ubiquitous in the human genome. However, whether these indels contribute to human-specific traits remains unclear. Here we employ a modified McDonald–Kreitman (MK) test and a combinatorial population genetics approach to infer, respectively, the occurrence of positive selection and recent selective sweep events associated with HS indels. We first extract 625,890 HS indels from the human–chimpanzee–macaque–mouse multiple alignments and classify them into nonpolymorphic (41%) and polymorphic (59%) indels with reference to the human indel polymorphism data. The modified MK test is then applied to 100-kb partially overlapped sliding windows across the human genome to scan for the signs of positive selection. After excluding the possibility of biased gene conversion and controlling for false discovery rate, we show that HS indels are potentially positively selected in about 10 Mb of the human genome. Furthermore, the indel-associated positively selected regions overlap with genes more often than expected. However, our result suggests that the potential targets of positive selection are located in noncoding regions. Meanwhile, we also demonstrate that the genomic regions surrounding HS indels are more frequently involved in recent selective sweep than the other regions. In addition, HS indels are associated with distinct recent selective sweep events in different human subpopulations. Our results suggest that HS indels may have been associated with human adaptive changes at both the species level and the subpopulation level.
|
79
|
Kim HD, Shay T, O'Shea EK, Regev A. Transcriptional regulatory circuits: predicting numbers from alphabets. Science 2009; 325:429-32. [PMID: 19628860 PMCID: PMC2745280 DOI: 10.1126/science.1171347] [Citation(s) in RCA: 124] [Impact Index Per Article: 7.8] [Indexed: 12/15/2022]
Abstract
Transcriptional regulatory circuits govern how cis and trans factors transform signals into messenger RNA (mRNA) expression levels. With advances in quantitative and high-throughput technologies that allow measurement of gene expression state in different conditions, data that can be used to build and test models of transcriptional regulation is being generated at a rapid pace. Here, we review experimental and computational methods used to derive detailed quantitative circuit models on a small scale and cruder, genome-wide models on a large scale. We discuss the potential of combining small- and large-scale approaches to understand the working and wiring of transcriptional regulatory circuits.
Affiliation(s)
- Harold D Kim
- Howard Hughes Medical Institute, Harvard University Faculty of Arts and Sciences Center for Systems Biology, Department of Molecular and Cellular Biology, Cambridge, MA 02138, USA
|
80
|
Evidence for gene length as a determinant of gene coexpression in protein complexes. Genetics 2009; 183:751-4, 1SI-5SI. [PMID: 19620395 DOI: 10.1534/genetics.109.105361] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022] Open
Abstract
Variation of gene length imposes a challenge on genes requiring coexpression. Using a large human protein complex data set, we show that genes encoding subunits of the same protein complex tend to have similar length. The length uniformity is greater for complexes with stronger coexpression. We also show that the rate of gene length evolution is associated with gene coexpression level within a complex. These results suggest a new angle in understanding the evolution of protein complexes as well as the regulation of gene coexpression.
|
81
|
Liu R, Hannenhalli S, Bucan M. Motifs and cis-regulatory modules mediating the expression of genes co-expressed in presynaptic neurons. Genome Biol 2009; 10:R72. [PMID: 19570198 PMCID: PMC2728526 DOI: 10.1186/gb-2009-10-7-r72] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 05/08/2009] [Revised: 06/11/2009] [Accepted: 07/01/2009] [Indexed: 12/19/2022] Open
Abstract
An integrative strategy of comparative genomics, experimental and computational approaches reveals aspects of a regulatory network controlling neuronal-specific expression in presynaptic neurons. Background Hundreds of proteins modulate neurotransmitter release and synaptic plasticity during neuronal development and in response to synaptic activity. The expression of genes in the pre- and post-synaptic neurons is under stringent spatio-temporal control, but the mechanism underlying the neuronal expression of these genes remains largely unknown. Results Using unbiased in vivo and in vitro screens, we characterized the cis elements regulating the Rab3A gene, which is expressed abundantly in presynaptic neurons. A set of identified regulatory elements of the Rab3A gene corresponded to the defined Rab3A multi-species conserved elements. In order to identify clusters of enriched transcription factor binding sites, for example, cis-regulatory modules, we analyzed intergenic multi-species conserved elements in the vicinity of nine presynaptic genes, including Rab3A, that are highly and specifically expressed in brain regions. Sixteen transcription factor binding motifs were over-represented in these multi-species conserved elements. Based on a combined occurrence for these enriched motifs, multi-species conserved elements in the vicinity of 107 previously identified presynaptic genes were scored and ranked. We then experimentally validated the scoring strategy by showing that 12 of 16 (75%) high-scoring multi-species conserved elements functioned as neuronal enhancers in a cell-based assay. Conclusions This work introduces an integrative strategy of comparative genomics, experimental, and computational approaches to reveal aspects of a regulatory network controlling neuronal-specific expression of genes in presynaptic neurons.
Affiliation(s)
- Rui Liu
- Department of Genetics and Penn Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA
|
82
|
Comparative genomics allows the discovery of cis-regulatory elements in mosquitoes. Proc Natl Acad Sci U S A 2009; 106:3053-8. [PMID: 19211788 DOI: 10.1073/pnas.0813264106] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 12/18/2022] Open
Abstract
The discovery and mapping of cis-regulatory elements is important for understanding regulation of gene transcription in mosquito vectors of human diseases. Genome sequence data are available for 3 species, Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus (Diptera: Culicidae), representing 2 subfamilies (Culicinae and Anophelinae) that are estimated to have diverged 145 to 200 million years ago. Comparative genomics tools were used to screen genomic DNA fragments located in the 5'-end flanking regions of orthologous genes. These analyses resulted in the identification of 137 sequences, designated "mosquito motifs," 7 to 9 nucleotides in length, representing 18 families of putative cis-regulatory elements conserved significantly among the 3 species when compared to the fruit fly, Drosophila melanogaster. Forty-one of the motifs were implicated previously in experiments as sites for binding transcription factors or functioning in the regulation of mosquito gene expression. Further analyses revealed associations between specific motifs and expression profiles, particularly in those genes that show increased or decreased mRNA abundance in females following a blood meal, and those accumulating transcription products exclusively or preferentially in the midgut, fat bodies, or ovaries. These results validate the methodology and support a relationship between the discovered motifs and the conservation of hematophagy in mosquitoes.
|
83
|
Affiliation(s)
- Albert Erives
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire
|
84
|
Kim J, He X, Sinha S. Evolution of regulatory sequences in 12 Drosophila species. PLoS Genet 2009; 5:e1000330. [PMID: 19132088 PMCID: PMC2607023 DOI: 10.1371/journal.pgen.1000330] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Received: 07/19/2008] [Accepted: 12/05/2008] [Indexed: 01/07/2023] Open
Abstract
Characterization of the evolutionary constraints acting on cis-regulatory sequences is crucial to comparative genomics and provides key insights on the evolution of organismal diversity. We study the relationships among orthologous cis-regulatory modules (CRMs) in 12 Drosophila species, especially with respect to the evolution of transcription factor binding sites, and report statistical evidence in favor of key evolutionary hypotheses. Binding sites are found to have position-specific substitution rates. However, the selective forces at different positions of a site do not act independently, and the evidence suggests that constraints on sites are often based on their exact binding affinities. Binding site loss is seen to conform to a molecular clock hypothesis. The rate of site loss is transcription factor–specific and depends on the strength of binding and, in some cases, the presence of other binding sites in close proximity. Our analysis is based on a novel computational method for aligning orthologous CRMs on a tree, which rigorously accounts for alignment uncertainties and exploits binding site predictions through a unified probabilistic framework. Finally, we report weak purifying selection on short deletions, providing important clues about overall spatial constraints on CRMs. Our results present a complex picture of regulatory sequence evolution, with substantial plasticity that depends on a number of factors. The insights gained in this study will help us to understand the combinatorial control of gene regulation and how it evolves. They will pave the way for theoretical models that are cognizant of the important determinants of regulatory sequence evolution and will be critical in genome-wide identification of non-coding sequences under purifying or positive selection.
The spatial–temporal expression pattern of a gene, which is crucial to its function, is controlled by cis-regulatory DNA sequences. Forming the basic units of regulatory sequences are transcription factor binding sites, often organized into larger modules that determine gene expression in response to combinatorial environmental signals. Understanding the conservation and change of regulatory sequences is critical to our knowledge of the unity as well as diversity of animal development and phenotypes. In this paper, we study the evolution of sequences involved in the regulation of body patterning in the Drosophila embryo. We find that mutations of nucleotides within a binding site are constrained by evolutionary forces to preserve the site's binding affinity to the cognate transcription factor. Functional binding sites are frequently destroyed during evolution and the rate of loss across evolutionary spans is roughly constant. We also find that the evolutionary fate of a site strongly depends on its context; a pair of interacting sites are more likely to survive mutational forces than isolated sites. Together, these findings provide new insights and pose new challenges to our understanding of cis-regulatory sequences and their evolution.
Affiliation(s)
- Jaebum Kim
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Xin He
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- * E-mail:
|
85
|
Liberman LM, Stathopoulos A. Design flexibility in cis-regulatory control of gene expression: synthetic and comparative evidence. Dev Biol 2008; 327:578-89. [PMID: 19135437 DOI: 10.1016/j.ydbio.2008.12.020] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 09/16/2008] [Revised: 12/13/2008] [Accepted: 12/16/2008] [Indexed: 11/18/2022]
Abstract
In early Drosophila embryos, the transcription factor Dorsal regulates patterns of gene expression and cell fate specification along the dorsal-ventral axis. How gene expression is produced within the broad lateral domain of the presumptive neurogenic ectoderm is not understood. To investigate transcriptional control during neurogenic ectoderm specification, we examined divergence and function of an embryonic cis-regulatory element controlling the gene short gastrulation (sog). While transcription factor binding sites are not completely conserved, we demonstrate that these sequences are bona fide regulatory elements, despite variable regulatory architecture. Mutation of conserved sequences revealed that putative transcription factor binding sites for Dorsal and Zelda, a ubiquitous maternal transcription factor, are required for proper sog expression. When Zelda and Dorsal sites are paired in a synthetic regulatory element, broad lateral expression results. However, synthetic regulatory elements that contain Dorsal and an additional activator also drive expression throughout the neurogenic ectoderm. Our results suggest that interaction between Dorsal and Zelda drives expression within the presumptive neurogenic ectoderm, but they also demonstrate that regulatory architecture directing expression in this domain is flexible. We propose a model for neurogenic ectoderm specification in which gene regulation occurs at the intersection of temporal and spatial transcription factor inputs.
Affiliation(s)
- Louisa M Liberman
- California Institute of Technology, Division of Biology, 1200 E. California Blvd., MC 114-96, Pasadena, CA 91125, USA
|
86
|
Kuntz SG, Schwarz EM, DeModena JA, De Buysscher T, Trout D, Shizuya H, Sternberg PW, Wold BJ. Multigenome DNA sequence conservation identifies Hox cis-regulatory elements. Genome Res 2008; 18:1955-68. [PMID: 18981268 DOI: 10.1101/gr.085472.108] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 01/31/2023]
Abstract
To learn how well ungapped sequence comparisons of multiple species can predict cis-regulatory elements in Caenorhabditis elegans, we made such predictions across the large, complex ceh-13/lin-39 locus and tested them transgenically. We also examined how prediction quality varied with different genomes and parameters in our comparisons. Specifically, we sequenced approximately 0.5% of the C. brenneri and C. sp. 3 PS1010 genomes, and compared five Caenorhabditis genomes (C. elegans, C. briggsae, C. brenneri, C. remanei, and C. sp. 3 PS1010) to find regulatory elements in 22.8 kb of noncoding sequence from the ceh-13/lin-39 Hox subcluster. We developed the MUSSA program to find ungapped DNA sequences with N-way transitive conservation, applied it to the ceh-13/lin-39 locus, and transgenically assayed 21 regions with both high and low degrees of conservation. This identified 10 functional regulatory elements whose activities matched known ceh-13/lin-39 expression, with 100% specificity and a 77% recovery rate. One element was so well conserved that a similar mouse Hox cluster sequence recapitulated the native nematode expression pattern when tested in worms. Our findings suggest that ungapped sequence comparisons can predict regulatory elements genome-wide.
Affiliation(s)
- Steven G Kuntz
- Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
|
87
|
Busser BW, Bulyk ML, Michelson AM. Toward a systems-level understanding of developmental regulatory networks. Curr Opin Genet Dev 2008; 18:521-9. [PMID: 18848887 DOI: 10.1016/j.gde.2008.09.003] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 08/19/2008] [Revised: 09/09/2008] [Accepted: 09/10/2008] [Indexed: 02/01/2023]
Abstract
Developmental regulatory networks constitute all the interconnections among molecular components that guide embryonic development. Developmental transcriptional regulatory networks (TRNs) are circuits of transcription factors and cis-acting DNA elements that control expression of downstream regulatory and effector genes. Developmental networks comprise functional subnetworks that are deployed sequentially in requisite spatiotemporal patterns. Here, we discuss integrative genomics approaches for elucidating TRNs, with an emphasis on those involved in Drosophila mesoderm development and mammalian embryonic stem cell maintenance and differentiation. As examples of regulatory subnetworks, we consider the transcriptional and signaling regulation of genes that interact to control cell morphology and migration. Finally, we describe integrative experimental and computational strategies for defining the entirety of molecular interactions underlying developmental regulatory networks.
Affiliation(s)
- Brian W Busser
- Laboratory of Developmental Systems Biology, National Heart Lung and Blood Institute, NIH, Bethesda, MD 20892, USA
|
88
|
Transcriptional enhancement by GATA1-occupied DNA segments is strongly associated with evolutionary constraint on the binding site motif. Genome Res 2008; 18:1896-905. [PMID: 18818370 DOI: 10.1101/gr.083089.108] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 01/22/2023]
Abstract
Tissue development and function are exquisitely dependent on proper regulation of gene expression, but it remains controversial whether the genomic signals controlling this process are subject to strong selective constraint. While some studies show that highly constrained noncoding regions act to enhance transcription, other studies show that DNA segments with biochemical signatures of regulatory regions, such as occupancy by a transcription factor, are seemingly unconstrained across mammalian evolution. To test the possible correlation of selective constraint with enhancer activity, we used chromatin immunoprecipitation as an approach unbiased by either evolutionary constraint or prior knowledge of regulatory activity to identify DNA segments within a 66-Mb region of mouse chromosome 7 that are occupied by the erythroid transcription factor GATA1. DNA segments bound by GATA1 were identified by hybridization to high-density tiling arrays, validated by quantitative PCR, and tested for gene regulatory activity in erythroid cells. Whereas almost all of the occupied segments contain canonical WGATAR binding site motifs for GATA1, in only 45% of the cases is the motif deeply preserved (found at the orthologous position in placental mammals or more distant species). However, GATA1-bound segments with high enhancer activity tend to be the ones with an evolutionarily preserved WGATAR motif, and this relationship was confirmed by a loss-of-function assay. Thus, GATA1 binding sites that regulate gene expression during erythroid maturation are under strong selective constraint, while nonconstrained binding may have only a limited or indirect role in regulation.
|
89
|
KT/HAK/KUP potassium transporters gene family and their whole-life cycle expression profile in rice (Oryza sativa). Mol Genet Genomics 2008; 280:437-52. [PMID: 18810495 DOI: 10.1007/s00438-008-0377-7] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.0] [Received: 08/04/2008] [Accepted: 08/22/2008] [Indexed: 10/21/2022]
Abstract
KT/HAK/KUP potassium transporter protein-encoding genes constitute a large family in the plant kingdom. The KT/HAK/KUP family is important for various physiological processes of plant life. In this study, we identified 27 potential KT/HAK/KUP family genes in rice (Oryza sativa) by database searching. Analysis of these KT/HAK/KUP family members identified three conserved motifs with unknown functions, and 11-15 trans-membrane segments, most of which are conserved. A total of 144 putative cis-elements were found in the 2 kb upstream region of these genes, of which a Ca2+-responsive cis-element, two light-responsive cis-elements, and a circadian-regulated cis-element were identified in the majority of the members, suggesting regulation of these genes by these signals. A comprehensive expression analysis of these genes was performed using data from microarrays hybridized with RNA samples of 27 tissues covering the entire life cycle from three rice genotypes, Minghui 63, Zhenshan 97, and Shanyou 63. We identified preferential expression of two OsHAK genes in stamen at 1 day before flowering compared with all the other tissues. OsHAK genes were also found to be differentially upregulated or downregulated in rice seedlings subjected to treatments with three hormones. These results would be very useful for elucidating the roles of these genes in growth, development, and stress response of the rice plant.
|
90
|
Won KJ, Sandelin A, Marstrand TT, Krogh A. Modeling promoter grammars with evolving hidden Markov models. Bioinformatics 2008; 24:1669-75. [PMID: 18535083 DOI: 10.1093/bioinformatics/btn254] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Describing and modeling biological features of eukaryotic promoters remains an important and challenging problem within computational biology. The promoters of higher eukaryotes in particular display a wide variation in regulatory features, which are difficult to model. Often several factors are involved in the regulation of a set of co-regulated genes. If so, promoters can be modeled with connected regulatory features, where the network of connections is characteristic for a particular mode of regulation. RESULTS: With the goal of automatically deciphering such regulatory structures, we present a method that iteratively evolves an ensemble of regulatory grammars using a hidden Markov model (HMM) architecture composed of interconnected blocks representing transcription factor binding sites (TFBSs) and background regions of promoter sequences. The ensemble approach reduces the risk of overfitting and generally improves performance. We apply this method to identify TFBSs and to classify promoters preferentially expressed in macrophages, where it outperforms other methods due to the increased predictive power given by the grammar. AVAILABILITY: The software and the datasets are available from http://modem.ucsd.edu/won/eHMM.tar.gz
Affiliation(s)
- Kyoung-Jae Won
- The Bioinformatics Centre, Department of Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark
|
91
|
Hill MM, Broman KW, Stupka E, Smith WC, Jiang D, Sidow A. The C. savignyi genetic map and its integration with the reference sequence facilitates insights into chordate genome evolution. Genome Res 2008; 18:1369-79. [PMID: 18519652 DOI: 10.1101/gr.078576.108] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 12/18/2022]
Abstract
The urochordate Ciona savignyi is an emerging model organism for the study of chordate evolution, development, and gene regulation. The extreme level of polymorphism in its population has inspired novel approaches in genome assembly, which we here continue to develop. Specifically, we present the reconstruction of all of C. savignyi's chromosomes via the development of a comprehensive genetic map, without a physical map intermediate. The resulting genetic map is complete, having one linkage group for each one of the 14 chromosomes. Eighty-three percent of the reference genome sequence is covered. The chromosomal reconstruction allowed us to investigate the evolution of genome structure in highly polymorphic species, by comparing the genome of C. savignyi to its divergent sister species, Ciona intestinalis. Both genomes have been extensively reshaped by intrachromosomal rearrangements. Interchromosomal changes have been extremely rare. This is in striking contrast to what has been observed in vertebrates, where interchromosomal events are commonplace. These results, when considered in light of the neutral theory, suggest fundamentally different modes of evolution of animal species with large versus small population sizes.
Affiliation(s)
- Matthew M Hill
- Department of Pathology, SUMC, Stanford, CA 94305-5324, USA
|
92
|
Cooper GM, Brown CD. Qualifying the relationship between sequence conservation and molecular function. Genome Res 2008; 18:201-5. [PMID: 18245453 DOI: 10.1101/gr.7205808] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.4] [Indexed: 11/25/2022]
Abstract
Quantification of evolutionary constraints via sequence conservation can be leveraged to annotate genomic functional sequences. Recent efforts addressing the converse of this relationship have identified many sites in metazoan genomes with molecular function but without detectable conservation between related species. Here, we discuss explanations and implications for these results considering both practical and theoretical issues. In particular, phylogenetic scope influences the relationship between sequence conservation and function. Comparisons of distantly related species can detect constraint with high specificity due to the loss of conserved neutral sequence, but sensitivity is sacrificed as a result of functional changes related to lineage-specific biology. The strength of natural selection operating on functional sequence is also important. Mutations to functional sequences that result in small fitness effects are subject to weaker constraints. Therefore, particularly when comparing highly divergent species, functional sequences that are degenerate or biologically redundant will be prone to turnover, wherein functional sequences are replaced by effectively equivalent, but nonorthologous counterparts. Finally, considering the size and complexity of metazoan genomes and the fact that many nonconserved sequences are associated with sequence-degenerate, low-level molecular functions, we find it likely that there exist many biochemically functional sequences that are not under constraint. This hypothesis does not lead to the conclusion that huge amounts of vertebrate genomes are functionally important, but rather that such "functionality" represents molecular noise that has weak or no effect on organismal phenotypes.
Affiliation(s)
- Gregory M Cooper
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
|
93
|
Sardet C, Swalla BJ, Satoh N, Sasakura Y, Branno M, Thompson EM, Levine M, Nishida H. Euro chordates: Ascidian community swims ahead. The 4th International Tunicate meeting in Villefranche sur Mer. Dev Dyn 2008; 237:1207-13. [DOI: 10.1002/dvdy.21487] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/11/2022]
|
94
|
|