101
|
Genome-wide prediction and functional validation of promoter motifs regulating gene expression in spore and infection stages of Phytophthora infestans. PLoS Pathog 2013; 9:e1003182. [PMID: 23516354 PMCID: PMC3597505 DOI: 10.1371/journal.ppat.1003182] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 09/08/2012] [Accepted: 12/20/2012] [Indexed: 01/18/2023] Open
Abstract
Most eukaryotic pathogens have complex life cycles in which gene expression networks orchestrate the formation of cells specialized for dissemination or host colonization. In the oomycete Phytophthora infestans, the potato late blight pathogen, major shifts in mRNA profiles during developmental transitions were identified using microarrays. We used those data with search algorithms to discover about 100 motifs that are over-represented in promoters of genes up-regulated in hyphae, sporangia, sporangia undergoing zoosporogenesis, swimming zoospores, or germinated cysts forming appressoria (infection structures). Most of the putative stage-specific transcription factor binding sites (TFBSs) thus identified had features typical of TFBSs such as position or orientation bias, palindromy, and conservation in related species. Each of six motifs tested in P. infestans transformants using the GUS reporter gene conferred the expected stage-specific expression pattern, and several were shown to bind nuclear proteins in gel-shift assays. Motifs linked to the appressoria-forming stage, including a functionally validated TFBS, were over-represented in promoters of genes encoding effectors and other pathogenesis-related proteins. To understand how promoter and genome architecture influence expression, we also mapped transcription patterns to the P. infestans genome assembly. Adjacent genes were not typically induced in the same stage, including genes transcribed in opposite directions from small intergenic regions, but co-regulated gene pairs occurred more than expected by random chance. These data help illuminate the processes regulating development and pathogenesis, and will enable future attempts to purify the cognate transcription factors.

The genus Phytophthora includes over one hundred species of plant pathogens that have devastating effects worldwide in agriculture and natural environments. Its most notorious member is P. infestans, which causes the late blight diseases of potato and tomato. Their success as pathogens is dependent on the formation of specialized cells for plant-to-plant transmission and host infection, but little is known about how this is regulated. Recognizing that changes in gene expression drive the formation of these cell types, we used a computational approach to predict the sequences of about one hundred transcription factor binding sites associated with expression in any of five life stages, including several types of spores and infection structures. We then used a functional testing strategy to prove their biological activity by showing that the DNA motifs enabled the stage-specific expression of a transgene. Our work lays the groundwork for dissecting the molecular mechanisms that regulate life-stage transitions and pathogenesis in Phytophthora. A similar approach should be useful for other plant and animal pathogens.
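The abstract above speaks of motifs being "over-represented" in promoters of stage-specific genes without naming the statistic used. One common way to quantify such over-representation is a hypergeometric tail probability; the sketch below uses that test as an assumption, not as the authors' method, and the function name `hypergeom_tail` and all counts are hypothetical.

```python
from math import comb


def hypergeom_tail(total, with_motif, sample, observed):
    """P(X >= observed) when `sample` promoters are drawn at random from
    `total` promoters, `with_motif` of which contain the motif."""
    upper = min(with_motif, sample)
    favourable = sum(
        comb(with_motif, i) * comb(total - with_motif, sample - i)
        for i in range(observed, upper + 1)
    )
    return favourable / comb(total, sample)


# Hypothetical counts: 1000 promoters genome-wide, 100 carry the motif,
# 50 genes are induced in one stage, and 15 of those carry the motif.
p_value = hypergeom_tail(total=1000, with_motif=100, sample=50, observed=15)
print(f"over-representation p-value: {p_value:.3g}")
```

A small p-value would suggest that the motif occurs in the stage-specific promoters more often than random sampling explains; in practice a multiple-testing correction would be needed when many candidate motifs are scored.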
|
102
|
Yun T, Yi GS. Biclustering for the comprehensive search of correlated gene expression patterns using clustered seed expansion. BMC Genomics 2013; 14:144. [PMID: 23496895 PMCID: PMC3618306 DOI: 10.1186/1471-2164-14-144] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 10/27/2012] [Accepted: 02/21/2013] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND In a functional analysis of gene expression data, biclustering methods can give crucial information by showing correlated gene expression patterns under a subset of conditions. However, conventional biclustering algorithms are still limited in their ability to produce comprehensive and stable outputs. RESULTS We propose a novel biclustering approach called "BIclustering by Correlated and Large number of Individual Clustered seeds (BICLIC)" to find comprehensive sets of correlated expression patterns in biclusters using clustered seeds and their expansion with correlation of gene expression. BICLIC outperformed competing biclustering algorithms by completely recovering implanted biclusters in simulated datasets with various types of correlated patterns: shifting, scaling, and shifting-scaling. Furthermore, in a real yeast microarray dataset and a lung cancer microarray dataset, BICLIC found more comprehensive sets of biclusters that are significantly enriched for more diverse sets of biological terms than those of other competing biclustering algorithms. CONCLUSIONS BICLIC provides significant benefits in finding comprehensive sets of correlated patterns and their functional implications from a gene expression dataset.
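The three pattern types named in the abstract (shifting, scaling, and shifting-scaling) share one property that explains why correlation is a natural expansion criterion: each is a positive linear transform of a base profile, so its Pearson correlation with that profile is exactly 1. The sketch below only illustrates this property with made-up expression values; it is not the BICLIC algorithm.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression profile of one gene over four conditions.
base = [1.0, 3.0, 2.0, 5.0]
shifted = [v + 2.0 for v in base]             # shifting pattern
scaled = [v * 3.0 for v in base]              # scaling pattern
shift_scaled = [v * 3.0 + 2.0 for v in base]  # shifting-scaling pattern

for name, profile in [("shifted", shifted), ("scaled", scaled),
                      ("shift-scaled", shift_scaled)]:
    print(name, round(pearson(base, profile), 6))
```

A distance-based criterion such as Euclidean distance would treat these profiles as far apart, which is one reason correlation-based biclustering can recover patterns that distance-based clustering misses.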
Affiliation(s)
- Taegyun Yun
- Department of Information and Communications Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
- Gwan-Su Yi
- Department of Information and Communications Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
- Department of Bio and Brain Engineering, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon, 305-701, Republic of Korea
|
103
|
Hariharan R, Simon R, Pillai MR, Taylor TD. Comparative analysis of DNA word abundances in four yeast genomes using a novel statistical background model. PLoS One 2013; 8:e58038. [PMID: 23472131 PMCID: PMC3589456 DOI: 10.1371/journal.pone.0058038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/07/2012] [Accepted: 01/29/2013] [Indexed: 11/18/2022] Open
Abstract
Previous studies have shown that the identification and analysis of both abundant and rare k-mers or “DNA words of length k” in genomic sequences using suitable statistical background models can reveal biologically significant sequence elements. Other studies have investigated the uni/multimodal distribution of k-mer abundances or “k-mer spectra” in different DNA sequences. However, the existing background models are affected to varying extents by compositional bias. Moreover, the distribution of k-mer abundances in the context of related genomes has not been studied previously. Here, we present a novel statistical background model for calculating k-mer enrichment in DNA sequences based on the average of the frequencies of the two (k-1) mers for each k-mer. Comparison of our null model with the commonly used ones, including Markov models of different orders and the single mismatch model, shows that our method is more robust to compositional AT-rich bias and detects many additional, repeat-poor over-abundant k-mers that are biologically meaningful. Analysis of overrepresented genomic k-mers (4≤k≤16) from four yeast species using this model showed that the fraction of overrepresented DNA words falls linearly as k increases; however, a significant number of overabundant k-mers exists at higher values of k. Finally, comparative analysis of k-mer abundance scores across four yeast species revealed a mixture of unimodal and multimodal spectra for the various genomic sub-regions analyzed.
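For orientation, the sketch below implements one of the commonly used null models that the abstract compares against, a maximal-order Markov background, in which the expected count of a k-mer is estimated from its two overlapping (k-1)-mers and their shared (k-2)-mer. It does not reproduce the authors' averaging-based model, whose exact formula the abstract does not give; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected(seq, word):
    """Expected count of `word` (length k >= 3) under a Markov model of
    order k-2: N(prefix) * N(suffix) / N(middle)."""
    k = len(word)
    shorter = count_kmers(seq, k - 1)
    middle = count_kmers(seq, k - 2)[word[1:-1]]
    if middle == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / middle


def enrichment(seq, word):
    """Observed / expected ratio; values above 1 suggest over-abundance."""
    expected = markov_expected(seq, word)
    observed = count_kmers(seq, len(word))[word]
    return observed / expected if expected else float("nan")


toy = "ACGTACGTACGT"  # toy sequence, not real genomic data
print(markov_expected(toy, "ACG"), enrichment(toy, "ACG"))
```

Because the expectation is built from the sequence's own shorter words, this background absorbs much of the base-composition signal, although the abstract reports that such models remain affected by compositional bias to varying extents.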
Affiliation(s)
- Ramkumar Hariharan
- Cancer Research Program, Rajiv Gandhi Center for Biotechnology, Thiruvananthapuram, Kerala, India
|
104
|
Vernot B, Stergachis AB, Maurano MT, Vierstra J, Neph S, Thurman RE, Stamatoyannopoulos JA, Akey JM. Personal and population genomics of human regulatory variation. Genome Res 2012; 22:1689-97. [PMID: 22955981 PMCID: PMC3431486 DOI: 10.1101/gr.134890.111] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 11/24/2022]
Abstract
The characteristics of, and evolutionary forces acting on, regulatory variation in humans remain elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA compared with protein-coding regions, although they are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.
Affiliation(s)
- Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
105
|
Vandenbon A, Kumagai Y, Teraguchi S, Amada KM, Akira S, Standley DM. A Parzen window-based approach for the detection of locally enriched transcription factor binding sites. BMC Bioinformatics 2013; 14:26. [PMID: 23331723 PMCID: PMC3602658 DOI: 10.1186/1471-2105-14-26] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/06/2012] [Accepted: 01/14/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Identification of cis- and trans-acting factors regulating gene expression remains an important problem in biology. Bioinformatics analyses of regulatory regions are hampered by several difficulties. One is that binding sites for regulatory proteins are often not significantly over-represented in the set of DNA sequences of interest, because of high levels of false positive predictions, and because of positional restrictions on functional binding sites with regard to the transcription start site. RESULTS We have developed a novel method for the detection of regulatory motifs based on their local over-representation in sets of regulatory regions. The method makes use of a Parzen window-based approach for scoring local enrichment, and during evaluation of significance it takes into account GC content of sequences. We show that the accuracy of our method compares favourably to that of other methods, and that our method is capable of detecting not only generally over-represented regulatory motifs, but also locally over-represented motifs that are often missed by standard motif detection approaches. Using a number of examples we illustrate the validity of our approach and suggest applications, such as the analysis of weaker binding sites. CONCLUSIONS Our approach can be used to suggest testable hypotheses for wet-lab experiments. It has potential for future analyses, such as the prediction of weaker binding sites. An online application of our approach, called LocaMo Finder (Local Motif Finder), is available at http://sysimm.ifrec.osaka-u.ac.jp/tfbs/locamo/.
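To make "local over-representation" concrete, the sketch below estimates the density of motif hits along a promoter with a Gaussian Parzen window and compares it with the density expected if hits were spread uniformly over the region. This is a generic kernel density estimate under assumed parameters, not the LocaMo Finder scoring scheme: the kernel choice, the bandwidth, the hit positions, and the absence of any GC-content correction are all simplifying assumptions.

```python
from math import exp, pi, sqrt


def parzen_density(positions, x, bandwidth):
    """Gaussian Parzen-window estimate of the hit density at position x."""
    norm = len(positions) * bandwidth * sqrt(2 * pi)
    kernel_sum = sum(exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
    return kernel_sum / norm


# Hypothetical motif hit positions relative to the transcription start site
# (TSS at 0), pooled over a set of promoters spanning -1000..0.
hits = [-60, -55, -52, -50, -48, -45, -300, -710]
region_start, region_end = -1000, 0
uniform_density = 1 / (region_end - region_start)

# Ratio > 1 means more hits near x than a uniform spread would give.
ratio_near_tss = parzen_density(hits, -50, bandwidth=10) / uniform_density
ratio_upstream = parzen_density(hits, -500, bandwidth=10) / uniform_density
print(round(ratio_near_tss, 1), round(ratio_upstream, 4))
```

A motif like this one would look unremarkable if only its total count over the whole region were tested, which is the situation the abstract describes for positionally restricted binding sites.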
Affiliation(s)
- Alexis Vandenbon
- Laboratory of Systems Immunology, Immunology Frontier Research Center, Osaka University, Osaka, Japan.
|
106
|
Klepper K, Drabløs F. MotifLab: a tools and data integration workbench for motif discovery and regulatory sequence analysis. BMC Bioinformatics 2013; 14:9. [PMID: 23323883 PMCID: PMC3556059 DOI: 10.1186/1471-2105-14-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 11/04/2012] [Accepted: 01/10/2013] [Indexed: 12/19/2022] Open
Abstract
Background Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated a lot of data that has the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of this data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming. Results Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects, and complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results. Conclusions We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench has been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin. 
MotifLab is freely available at http://www.motiflab.org.
Affiliation(s)
- Kjetil Klepper
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.
|
107
|
Xu B, Schones DE, Wang Y, Liang H, Li G. A structural-based strategy for recognition of transcription factor binding sites. PLoS One 2013; 8:e52460. [PMID: 23320072 PMCID: PMC3540023 DOI: 10.1371/journal.pone.0052460] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 09/08/2012] [Accepted: 11/19/2012] [Indexed: 12/30/2022] Open
Abstract
Scanning through genomes for potential transcription factor binding sites (TFBSs) is becoming increasingly important in this post-genomic era. The position weight matrix (PWM) is the standard representation of TFBSs utilized when scanning through sequences for potential binding sites. However, many transcription factor (TF) motifs are short and highly degenerate, and methods utilizing PWMs to scan for sites are plagued by false positives. Furthermore, many important TFs do not have well-characterized PWMs, making identification of potential binding sites even more difficult. One approach to the identification of sites for these TFs has been to use the 3D structure of the TF to predict the DNA structure around the TF and then to generate a PWM from the predicted 3D complex structure. However, this approach is dependent on the similarity of the predicted structure to the native structure. We introduce here a novel approach to identify TFBSs utilizing structure information that can be applied to TFs without characterized PWMs, as long as a 3D complex structure (TF/DNA) exists. This approach utilizes an energy function that is uniquely trained on each structure. Our approach leads to increased prediction accuracy and robustness compared with those using a more general energy function. The software is freely available upon request.
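The PWM scanning that the abstract treats as the standard baseline can be sketched in a few lines: build a log-odds matrix from per-position base counts, slide it along the sequence, and report windows scoring above a threshold. The counts, pseudocount, uniform background, and threshold below are illustrative assumptions, and the sketch covers only the conventional approach, not the structure-based energy function the authors propose.

```python
from math import log2

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, score) for every window scoring at least threshold.
    Assumes seq contains only upper-case A, C, G, T."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[start + j]] for j in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical alignment counts for a four-base TATA-like site.
toy_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
toy_pwm = pwm_from_counts(toy_counts)
toy_hits = scan("GGTATACC", toy_pwm, threshold=5.0)
print(toy_hits)
```

With short, degenerate motifs, lowering the threshold quickly admits many chance matches in a genome-sized sequence, which is the false-positive problem the abstract points to.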
Affiliation(s)
- Beisi Xu
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, The Chinese Academy of Sciences, Dalian, Liaoning, China
- Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
- Dustin E. Schones
- Department of Cancer Biology, Beckman Research Institute, City of Hope, Duarte, California, United States of America
- Yongmei Wang
- Department of Chemistry, University of Memphis, Memphis, Tennessee, United States of America
- Haojun Liang
- Department of Polymer Science and Engineering, University of Science and Technology of China, Hefei, Anhui, China
- Guohui Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, The Chinese Academy of Sciences, Dalian, Liaoning, China
|
108
|
Lajoie M, Gascuel O, Lefort V, Bréhélin L. Computational discovery of regulatory elements in a continuous expression space. Genome Biol 2012. [PMID: 23186104 PMCID: PMC4053739 DOI: 10.1186/gb-2012-13-11-r109] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 02/06/2023] Open
Abstract
Approaches for regulatory element discovery from gene expression data usually rely on clustering algorithms to partition the data into clusters of co-expressed genes. Gene regulatory sequences are then mined to find overrepresented motifs in each cluster. However, this ad hoc partition rarely fits the biological reality. We propose a novel method called RED2 that avoids data clustering by estimating motif densities locally around each gene. We show that RED2 detects numerous motifs not detected by clustering-based approaches, and that most of these correspond to characterized motifs. RED2 can be accessed online through a user-friendly interface.
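The idea of estimating motif density "locally around each gene" instead of within fixed clusters can be illustrated with a nearest-neighbour sketch: for each gene, look at its k closest genes in expression space and take the fraction whose regulatory sequence contains the motif. This is only an illustration of the general idea. The abstract does not describe RED2's actual estimator, and the function name, the Euclidean distance, the value of k, and the data are all assumptions.

```python
from math import dist


def local_motif_density(expression, has_motif, k):
    """For each gene, fraction of its k nearest neighbours (Euclidean
    distance in expression space) whose promoter contains the motif."""
    densities = {}
    for gene, profile in expression.items():
        others = [g for g in expression if g != gene]
        nearest = sorted(others, key=lambda g: dist(profile, expression[g]))[:k]
        densities[gene] = sum(has_motif[g] for g in nearest) / k
    return densities


# Hypothetical two-condition expression profiles and motif annotations.
toy_expression = {
    "g1": (0.0, 0.0), "g2": (0.1, 0.0), "g3": (0.0, 0.1),
    "g4": (5.0, 5.0), "g5": (5.1, 5.0), "g6": (5.0, 5.1),
}
toy_has_motif = {"g1": True, "g2": True, "g3": True,
                 "g4": False, "g5": False, "g6": False}

toy_densities = local_motif_density(toy_expression, toy_has_motif, k=2)
print(toy_densities)
```

Because every gene gets its own neighbourhood, no gene is forced into a cluster whose boundary happens to split a co-regulated group, which is the weakness of ad hoc partitions that the abstract describes.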
|
109
|
Katara P, Grover A, Sharma V. Phylogenetic footprinting: a boost for microbial regulatory genomics. PROTOPLASMA 2012; 249:901-907. [PMID: 22113593 DOI: 10.1007/s00709-011-0351-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 07/27/2011] [Accepted: 11/09/2011] [Indexed: 05/31/2023]
Abstract
Phylogenetic footprinting is a method for discovering regulatory elements in a set of homologous regulatory regions, usually collected from multiple species. It does so by identifying the best-conserved motifs in those homologous regions. Two popular families of methods, alignment-based and motif-based, are generally employed for phylogenetic footprinting. However, there has been little serious effort to develop a tool exclusively for phylogenetic footprinting based on either family. Nevertheless, a number of software tools exist that can be applied to phylogenetic footprinting with varying degrees of success. The output of these tools may be affected by a number of factors associated with the current state of knowledge, techniques, and other available resources. Here we present a critical appraisal of various phylogenetic footprinting approaches with reference to prokaryotes, outlining the available resources and discussing the factors that affect footprinting, in order to clarify the proper use of this approach in prokaryotes.
Affiliation(s)
- Pramod Katara
- Department of Bioscience and Biotechnology, Banasthali University, Banasthali, 304022, India.
|
110
|
Wang Y, Ding J, Daniell H, Hu H, Li X. Motif analysis unveils the possible co-regulation of chloroplast genes and nuclear genes encoding chloroplast proteins. PLANT MOLECULAR BIOLOGY 2012; 80:177-87. [PMID: 22733202 DOI: 10.1007/s11103-012-9938-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/18/2012] [Accepted: 06/15/2012] [Indexed: 06/01/2023]
Abstract
Chloroplasts play critical roles in land plant cells. Despite their importance and the availability of at least 200 sequenced chloroplast genomes, the number of known DNA regulatory sequences in chloroplast genomes is limited. In this paper, we designed computational methods to systematically study putative DNA regulatory sequences in intergenic regions near chloroplast genes in seven plant species and in promoter sequences of nuclear genes in Arabidopsis and rice. We found that -35/-10 elements alone cannot explain the transcriptional regulation of chloroplast genes. We also concluded that motifs shared by the intergenic sequences of most chloroplast genes are unlikely to exist, indicating that these genes are regulated differently. Finally and surprisingly, we found that five conserved motifs, each of which occurs in no more than six chloroplast intergenic sequences, are significantly shared by promoters of nuclear genes encoding chloroplast proteins. By integrating information from gene function annotation, protein subcellular localization analyses, protein-protein interaction data, and gene expression data, we further showed support for the functionality of these conserved motifs. Our study implies the existence of unknown nuclear-encoded transcription factors that regulate both chloroplast genes and nuclear genes encoding chloroplast proteins, which sheds light on the transcriptional regulation of chloroplast genes.
Affiliation(s)
- Ying Wang
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
|
111
|
A Bayesian Scoring Scheme based Particle Swarm Optimization algorithm to identify transcription factor binding sites. Appl Soft Comput 2012. [DOI: 10.1016/j.asoc.2012.04.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
|
112
|
Glucose, nitrogen, and phosphate repletion in Saccharomyces cerevisiae: common transcriptional responses to different nutrient signals. G3-GENES GENOMES GENETICS 2012; 2:1003-17. [PMID: 22973537 PMCID: PMC3429914 DOI: 10.1534/g3.112.002808] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 04/18/2012] [Accepted: 06/20/2012] [Indexed: 01/01/2023]
Abstract
Saccharomyces cerevisiae are able to control growth in response to changes in nutrient availability. The limitation for single macronutrients, including nitrogen (N) and phosphate (P), produces stable arrest in G1/G0. Restoration of the limiting nutrient quickly restores growth. It has been shown that glucose (G) depletion/repletion very rapidly alters the levels of more than 2000 transcripts by at least 2-fold, a large portion of which are involved with either protein production in growth or stress responses in starvation. Although the signals generated by G, N, and P are thought to be quite distinct, we tested the hypothesis that depletion and repletion of any of these three nutrients would affect a common core set of genes as part of a generalized response to conditions that promote growth and quiescence. We found that the response to depletion of G, N, or P produced similar quiescent states with largely similar transcriptomes. As we predicted, repletion of each of the nutrients G, N, or P induced a large (501) common core set of genes and repressed a large (616) common gene set. Each nutrient also produced nutrient-specific transcript changes. The transcriptional responses to each of the three nutrients depended on cAMP and, to a lesser extent, the TOR pathway. All three nutrients stimulated cAMP production within minutes of repletion, and artificially increasing cAMP levels was sufficient to replicate much of the core transcriptional response. The recently identified transceptors Gap1, Mep1, Mep2, and Mep3, as well as Pho84, all played some role in the core transcriptional responses to N or P. As expected, we found some evidence of cross talk between nutrient signals, yet each nutrient sends distinct signals.
|
113
|
Ma S, Bachan S, Porto M, Bohnert HJ, Snyder M, Dinesh-Kumar SP. Discovery of stress responsive DNA regulatory motifs in Arabidopsis. PLoS One 2012; 7:e43198. [PMID: 22912824 PMCID: PMC3418279 DOI: 10.1371/journal.pone.0043198] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 06/20/2012] [Accepted: 07/17/2012] [Indexed: 11/25/2022] Open
Abstract
The discovery of DNA regulatory motifs in the sequenced genomes using computational methods remains challenging. Here, we present MotifIndexer - a comprehensive strategy for de novo identification of DNA regulatory motifs at a genome level. Using word-counting methods, we indexed the existence of every 8-mer oligo composed of bases A, C, G, T, r, y, s, w, m, k, n or 12-mer oligo composed of A, C, G, T, n, in the promoters of all predicted genes of the Arabidopsis thaliana genome and of selected stress-induced co-expressed genes. From this analysis, we identified a number of over-represented motifs. Among these, major critical motifs were identified using a position filter. We used a model based on uniform distribution and the z-scores derived from this model to describe position bias. Interestingly, many motifs showed position bias towards the transcription start site. We extended this model to show biased distribution of motifs in the genomes of both A. thaliana and rice. We also used MotifIndexer to identify conserved motifs in co-expressed gene groups from two Arabidopsis species, A. thaliana and A. lyrata. This new comparative genomics method does not depend on alignments of homologous gene promoter sequences.
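Two steps in this abstract lend themselves to a small sketch: matching degenerate oligos written with the IUPAC letters listed (r, y, s, w, m, k, n), and scoring position bias against a uniform model. The code below is a hedged illustration, not MotifIndexer itself. The abstract does not give the z-score formula, so the binomial-under-uniform form used here is an assumption, as are the function names, the bin count, and the example data.

```python
import re
from math import sqrt

# Regex classes for the IUPAC letters named in the abstract.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "M": "[AC]", "K": "[GT]", "N": "[ACGT]"}


def motif_positions(promoter, motif):
    """Start positions of all (possibly overlapping) matches of a
    degenerate motif, found with a zero-width lookahead."""
    body = "".join(IUPAC[letter] for letter in motif.upper())
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(promoter.upper())]


def bin_z_scores(positions, length, n_bins):
    """z-score of the hit count in each positional bin, assuming each hit
    falls uniformly at random along a promoter of the given length."""
    counts = [0] * n_bins
    for pos in positions:
        counts[min(pos * n_bins // length, n_bins - 1)] += 1
    n, p = len(positions), 1 / n_bins
    sd = sqrt(n * p * (1 - p))
    return [(c - n * p) / sd if sd else 0.0 for c in counts]


print(motif_positions("ACGTACATAAAA", "ACRT"))
# Hypothetical pooled hit positions in 100-bp promoters, split into 4 bins.
print(bin_z_scores([0, 1, 2, 3], length=100, n_bins=4))
```

A large positive z-score in the bin nearest the transcription start site would correspond to the kind of position bias the abstract reports for many motifs.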
Affiliation(s)
- Shisong Ma
- Department of Plant Biology and the Genome Center, College of Biological Sciences, University of California Davis, Davis, California, United States of America
- * E-mail: (SPD-K); (SM)
- Shawn Bachan
- Department of Plant Biology and the Genome Center, College of Biological Sciences, University of California Davis, Davis, California, United States of America
- Matthew Porto
- Department of Plant Biology and the Genome Center, College of Biological Sciences, University of California Davis, Davis, California, United States of America
- Hans J. Bohnert
- Departments of Plant Biology and Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Michael Snyder
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Savithramma P. Dinesh-Kumar
- Department of Plant Biology and the Genome Center, College of Biological Sciences, University of California Davis, Davis, California, United States of America
- * E-mail: (SPD-K); (SM)
|
114
|
Wang S, Yin Y, Ma Q, Tang X, Hao D, Xu Y. Genome-scale identification of cell-wall related genes in Arabidopsis based on co-expression network analysis. BMC PLANT BIOLOGY 2012; 12:138. [PMID: 22877077 PMCID: PMC3463447 DOI: 10.1186/1471-2229-12-138] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Received: 05/14/2012] [Accepted: 07/30/2012] [Indexed: 05/21/2023]
Abstract
BACKGROUND Identification of novel genes relevant to plant cell-wall (PCW) synthesis represents a highly important and challenging problem. Although substantial efforts have been invested into studying this problem, the vast majority of the PCW related genes remain unknown. RESULTS Here we present a computational study focused on identification of novel PCW genes in Arabidopsis based on the co-expression analyses of transcriptomic data collected under 351 conditions, using a bi-clustering technique. Our analysis identified 217 highly co-expressed gene clusters (modules) under some experimental conditions, each containing at least one gene annotated as PCW related according to the Purdue Cell Wall Gene Families database. These co-expression modules cover 349 known/annotated PCW genes and 2,438 new candidates. For each candidate gene, we annotated the specific PCW synthesis stages in which it is involved and predicted the detailed function. In addition, for the co-expressed genes in each module, we predicted and analyzed their cis regulatory motifs in the promoters using our motif discovery pipeline, providing strong evidence that the genes in each co-expression module are transcriptionally co-regulated. From all the co-expression modules, we infer that 108 modules are related to four major PCW synthesis components, using three complementary methods. CONCLUSIONS We believe our approach and data presented here will be useful for further identification and characterization of PCW genes. All the predicted PCW genes, co-expression modules, motifs and their annotations are available at a web-based database: http://csbl.bmb.uga.edu/publications/materials/shanwang/CWRPdb/index.html.
Affiliation(s)
- Shan Wang
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, Athens, GA, USA
- Key Lab for Molecular Enzymology and Engineering of the Ministry of Education, Jilin University, Changchun, China
- Biotechnology Research Centre, Jilin Academy of Agricultural Sciences (JAAS), Changchun, China
- Yanbin Yin
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, Athens, GA, USA
- BESC BioEnergy Science Center, University of Georgia, Athens, GA, USA
- Qin Ma
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, Athens, GA, USA
- BESC BioEnergy Science Center, University of Georgia, Athens, GA, USA
- Xiaojia Tang
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, Athens, GA, USA
- Dongyun Hao
- Key Lab for Molecular Enzymology and Engineering of the Ministry of Education, Jilin University, Changchun, China
- Biotechnology Research Centre, Jilin Academy of Agricultural Sciences (JAAS), Changchun, China
- Ying Xu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, and Institute of Bioinformatics, Athens, GA, USA
- BESC BioEerngy Science Center, University of Georgia, Athens, GA, USA
- College of Computer Science and Technology, Jilin University, Changchun, China
| |
Collapse
|
115
|
Guo Y, Mahony S, Gifford DK. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput Biol 2012; 8:e1002638. [PMID: 22912568 PMCID: PMC3415389 DOI: 10.1371/journal.pcbi.1002638] [Citation(s) in RCA: 202] [Impact Index Per Article: 15.5] [Received: 03/19/2012] [Accepted: 06/15/2012] [Indexed: 12/27/2022] Open
Abstract
An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal key aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pair-wise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1. 
The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial control. The letters in our genome spell words and phrases that control when each gene is activated. To understand how these words and phrases function in health and disease, we have developed a new computational method to determine what word positions in our genomic text are used by each genome regulatory protein, and how these active words are spaced relative to one another. Our method achieves exceptional spatial accuracy by integrating experimental data with the text of our genome to find the precise words that are regulated by each protein factor. Using this analysis we have discovered novel word spacings in the experimental data that suggest novel genome grammatical control constructs.
Affiliation(s)
- Yuchun Guo
- Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Shaun Mahony
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- * E-mail: (SM); (DKG)
- David K. Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- * E-mail: (SM); (DKG)
|
116
|
Mittal D, Madhyastha DA, Grover A. Genome-wide transcriptional profiles during temperature and oxidative stress reveal coordinated expression patterns and overlapping regulons in rice. PLoS One 2012; 7:e40899. [PMID: 22815860 PMCID: PMC3397947 DOI: 10.1371/journal.pone.0040899] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 01/29/2012] [Accepted: 06/14/2012] [Indexed: 11/19/2022] Open
Abstract
Genome-wide transcriptional changes caused by cold stress, heat stress and oxidative stress in rice seedlings were analyzed. Heat stress resulted in predominant changes in transcripts of heat shock protein and heat shock transcription factor genes, as well as genes associated with synthesis of scavengers of reactive oxygen species and genes that control the level of sugars, metabolites and auxins. Cold stress treatment caused differential expression of transcripts of various transcription factors, including desiccation response element binding proteins, and of different kinases. Transcripts of genes that are part of calcium signaling, reactive oxygen scavenging and diverse metabolic reactions were differentially expressed during cold stress. Oxidative stress induced by hydrogen peroxide treatment resulted in significant up-regulation in transcript levels of genes related to redox homeostasis and down-regulation of transporter proteins. ROS homeostasis appeared to play a central role in the response to temperature extremes. The key transcription factors that may underlie the concerted transcriptional changes of specific components in the various signal transduction networks involved are highlighted. Analysis based on co-ordinated expression patterns and promoter architectures (promoter models and overrepresented transcription factor binding sites) suggested potential regulons involved in stress responses. A considerable overlap was noted at the level of transcription as well as in regulatory modules of differentially expressed genes.
Affiliation(s)
- Dheeraj Mittal
- Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi, India
- Anil Grover
- Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi, India
|
117
|
Abstract
Transcription factors and the short, often degenerate DNA sequences they recognize are central regulators of gene expression, but their regulatory code is challenging to dissect experimentally. Thus, computational approaches have long been used to identify putative regulatory elements from the patterns in promoter sequences. Here we present a new algorithm “POWRS” (POsition-sensitive WoRd Set) for identifying regulatory sequence motifs, specifically developed to address two common shortcomings of existing algorithms. First, POWRS uses the position-specific enrichment of regulatory elements near transcription start sites to significantly increase sensitivity, while providing new information about the preferred localization of those elements. Second, POWRS forgoes position weight matrices for a discrete motif representation that appears more resistant to over-generalization. We apply this algorithm to discover sequences related to constitutive, high-level gene expression in the model plant Arabidopsis thaliana, and then experimentally validate the importance of those elements by systematically mutating two endogenous promoters and measuring the effect on gene expression levels. This provides a foundation for future efforts to rationally engineer gene expression in plants, a problem of great importance in developing biotech crop varieties. Availability: BSD-licensed Python code at http://grassrootsbio.com/papers/powrs/.
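The idea of position-specific enrichment near transcription start sites can be illustrated with a minimal sketch. This is not the POWRS implementation: it tests a single word rather than a word set, and the function names, the equal-length promoter assumption, and the uniform-placement null model are all assumptions made here for illustration.

```python
from math import comb


def word_positions(promoters, word):
    """Return the start offset of every occurrence of `word` across the promoters."""
    hits = []
    for seq in promoters:
        start = seq.find(word)
        while start != -1:
            hits.append(start)
            start = seq.find(word, start + 1)
    return hits


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def window_enrichment(promoters, word, window_start, window_end):
    """Test whether `word` starts inside [window_start, window_end) more often
    than expected if occurrences were placed uniformly along the promoter.

    Assumes all promoters have the same length and are aligned at the TSS.
    Returns (hits in window, total hits, binomial tail probability).
    """
    n_slots = len(promoters[0]) - len(word) + 1  # possible start offsets
    hits = word_positions(promoters, word)
    in_window = sum(window_start <= h < window_end for h in hits)
    p_window = (window_end - window_start) / n_slots
    return in_window, len(hits), binomial_tail(in_window, len(hits), p_window)
```

A small tail probability suggests the word is concentrated in the chosen window; scanning several windows would also require a multiple-testing correction, which this sketch omits.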
|
118
|
Tan M, Yu D, Jin Y, Dou L, Li B, Wang Y, Yue J, Liang L. An information transmission model for transcription factor binding at regulatory DNA sites. Theor Biol Med Model 2012; 9:19. [PMID: 22672438 PMCID: PMC3442977 DOI: 10.1186/1742-4682-9-19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/08/2012] [Accepted: 05/17/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Computational identification of transcription factor binding sites (TFBSs) is a rapid, cost-efficient way to locate unknown regulatory elements. With increased potential for high-throughput genome sequencing, the availability of accurate computational methods for TFBS prediction has never been as important as it currently is. To date, identifying TFBSs with high sensitivity and specificity is still an open challenge, necessitating the development of novel models for predicting transcription factor-binding regulatory DNA elements. RESULTS Based on the information theory, we propose a model for transcription factor binding of regulatory DNA sites. Our model incorporates position interdependencies in effective ways. The model computes the information transferred (TI) between the transcription factor and the TFBS during the binding process and uses TI as the criterion to determine whether the sequence motif is a possible TFBS. Based on this model, we developed a computational method to identify TFBSs. By theoretically proving and testing our model using both real and artificial data, we found that our model provides highly accurate predictive results. CONCLUSIONS In this study, we present a novel model for transcription factor binding regulatory DNA sites. The model can provide an increased ability to detect TFBSs.
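As a rough illustration of what "position interdependencies" means in information-theoretic terms, the sketch below computes the mutual information between two columns of aligned binding sites. It is not the TI quantity defined in the paper, whose formula the abstract does not give; the function name and the toy input format are assumptions.

```python
from collections import Counter
from math import log2


def column_mutual_information(sites, i, j):
    """Mutual information in bits between positions i and j of aligned sites.

    `sites` is a list of equal-length strings. A value of 0 means the two
    positions vary independently in this sample; larger values mean the base
    at one position helps predict the base at the other.
    """
    n = len(sites)
    pair_counts = Counter((s[i], s[j]) for s in sites)
    i_counts = Counter(s[i] for s in sites)
    j_counts = Counter(s[j] for s in sites)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((i_counts[a] / n) * (j_counts[b] / n)))
    return mi
```

A position weight matrix treats every column as independent, so a clearly non-zero value between two columns is the kind of signal such a matrix cannot represent. Estimates from small samples are biased upward, so real use would need many sites or a correction.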
Affiliation(s)
- Mingfeng Tan
- Beijing Institute of Biotechnology, Beijing 100071, China
|
119
|
Zhang L, Yu S, Zuo K, Luo L, Tang K. Identification of gene modules associated with drought response in rice by network-based analysis. PLoS One 2012; 7:e33748. [PMID: 22662107 PMCID: PMC3360736 DOI: 10.1371/journal.pone.0033748] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 09/20/2011] [Accepted: 02/17/2012] [Indexed: 12/11/2022] Open
Abstract
Understanding the molecular mechanisms that underlie plant responses to drought stress is challenging due to the complex interplay of numerous different genes. Here, we used network-based gene clustering to uncover the relationships between drought-responsive genes from large microarray datasets. We identified 2,607 rice genes that showed significant changes in gene expression under drought stress; 1,392 genes were highly intercorrelated to form 15 gene modules. These drought-responsive gene modules are biologically plausible, with enrichments for genes in common functional categories, stress response changes, tissue-specific expression and transcription factor binding sites. We observed that a gene module (referred to as module 4) consisting of 134 genes was significantly associated with drought response in both drought-tolerant and drought-sensitive rice varieties. This module is enriched for genes involved in controlling the response of the plant to water and embryonic development, including a heat shock transcription factor as the key regulator in the expression of ABRE-containing genes. These results suggest that module 4 is highly conserved in the ABA-mediated drought response pathway in different rice varieties. Moreover, our study showed that many hub genes clustered in rice chromosomes had significant associations with QTLs for drought stress tolerance. The relationship between hub gene clusters and drought tolerance QTLs may provide a key to understand the genetic basis of drought tolerance in rice.
Affiliation(s)
- Lida Zhang
- Plant Biotechnology Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Shunwu Yu
- Shanghai Agrobiological Gene Center, Shanghai Academy of Agricultural Sciences, Shanghai, China
- Kaijing Zuo
- Plant Biotechnology Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
- Lijun Luo
- Shanghai Agrobiological Gene Center, Shanghai Academy of Agricultural Sciences, Shanghai, China
- Kexuan Tang
- Plant Biotechnology Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai, China
|
120
|
Quantitative modeling of transcriptional regulatory networks by integrating multiple source of knowledge. Bioprocess Biosyst Eng 2012; 35:1555-65. [PMID: 22614332 DOI: 10.1007/s00449-012-0746-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/12/2011] [Accepted: 04/30/2012] [Indexed: 01/26/2023]
Abstract
A key challenge in the post-genome era is to identify genome-wide transcriptional regulatory networks, which specify the interactions between transcription factors and their target genes. In this work, a regulatory model based on binding energy is proposed to quantify the transcriptional regulatory network. Multiple quantities, including binding affinity, regulatory efficiency, and the activity level of the transcription factor (TF), are incorporated into a general learning model. The sequence features of the promoter are exploited to derive the binding energy. Compared with previous models that employ only microarray data, the proposed model can bridge the gap between the relative background frequency of the observed nucleotide and the gene's transcription rate. Experimental results show that the proposed model can effectively identify the parameters and the activity level of the TF. Moreover, the kinetic parameters introduced in the proposed model carry more biological meaning than those of some previous models.
|
121
|
Zambelli F, Pesole G, Pavesi G. Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. Brief Bioinform 2012; 14:225-37. [PMID: 22517426 PMCID: PMC3603212 DOI: 10.1093/bib/bbs016] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Indexed: 02/07/2023] Open
Abstract
Motif discovery has been one of the most widely studied problems in bioinformatics ever since genomic and protein sequences have been available. In particular, its application to the de novo prediction of putative over-represented transcription factor binding sites in nucleotide sequences has been, and still is, one of the most challenging flavors of the problem. Recently, novel experimental techniques like chromatin immunoprecipitation (ChIP) have been introduced, permitting the genome-wide identification of protein-DNA interactions. ChIP, applied to transcription factors and coupled with genome tiling arrays (ChIP on Chip) or next-generation sequencing technologies (ChIP-Seq) has opened new avenues in research, as well as posed new challenges to bioinformaticians developing algorithms and methods for motif discovery.
|
122
|
Hao H, Kim DS, Klocke B, Johnson KR, Cui K, Gotoh N, Zang C, Gregorski J, Gieser L, Peng W, Fann Y, Seifert M, Zhao K, Swaroop A. Transcriptional regulation of rod photoreceptor homeostasis revealed by in vivo NRL targetome analysis. PLoS Genet 2012; 8:e1002649. [PMID: 22511886 PMCID: PMC3325202 DOI: 10.1371/journal.pgen.1002649] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Received: 11/16/2011] [Accepted: 02/23/2012] [Indexed: 11/18/2022] Open
Abstract
A stringent control of homeostasis is critical for functional maintenance and survival of neurons. In the mammalian retina, the basic motif leucine zipper transcription factor NRL determines rod versus cone photoreceptor cell fate and activates the expression of many rod-specific genes. Here, we report an integrated analysis of NRL-centered gene regulatory network by coupling chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) data from Illumina and ABI platforms with global expression profiling and in vivo knockdown studies. We identified approximately 300 direct NRL target genes. Of these, 22 NRL targets are associated with human retinal dystrophies, whereas 95 mapped to regions of as yet uncloned retinal disease loci. In silico analysis of NRL ChIP-Seq peak sequences revealed an enrichment of distinct sets of transcription factor binding sites. Specifically, we discovered that genes involved in photoreceptor function include binding sites for both NRL and homeodomain protein CRX. Evaluation of 26 ChIP-Seq regions validated their enhancer functions in reporter assays. In vivo knockdown of 16 NRL target genes resulted in death or abnormal morphology of rod photoreceptors, suggesting their importance in maintaining retinal function. We also identified histone demethylase Kdm5b as a novel secondary node in NRL transcriptional hierarchy. Exon array analysis of flow-sorted photoreceptors in which Kdm5b was knocked down by shRNA indicated its role in regulating rod-expressed genes. Our studies identify candidate genes for retinal dystrophies, define cis-regulatory module(s) for photoreceptor-expressed genes and provide a framework for decoding transcriptional regulatory networks that dictate rod homeostasis.
Affiliation(s)
- Hong Hao
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Douglas S. Kim
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Kory R. Johnson
- Information Technology and Bioinformatics Program, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, United States of America
- Kairong Cui
- Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Norimoto Gotoh
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Chongzhi Zang
- Department of Physics, The George Washington University, Washington, D.C., United States of America
- Janina Gregorski
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Linn Gieser
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Weiqun Peng
- Department of Physics, The George Washington University, Washington, D.C., United States of America
- Yang Fann
- Information Technology and Bioinformatics Program, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, United States of America
- Keji Zhao
- Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Anand Swaroop
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, National Institutes of Health, Bethesda, Maryland, United States of America
|
123
|
McGuire AM, Weiner B, Park ST, Wapinski I, Raman S, Dolganov G, Peterson M, Riley R, Zucker J, Abeel T, White J, Sisk P, Stolte C, Koehrsen M, Yamamoto RT, Iacobelli-Martinez M, Kidd MJ, Maer AM, Schoolnik GK, Regev A, Galagan J. Comparative analysis of Mycobacterium and related Actinomycetes yields insight into the evolution of Mycobacterium tuberculosis pathogenesis. BMC Genomics 2012; 13:120. [PMID: 22452820 PMCID: PMC3388012 DOI: 10.1186/1471-2164-13-120] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.5] [Received: 09/23/2011] [Accepted: 03/28/2012] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND The sequence of the pathogen Mycobacterium tuberculosis (Mtb) strain H37Rv has been available for over a decade, but the biology of the pathogen remains poorly understood. Genome sequences from other Mtb strains and closely related bacteria present an opportunity to apply the power of comparative genomics to understand the evolution of Mtb pathogenesis. We conducted a comparative analysis using 31 genomes from the Tuberculosis Database (TBDB.org), including 8 strains of Mtb and M. bovis, 11 additional Mycobacteria, 4 Corynebacteria, 2 Streptomyces, Rhodococcus jostii RHA1, Nocardia farcinica, Acidothermus cellulolyticus, Rhodobacter sphaeroides, Propionibacterium acnes, and Bifidobacterium longum. RESULTS Our results highlight the functional importance of lipid metabolism and its regulation, and reveal variation between the evolutionary profiles of genes implicated in saturated and unsaturated fatty acid metabolism. They also suggest that DNA repair and molybdopterin cofactors are important in pathogenic Mycobacteria. By analyzing sequence conservation and gene expression data, we identify nearly 400 conserved noncoding regions. These include 37 predicted promoter regulatory motifs, of which 14 correspond to previously validated motifs, as well as 50 potential noncoding RNAs, of which we experimentally confirm the expression of four. CONCLUSIONS Our analysis of protein evolution highlights gene families that are associated with the adaptation of environmental Mycobacteria to obligate pathogenesis. These families include fatty acid metabolism, DNA repair, and molybdopterin biosynthesis. Our analysis reinforces recent findings suggesting that small noncoding RNAs are more common in Mycobacteria than previously expected. Our data provide a foundation for understanding the genome and biology of Mtb in a comparative context, and are available online and through TBDB.org.
|
124
|
Clustering of DNA words and biological function: A proof of principle. J Theor Biol 2012; 297:127-36. [DOI: 10.1016/j.jtbi.2011.12.024] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 10/17/2011] [Revised: 12/20/2011] [Accepted: 12/21/2011] [Indexed: 02/08/2023]
|
125
|
Orzechowski Westholm J, Tronnersjö S, Nordberg N, Olsson I, Komorowski J, Ronne H. Gis1 and Rph1 regulate glycerol and acetate metabolism in glucose depleted yeast cells. PLoS One 2012; 7:e31577. [PMID: 22363679 PMCID: PMC3283669 DOI: 10.1371/journal.pone.0031577] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 07/09/2011] [Accepted: 01/09/2012] [Indexed: 01/10/2023] Open
Abstract
Aging in organisms as diverse as yeast, nematodes, and mammals is delayed by caloric restriction, an effect mediated by the nutrient sensing TOR, RAS/cAMP, and AKT/Sch9 pathways. The transcription factor Gis1 functions downstream of these pathways in extending the lifespan of nutrient restricted yeast cells, but the mechanisms involved are still poorly understood. We have used gene expression microarrays to study the targets of Gis1 and the related protein Rph1 in different growth phases. Our results show that Gis1 and Rph1 act both as repressors and activators, on overlapping sets of genes as well as on distinct targets. Interestingly, both the activities and the target specificities of Gis1 and Rph1 depend on the growth phase. Thus, both proteins are associated with repression during exponential growth, targeting genes with STRE or PDS motifs in their promoters. After the diauxic shift, both become involved in activation, with Gis1 acting primarily on genes with PDS motifs, and Rph1 on genes with STRE motifs. Significantly, Gis1 and Rph1 control a number of genes involved in acetate and glycerol formation, metabolites that have been implicated in aging. Furthermore, several genes involved in acetyl-CoA metabolism are downregulated by Gis1.
Affiliation(s)
- Jakub Orzechowski Westholm
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
- Susanna Tronnersjö
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Plant Biology and Forest Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Niklas Nordberg
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Department of Microbiology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Ida Olsson
- Department of Microbiology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Jan Komorowski
- Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
- Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
- Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Hans Ronne
- Department of Microbiology, Swedish University of Agricultural Sciences, Uppsala, Sweden
|
126
|
Aittokallio T, Kurki M, Nevalainen O, Nikula T, West A, Lahesmaa R. Computational Strategies for Analyzing Data in Gene Expression Microarray Experiments. J Bioinform Comput Biol 2012; 1:541-86. [PMID: 15290769 DOI: 10.1142/s0219720003000319] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 02/13/2003] [Revised: 07/02/2003] [Indexed: 11/18/2022]
Abstract
Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of approaches are available for computational analysis, but no general consensus exists on a standard protocol for microarray data analysis. Consequently, the choice of data analysis technique is a crucial element that depends both on the data and on the goals of the experiment. Therefore, a basic understanding of bioinformatics is required for optimal experimental design and meaningful interpretation of the results. This review summarizes some of the common themes in DNA microarray data analysis, including data normalization and detection of differential expression. Algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies, along with their relative merits, are overviewed, and potential areas for additional research are discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to the implementation of more efficient computational protocols for the given data obtained through microarray experiments.
Affiliation(s)
- Tero Aittokallio
- Department of Computational Biology, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-Shi, Chiba 277-8562, Japan.
|
127
|
Abstract
The ability to chronicle transcription-factor binding events throughout the development of an organism would facilitate mapping of transcriptional networks that control cell-fate decisions. We describe a method for permanently recording protein-DNA interactions in mammalian cells. We endow transcription factors with the ability to deposit a transposon into the genome near to where they bind. The transposon becomes a "calling card" that the transcription factor leaves behind to record its visit to the genome. The locations of the calling cards can be determined by massively parallel DNA sequencing. We show that the transcription factor SP1 fused to the piggyBac transposase directs insertion of the piggyBac transposon near SP1 binding sites. The locations of transposon insertions are highly reproducible and agree with sites of SP1-binding determined by ChIP-seq. Genes bound by SP1 are more likely to be expressed in the HCT116 cell line we used, and SP1-bound CpG islands show a strong preference to be unmethylated. This method has the potential to trace transcription-factor binding throughout cellular and organismal development in a way that has heretofore not been possible.
|
128
|
Finding Transcription Factor Binding Motifs for Coregulated Genes by Combining Sequence Overrepresentation with Cross-Species Conservation. Journal of Probability and Statistics 2012. [DOI: 10.1155/2012/830575] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022] Open
Abstract
Novel computational methods for finding transcription factor binding motifs have long been sought due to the tedious work of experimentally identifying them. However, the current prevailing methods yield a large number of false positive predictions due to the short, variable nature of transcriptional factor binding sites (TFBSs). We proposed here a method that combines sequence overrepresentation and cross-species sequence conservation to detect TFBSs in upstream regions of a given set of coregulated genes. We applied the method to 35 S. cerevisiae transcriptional factors with known DNA binding motifs (with the support of orthologous sequences from genomes of S. mikatae, S. bayanus, and S. paradoxus), and the proposed method outperformed the single-genome-based motif finding methods MEME and AlignACE as well as the multiple-genome-based methods PHYME and Footprinter for the majority of these transcriptional factors. Compared with the prevailing motif finding software, our method has some advantages in finding transcriptional factor binding motifs for potential coregulated genes if the gene upstream sequences of multiple closely related species are available. Although we used yeast genomes to assess our method in this study, it might also be applied to other organisms if suitable related species are available and the upstream sequences of coregulated genes can be obtained for the multiple closely related species.
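The general idea of combining overrepresentation with cross-species conservation can be sketched as follows. This is an illustrative toy, not the scoring scheme of the paper: it treats a k-mer as conserved only if it is identical at the same offset in every aligned ortholog, and it ranks k-mers by a pseudocount-smoothed fold enrichment against a background gene set. All function names and the input layout (one list of pre-aligned upstream sequences per gene, reference species first) are assumptions.

```python
from collections import Counter


def conserved_kmers(aligned_orthologs, k):
    """Set of k-mers in the reference (first) sequence that are identical at the
    same offset in every orthologous sequence. Windows containing gaps are skipped."""
    ref = aligned_orthologs[0]
    found = set()
    for pos in range(len(ref) - k + 1):
        kmer = ref[pos:pos + k]
        if "-" in kmer:
            continue
        if all(other[pos:pos + k] == kmer for other in aligned_orthologs[1:]):
            found.add(kmer)
    return found


def rank_motif_candidates(target_genes, background_genes, k, pseudocount=1.0):
    """Rank k-mers by how much more often they occur as conserved words in the
    coregulated (target) genes than in the background genes."""
    target = Counter(km for gene in target_genes for km in conserved_kmers(gene, k))
    background = Counter(km for gene in background_genes for km in conserved_kmers(gene, k))
    scores = {}
    for kmer, n_target in target.items():
        t_frac = (n_target + pseudocount) / (len(target_genes) + pseudocount)
        b_frac = (background[kmer] + pseudocount) / (len(background_genes) + pseudocount)
        scores[kmer] = t_frac / b_frac
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Requiring exact identity across all species is a deliberately crude conservation filter; a realistic version would tolerate degenerate positions and weight species by evolutionary distance.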
|
129
|
Characterization of complex regulatory networks and identification of promoter regulatory elements in yeast: "in silico" and "wet-lab" approaches. Methods Mol Biol 2012; 809:27-48. [PMID: 22113266 DOI: 10.1007/978-1-61779-376-9_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/28/2022]
Abstract
Transcription is the first step in the flow of biological information from genome to proteome and its tight regulation is a crucial checkpoint in most biological processes occurring in all living organisms. In eukaryotes, one of the most important mechanisms of transcriptional regulation relies on the activity of transcription factors which, upon binding to specific nucleotide motifs (consensus) present in the promoter region of target genes, modulate the activity of RNA polymerase II activating and/or repressing gene transcription. The identification of binding sites for these transcription factors is crucial to the understanding of transcriptional regulation at the molecular level and to the prediction of putative target genes for each transcription factor. However, transcription regulation cannot simply be reduced to transcription factor-gene associations. Frequently, the transcript level of a given gene is determined by a multitude of activators and/or repressors resulting in intertwined and complex regulatory networks. Two case studies dedicated to the study of transcriptional regulation in the experimental model Saccharomyces cerevisiae are presented in this chapter. The computational tools available in YEASTRACT information system are explored in both studies, to identify the regulatory elements that serve as functional DNA-binding sites for a transcription factor (Rim101p), and to characterize the regulatory network underlying the transcriptional regulation of a given yeast gene (FLR1). A set of easily accessible experimental approaches that can be used to confirm the predictions of the bioinformatic analysis is also detailed.
|
130
|
Technau M, Knispel M, Roth S. Molecular mechanisms of EGF signaling-dependent regulation of pipe, a gene crucial for dorsoventral axis formation in Drosophila. Dev Genes Evol 2011; 222:1-17. [PMID: 22198544 PMCID: PMC3291829 DOI: 10.1007/s00427-011-0384-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/08/2011] [Accepted: 11/29/2011] [Indexed: 01/28/2023]
Abstract
During Drosophila oogenesis the expression of the sulfotransferase Pipe in ventral follicle cells is crucial for dorsoventral axis formation. Pipe modifies proteins that are incorporated in the ventral eggshell and activate Toll signaling which in turn initiates embryonic dorsoventral patterning. Ventral pipe expression is the result of an oocyte-derived EGF signal which down-regulates pipe in dorsal follicle cells. The analysis of mutant follicle cell clones reveals that none of the transcription factors known to act downstream of EGF signaling in Drosophila is required or sufficient for pipe regulation. However, the pipe cis-regulatory region harbors a 31-bp element which is essential for pipe repression, and ovarian extracts contain a protein that binds this element. Thus, EGF signaling does not act by down-regulating an activator of pipe as previously suggested but rather by activating a repressor. Surprisingly, this repressor acts independently of the common co-repressors Groucho or CtBP.
Affiliation(s)
- Martin Technau
- Institute for Developmental Biology, Biocenter, University of Cologne, Zuelpicher Straße 47b, 50674, Cologne, Germany
|
131
|
Gordân R, Murphy KF, McCord RP, Zhu C, Vedenko A, Bulyk ML. Curated collection of yeast transcription factor DNA binding specificity data reveals novel structural and gene regulatory insights. Genome Biol 2011; 12:R125. [PMID: 22189060 PMCID: PMC3334620 DOI: 10.1186/gb-2011-12-12-r125] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Received: 10/28/2011] [Revised: 12/09/2011] [Accepted: 12/21/2011] [Indexed: 11/24/2022] Open
Abstract
Background Transcription factors (TFs) play a central role in regulating gene expression by interacting with cis-regulatory DNA elements associated with their target genes. Recent surveys have examined the DNA binding specificities of most Saccharomyces cerevisiae TFs, but a comprehensive evaluation of their data has been lacking. Results We analyzed in vitro and in vivo TF-DNA binding data reported in previous large-scale studies to generate a comprehensive, curated resource of DNA binding specificity data for all characterized S. cerevisiae TFs. Our collection comprises DNA binding site motifs and comprehensive in vitro DNA binding specificity data for all possible 8-bp sequences. Investigation of the DNA binding specificities within the basic leucine zipper (bZIP) and VHT1 regulator (VHR) TF families revealed unexpected plasticity in TF-DNA recognition: intriguingly, the VHR TFs, newly characterized by protein binding microarrays in this study, recognize bZIP-like DNA motifs, while the bZIP TF Hac1 recognizes a motif highly similar to the canonical E-box motif of basic helix-loop-helix (bHLH) TFs. We identified several TFs with distinct primary and secondary motifs, which might be associated with different regulatory functions. Finally, integrated analysis of in vivo TF binding data with protein binding microarray data lends further support for indirect DNA binding in vivo by sequence-specific TFs. Conclusions The comprehensive data in this curated collection allow for more accurate analyses of regulatory TF-DNA interactions, in-depth structural studies of TF-DNA specificity determinants, and future experimental investigations of the TFs' predicted target genes and regulatory roles.
Affiliation(s)
- Raluca Gordân
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
|
132
|
Harris EY, Ponts N, Le Roch KG, Lonardi S. Chromatin-driven de novo discovery of DNA binding motifs in the human malaria parasite. BMC Genomics 2011; 12:601. [PMID: 22165844 PMCID: PMC3282892 DOI: 10.1186/1471-2164-12-601] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 06/30/2011] [Accepted: 12/13/2011] [Indexed: 11/10/2022] Open
Abstract
Background Despite extensive efforts to discover transcription factors and their binding sites in the human malaria parasite Plasmodium falciparum, only a few transcription factor binding motifs have been experimentally validated to date. As a consequence, gene regulation in P. falciparum is still poorly understood. There is now evidence that the chromatin architecture plays an important role in transcriptional control in malaria. Results We propose a methodology for discovering cis-regulatory elements that uses for the first time exclusively dynamic chromatin remodeling data. Our method employs nucleosome positioning data collected at seven time points during the erythrocytic cycle of P. falciparum to discover putative DNA binding motifs and their transcription factor binding sites along with their associated clusters of target genes. Our approach results in 129 putative binding motifs within the promoter region of known genes. About 75% of those are novel, the remaining being highly similar to experimentally validated binding motifs. About half of the binding motifs reported show statistically significant enrichment in functional gene sets and strong positional bias in the promoter region. Conclusion Experimental results establish the principle that dynamic chromatin remodeling data can be used in lieu of gene expression data to discover binding motifs and their transcription factor binding sites. Our approach can be applied using only dynamic nucleosome positioning data, independent from any knowledge of gene function or expression.
Affiliation(s)
- Elena Y Harris
- Department of Cell Biology and Neuroscience, University of California, Riverside, CA 92521, USA
|
133
|
Sequence-based classification using discriminatory motif feature selection. PLoS One 2011; 6:e27382. [PMID: 22102890 PMCID: PMC3213122 DOI: 10.1371/journal.pone.0027382] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/06/2011] [Accepted: 10/16/2011] [Indexed: 11/19/2022] Open
Abstract
Most existing methods for sequence-based classification use exhaustive feature generation, employing, for example, all k-mer patterns. The motivation behind such (enumerative) approaches is to minimize the potential for overlooking important features. However, there are shortcomings to this strategy. First, practical constraints limit the scope of exhaustive feature generation to patterns of length ≤ k, such that potentially important, longer (> k) predictors are not considered. Second, features so generated exhibit strong dependencies, which can complicate understanding of derived classification rules. Third, and most importantly, numerous irrelevant features are created. These concerns can compromise prediction and interpretation. While remedies have been proposed, they tend to be problem-specific and not broadly applicable. Here, we develop a generally applicable methodology, and an attendant software pipeline, that is predicated on discriminatory motif finding. In addition to the traditional training and validation partitions, our framework entails a third level of data partitioning, a discovery partition. A discriminatory motif finder is used on sequences and associated class labels in the discovery partition to yield a (small) set of features. These features are then used as inputs to a classifier in the training partition. Finally, performance assessment occurs on the validation partition. Important attributes of our approach are its modularity (any discriminatory motif finder and any classifier can be deployed) and its universality (all data, including sequences that are unaligned and/or of unequal length, can be accommodated). We illustrate our approach on two nucleosome occupancy datasets and a protein solubility dataset, previously analyzed using enumerative feature generation. Our method achieves excellent performance results, with and without optimization of classifier tuning parameters.
A Python pipeline implementing the approach is available at http://www.epibiostat.ucsf.edu/biostat/sen/dmfs/.
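The three-level partitioning described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the default split fractions, and the use of plain substring matching as the feature encoding are all assumptions made for illustration, and a real discriminatory motif finder would replace the hand-supplied motif list.

```python
import random


def three_way_split(items, fractions=(0.3, 0.4, 0.3), seed=0):
    """Shuffle items and split them into discovery, training, and validation partitions.

    The fractions are hypothetical defaults, not values taken from the paper.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_disc = int(n * fractions[0])
    n_train = int(n * fractions[1])
    discovery = shuffled[:n_disc]
    training = shuffled[n_disc:n_disc + n_train]
    # Whatever remains after the first two partitions goes to validation.
    validation = shuffled[n_disc + n_train:]
    return discovery, training, validation


def motif_features(sequence, motifs):
    """Encode a sequence as 0/1 indicators of whether each motif occurs in it.

    Exact substring matching is a stand-in for scoring with discovered motifs.
    """
    return [1 if motif in sequence else 0 for motif in motifs]
```

In this arrangement motifs would be learned only from the discovery partition, the classifier fitted on `motif_features` of the training partition, and performance reported on the validation partition, so that no sequence influences both feature selection and assessment.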
|
134
|
Mullenbrock S, Shah J, Cooper GM. Global expression analysis identified a preferentially nerve growth factor-induced transcriptional program regulated by sustained mitogen-activated protein kinase/extracellular signal-regulated kinase (ERK) and AP-1 protein activation during PC12 cell differentiation. J Biol Chem 2011; 286:45131-45. [PMID: 22065583 DOI: 10.1074/jbc.m111.274076] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 01/07/2023] Open
Abstract
Neuronal differentiation of PC12 cells in response to NGF is a prototypical model in which signal duration determines a biological response. Sustained ERK activity induced by NGF, as compared with transient activity induced by EGF, is critical to the differentiation of these cells. To characterize the transcriptional program activated preferentially by NGF, we compared global gene expression profiles between cells treated with NGF and EGF for 2-4 h, when sustained ERK signaling in response to NGF is most distinct from the transient signal elicited by EGF. This analysis identified 69 genes that were preferentially up-regulated in response to NGF. As expected, up-regulation of these genes was mediated by sustained ERK signaling. In addition, they were up-regulated in response to other neuritogenic treatments (pituitary adenylate cyclase-activating polypeptide and 12-O-tetradecanoylphorbol-13-acetate plus dbcAMP) and were enriched for genes related to neuronal differentiation/function. Computational analysis and chromatin immunoprecipitation identified binding of CREB and AP-1 family members (Fos, FosB, Fra1, JunB, JunD) upstream of >30 and 50%, respectively, of the preferentially NGF-induced genes. Expression of several AP-1 family members was induced by both EGF and NGF, but their induction was more robust and sustained in response to NGF. The binding of Fos family members to their target genes was similarly sustained in response to NGF and was reduced upon MEK inhibition, suggesting that AP-1 contributes significantly to the NGF transcriptional program. Interestingly, Fra1 as well as two other NGF-induced AP-1 targets (HB-EGF and miR-21) function in positive feedback loops that may contribute to sustained AP-1 activity.
Affiliation(s)
- Steven Mullenbrock
- Department of Biology, Boston University, Boston, Massachusetts 02215, USA
|
135
|
Guan Y, Yao V, Tsui K, Gebbia M, Dunham MJ, Nislow C, Troyanskaya OG. Nucleosome-coupled expression differences in closely-related species. BMC Genomics 2011; 12:466. [PMID: 21942931 PMCID: PMC3209474 DOI: 10.1186/1471-2164-12-466] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/08/2010] [Accepted: 09/26/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Genome-wide nucleosome occupancy is negatively related to the average level of transcription factor motif binding based on studies in yeast and several other model organisms. The degree to which nucleosome-motif interactions relate to phenotypic changes across species is, however, unknown. RESULTS We address this challenge by generating nucleosome positioning and cell cycle expression data for Saccharomyces bayanus and show that differences in nucleosome occupancy reflect cell cycle expression divergence between two yeast species, S. bayanus and S. cerevisiae. Specifically, genes with nucleosome-depleted MBP1 motifs upstream of their coding sequence show periodic expression during the cell cycle, whereas genes with nucleosome-shielded motifs do not. In addition, conserved cell cycle regulatory motifs across these two species are more nucleosome-depleted compared to those that are not conserved, suggesting that the degree of conservation of regulatory sites varies, and is reflected by nucleosome occupancy patterns. Finally, many changes in cell cycle gene expression patterns across species can be correlated to changes in nucleosome occupancy on motifs (rather than to the presence or absence of motifs). CONCLUSIONS Our observations suggest that alteration of nucleosome occupancy is a previously uncharacterized feature related to the divergence of cell cycle expression between species.
Affiliation(s)
- Yuanfang Guan
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
|
136
|
Jajamovich GH, Wang X, Arkin AP, Samoilov MS. Bayesian multiple-instance motif discovery with BAMBI: inference of recombinase and transcription factor binding sites. Nucleic Acids Res 2011; 39:e146. [PMID: 21948794 PMCID: PMC3241671 DOI: 10.1093/nar/gkr745] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022] Open
Abstract
Finding conserved motifs in genomic sequences represents one of the essential bioinformatic problems. However, achieving high discovery performance without imposing substantial auxiliary constraints on possible motif features remains a key algorithmic challenge. This work describes BAMBI, a sequential Monte Carlo motif-identification algorithm, which is based on a position weight matrix model that does not require additional constraints and is able to estimate such motif properties as length, logo, number of instances and their locations solely on the basis of primary nucleotide sequence data. Furthermore, should biologically meaningful information about motif attributes be available, BAMBI takes advantage of this knowledge to further refine the discovery results. In practical applications, we show that the proposed approach can be used to find sites of such diverse DNA-binding molecules as the cAMP receptor protein (CRP) and Din-family site-specific serine recombinases. Results obtained by BAMBI in these and other settings demonstrate better statistical performance than any of the four widely-used profile-based motif discovery methods: MEME, BioProspector with BioOptimizer, SeSiMCMC and Motif Sampler as measured by the nucleotide-level correlation coefficient. Additionally, in the case of Din-family recombinase target site discovery, the BAMBI-inferred motif is found to be the only one functionally accurate from the underlying biochemical mechanism standpoint. C++ and Matlab code is available at http://www.ee.columbia.edu/~guido/BAMBI or http://genomics.lbl.gov/BAMBI/.
Affiliation(s)
- Guido H Jajamovich
- Electrical Engineering Department, Columbia University, New York, NY 10027, USA
|
137
|
Shi J, Yang W, Chen M, Du Y, Zhang J, Wang K. AMD, an automated motif discovery tool using stepwise refinement of gapped consensuses. PLoS One 2011; 6:e24576. [PMID: 21931761 PMCID: PMC3171486 DOI: 10.1371/journal.pone.0024576] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 04/04/2011] [Accepted: 08/14/2011] [Indexed: 11/21/2022] Open
Abstract
Motif discovery is essential for deciphering regulatory codes from high throughput genomic data, such as those from ChIP-chip/seq experiments. However, there remains a lack of effective and efficient methods for the identification of long and gapped motifs in many relevant tools reported to date. We describe here an automated tool that allows for de novo discovery of transcription factor binding sites, regardless of whether the motifs are long or short, gapped or contiguous.
Affiliation(s)
- Jiantao Shi
- Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Graduate School of the Chinese Academy of Sciences, Shanghai, China
- Wentao Yang
- Shanghai Institute of Hematology and Sino-French Center for Life Science and Genomics, Rui-Jin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Mingjie Chen
- Shanghai Institute of Hematology and Sino-French Center for Life Science and Genomics, Rui-Jin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yanzhi Du
- Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Ji Zhang
- Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shanghai Institute of Hematology and Sino-French Center for Life Science and Genomics, Rui-Jin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Kankan Wang
- Shanghai Institute of Hematology and Sino-French Center for Life Science and Genomics, Rui-Jin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
|
138
|
Abstract
Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers.
|
139
|
Doyle CE, Donaldson ME, Morrison EN, Saville BJ. Ustilago maydis transcript features identified through full-length cDNA analysis. Mol Genet Genomics 2011; 286:143-59. [PMID: 21750919 DOI: 10.1007/s00438-011-0634-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 02/24/2011] [Accepted: 06/28/2011] [Indexed: 12/13/2022]
Abstract
Ustilago maydis is the model for investigating basidiomycete biotrophic plant pathogens. To further the annotation of its genome, 12,943 full-length cDNA sequences were used to construct databases for the promoter and untranslated regions of U. maydis genes. A subset of clones was sequenced to determine full cDNA sequences. These and the original ESTs were assembled into contigs representing 3,058, or 45%, of the predicted U. maydis genes. The new sequencing allowed the confirmation of 2,842 gene models, 690 of which contain an intron. The use of full-length cDNA clone sequences ensured that untranslated regions were physically linked to the open reading frames (ORFs), not merely aligned upstream of the start of transcription. Identified sequence features include: (1) over 500 potential short upstream ORFs, (2) 95 gene models that require further annotation, (3) one new potential ORF, (4) varying GC content in different gene regions, (5) a WebLogo motif for the start of translation, (6) the correlation of UTR length with transcript representation in cDNA libraries and with gene function categories, (7) a relationship between natural antisense transcripts and UTR length that differs from that of Saccharomyces cerevisiae, (8) a potential relationship between DNA replication and the control of transcription, and (9) new insights regarding mechanisms for the control of transcription and mRNA maturation in U. maydis.
Affiliation(s)
- Colleen E Doyle
- Environmental and Life Sciences Graduate Program, Trent University, Peterborough, ON K9J 7B8, Canada
|
140
|
Giannopoulou EG, Elemento O. An integrated ChIP-seq analysis platform with customizable workflows. BMC Bioinformatics 2011; 12:277. [PMID: 21736739 PMCID: PMC3145611 DOI: 10.1186/1471-2105-12-277] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 02/21/2011] [Accepted: 07/07/2011] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) enables unbiased and genome-wide mapping of protein-DNA interactions and epigenetic marks. The first step in ChIP-seq data analysis involves the identification of peaks (i.e., genomic locations with high density of mapped sequence reads). The next step consists of interpreting the biological meaning of the peaks through their association with known genes, pathways, regulatory elements, and integration with other experiments. Although several programs have been published for the analysis of ChIP-seq data, they often focus on the peak detection step and are usually not well suited for thorough, integrative analysis of the detected peaks. RESULTS To address the peak interpretation challenge, we have developed ChIPseeqer, an integrative, comprehensive, fast and user-friendly computational framework for in-depth analysis of ChIP-seq datasets. The novelty of our approach is the capability to combine several computational tools in order to create easily customized workflows that can be adapted to the user's needs and objectives. In this paper, we describe the main components of the ChIPseeqer framework, and also demonstrate the utility and diversity of the analyses offered, by analyzing a published ChIP-seq dataset. CONCLUSIONS ChIPseeqer facilitates ChIP-seq data analysis by offering a flexible and powerful set of computational tools that can be used in combination with one another. The framework is freely available as a user-friendly GUI application, but all programs are also executable from the command line, thus providing flexibility and automatability for advanced users.
Affiliation(s)
- Eugenia G Giannopoulou
- HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, 1305 York Avenue, New York, NY 10021, USA
|
141
|
Sch9 regulates ribosome biogenesis via Stb3, Dot6 and Tod6 and the histone deacetylase complex RPD3L. EMBO J 2011; 30:3052-64. [PMID: 21730963 PMCID: PMC3160192 DOI: 10.1038/emboj.2011.221] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.9] [Received: 01/03/2011] [Accepted: 06/08/2011] [Indexed: 01/22/2023] Open
Abstract
TORC1 is a conserved multisubunit kinase complex that regulates many aspects of eukaryotic growth including the biosynthesis of ribosomes. The TOR protein kinase resident in TORC1 is responsive to environmental cues and is potently inhibited by the natural product rapamycin. Recent characterization of the rapamycin-sensitive phosphoproteome in yeast has yielded insights into how TORC1 regulates growth. Here, we show that Sch9, an AGC family kinase and direct substrate of TORC1, promotes ribosome biogenesis (Ribi) and ribosomal protein (RP) gene expression via direct inhibitory phosphorylation of the transcriptional repressors Stb3, Dot6 and Tod6. Deletion of STB3, DOT6 and TOD6 partially bypasses the growth and cell size defects of an sch9 strain and reveals interdependent regulation of both Ribi and RP gene expression, and other aspects of Ribi. Dephosphorylation of Stb3, Dot6 and Tod6 enables recruitment of the RPD3L histone deacetylase complex to repress Ribi/RP gene promoters. Taken together with previous studies, these results suggest that Sch9 is a master regulator of ribosome biogenesis through the control of Ribi, RP, ribosomal RNA and tRNA gene transcription.
|
142
|
Ruiz-Medrano R, Xoconostle-Cázares B, Ham BK, Li G, Lucas WJ. Vascular expression in Arabidopsis is predicted by the frequency of CT/GA-rich repeats in gene promoters. Plant J 2011; 67:130-44. [PMID: 21435051 DOI: 10.1111/j.1365-313x.2011.04581.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/08/2023]
Abstract
Phloem-transported signals play an important role in regulating plant development and in orchestrating responses to environmental stimuli. Among such signals, phloem-mobile RNAs have been shown to play an important role as long-distance signaling agents. At maturity, angiosperm sieve elements are enucleate, and thus transcripts in the phloem translocation stream probably originate from the nucleate companion cells. In the present study, a pumpkin (Cucurbita maxima) phloem transcriptome was used to test for the presence of common motifs within the promoters of this unique set of genes, which may function to coordinate expression in cells of the vascular system. A bioinformatics analysis of the upstream sequences from 150 Arabidopsis genes homologous to members of the pumpkin phloem transcriptome identified degenerate sequences containing CT/GA- and GT/CA-rich motifs that were common to many of these promoters. Parallel studies performed on genes shown previously to be expressed in phloem tissues identified similar motifs. An expanded analysis, based on homologs of the pumpkin phloem transcriptome from cucumber (Cucumis sativus), identified similar sets of common motifs within the promoters of these genes. Promoter analysis offered support for the hypothesis that these motifs regulate expression within the vascular system. Our findings are discussed in terms of a role for these motifs in coordinating gene expression within the companion cell/sieve element system. These motifs could provide a useful bioinformatics tool for genome-wide screens on plants for which phloem tissues cannot readily be obtained.
Affiliation(s)
- Roberto Ruiz-Medrano
- Department of Biotechnology and Bioengineering, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Avenida IPN 2508, Zacatenco, 07360 Mexico DF, Mexico
|
143
|
Zhang S, Li S, Niu M, Pham PT, Su Z. MotifClick: prediction of cis-regulatory binding sites via merging cliques. BMC Bioinformatics 2011; 12:238. [PMID: 21679436 PMCID: PMC3225181 DOI: 10.1186/1471-2105-12-238] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/23/2010] [Accepted: 06/16/2011] [Indexed: 11/21/2022] Open
Abstract
Background Although dozens of algorithms and tools have been developed to find a set of cis-regulatory binding sites called a motif in a set of intergenic sequences using various approaches, most of these tools focus on identifying binding sites that are significantly different from their background sequences. However, some motifs may have a similar nucleotide distribution to that of their background sequences. Therefore, such binding sites can be missed by these tools. Results Here, we present a graph-based polynomial-time algorithm, MotifClick, for the prediction of cis-regulatory binding sites, in particular, those that have a similar nucleotide distribution to that of their background sequences. To find binding sites with length k, we construct a graph using some 2(k-1)-mers in the input sequences as the vertices, and connect two vertices by an edge if the maximum number of matches of the local gapless alignments between the two 2(k-1)-mers is greater than a cutoff value. We identify a motif as a set of similar k-mers from a merged group of maximum cliques associated with some vertices. Conclusions When evaluated on both synthetic and real datasets of prokaryotes and eukaryotes, MotifClick outperforms existing leading motif-finding tools for prediction accuracy and balancing the prediction sensitivity and specificity in general. In particular, when the distribution of nucleotides of binding sites is similar to that of their background sequences, MotifClick is more likely to identify the binding sites than the other tools.
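The graph construction step described in this abstract can be sketched roughly as follows. This is an illustrative reading, not the MotifClick implementation: the function names are hypothetical, every 2(k-1)-mer is used as a vertex (the abstract says only "some"), and the similarity score simply counts identical positions at the best gapless offset, which may differ from the scoring the tool actually uses. Clique finding and merging are omitted.

```python
from itertools import combinations


def max_gapless_matches(a, b):
    """Largest number of identical positions over all gapless offsets of b against a."""
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, ch in enumerate(b)
            if 0 <= i + shift < len(a) and a[i + shift] == ch
        )
        best = max(best, matches)
    return best


def build_similarity_graph(sequences, k, cutoff):
    """Return an adjacency map over the 2(k-1)-mers found in the input sequences.

    Two vertices are joined when their best gapless match count exceeds cutoff.
    """
    width = 2 * (k - 1)
    vertices = sorted(
        {s[i:i + width] for s in sequences for i in range(len(s) - width + 1)}
    )
    edges = {v: set() for v in vertices}
    for u, v in combinations(vertices, 2):
        if max_gapless_matches(u, v) > cutoff:
            edges[u].add(v)
            edges[v].add(u)
    return edges
```

Comparing all vertex pairs is quadratic in the number of distinct 2(k-1)-mers, which is acceptable for a toy input; a practical tool would need to restrict the vertex set or index the comparisons.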
Affiliation(s)
- Shaoqiang Zhang
- Department of Bioinformatics and Genomics, Center for Bioinformatics Research, the University of North Carolina at Charlotte, 28223, USA
|
144
|
LEAFY target genes reveal floral regulatory logic, cis motifs, and a link to biotic stimulus response. Dev Cell 2011; 20:430-43. [PMID: 21497757 DOI: 10.1016/j.devcel.2011.03.019] [Citation(s) in RCA: 182] [Impact Index Per Article: 13.0] [Received: 10/08/2010] [Revised: 03/05/2011] [Accepted: 03/29/2011] [Indexed: 11/20/2022]
Abstract
The transition from vegetative growth to flower formation is critical for the survival of flowering plants. The plant-specific transcription factor LEAFY (LFY) has central, evolutionarily conserved roles in this process, both in the formation of the first flower and later in floral patterning. We performed genome-wide binding and expression studies to elucidate the molecular mechanisms by which LFY executes these roles. Our study reveals that LFY directs an elaborate regulatory network in control of floral homeotic gene expression. LFY also controls the expression of genes that regulate the response to external stimuli in Arabidopsis. Thus, our findings support a key role for LFY in the coordination of reproductive stage development and disease response programs in plants that may ensure optimal allocation of plant resources for reproductive fitness. Finally, motif analyses reveal a possible mechanism for stage-specific LFY recruitment and suggest a role for LFY in overcoming polycomb repression.
|
145
|
Wang J, Wang Y, Wang Z, Liu L, Zhu XG, Ma X. Synchronization of cytoplasmic and transferred mitochondrial ribosomal protein gene expression in land plants is linked to Telo-box motif enrichment. BMC Evol Biol 2011; 11:161. [PMID: 21668973 PMCID: PMC3212954 DOI: 10.1186/1471-2148-11-161] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/12/2010] [Accepted: 06/13/2011] [Indexed: 02/08/2023] Open
Abstract
Background Chloroplasts and mitochondria evolved from the endosymbionts of once free-living eubacteria, and they transferred most of their genes to the host nuclear genome during evolution. The mechanisms used by plants to coordinate the expression of such transferred genes, as well as other genes in the host nuclear genome, are still poorly understood. Results In this paper, we use nuclear-encoded chloroplast (cpRPGs), as well as mitochondrial (mtRPGs) and cytoplasmic (euRPGs) ribosomal protein genes to study the coordination of gene expression between organelles and the host. Results show that the mtRPGs, but not the cpRPGs, exhibit strongly synchronized expression with euRPGs in all investigated land plants and that this phenomenon is linked to the presence of a telo-box DNA motif in the promoter regions of mtRPGs and euRPGs. This motif is also enriched in the promoter regions of genes involved in DNA replication. Sequence analysis further indicates that mtRPGs, in contrast to cpRPGs, acquired telo-box from the host nuclear genome. Conclusions Based on our results, we propose a model of plant nuclear genome evolution where coordination of activities in mitochondria and chloroplast and other cellular functions, including cell cycle, might have served as a strong selection pressure for the differential acquisition of telo-box between mtRPGs and cpRPGs. This research also highlights the significance of physiological needs in shaping transcriptional regulatory evolution.
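A motif-enrichment claim of this kind is commonly checked with a hypergeometric test. The sketch below is illustrative only and is not the authors' procedure: the telo-box consensus `AAACCCTA` is the commonly cited form and is an assumption here, since the abstract does not state the sequence, and the function names are hypothetical.

```python
from math import comb

TELO_BOX = "AAACCCTA"  # assumption: commonly cited consensus, not given in the abstract
_COMP = str.maketrans("ACGT", "TGCA")

def has_motif(promoter, motif=TELO_BOX):
    """True if the motif occurs on either strand of the promoter sequence."""
    s = promoter.upper()
    reverse_complement = motif.translate(_COMP)[::-1]
    return motif in s or reverse_complement in s

def enrichment_p(k, n, K, N):
    """Hypergeometric upper tail P(X >= k).

    N: promoters in the genome-wide background
    K: background promoters that carry the motif
    n: promoters in the gene set of interest (e.g. mtRPGs)
    k: promoters in that gene set that carry the motif
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

In use, `k` and `K` would be obtained by applying `has_motif` to the gene-set promoters and to all background promoters; a small `enrichment_p` value suggests the motif is over-represented in the gene set.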
Affiliation(s)
- Jie Wang
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
146
|
Mira NP, Henriques SF, Keller G, Teixeira MC, Matos RG, Arraiano CM, Winge DR, Sá-Correia I. Identification of a DNA-binding site for the transcription factor Haa1, required for Saccharomyces cerevisiae response to acetic acid stress. Nucleic Acids Res 2011; 39:6896-907. [PMID: 21586585 PMCID: PMC3167633 DOI: 10.1093/nar/gkr228] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 11/14/2022] Open
Abstract
The transcription factor Haa1 is the main player in reprogramming yeast genomic expression in response to acetic acid stress. Mapping of the promoter region of one of the Haa1-activated genes, TPO3, allowed the identification of an acetic acid responsive element (ACRE) to which Haa1 binds in vivo. The in silico analysis of the promoter regions of the genes of the Haa1-regulon led to the identification of an Haa1-responsive element (HRE), 5'-GNN(G/C)(A/C)(A/G)G(A/G/C)G-3'. Using surface plasmon resonance experiments and electrophoretic mobility shift assays, it is demonstrated that Haa1 interacts with high affinity (K(D) of 2 nM) with the HRE motif present in the ACRE region of the TPO3 promoter. No significant interaction was found between Haa1 and HRE motifs having adenine nucleotides at positions 6 and 8 (K(D) of 396 and 6780 nM, respectively), suggesting that Haa1 does not recognize these motifs in vivo. A lower affinity of Haa1 toward HRE motifs having mutations in the guanine nucleotides at positions 7 and 9 (K(D) of 21 and 119 nM, respectively) was also observed. Altogether, the results obtained indicate that the minimal functional binding site of Haa1 is 5'-(G/C)(A/C)GG(G/C)G-3'. The Haa1-dependent transcriptional regulatory network active in yeast response to acetic acid stress is proposed.
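The two degenerate consensus sequences quoted in this abstract translate directly into regular expressions. The following sketch scans both strands of a sequence for either pattern; the function names and the example sequence are hypothetical, and a pattern match says nothing about binding affinity.

```python
import re

# Consensus sequences as quoted in the abstract, written as regex character classes.
HRE = "G[ACGT]{2}[GC][AC][AG]G[AGC]G"   # 5'-GNN(G/C)(A/C)(A/G)G(A/G/C)G-3'
MINIMAL = "[GC][AC]GG[GC]G"             # 5'-(G/C)(A/C)GG(G/C)G-3'
_COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]

def find_sites(seq, pattern):
    """Return sorted (start, strand, site) tuples for all matches.

    start is 0-based on the forward strand; a lookahead is used so that
    overlapping matches are reported as well."""
    seq = seq.upper()
    rx = re.compile(f"(?=({pattern}))")
    hits = [(m.start(), "+", m.group(1)) for m in rx.finditer(seq)]
    for m in rx.finditer(revcomp(seq)):
        site = m.group(1)
        hits.append((len(seq) - m.start() - len(site), "-", site))
    return sorted(hits)

# Hypothetical example sequence, not taken from the TPO3 promoter.
example = "TTGAAGAGGGGTT"
print(find_sites(example, HRE))      # [(2, '+', 'GAAGAGGGG')]
print(find_sites(example, MINIMAL))  # [(5, '+', 'GAGGGG')]
```

In this example the minimal site falls at positions 4 to 9 of the 9-nucleotide HRE match, in line with the minimal site being a more restrictive version of the last six HRE positions.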
Affiliation(s)
- Nuno P Mira
- IBB, Instituto Biotecnologia e Bioengenharia, Center for Biological and Chemical Engineering, Instituto Superior Técnico, Avenida Rovisco Pais, 1049-001 Lisbon, Portugal
|
147
|
Pilalis E, Chatziioannou AA, Grigoroudis AI, Panagiotidis CA, Kolisis FN, Kyriakidis DA. Escherichia coli genome-wide promoter analysis: identification of additional AtoC binding target elements. BMC Genomics 2011; 12:238. [PMID: 21569465 PMCID: PMC3118216 DOI: 10.1186/1471-2164-12-238] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/04/2010] [Accepted: 05/13/2011] [Indexed: 11/16/2022] Open
Abstract
Background Studies on bacterial signal transduction systems have revealed complex networks of functional interactions, where the response regulators play a pivotal role. The AtoSC system of E. coli activates the expression of atoDAEB operon genes, and the subsequent catabolism of short-chain fatty acids, upon acetoacetate induction. Transcriptome and phenotypic analyses suggested that atoSC is also involved in several other cellular activities, although we have recently reported a palindromic repeat within the atoDAEB promoter as the single, cis-regulatory binding site of the AtoC response regulator. In this work, we used a computational approach to explore the presence of yet unidentified AtoC binding sites within other parts of the E. coli genome. Results Through the implementation of a computational de novo motif detection workflow, a set of candidate motifs was generated, representing putative AtoC binding targets within the E. coli genome. In order to assess the biological relevance of the motifs and to select for experimental validation of those sequences related robustly with distinct cellular functions, we implemented a novel approach that applies Gene Ontology Term Analysis to the motif hits and selected those that were qualified through this procedure. The computational results were validated using Chromatin Immunoprecipitation assays to assess the in vivo binding of AtoC to the predicted sites. This process verified twenty-two additional AtoC binding sites, located not only within intergenic regions, but also within gene-encoding sequences. Conclusions This study, by tracing a number of putative AtoC binding sites, has indicated an AtoC-related cross-regulatory function. This highlights the significance of computational genome-wide approaches in elucidating complex patterns of bacterial cell regulation.
Affiliation(s)
- Eleftherios Pilalis
- Institute of Biological Research and Biotechnology, National Hellenic Research Foundation, Athens, Greece
|
148
|
Paik HJ, Ryu TW, Heo HS, Seo SW, Lee DH, Hur CG. Predicting tissue-specific expressions based on sequence characteristics. BMB Rep 2011; 44:250-5. [DOI: 10.5483/bmbrep.2011.44.4.250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
149
|
Zheng X, Liu T, Yang Z, Wang J. Large cliques in Arabidopsis gene coexpression network and motif discovery. J Plant Physiol 2011; 168:611-618. [PMID: 21044807 DOI: 10.1016/j.jplph.2010.09.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 05/29/2010] [Revised: 08/31/2010] [Accepted: 09/06/2010] [Indexed: 05/30/2023]
Abstract
Identification of cis-regulatory elements in Arabidopsis is a key step to understanding its transcriptional regulation scheme. In this study, the Arabidopsis gene coexpression network was constructed using the ATTED-II data, and thereafter a subgraph-induced approach and clique-finding algorithm were used to extract gene coexpression groups from the gene coexpression network. A total of 23 large coexpression gene groups were obtained, with each consisting of more than 100 highly correlated genes. Four classical tools were used to predict motifs in the promoter regions of coexpressed genes. Consequently, we detected a large number of candidate biologically relevant regulatory elements, and many of them are consistent with known cis-regulatory elements from AGRIS and AthaMap. Experiments on coexpressed groups, including E2Fa target genes, showed that our method had a high probability of returning the real binding motif. Our study provides the basis for future cis-regulatory module analysis and creates a starting point to unravel regulatory networks of Arabidopsis thaliana.
Affiliation(s)
- Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
|
150
|
Kim TM, Park PJ. Advances in analysis of transcriptional regulatory networks. Wiley Interdiscip Rev Syst Biol Med 2011; 3:21-35. [PMID: 21069662 DOI: 10.1002/wsbm.105] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 02/05/2023]
Abstract
A transcriptional regulatory network represents a molecular framework in which developmental or environmental cues are transformed into differential expression of genes. Transcriptional regulation is mediated by the combinatorial interplay between cis-regulatory DNA elements and trans-acting transcription factors, and is perhaps the most important mechanism for controlling gene expression. Recent innovations, most notably the method for detecting protein-DNA interactions genome-wide, can help provide a comprehensive catalog of cis-regulatory elements and their interaction with given trans-acting factors in a given condition. A transcriptional regulatory network that integrates such information can lead to a systems-level understanding of regulatory mechanisms. In this review, we will highlight the key aspects of current knowledge on eukaryotic transcriptional regulation, especially on known transcription factors and their interacting regulatory elements. Then we will review some recent technical advances for genome-wide mapping of DNA-protein interactions based on high-throughput sequencing. Finally, we will discuss the types of biological insights that can be obtained from a network-level understanding of transcription regulation as well as future challenges in the field.
Affiliation(s)
- Tae-Min Kim
- Center for Biomedical Informatics, Harvard Medical School, Boston, MA, USA
|