1
López Y, Vandenbon A, Nose A, Nakai K. Modeling the cis-regulatory modules of genes expressed in developmental stages of Drosophila melanogaster. PeerJ 2017; 5:e3389. [PMID: 28584716 PMCID: PMC5452948 DOI: 10.7717/peerj.3389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/20/2016] [Accepted: 05/08/2017] [Indexed: 12/30/2022]
Abstract
Because transcription is the first step in the regulation of gene expression, understanding how transcription factors bind to their DNA binding motifs has become absolutely necessary. It has been shown that the promoters of genes with similar expression profiles share common structural patterns. This paper presents an extensive study of the regulatory regions of genes expressed in 24 developmental stages of Drosophila melanogaster. It proposes the use of a combination of structural features, such as positioning of individual motifs relative to the transcription start site, orientation, pairwise distance between motifs, and presence of motifs anywhere in the promoter for predicting gene expression from structural features of promoter sequences. RNA-sequencing data was utilized to create and validate the 24 models. When genes with high-scoring promoters were compared to those identified by RNA-seq samples, 19 (79.2%) statistically significant models, a number that exceeds previous studies, were obtained. Each model yielded a set of highly informative features, which were used to search for genes with similar biological functions.
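The structural features named in this abstract (motif presence, position relative to the transcription start site, orientation, and pairwise distance) can be illustrated with a minimal sketch. The `MotifHit` type, the feature-name strings, and the example coordinates are hypothetical and are not taken from the paper; the authors' actual feature encoding and scoring may differ.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class MotifHit:
    """One predicted binding site in a promoter (hypothetical container)."""
    motif: str    # motif name
    start: int    # position relative to the TSS; negative means upstream
    strand: str   # "+" or "-"


def structural_features(hits):
    """Collect presence, position, orientation and pairwise-distance features.

    Assumption: each motif occurs at most once per promoter; repeated hits of
    the same motif would overwrite each other in this simplified encoding.
    """
    features = {}
    for h in hits:
        features[f"present:{h.motif}"] = 1
        features[f"position:{h.motif}"] = h.start
        features[f"orientation:{h.motif}"] = h.strand
    # Pairwise distances, taken between hits ordered from 5' to 3'.
    ordered = sorted(hits, key=lambda h: h.start)
    for a, b in combinations(ordered, 2):
        features[f"distance:{a.motif}|{b.motif}"] = b.start - a.start
    return features


example_hits = [MotifHit("M1", -120, "+"), MotifHit("M2", -40, "-")]
example_features = structural_features(example_hits)
```

A feature dictionary of this kind could then be fed to any scoring model; which features are informative for a given developmental stage is what the paper's models are reported to determine.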
Affiliation(s)
- Yosvany López
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan; Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Alexis Vandenbon
- Immunology Frontier Research Center, Osaka University, Osaka, Japan
- Akinao Nose
- Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Kenta Nakai
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
2
Gotea V, Ovcharenko I. DiRE: identifying distant regulatory elements of co-expressed genes. Nucleic Acids Res 2008; 36:W133-9. [PMID: 18487623 PMCID: PMC2447744 DOI: 10.1093/nar/gkn300] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.6] [Received: 12/17/2007] [Revised: 04/23/2008] [Accepted: 04/29/2008] [Indexed: 11/13/2022]
Abstract
Regulation of gene expression in eukaryotic genomes is established through a complex cooperative activity of proximal promoters and distant regulatory elements (REs) such as enhancers, repressors and silencers. We have developed a web server named DiRE, based on the Enhancer Identification (EI) method, for predicting distant regulatory elements in higher eukaryotic genomes, namely for determining their chromosomal location and functional characteristics. The server uses gene co-expression data, comparative genomics and profiles of transcription factor binding sites (TFBSs) to determine TFBS-association signatures that can be used for discriminating specific regulatory functions. DiRE's unique feature is its ability to detect REs outside of proximal promoter regions, as it takes advantage of the full gene locus to conduct the search. DiRE can predict common REs for any set of input genes for which the user has prior knowledge of co-expression, co-function or other biologically meaningful grouping. The server predicts function-specific REs consisting of clusters of specifically-associated TFBSs and it also scores the association of individual transcription factors (TFs) with the biological function shared by the group of input genes. Its integration with the Array2BIO server allows users to start their analysis with raw microarray expression data. The DiRE web server is freely available at http://dire.dcode.org.
Affiliation(s)
- Ivan Ovcharenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894
3
Vandenbon A, Miyamoto Y, Takimoto N, Kusakabe T, Nakai K. Markov chain-based promoter structure modeling for tissue-specific expression pattern prediction. DNA Res 2008; 15:3-11. [PMID: 18258700 PMCID: PMC2650632 DOI: 10.1093/dnares/dsm034] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]
Abstract
Transcriptional regulation is the first level of regulation of gene expression and is therefore a major topic in computational biology. Genes with similar expression patterns can be assumed to be co-regulated at the transcriptional level by promoter sequences with a similar structure. Current approaches for modeling shared regulatory features tend to focus mainly on clustering of cis-regulatory sites. Here we introduce a Markov chain-based promoter structure model that uses both shared motifs and shared features from an input set of promoter sequences to predict candidate genes with similar expression. The model uses positional preference, order, and orientation of motifs. The trained model is used to score a genomic set of promoter sequences: high-scoring promoters are assumed to have a structure similar to the input sequences and are thus expected to drive similar expression patterns. We applied our model on two datasets in Caenorhabditis elegans and in Ciona intestinalis. Both computational and experimental verifications indicate that this model is capable of predicting candidate promoters driving similar expression patterns as the input-regulatory sequences. This model can be useful for finding promising candidate genes for wet-lab experiments and for increasing our understanding of transcriptional regulation.
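As a rough illustration of scoring promoters with a Markov chain over motif order, the sketch below trains first-order transition probabilities from input promoters and scores candidates by log-likelihood. This is a simplification under stated assumptions: it models motif order only, whereas the abstract says the published model also uses positional preference and orientation. The function names, the smoothing constant `alpha`, and the toy motif labels are hypothetical.

```python
import math
from collections import Counter

START, END = "<start>", "<end>"


def train_chain(promoters, alpha=1.0):
    """Estimate smoothed first-order transition log-probabilities.

    promoters: list of motif-name lists, each ordered 5' to 3'.
    alpha: additive (Laplace) smoothing constant, an assumption of this sketch.
    """
    states = {m for p in promoters for m in p} | {END}
    counts, totals = Counter(), Counter()
    for p in promoters:
        path = [START, *p, END]
        for a, b in zip(path, path[1:]):
            counts[(a, b)] += 1
            totals[a] += 1

    def log_prob(a, b):
        return math.log((counts[(a, b)] + alpha) / (totals[a] + alpha * len(states)))

    return log_prob


def score(log_prob, promoter):
    """Log-likelihood of a promoter's motif order under the trained chain."""
    path = [START, *promoter, END]
    return sum(log_prob(a, b) for a, b in zip(path, path[1:]))


training = [["A", "B", "C"], ["A", "B"], ["A", "C"]]
chain = train_chain(training)
similar = score(chain, ["A", "B"])     # order seen in the training set
dissimilar = score(chain, ["B", "A"])  # reversed order, never seen
```

Promoters whose motif order resembles the training set receive a higher score, which mirrors the idea that high-scoring promoters are expected to drive similar expression patterns.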
Affiliation(s)
- Alexis Vandenbon
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
4
Yuan Y, Guo L, Shen L, Liu JS. Predicting gene expression from sequence: a reexamination. PLoS Comput Biol 2008; 3:e243. [PMID: 18052544 PMCID: PMC2098866 DOI: 10.1371/journal.pcbi.0030243] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 06/05/2007] [Accepted: 10/19/2007] [Indexed: 11/21/2022]
Abstract
Although much of the information regarding genes' expressions is encoded in the genome, deciphering such information has been very challenging. We reexamined Beer and Tavazoie's (BT) approach to predict mRNA expression patterns of 2,587 genes in Saccharomyces cerevisiae from the information in their respective promoter sequences. Instead of fitting complex Bayesian network models, we trained naïve Bayes classifiers using only the sequence-motif matching scores provided by BT. Our simple models correctly predict expression patterns for 79% of the genes, based on the same criterion and the same cross-validation (CV) procedure as BT, which compares favorably to the 73% accuracy of BT. The fact that our approach did not use position and orientation information of the predicted binding sites but achieved a higher prediction accuracy motivated us to investigate a few biological predictions made by BT. We found that some of their predictions, especially those related to motif orientations and positions, are at best circumstantial. For example, the combinatorial rules suggested by BT for the PAC and RRPE motifs are not unique to the cluster of genes from which the predictive model was inferred, and there are simpler rules that are statistically more significant than BT's ones. We also show that the CV procedure used by BT to estimate their method's prediction accuracy is inappropriate and may have overestimated the prediction accuracy by about 10%.
Through binding to certain sequence-specific sites upstream of the target genes, a special class of proteins called transcription factors (TFs) control transcription activities, i.e., expression amounts, of the downstream genes. The DNA sequence patterns bound by TFs are called motifs. It has been shown in an article by Beer and Tavazoie (BT) published in Cell in 2004 that a gene's expression pattern can be well-predicted based only on its upstream sequence information in the form of matching scores of a set of sequence motifs and the location and orientation of corresponding predicted binding sites. Here we report a new naïve Bayes method for such a prediction task. Compared to BT's work, our model is simpler, more robust, and achieves a higher prediction accuracy using only the motif matching score. In our method, the location and orientation information do not further help the prediction in a global way. Our result also casts doubt on several biological hypotheses generated by BT based on their model. Finally, we show that the cross-validation procedure used by BT to estimate their method's prediction accuracy is inappropriate and may have overestimated the accuracy by about 10%.
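To make the idea of a naïve Bayes classifier over motif matching scores concrete, here is a minimal sketch that assumes Gaussian class-conditional score distributions. The abstract does not state how the scores were modeled, so the Gaussian choice, the variance floor, the function names, and the toy scores and cluster labels are all assumptions for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Fit per-class log-priors and per-feature Gaussian parameters.

    X: list of motif-score vectors (one per gene); y: expression-cluster labels.
    var_floor keeps variances positive on tiny toy data (an assumption).
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [
            sum((v - m) ** 2 for v in c) / len(c) + var_floor
            for c, m in zip(cols, means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log-posterior for one score vector."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


# Toy motif scores for two hypothetical motifs and two hypothetical clusters.
X = [[5.0, 0.5], [4.5, 0.2], [0.3, 4.8], [0.1, 5.2]]
y = ["cluster_A", "cluster_A", "cluster_B", "cluster_B"]
model = fit_gaussian_nb(X, y)
predicted = predict(model, [4.8, 0.4])
```

Note that the classifier sees only the scores, with no position or orientation input, which is the contrast with BT's approach that the abstract emphasizes. Any accuracy estimate for such a model should come from a cross-validation scheme that keeps training and test genes strictly separate.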
Affiliation(s)
- Yuan Yuan
- Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America
5
Liu GB, Jiang YF, Yan H, Zhao KN. Computational analysis of base composition pattern and promoter elements in the putative promoter regions in relation to expression profiles of 682 human genes on chromosome 22. DNA Seq 2007; 17:270-81. [PMID: 17312946 DOI: 10.1080/10425170600886136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
Abstract
The base composition pattern (BCP) in the putative promoter regions (PPRs), up to 5 kb in length, of 682 human genes on Chromosome 22 (Chr22) was examined. Two-dimensional (2D) and three-dimensional (3D) functions were designed to delineate the DNA base composition, with four major patterns identified. It was found that 17.6% of genes include a TATA box, 28.0% a GC box, 18.9% a CAAT box and 38.4% CpG islands, and approximately 10% of genes have one of four putative initiator (Inr) motifs. The occurrence of the promoter elements is tightly associated with the base composition features in the promoter regions, and these associations mediate tissue-wide expression of the genes in humans. The occurrence of two or more promoter elements in the promoter regions is required for the medium- and wide-range expression profiles of the human genes on Chr22. Thus, the reported data shed light on the characteristics of the PPRs of the human genes on Chr22, which may improve our understanding of the regulatory roles of the PPRs and of the promoter elements they contain in gene expression.
Affiliation(s)
- Guang Bin Liu
- Department of Biological and Physical Sciences, Faculty of Science, Centre for Systems Biology, The University of Southern Queensland, Toowoomba, Qld 4350, Australia.
6
Pati A, Vasquez-Robinet C, Heath LS, Grene R, Murali TM. XcisClique: analysis of regulatory bicliques. BMC Bioinformatics 2006; 7:218. [PMID: 16630346 PMCID: PMC1513260 DOI: 10.1186/1471-2105-7-218] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 10/27/2005] [Accepted: 04/21/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Modeling of cis-elements or regulatory motifs in promoter (upstream) regions of genes is a challenging computational problem. In this work, a set of regulatory motifs simultaneously present in the promoters of a set of genes is modeled as a biclique in a suitably defined bipartite graph. A biologically meaningful co-occurrence of multiple cis-elements in a gene promoter is assessed by the combined analysis of genomic and gene expression data. Greater statistical significance is associated with a set of genes that shares a common set of regulatory motifs, while simultaneously exhibiting highly correlated gene expression under given experimental conditions.
METHODS: XcisClique, the system developed in this work, is a comprehensive infrastructure that associates annotated genome and gene expression data, models known cis-elements as regular expressions, identifies maximal bicliques in a bipartite gene-motif graph, and ranks bicliques based on their computed statistical significance. Significance is a function of the probability of occurrence of those motifs in a biclique (a hypergeometric distribution) and of the new sum of absolute values statistic (SAV) that uses Spearman correlations of gene expression vectors. SAV is a statistic well-suited for this purpose, as described in the discussion.
RESULTS: XcisClique identifies new motif and gene combinations that might indicate as yet unidentified involvement of sets of genes in biological functions and processes. It currently supports Arabidopsis thaliana and can be adapted to other organisms, assuming the existence of annotated genomic sequences, suitable gene expression data, and identified regulatory motifs. A subset of XcisClique functionalities, including the motif visualization component MotifSee, source code, and supplementary material are available at https://bioinformatics.cs.vt.edu/xcisclique/.
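The hypergeometric component of the significance score can be illustrated with a short sketch that computes the probability of seeing at least `k` motif-carrying genes in a group of `n` genes drawn from a genome of `N` genes, `K` of which carry the motif. The function name, the parameter names, and the example numbers are hypothetical, and how XcisClique combines this probability with the SAV statistic is not specified in the abstract, so that step is not shown.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background set
    K: background genes whose promoter contains the motif
    n: genes in the candidate group (one side of the biclique)
    k: genes in the group that contain the motif
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy example: all 5 genes in a group carry a motif found in 5 of 10 genes.
p_all_five = hypergeom_tail(N=10, K=5, n=5, k=5)
```

A small tail probability indicates that the shared motif is unlikely to co-occur in the gene group by chance, which is the sense in which a biclique can be ranked as more significant.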
Affiliation(s)
- Amrita Pati
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Cecilia Vasquez-Robinet
- Department of Plant Pathology, Physiology, and Weed Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Lenwood S Heath
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- Ruth Grene
- Department of Plant Pathology, Physiology, and Weed Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
- TM Murali
- Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
7
Abstract
Among more than 120 genes that are now known to regulate mammalian pigmentation, one of the key genes is MC1R, which encodes the melanocortin 1 receptor, a seven transmembrane G protein-coupled receptor expressed on the surface of melanocytes. Since the monoexonic sequence of the gene was cloned and characterized more than a decade ago, tremendous efforts have been dedicated to the extensive genotyping of mostly red-haired populations all around the world, thus providing allelic variants that may or may not account for melanoma susceptibility in the presence or absence of ultraviolet (UV) exposure. Soluble factors, such as proopiomelanocortin (POMC) derivatives, agouti signal protein (ASP) and others, regulate MC1R expression, leading to improved photoprotection via increased eumelanin synthesis or in contrast, inducing the switch to pheomelanin. However, there is an obvious lack of knowledge regarding the numerous and complex regulatory mechanisms that govern the expression of MC1R at the intra-cellular level, from gene transcription in response to an external stimulus to the expression of the mature receptor on the melanocyte surface.
Affiliation(s)
- Francois Rouzaud
- Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Building 37, Room 2132, Bethesda, MD 20892, USA
8
Zhu Z, Shendure J, Church GM. Discovering functional transcription-factor combinations in the human cell cycle. Genome Res 2005; 15:848-55. [PMID: 15930495 PMCID: PMC1142475 DOI: 10.1101/gr.3394405] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.1] [Indexed: 12/13/2022]
Abstract
With the completion of full genome sequences and advancement in high-throughput technologies, in silico methods have been successfully used to integrate diverse data sources toward unraveling the combinatorial nature of transcriptional regulation. So far, almost all of these studies are restricted to lower eukaryotes such as budding yeast. We describe here a computational search for functional transcription-factor (TF) combinations using phylogenetically conserved sequences and microarray-based expression data. Taking into account both orientational and positional constraints, we investigated the overrepresentation of binding sites in the vicinity of one another and whether these combinations result in more coherent expression profiles. Without any prior biological knowledge, the search led to the discovery of several experimentally established TF associations, as well as some novel ones. In particular, we identified a regulatory module controlling cell cycle-dependent transcription of G2-M genes and expanded its functional generality. We also detected many homotypic combinations, supporting the importance of binding-site density in transcriptional regulation of higher eukaryotes.
Affiliation(s)
- Zhou Zhu
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
9
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2004. [PMCID: PMC2447475 DOI: 10.1002/cfg.357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]