1
|
Li Y, Jaiswal SK, Kaur R, Alsaadi D, Liang X, Drews F, DeLoia JA, Krivak T, Petrykowska HM, Gotea V, Welch L, Elnitski L. Differential gene expression identifies a transcriptional regulatory network involving ER-alpha and PITX1 in invasive epithelial ovarian cancer. BMC Cancer 2021; 21:768. [PMID: 34215221 PMCID: PMC8254236 DOI: 10.1186/s12885-021-08276-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/30/2020] [Accepted: 04/23/2021] [Indexed: 12/16/2022] Open
Abstract
Background: The heterogeneous subtypes and stages of epithelial ovarian cancer (EOC) differ in their biological features, invasiveness, and response to chemotherapy, but the transcriptional regulators causing their differences remain nebulous.
Methods: In this study, we compared high-grade serous ovarian cancers (HGSOCs) to low malignant potential or serous borderline tumors (SBTs). Our aim was to discover new regulatory factors causing distinct biological properties of HGSOCs and SBTs.
Results: In a discovery dataset, we identified 11 differentially expressed genes (DEGs) between SBTs and HGSOCs. Their expression correctly classified 95% of 267 validation samples. Two of the DEGs, TMEM30B and TSPAN1, were significantly associated with worse overall survival in patients with HGSOC. We also identified 17 DEGs that distinguished stage II vs. III HGSOC. In these two DEG promoter sets, we identified significant enrichment of predicted transcription factor binding sites, including those of RARA, FOXF1, BHLHE41, and PITX1. Using published ChIP-seq data acquired from multiple non-ovarian cell types, we showed additional regulatory factors, including AP2-gamma/TFAP2C, FOXA1, and BHLHE40, bound at the majority of DEG promoters. Several of the factors are known to cooperate with and predict the presence of nuclear hormone receptor estrogen receptor alpha (ER-alpha). We experimentally confirmed ER-alpha and PITX1 presence at the DEGs by performing ChIP-seq analysis using the ovarian cancer cell line PEO4. Finally, RNA-seq analysis identified recurrent gene fusion events in our EOC tumor set. Some of these fusions were significantly associated with survival in HGSOC patients; however, the fusion genes are not regulated by the transcription factors identified for the DEGs.
Conclusions: These data implicate an estrogen-responsive regulatory network in the differential gene expression between ovarian cancer subtypes and stages, which includes PITX1. Importantly, the transcription factors associated with our DEG promoters are known to form the MegaTrans complex in breast cancer. This is the first study to implicate the MegaTrans complex in contributing to the distinct biological trajectories of malignant and indolent ovarian cancer subtypes.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12885-021-08276-8.
Affiliation(s)
- Yichao Li
- School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA
| | - Sushil K Jaiswal
- Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Rupleen Kaur
- Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Dana Alsaadi
- Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Xiaoyu Liang
- School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA
| | - Frank Drews
- School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA
| | - Julie A DeLoia
- Present address: Dignity Health Global Education, Roanoke, Virginia, USA
| | - Thomas Krivak
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Pittsburgh Medical School, Pittsburgh, PA, USA; Present address: The Western Pennsylvania Hospital, Pittsburgh, PA, USA
| | - Hanna M Petrykowska
- Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Valer Gotea
- Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Lonnie Welch
- School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, USA
| | - Laura Elnitski
- Translational Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA.
| |
|
2
|
Zhang MQ. A personal journey on cracking the genomic codes. Quantitative Biology 2021. [DOI: 10.15302/j-qb-021-0245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
3
|
Identification of Cis-Regulatory Sequences Controlling Pollen-Specific Expression of Hydroxyproline-Rich Glycoprotein Genes in Arabidopsis thaliana. Plants 2020; 9:plants9121751. [PMID: 33322028 PMCID: PMC7763877 DOI: 10.3390/plants9121751] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/31/2020] [Revised: 11/26/2020] [Accepted: 12/07/2020] [Indexed: 02/06/2023]
Abstract
Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall structural proteins that function in various aspects of plant growth and development, including pollen tube growth. We have previously characterized protein sequence signatures for three family members in the HRGP superfamily: the hyperglycosylated arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs), and the lightly glycosylated proline-rich proteins (PRPs). However, the mechanism of pollen-specific HRGP gene expression remains unexplored. To this end, we developed an integrative analysis pipeline combining RNA-seq gene expression and promoter sequences to identify cis-regulatory motifs responsible for pollen-specific expression of HRGP genes in Arabidopsis thaliana. Specifically, we mined public RNA-seq datasets and identified 13 pollen-specific HRGP genes. Ensemble motif discovery identified 15 promoter elements conserved between A. thaliana and A. lyrata. Motif scanning revealed two pollen-related transcription factors: GATA12 and the brassinosteroid (BR) signaling pathway regulator BZR1. Finally, we performed a regression analysis and demonstrated that the 15 motifs provided a good model of HRGP gene expression in pollen (R = 0.61). In conclusion, we performed the first integrative analysis of cis-regulatory motifs in pollen-specific HRGP genes, revealing important insights into transcriptional regulation in pollen tissue.
|
4
|
Masuda K, Renard-Guillet C, Shirahige K, Sutani T. Bioinformatical dissection of fission yeast DNA replication origins. Open Biol 2020; 10:200052. [PMID: 32692956 PMCID: PMC7574548 DOI: 10.1098/rsob.200052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
Replication origins in eukaryotes form a base for assembly of the pre-replication complex (pre-RC), thereby serving as initiation sites of DNA replication. The characteristics of replication origins vary among species. In the fission yeast Schizosaccharomyces pombe, DNA of high AT content is a distinct feature of replication origins; however, the general molecular architecture of fission yeast origins remains to be understood. Here, we performed ChIP-seq mapping of Orc4 and Mcm2, two representative components of the pre-RC, and described the characteristics of their binding sites. The analysis revealed that efficient fission yeast origins are associated with two similar but independent features: a ≥15 bp-long motif with stretches of As and an AT-rich region of a few hundred bp. The A-rich motif was correlated with chromosomal binding of Orc, a DNA-binding component in the pre-RC, whereas the AT-rich region was associated with efficient binding of the DNA replicative helicase Mcm. These two features, in combination with a third feature, a transcription-poor region of approximately 1 kb, made it possible to distinguish efficient replication origins from the rest of the chromosome arms with high accuracy. This study, hence, provides a model that describes how multiple functional elements specify DNA replication origins in the fission yeast genome.
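The two sequence features named in this abstract (an A-rich stretch of at least 15 bp and an AT-rich region of a few hundred bp) can be illustrated with a minimal Python sketch. It is not the authors' method: the function names, the window and step sizes, and the "at least 12 As in 15 bp" threshold are hypothetical choices made only for illustration, and only one strand is scanned.

```python
def at_content(seq: str) -> float:
    """Fraction of A/T bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def sliding_at_content(seq: str, window: int = 200, step: int = 50):
    """Return (start, AT fraction) for every full window along the sequence.

    The window and step sizes are illustrative assumptions.
    """
    return [
        (start, at_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def a_rich_windows(seq: str, length: int = 15, min_a: int = 12):
    """Return start positions of windows holding at least `min_a` A bases.

    This is a crude stand-in for an A-rich motif; the threshold is an
    assumption, not a value taken from the study.
    """
    seq = seq.upper()
    return [
        start
        for start in range(len(seq) - length + 1)
        if seq[start:start + length].count("A") >= min_a
    ]
```

In practice a position weight matrix or a trained classifier would replace these fixed thresholds; the sketch only shows how the two features differ in scale (a short motif versus a broad compositional bias).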
Affiliation(s)
- Koji Masuda
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| | - Claire Renard-Guillet
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| | - Katsuhiko Shirahige
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| | - Takashi Sutani
- Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
| |
|
5
|
Grote A, Li Y, Liu C, Voronin D, Geber A, Lustigman S, Unnasch TR, Welch L, Ghedin E. Prediction pipeline for discovery of regulatory motifs associated with Brugia malayi molting. PLoS Negl Trop Dis 2020; 14:e0008275. [PMID: 32574217 PMCID: PMC7337397 DOI: 10.1371/journal.pntd.0008275] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/14/2019] [Revised: 07/06/2020] [Accepted: 04/07/2020] [Indexed: 11/19/2022] Open
Abstract
Filarial nematodes can cause debilitating diseases in humans. They have complicated life cycles involving an insect vector and mammalian hosts, and they go through a number of developmental molts. While whole genome sequences of parasitic worms are now available, very little is known about transcription factor (TF) binding sites and their cognate transcription factors that play a role in regulating development. To address this gap, we developed a novel motif prediction pipeline, Emotif Alpha, that integrates ten different motif discovery algorithms, multiple statistical tests, and a comparative analysis of conserved elements between the filarial worms Brugia malayi and Onchocerca volvulus, and the free-living nematode Caenorhabditis elegans. We identified stage-specific TF binding motifs in B. malayi, with a particular focus on those potentially involved in the L3-L4 molt, a stage important for the establishment of infection in the mammalian host. Using an in vitro molting system, we tested and validated three of these motifs, demonstrating the accuracy of the motif prediction pipeline.

Diseases caused by parasitic worms such as the filariae are among the leading causes of morbidity in the developing world. Very little is known about how development is regulated in these vector-transmitted parasites. We have developed a computational method to identify motifs that correspond to transcription factor binding sites in the genome of the parasitic worm, Brugia malayi, one of the causative agents of lymphatic filariasis. Using this approach, we were able to predict stage-specific transcription factor binding sites involved in a stage of the molting process important for the establishment of the infection. We validated the role of these motifs using an in vitro molting system.
Affiliation(s)
- Alexandra Grote
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
| | - Yichao Li
- School of Computer Science and Electrical Engineering, Ohio University, Athens, Ohio, United States of America
| | - Canhui Liu
- Center for Global Infectious Disease Research, University of South Florida, Tampa, Florida, United States of America
| | - Denis Voronin
- Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, New York, United States of America
| | - Adam Geber
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
| | - Sara Lustigman
- Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, New York, United States of America
| | - Thomas R. Unnasch
- Center for Global Infectious Disease Research, University of South Florida, Tampa, Florida, United States of America
| | - Lonnie Welch
- School of Computer Science and Electrical Engineering, Ohio University, Athens, Ohio, United States of America
| | - Elodie Ghedin
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Department of Epidemiology, School of Global Public Health, New York University, New York, New York, United States of America
| |
|
6
|
Foster JM, Grote A, Mattick J, Tracey A, Tsai YC, Chung M, Cotton JA, Clark TA, Geber A, Holroyd N, Korlach J, Li Y, Libro S, Lustigman S, Michalski ML, Paulini M, Rogers MB, Teigen L, Twaddle A, Welch L, Berriman M, Dunning Hotopp JC, Ghedin E. Sex chromosome evolution in parasitic nematodes of humans. Nat Commun 2020; 11:1964. [PMID: 32327641 PMCID: PMC7181701 DOI: 10.1038/s41467-020-15654-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/31/2019] [Accepted: 03/20/2020] [Indexed: 11/09/2022] Open
Abstract
Sex determination mechanisms often differ even between related species yet the evolution of sex chromosomes remains poorly understood in all but a few model organisms. Some nematodes such as Caenorhabditis elegans have an XO sex determination system while others, such as the filarial parasite Brugia malayi, have an XY mechanism. We present a complete B. malayi genome assembly and define Nigon elements shared with C. elegans, which we then map to the genomes of other filarial species and more distantly related nematodes. We find a remarkable plasticity in sex chromosome evolution with several distinct cases of neo-X and neo-Y formation, X-added regions, and conversion of autosomes to sex chromosomes from which we propose a model of chromosome evolution across different nematode clades. The phylum Nematoda offers a new and innovative system for gaining a deeper understanding of sex chromosome evolution.
Affiliation(s)
- Jeremy M Foster
- Division of Protein Expression & Modification, New England Biolabs, Ipswich, MA, 01938, USA
| | - Alexandra Grote
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - John Mattick
- Institute for Genome Science, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Alan Tracey
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | | | - Matthew Chung
- Institute for Genome Science, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - James A Cotton
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | | | - Adam Geber
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - Nancy Holroyd
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | | | - Yichao Li
- School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, 45701, USA
| | - Silvia Libro
- Division of Protein Expression & Modification, New England Biolabs, Ipswich, MA, 01938, USA
| | - Sara Lustigman
- Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, NY, 10065, USA
| | - Michelle L Michalski
- Department of Biology and Microbiology, University of Wisconsin Oshkosh, Oshkosh, WI, 54901, USA
| | - Michael Paulini
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Matthew B Rogers
- Department of Surgery, UPMC Children's Hospital of Pittsburgh, Pittsburgh, PA, 15224, USA
| | - Laura Teigen
- Department of Biology and Microbiology, University of Wisconsin Oshkosh, Oshkosh, WI, 54901, USA
| | - Alan Twaddle
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - Lonnie Welch
- School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, 45701, USA
| | - Matthew Berriman
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | - Julie C Dunning Hotopp
- Institute for Genome Science, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
| | - Elodie Ghedin
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
- Department of Epidemiology, School of Global Public Health, New York University, New York, NY, 10003, USA.
| |
|
7
|
Li Y, Liu Y, Juedes D, Drews F, Bunescu R, Welch L. Set cover-based methods for motif selection. Bioinformatics 2020; 36:1044-1051. [PMID: 31665223 PMCID: PMC7703758 DOI: 10.1093/bioinformatics/btz697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/11/2018] [Revised: 08/13/2019] [Accepted: 09/13/2019] [Indexed: 11/14/2022] Open
Abstract
Motivation: De novo motif discovery algorithms find statistically over-represented sequence motifs that may function as transcription factor binding sites. Current methods often report large numbers of motifs, making it difficult to perform further analyses and experimental validation. The motif selection problem seeks to identify a minimal set of putative regulatory motifs that characterize sequences of interest (e.g. ChIP-Seq binding regions).
Results: In this study, the motif selection problem is mapped to variants of the set cover problem that are solved via tabu search and by relaxed integer linear programming (RILP). The algorithms are employed to analyze 349 ChIP-Seq experiments from the ENCODE project, yielding a small number of high-quality motifs that represent putative binding sites of primary factors and cofactors. Specifically, when compared with the motifs reported by Kheradpour and Kellis, the set cover-based algorithms produced motif sets covering 35% more peaks for 11 TFs and identified 4 more putative cofactors for 6 TFs. Moreover, a systematic evaluation using nested cross-validation revealed that the RILP algorithm selected fewer motifs and was able to cover 6% more peaks and 3% fewer background regions, which reduced the error rate by 7%.
Availability and implementation: The source code of the algorithms and all the datasets are available at https://github.com/YichaoOU/Set_cover_tools.
Supplementary information: Supplementary data are available at Bioinformatics online.
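For orientation, the textbook set cover problem that this abstract refers to can be written as an integer linear program. The formulation below is the standard one, not necessarily the exact variant used in the paper (which also accounts for background regions); the symbols $S$, $M$, $a_{ij}$ and $x_j$ are notation introduced here for illustration.

```latex
% S: sequences of interest (e.g. ChIP-Seq peaks), M: candidate motifs
% a_{ij} = 1 if motif j occurs in sequence i, 0 otherwise
% x_j   = 1 if motif j is selected, 0 otherwise
\begin{aligned}
\min_{x} \quad & \sum_{j \in M} x_j \\
\text{s.t.} \quad & \sum_{j \in M} a_{ij}\, x_j \ge 1 \qquad \forall i \in S, \\
& x_j \in \{0, 1\} \qquad \forall j \in M .
\end{aligned}
```

A linear programming relaxation replaces the last constraint with $0 \le x_j \le 1$ and then rounds the fractional solution to a set of selected motifs; how the rounding is done is a design choice that this sketch leaves open.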
Affiliation(s)
- Yichao Li
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
| | - Yating Liu
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
| | - David Juedes
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
| | - Frank Drews
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
| | - Razvan Bunescu
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
| | - Lonnie Welch
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
| |
|
8
|
Langer BE, Hiller M. TFforge utilizes large-scale binding site divergence to identify transcriptional regulators involved in phenotypic differences. Nucleic Acids Res 2019; 47:e19. [PMID: 30496469 PMCID: PMC6393245 DOI: 10.1093/nar/gky1200] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/09/2018] [Revised: 11/06/2018] [Accepted: 11/15/2018] [Indexed: 12/19/2022] Open
Abstract
Changes in gene regulation are important for phenotypic and in particular morphological evolution. However, it remains challenging to identify the transcription factors (TFs) that contribute to differences in gene regulation and thus to phenotypic differences between species. Here, we present TFforge (Transcription Factor forward genomics), a computational method to identify TFs that are involved in the loss of phenotypic traits. TFforge screens an input set of regulatory genomic regions to detect TFs that exhibit a significant binding site divergence signature in species that lost a particular phenotypic trait. Using simulated data of modular and pleiotropic regulatory elements, we show that TFforge can identify the correct TFs for many different evolutionary scenarios. We applied TFforge to available eye regulatory elements to screen for TFs that exhibit a significant binding site decay signature in subterranean mammals. This screen identified interacting and co-binding eye-related TFs, and thus provides new insights into which TFs likely contribute to eye degeneration in these species. TFforge has broad applicability to identify the TFs that contribute to phenotypic changes between species, and thus can help to unravel the gene-regulatory differences that underlie phenotypic evolution.
Affiliation(s)
- Björn E Langer
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany; Center for Systems Biology Dresden, Germany
| | - Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany; Center for Systems Biology Dresden, Germany
| |
|
9
|
Al-Ouran R, Schmidt R, Naik A, Jones J, Drews F, Juedes D, Elnitski L, Welch L. Discovering Gene Regulatory Elements Using Coverage-Based Heuristics. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1290-1300. [PMID: 26540692 DOI: 10.1109/tcbb.2015.2496261] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Data mining algorithms and sequencing methods (such as RNA-seq and ChIP-seq) are being combined to discover genomic regulatory motifs that relate to a variety of phenotypes. However, motif discovery algorithms often produce very long lists of putative transcription factor binding sites, hindering the discovery of phenotype-related regulatory elements by making it difficult to select a manageable set of candidate motifs for experimental validation. To address this issue, the authors introduce the motif selection problem and provide coverage-based search heuristics for its solution. Analysis of 203 ChIP-seq experiments from the ENCyclopedia of DNA Elements project shows that our algorithms produce motifs that have high sensitivity and specificity and reveals new insights about the regulatory code of the human genome. The greedy algorithm performs the best, selecting a median of two motifs per ChIP-seq transcription factor group while achieving a median sensitivity of 77 percent.
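A coverage-based greedy heuristic of the kind this abstract mentions can be sketched in a few lines of Python. This is the classic greedy set cover rule (repeatedly pick the motif that covers the most still-uncovered sequences), offered as an illustration under stated assumptions rather than the authors' implementation; the function names and the input layout (a dict mapping each motif to the set of sequence IDs it occurs in) are hypothetical, and background sequences and specificity are not modeled.

```python
def greedy_motif_selection(motif_hits, sequences):
    """Greedily select motifs until no motif covers a new sequence.

    motif_hits: dict mapping motif name -> set of sequence IDs it occurs in
    sequences:  iterable of foreground sequence IDs to be covered
    """
    uncovered = set(sequences)
    remaining = dict(motif_hits)
    selected = []
    while uncovered and remaining:
        # Sorting the names first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda m: len(remaining[m] & uncovered))
        if not remaining[best] & uncovered:
            break  # no remaining motif adds coverage
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected


def sensitivity(selected, motif_hits, sequences):
    """Fraction of foreground sequences hit by at least one selected motif."""
    sequences = set(sequences)
    covered = set()
    for motif in selected:
        covered |= motif_hits[motif] & sequences
    return len(covered) / len(sequences)
```

With four toy motifs over five sequences, for example, the heuristic stops once the remaining motifs add no new coverage, so redundant motifs are never reported.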
|
10
|
Tan Z, Niu B, Tsang KY, Melhado IG, Ohba S, He X, Huang Y, Wang C, McMahon AP, Jauch R, Chan D, Zhang MQ, Cheah KSE. Synergistic co-regulation and competition by a SOX9-GLI-FOXA phasic transcriptional network coordinate chondrocyte differentiation transitions. PLoS Genet 2018; 14:e1007346. [PMID: 29659575 PMCID: PMC5919691 DOI: 10.1371/journal.pgen.1007346] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 12/06/2017] [Revised: 04/26/2018] [Accepted: 03/29/2018] [Indexed: 11/18/2022] Open
Abstract
The growth plate mediates bone growth where SOX9 and GLI factors control chondrocyte proliferation, differentiation and entry into hypertrophy. FOXA factors regulate hypertrophic chondrocyte maturation. How these factors integrate into a Gene Regulatory Network (GRN) controlling these differentiation transitions is incompletely understood. We adopted a genome-wide whole tissue approach to establish a Growth Plate Differential Gene Expression Library (GP-DGEL) for fractionated proliferating, pre-hypertrophic, early and late hypertrophic chondrocytes, as an overarching resource for discovery of pathways and disease candidates. De novo motif discovery revealed the enrichment of SOX9 and GLI binding sites in the genes preferentially expressed in proliferating and prehypertrophic chondrocytes, suggesting the potential cooperation between SOX9 and GLI proteins. We integrated the analyses of the transcriptome, SOX9, GLI1 and GLI3 ChIP-seq datasets, with functional validation by transactivation assays and mouse mutants. We identified new SOX9 targets and showed SOX9-GLI directly and cooperatively regulate many genes such as Trps1, Sox9, Sox5, Sox6, Col2a1, Ptch1, Gli1 and Gli2. Further, FOXA2 competes with SOX9 for the transactivation of target genes. The data support a model of SOX9-GLI-FOXA phasic GRN in chondrocyte development. Together, SOX9-GLI auto-regulate and cooperate to activate and repress genes in proliferating chondrocytes. Upon hypertrophy, FOXA competes with SOX9, and control toward terminal differentiation passes to FOXA, RUNX, AP1 and MEF2 factors.

In the development of the mammalian growth plate, while several transcription factors are individually well known for their key roles in regulating phases of chondrocyte differentiation, there is little information on how they interact and cooperate with each other. We took an unbiased genome wide approach to identify the transcription factors and signaling pathways that play dominant roles in the chondrocyte differentiation cascade. We developed a searchable library of differentially expressed genes, GP-DGEL, which has fine spatial resolution and global transcriptomic coverage for discovery of processes, pathways and disease candidates. Our work identifies a novel regulatory mechanism that integrates the action of three transcription factors, SOX9, GLI and FOXA. SOX9-GLI auto-regulate and cooperate to activate and repress genes in proliferating chondrocytes. Upon entry into prehypertrophy, FOXA competes with SOX9, and control of hypertrophy passes to FOXA, RUNX, AP1 and MEF2 factors.
Affiliation(s)
- Zhijia Tan
- School of Biomedical Sciences, LKS Faculty of Medicine, the University of Hong Kong, Pokfulam, Hong Kong
| | - Ben Niu
- School of Biomedical Sciences, LKS Faculty of Medicine, the University of Hong Kong, Pokfulam, Hong Kong
| | - Kwok Yeung Tsang
- School of Biomedical Sciences, LKS Faculty of Medicine, the University of Hong Kong, Pokfulam, Hong Kong
| | - Ian G. Melhado
- School of Biomedical Sciences, LKS Faculty of Medicine, the University of Hong Kong, Pokfulam, Hong Kong
| | - Shinsuke Ohba
- Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad-CIRM Center for Regenerative Medicine and Stem Cell Research, W.M. Keck School of Medicine of the University of Southern California, Los Angeles, California, United States of America
| | - Xinjun He
- Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad-CIRM Center for Regenerative Medicine and Stem Cell Research, W.M. Keck School of Medicine of the University of Southern California, Los Angeles, California, United States of America
| | - Yongheng Huang
- Genome Regulation Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, China
| | - Cheng Wang
- School of Biomedical Sciences, LKS Faculty of Medicine, the University of Hong Kong, Pokfulam, Hong Kong
| | - Andrew P. McMahon
- Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad-CIRM Center for Regenerative Medicine and Stem Cell Research, W.M. Keck School of Medicine of the University of Southern California, Los Angeles, California, United States of America
| | - Ralf Jauch
- Genome Regulation Laboratory, Guangzhou Institutes of Biomedicine and Health, Guangzhou, China
| | - Danny Chan
- School of Biomedical Sciences, LKS Faculty of Medicine, the University of Hong Kong, Pokfulam, Hong Kong
| | - Michael Q. Zhang
- Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, Dallas, Texas, United States of America
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, China
| | - Kathryn S. E. Cheah
- School of Biomedical Sciences, LKS Faculty of Medicine, the University of Hong Kong, Pokfulam, Hong Kong
| |
|
11
|
Deyneko IV, Kasnitz N, Leschner S, Weiss S. Composing a Tumor Specific Bacterial Promoter. PLoS One 2016; 11:e0155338. [PMID: 27171245 PMCID: PMC4865170 DOI: 10.1371/journal.pone.0155338] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 02/22/2016] [Accepted: 04/27/2016] [Indexed: 12/12/2022] Open
Abstract
Systemically applied Salmonella enterica spp. have been shown to invade and colonize neoplastic tissues, where they retard the growth of many tumors. This offers the possibility of using the bacteria as a vehicle for the tumor-specific delivery of therapeutic molecules. The specificity of such delivery depends solely on the promoter sequences that control the production of a target molecule. We have established the functional structure of bacterial promoters that are transcriptionally active exclusively in tumor tissues after systemic application. We observed that the specific transcriptional activation is accomplished by a combination of a weak basal promoter and a strong FNR binding site. This represents a minimal set of control elements required for such activation. In natural promoters, additional DNA remodeling elements are found that alter the level of transcription quantitatively. Inefficiency of the basal promoter ensures the absence of transcription outside tumors. As a proof of concept, we compiled an artificial promoter sequence from individual motifs representing FNR and the basal promoter and showed specific activation in a tumor microenvironment. Our results open possibilities for the generation of promoters with an adjusted level of expression of target proteins, in particular for applications in bacterial tumor therapy.
Affiliation(s)
- Igor V. Deyneko
- Molecular Immunology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Nadine Kasnitz
- Molecular Immunology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Sara Leschner
- Molecular Immunology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Siegfried Weiss
- Molecular Immunology, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Institute of Immunology, Medical School Hannover, Hannover, Germany
| |
Collapse
|
12
|
Dror I, Golan T, Levy C, Rohs R, Mandel-Gutfreund Y. A widespread role of the motif environment in transcription factor binding across diverse protein families. Genome Res 2015; 25:1268-80. [PMID: 26160164 PMCID: PMC4561487 DOI: 10.1101/gr.184671.114] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 09/21/2014] [Accepted: 07/08/2015] [Indexed: 12/12/2022]
Abstract
Transcriptional regulation requires the binding of transcription factors (TFs) to short sequence-specific DNA motifs, usually located at the gene regulatory regions. Interestingly, based on a vast amount of data accumulated from genomic assays, it has been shown that only a small fraction of all potential binding sites containing the consensus motif of a given TF actually bind the protein. Recent in vitro binding assays, which exclude the effects of the cellular environment, also demonstrate selective TF binding. An intriguing conjecture is that the surroundings of cognate binding sites have unique characteristics that distinguish them from other sequences containing a similar motif that are not bound by the TF. To test this hypothesis, we conducted a comprehensive analysis of the sequence and DNA shape features surrounding the core-binding sites of 239 and 56 TFs extracted from in vitro HT-SELEX binding assays and in vivo ChIP-seq data, respectively. Comparing the nucleotide content of the regions around the TF-bound sites to the counterpart unbound regions containing the same consensus motifs revealed significant differences that extend far beyond the core-binding site. Specifically, the environment of the bound motifs demonstrated unique sequence compositions, DNA shape features, and overall high similarity to the core-binding motif. Notably, the regions around the binding sites of TFs that belong to the same TF families exhibited similar features, with high agreement between the in vitro and in vivo data sets. We propose that these unique features assist in guiding TFs to their cognate binding sites.
Affiliation(s)
- Iris Dror
  - Faculty of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel; Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, California 90089, USA
- Tamar Golan
  - Department of Human Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Carmit Levy
  - Department of Human Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Remo Rohs
  - Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, Physics, and Computer Science, University of Southern California, Los Angeles, California 90089, USA
- Yael Mandel-Gutfreund
  - Faculty of Biology, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
|
13
|
Cook DJ, Patra B, Kuttippurathu L, Hoek JB, Vadigepalli R. A novel, dynamic pattern-based analysis of NF-κB binding during the priming phase of liver regeneration reveals switch-like functional regulation of target genes. Front Physiol 2015. [PMID: 26217230 PMCID: PMC4493398 DOI: 10.3389/fphys.2015.00189] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/01/2023] Open
Abstract
Following partial hepatectomy, a coordinated series of molecular events occurs to regulate hepatocyte entry into the cell cycle to recover lost mass. In rats during the first 6 h following resection, hepatocytes are primed by a tightly controlled cytokine response to prepare hepatocytes to begin replication. Although it appears to be a critical element driving regeneration, the cytokine response to resection has not yet been fully characterized. Specifically, the role of one of the key response elements to cytokine signaling (NF-κB) remains incompletely characterized. In this study, we present a novel, genome-wide, pattern-based analysis characterizing NF-κB binding during the priming phase of liver regeneration. We interrogated the dynamic regulation of priming by NF-κB through categorizing NF-κB binding in different temporal profiles: immediate sustained response, early transient response, and delayed response to partial hepatectomy. We then identified functional regulation of NF-κB binding by relating the temporal response profile to differential gene expression. We found that NF-κB bound genes govern negative regulation of cell growth and inflammatory response immediately following hepatectomy. NF-κB also transiently regulates genes responsible for lipid biosynthesis and transport as well as induction of apoptosis following hepatectomy. By the end of the priming phase, NF-κB regulation of genes involved in inflammatory response, negative regulation of cell death, and extracellular structure organization became prominent. These results suggest that NF-κB regulates target genes through binding and unbinding in immediate, transient, and delayed patterns. Such dynamic switch-like patterns of NF-κB binding may govern different functional transitions that drive the onset of regeneration.
Affiliation(s)
- Daniel J Cook
  - Department of Pathology, Anatomy and Cell Biology, Daniel Baugh Institute for Functional Genomics/Computational Biology, Thomas Jefferson University, Philadelphia, PA, USA; Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE, USA
- Biswanath Patra
  - Department of Pathology, Anatomy and Cell Biology, Daniel Baugh Institute for Functional Genomics/Computational Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Lakshmi Kuttippurathu
  - Department of Pathology, Anatomy and Cell Biology, Daniel Baugh Institute for Functional Genomics/Computational Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Jan B Hoek
  - Department of Pathology, Anatomy and Cell Biology, Daniel Baugh Institute for Functional Genomics/Computational Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Rajanikanth Vadigepalli
  - Department of Pathology, Anatomy and Cell Biology, Daniel Baugh Institute for Functional Genomics/Computational Biology, Thomas Jefferson University, Philadelphia, PA, USA; Department of Chemical and Biomolecular Engineering, University of Delaware, Newark, DE, USA
|
14
|
Bahrami-Samani E, Vo DT, de Araujo PR, Vogel C, Smith AD, Penalva LOF, Uren PJ. Computational challenges, tools, and resources for analyzing co- and post-transcriptional events in high throughput. WILEY INTERDISCIPLINARY REVIEWS. RNA 2015; 6:291-310. [PMID: 25515586 PMCID: PMC4397117 DOI: 10.1002/wrna.1274] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/16/2014] [Revised: 10/24/2014] [Accepted: 10/29/2014] [Indexed: 11/10/2022]
Abstract
Co- and post-transcriptional regulation of gene expression is complex and multifaceted, spanning the complete RNA lifecycle from genesis to decay. High-throughput profiling of the constituent events and processes is achieved through a range of technologies that continue to expand and evolve. Fully leveraging the resulting data is nontrivial, and requires the use of computational methods and tools carefully crafted for specific data sources and often intended to probe particular biological processes. Drawing upon databases of information pre-compiled by other researchers can further elevate analyses. Within this review, we describe the major co- and post-transcriptional events in the RNA lifecycle that are amenable to high-throughput profiling. We place specific emphasis on the analysis of the resulting data, in particular the computational tools and resources available, as well as looking toward future challenges that remain to be addressed.
Affiliation(s)
- Emad Bahrami-Samani
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA
- Dat T. Vo
  - Children’s Cancer Research Institute and Department of Cellular and Structural Biology, University of Texas Health Science Center, San Antonio, TX
- Patricia Rosa de Araujo
  - Children’s Cancer Research Institute and Department of Cellular and Structural Biology, University of Texas Health Science Center, San Antonio, TX
- Christine Vogel
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY
- Andrew D. Smith
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA
- Luiz O. F. Penalva
  - Children’s Cancer Research Institute and Department of Cellular and Structural Biology, University of Texas Health Science Center, San Antonio, TX
- Philip J. Uren
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA
|
15
|
Dogan N, Wu W, Morrissey CS, Chen KB, Stonestrom A, Long M, Keller CA, Cheng Y, Jain D, Visel A, Pennacchio LA, Weiss MJ, Blobel GA, Hardison RC. Occupancy by key transcription factors is a more accurate predictor of enhancer activity than histone modifications or chromatin accessibility. Epigenetics Chromatin 2015; 8:16. [PMID: 25984238 PMCID: PMC4432502 DOI: 10.1186/s13072-015-0009-5] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Received: 01/23/2015] [Accepted: 04/02/2015] [Indexed: 12/12/2022] Open
Abstract
Background Regulated gene expression controls organismal development, and variation in regulatory patterns has been implicated in complex traits. Thus accurate prediction of enhancers is important for further understanding of these processes. Genome-wide measurement of epigenetic features, such as histone modifications and occupancy by transcription factors, is improving enhancer predictions, but the contribution of these features to prediction accuracy is not known. Given the importance of the hematopoietic transcription factor TAL1 for erythroid gene activation, we predicted candidate enhancers based on genomic occupancy by TAL1 and measured their activity. Contributions of multiple features to enhancer prediction were evaluated based on the results of these and other studies. Results TAL1-bound DNA segments were active enhancers at a high rate both in transient transfections of cultured cells (39 of 79, or 56%) and transgenic mice (43 of 66, or 65%). The level of binding signal for TAL1 or GATA1 did not help distinguish TAL1-bound DNA segments as active versus inactive enhancers, nor did the density of regulation-related histone modifications. A meta-analysis of results from this and other studies (273 tested predicted enhancers) showed that the presence of TAL1, GATA1, EP300, SMAD1, H3K4 methylation, H3K27ac, and CAGE tags at DNase hypersensitive sites gave the most accurate predictors of enhancer activity, with a success rate over 80% and a median threefold increase in activity. Chromatin accessibility assays and the histone modifications H3K4me1 and H3K27ac were sensitive for finding enhancers, but they have high false positive rates unless transcription factor occupancy is also included. Conclusions Occupancy by key transcription factors such as TAL1, GATA1, SMAD1, and EP300, along with evidence of transcription, improves the accuracy of enhancer predictions based on epigenetic features. 
Affiliation(s)
- Nergiz Dogan
  - Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA 16802 USA
- Weisheng Wu
  - Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA 16802 USA; Bioinformatics Core, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218 USA
- Christapher S Morrissey
  - Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA 16802 USA
- Kuan-Bei Chen
  - Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA 16802 USA
- Aaron Stonestrom
  - Division of Hematology, The Children's Hospital of Philadelphia, 3401 Civic Center Boulevard, Philadelphia, PA 19104 USA; Perelman School of Medicine at the University of Pennsylvania, 415 Curie Boulevard, Philadelphia, PA 19104 USA
- Maria Long
  - Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA 16802 USA
- Cheryl A Keller
  - Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA 16802 USA
- Yong Cheng
  - Department of Genetics, Mail Stop-5120, Stanford University, Stanford, CA 94305 USA
- Deepti Jain
  - Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA 16802 USA
- Axel Visel
  - Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 84-171, Berkeley, CA 94720 USA; DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598 USA
- Len A Pennacchio
  - Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 84-171, Berkeley, CA 94720 USA; DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598 USA
- Mitchell J Weiss
  - Department of Hematology, St Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105 USA
- Gerd A Blobel
  - Division of Hematology, The Children's Hospital of Philadelphia, 3401 Civic Center Boulevard, Philadelphia, PA 19104 USA; Perelman School of Medicine at the University of Pennsylvania, 415 Curie Boulevard, Philadelphia, PA 19104 USA
- Ross C Hardison
  - Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 304 Wartik Laboratory, University Park, PA 16802 USA
|
16
|
Liu X, Wu J, Gu F, Wang J, He Z. Discriminative pattern mining and its applications in bioinformatics. Brief Bioinform 2014; 16:884-900. [PMID: 25433466 DOI: 10.1093/bib/bbu042] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/16/2014] [Indexed: 11/13/2022] Open
Abstract
Discriminative pattern mining is one of the most important techniques in data mining. This challenging task is concerned with finding a set of patterns that occur with disproportionate frequency in data sets with various class labels. Such patterns are of great value for group difference detection and classifier construction. Research on finding interesting discriminative patterns in class-labeled data is evolving rapidly, and many algorithms have been proposed to address this problem specifically. Discriminative pattern mining techniques have proven their considerable value in biological data analysis. Archetypal applications in bioinformatics include phosphorylation motif discovery, differentially expressed gene identification, and discriminative genotype pattern detection. In this article, we present an overview of discriminative pattern mining and the corresponding effective methods, and subsequently illustrate their applications to bioinformatics problems. Finally, we give a general discussion of potential challenges and future work for this task.
|
17
|
Pujato M, Kieken F, Skiles AA, Tapinos N, Fiser A. Prediction of DNA binding motifs from 3D models of transcription factors; identifying TLX3 regulated genes. Nucleic Acids Res 2014; 42:13500-12. [PMID: 25428367 PMCID: PMC4267649 DOI: 10.1093/nar/gku1228] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Indexed: 12/14/2022] Open
Abstract
Proper cell functioning depends on the precise spatio-temporal expression of its genetic material. Gene expression is controlled to a great extent by sequence-specific transcription factors (TFs). Our current knowledge on where and how TFs bind and associate to regulate gene expression is incomplete. A structure-based computational algorithm (TF2DNA) is developed to identify binding specificities of TFs. The method constructs homology models of TFs bound to DNA and assesses the relative binding affinity for all possible DNA sequences using a knowledge-based potential, after optimization in a molecular mechanics force field. TF2DNA predictions were benchmarked against experimentally determined binding motifs. Success rates range from 45% to 81% and primarily depend on the sequence identity of aligned target sequences and template structures. TF2DNA was used to predict 1321 motifs for 1825 putative human TF proteins, facilitating the reconstruction of most of the human gene regulatory network. As an illustration, the predicted DNA binding site for the poorly characterized T-cell leukemia homeobox 3 (TLX3) TF was confirmed with gel shift assay experiments. TLX3 motif searches in human promoter regions identified a group of genes enriched in functions relating to hematopoiesis, tissue morphology, endocrine system and connective tissue development and function.
Affiliation(s)
- Mario Pujato
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA; Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA
- Fabien Kieken
  - Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA; Macromolecular Therapeutics Development, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA
- Amanda A Skiles
  - Molecular Neuroscience Laboratory, Geisinger Clinic, 100 North Academy Avenue, Danville, PA 17822, USA
- Nikos Tapinos
  - Molecular Neuroscience Laboratory, Geisinger Clinic, 100 North Academy Avenue, Danville, PA 17822, USA
- Andras Fiser
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA; Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA
|
18
|
Maaskola J, Rajewsky N. Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models. Nucleic Acids Res 2014; 42:12995-3011. [PMID: 25389269 PMCID: PMC4245949 DOI: 10.1093/nar/gku1083] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022] Open
Abstract
We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several objective functions, but we concentrate on mutual information of condition and motif occurrence. We perform a systematic comparison of our method and numerous published motif-finding tools. Our method achieves the highest motif discovery performance, while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be splicing-relevant. The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments and, aside from binary contrasts, can also utilize more complex data configurations.
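The objective named in this abstract, mutual information between condition (positive or negative set) and motif occurrence, can be illustrated with a small sketch. This is not the Discrover implementation: the abstract describes hidden Markov models, whereas the hypothetical helpers below (`mutual_information`, `motif_mi`) assume a motif is a fixed word matched by exact substring search, purely to show how the score is computed from a 2x2 contingency table.

```python
import math


def mutual_information(pos_with, pos_total, neg_with, neg_total):
    """Mutual information (bits) between set label and motif presence.

    The four counts define a 2x2 contingency table:
    rows = positive / negative set, columns = motif present / absent.
    """
    table = [
        [pos_with, pos_total - pos_with],
        [neg_with, neg_total - neg_with],
    ]
    n = pos_total + neg_total
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            count = table[i][j]
            if count == 0:
                continue  # a zero cell contributes nothing to the sum
            p_joint = count / n
            # p_joint / (p_row * p_col) simplifies to count * n / (row * col)
            mi += p_joint * math.log2(count * n / (row[i] * col[j]))
    return mi


def motif_mi(positives, negatives, word):
    """Score a candidate word by exact substring occurrence (an assumption)."""
    pos_with = sum(word in seq for seq in positives)
    neg_with = sum(word in seq for seq in negatives)
    return mutual_information(pos_with, len(positives), neg_with, len(negatives))
```

A word found in every positive sequence and no negative sequence of two equally sized sets scores 1 bit; a word that occurs equally often in both sets scores 0.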
Affiliation(s)
- Jonas Maaskola
  - Laboratory for Systems Biology of Gene Regulatory Elements, Max-Delbrück-Center for Molecular Medicine, Robert-Rössle-Strasse 10, Berlin-Buch 13125, Germany
- Nikolaus Rajewsky
  - Laboratory for Systems Biology of Gene Regulatory Elements, Max-Delbrück-Center for Molecular Medicine, Robert-Rössle-Strasse 10, Berlin-Buch 13125, Germany
|
19
|
Leung A, Parks BW, Du J, Trac C, Setten R, Chen Y, Brown K, Lusis AJ, Natarajan R, Schones DE. Open chromatin profiling in mice livers reveals unique chromatin variations induced by high fat diet. J Biol Chem 2014; 289:23557-67. [PMID: 25006255 DOI: 10.1074/jbc.m114.581439] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Indexed: 12/20/2022] Open
Abstract
Metabolic diseases result from multiple genetic and environmental factors. We report here that one manner in which environmental factors can contribute to metabolic disease progression is through modification to chromatin. We demonstrate that high fat diet leads to chromatin remodeling in the livers of C57BL/6J mice, as compared with mice fed a control diet, and that these chromatin changes are associated with changes in gene expression. We further show that the regions of greatest variation in chromatin accessibility are targeted by liver transcription factors, including HNF4α, CCAAT/enhancer-binding protein α (CEBP/α), and FOXA1. Repeating the chromatin and gene expression profiling in another mouse strain, DBA/2J, revealed that the regions of greatest chromatin change are largely strain-specific and that integration of chromatin, gene expression, and genetic data can be used to characterize regulatory regions. Our data indicate dramatic changes in the epigenome due to diet and demonstrate strain-specific dynamics in chromatin remodeling.
Affiliation(s)
- Amy Leung
  - From the Departments of Diabetes and
- Brian W Parks
  - the Department of Medicine, UCLA, Los Angeles, California 90095
- Juan Du
  - Cancer Biology, Beckman Research Institute and the Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, California 91010 and
- Candi Trac
  - Cancer Biology, Beckman Research Institute and
- Ryan Setten
  - the Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, California 91010 and
- Yin Chen
  - the Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, California 91010 and
- Kevin Brown
  - Cancer Biology, Beckman Research Institute and
- Aldons J Lusis
  - the Department of Medicine, UCLA, Los Angeles, California 90095
- Rama Natarajan
  - From the Departments of Diabetes and the Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, California 91010 and
- Dustin E Schones
  - Cancer Biology, Beckman Research Institute and the Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, California 91010 and
|
20
|
Sequence signatures of genes with accompanying antisense transcripts in Saccharomyces cerevisiae. SCIENCE CHINA-LIFE SCIENCES 2013; 57:52-8. [PMID: 24369357 DOI: 10.1007/s11427-013-4597-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/14/2013] [Accepted: 11/22/2013] [Indexed: 10/25/2022]
Abstract
Recent studies have found many antisense non-coding transcripts on the opposite strand of some protein-coding genes. In yeast, it was reported that such antisense transcripts play regulatory roles for their partner genes by forming a feedback loop with the protein-coding genes. Since not all coding genes have accompanying antisense transcripts, it would be interesting to know whether there are sequence signatures in a coding gene that are decisive for or associated with the existence of such antisense partners. We collected all the annotated antisense transcripts in the yeast Saccharomyces cerevisiae, analyzed sequence motifs around the genes with antisense partners, and classified genes with and without accompanying antisense transcripts by using machine learning methods. Some weak but statistically significant sequence features are detected, which indicates that there are sequence signatures around the protein-coding genes that may be decisive or indicative for the existence of accompanying antisense transcripts.
|
21
|
Abstract
MOTIVATION Generating accurate transcription factor (TF) binding site motifs from data generated using next-generation sequencing, especially ChIP-seq, is challenging. The challenge arises because a typical experiment reports a large number of sequences bound by a TF, and each sequence is relatively long. Most traditional motif finders are slow in handling such enormous amounts of data. To overcome this limitation, tools have been developed that trade accuracy for speed by using heuristic discrete search strategies or limited optimization of identified seed motifs. However, such strategies may not fully use the information in input sequences to generate motifs. Such motifs often form good seeds and can be further improved with appropriate scoring functions and rapid optimization. RESULTS We report a tool named discriminative motif optimizer (DiMO). DiMO takes a seed motif along with a positive and a negative database and improves the motif based on a discriminative strategy. We use area under the receiver-operating characteristic curve (AUC) as a measure of the discriminating power of motifs and a strategy based on perceptron training that maximizes AUC rapidly in a discriminative manner. Using DiMO, on a large test set of 87 TFs from human, drosophila and yeast, we show that it is possible to significantly improve motifs identified by nine motif finders. The motifs are generated/optimized using training sets and evaluated on test sets. The AUC is improved for almost 90% of the TFs on test sets and the magnitude of increase is up to 39%. AVAILABILITY AND IMPLEMENTATION DiMO is available at http://stormo.wustl.edu/DiMO
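The AUC measure used in this abstract to rate a motif's discriminating power can be sketched as follows. This illustrates only the evaluation metric, not DiMO's perceptron-based optimization; the helper names (`best_pwm_score`, `auc`), the scoring of a sequence by its best-matching window, and the list-of-dicts weight matrix layout are all assumptions made for the example.

```python
def best_pwm_score(seq, pwm):
    """Highest additive score of any window of len(pwm) in seq.

    pwm is a list of dicts mapping each base to a weight, one dict per
    motif position (a hypothetical layout chosen for this sketch).
    """
    width = len(pwm)
    if len(seq) < width:
        return float("-inf")  # sequence too short to contain the motif
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative.

    Computed over all positive/negative pairs; ties count one half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def motif_auc(positives, negatives, pwm):
    """AUC of a motif over a positive and a negative sequence set."""
    pos_scores = [best_pwm_score(s, pwm) for s in positives]
    neg_scores = [best_pwm_score(s, pwm) for s in negatives]
    return auc(pos_scores, neg_scores)
```

An AUC of 1.0 means every positive sequence scores above every negative one, while 0.5 means the motif separates the two sets no better than chance. The pairwise loop is quadratic in the number of sequences; a rank-based computation would be preferable for large sets.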
Affiliation(s)
- Ronak Y Patel
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
|
22
|
Chen CC, Xiao S, Xie D, Cao X, Song CX, Wang T, He C, Zhong S. Understanding variation in transcription factor binding by modeling transcription factor genome-epigenome interactions. PLoS Comput Biol 2013; 9:e1003367. [PMID: 24339764 PMCID: PMC3854512 DOI: 10.1371/journal.pcbi.1003367] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 09/26/2012] [Accepted: 10/15/2013] [Indexed: 12/20/2022] Open
Abstract
Despite explosive growth in genomic datasets, the methods for studying epigenomic mechanisms of gene regulation remain primitive. Here we present a model-based approach to systematically analyze the epigenomic functions in modulating transcription factor-DNA binding. Based on the first principles of statistical mechanics, this model considers the interactions between epigenomic modifications and a cis-regulatory module, which contains multiple binding sites arranged in any configurations. We compiled a comprehensive epigenomic dataset in mouse embryonic stem (mES) cells, including DNA methylation (MeDIP-seq and MRE-seq), DNA hydroxymethylation (5-hmC-seq), and histone modifications (ChIP-seq). We discovered correlations of transcription factors (TFs) for specific combinations of epigenomic modifications, which we term epigenomic motifs. Epigenomic motifs explained why some TFs appeared to have different DNA binding motifs derived from in vivo (ChIP-seq) and in vitro experiments. Theoretical analyses suggested that the epigenome can modulate transcriptional noise and boost the cooperativity of weak TF binding sites. ChIP-seq data suggested that epigenomic boost of binding affinities in weak TF binding sites can function in mES cells. We showed in theory that the epigenome should suppress the TF binding differences on SNP-containing binding sites in two people. Using personal data, we identified strong associations between H3K4me2/H3K9ac and the degree of personal differences in NFκB binding in SNP-containing binding sites, which may explain why some SNPs introduce much smaller personal variations on TF binding than other SNPs. In summary, this model presents a powerful approach to analyze the functions of epigenomic modifications. This model was implemented into an open source program APEG (Affinity Prediction by Epigenome and Genome, http://systemsbio.ucsd.edu/apeg).
Affiliation(s)
- Chieh-Chun Chen
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Shu Xiao
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Dan Xie
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Xiaoyi Cao
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Chun-Xiao Song
  - Department of Chemistry, University of Chicago, Chicago, Illinois, United States of America
- Ting Wang
  - Department of Genetics, Washington University in St. Louis, St. Louis, Missouri, United States of America
- Chuan He
  - Department of Chemistry, University of Chicago, Chicago, Illinois, United States of America
- Sheng Zhong
  - Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
|
23
|
Kumari S, Ware D. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots. PLoS One 2013; 8:e79011. [PMID: 24205361 PMCID: PMC3812177 DOI: 10.1371/journal.pone.0079011] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 02/11/2013] [Accepted: 09/18/2013] [Indexed: 01/22/2023] Open
Abstract
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the computational prediction of CPEs across eight plant genomes to help better understand the transcription initiation complex assembly. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to transcription start site (TSS) exhibited positional conservation within monocots and dicots with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from that of the non-regulatory genome sequence. It also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed to better understand regulatory mechanisms of transcription.
Affiliation(s)
- Sunita Kumari
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Doreen Ware
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
  - United States Department of Agriculture-Agriculture Research Service, Robert W. Holley Center for Agriculture and Health, Ithaca, New York, United States of America
|
24
|
Grau J, Posch S, Grosse I, Keilwagen J. A general approach for discriminative de novo motif discovery from high-throughput data. Nucleic Acids Res 2013; 41:e197. [PMID: 24057214 PMCID: PMC3834837 DOI: 10.1093/nar/gkt831] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.
Affiliation(s)
- Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, D-06099 Halle, Saale, Germany, Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, D-06484 Quedlinburg, Germany and Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Seeland OT Gatersleben, Germany
|
25
|
Genome-wide signatures of transcription factor activity: connecting transcription factors, disease, and small molecules. PLoS Comput Biol 2013; 9:e1003198. [PMID: 24039560 PMCID: PMC3764016 DOI: 10.1371/journal.pcbi.1003198] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 11/27/2012] [Accepted: 07/11/2013] [Indexed: 11/19/2022]
Abstract
Identifying transcription factors (TFs) involved in producing a genome-wide transcriptional profile is an essential step in building a mechanistic model that can explain observed gene expression data. We developed a statistical framework for constructing genome-wide signatures of TF activity, and for using such signatures in the analysis of gene expression data produced by complex transcriptional regulatory programs. Our framework integrates ChIP-seq data and appropriately matched gene expression profiles to identify True REGulatory (TREG) TF-gene interactions. It provides genome-wide quantification of the likelihood of regulatory TF-gene interaction that can be used either to identify regulated genes or as a genome-wide signature of TF activity. To effectively use ChIP-seq data, we introduce a novel statistical model that integrates information from all binding "peaks" within a 2 Mb window around a gene's transcription start site (TSS), and provides gene-level binding scores and probabilities of regulatory interaction. In the second step we integrate these binding scores and regulatory probabilities with gene expression data to assess the likelihood of True REGulatory (TREG) TF-gene interactions. We demonstrate the advantages of the TREG framework in identifying genes regulated by two TFs with widely different distributions of functional binding events (ERα and E2f1). We also show that TREG signatures of TF activity vastly improve our ability to detect involvement of ERα in producing complex disease-related transcriptional profiles. Through a large study of disease-related transcriptional signatures and transcriptional signatures of drug activity, we demonstrate that the increase in statistical power associated with the use of TREG signatures makes the crucial difference in identifying key targets for treatment, and drugs to use for treatment. All methods are implemented in an open-source R package treg.
The package also contains all data used in the analysis including 494 TREG binding profiles based on ENCODE ChIP-seq data. The treg package can be downloaded at http://GenomicsPortals.org.
|
26
|
Lim JH, Iggo RD, Barker D. Models incorporating chromatin modification data identify functionally important p53 binding sites. Nucleic Acids Res 2013; 41:5582-93. [PMID: 23599002 PMCID: PMC3675478 DOI: 10.1093/nar/gkt260] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
Genome-wide prediction of transcription factor binding sites is notoriously difficult. We have developed and applied a logistic regression approach for prediction of binding sites for the p53 transcription factor that incorporates sequence information and chromatin modification data. We tested this by comparison of predicted sites with known binding sites defined by chromatin immunoprecipitation (ChIP), by the location of predictions relative to genes, by the function of nearby genes and by analysis of gene expression data after p53 activation. We compared the predictions made by our novel model with predictions based only on matches to a sequence position weight matrix (PWM). In whole genome assays, the fraction of known sites identified by the two models was similar, suggesting that there was little to be gained from including chromatin modification data. In contrast, there were highly significant and biologically relevant differences between the two models in the location of the predicted binding sites relative to genes, in the function of nearby genes and in the responsiveness of nearby genes to p53 activation. We propose that these contradictory results can be explained by PWM and ChIP data reflecting primarily biophysical properties of protein–DNA interactions, whereas chromatin modification data capture biologically important functional information.
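As a rough illustration of the kind of model this abstract describes, and not the authors' implementation, the sketch below scores a candidate site with a log-odds position weight matrix and passes that score, together with one chromatin feature, through a logistic function. The toy 4-position matrix, the uniform background, the single `chromatin_signal` input, and the weights `w_seq`, `w_chrom`, and `bias` are all hypothetical values chosen for the example; a real model would fit the weights to ChIP-defined sites.

```python
import math

# Hypothetical 4-position PWM of per-base probabilities (toy values, not a p53 matrix).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_log_odds(site):
    """Return the sum over positions of log2(p_base / background)."""
    if len(site) != len(PWM):
        raise ValueError("site length must match PWM length")
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, site))


def binding_probability(site, chromatin_signal, w_seq=1.0, w_chrom=1.0, bias=-4.0):
    """Logistic combination of sequence score and one chromatin feature.

    The weights and bias are illustrative placeholders, not fitted parameters.
    """
    z = bias + w_seq * pwm_log_odds(site) + w_chrom * chromatin_signal
    return 1.0 / (1.0 + math.exp(-z))
```

With these placeholder weights, a site that matches the matrix gains probability as the chromatin feature increases, which is the qualitative behaviour one would expect from combining the two evidence types.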
Affiliation(s)
- Ji-Hyun Lim
- Sir Harold Mitchell Building, School of Biology, University of St Andrews, St Andrews, Fife, KY16 9TH, UK
|
27
|
GRAU JAN, KEILWAGEN JENS, GOHR ANDRÉ, PAPONOV IVAN A, POSCH STEFAN, SEIFERT MICHAEL, STRICKERT MARC, GROSSE IVO. DISPOM: A DISCRIMINATIVE DE-NOVO MOTIF DISCOVERY TOOL BASED ON THE JSTACS LIBRARY. J Bioinform Comput Biol 2013; 11:1340006. [DOI: 10.1142/s0219720013400064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
DNA-binding proteins are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in target regions of genomic DNA. However, de-novo discovery of these binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not yet been solved satisfactorily. Here, we present a detailed description and analysis of the de-novo motif discovery tool Dispom, which has been developed for finding binding sites of DNA-binding proteins that are differentially abundant in a set of target regions compared to a set of control regions. Two additional features of Dispom are its capability of modeling positional preferences of binding sites and adjusting the length of the motif in the learning process. Dispom yields an increased prediction accuracy compared to existing tools for de-novo motif discovery, suggesting that the combination of searching for differentially abundant motifs, inferring their positional distributions, and adjusting the motif lengths is beneficial for de-novo motif discovery. When applying Dispom to promoters of auxin-responsive genes and those of ABI3 target genes from Arabidopsis thaliana, we identify relevant binding motifs with pronounced positional distributions. These results suggest that learning motifs, their positional distributions, and their lengths by a discriminative learning principle may aid motif discovery from ChIP-chip and gene expression data. We make Dispom freely available as part of Jstacs, an open-source Java library that is tailored to statistical sequence analysis. To facilitate extensions of Dispom, we describe its implementation using Jstacs in this manuscript. In addition, we provide a stand-alone application of Dispom at http://www.jstacs.de/index.php/Dispom for instant use.
Affiliation(s)
- JAN GRAU
  - Institute of Computer Science, Martin Luther University Halle–Wittenberg, D-06099 Halle/Saale, Germany
- JENS KEILWAGEN
  - Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany
  - Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, D-06484 Quedlinburg, Germany
- ANDRÉ GOHR
  - Institute of Computer Science, Martin Luther University Halle–Wittenberg, D-06099 Halle/Saale, Germany
- IVAN A. PAPONOV
  - Institute of Biology II / Botany, Faculty of Biology, Albert–Ludwigs–University Freiburg, D-79104 Freiburg, Germany
- STEFAN POSCH
  - Institute of Computer Science, Martin Luther University Halle–Wittenberg, D-06099 Halle/Saale, Germany
- MICHAEL SEIFERT
  - Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany
- MARC STRICKERT
  - Center for Synthetic Microbiology, SYNMIKRO, Philipps-Universität Marburg, Germany
- IVO GROSSE
  - Institute of Computer Science, Martin Luther University Halle–Wittenberg, D-06099 Halle/Saale, Germany
|
28
|
Natarajan A, Yardimci GG, Sheffield NC, Crawford GE, Ohler U. Predicting cell-type-specific gene expression from regions of open chromatin. Genome Res 2013; 22:1711-22. [PMID: 22955983 PMCID: PMC3431488 DOI: 10.1101/gr.135129.111] [Citation(s) in RCA: 172] [Impact Index Per Article: 14.3] [Indexed: 01/17/2023]
Abstract
Complex patterns of cell-type-specific gene expression are thought to be achieved by combinatorial binding of transcription factors (TFs) to sequence elements in regulatory regions. Predicting cell-type-specific expression in mammals has been hindered by the oftentimes unknown location of distal regulatory regions. To alleviate this bottleneck, we used DNase-seq data from 19 diverse human cell types to identify proximal and distal regulatory elements at genome-wide scale. Matched expression data allowed us to separate genes into classes of cell-type-specific up-regulated, down-regulated, and constitutively expressed genes. CG dinucleotide content and DNA accessibility in the promoters of these three classes of genes displayed substantial differences, highlighting the importance of including these aspects in modeling gene expression. We associated DNase I hypersensitive sites (DHSs) with genes, and trained classifiers for different expression patterns. TF sequence motif matches in DHSs provided a strong performance improvement in predicting gene expression over the typical baseline approach of using proximal promoter sequences. In particular, we achieved competitive performance when discriminating up-regulated genes from different cell types or genes up- and down-regulated under the same conditions. We identified previously known and new candidate cell-type-specific regulators. The models generated testable predictions of activating or repressive functions of regulators. DNase I footprints for these regulators were indicative of their direct binding to DNA. In summary, we successfully used information of open chromatin obtained by a single assay, DNase-seq, to address the problem of predicting cell-type-specific gene expression in mammalian organisms directly from regulatory sequence.
Affiliation(s)
- Anirudh Natarajan
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27708, USA
|
29
|
Arvey A, Agius P, Noble WS, Leslie C. Sequence and chromatin determinants of cell-type-specific transcription factor binding. Genome Res 2013; 22:1723-34. [PMID: 22955984 PMCID: PMC3431489 DOI: 10.1101/gr.127712.111] [Citation(s) in RCA: 160] [Impact Index Per Article: 13.3] [Indexed: 11/25/2022]
Abstract
Gene regulatory programs in distinct cell types are maintained in large part through the cell-type–specific binding of transcription factors (TFs). The determinants of TF binding include direct DNA sequence preferences, DNA sequence preferences of cofactors, and the local cell-dependent chromatin context. To explore the contribution of DNA sequence signal, histone modifications, and DNase accessibility to cell-type–specific binding, we analyzed 286 ChIP-seq experiments performed by the ENCODE Consortium. This analysis included experiments for 67 transcriptional regulators, 15 of which were profiled in both the GM12878 (lymphoblastoid) and K562 (erythroleukemic) human hematopoietic cell lines. To model TF-bound regions, we trained support vector machines (SVMs) that use flexible k-mer patterns to capture DNA sequence signals more accurately than traditional motif approaches. In addition, we trained SVM spatial chromatin signatures to model local histone modifications and DNase accessibility, obtaining significantly more accurate TF occupancy predictions than simpler approaches. Consistent with previous studies, we find that DNase accessibility can explain cell-line–specific binding for many factors. However, we also find that of the 10 factors with prominent cell-type–specific binding patterns, four display distinct cell-type–specific DNA sequence preferences according to our models. Moreover, for two factors we identify cell-specific binding sites that are accessible in both cell types but bound only in one. For these sites, cell-type–specific sequence models, rather than DNase accessibility, are better able to explain differential binding. Our results suggest that using a single motif for each TF and filtering for chromatin accessible loci is not always sufficient to accurately account for cell-type–specific binding profiles.
Affiliation(s)
- Aaron Arvey
- Computational Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA
|
30
|
Abstract
A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify “motifs” that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery–searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA “background” sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are “too null,” resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where “ground truth” is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced “over-fitting” in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary. 
An implementation of the LR and ALR algorithms is available at http://code.google.com/p/likelihood-ratio-motifs/.
Affiliation(s)
- David Simcha
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America.
|
31
|
Uren PJ, Bahrami-Samani E, Burns SC, Qiao M, Karginov FV, Hodges E, Hannon GJ, Sanford JR, Penalva LOF, Smith AD. Site identification in high-throughput RNA-protein interaction data. Bioinformatics 2012; 28:3013-20. [PMID: 23024010 DOI: 10.1093/bioinformatics/bts569] [Citation(s) in RCA: 239] [Impact Index Per Article: 18.4] [Indexed: 12/14/2022]
Abstract
MOTIVATION Post-transcriptional and co-transcriptional regulation is a crucial link between genotype and phenotype. The central players are the RNA-binding proteins, and experimental technologies [such as cross-linking with immunoprecipitation- (CLIP-) and RIP-seq] for probing their activities have advanced rapidly over the course of the past decade. Statistically robust, flexible computational methods for binding site identification from high-throughput immunoprecipitation assays are largely lacking however. RESULTS We introduce a method for site identification which provides four key advantages over previous methods: (i) it can be applied on all variations of CLIP and RIP-seq technologies, (ii) it accurately models the underlying read-count distributions, (iii) it allows external covariates, such as transcript abundance (which we demonstrate is highly correlated with read count) to inform the site identification process and (iv) it allows for direct comparison of site usage across cell types or conditions. AVAILABILITY AND IMPLEMENTATION We have implemented our method in a software tool called Piranha. Source code and binaries, licensed under the GNU General Public License (version 3) are freely available for download from http://smithlab.usc.edu. CONTACT andrewds@usc.edu SUPPLEMENTARY INFORMATION Supplementary data available at Bioinformatics online.
Affiliation(s)
- Philip J Uren
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
|
32
|
Sleumer MC, Wei G, Wang Y, Chang H, Xu T, Chen R, Zhang MQ. Regulatory elements of Caenorhabditis elegans ribosomal protein genes. BMC Genomics 2012; 13:433. [PMID: 22928635 PMCID: PMC3575287 DOI: 10.1186/1471-2164-13-433] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 03/22/2012] [Accepted: 08/17/2012] [Indexed: 01/16/2023]
Abstract
Background Ribosomal protein genes (RPGs) are essential, tightly regulated, and highly expressed during embryonic development and cell growth. Even though their protein sequences are strongly conserved, their mechanism of regulation is not conserved across yeast, Drosophila, and vertebrates. A recent investigation of genomic sequences conserved across both nematode species and associated with different gene groups indicated the existence of several elements in the upstream regions of C. elegans RPGs, providing a new insight regarding the regulation of these genes in C. elegans. Results In this study, we performed an in-depth examination of C. elegans RPG regulation and found nine highly conserved motifs in the upstream regions of C. elegans RPGs using the motif discovery algorithm DME. Four motifs were partially similar to transcription factor binding sites from C. elegans, Drosophila, yeast, and human. One pair of these motifs was found to co-occur in the upstream regions of 250 transcripts including 22 RPGs. The distance between the two motifs displayed a complex frequency pattern that was related to their relative orientation. We tested the impact of three of these motifs on the expression of rpl-2 using a series of reporter gene constructs and showed that all three motifs are necessary to maintain the high natural expression level of this gene. One of the motifs was similar to the binding site of an orthologue of POP-1, and we showed that RNAi knockdown of pop-1 impacts the expression of rpl-2. We further determined the transcription start site of rpl-2 by 5’ RACE and found that the motifs lie 40–90 bases upstream of the start site. We also found evidence that a noncoding RNA, contained within the outron of rpl-2, is co-transcribed with rpl-2 and cleaved during trans-splicing. Conclusions Our results indicate that C. elegans RPGs are regulated by a complex novel series of regulatory elements that is evolutionarily distinct from those of all other species examined up until now.
Affiliation(s)
- Monica C Sleumer
- Bioinformatics Division, Center for Synthetic and Systems Biology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
|
33
|
Deyneko IV, Weiss S, Leschner S. An integrative computational approach to effectively guide experimental identification of regulatory elements in promoters. BMC Bioinformatics 2012; 13:202. [PMID: 22897887 PMCID: PMC3465240 DOI: 10.1186/1471-2105-13-202] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/23/2012] [Accepted: 08/01/2012] [Indexed: 01/22/2023]
Abstract
Background Transcriptional activity of genes depends on many factors like DNA motifs, conformational characteristics of DNA, melting etc. and there are computational approaches for their identification. However, in real applications, the number of predicted, for example, DNA motifs may be considerably large. In cases when various computational programs are applied, systematic experimental knock out of each of the potential elements obviously becomes nonproductive. Hence, one needs an approach that is able to integrate many heterogeneous computational methods and upon that suggest selected regulatory elements for experimental verification. Results Here, we present an integrative bioinformatic approach aimed at the discovery of regulatory modules that can be effectively verified experimentally. It is based on combinatorial analysis of known and novel binding motifs, as well as of any other known features of promoters. The goal of this method is the identification of a collection of modules that are specific for an established dataset and at the same time are optimal for experimental verification. The method is particularly effective on small datasets, where most statistical approaches fail. We apply it to promoters that drive tumor-specific gene expression in tumor-colonizing Gram-negative bacteria. The method successfully identified a number of potential modules, which required only a few experiments to be verified. The resulting minimal functional bacterial promoter exhibited high specificity of expression in cancerous tissue. Conclusions Experimental analysis of promoter structures guided by bioinformatics has proved to be efficient. The developed computational method is able to include heterogeneous features of promoters and suggest combinatorial modules for experimental testing. Expansibility and robustness of the methodology implemented in the approach ensures good results for a wide range of problems.
Affiliation(s)
- Igor V Deyneko
- Molecular Immunology, Helmholtz Centre for Infection Research, Inhoffenstr, 7, 38124 Braunschweig, Germany.
|
34
|
Nakaki R, Kang J, Tateno M. A novel ab initio identification system of transcriptional regulation motifs in genome DNA sequences based on direct comparison scheme of signal/noise distributions. Nucleic Acids Res 2012; 40:8835-48. [PMID: 22798493 PMCID: PMC3467046 DOI: 10.1093/nar/gks642] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
Abstract
A novel ab initio parameter-tuning-free system to identify transcriptional factor (TF) binding motifs (TFBMs) in genome DNA sequences was developed. It is based on the comparison of two types of frequency distributions with respect to the TFBM candidates in the target DNA sequences and the non-candidates in the background sequence, with the latter generated by utilizing the intergenic sequences. For benchmark tests, we used DNA sequence datasets extracted by ChIP-on-chip and ChIP-seq techniques and identified 65 yeast and four mammalian TFBMs, with the latter including gaps. The accuracy of our system was compared with those of other available programs (i.e. MEME, Weeder, BioProspector, MDscan and DME) and was the best among them, even without tuning of the parameter set for each TFBM and pre-treatment/editing of the target DNA sequences. Moreover, with respect to some TFs for which the identified motifs are inconsistent with those in the references, our results were revealed to be correct, by comparing them with other existing experimental data. Thus, our identification system does not need any other biological information except for gene positions, and is also expected to be applicable to genome DNA sequences to identify unknown TFBMs as well as known ones.
Affiliation(s)
- Ryo Nakaki
- Graduate School of Pure Applied Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba Science City, Ibaraki 305-8577, Japan
|
35
|
Ma X, Kulkarni A, Zhang Z, Xuan Z, Serfling R, Zhang MQ. A highly efficient and effective motif discovery method for ChIP-seq/ChIP-chip data using positional information. Nucleic Acids Res 2012; 40:e50. [PMID: 22228832 PMCID: PMC3326300 DOI: 10.1093/nar/gkr1135] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 01/25/2023]
Abstract
Identification of DNA motifs from ChIP-seq/ChIP-chip [chromatin immunoprecipitation (ChIP)] data is a powerful method for understanding the transcriptional regulatory network. However, most established methods are designed for small sample sizes and are inefficient for ChIP data. Here we propose a new k-mer occurrence model to reflect the fact that functional DNA k-mers often cluster around ChIP peak summits. With this model, we introduced a new measure to discover functional k-mers. Using simulation, we demonstrated that our method is more robust to noise in ChIP data than available methods. A novel word clustering method is also implemented to group similar k-mers into position weight matrices (PWMs). Our method was applied to a diverse set of ChIP experiments to demonstrate its high sensitivity and specificity. Importantly, our method is much faster than several other methods for large sample sizes. Thus, we have developed an efficient and effective motif discovery method for ChIP experiments.
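The idea that functional k-mers cluster around peak summits can be illustrated with a simple positional count. The sketch below is not the measure proposed in the paper; it is a simplified stand-in that assumes each input sequence is centred on its peak summit and reports, for every k-mer, the fraction of its occurrences whose midpoint falls within `window` bases of that centre. The function name `kmer_center_enrichment` and its parameters are hypothetical.

```python
from collections import Counter


def kmer_center_enrichment(sequences, k, window):
    """Fraction of each k-mer's occurrences that lie near the sequence centre.

    Assumes every sequence is centred on its ChIP peak summit (an assumption
    made for this sketch). An occurrence counts as central when the k-mer
    midpoint is within `window` bases of the sequence midpoint.
    """
    central = Counter()
    total = Counter()
    for seq in sequences:
        center = len(seq) / 2
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            total[kmer] += 1
            if abs(i + k / 2 - center) <= window:
                central[kmer] += 1
    return {kmer: central[kmer] / total[kmer] for kmer in total}


# Toy usage: "ACG" sits at the centre, while "TTT" occurs only in the flanks.
scores = kmer_center_enrichment(["TTTTTACGTTTTTT"], k=3, window=2)
```

A k-mer spread uniformly along the sequences would score close to the fraction of positions covered by the central window, so values well above that fraction suggest summit-centred clustering.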
Affiliation(s)
- Xiaotu Ma
- Department of Molecular and Cell Biology, Center for Systems Biology, University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA
|
36
|
Lee BK, Bhinge AA, Battenhouse A, McDaniell RM, Liu Z, Song L, Ni Y, Birney E, Lieb JD, Furey TS, Crawford GE, Iyer VR. Cell-type specific and combinatorial usage of diverse transcription factors revealed by genome-wide binding studies in multiple human cells. Genome Res 2011; 22:9-24. [PMID: 22090374 DOI: 10.1101/gr.127597.111] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.1] [Indexed: 12/11/2022]
Abstract
Cell-type diversity is governed in part by differential gene expression programs mediated by transcription factor (TF) binding. However, there are few systematic studies of the genomic binding of different types of TFs across a wide range of human cell types, especially in relation to gene expression. In the ENCODE Project, we have identified the genomic binding locations across 11 different human cell types of CTCF, RNA Pol II (RNAPII), and MYC, three TFs with diverse roles. Our data and analysis revealed how these factors bind in relation to genomic features and shape gene expression and cell-type specificity. CTCF bound predominantly in intergenic regions while RNAPII and MYC preferentially bound to core promoter regions. CTCF sites were relatively invariant across diverse cell types, while MYC showed the greatest cell-type specificity. MYC and RNAPII co-localized at many of their binding sites and putative target genes. Cell-type specific binding sites, in particular for MYC and RNAPII, were associated with cell-type specific functions. Patterns of binding in relation to gene features were generally conserved across different cell types. RNAPII occupancy was higher over exons than adjacent introns, likely reflecting a link between transcriptional elongation and splicing. TF binding was positively correlated with the expression levels of their putative target genes, but combinatorial binding, in particular of MYC and RNAPII, was even more strongly associated with higher gene expression. These data illuminate how combinatorial binding of transcription factors in diverse cell types is associated with gene expression and cell-type specific biology.
Affiliation(s)
- Bum-Kyu Lee
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Section of Molecular Genetics and Microbiology, University of Texas at Austin, Austin, Texas 78712, USA
|
37
|
Kaneda A, Fujita T, Anai M, Yamamoto S, Nagae G, Morikawa M, Tsuji S, Oshima M, Miyazono K, Aburatani H. Activation of Bmp2-Smad1 signal and its regulation by coordinated alteration of H3K27 trimethylation in Ras-induced senescence. PLoS Genet 2011; 7:e1002359. [PMID: 22072987 PMCID: PMC3207904 DOI: 10.1371/journal.pgen.1002359] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2011] [Accepted: 09/11/2011] [Indexed: 02/06/2023] Open
Abstract
Cellular senescence involves epigenetic alteration, e.g. loss of H3K27me3 at the Ink4a-Arf locus. Using mouse embryonic fibroblasts (MEFs), we analyzed transcription and epigenetic alteration during Ras-induced senescence on a genome-wide scale by chromatin immunoprecipitation (ChIP)-sequencing and microarray. Bmp2 was the most activated secreted factor, with H3K4me3 gain and H3K27me3 loss, whereas H3K4me3 loss and de novo formation of H3K27me3 occurred inversely in the repression of nine genes, including two BMP-SMAD inhibitors, Smad6 and Noggin. DNA methylation was unlikely to have been altered. Ras-activated cells senesced with nuclear accumulation of phosphorylated SMAD1/5/8. Senescence was bypassed in Ras-activated cells when the Bmp2/Smad1 signal was blocked by Bmp2 knockdown, Smad6 induction, or Noggin induction. Senescence was induced when recombinant BMP2 protein was added to Bmp2-knocked-down Ras-activated cells. Downstream Bmp2-Smad1 target genes were then analyzed genome-wide by ChIP-sequencing using an anti-Smad1 antibody in MEFs exposed to BMP2. Smad1 target sites were enriched near transcription start sites of genes, which correlated significantly with upregulation by BMP2 stimulation. While Smad6 was one of the Smad1 target genes upregulated by BMP2 exposure, Smad6 repression in Ras-activated cells, with increased enrichment of Ezh2 and gain of H3K27me3, suggested epigenetic disruption of negative feedback by Polycomb. Among Smad1 target genes that were upregulated in Ras-activated cells without an increased repressive mark, Parvb was found to contribute to growth inhibition, as Parvb knockdown led to escape from senescence. These genome-wide analyses revealed that the Bmp2-Smad1 signal and its regulation by harmonized epigenomic alteration play an important role in Ras-induced senescence.
To avoid becoming cancer cells, cells have a barrier system that blocks proliferation by entering irreversible growth arrest, so-called cellular senescence. For future cancer treatment strategies, it is important to understand how cancer arises, and investigating the mechanisms underlying senescence can help clarify the mechanisms of carcinogenesis. Epigenetic mechanisms, including DNA methylation and histone modification, may be important for regulating gene expression properly in senescence. Here, taking advantage of recent technical and methodological advances in genome-wide analysis, we examine epigenomic and gene expression alterations in senescence induced by the Ras oncogene. We find that the Bmp2-Smad1 signal is critical. We further examine downstream target genes of this signal on a genome-wide scale. We show dynamic and coordinated H3K27me3 alteration, e.g. activation of Bmp2 by loss of H3K27me3, repression of the signal inhibitors and the negative feedback loop by gain of H3K27me3, and selective activation of downstream target genes that may contribute to growth arrest. Our findings help in understanding the importance of epigenetic regulation and of a critical signal in the physiological barrier against oncogenic transformation, as well as the importance of disruption of the BMP-SMAD signal in cancer, and they may suggest how cancers with Ras mutations arise.
Affiliation(s)
- Atsushi Kaneda
- Genome Science Division, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
|
38
|
Shi J, Yang W, Chen M, Du Y, Zhang J, Wang K. AMD, an automated motif discovery tool using stepwise refinement of gapped consensuses. PLoS One 2011; 6:e24576. [PMID: 21931761 PMCID: PMC3171486 DOI: 10.1371/journal.pone.0024576] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2011] [Accepted: 08/14/2011] [Indexed: 11/21/2022] Open
Abstract
Motif discovery is essential for deciphering regulatory codes from high throughput genomic data, such as those from ChIP-chip/seq experiments. However, there remains a lack of effective and efficient methods for the identification of long and gapped motifs in many relevant tools reported to date. We describe here an automated tool that allows for de novo discovery of transcription factor binding sites, regardless of whether the motifs are long or short, gapped or contiguous.
Affiliation(s)
- Jiantao Shi
- Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Graduate School of the Chinese Academy of Sciences, Shanghai, China
- Wentao Yang
- Shanghai Institute of Hematology and Sino-French Center for Life Science and Genomics, Rui-Jin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Mingjie Chen
- Shanghai Institute of Hematology and Sino-French Center for Life Science and Genomics, Rui-Jin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yanzhi Du
- Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Ji Zhang
- Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shanghai Institute of Hematology and Sino-French Center for Life Science and Genomics, Rui-Jin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Kankan Wang
- Shanghai Institute of Hematology and Sino-French Center for Life Science and Genomics, Rui-Jin Hospital affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
|
39
|
Uren PJ, Burns SC, Ruan J, Singh KK, Smith AD, Penalva LOF. Genomic analyses of the RNA-binding protein Hu antigen R (HuR) identify a complex network of target genes and novel characteristics of its binding sites. J Biol Chem 2011; 286:37063-6. [PMID: 21890634 DOI: 10.1074/jbc.c111.266882] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
The ubiquitously expressed RNA-binding protein Hu antigen R (HuR) or ELAVL1 is implicated in a variety of biological processes as well as being linked with a number of diseases, including cancer. Despite a great deal of prior investigation into HuR, there is still much to learn about its function. We take an important step in this direction by conducting cross-linking and immunoprecipitation and RNA sequencing experiments followed by an extensive computational analysis to determine the characteristics of the HuR binding site and impact on the transcriptome. We reveal that HuR targets predominantly uracil-rich single-stranded stretches of varying size, with a strong conservation of structure and sequence composition. Despite the fact that HuR sites are observed in intronic regions, our data do not support a role for HuR in regulating splicing. HuR sites in 3'-UTRs overlap extensively with predicted microRNA target sites, suggesting interplay between the functions of HuR and microRNAs. Network analysis showed that identified targets containing HuR binding sites in the 3' UTR are highly interconnected.
Affiliation(s)
- Philip J Uren
- Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
|
40
|
Kim JK, Choi S. Probabilistic models for semisupervised discriminative motif discovery in DNA sequences. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1309-1317. [PMID: 21778525 DOI: 10.1109/tcbb.2010.84] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Methods for discriminative motif discovery in DNA sequences identify transcription factor binding sites (TFBSs), searching only for patterns that differentiate two sets (positive and negative sets) of sequences. On one hand, discriminative methods increase the sensitivity and specificity of motif discovery, compared to generative models. On the other hand, generative models can easily exploit unlabeled sequences to better detect functional motifs when labeled training samples are limited. In this paper, we develop a hybrid generative/discriminative model which enables us to make use of unlabeled sequences in the framework of discriminative motif discovery, leading to semisupervised discriminative motif discovery. Numerical experiments on yeast ChIP-chip data for discovering DNA motifs demonstrate that the best performance is obtained between the purely generative and the purely discriminative models, and that semisupervised learning improves performance when labeled sequences are limited.
Affiliation(s)
- Jong Kyoung Kim
- Department of Computer Science, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, Pohang 790-784, Korea.
|
41
|
Huggins P, Zhong S, Shiff I, Beckerman R, Laptenko O, Prives C, Schulz MH, Simon I, Bar-Joseph Z. DECOD: fast and accurate discriminative DNA motif finding. Bioinformatics 2011; 27:2361-7. [PMID: 21752801 DOI: 10.1093/bioinformatics/btr412] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION Motif discovery is now routinely used in high-throughput studies including large-scale sequencing and proteomics. These datasets present new challenges. The first is speed. Many motif discovery methods do not scale well to large datasets. Another issue is identifying discriminative rather than generative motifs. Such discriminative motifs are important for identifying co-factors and for explaining changes in behavior between different conditions. RESULTS To address these issues we developed a method for DECOnvolved Discriminative motif discovery (DECOD). DECOD uses a k-mer count table and so its running time is independent of the size of the input set. By deconvolving the k-mers DECOD considers context information without using the sequences directly. DECOD outperforms previous methods both in speed and in accuracy when using simulated and real biological benchmark data. We performed new binding experiments for p53 mutants and used DECOD to identify p53 co-factors, suggesting new mechanisms for p53 activation. AVAILABILITY The source code and binaries for DECOD are available at http://www.sb.cs.cmu.edu/DECOD CONTACT: zivbj@cs.cmu.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
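The count-table idea mentioned in this abstract can be illustrated with a minimal Python sketch. This is not DECOD's deconvolution algorithm; it only shows, under simplifying assumptions, how sequences can be reduced once to k-mer counts and how k-mers can then be ranked by the difference in relative frequency between a positive and a negative set. The function names and the scoring rule are hypothetical choices made for illustration.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def discriminative_kmers(pos, neg, k=4, top=3):
    """Rank k-mers by relative frequency in `pos` minus that in `neg`.

    Illustrative scoring only (an assumption, not the published method).
    After the two count tables are built, ranking touches only the tables,
    not the input sequences.
    """
    cp, cn = kmer_counts(pos, k), kmer_counts(neg, k)
    total_p = sum(cp.values()) or 1  # avoid division by zero on empty input
    total_n = sum(cn.values()) or 1
    scores = {m: cp[m] / total_p - cn[m] / total_n for m in set(cp) | set(cn)}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    positives = ["ACGTACGT", "TTACGTAA"]
    negatives = ["AAAAAAAA", "TTTTTTTT"]
    for kmer, score in discriminative_kmers(positives, negatives):
        print(kmer, round(score, 3))
```

Because the ranking step works on the tables alone, its cost depends on the number of distinct k-mers rather than on how many sequences were counted, which is the property the abstract attributes to a count-table design.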
Affiliation(s)
- Peter Huggins
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
42
|
Wang J, Wang Y, Wang Z, Liu L, Zhu XG, Ma X. Synchronization of cytoplasmic and transferred mitochondrial ribosomal protein gene expression in land plants is linked to Telo-box motif enrichment. BMC Evol Biol 2011; 11:161. [PMID: 21668973 PMCID: PMC3212954 DOI: 10.1186/1471-2148-11-161] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2010] [Accepted: 06/13/2011] [Indexed: 02/08/2023] Open
Abstract
Background Chloroplasts and mitochondria evolved from the endosymbionts of once free-living eubacteria, and they transferred most of their genes to the host nuclear genome during evolution. The mechanisms used by plants to coordinate the expression of such transferred genes, as well as other genes in the host nuclear genome, are still poorly understood. Results In this paper, we use nuclear-encoded chloroplast (cpRPGs), as well as mitochondrial (mtRPGs) and cytoplasmic (euRPGs) ribosomal protein genes to study the coordination of gene expression between organelles and the host. Results show that the mtRPGs, but not the cpRPGs, exhibit strongly synchronized expression with euRPGs in all investigated land plants and that this phenomenon is linked to the presence of a telo-box DNA motif in the promoter regions of mtRPGs and euRPGs. This motif is also enriched in the promoter regions of genes involved in DNA replication. Sequence analysis further indicates that mtRPGs, in contrast to cpRPGs, acquired telo-box from the host nuclear genome. Conclusions Based on our results, we propose a model of plant nuclear genome evolution where coordination of activities in mitochondria and chloroplast and other cellular functions, including cell cycle, might have served as a strong selection pressure for the differential acquisition of telo-box between mtRPGs and cpRPGs. This research also highlights the significance of physiological needs in shaping transcriptional regulatory evolution.
Affiliation(s)
- Jie Wang
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
43
|
Zhang Z, Zhang MQ. Histone modification profiles are predictive for tissue/cell-type specific expression of both protein-coding and microRNA genes. BMC Bioinformatics 2011; 12:155. [PMID: 21569556 PMCID: PMC3120700 DOI: 10.1186/1471-2105-12-155] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2010] [Accepted: 05/14/2011] [Indexed: 02/04/2023] Open
Abstract
Background Gene expression is regulated at both the DNA sequence level and through modification of chromatin. However, the effect of chromatin on tissue/cell-type specific gene regulation (TCSR) is largely unknown. In this paper, we present a method to elucidate the relationship between histone modification/variation (HMV) and TCSR. Results A classifier for differentiating CD4+ T cell-specific genes from housekeeping genes using HMV data was built. We found HMV in both promoter and gene body regions to be predictive of genes which are targets of TCSR. For example, the histone modification types H3K4me3 and H3K27ac were identified as the most predictive for CpG-related promoters, whereas H3K4me3 and H3K79me3 were the most predictive for nonCpG-related promoters. However, genes targeted by TCSR can be predicted using other types of HMVs as well. Such redundancy implies that multiple types of underlying regulatory elements, such as enhancers or intragenic alternative promoters, which can regulate gene expression in a tissue/cell-type specific fashion, may be marked by the HMVs. Finally, we show that the predictive power of HMV for TCSR is not limited to protein-coding genes in CD4+ T cells, as we successfully predicted TCSR-targeted genes in muscle cells, as well as microRNA genes with expression specific to CD4+ T cells, using the same classifier, which was trained on HMV data of protein-coding genes in CD4+ T cells. Conclusion We have begun to understand the HMV patterns that guide gene expression in both tissue/cell-type specific and ubiquitous manners.
Affiliation(s)
- Zhihua Zhang
- Department of Molecular Cell Biology, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
|
44
|
Keilwagen J, Grau J, Paponov IA, Posch S, Strickert M, Grosse I. De-novo discovery of differentially abundant transcription factor binding sites including their positional preference. PLoS Comput Biol 2011; 7:e1001070. [PMID: 21347314 PMCID: PMC3037384 DOI: 10.1371/journal.pcbi.1001070] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2010] [Accepted: 12/28/2010] [Indexed: 11/18/2022] Open
Abstract
Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. 
Hence, we make the tool freely available as a component of the open-source Java framework Jstacs and as a stand-alone application at http://www.jstacs.de/index.php/Dispom. Binding of transcription factors to promoters of genes, and subsequent enhancement or repression of transcription, is one of the main steps of transcriptional gene regulation. Direct or indirect wet-lab experiments allow the identification of approximate regions potentially bound or regulated by a transcription factor. Subsequently, de-novo motif discovery tools can be used for detecting the precise positions of binding sites. Many traditional tools focus on motifs over-represented in the target regions, which often turn out to be similarly over-represented in the entire genome. In contrast, several recent tools focus on differentially abundant motifs in target regions compared to a control set. As binding sites are often located at some preferred distance to the transcription start site, it is favorable to include this information in de-novo motif discovery. Here, we present Dispom, a novel approach for learning differentially abundant motifs and their positional preferences simultaneously, which predicts binding sites with increased accuracy compared to many popular de-novo motif discovery tools. When applying Dispom to promoters of auxin-responsive genes of Arabidopsis thaliana, we find a binding motif slightly different from the canonical auxin-response element, which exhibits a strong positional preference and which is considerably more specific to auxin-responsive genes.
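The notion of a positional preference relative to the transcription start site can be made concrete with a short Python sketch. It is not Dispom's probabilistic model: it assumes, purely for illustration, that each promoter string ends at the transcription start site, searches for exact matches of a fixed motif (here the canonical auxin-responsive element TGTCTC), and reports how many hits fall within a chosen upstream window. The function names are hypothetical.

```python
def motif_positions(promoters, motif):
    """Return the distance upstream of the TSS for every exact motif hit.

    Assumption for this sketch: each promoter sequence ends at the TSS,
    so a hit starting at index i lies len(seq) - i bp upstream.
    """
    motif = motif.upper()
    hits = []
    for seq in promoters:
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            hits.append(len(seq) - start)
            start = seq.find(motif, start + 1)  # allow overlapping hits
    return hits


def fraction_within(hits, window):
    """Fraction of hits located at most `window` bp upstream of the TSS."""
    if not hits:
        return 0.0
    return sum(1 for h in hits if h <= window) / len(hits)


if __name__ == "__main__":
    promoters = ["AAAATGTCTCAA", "TGTCTCAAAAAAAAAAAAAA"]
    hits = motif_positions(promoters, "TGTCTC")
    print(hits, fraction_within(hits, 10))
```

Comparing this fraction between a target set and a control set gives a rough, exact-match view of whether a motif is both differentially abundant and positionally concentrated; a learned model would replace the exact match and the fixed window.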
Affiliation(s)
- Jens Keilwagen
- Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany.
|
45
|
Mason MJ, Plath K, Zhou Q. Identification of context-dependent motifs by contrasting ChIP binding data. Bioinformatics 2010; 26:2826-32. [PMID: 20870645 DOI: 10.1093/bioinformatics/btq546] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION DNA binding proteins play crucial roles in the regulation of gene expression. Transcription factors (TFs) activate or repress genes directly while other proteins influence chromatin structure for transcription. Binding sites of a TF exhibit a similar sequence pattern called a motif. However, a one-to-one map does not exist between each TF and motif. Many TFs in a protein family may recognize the same motif with subtle nucleotide differences leading to different binding affinities. Additionally, a particular TF may bind different motifs under certain conditions, for example in the presence of different co-regulators. The availability of genome-wide binding data of multiple collaborative TFs makes it possible to detect such context-dependent motifs. RESULTS We developed a contrast motif finder (CMF) for the de novo identification of motifs that are differentially enriched in two sets of sequences. Applying this method to a number of TF binding datasets from mouse embryonic stem cells, we demonstrate that CMF achieves substantially higher accuracy than several well-known motif finding methods. By contrasting sequences bound by distinct sets of TFs, CMF identified two different motifs that may be recognized by Oct4 dependent on the presence of another co-regulator and detected subtle motif signals that may be associated with potential competitive binding between Sox2 and Tcf3. AVAILABILITY The software CMF is freely available for academic use at www.stat.ucla.edu/~zhou/CMF.
Affiliation(s)
- Mike J Mason
- Department of Statistics, University of California, Los Angeles, CA 90095, USA
|
46
|
Cao L, Yu Y, Bilke S, Walker RL, Mayeenuddin LH, Azorsa DO, Yang F, Pineda M, Helman LJ, Meltzer PS. Genome-wide identification of PAX3-FKHR binding sites in rhabdomyosarcoma reveals candidate target genes important for development and cancer. Cancer Res 2010; 70:6497-508. [PMID: 20663909 PMCID: PMC2922412 DOI: 10.1158/0008-5472.can-10-0582] [Citation(s) in RCA: 193] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
The PAX3-FKHR fusion protein is present in a majority of alveolar rhabdomyosarcomas associated with increased aggressiveness and poor prognosis. To better understand the molecular pathogenesis of PAX3-FKHR, we carried out the first unbiased genome-wide identification of PAX3-FKHR binding sites and associated target genes in alveolar rhabdomyosarcoma. The data show that PAX3-FKHR binds to the same sites as PAX3 at both MYF5 and MYOD enhancers. The genome-wide analysis reveals that the PAX3-FKHR sites are (a) mostly distal to transcription start sites, (b) conserved, (c) enriched for PAX3 motifs, and (d) strongly associated with genes overexpressed in PAX3-FKHR-positive rhabdomyosarcoma cells and tumors. There is little evidence in our data set for PAX3-FKHR binding at promoter sequences. The genome-wide analysis further illustrates a strong association between PAX3 and E-box motifs in these binding sites, suggestive of a common coregulation for many target genes. We also provide the first direct evidence that FGFR4 and IGF1R are targets of PAX3-FKHR. The map of PAX3-FKHR binding sites provides a framework for understanding the pathogenic roles of PAX3-FKHR, as well as its molecular targets, allowing a systematic evaluation of agents against this aggressive rhabdomyosarcoma.
MESH Headings
- Anaplastic Lymphoma Kinase
- Binding Sites
- Cell Line, Tumor
- E-Box Elements
- Genome, Human
- Genome-Wide Association Study
- Humans
- MyoD Protein/genetics
- MyoD Protein/metabolism
- N-Myc Proto-Oncogene Protein
- Nuclear Proteins/genetics
- Nuclear Proteins/metabolism
- Oncogene Proteins/genetics
- Oncogene Proteins/metabolism
- Oncogene Proteins, Fusion/genetics
- Oncogene Proteins, Fusion/metabolism
- Protein-Tyrosine Kinases/genetics
- Protein-Tyrosine Kinases/metabolism
- Receptor Protein-Tyrosine Kinases
- Receptor, Fibroblast Growth Factor, Type 4/genetics
- Receptor, IGF Type 1/antagonists & inhibitors
- Receptor, IGF Type 1/biosynthesis
- Receptor, IGF Type 1/genetics
- Regulatory Elements, Transcriptional
- Rhabdomyosarcoma, Alveolar/genetics
- Rhabdomyosarcoma, Alveolar/metabolism
- Sarcoma, Ewing/genetics
- Sarcoma, Ewing/metabolism
- Transcription Initiation Site
- Up-Regulation
Affiliation(s)
- Liang Cao
- Genetics Branch, Center for Cancer Research, National Cancer Institute, National Human Genome Research Institute, Bethesda, Maryland 20892, USA.
|
47
|
Wen J, Chiba A, Cai X. Computational identification of tissue-specific alternative splicing elements in mouse genes from RNA-Seq. Nucleic Acids Res 2010; 38:7895-907. [PMID: 20685814 PMCID: PMC3001057 DOI: 10.1093/nar/gkq679] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
Abstract
Tissue-specific alternative splicing is a key mechanism for generating tissue-specific proteomic diversity in eukaryotes. Splicing regulatory elements (SREs) in precursor messenger RNA play a very important role in regulating alternative splicing. In this article, we use mouse RNA-Seq data to determine a positive data set where SREs are over-represented and a reliable negative data set where the same SREs are most likely under-represented for a specific tissue and then employ a powerful discriminative approach to identify SREs. We identified 456 putative splicing enhancers or silencers, of which 221 were predicted to be tissue-specific. Most of our tissue-specific SREs are likely different from constitutive SREs, since only 18% of our exonic splicing enhancers (ESEs) are contained in constitutive RESCUE-ESEs. A relatively small portion (20%) of our SREs is included in tissue-specific SREs in human identified in two recent studies. In the analysis of the position distribution of SREs, we found that a dozen SREs were biased toward a specific region. We also identified two very interesting SREs from the same intronic region that can function as an enhancer in one tissue but as a silencer in another. These findings provide insight into the mechanism of tissue-specific alternative splicing and give a set of valuable putative SREs for further experimental investigations.
Affiliation(s)
- Ji Wen
- Department of Electrical and Computer Engineering, University of Miami, 1251 Memorial Drive, Coral Gables, FL 33146, USA
|
48
|
Zhou X, Sumazin P, Rajbhandari P, Califano A. A systems biology approach to transcription factor binding site prediction. PLoS One 2010; 5:e9878. [PMID: 20360861 PMCID: PMC2845628 DOI: 10.1371/journal.pone.0009878] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2010] [Accepted: 03/02/2010] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND The elucidation of mammalian transcriptional regulatory networks holds great promise for both basic and translational research and remains one of the greatest challenges to systems biology. Recent reverse engineering methods deduce regulatory interactions from large-scale mRNA expression profiles and cross-species conserved regulatory regions in DNA. Technical challenges faced by these methods include distinguishing between direct and indirect interactions, associating transcription regulators with predicted transcription factor binding sites (TFBSs), identifying non-linearly conserved binding sites across species, and providing realistic accuracy estimates. METHODOLOGY/PRINCIPAL FINDINGS We address these challenges by closely integrating proven methods for regulatory network reverse engineering from mRNA expression data, linearly and non-linearly conserved regulatory region discovery, and TFBS evaluation and discovery. Using an extensive test set of high-likelihood interactions, which we collected in order to provide realistic prediction-accuracy estimates, we show that a careful integration of these methods leads to significant improvements in prediction accuracy. To verify our methods, we biochemically validated TFBS predictions made for both transcription factors (TFs) and co-factors; we validated binding site predictions made using a known E2F1 DNA-binding motif on E2F1 predicted promoter targets, known E2F1 and JUND motifs on JUND predicted promoter targets, and a de novo discovered motif for BCL6 on BCL6 predicted promoter targets. Finally, to demonstrate accuracy of prediction using an external dataset, we showed that sites matching predicted motifs for ZNF263 are significantly enriched in recent ZNF263 ChIP-seq data.
CONCLUSIONS/SIGNIFICANCE Using an integrative framework, we were able to address technical challenges faced by state of the art network reverse engineering methods, leading to significant improvement in direct-interaction detection and TFBS-discovery accuracy. We estimated the accuracy of our framework on a human B-cell specific test set, which may help guide future methodological development.
Affiliation(s)
- Xiang Zhou
- Department of Biomedical Informatics (DBMI), Columbia University, New York, New York, United States of America
- Pavel Sumazin
- Center for Computational Biology and Bioinformatics (C2B2), Columbia University, New York, New York, United States of America
- Presha Rajbhandari
- Center for Computational Biology and Bioinformatics (C2B2), Columbia University, New York, New York, United States of America
- Andrea Califano
- Department of Biomedical Informatics (DBMI), Columbia University, New York, New York, United States of America
- Center for Computational Biology and Bioinformatics (C2B2), Columbia University, New York, New York, United States of America
- Herbert Irving Comprehensive Cancer Center, Columbia University, New York, New York, United States of America
|
49
|
Labadorf A, Link A, Rogers MF, Thomas J, Reddy AS, Ben-Hur A. Genome-wide analysis of alternative splicing in Chlamydomonas reinhardtii. BMC Genomics 2010; 11:114. [PMID: 20163725 PMCID: PMC2830987 DOI: 10.1186/1471-2164-11-114] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2009] [Accepted: 02/17/2010] [Indexed: 11/12/2022] Open
Abstract
Background Genome-wide computational analysis of alternative splicing (AS) in several flowering plants has revealed that pre-mRNAs from about 30% of genes undergo AS. Chlamydomonas, a simple unicellular green alga, is part of the lineage that includes land plants. However, it diverged from land plants about one billion years ago. Hence, it serves as a good model system to study alternative splicing in early photosynthetic eukaryotes, to obtain insights into the evolution of this process in plants, and to compare splicing in simple unicellular photosynthetic and non-photosynthetic eukaryotes. We performed a global analysis of alternative splicing in Chlamydomonas reinhardtii using its recently completed genome sequence and all available ESTs and cDNAs. Results Our analysis of AS using BLAT and a modified version of the Sircah tool revealed AS of 498 transcriptional units with 611 events, representing about 3% of the total number of genes. As in land plants, intron retention is the most prevalent form of AS. Retained introns and skipped exons tend to be shorter than their counterparts in constitutively spliced genes. The splice site signals in all types of AS events are weaker than those in constitutively spliced genes. Furthermore, in alternatively spliced genes, the prevalent splice form has a stronger splice site signal than the non-prevalent form. Analysis of constitutively spliced introns revealed an over-abundance of motifs with simple repetitive elements in comparison to introns involved in intron retention. In almost all cases, AS results in a truncated ORF, leading to a coding sequence that is around 50% shorter than the prevalent splice form. Using RT-PCR we verified AS of two genes and show that they produce more isoforms than indicated by EST data. All cDNA/EST alignments and splice graphs are provided in a website at http://combi.cs.colostate.edu/as/chlamy. 
Conclusions The extent of AS that we observed in Chlamydomonas is much smaller than that observed in land plants, but much higher than in simple unicellular heterotrophic eukaryotes. The relative frequencies of the different alternative splicing event types are similar to those in flowering plants. The prevalence of constitutive and alternative splicing in Chlamydomonas, together with its simplicity, the many available public resources, and the well-developed genetic and molecular tools for this organism, make it an excellent model system to elucidate the mechanisms involved in regulated splicing in photosynthetic eukaryotes.
Affiliation(s)
- Adam Labadorf
- Computer Science Department, Colorado State University, Fort Collins, CO, USA
|
50
|
Zhang Y, Wu W, Cheng Y, King DC, Harris RS, Taylor J, Chiaromonte F, Hardison RC. Primary sequence and epigenetic determinants of in vivo occupancy of genomic DNA by GATA1. Nucleic Acids Res 2010; 37:7024-38. [PMID: 19767611 PMCID: PMC2790884 DOI: 10.1093/nar/gkp747] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022] Open
Abstract
DNA sequence motifs and epigenetic modifications contribute to specific binding by a transcription factor, but the extent to which each feature determines occupancy in vivo is poorly understood. We addressed this question in erythroid cells by identifying DNA segments occupied by GATA1 and measuring the level of trimethylation of histone H3 lysine 27 (H3K27me3) and monomethylation of H3 lysine 4 (H3K4me1) along a 66 Mb region of mouse chromosome 7. While 91% of the GATA1-occupied segments contain the consensus binding-site motif WGATAR, only ∼0.7% of DNA segments with such a motif are occupied. Using a discriminative motif enumeration method, we identified additional motifs predictive of occupancy given the presence of WGATAR. The specific motif variant AGATAA and occurrence of multiple WGATAR motifs are both strong discriminators. Combining motifs to pair a WGATAR motif with a binding site motif for GATA1, EKLF or SP1 improves discriminative power. Epigenetic modifications are also strong determinants, with the factor-bound segments highly enriched for H3K4me1 and depleted of H3K27me3. Combining primary sequence and epigenetic determinants captures 52% of the GATA1-occupied DNA segments and substantially increases the specificity, to one out of seven segments with the required motif combination and epigenetic signals being bound.
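As an illustration of what "contains the consensus binding-site motif WGATAR" means in practice, the following is a minimal Python sketch that scans a DNA string for the IUPAC pattern WGATAR (W = A or T, R = A or G) on both strands. It is not the discriminative motif enumeration method used in the paper; the function names `find_motif` and `reverse_complement` and the example sequence are hypothetical, introduced only for this sketch.

```python
import re

# IUPAC codes needed for the WGATAR consensus: W = A or T, R = A or G.
# (Only the codes used here are listed; this table is an assumption of the sketch.)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif="WGATAR"):
    """Return sorted (start, strand) pairs for every motif match.

    `start` is the 0-based position of the leftmost matched base on the
    forward strand; strand is "+" or "-". Overlapping matches are reported.
    """
    # A lookahead group lets the regex report overlapping occurrences.
    pattern = re.compile("(?=(" + "".join(IUPAC[c] for c in motif) + "))")
    seq = seq.upper()
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Matches on the reverse strand are mapped back to forward coordinates.
    hits += [(n - m.start() - k, "-")
             for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence with AGATAA on the forward strand and
    # TTATCT (reverse complement of AGATAA) further downstream.
    print(find_motif("CCAGATAACCTTATCTGG"))  # [(2, '+'), (10, '-')]
```

A scan like this only identifies candidate segments; as the abstract notes, the presence of the motif alone is a weak predictor of occupancy, which is why additional sequence and epigenetic features were combined.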
Affiliation(s)
- Ying Zhang
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of Life Sciences
|