1
FitzPatrick VD, Leemans C, van Arensbergen J, van Steensel B, Bussemaker H. Defining the fine structure of promoter activity on a genome-wide scale with CISSECTOR. Nucleic Acids Res 2023; 51:5499-5511. [PMID: 37013986 PMCID: PMC10287907 DOI: 10.1093/nar/gkad232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 03/08/2023] [Accepted: 03/22/2023] [Indexed: 04/05/2023] Open
Abstract
Classic promoter mutagenesis strategies can be used to study how proximal promoter regions regulate the expression of particular genes of interest. This is a laborious process, in which the smallest sub-region of the promoter still capable of recapitulating expression in an ectopic setting is first identified, followed by targeted mutation of putative transcription factor binding sites. Massively parallel reporter assays such as survey of regulatory elements (SuRE) provide an alternative way to study millions of promoter fragments in parallel. Here we show how a generalized linear model (GLM) can be used to transform genome-scale SuRE data into a high-resolution genomic track that quantifies the contribution of local sequence to promoter activity. This coefficient track helps identify regulatory elements and can be used to predict promoter activity of any sub-region in the genome. It thus allows in silico dissection of any promoter in the human genome to be performed. We developed a web application, available at cissector.nki.nl, that lets researchers easily perform this analysis as a starting point for their research into any promoter of interest.
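As an illustration of how a per-position coefficient track could be used to score an arbitrary sub-region, here is a minimal sketch. It assumes the predicted score of a region is simply the sum of the coefficients it covers, on whatever scale the model's link function defines; the abstract does not state the exact model form, so this additive reading, the function names, and the toy values are all assumptions, not the CISSECTOR implementation.

```python
from itertools import accumulate


def build_prefix_sums(coefficients):
    """Return prefix sums so that any region sum becomes an O(1) lookup."""
    return [0.0, *accumulate(coefficients)]


def predict_region_score(prefix_sums, start, end):
    """Sum the coefficients over the half-open interval [start, end)."""
    if not 0 <= start <= end < len(prefix_sums):
        raise ValueError("region lies outside the coefficient track")
    return prefix_sums[end] - prefix_sums[start]


# Hypothetical per-base coefficients for a 6 bp toy locus (not real data).
track = [0.0, 0.5, 1.5, -0.5, 0.25, 0.0]
prefix = build_prefix_sums(track)

full_region = predict_region_score(prefix, 0, 6)  # whole toy locus
core_region = predict_region_score(prefix, 1, 3)  # two strongest positions
```

Comparing `full_region` with `core_region` mimics, in miniature, the in silico deletion analysis the abstract describes: dropping the position with a negative coefficient raises the predicted score.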
Affiliation(s)
- Vincent D FitzPatrick
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
- Christ Leemans
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Joris van Arensbergen
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Bas van Steensel
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Department of Cell Biology, Erasmus University Medical Center, Rotterdam, The Netherlands
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University Medical Center, New York, NY, USA
2
Chen SAA, Kern AF, Ang RML, Xie Y, Fraser HB. Gene-by-environment interactions are pervasive among natural genetic variants. Cell Genom 2023; 3:100273. [PMID: 37082145 PMCID: PMC10112290 DOI: 10.1016/j.xgen.2023.100273] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/05/2022] [Revised: 10/09/2022] [Accepted: 01/31/2023] [Indexed: 04/22/2023]
Abstract
Gene-by-environment (GxE) interactions, in which a genetic variant's phenotypic effect is condition specific, are fundamental for understanding fitness landscapes and evolution but have been difficult to identify at the single-nucleotide level. Although many condition-specific quantitative trait loci (QTLs) have been mapped, these typically contain numerous inconsequential variants in linkage, precluding understanding of the causal GxE variants. Here, we introduce BARcoded Cas9 retron precise parallel editing via homology (CRISPEY-BAR), a high-throughput precision genome editing strategy, and use it to map GxE interactions of naturally occurring genetic polymorphisms impacting yeast growth. We identified hundreds of GxE variants within condition-specific QTLs, revealing unexpected genetic complexity. Moreover, we found that 93.7% of non-neutral natural variants within ergosterol biosynthesis pathway genes showed GxE interactions, including many impacting antifungal drug resistance through diverse molecular mechanisms. In sum, our results suggest an extremely complex, context-dependent fitness landscape characterized by pervasive GxE interactions while also demonstrating massively parallel genome editing as an effective means for investigating this complexity.
Affiliation(s)
- Shi-An A. Chen
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Alexander F. Kern
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Roy Moh Lik Ang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Yihua Xie
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Hunter B. Fraser
- Department of Biology, Stanford University, Stanford, CA 94305, USA
- Corresponding author
3
Majewska M, Wysokińska H, Kuźma Ł, Szymczyk P. Eukaryotic and prokaryotic promoter databases as valuable tools in exploring the regulation of gene transcription: a comprehensive overview. Gene 2017; 644:38-48. [PMID: 29104165 DOI: 10.1016/j.gene.2017.10.079] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/12/2017] [Revised: 07/26/2017] [Accepted: 10/27/2017] [Indexed: 01/02/2023]
Abstract
The complete exploration of the regulation of gene expression remains one of the top-priority goals for researchers. Because this regulation is mainly controlled at the level of transcription by promoters, studies of promoters and their findings are of great importance. This review summarizes forty selected databases that centralize experimental and theoretical knowledge regarding the organization of promoters, interacting transcription factors (TFs) and microRNAs (miRNAs) in many eukaryotic and prokaryotic species. The presented databases offer researchers valuable support in elucidating the regulation of gene transcription.
Affiliation(s)
- Małgorzata Majewska
- Department of Biology and Pharmaceutical Botany, Medical University of Lodz, 90-151 Lodz, Poland
- Halina Wysokińska
- Department of Biology and Pharmaceutical Botany, Medical University of Lodz, 90-151 Lodz, Poland
- Łukasz Kuźma
- Department of Biology and Pharmaceutical Botany, Medical University of Lodz, 90-151 Lodz, Poland
- Piotr Szymczyk
- Department of Pharmaceutical Biotechnology, Medical University of Lodz, 90-151 Lodz, Poland
4
Drozdova P, Rogoza T, Radchenko E, Lipaeva P, Mironova L. Transcriptional response to the [ISP(+)] prion of Saccharomyces cerevisiae differs from that induced by the deletion of its structural gene, SFP1. FEMS Yeast Res 2014; 14:1160-70. [PMID: 25227157 DOI: 10.1111/1567-1364.12211] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 05/03/2014] [Revised: 09/09/2014] [Accepted: 09/09/2014] [Indexed: 12/21/2022] Open
Abstract
Currently, several protein-based genetic determinants, or prions, have been described in yeast, and several hundred prion candidates have been predicted. Importantly, many known and potential prion proteins regulate transcription; therefore, prion induction should affect gene expression. While it is generally believed that the prion phenotype should mimic the deletion phenotype, this rule has exceptions. Formed by the transcription factor Sfp1p, [ISP(+)] is one such exception, as the [ISP(+)] and sfp1Δ strains differ in many phenotypic traits. These data suggest that prion formation by a transcription factor and the absence of that factor may affect gene expression in different ways. However, studies addressing this issue are practically absent. Here, we explore how [ISP(+)] affects gene expression and how these changes correspond to the effect of SFP1 deletion. Our data indicate that the [ISP(+)]-related expression changes cannot be explained by the inactivation of Sfp1p. Remarkably, most Sfp1p targets are not affected in the [ISP(+)] strain; instead, the genes upregulated in the [ISP(+)] strain are enriched in Gcn4p and Aft1p targets. We propose that Sfp1p serves as a part of a regulatory complex, and the activity of this complex may be modulated differently by the absence or prionization of Sfp1p.
Affiliation(s)
- Polina Drozdova
- Department of Genetics and Biotechnology, Saint Petersburg State University, St. Petersburg, Russia; Laboratory of Amyloid Biology, Saint Petersburg State University, St. Petersburg, Russia
5
Ward LD, Wang J, Bussemaker HJ. Characterizing a collective and dynamic component of chromatin immunoprecipitation enrichment profiles in yeast. BMC Genomics 2014; 15:494. [PMID: 24947676 PMCID: PMC4124144 DOI: 10.1186/1471-2164-15-494] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/30/2013] [Accepted: 05/27/2014] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Recent chromatin immunoprecipitation (ChIP) experiments in fly, mouse, and human have revealed the existence of high-occupancy target (HOT) regions or "hotspots" that show enrichment across many assayed DNA-binding proteins. Similar co-enrichment observed in yeast so far has been treated as artifactual, and has not been fully characterized. RESULTS: Here we reanalyze ChIP data from both array-based and sequencing-based experiments to show that in the yeast S. cerevisiae, the collective enrichment phenomenon is strongly associated with proximity to noncoding RNA genes and with nucleosome depletion. DNA sequence motifs that confer binding affinity for the proteins are largely absent from these hotspots, suggesting that protein-protein interactions play a prominent role. The hotspots are condition-specific, suggesting that they reflect a chromatin state or protein state, and are not a static feature of underlying sequence. Additionally, only a subset of all assayed factors is associated with these loci, suggesting that the co-enrichment cannot be simply explained by a chromatin state that is universally more prone to immunoprecipitation. CONCLUSIONS: Together our results suggest that the co-enrichment patterns observed in yeast represent transcription factor co-occupancy. More generally, they make clear that great caution must be used when interpreting ChIP enrichment profiles for individual factors in isolation, as they will include factor-specific as well as collective contributions.
Affiliation(s)
- Lucas D Ward
- Department of Biological Sciences, Columbia University, 1212 Amsterdam Ave, New York, NY 10027 USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Junbai Wang
- Department of Biological Sciences, Columbia University, 1212 Amsterdam Ave, New York, NY 10027 USA
- Department of Pathology, Oslo University Hospital - The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway
- Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, 1212 Amsterdam Ave, New York, NY 10027 USA
- Center for Computational Biology and Bioinformatics, Columbia University, 1130 St. Nicholas Ave, New York, NY 10032 USA
6
Lee E, de Ridder J, Kool J, Wessels LF, Bussemaker HJ. Identifying regulatory mechanisms underlying tumorigenesis using locus expression signature analysis. Proc Natl Acad Sci U S A 2014; 111:5747-52. [PMID: 24706889 DOI: 10.1073/pnas.1309293111] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/24/2022] Open
Abstract
Retroviral insertional mutagenesis is a powerful tool for identifying putative cancer genes in mice. To uncover the regulatory mechanisms by which common insertion loci affect downstream processes, we supplemented genotyping data with genome-wide mRNA expression profiling data for 97 tumors induced by retroviral insertional mutagenesis. We developed locus expression signature analysis, an algorithm to construct and interpret the differential gene expression signature associated with each common insertion locus. Comparing locus expression signatures to promoter affinity profiles allowed us to build a detailed map of transcription factors whose protein-level regulatory activity is modulated by a particular locus. We also predicted a large set of drugs that might mitigate the effect of the insertion on tumorigenesis. Taken together, our results demonstrate the potential of a locus-specific signature approach for identifying mammalian regulatory mechanisms in a cancer context.
7
Abstract
The term “transcriptional network” refers to the mechanism(s) that underlies coordinated expression of genes, typically involving transcription factors (TFs) binding to the promoters of multiple genes, and individual genes controlled by multiple TFs. A multitude of studies in the last two decades have aimed to map and characterize transcriptional networks in the yeast Saccharomyces cerevisiae. We review the methodologies and accomplishments of these studies, as well as challenges we now face. For most yeast TFs, data have been collected on their sequence preferences, in vivo promoter occupancy, and gene expression profiles in deletion mutants. These systematic studies have led to the identification of new regulators of numerous cellular functions and shed light on the overall organization of yeast gene regulation. However, many yeast TFs appear to be inactive under standard laboratory growth conditions, and many of the available data were collected using techniques that have since been improved. Perhaps as a consequence, comprehensive and accurate mapping among TF sequence preferences, promoter binding, and gene expression remains an open challenge. We propose that the time is ripe for renewed systematic efforts toward a complete mapping of yeast transcriptional regulatory mechanisms.
8
Abstract
Gene promoters typically contain multiple transcription factor binding sites (TFBSs), which may vary in affinity for their cognate transcription factors (TFs). One major challenge in studying cis-regulation is to understand how TFBS variants affect gene expression. We studied the in vivo effects of TFBS variants on cis-regulation using synthetic promoters coupled with a thermodynamic model of TF binding. We measured expression driven by each promoter with RNA-seq of transcribed sequence barcodes. This allowed reporter genes to be highly multiplexed and increased our statistical power to detect the effects of TFBS variants. We analyzed the effects of TFBS variants using a thermodynamic framework that models both TF-DNA interactions and TF-TF interactions. We found that this system accurately estimates the in vivo relative affinities of TFBSs and predicts unexpected interactions between several TFBSs. Our results reveal that binding site variants can have complex effects on gene expression due to differences in TFBS affinity for cognate TFs and differences in TFBS specificity for noncognate TFs.
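A thermodynamic framework of the kind mentioned above can be sketched with a standard two-site equilibrium model. The sketch assumes each site contributes a statistical weight q = [TF]/K_d and that a TF-TF interaction multiplies the weight of the doubly bound state by a cooperativity factor; the function name, parameter names, and numbers are illustrative assumptions and are not taken from the study.

```python
def state_probabilities(q1, q2, omega=1.0):
    """Equilibrium probabilities of the four occupancy states of two sites.

    q1, q2 : statistical weights of binding at each site ([TF] / K_d).
    omega  : cooperativity factor for the doubly bound state
             (1.0 means the two sites bind independently).
    """
    weights = {
        "empty": 1.0,
        "site1": q1,
        "site2": q2,
        "both": omega * q1 * q2,
    }
    partition = sum(weights.values())  # partition function Z
    return {state: weight / partition for state, weight in weights.items()}


# Hypothetical weights: both sites at half-saturating TF concentration.
independent = state_probabilities(1.0, 1.0)
cooperative = state_probabilities(1.0, 1.0, omega=10.0)
```

With identical site affinities, raising `omega` shifts probability toward the doubly bound state, which is how a model of this type can attribute an expression change to a TF-TF interaction rather than to a change in site affinity.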
Affiliation(s)
- Ilaria Mogno
- Center for Genome Sciences and Systems Biology, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
9
Huang SS, Clarke DC, Gosline SJ, Labadorf A, Chouinard CR, Gordon W, Lauffenburger DA, Fraenkel E. Linking proteomic and transcriptional data through the interactome and epigenome reveals a map of oncogene-induced signaling. PLoS Comput Biol 2013; 9:e1002887. [PMID: 23408876 PMCID: PMC3567149 DOI: 10.1371/journal.pcbi.1002887] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 03/31/2012] [Accepted: 11/30/2012] [Indexed: 02/06/2023] Open
Abstract
Cellular signal transduction generally involves cascades of post-translational protein modifications that rapidly catalyze changes in protein-DNA interactions and gene expression. High-throughput measurements are improving our ability to study each of these stages individually, but do not capture the connections between them. Here we present an approach for building a network of physical links among these data that can be used to prioritize targets for pharmacological intervention. Our method recovers the critical missing links between proteomic and transcriptional data by relating changes in chromatin accessibility to changes in expression and then uses these links to connect proteomic and transcriptome data. We applied our approach to integrate epigenomic, phosphoproteomic and transcriptome changes induced by the variant III mutation of the epidermal growth factor receptor (EGFRvIII) in a cell line model of glioblastoma multiforme (GBM). To test the relevance of the network, we used small molecules to target highly connected nodes implicated by the network model that were not detected by the experimental data in isolation and we found that a large fraction of these agents alter cell viability. Among these are two compounds, ICG-001, targeting CREB binding protein (CREBBP), and PKF118–310, targeting β-catenin (CTNNB1), which have not been tested previously for effectiveness against GBM. At the level of transcriptional regulation, we used chromatin immunoprecipitation sequencing (ChIP-Seq) to experimentally determine the genome-wide binding locations of p300, a transcriptional co-regulator highly connected in the network. Analysis of p300 target genes suggested its role in tumorigenesis. We propose that this general method, in which experimental measurements are used as constraints for building regulatory networks from the interactome while taking into account noise and missing data, should be applicable to a wide range of high-throughput datasets. 
The ways in which cells respond to changes in their environment are controlled by networks of physical links among the proteins and genes. The initial signal of a change in conditions rapidly passes through these networks from the cytoplasm to the nucleus, where it can lead to long-term alterations in cellular behavior by controlling the expression of genes. These cascades of signaling events underlie many normal biological processes. As a result, being able to map out how these networks change in disease can provide critical insights for new approaches to treatment. We present a computational method for reconstructing these networks by finding links between the rapid short-term changes in proteins and the longer-term changes in gene regulation. This method brings together systematic measurements of protein signaling, genome organization and transcription in the context of protein-protein and protein-DNA interactions. When used to analyze datasets from an oncogene expressing cell line model of human glioblastoma, our approach identifies key nodes that affect cell survival and functional transcriptional regulators.
10
Abstract
Saccharomyces cerevisiae is a primary model for studies of transcriptional control, and the specificities of most yeast transcription factors (TFs) have been determined by multiple methods. However, it is unclear which position weight matrices (PWMs) are most useful; for the roughly 200 TFs in yeast, there are over 1200 PWMs in the literature. To address this issue, we created ScerTF, a comprehensive database of 1226 motifs from 11 different sources. We identified a single matrix for each TF that best predicts in vivo data by benchmarking matrices against chromatin immunoprecipitation and TF deletion experiments. We also used in vivo data to optimize thresholds for identifying regulatory sites with each matrix. To correct for biases from different methods, we developed a strategy to combine matrices. These aligned matrices outperform the best available matrix for several TFs. We used the matrices to predict co-occurring regulatory elements in the genome and identified many known TF combinations. In addition, we predict new combinations and provide evidence of combinatorial regulation from gene expression data. The database is available through a web interface at http://ural.wustl.edu/ScerTF. The site allows users to search the database with a regulatory site or matrix to identify the TFs most likely to bind the input sequence.
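To make the idea of searching a sequence with a matrix concrete, the following minimal sketch converts a count matrix into log-odds scores and reports the best-scoring window. It is a generic PWM scan, not ScerTF's code; the uniform background, the pseudocount, the function names, and the toy motif are all assumptions, and the input is assumed to be uppercase A/C/G/T only.

```python
import math


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Convert per-position base counts into log2 odds against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def best_site(sequence, matrix):
    """Return (start, score) of the highest-scoring window, or None if too short."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical 3 bp motif built from ten aligned "TGA" sites (toy data).
toy_counts = [{"T": 10}, {"G": 10}, {"A": 10}]
toy_matrix = log_odds_matrix(toy_counts)
hit = best_site("CCTGACC", toy_matrix)
```

In practice a per-matrix score threshold would decide whether `hit` counts as a site; the abstract notes that such thresholds were optimized against in vivo data, which this sketch does not attempt.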
Affiliation(s)
- Aaron T Spivak
- Department of Genetics, Washington University Medical School, St Louis, MO, USA
11
Abstract
The yeast Saccharomyces cerevisiae is a prevalent system for the analysis of transcriptional networks. As a result, multiple DNA-binding sequence specificities (motifs) have been derived for most yeast transcription factors (TFs). However, motifs from different studies are often inconsistent with each other, making subsequent analyses complicated and confusing. Here, we have created YeTFaSCo (The Yeast Transcription Factor Specificity Compendium, http://yetfasco.ccbr.utoronto.ca/), an extensive collection of S. cerevisiae TF specificities. YeTFaSCo differs from related databases by being more comprehensive (including 1709 motifs for 256 proteins or protein complexes), and by evaluating the motifs using multiple objective quality metrics. The metrics include correlation between motif matches and ChIP-chip data, gene expression patterns, and GO terms, as well as motif agreement between different studies. YeTFaSCo also features an index of ‘expert-curated’ motifs, each associated with a confidence assessment. In addition, the database website features tools for motif analysis, including a sequence scanning function and precomputed genome-browser tracks of motif occurrences across the entire yeast genome. Users can also search the database for motifs that are similar to a query motif.
Affiliation(s)
- Carl G de Boer
- Department of Molecular Genetics, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
12
Zheng J, Benschop JJ, Shales M, Kemmeren P, Greenblatt J, Cagney G, Holstege F, Li H, Krogan NJ. Epistatic relationships reveal the functional organization of yeast transcription factors. Mol Syst Biol 2010; 6:420. [PMID: 20959818 DOI: 10.1038/msb.2010.77] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 11/04/2009] [Accepted: 08/27/2010] [Indexed: 11/09/2022] Open
Abstract
The regulation of gene expression is, in large part, mediated by interplay between the general transcription factors (GTFs) that function to bring about the expression of many genes and site-specific DNA-binding transcription factors (STFs). Here, quantitative genetic profiling using the epistatic miniarray profile (E-MAP) approach allowed us to measure 48 391 pairwise genetic interactions, both negative (aggravating) and positive (alleviating), between and among genes encoding STFs and GTFs in Saccharomyces cerevisiae. This allowed us to both reconstruct regulatory models for specific subsets of transcription factors and identify global epistatic patterns. Overall, there was a much stronger preference for negative relative to positive genetic interactions among STFs than there was among GTFs. Negative genetic interactions, which often identify factors working in non-essential, redundant pathways, were also enriched for pairs of STFs that co-regulate similar sets of genes. Microarray analysis demonstrated that pairs of STFs that display negative genetic interactions regulate gene expression in an independent rather than coordinated manner. Collectively, these data suggest that parallel/compensating relationships between regulators, rather than linear pathways, often characterize transcriptional circuits.
13
Lee E, Bussemaker HJ. Identifying the genetic determinants of transcription factor activity. Mol Syst Biol 2010; 6:412. [PMID: 20865005 DOI: 10.1038/msb.2010.64] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 04/20/2010] [Accepted: 06/20/2010] [Indexed: 01/03/2023] Open
Abstract
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood. The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity. Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs') whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF. Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse. In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of ‘phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. 
We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008). To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level. We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs. Application of our approach to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. 
We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes. In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs’) whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome VII modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
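The two-step idea in the abstract above (regress differential expression on predicted promoter affinity to get a per-segregant TF activity, then map that activity as a quantitative trait) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the single-TF least-squares fit, the Welch t-statistic used as the linkage score, the function names, and the made-up data.

```python
from math import sqrt
from statistics import mean, variance


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def tf_activities(affinity, expression):
    """Step 1: one inferred TF activity per segregant.

    affinity: predicted promoter affinity for one TF, one value per gene.
    expression: one differential-expression profile (per gene) per segregant.
    """
    return [ols_slope(affinity, profile) for profile in expression]


def welch_t(a, b):
    """Welch t-statistic; each group needs at least two values."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def map_aqtl(activity, genotypes):
    """Step 2: score each marker by how strongly its allele splits TF activity.

    genotypes: dict mapping marker name to a 0/1 allele per segregant.
    """
    scores = {}
    for marker, alleles in genotypes.items():
        group0 = [act for act, al in zip(activity, alleles) if al == 0]
        group1 = [act for act, al in zip(activity, alleles) if al == 1]
        scores[marker] = welch_t(group1, group0)
    return scores


# Hypothetical toy data: 4 genes, 4 segregants, 2 markers.
affinity = [0.0, 1.0, 2.0, 3.0]
expression = [
    [0.0, 1.0, 2.0, 3.0],     # activity about 1.0
    [0.5, 1.7, 2.9, 4.1],     # activity about 1.2
    [1.0, 0.5, 0.0, -0.5],    # activity about -0.5
    [0.2, -0.1, -0.4, -0.7],  # activity about -0.3
]
genotypes = {"marker_A": [1, 1, 0, 0], "marker_B": [1, 0, 1, 0]}

activity = tf_activities(affinity, expression)
scores = map_aqtl(activity, genotypes)
best_marker = max(scores, key=lambda m: abs(scores[m]))
print(best_marker, scores)
```

A real analysis would fit many TFs jointly and assess significance genome-wide; the sketch only shows why an inferred activity can serve as the mapped phenotype.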
|
14
|
Abstract
This study presents the Yeast Promoter Atlas (YPA, http://ypa.ee.ncku.edu.tw/ or http://ypa.csbb.ntu.edu.tw/) database, which aims to collect comprehensive promoter features in Saccharomyces cerevisiae. YPA integrates nine kinds of promoter features including promoter sequences, genes’ transcription boundaries—transcription start sites (TSSs), five prime untranslated regions (5′-UTRs) and three prime untranslated regions (3′-UTRs), TATA boxes, transcription factor binding sites (TFBSs), nucleosome occupancy, DNA bendability, transcription factor (TF) binding, TF knockout expression and TF–TF physical interaction. YPA is designed to present data in a unified manner, as many important observations are revealed only when these promoter features are considered together. For example, DNA rigidity can prevent nucleosome packaging, thereby making TFBSs in the rigid DNA regions more accessible to TFs. Integrating nucleosome occupancy, DNA bendability, TF binding, TF knockout expression and TFBS data helps to identify which TFBSs are actually functional. In YPA, various promoter features can be accessed in a centralized and organized platform. Researchers can easily see whether the TFBSs in a promoter of interest are occupied by nucleosomes or located in a rigid DNA segment, and whether the expression of the downstream gene responds to the knockout of the corresponding TFs. Compared to other established yeast promoter databases, YPA collects not only TFBSs but also many other promoter features to help biologists study transcriptional regulation.
Collapse
Affiliation(s)
- Darby Tien-Hao Chang
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
|
15
|
Campbell TL, De Silva EK, Olszewski KL, Elemento O, Llinás M. Identification and genome-wide prediction of DNA binding specificities for the ApiAP2 family of regulators from the malaria parasite. PLoS Pathog 2010; 6:e1001165. [PMID: 21060817 PMCID: PMC2965767 DOI: 10.1371/journal.ppat.1001165] [Citation(s) in RCA: 182] [Impact Index Per Article: 13.0] [Received: 05/05/2010] [Accepted: 09/27/2010] [Indexed: 11/18/2022]
Abstract
The molecular mechanisms underlying transcriptional regulation in apicomplexan parasites remain poorly understood. Recently, the Apicomplexan AP2 (ApiAP2) family of DNA binding proteins was identified as a major class of transcriptional regulators that are found across all Apicomplexa. To gain insight into the regulatory role of these proteins in the malaria parasite, we have comprehensively surveyed the DNA-binding specificities of all 27 members of the ApiAP2 protein family from Plasmodium falciparum revealing unique binding preferences for the majority of these DNA binding proteins. In addition to high affinity primary motif interactions, we also observe interactions with secondary motifs. The ability of a number of ApiAP2 proteins to bind multiple, distinct motifs significantly increases the potential complexity of the transcriptional regulatory networks governed by the ApiAP2 family. Using these newly identified sequence motifs, we infer the trans-factors associated with previously reported plasmodial cis-elements and provide evidence that ApiAP2 proteins modulate key regulatory decisions at all stages of parasite development. Our results offer a detailed view of ApiAP2 DNA binding specificity and take the first step toward inferring comprehensive gene regulatory networks for P. falciparum.
Affiliation(s)
- Tracey L. Campbell
- Department of Molecular Biology & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Erandi K. De Silva
- Department of Molecular Biology & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Kellen L. Olszewski
- Department of Molecular Biology & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Olivier Elemento
- Institute for Computational Medicine, Weill Cornell Medical College, New York, New York, United States of America
- Manuel Llinás
- Department of Molecular Biology & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
|
16
|
Wang J, Morigen. BayesPI - a new model to study protein-DNA interactions: a case study of condition-specific protein binding parameters for Yeast transcription factors. BMC Bioinformatics 2009; 10:345. [PMID: 19857274 PMCID: PMC2771022 DOI: 10.1186/1471-2105-10-345] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 05/05/2009] [Accepted: 10/20/2009] [Indexed: 11/26/2022]
Abstract
Background: We have incorporated Bayesian model regularization with biophysical modeling of protein-DNA interactions and of genome-wide nucleosome positioning to study protein-DNA interactions using a high-throughput dataset. The newly developed method (BayesPI) includes the estimation of transcription factor (TF) binding energy matrices and the computation of the binding affinity of a TF target site and the corresponding chemical potential. Results: The method was successfully tested on synthetic ChIP-chip datasets and real yeast ChIP-chip experiments. Subsequently, it was used to estimate condition-specific and species-specific protein-DNA interactions for several yeast TFs. Conclusion: The results revealed that modification of the protein binding parameters and variation of the individual nucleotide affinity in either recognition or flanking sequences occurred under different stresses and in different species. The findings suggest that such modifications may be adaptive and play roles in the formation of the environment-specific binding patterns of yeast TFs and in the divergence of TF binding sites across related yeast species.
Affiliation(s)
- Junbai Wang
- Division of Pathology, The Norwegian Radium Hospital, Rikshospitalet University Hospital, Montebello 0310 Oslo, Norway.
|
17
|
Wang K, Alvarez MJ, Bisikirska BC, Linding R, Basso K, Favera RD, Califano A. Dissecting the interface between signaling and transcriptional regulation in human B cells. Pac Symp Biocomput 2009:264-275. [PMID: 19209707 PMCID: PMC2716143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
A key role of signal transduction pathways is to control transcriptional programs in the nucleus as a function of signals received by the cell via complex post-translational modification cascades. This determines cell-context specific responses to environmental stimuli. Given the difficulty of quantitating protein concentration and post-translational modifications, signaling pathway studies are still for the most part conducted one interaction at a time. Thus, genome-wide, cell-context specific dissection of signaling pathways is still an open challenge in molecular systems biology. In this manuscript we extend the MINDy algorithm for the identification of post-translational modulators of transcription factor activity to produce a first genome-wide map of the interface between signaling and transcriptional regulatory programs in human B cells. We show that the serine-threonine kinase STK38 emerges as the most pleiotropic signaling protein in this cellular context and we biochemically validate this finding by shRNA-mediated silencing of this kinase, followed by gene expression profile analysis. We also extensively validate the inferred interactions using protein-protein interaction databases and the kinase-substrate interaction prediction algorithm NetworKIN.
Affiliation(s)
- Kai Wang
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Joint Centers for Systems Biology, Columbia University, New York, NY, USA
- Mariano J. Alvarez
- Joint Centers for Systems Biology, Columbia University, New York, NY, USA
- Rune Linding
- Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada
- Katia Basso
- Institute of Cancer Genetics, Columbia University, New York, NY, USA
- Andrea Califano
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Joint Centers for Systems Biology, Columbia University, New York, NY, USA
- Institute of Cancer Genetics, Columbia University, New York, NY, USA
|
18
|
Ward LD, Bussemaker HJ. Predicting functional transcription factor binding through alignment-free and affinity-based analysis of orthologous promoter sequences. Bioinformatics 2008; 24:i165-71. [PMID: 18586710 PMCID: PMC2718632 DOI: 10.1093/bioinformatics/btn154] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/25/2022]
Abstract
Motivation: The identification of transcription factor (TF) binding sites and the regulatory circuitry that they define is currently an area of intense research. Data from whole-genome chromatin immunoprecipitation (ChIP–chip), whole-genome expression microarrays, and sequencing of multiple closely related genomes have all proven useful. By and large, existing methods treat the interpretation of functional data as a classification problem (between bound and unbound DNA), and the analysis of comparative data as a problem of local alignment (to recover phylogenetic footprints of presumably functional elements). Both of these approaches suffer from the inability to model and detect low-affinity binding sites, which have recently been shown to be abundant and functional. Results: We have developed a method that discovers functional regulatory targets of TFs by predicting the total affinity of each promoter for those factors and then comparing that affinity across orthologous promoters in closely related species. At each promoter, we consider the minimum affinity among orthologs to be the fraction of the affinity that is functional. Because we calculate the affinity of the entire promoter, our method is independent of local alignment. By comparing with functional annotation information and gene expression data in Saccharomyces cerevisiae, we have validated that this biophysically motivated use of evolutionary conservation gives rise to dramatic improvement in prediction of regulatory connectivity and factor–factor interactions compared to the use of a single genome. We propose novel biological functions for several yeast TFs, including the factors Snt2 and Stb4, for which no function has been reported. Our affinity-based approach towards comparative genomics may allow a more quantitative analysis of the principles governing the evolution of non-coding DNA. 
Availability: The MatrixREDUCE software package is available from http://www.bussemakerlab.org/software/MatrixREDUCE Contact: Harmen.Bussemaker@columbia.edu Supplementary information: Supplementary data are available at Bioinformatics online.
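As an illustration of the affinity-based idea in this abstract, the sketch below sums a TF's predicted relative affinity over every window of a promoter and then takes the minimum across orthologous promoters as the conserved (putatively functional) part. It is a minimal sketch under stated assumptions, not the MatrixREDUCE implementation: the function names, the toy three-column affinity matrix, the forward-strand-only scan, and the example sequences are all hypothetical.

```python
def total_affinity(seq, psam):
    """Sum over all windows of the product of per-position relative affinities.

    psam: list of dicts, one per motif position, mapping base -> relative
    affinity (1.0 for the preferred base). Forward strand only (an assumption).
    """
    width = len(psam)
    total = 0.0
    for start in range(len(seq) - width + 1):
        window_affinity = 1.0
        for offset, column in enumerate(psam):
            window_affinity *= column[seq[start + offset]]
        total += window_affinity
    return total


def conserved_affinity(orthologous_promoters, psam):
    """Minimum total affinity among orthologs, with no alignment required."""
    return min(total_affinity(seq, psam) for seq in orthologous_promoters)


# Hypothetical matrix preferring the site TGA; other bases score 0.1.
psam = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 1.0},
    {"A": 0.1, "C": 0.1, "G": 1.0, "T": 0.1},
    {"A": 1.0, "C": 0.1, "G": 0.1, "T": 0.1},
]

# Made-up orthologous promoters: the second has lost the optimal site.
orthologs = ["CCTGACC", "CCTGCCC"]

per_species = [total_affinity(seq, psam) for seq in orthologs]
functional = conserved_affinity(orthologs, psam)
print(per_species, functional)
```

Because every window contributes, weak sites add to the total instead of being discarded by a bound/unbound threshold, which is the point the abstract makes about low-affinity binding.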
Affiliation(s)
- Lucas D Ward
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
|