1
Ma CZ, Brent MR. Inferring TF activities and activity regulators from gene expression data with constraints from TF perturbation data. Bioinformatics 2021; 37:1234-1245. [PMID: 33135076 PMCID: PMC8189679 DOI: 10.1093/bioinformatics/btaa947] [Received: 07/03/2020] [Revised: 09/26/2020] [Accepted: 10/27/2020]
Abstract
Motivation: The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods of inferring TF activity from gene expression data have been described, but due to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now. Results: We systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. We find that expression data in which the activities of individual TFs have been perturbed are both necessary and sufficient for obtaining good performance. To a considerable extent, control strengths inferred using expression data from one growth condition carry over to other conditions, so the control strength matrices derived here can be used by others. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of yeast TFs Gcr2, Gln3, Gcn4 and Msn2. Availability and implementation: Evaluation code and data are available at https://doi.org/10.5281/zenodo.4050573. Supplementary information: Supplementary data are available at Bioinformatics online.
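The factorization described in this abstract can be illustrated with a small sketch. It assumes the control-strength matrix is already known and recovers TF activities for new samples by ordinary least squares. The function names (`solve`, `infer_activities`), the toy matrices, and the use of unconstrained least squares are illustrative assumptions; the abstract does not specify the authors' optimization procedure or constraints.

```python
def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: TF activities are not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def infer_activities(C, E):
    """Least-squares TF activities A (TFs x samples) such that E ~= C A.

    C: genes x TFs control strengths (zero where a TF does not regulate a gene).
    E: genes x samples expression values (for example, log fold changes).
    """
    n_genes, n_tfs = len(C), len(C[0])
    n_samples = len(E[0])
    # Normal equations: (C^T C) a = C^T e for each sample column e.
    CtC = [[sum(C[g][i] * C[g][j] for g in range(n_genes)) for j in range(n_tfs)]
           for i in range(n_tfs)]
    A = [[0.0] * n_samples for _ in range(n_tfs)]
    for s in range(n_samples):
        Cte = [sum(C[g][i] * E[g][s] for g in range(n_genes)) for i in range(n_tfs)]
        a = solve(CtC, Cte)
        for i in range(n_tfs):
            A[i][s] = a[i]
    return A


# Toy data (hypothetical): 4 genes, 2 TFs, 2 samples.
C = [[1.0, 0.0],
     [2.0, 0.0],
     [0.0, -1.0],
     [1.0, 1.0]]
A_true = [[0.5, -1.0],
          [2.0, 0.0]]
# Noise-free expression generated from the model E = C A.
E = [[sum(C[g][t] * A_true[t][s] for t in range(2)) for s in range(2)]
     for g in range(4)]
A_hat = infer_activities(C, E)
```

Because the toy expression matrix is generated without noise, `A_hat` matches `A_true` up to floating-point error. With real data, the fit would be approximate and the zero pattern of `C` would come from a prior TF network.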
Affiliation(s)
- Cynthia Z Ma: Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Michael R Brent: Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
2
Kang Y, Patel NR, Shively C, Recio PS, Chen X, Wranik BJ, Kim G, McIsaac RS, Mitra R, Brent MR. Dual threshold optimization and network inference reveal convergent evidence from TF binding locations and TF perturbation responses. Genome Res 2020; 30:459-471. [PMID: 32060051 PMCID: PMC7111528 DOI: 10.1101/gr.259655.119] [Received: 11/27/2019] [Accepted: 02/11/2020]
Abstract
A high-confidence map of the direct, functional targets of each transcription factor (TF) requires convergent evidence from independent sources. Two significant sources of evidence are TF binding locations and the transcriptional responses to direct TF perturbations. Systematic data sets of both types exist for yeast and human, but they rarely converge on a common set of direct, functional targets for a TF. Even the few genes that are both bound and responsive may not be direct functional targets. Our analysis shows that when there are many nonfunctional binding sites and many indirect targets, nonfunctional sites are expected to occur in the cis-regulatory DNA of indirect targets by chance. To address this problem, we introduce dual threshold optimization (DTO), a new method for setting significance thresholds on binding and perturbation-response data, and show that it improves convergence. It also enables comparison of binding data to perturbation-response data that have been processed by network inference algorithms, which further improves convergence. The combination of dual threshold optimization and network inference greatly expands the high-confidence TF network map in both yeast and human. Next, we analyze a comprehensive new data set measuring the transcriptional response shortly after inducing overexpression of a yeast TF. We also present a new yeast binding location data set obtained by transposon calling cards and compare it to recent ChIP-exo data. These new data sets improve convergence and expand the high-confidence network synergistically.
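The idea of choosing binding and response thresholds jointly can be sketched as follows. The sketch assumes, purely for illustration, that convergence is scored by the hypergeometric tail probability of the overlap between bound and responsive gene sets; the abstract does not state the criterion. The function names, score conventions, and toy data are hypothetical.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(overlap >= k) when n genes are drawn from N, of which K are marked."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)


def dual_threshold(binding, response, bind_cutoffs, resp_cutoffs):
    """Pick the (binding, response) cutoff pair whose bound/responsive
    overlap is least likely by chance. Scores are 'higher = stronger'."""
    genes = list(binding)
    N = len(genes)
    best = None
    for tb in bind_cutoffs:
        bound = {g for g in genes if binding[g] >= tb}
        for tr in resp_cutoffs:
            responsive = {g for g in genes if response[g] >= tr}
            if not bound or not responsive:
                continue  # an empty set carries no evidence of convergence
            k = len(bound & responsive)
            p = hypergeom_tail(N, len(bound), len(responsive), k)
            if best is None or p < best[0]:
                best = (p, tb, tr, sorted(bound & responsive))
    return best


# Toy scores for one hypothetical TF across eight genes.
binding = {"a": 9, "b": 8, "c": 7, "d": 2, "e": 1, "f": 1, "g": 0, "h": 0}
response = {"a": 5, "b": 4, "c": 4, "d": 0, "e": 3, "f": 0, "g": 0, "h": 1}
best = dual_threshold(binding, response, bind_cutoffs=[1, 5], resp_cutoffs=[1, 3])
```

On this toy input the stricter pair of cutoffs wins, leaving genes a, b and c as the bound-and-responsive set. Because the p-value is minimized over many threshold pairs, it is optimistically biased and should not be read as a calibrated significance level without further correction.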
Affiliation(s)
- Yiming Kang: Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
- Nikhil R Patel: Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
- Christian Shively: Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Pamela Samantha Recio: Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Xuhua Chen: Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Bernd J Wranik: Calico Life Sciences LLC, South San Francisco, California 94080, USA
- Griffin Kim: Calico Life Sciences LLC, South San Francisco, California 94080, USA
- R Scott McIsaac: Calico Life Sciences LLC, South San Francisco, California 94080, USA
- Robi Mitra: Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
- Michael R Brent: Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Computer Science and Engineering, Washington University, St. Louis, Missouri 63130, USA
3
Berest I, Arnold C, Reyes-Palomares A, Palla G, Rasmussen KD, Giles H, Bruch PM, Huber W, Dietrich S, Helin K, Zaugg JB. Quantification of Differential Transcription Factor Activity and Multiomics-Based Classification into Activators and Repressors: diffTF. Cell Rep 2019; 29:3147-3159.e12. [DOI: 10.1016/j.celrep.2019.10.106] [Received: 12/01/2018] [Revised: 09/20/2019] [Accepted: 10/28/2019]
4
Kang Y, Liow HH, Maier EJ, Brent MR. NetProphet 2.0: mapping transcription factor networks by exploiting scalable data resources. Bioinformatics 2017; 34:249-257. [PMID: 28968736 PMCID: PMC5860202 DOI: 10.1093/bioinformatics/btx563] [Received: 03/14/2017] [Revised: 03/14/2017] [Accepted: 09/11/2017]
Abstract
Motivation: Cells process information, in part, through transcription factor (TF) networks, which control the rates at which individual genes produce their products. A TF network map is a graph that indicates which TFs bind and directly regulate each gene. Previous work has described network mapping algorithms that rely exclusively on gene expression data and ‘integrative’ algorithms that exploit a wide range of data sources including chromatin immunoprecipitation sequencing (ChIP-seq) of many TFs, genome-wide chromatin marks, and binding specificities for many TFs determined in vitro. However, such resources are available only for a few major model systems and cannot be easily replicated for new organisms or cell types. Results: We present NetProphet 2.0, a ‘data light’ algorithm for TF network mapping, and show that it is more accurate at identifying direct targets of TFs than other, similarly data light algorithms. In particular, it improves on the accuracy of NetProphet 1.0, which used only gene expression data, by exploiting three principles. First, combining multiple approaches to network mapping from expression data can improve accuracy relative to the constituent approaches. Second, TFs with similar DNA binding domains bind similar sets of target genes. Third, even a noisy, preliminary network map can be used to infer DNA binding specificities from promoter sequences and these inferred specificities can be used to further improve the accuracy of the network map. Availability and implementation: Source code and comprehensive documentation are freely available at https://github.com/yiming-kang/NetProphet_2.0. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yiming Kang: Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
- Hien-Haw Liow: Department of Mathematics, Washington University, Saint Louis, MO, USA
- Ezekiel J Maier: Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
- Michael R Brent: Department of Computer Science and Engineering and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
5
Bussemaker HJ, Causton HC, Fazlollahi M, Lee E, Muroff I. Network-based approaches that exploit inferred transcription factor activity to analyze the impact of genetic variation on gene expression. Curr Opin Syst Biol 2017; 2:98-102. [PMID: 28691107 DOI: 10.1016/j.coisb.2017.04.002]
Abstract
Over the past decade, a number of methods have emerged for inferring protein-level transcription factor activities in individual samples based on prior information about the structure of the gene regulatory network. We discuss how this has enabled new methods for dissecting trans-acting mechanisms that underpin genetic variation in gene expression.
Affiliation(s)
- Harmen J Bussemaker: Department of Biological Sciences, Columbia University, New York, NY 10027; Department of Systems Biology, Columbia University, New York, NY 10032
- Helen C Causton: Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY 10032
- Mina Fazlollahi: Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY 10029
- Eunjee Lee: Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY 10029
- Ivor Muroff: Department of Biological Sciences, Columbia University, New York, NY 10027
6
Brent MR. Past Roadblocks and New Opportunities in Transcription Factor Network Mapping. Trends Genet 2016; 32:736-750. [PMID: 27720190 DOI: 10.1016/j.tig.2016.08.009] [Received: 05/08/2016] [Revised: 08/12/2016] [Accepted: 08/16/2016]
Abstract
One of the principal mechanisms by which cells differentiate and respond to changes in external signals or conditions is by changing the activity levels of transcription factors (TFs). This changes the transcription rates of target genes via the cell's TF network, which ultimately contributes to reconfiguring cellular state. Since microarrays provided our first window into global cellular state, computational biologists have eagerly attacked the problem of mapping TF networks, a key part of the cell's control circuitry. In retrospect, however, steady-state mRNA abundance levels were a poor substitute for TF activity levels and gene transcription rates. Likewise, mapping TF binding through chromatin immunoprecipitation proved less predictive of functional regulation and less amenable to systematic elucidation of complete networks than originally hoped. This review explains these roadblocks and the current, unprecedented blossoming of new experimental techniques built on second-generation sequencing, which hold out the promise of rapid progress in TF network mapping.
Affiliation(s)
- Michael R Brent: Departments of Computer Science and Genetics and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA
7
Alvarez MJ, Shen Y, Giorgi FM, Lachmann A, Ding BB, Ye BH, Califano A. Functional characterization of somatic mutations in cancer using network-based inference of protein activity. Nat Genet 2016; 48:838-47. [PMID: 27322546 PMCID: PMC5040167 DOI: 10.1038/ng.3593] [Received: 02/21/2016] [Accepted: 05/23/2016]
Abstract
Identifying the multiple dysregulated oncoproteins that contribute to tumorigenesis in a given patient is crucial for developing personalized treatment plans. However, accurate inference of aberrant protein activity in biological samples is still challenging as genetic alterations are only partially predictive and direct measurements of protein activity are generally not feasible. To address this problem we introduce and experimentally validate a new algorithm, VIPER (Virtual Inference of Protein-activity by Enriched Regulon analysis), for the accurate assessment of protein activity from gene expression data. We use VIPER to evaluate the functional relevance of genetic alterations in regulatory proteins across all TCGA samples. In addition to accurately inferring aberrant protein activity induced by established mutations, we also identify a significant fraction of tumors with aberrant activity of druggable oncoproteins—despite a lack of mutations, and vice-versa. In vitro assays confirmed that VIPER-inferred protein activity outperforms mutational analysis in predicting sensitivity to targeted inhibitors.
Affiliation(s)
- Mariano J Alvarez: Department of Systems Biology, Columbia University, New York, New York, USA; DarwinHealth Inc., New York, New York, USA
- Yao Shen: Department of Systems Biology, Columbia University, New York, New York, USA; DarwinHealth Inc., New York, New York, USA
- Federico M Giorgi: Department of Systems Biology, Columbia University, New York, New York, USA
- Alexander Lachmann: Department of Systems Biology, Columbia University, New York, New York, USA
- B Belinda Ding: Department of Cell Biology, Albert Einstein College of Medicine, New York, New York, USA
- B Hilda Ye: Department of Cell Biology, Albert Einstein College of Medicine, New York, New York, USA
- Andrea Califano: Department of Systems Biology, Columbia University, New York, New York, USA; Department of Biomedical Informatics, Columbia University, New York, New York, USA; Department of Biochemistry & Molecular Biophysics, Columbia University, New York, New York, USA; Institute for Cancer Genetics, Columbia University, New York, New York, USA; Motor Neuron Center, Columbia University, New York, New York, USA; Columbia Initiative in Stem Cells, Columbia University, New York, USA
8
Fazlollahi M, Muroff I, Lee E, Causton HC, Bussemaker HJ. Identifying genetic modulators of the connectivity between transcription factors and their transcriptional targets. Proc Natl Acad Sci U S A 2016; 113:E1835-43. [PMID: 26966232 DOI: 10.1073/pnas.1517140113]
Abstract
Regulation of gene expression by transcription factors (TFs) is highly dependent on genetic background and interactions with cofactors. Identifying specific context factors is a major challenge that requires new approaches. Here we show that exploiting natural variation is a potent strategy for probing functional interactions within gene regulatory networks. We developed an algorithm to identify genetic polymorphisms that modulate the regulatory connectivity between specific transcription factors and their target genes in vivo. As a proof of principle, we mapped connectivity quantitative trait loci (cQTLs) using parallel genotype and gene expression data for segregants from a cross between two strains of the yeast Saccharomyces cerevisiae. We identified a nonsynonymous mutation in the DIG2 gene as a cQTL for the transcription factor Ste12p and confirmed this prediction empirically. We also identified three polymorphisms in TAF13 as putative modulators of regulation by Gcn4p. Our method has potential for revealing how genetic differences among individuals influence gene regulatory networks in any organism for which gene expression and genotype data are available along with information on binding preferences for transcription factors.
9
Barah P, B N MN, Jayavelu ND, Sowdhamini R, Shameer K, Bones AM. Transcriptional regulatory networks in Arabidopsis thaliana during single and combined stresses. Nucleic Acids Res 2015; 44:3147-64. [PMID: 26681689 PMCID: PMC4838348 DOI: 10.1093/nar/gkv1463] [Received: 02/23/2015] [Accepted: 11/28/2015]
Abstract
Differentially evolved responses to various stress conditions in plants are controlled by complex regulatory circuits of transcriptional activators, and repressors, such as transcription factors (TFs). To understand the general and condition-specific activities of the TFs and their regulatory relationships with the target genes (TGs), we have used a homogeneous stress gene expression dataset generated on ten natural ecotypes of the model plant Arabidopsis thaliana, during five single and six combined stress conditions. Knowledge-based profiles of binding sites for 25 stress-responsive TF families (187 TFs) were generated and tested for their enrichment in the regulatory regions of the associated TGs. Condition-dependent regulatory sub-networks have shed light on the differential utilization of the underlying network topology, by stress-specific regulators and multifunctional regulators. The multifunctional regulators maintain the core stress response processes while the transient regulators confer the specificity to certain conditions. Clustering patterns of transcription factor binding sites (TFBS) have reflected the combinatorial nature of transcriptional regulation, and suggested the putative role of the homotypic clusters of TFBS towards maintaining transcriptional robustness against cis-regulatory mutations to facilitate the preservation of stress response processes. The Gene Ontology enrichment analysis of the TGs reflected sequential regulation of stress response mechanisms in plants.
Affiliation(s)
- Pankaj Barah: Cell, Molecular Biology and Genomics Group, Department of Biology, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Mahantesha Naika B N: National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK campus, Bangalore 560 065, India
- Naresh Doni Jayavelu: Department of Chemical Engineering, Norwegian University of Science and Technology, Trondheim N-7491, Norway
- Ramanathan Sowdhamini: National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK campus, Bangalore 560 065, India
- Khader Shameer: National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK campus, Bangalore 560 065, India
- Atle M Bones: Cell, Molecular Biology and Genomics Group, Department of Biology, Norwegian University of Science and Technology, Trondheim N-7491, Norway
10
Liu Q, Su PF, Zhao S, Shyr Y. Transcriptome-wide signatures of tumor stage in kidney renal clear cell carcinoma: connecting copy number variation, methylation and transcription factor activity. Genome Med 2014; 6:117. [PMID: 25648588 PMCID: PMC4293006 DOI: 10.1186/s13073-014-0117-z] [Received: 03/13/2014] [Accepted: 11/26/2014]
Abstract
BACKGROUND: Comparative analysis of expression profiles between early and late stage cancers can help to understand cancer progression and metastasis mechanisms and to predict the clinical aggressiveness of cancer. The observed stage-dependent expression changes can be explained by genetic and epigenetic alterations as well as transcription dysregulation. Unlike genetic and epigenetic alterations, however, activity changes of transcription factors, generally occurring at the post-transcriptional or post-translational level, are hard to detect and quantify. METHODS: Here we developed a statistical framework to infer the activity changes of transcription factors by simultaneously taking into account the contributions of genetic and epigenetic alterations to mRNA expression variations. RESULTS: Applied to kidney renal clear cell carcinoma (KIRC), the model underscored the role of methylation as a significant contributor to stage-dependent expression alterations and identified key transcription factors as potential drivers of cancer progression. CONCLUSIONS: Integrating copy number, methylation, and transcription factor activity signatures to explain stage-dependent expression alterations presented a precise and comprehensive view on the underlying mechanisms during KIRC progression.
Affiliation(s)
- Qi Liu: Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232 USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232 USA
- Pei-Fang Su: Department of Statistics, National Cheng Kung University, Tainan, 70101 Taiwan
- Shilin Zhao: Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232 USA
- Yu Shyr: Center for Quantitative Sciences, Vanderbilt University School of Medicine, Nashville, TN 37232 USA; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232 USA; Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232 USA; School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240 China
11
Paul E, Zhu ZI, Landsman D, Morse RH. Genome-wide association of mediator and RNA polymerase II in wild-type and mediator mutant yeast. Mol Cell Biol 2015; 35:331-42. [PMID: 25368384 DOI: 10.1128/MCB.00991-14]
Abstract
Mediator is a large, multisubunit complex that is required for essentially all mRNA transcription in eukaryotes. In spite of the importance of Mediator, the range of its targets and how it is recruited to these is not well understood. Previous work showed that in Saccharomyces cerevisiae, Mediator contributes to transcriptional activation by two distinct mechanisms, one depending on the tail module triad and favoring SAGA-regulated genes, and the second occurring independently of the tail module and favoring TFIID-regulated genes. Here, we use chromatin immunoprecipitation sequencing (ChIP-seq) to show that dependence on tail module subunits for Mediator recruitment and polymerase II (Pol II) association occurs preferentially at SAGA-regulated over TFIID-regulated genes on a genome-wide scale. We also show that recruitment of tail module subunits to active gene promoters continues genome-wide when Mediator integrity is compromised in med17 temperature-sensitive (ts) yeast, demonstrating the modular nature of the Mediator complex in vivo. In addition, our data indicate that promoters exhibiting strong and stable occupancy by Mediator have a wide range of activity and are enriched for targets of the Tup1-Cyc8 repressor complex. We also identify a number of strong Mediator occupancy peaks that overlap dubious open reading frames (ORFs) and are likely to include previously unrecognized upstream activator sequences.
12
Bosio MC, Negri R, Dieci G. Promoter architectures in the yeast ribosomal expression program. Transcription 2014; 2:71-77. [PMID: 21468232 DOI: 10.4161/trns.2.2.14486] [Received: 11/26/2010] [Revised: 12/15/2010] [Accepted: 12/16/2010]
Abstract
Ribosome biogenesis begins with the orchestrated expression of hundreds of genes, including the three large classes of ribosomal protein, ribosome biogenesis and snoRNA genes. Current knowledge about the corresponding promoters suggests the existence of novel class-specific transcriptional strategies and crosstalk between telomere length and cell growth control.
Affiliation(s)
- Maria Cristina Bosio: Dipartimento di Biochimica e Biologia Molecolare, Università degli Studi di Parma, Parma
13
Fazlollahi M, Lee E, Muroff I, Lu XJ, Gomez-Alcala P, Causton HC, Bussemaker HJ. Harnessing natural sequence variation to dissect posttranscriptional regulatory networks in yeast. G3 (Bethesda) 2014; 4:1539-53. [PMID: 24938291 DOI: 10.1534/g3.114.012039]
Abstract
Understanding how genomic variation influences phenotypic variation through the molecular networks of the cell is one of the central challenges of biology. Transcriptional regulation has received much attention, but equally important is the posttranscriptional regulation of mRNA stability. Here we applied a systems genetics approach to dissect posttranscriptional regulatory networks in the budding yeast Saccharomyces cerevisiae. Quantitative sequence-to-affinity models were built from high-throughput in vivo RNA binding protein (RBP) binding data for 15 yeast RBPs. Integration of these models with genome-wide mRNA expression data allowed us to estimate protein-level RBP regulatory activity for individual segregants from a genetic cross between two yeast strains. Treating these activities as a quantitative trait, we mapped trans-acting loci (activity quantitative trait loci, or aQTLs) that act via posttranscriptional regulation of transcript stability. We predicted and experimentally confirmed that a coding polymorphism at the IRA2 locus modulates Puf4p activity. Our results also indicate that Puf3p activity is modulated by distinct loci, depending on whether it acts via the 5′ or the 3′ untranslated region of its target mRNAs. Together, our results validate a general strategy for dissecting the connectivity between posttranscriptional regulators and their upstream signaling pathways.
14
Abstract
The term “transcriptional network” refers to the mechanism(s) that underlies coordinated expression of genes, typically involving transcription factors (TFs) binding to the promoters of multiple genes, and individual genes controlled by multiple TFs. A multitude of studies in the last two decades have aimed to map and characterize transcriptional networks in the yeast Saccharomyces cerevisiae. We review the methodologies and accomplishments of these studies, as well as challenges we now face. For most yeast TFs, data have been collected on their sequence preferences, in vivo promoter occupancy, and gene expression profiles in deletion mutants. These systematic studies have led to the identification of new regulators of numerous cellular functions and shed light on the overall organization of yeast gene regulation. However, many yeast TFs appear to be inactive under standard laboratory growth conditions, and many of the available data were collected using techniques that have since been improved. Perhaps as a consequence, comprehensive and accurate mapping among TF sequence preferences, promoter binding, and gene expression remains an open challenge. We propose that the time is ripe for renewed systematic efforts toward a complete mapping of yeast transcriptional regulatory mechanisms.
15
Tepper RG, Ashraf J, Kaletsky R, Kleemann G, Murphy CT, Bussemaker HJ. PQM-1 complements DAF-16 as a key transcriptional regulator of DAF-2-mediated development and longevity. Cell 2013; 154:676-90. [PMID: 23911329 DOI: 10.1016/j.cell.2013.07.006] [Received: 05/12/2012] [Revised: 04/02/2013] [Accepted: 07/02/2013]
Abstract
Reduced insulin/IGF-1-like signaling (IIS) extends C. elegans lifespan by upregulating stress response (class I) and downregulating other (class II) genes through a mechanism that depends on the conserved transcription factor DAF-16/FOXO. By integrating genome-wide mRNA expression responsiveness to DAF-16 with genome-wide in vivo binding data for a compendium of transcription factors, we discovered that PQM-1 is the elusive transcriptional activator that directly controls development (class II) genes by binding to the DAF-16-associated element (DAE). DAF-16 directly regulates class I genes only, through the DAF-16-binding element (DBE). Loss of PQM-1 suppresses daf-2 longevity and further slows development. Surprisingly, the nuclear localization of PQM-1 and DAF-16 is controlled by IIS in opposite ways and was also found to be mutually antagonistic. We observe progressive loss of nuclear PQM-1 with age, explaining declining expression of PQM-1 targets. Together, our data suggest an elegant mechanism for balancing stress response and development.
|
16
|
Coulombe-Huntington J, Xia Y. Regulatory network structure as a dominant determinant of transcription factor evolutionary rate. PLoS Comput Biol 2012; 8:e1002734. [PMID: 23093926 DOI: 10.1371/journal.pcbi.1002734] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/01/2012] [Accepted: 08/21/2012] [Indexed: 01/10/2023] Open
Abstract
The evolution of transcriptional regulatory networks has thus far mostly been studied at the level of cis-regulatory elements. To gain a complete understanding of regulatory network evolution we must also study the evolutionary role of trans-factors, such as transcription factors (TFs). Here, we systematically assess genomic and network-level determinants of TF evolutionary rate in yeast, and how they compare to those of generic proteins, while carefully controlling for differences of the TF protein set, such as expression level. We found significantly distinct trends relating TF evolutionary rate to mRNA expression level, codon adaptation index, the evolutionary rate of physical interaction partners, and, confirming previous reports, to protein-protein interaction degree and regulatory in-degree. We discovered that for TFs, the dominant determinants of evolutionary rate lie in the structure of the regulatory network, such as the median evolutionary rate of target genes and the fraction of species-specific target genes. Decomposing the regulatory network by edge sign, we found that this modular evolution of TFs and their targets is limited to activating regulatory relationships. We show that fast evolving TFs tend to regulate other TFs and niche-specific processes and that their targets show larger evolutionary expression changes than targets of other TFs. We also show that the positive trend relating TF regulatory in-degree and evolutionary rate is likely related to the species-specificity of the transcriptional regulation modules. Finally, we discuss likely causes for TFs' different evolutionary relationship to the physical interaction network, such as the prevalence of transient interactions in the TF subnetwork. This work suggests that positive and negative regulatory networks follow very different evolutionary rules, and that transcription factor evolution is best understood at a network- or systems-level.
|
17
|
Aguilar D, Oliva B. Functional and topological characterization of transcriptional cooperativity in yeast. BMC Res Notes 2012; 5:227. [PMID: 22574744 PMCID: PMC3499397 DOI: 10.1186/1756-0500-5-227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2012] [Accepted: 04/27/2012] [Indexed: 11/30/2022] Open
Abstract
Background Many cellular programs are regulated through the integration of specific transcriptional signals originating from external stimuli, and cooperation between transcription factors is a key feature of this process. In this work, we studied, from an organizational, evolutionary and functional point of view, how transcriptional cooperativity in yeast is aimed at integrating different regulatory inputs rather than controlling particular cellular functions. Findings Our results showed that cooperative transcription factor pairs co-evolve and are essential for the life of the cell. When the factors were organized into a layered regulatory network, we observed that cooperative transcription factors were preferentially placed in the middle layers, which highlights a role in regulatory signal integration. We also observed significant co-activity and co-evolution between members of the same cooperative pairs, but a lack of common co-expression profile. Conclusions Our results suggest that transcriptional cooperativity has a specific role within the regulatory control scheme of the cell, focused on the amplification and integration of cellular signals rather than control of particular cellular functions. This information can be used for better characterization of regulatory interactions between transcription factors, aimed at determining the spatial and temporal control of gene expression.
Affiliation(s)
- Daniel Aguilar
- Structural Bioinformatics Group, Departament de Ciencies Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona Biomedical Research Park, c/Dr, Aiguader 88, 08003 Barcelona, Spain.
|
18
|
Zhang X, Cheng W, Listgarten J, Kadie C, Huang S, Wang W, Heckerman D. Learning transcriptional regulatory relationships using sparse graphical models. PLoS One 2012; 7:e35762. [PMID: 22586449 PMCID: PMC3346750 DOI: 10.1371/journal.pone.0035762] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/21/2011] [Accepted: 03/21/2012] [Indexed: 11/19/2022] Open
Abstract
Understanding the organization and function of transcriptional regulatory networks by analyzing high-throughput gene expression profiles is a key problem in computational biology. The challenges in this work are 1) the lack of complete knowledge of the regulatory relationship between the regulators and the associated genes, 2) the potential for spurious associations due to confounding factors, and 3) the number of parameters to learn is usually larger than the number of available microarray experiments. We present a sparse (L1 regularized) graphical model to address these challenges. Our model incorporates known transcription factors and introduces hidden variables to represent possible unknown transcription and confounding factors. The expression level of a gene is modeled as a linear combination of the expression levels of known transcription factors and hidden factors. Using gene expression data covering 39,296 oligonucleotide probes from 1109 human liver samples, we demonstrate that our model better predicts out-of-sample data than a model with no hidden variables. We also show that some of the gene sets associated with hidden variables are strongly correlated with Gene Ontology categories. The software including source code is available at http://grnl1.codeplex.com.
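The core modelling idea in the abstract above, a gene's expression written as a sparse (L1-regularized) linear combination of regulator expression levels, can be illustrated with a minimal sketch. This is not the authors' software: the hidden variables described in the abstract are omitted, and the function names, the cyclic coordinate-descent solver and the toy data are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/2n) * ||y - Xw||^2 + penalty * ||w||_1 by cyclic coordinate descent.

    X: list of samples, each a list of regulator expression levels (hypothetical input).
    y: expression level of one target gene in the same samples.
    """
    n_samples = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    residual = list(y)  # equals y - Xw while all weights are zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n_samples)) / n_samples
              for j in range(n_features)]
    for _ in range(n_iter):
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue  # a constant-zero regulator carries no information
            # Correlation of regulator j with the residual that excludes its own contribution.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * weights[j])
                      for i in range(n_samples)) / n_samples
            new_weight = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_weight - weights[j]
            if delta != 0.0:
                for i in range(n_samples):
                    residual[i] -= X[i][j] * delta
                weights[j] = new_weight
    return weights


# Toy data: the target gene depends only on the first of two regulators.
X_toy = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y_toy = [2.0, -2.0, 2.0, -2.0]
weights_toy = lasso_coordinate_descent(X_toy, y_toy, penalty=0.1)
```

In the toy example the penalty shrinks the weight of the informative regulator slightly below its true value of 2 and sets the uninformative regulator's weight to exactly zero, which is the sparsity behaviour that motivates an L1 penalty when parameters outnumber samples.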
Affiliation(s)
- Xiang Zhang
- Microsoft Research, Los Angeles, California, United States of America
- Case Western Reserve University, Cleveland, Ohio, United States of America
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Wei Cheng
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carl Kadie
- Microsoft Research, Los Angeles, California, United States of America
- Shunping Huang
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Wei Wang
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- David Heckerman
- Microsoft Research, Los Angeles, California, United States of America
|
19
|
Abstract
MOTIVATION Several statistical tests are available to detect the enrichment of differential expression in gene sets. Such tests were originally proposed for analyzing gene sets associated with biological processes. The objective evaluation of tests on real measurements has not been possible, as it is difficult to decide a priori which processes will be affected in given experiments. RESULTS We present a first large study to rigorously assess and compare the performance of gene set enrichment tests on real expression measurements. Gene sets are defined based on the targets of given regulators such as transcription factors (TFs) and microRNAs (miRNAs). In contrast to processes, TFs and miRNAs are amenable to direct perturbations, e.g. regulator over-expression or deletion. We assess the ability of 14 different statistical tests to predict the perturbations from expression measurements in Escherichia coli, Saccharomyces cerevisiae and human. We also analyze how performance depends on the quality and comprehensiveness of the regulator targets via a permutation approach. We find that ANOVA and Wilcoxon's test consistently perform better than, for instance, Kolmogorov-Smirnov and hypergeometric tests. For scenarios where the optimal test is not known, we suggest combining all evaluated tests into an unweighted consensus, which also performs well in our assessment. Our results provide a guide for the selection of existing tests as well as a basis for the development and assessment of novel tests.
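For readers unfamiliar with gene set enrichment tests, the sketch below shows the hypergeometric test named in the abstract above, chosen only because it is the simplest to write down; the abstract itself reports that ANOVA and Wilcoxon's test performed better. The function name, argument names and the toy counts are hypothetical and not taken from the study.

```python
from math import comb


def hypergeometric_enrichment_pvalue(n_genes, n_targets, n_de, n_overlap):
    """Upper-tail probability P(X >= n_overlap) for a hypergeometric X.

    n_genes:   total number of genes measured
    n_targets: genes in the regulator's target set
    n_de:      genes called differentially expressed
    n_overlap: differentially expressed genes that are also targets
    """
    max_overlap = min(n_targets, n_de)
    total = comb(n_genes, n_de)
    tail = sum(
        comb(n_targets, k) * comb(n_genes - n_targets, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy example: 10 genes, 3 targets, 3 differentially expressed, all 3 overlap.
p_all_overlap = hypergeometric_enrichment_pvalue(10, 3, 3, 3)
# An observed overlap of zero is never surprising, so the tail covers everything.
p_no_overlap = hypergeometric_enrichment_pvalue(10, 3, 3, 0)
```

A small p-value suggests that the regulator's targets are over-represented among the differentially expressed genes, which is how such a test would be used to guess which regulator was perturbed.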
Affiliation(s)
- Haroon Naeem
- Department of Informatics, Ludwig-Maximilians Universität, Amalienstrasse 17, Munich, Germany
|
20
|
Marbach D, Roy S, Ay F, Meyer PE, Candeias R, Kahveci T, Bristow CA, Kellis M. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks. Genome Res 2012; 22:1334-49. [PMID: 22456606 PMCID: PMC3396374 DOI: 10.1101/gr.127191.111] [Citation(s) in RCA: 89] [Impact Index Per Article: 7.4] [Indexed: 01/21/2023]
Abstract
Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.
Affiliation(s)
- Daniel Marbach
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
21
|
Hanlon SE, Rizzo JM, Tatomer DC, Lieb JD, Buck MJ. The stress response factors Yap6, Cin5, Phd1, and Skn7 direct targeting of the conserved co-repressor Tup1-Ssn6 in S. cerevisiae. PLoS One 2011; 6:e19060. [PMID: 21552514 PMCID: PMC3084262 DOI: 10.1371/journal.pone.0019060] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Received: 02/08/2011] [Accepted: 03/23/2011] [Indexed: 11/19/2022] Open
Abstract
Maintaining the proper expression of the transcriptome during development or in response to a changing environment requires a delicate balance between transcriptional regulators with activating and repressing functions. The budding yeast transcriptional co-repressor Tup1-Ssn6 is a model for studying similar repressor complexes in multicellular eukaryotes. Tup1-Ssn6 does not bind DNA directly, but is directed to individual promoters by one or more DNA-binding proteins, referred to as Tup1 recruiters. This functional architecture allows Tup1-Ssn6 to modulate the expression of genes required for the response to a variety of cellular stresses. To understand the targeting of the Tup1-Ssn6 complex, we determined the genomic distribution of Tup1 and Ssn6 by ChIP-chip. We found that most loci bound by Tup1-Ssn6 could not be explained by co-occupancy with a known recruiting cofactor and that deletion of individual known Tup1 recruiters did not significantly alter the Tup1 binding profile. These observations suggest that new Tup1 recruiting proteins remain to be discovered and that Tup1 recruitment typically depends on multiple recruiting cofactors. To identify new recruiting proteins, we computationally screened for factors with binding patterns similar to the observed Tup1-Ssn6 genomic distribution. Four top candidates, Cin5, Skn7, Phd1, and Yap6, all known to be associated with stress response gene regulation, were experimentally confirmed to physically interact with Tup1 and/or Ssn6. Incorporating these new recruitment cofactors with previously characterized cofactors now explains the majority of Tup1 targeting across the genome, and expands our understanding of the mechanism by which Tup1-Ssn6 is directed to its targets.
Affiliation(s)
- Sean E. Hanlon
- Department of Biology, Carolina Center for Genome Sciences and the Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jason M. Rizzo
- Department of Biochemistry and the Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, New York, United States of America
- Deirdre C. Tatomer
- Department of Biology, Carolina Center for Genome Sciences and the Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jason D. Lieb
- Department of Biology, Carolina Center for Genome Sciences and the Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Michael J. Buck
- Department of Biochemistry and the Center of Excellence in Bioinformatics and Life Sciences, State University of New York at Buffalo, Buffalo, New York, United States of America
|
22
|
Abstract
Transcription factor activity is largely regulated through post-translational modification. Here, we report the first integrative model of transcription that includes both interactions between transcription factors and promoters, and between transcription factors and modifying enzymes. Simulations indicate that our method is robust against noise. We validated our tool on a well-studied stress response network in yeast and on a STAT1-mediated regulatory network in human B cells. Our work represents a significant step toward a comprehensive model of gene transcription.
Affiliation(s)
- Logan J Everett
- Genomics and Computational Biology Program, 700 Clinical Research Building, 415 Curie Boulevard, Philadelphia, PA 19104, USA.
|
23
|
Lee E, Bussemaker HJ. Identifying the genetic determinants of transcription factor activity. Mol Syst Biol 2010; 6:412. [PMID: 20865005 DOI: 10.1038/msb.2010.64] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 04/20/2010] [Accepted: 06/20/2010] [Indexed: 01/03/2023] Open
Abstract
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood. The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity. Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs') whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF. Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse. In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of ‘phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. 
We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008). To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level. We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs. Application of our approach to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. 
We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes. In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available. Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs') whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome VII modulating the activity of Zscan4 in liver cells. 
Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
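The regression step described in this abstract, in which differential mRNA expression is regressed on TF-specific promoter-binding affinity and the fitted slope is read as the TF's activity in that sample, can be sketched for a single TF as follows. This is a simplified illustration, not the authors' code: it fits one TF at a time with ordinary least squares, and the function name and toy numbers are assumptions.

```python
def infer_tf_activity(promoter_affinity, log_expression_ratio):
    """Least-squares slope of differential expression on promoter affinity.

    promoter_affinity:    predicted binding affinity of one TF for each gene's promoter
    log_expression_ratio: differential expression of the same genes in one sample
    The slope is interpreted here as the TF's activity in that sample.
    """
    n = len(promoter_affinity)
    mean_a = sum(promoter_affinity) / n
    mean_e = sum(log_expression_ratio) / n
    sxx = sum((a - mean_a) ** 2 for a in promoter_affinity)
    sxy = sum((a - mean_a) * (e - mean_e)
              for a, e in zip(promoter_affinity, log_expression_ratio))
    if sxx == 0.0:
        raise ValueError("promoter affinities must vary across genes")
    return sxy / sxx


# Toy data for four genes in two hypothetical segregants.
affinity = [0.0, 1.0, 2.0, 3.0]
segregant_profiles = {
    "segregant_A": [0.5, 1.0, 1.5, 2.0],    # expression rises with affinity
    "segregant_B": [0.0, -0.2, -0.4, -0.6], # expression falls with affinity
}
activities = {name: infer_tf_activity(affinity, profile)
              for name, profile in segregant_profiles.items()}
```

Repeating the fit for every segregant yields one activity value per segregant, and it is this vector that would then be treated as a quantitative trait and tested for linkage against genetic markers.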
|
24
|
Liu Q, Tan Y, Huang T, Ding G, Tu Z, Liu L, Li Y, Dai H, Xie L. TF-centered downstream gene set enrichment analysis: Inference of causal regulators by integrating TF-DNA interactions and protein post-translational modifications information. BMC Bioinformatics 2010; 11 Suppl 11:S5. [PMID: 21172055 PMCID: PMC3024863 DOI: 10.1186/1471-2105-11-s11-s5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022] Open
Abstract
Background Inference of causal regulators responsible for gene expression changes under different conditions is of great importance but remains rather challenging. To date, most approaches use direct binding targets of transcription factors (TFs) to associate TFs with expression profiles. However, the low overlap between binding targets of a TF and the affected genes of the TF knockout limits the power of those methods. Results We developed a TF-centered downstream gene set enrichment analysis approach to identify potential causal regulators responsible for expression changes. We constructed hierarchical and multi-layer regulation models to derive possible downstream gene sets of a TF using not only TF-DNA interactions, but also, for the first time, post-translational modifications (PTM) information. We verified our method in one expression dataset of large-scale TF knockout and another dataset involving both TF knockout and TF overexpression. Compared with the flat model using TF-DNA interactions alone, our method correctly identified five more actually perturbed TFs in large-scale TF knockout data and six more perturbed TFs in overexpression data. Potential regulatory pathways downstream of three perturbed regulators (SNF1, AFT1 and SUT1) were given to demonstrate the power of multilayer regulation models integrating TF-DNA interactions and PTM information. Additionally, our method successfully identified known important TFs and inferred some novel potential TFs involved in the transition from fermentative to glycerol-based respiratory growth and in the pheromone response. Downstream regulation pathways of SUT1 and AFT1 were also supported by the mRNA and/or phosphorylation changes of their mediating TFs and/or “modulator” proteins. Conclusions The results suggest that in addition to direct transcription, indirect transcription and post-translational regulation are also responsible for the effects of TF perturbation, especially for TF overexpression. Many TFs inferred by our method are supported by literature. Multiple TF regulation models could lead to new hypotheses for future experiments. Our method provides a valuable framework for analyzing gene expression data to identify causal regulators in the context of TF-DNA interactions and PTM information.
Affiliation(s)
- Qi Liu
- School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.
|
25
|
Lelandais G, Devaux F. Comparative Functional Genomics of Stress Responses in Yeasts. OMICS: A Journal of Integrative Biology 2010; 14:501-15. [DOI: 10.1089/omi.2010.0029] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/04/2023]
Affiliation(s)
- Gaëlle Lelandais
- Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM UMR-S 665, Université Paris Diderot, Paris France
- Frédéric Devaux
- Laboratoire de génomique des microorganismes, CNRS FRE3214, Université Pierre et Marie Curie, Institut des Cordeliers, Paris, France
|
26
|
Zakrzewska A, Boorsma A, Beek AT, Hageman JA, Westerhuis JA, Hellingwerf KJ, Brul S, Klis FM, Smits GJ. Comparative analysis of transcriptome and fitness profiles reveals general and condition-specific cellular functions involved in adaptation to environmental change in Saccharomyces cerevisiae. OMICS 2010; 14:603-14. [PMID: 20695823 DOI: 10.1089/omi.2010.0049] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
The transcriptional responses of yeast cells to a wide variety of stress conditions have been studied extensively. In addition, deletion mutant collections have been widely used to measure the combined effect of gene loss and stress on growth (fitness). Here we present a comparative analysis of 1,095 publicly available transcription and genome-wide fitness profiles in yeast, from different laboratories and experimental platforms. We analyzed these data, using T-profiler, to describe the correlation in behavior of a priori defined functional groups. Two-mode clustering analysis of the fitness T-profiles revealed that functional groups involved in regulating ribosome biogenesis and translation offer general stress resistance. These groups are closely related to growth rate and nutrient availability. General stress sensitivity was found in deletion mutant groups functioning in intracellular vesicular transport, actin cytoskeleton organization, and cell polarity, indicating that they play a key role in maintaining yeast adaptability. Analysis of the phenotypic and transcriptional variability of our a priori defined functional groups showed that the quantitative effect on fitness of both resistant and sensitive groups is highly condition-dependent. Finally, we discuss the implications of our results for combinatorial drug design.
Affiliation(s)
- Anna Zakrzewska
- Molecular Biology and Microbial Food Safety, Swammerdam Institute for Life Sciences, Netherlands Institute for Systems Biology, University of Amsterdam, Amsterdam, The Netherlands
|
27
|
LIU QJ, WANG ZH, LIU WL, LI D, HE FC, ZHU YP. A Novel Method to Identify The Condition-specific Regulatory Sub-network That Controls The Yeast Cell Cycle Based on Gene Expression Model*. PROG BIOCHEM BIOPHYS 2010. [DOI: 10.3724/sp.j.1206.2009.00581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
28
|
Abstract
Gene transcription is largely regulated by DNA-binding transcription factors (TFs). However, the TF activity itself is modulated via, among other things, post-translational modifications (PTMs) by specific modification enzymes in response to cellular stimuli. TF-PTMs thus serve as "molecular switchboards" that map upstream signaling events to the downstream transcriptional events. An important long-term goal is to obtain a genome-wide map of "regulatory triplets" consisting of a TF, target gene, and a modulator gene that specifically modulates the regulation of the target gene by the TF. A variety of genome-wide data sets can be exploited by computational methods to obtain a rough map of regulatory triplets, which can guide directed experiments. However, a prerequisite to developing such computational tools is a systematic catalog of known instances of regulatory triplets. We first describe PTM-Switchboard, a recent database that stores triplets of genes such that the ability of one gene (the TF) to regulate a target gene is dependent on one or more PTMs catalyzed by a third gene, the modifying enzyme. We also review current computational approaches to infer regulatory triplets from genome-wide data sets and conclude with a discussion of potential future research. PTM-Switchboard is accessible at http://cagr.pcbi.upenn.edu/PTMswitchboard/
Affiliation(s)
- Logan Everett
- Department of Genetics, Penn Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA, USA.
|
29
|
Ge H, Wei M, Fabrizio P, Hu J, Cheng C, Longo VD, Li LM. Comparative analyses of time-course gene expression profiles of the long-lived sch9Delta mutant. Nucleic Acids Res 2009; 38:143-58. [PMID: 19880387 PMCID: PMC2800218 DOI: 10.1093/nar/gkp849] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/05/2022] Open
Abstract
In an attempt to elucidate the underlying longevity-promoting mechanisms of mutants lacking SCH9, which live three times as long as wild type chronologically, we measured their time-course gene expression profiles. We interpreted their expression time differences by statistical inferences based on prior biological knowledge, and identified the following significant changes: (i) between 12 and 24 h, stress response genes were up-regulated by larger fold changes and ribosomal RNA (rRNA) processing genes were down-regulated more dramatically; (ii) mitochondrial ribosomal protein genes were not up-regulated between 12 and 60 h as they were in wild type; (iii) electron transport, oxidative phosphorylation and TCA genes were down-regulated early; (iv) the up-regulation of TCA and electron transport was accompanied by deep down-regulation of rRNA processing over time; and (v) rRNA processing genes were more volatile over time, and three associated cis-regulatory elements [rRNA processing element (rRPE), polymerase A and C (PAC) and glucose response element (GRE)] were identified. Deletion of AZF1, which encodes the transcription factor that binds to the GRE element, reversed the lifespan extension of sch9Δ. The significant alterations in these time-dependent expression profiles imply that the lack of SCH9 turns on the longevity programme that extends the lifespan through changes in metabolic pathways and protection mechanisms, particularly, the regulation of aerobic respiration and rRNA processing.
Affiliation(s)
- Huanying Ge
- Andrus Gerontology Center, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
|
30
|
Xiao Y, Segal MR. Identification of yeast transcriptional regulation networks using multivariate random forests. PLoS Comput Biol 2009; 5:e1000414. [PMID: 19543377 PMCID: PMC2691601 DOI: 10.1371/journal.pcbi.1000414] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 10/28/2008] [Accepted: 05/12/2009] [Indexed: 02/02/2023] Open
Abstract
The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression measures. An analysis of the multivariate random forest output reveals complex regulatory networks, which consist of cohesive, condition-dependent regulatory cliques. Each regulatory clique features homogeneous gene expression profiles and common motifs or synergistic motif groups. We apply our method to several yeast physiological processes: cell cycle, sporulation, and various stress conditions. Our technique displays excellent performance with regard to identifying known regulatory motifs, including high order interactions. In addition, we present evidence of the existence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we have uncovered elaborate transcription regulation refinement mechanisms involving PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of differing motif dosages and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism under differing physiological processes.
Transcriptional regulation, one of the most complex and intriguing processes in living cells, drives essential downstream cellular processes such as development, proliferation and differentiation. It gives rise to the versatility and flexibility that allows cells to determine their actions and states in response to internal needs or external stimuli by turning on, or shutting off, select sets of genes. This elaborate control of gene expression is realized by sophisticated transcriptional regulatory networks that include a diverse repertoire of transcription factors. Here, we study the relationship between gene expression and transcription factor binding in diverse yeast physiological processes. Our random forest-based method effectively models gene expression measurements simultaneously, bypassing the necessity of analyzing the multiple samples separately. Using our method, we have identified many high-order interactions between regulatory sequences that give rise to condition-specific gene expression.
Affiliation(s)
- Yuanyuan Xiao
- Department of Epidemiology and Biostatistics, Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, California, USA.
|