51
|
Experimental and Computational Considerations in the Study of RNA-Binding Protein-RNA Interactions. Adv Exp Med Biol 2016; 907:1-28. [PMID: 27256380 DOI: 10.1007/978-3-319-29073-7_1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/12/2022]
Abstract
After an RNA is transcribed, it undergoes a variety of processing steps that can change the encoded protein sequence (through alternative splicing and RNA editing), regulate the stability of the RNA, and control subcellular localization, timing, and rate of translation. The recent explosion in genomics techniques has enabled transcriptome-wide profiling of RNA processing in an unbiased manner. However, it has also brought experimental challenges in developing improved methods to probe distinct processing steps, as well as computational challenges in data storage, processing, and analysis tools that enable large-scale interpretation in the genomics era. In this chapter we review experimental techniques and challenges in profiling various aspects of RNA processing, as well as recent efforts to develop analyses integrating multiple data sources and techniques to infer RNA regulatory networks.
|
52
|
Roy S, Siahpirani AF, Chasman D, Knaack S, Ay F, Stewart R, Wilson M, Sridharan R. A predictive modeling approach for cell line-specific long-range regulatory interactions. Nucleic Acids Res 2015; 43:8694-712. [PMID: 26338778 PMCID: PMC4605315 DOI: 10.1093/nar/gkv865] [Citation(s) in RCA: 85] [Impact Index Per Article: 8.5] [Received: 05/31/2015] [Revised: 08/16/2015] [Accepted: 08/17/2015] [Indexed: 01/28/2023]
Abstract
Long range regulatory interactions among distal enhancers and target genes are important for tissue-specific gene expression. Genome-scale identification of these interactions in a cell line-specific manner, especially using the fewest possible datasets, is a significant challenge. We develop a novel computational approach, Regulatory Interaction Prediction for Promoters and Long-range Enhancers (RIPPLE), that integrates published Chromosome Conformation Capture (3C) data sets with a minimal set of regulatory genomic data sets to predict enhancer-promoter interactions in a cell line-specific manner. Our results suggest that CTCF, RAD21, a general transcription factor (TBP) and activating chromatin marks are important determinants of enhancer-promoter interactions. To predict interactions in a new cell line and to generate genome-wide interaction maps, we develop an ensemble version of RIPPLE and apply it to generate interactions in five human cell lines. Computational validation of these predictions using existing ChIA-PET and Hi-C data sets showed that RIPPLE accurately predicts interactions among enhancers and promoters. Enhancer-promoter interactions tend to be organized into subnetworks representing coordinately regulated sets of genes that are enriched for specific biological processes and cis-regulatory elements. Overall, our work provides a systematic approach to predict and interpret enhancer-promoter interactions in a genome-wide cell-type specific manner using a few experimentally tractable measurements.
Affiliation(s)
- Sushmita Roy
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
  - Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Deborah Chasman
  - Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Sara Knaack
  - Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI, USA
- Ferhat Ay
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Ron Stewart
  - Morgridge Institute for Research, Madison, WI 53715, USA
- Michael Wilson
  - Genetics & Genome Biology Program, Hospital for Sick Children (SickKids) and Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
  - Department of Molecular Genetics, University of Toronto, ON, Canada
- Rupa Sridharan
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin, Madison, WI 53715, USA
|
53
|
Bellot P, Olsen C, Salembier P, Oliveras-Vergés A, Meyer PE. NetBenchmark: a bioconductor package for reproducible benchmarks of gene regulatory network inference. BMC Bioinformatics 2015; 16:312. [PMID: 26415849 PMCID: PMC4587916 DOI: 10.1186/s12859-015-0728-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 03/18/2015] [Accepted: 09/06/2015] [Indexed: 11/19/2022]
Abstract
BACKGROUND: In the last decade, a great number of methods for reconstructing gene regulatory networks from expression data have been proposed. However, very few tools and datasets allow those methods to be evaluated accurately and reproducibly. Hence, we propose a new tool that performs a systematic, yet fully reproducible, evaluation of transcriptional network inference methods. RESULTS: Our open-source and freely available Bioconductor package aggregates a large set of tools to assess the robustness of network inference algorithms against different simulators, topologies, sample sizes and noise intensities. CONCLUSIONS: The benchmarking framework, which uses various datasets, highlights the specialization of some methods toward particular network types and data. As a result, it is possible to identify the techniques that perform well overall.
Affiliation(s)
- Pau Bellot
  - Universitat Politecnica de Catalunya BarcelonaTECH, Department of Signal Theory and Communications, UPC-Campus Nord, C/ Jordi Girona, 1-3, Barcelona, 08034, Spain.
  - Bioinformatics and Systems Biology (BioSys), Faculty of Sciences, Université de Liège (ULg), 27 Blvd du Rectorat, Liège, 4000, Belgium.
- Catharina Olsen
  - Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium.
  - Interuniversity Institute of Bioinformatics Brussels, (IB)², Brussels, Belgium.
- Philippe Salembier
  - Universitat Politecnica de Catalunya BarcelonaTECH, Department of Signal Theory and Communications, UPC-Campus Nord, C/ Jordi Girona, 1-3, Barcelona, 08034, Spain.
- Albert Oliveras-Vergés
  - Universitat Politecnica de Catalunya BarcelonaTECH, Department of Signal Theory and Communications, UPC-Campus Nord, C/ Jordi Girona, 1-3, Barcelona, 08034, Spain.
- Patrick E Meyer
  - Bioinformatics and Systems Biology (BioSys), Faculty of Sciences, Université de Liège (ULg), 27 Blvd du Rectorat, Liège, 4000, Belgium.
|
54
|
Thompson D, Regev A, Roy S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu Rev Cell Dev Biol 2015; 31:399-428. [PMID: 26355593 DOI: 10.1146/annurev-cellbio-100913-012908] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Indexed: 11/09/2022]
Abstract
Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
Affiliation(s)
- Dawn Thompson
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
|
55
|
Hodgins-Davis A, Rice DP, Townsend JP. Gene Expression Evolves under a House-of-Cards Model of Stabilizing Selection. Mol Biol Evol 2015; 32:2130-40. [PMID: 25901014 PMCID: PMC4592357 DOI: 10.1093/molbev/msv094] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Indexed: 12/20/2022]
Abstract
Divergence in gene regulation is hypothesized to underlie much of phenotypic evolution, but the role of natural selection in shaping the molecular phenotype of gene expression continues to be debated. Resolving the mode of gene expression evolution requires accessible theoretical predictions for the effect of selection over long timescales. Evolutionary quantitative genetic models of phenotypic evolution can provide such predictions, yet those predictions depend on the underlying hypotheses about the distributions of mutational and selective effects that are notoriously difficult to disentangle. Here, we draw on diverse genomic data sets including expression profiles of natural genetic variation and mutation accumulation lines, empirical estimates of genomic mutation rates, and inferences of genetic architecture to differentiate contrasting hypotheses for the roles of stabilizing selection and mutation in shaping natural expression variation. Our analysis suggests that gene expression evolves in a domain of phenotype space well fit by the House-of-Cards (HC) model. Although the strength of selection inferred is sensitive to the number of loci controlling gene expression, the model is not. The consistency of these results across evolutionary time from budding yeast through fruit fly implies that this model is general and that mutational effects on gene expression are relatively large. Empirical estimates of the genetic architecture of gene expression traits imply that selection provides modest constraints on gene expression levels for most genes, but that the potential for regulatory evolution is high. Our prediction using data from laboratory environments should encourage the collection of additional data sets allowing for more nuanced parameterizations of HC models for gene expression.
Affiliation(s)
- Andrea Hodgins-Davis
  - Department of Ecology and Evolutionary Biology, Yale University
  - Department of Biostatistics, School of Public Health, Yale University
- Daniel P Rice
  - Department of Ecology and Evolutionary Biology, Yale University
  - Department of Organismic and Evolutionary Biology, Harvard University
- Jeffrey P Townsend
  - Department of Ecology and Evolutionary Biology, Yale University
  - Department of Biostatistics, School of Public Health, Yale University
  - Program in Computational Biology and Bioinformatics, Yale University
|
56
|
TRRUST: a reference database of human transcriptional regulatory interactions. Sci Rep 2015; 5:11432. [PMID: 26066708 PMCID: PMC4464350 DOI: 10.1038/srep11432] [Citation(s) in RCA: 271] [Impact Index Per Article: 27.1] [Received: 01/27/2015] [Accepted: 05/22/2015] [Indexed: 11/09/2022]
Abstract
The reconstruction of transcriptional regulatory networks (TRNs) is a long-standing challenge in human genetics. Numerous computational methods have been developed to infer regulatory interactions between human transcriptional factors (TFs) and target genes from high-throughput data, and their performance evaluation requires gold-standard interactions. Here we present a database of literature-curated human TF-target interactions, TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust), which currently contains 8,015 interactions between 748 TF genes and 1,975 non-TF genes. A sentence-based text-mining approach was employed for efficient manual curation of regulatory interactions from approximately 20 million Medline abstracts. To the best of our knowledge, TRRUST is the largest publicly available database of literature-curated human TF-target interactions to date. TRRUST also has several useful features: i) information about the mode-of-regulation; ii) tests for target modularity of a query TF; iii) tests for TF cooperativity of a query target; iv) inferences about cooperating TFs of a query TF; and v) prioritizing associated pathways and diseases with a query TF. We observed high enrichment of TF-target pairs in TRRUST for top-scored interactions inferred from high-throughput data, which suggests that TRRUST provides a reliable benchmark for the computational reconstruction of human TRNs.
|
57
|
Regulation of transcription factors on sexual dimorphism of fig wasps. Sci Rep 2015; 5:10696. [PMID: 26031454 PMCID: PMC4451555 DOI: 10.1038/srep10696] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/05/2015] [Accepted: 04/27/2015] [Indexed: 11/08/2022]
Abstract
Fig wasps exhibit extreme intraspecific morphological divergence in the wings, compound eyes, antennae, body color, and size. Correspondingly, the behaviors and lifestyles of the two sexes also differ: females can emerge from the fig and fly to other fig trees to oviposit and pollinate, while males live inside the fig for their entire lifetime. Genetic regulation may drive this extreme intraspecific morphological and behavioral divergence. Transcription factors (TFs) involved in morphological development and physiological activity may exhibit sex-specific expression. Herein, we detect 865 TFs using genomic and transcriptomic data of the fig wasp Ceratosolen solmsi. Analyses of transcriptomic data indicated that TFs up-regulated in females show significant enrichment in development of the wing, eye and antenna in all stages, from larva to adult. Meanwhile, TFs related to the development of a variety of organs display sex-specific patterns of expression in adults, and these may contribute significantly to sexual dimorphism. In addition, TFs up-regulated in adult males exhibit enrichment in genitalia development and circadian rhythm, which correspond with mating and protandry. This finding is consistent with their sex-specific behaviors. In conclusion, our results strongly indicate that TFs play important roles in the sexual dimorphism of fig wasps.
|
58
|
Coolon JD, Stevenson KR, McManus CJ, Yang B, Graveley BR, Wittkopp PJ. Molecular Mechanisms and Evolutionary Processes Contributing to Accelerated Divergence of Gene Expression on the Drosophila X Chromosome. Mol Biol Evol 2015; 32:2605-15. [PMID: 26041937 DOI: 10.1093/molbev/msv135] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 12/14/2022]
Abstract
In species with a heterogametic sex, population genetics theory predicts that DNA sequences on the X chromosome can evolve faster than comparable sequences on autosomes. Both neutral and nonneutral evolutionary processes can generate this pattern. Complex traits like gene expression are not predicted to have accelerated evolution by these theories, yet a "faster-X" pattern of gene expression divergence has recently been reported for both Drosophila and mammals. Here, we test the hypothesis that accelerated adaptive evolution of cis-regulatory sequences on the X chromosome is responsible for this pattern by comparing the relative contributions of cis- and trans-regulatory changes to patterns of faster-X expression divergence observed between strains and species of Drosophila with a range of divergence times. We find support for this hypothesis, especially among male-biased genes, when comparing different species. However, we also find evidence that trans-regulatory differences contribute to a faster-X pattern of expression divergence both within and between species. This contribution is surprising because trans-acting regulators of X-linked genes are generally assumed to be randomly distributed throughout the genome. We found, however, that X-linked transcription factors appear to preferentially regulate expression of X-linked genes, providing a potential mechanistic explanation for this result. The contribution of trans-regulatory variation to faster-X expression divergence was larger within than between species, suggesting that it is more likely to result from neutral processes than positive selection. These data show how accelerated evolution of both coding and noncoding sequences on the X chromosome can lead to accelerated expression divergence on the X chromosome relative to autosomes.
Affiliation(s)
- Joseph D Coolon
  - Department of Ecology and Evolutionary Biology, University of Michigan
- Kraig R Stevenson
  - Department of Computational Medicine and Bioinformatics, University of Michigan
- C Joel McManus
  - Department of Biological Sciences, Carnegie Mellon University
  - Department of Genetics and Developmental Biology, Institute for Systems Genomics, University of Connecticut Health Center
- Bing Yang
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan
- Brenton R Graveley
  - Department of Genetics and Developmental Biology, Institute for Systems Genomics, University of Connecticut Health Center
- Patricia J Wittkopp
  - Department of Ecology and Evolutionary Biology, University of Michigan
  - Department of Computational Medicine and Bioinformatics, University of Michigan
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan
|
59
|
Blais A. Myogenesis in the Genomics Era. J Mol Biol 2015; 427:2023-38. [DOI: 10.1016/j.jmb.2015.02.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/09/2014] [Revised: 02/04/2015] [Accepted: 02/05/2015] [Indexed: 01/06/2023]
|
60
|
Nicolle R, Radvanyi F, Elati M. CoRegNet: reconstruction and integrated analysis of co-regulatory networks. Bioinformatics 2015; 31:3066-8. [PMID: 25979476 PMCID: PMC4565029 DOI: 10.1093/bioinformatics/btv305] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 11/21/2014] [Accepted: 05/08/2015] [Indexed: 11/13/2022]
Abstract
CoRegNet is an R/Bioconductor package to analyze large-scale transcriptomic data by highlighting sets of co-regulators. Based on a transcriptomic dataset, CoRegNet can be used to: reconstruct a large-scale co-regulatory network; integrate regulation evidence such as transcription factor binding sites and ChIP data; estimate sample-specific regulator activity; identify cooperative transcription factors; and analyze the sample-specific combinations of active regulators through an interactive visualization tool. In this study CoRegNet was used to identify driver regulators of bladder cancer. AVAILABILITY: CoRegNet is available at http://bioconductor.org/packages/CoRegNet. CONTACT: remy.nicolle@issb.genopole.fr or mohamed.elati@issb.genopole.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rémy Nicolle
  - iSSB, CNRS, University of Evry, Genopole, 91030 Evry Cedex, France, Institut Curie, PSL Research University, 75248 Cedex 05, France and CNRS UMR144, 75248 Cedex 05, France
- François Radvanyi
  - iSSB, CNRS, University of Evry, Genopole, 91030 Evry Cedex, France, Institut Curie, PSL Research University, 75248 Cedex 05, France and CNRS UMR144, 75248 Cedex 05, France
- Mohamed Elati
  - iSSB, CNRS, University of Evry, Genopole, 91030 Evry Cedex, France, Institut Curie, PSL Research University, 75248 Cedex 05, France and CNRS UMR144, 75248 Cedex 05, France
|
61
|
Blatti C, Kazemian M, Wolfe S, Brodsky M, Sinha S. Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism. Nucleic Acids Res 2015; 43:3998-4012. [PMID: 25791631 PMCID: PMC4417154 DOI: 10.1093/nar/gkv195] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 02/20/2015] [Accepted: 02/24/2015] [Indexed: 11/17/2022]
Abstract
Characterization of cell type specific regulatory networks and elements is a major challenge in genomics, and emerging strategies frequently employ high-throughput genome-wide assays of transcription factor (TF) to DNA binding, histone modifications or chromatin state. However, these experiments remain too difficult/expensive for many laboratories to apply comprehensively to their system of interest. Here, we explore the potential of elucidating regulatory systems in varied cell types using computational techniques that rely on only data of gene expression, low-resolution chromatin accessibility, and TF–DNA binding specificities (‘motifs’). We show that static computational motif scans overlaid with chromatin accessibility data reasonably approximate experimentally measured TF–DNA binding. We demonstrate that predicted binding profiles and expression patterns of hundreds of TFs are sufficient to identify major regulators of ∼200 spatiotemporal expression domains in the Drosophila embryo. We are then able to learn reliable statistical models of enhancer activity for over 70 expression domains and apply those models to annotate domain specific enhancers genome-wide. Throughout this work, we apply our motif and accessibility based approach to comprehensively characterize the regulatory network of fruitfly embryonic development and show that the accuracy of our computational method compares favorably to approaches that rely on data from many experimental assays.
Affiliation(s)
- Charles Blatti
  - Department of Computer Science, University of Illinois, Urbana, IL 61801, USA
- Majid Kazemian
  - National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Scot Wolfe
  - Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01655, USA
  - Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Michael Brodsky
  - Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01655, USA
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Saurabh Sinha
  - Department of Computer Science, University of Illinois, Urbana, IL 61801, USA
  - Institute of Genomic Biology, University of Illinois, Urbana, IL 61801, USA
|
62
|
Llopart A. Parallel faster-X evolution of gene expression and protein sequences in Drosophila: beyond differences in expression properties and protein interactions. PLoS One 2015; 10:e0116829. [PMID: 25789611 PMCID: PMC4366066 DOI: 10.1371/journal.pone.0116829] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/14/2014] [Accepted: 12/15/2014] [Indexed: 12/27/2022]
Abstract
Population genetics models predict that the X (or Z) chromosome will evolve at faster rates than the autosomes in XY (or ZW) systems. Studies of molecular evolution using large datasets in multiple species have provided evidence supporting this faster-X effect in protein-coding sequences and, more recently, in transcriptomes. However, X-linked and autosomal genes differ significantly in important properties besides hemizygosity in males, including gene expression levels, tissue specificity in gene expression, and the number of interactions in which they are involved (i.e., protein-protein or DNA-protein interactions). Most important, these properties are known to correlate with rates of evolution, which raises the question of whether differences between the X chromosome and autosomes in gene properties, rather than hemizygosity, are sufficient to explain faster-X evolution. Here I investigate this possibility using whole genome sequences and transcriptomes of Drosophila yakuba and D. santomea and show that this is not the case. Additional factors are needed to account for faster-X evolution of both gene expression and protein-coding sequences beyond differences in gene properties, likely a higher incidence of positive selection in combination with the accumulation of weakly deleterious mutations.
Affiliation(s)
- Ana Llopart
  - Department of Biology, The University of Iowa, Iowa City, Iowa, United States of America
  - Interdisciplinary Graduate Program in Genetics, The University of Iowa, Iowa City, Iowa, United States of America
|
63
|
Dong X, Jiang Z, Peng YL, Zhang Z. Revealing shared and distinct gene network organization in Arabidopsis immune responses by integrative analysis. Plant Physiol 2015; 167:1186-203. [PMID: 25614062 PMCID: PMC4348776 DOI: 10.1104/pp.114.254292] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Indexed: 05/02/2023]
Abstract
Pattern-triggered immunity (PTI) and effector-triggered immunity (ETI) are two main plant immune responses to counter pathogen invasion. Genome-wide gene network organizing principles leading to quantitative differences between PTI and ETI have remained elusive. We combined an advanced machine learning method and modular network analysis to systematically characterize the organizing principles of Arabidopsis (Arabidopsis thaliana) PTI and ETI at three network resolutions. At the single network node/edge level, we ranked genes and gene interactions based on their ability to distinguish immune response from normal growth and successfully identified many immune-related genes associated with PTI and ETI. Topological analysis revealed that the top-ranked gene interactions tend to link network modules. At the subnetwork level, we identified a subnetwork shared by PTI and ETI encompassing 1,159 genes and 1,289 interactions. This subnetwork is enriched in interactions linking network modules and is also a hotspot of attack by pathogen effectors. The subnetwork likely represents a core component in the coordination of multiple biological processes to favor defense over development. Finally, we constructed modular network models for PTI and ETI to explain the quantitative differences in the global network architecture. Our results indicate that the defense modules in ETI are organized into relatively independent structures, explaining the robustness of ETI to genetic mutations and effector attacks. Taken together, the multiscale comparisons of PTI and ETI provide a systems biology perspective on plant immunity and emphasize coordination among network modules to establish a robust immune response.
Affiliation(s)
- Xiaobao Dong
  - State Key Laboratory of Agrobiotechnology (X.D., Z.J., Y.-L.P., Z.Z.), College of Biological Sciences (X.D., Z.J., Z.Z.), and Ministry of Agriculture Key Laboratory for Plant Pathology (Y.-L.P.), China Agricultural University, Beijing 100193, China
- Zhenhong Jiang
  - State Key Laboratory of Agrobiotechnology (X.D., Z.J., Y.-L.P., Z.Z.), College of Biological Sciences (X.D., Z.J., Z.Z.), and Ministry of Agriculture Key Laboratory for Plant Pathology (Y.-L.P.), China Agricultural University, Beijing 100193, China
- You-Liang Peng
  - State Key Laboratory of Agrobiotechnology (X.D., Z.J., Y.-L.P., Z.Z.), College of Biological Sciences (X.D., Z.J., Z.Z.), and Ministry of Agriculture Key Laboratory for Plant Pathology (Y.-L.P.), China Agricultural University, Beijing 100193, China
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology (X.D., Z.J., Y.-L.P., Z.Z.), College of Biological Sciences (X.D., Z.J., Z.Z.), and Ministry of Agriculture Key Laboratory for Plant Pathology (Y.-L.P.), China Agricultural University, Beijing 100193, China
|
64
|
Ma C, Zhang HH, Wang X. Machine learning for Big Data analytics in plants. Trends Plant Sci 2014; 19:798-808. [PMID: 25223304 DOI: 10.1016/j.tplants.2014.08.004] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.7] [Received: 05/28/2014] [Revised: 07/30/2014] [Accepted: 08/20/2014] [Indexed: 05/19/2023]
Abstract
Rapid advances in high-throughput genomic technology have enabled biology to enter the era of 'Big Data' (large datasets). The plant science community not only needs to build its own Big-Data-compatible parallel computing and data management infrastructures, but also to seek novel analytical paradigms to extract information from the overwhelming amounts of data. Machine learning offers promising computational and analytical solutions for the integrative analysis of large, heterogeneous and unstructured datasets on the Big-Data scale, and is gradually gaining popularity in biology. This review introduces the basic concepts and procedures of machine-learning applications and envisages how machine learning could interface with Big Data technology to facilitate basic research and biotechnology in the plant sciences.
Affiliation(s)
- Chuang Ma
  - School of Plant Sciences, University of Arizona, 1140 E. South Campus Drive, Tucson, AZ 85721, USA
- Hao Helen Zhang
  - Department of Mathematics, University of Arizona, 617 North Santa Rita Ave, Tucson, AZ 85721, USA
- Xiangfeng Wang
  - School of Plant Sciences, University of Arizona, 1140 E. South Campus Drive, Tucson, AZ 85721, USA
  - Department of Plant Genetics and Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, China
|
65
|
Heyndrickx KS, Van de Velde J, Wang C, Weigel D, Vandepoele K. A functional and evolutionary perspective on transcription factor binding in Arabidopsis thaliana. Plant Cell 2014; 26:3894-910. [PMID: 25361952 PMCID: PMC4247581 DOI: 10.1105/tpc.114.130591] [Citation(s) in RCA: 84] [Impact Index Per Article: 7.6] [Received: 08/06/2014] [Revised: 10/07/2014] [Accepted: 10/12/2014] [Indexed: 05/19/2023]
Abstract
Understanding the mechanisms underlying gene regulation is paramount to comprehend the translation from genotype to phenotype. The two are connected by gene expression, and it is generally thought that variation in transcription factor (TF) function is an important determinant of phenotypic evolution. We analyzed publicly available genome-wide chromatin immunoprecipitation experiments for 27 TFs in Arabidopsis thaliana and constructed an experimental network containing 46,619 regulatory interactions and 15,188 target genes. We identified hub targets and highly occupied target (HOT) regions, which are enriched for genes involved in development, stimulus responses, signaling, and gene regulatory processes in the currently profiled network. We provide several lines of evidence that TF binding at plant HOT regions is functional, in contrast to that in animals, and not merely the result of accessible chromatin. HOT regions harbor specific DNA motifs, are enriched for differentially expressed genes, and are often conserved across crucifers and dicots, even though they are not under higher levels of purifying selection than non-HOT regions. Distal bound regions are under purifying selection as well and are enriched for a chromatin state showing regulation by the Polycomb repressive complex. Gene expression complexity is positively correlated with the total number of bound TFs, revealing insights in the regulatory code for genes with different expression breadths. The integration of noncanonical and canonical DNA motif information yields new hypotheses on cobinding and tethering between specific TFs involved in flowering and light regulation.
Affiliation(s)
- Ken S Heyndrickx
  - Department of Plant Systems Biology, VIB, 9052 Gent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Jan Van de Velde
  - Department of Plant Systems Biology, VIB, 9052 Gent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Congmao Wang
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Klaas Vandepoele
  - Department of Plant Systems Biology, VIB, 9052 Gent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
|
66
|
Rhee DY, Cho DY, Zhai B, Slattery M, Ma L, Mintseris J, Wong CY, White KP, Celniker SE, Przytycka TM, Gygi SP, Obar RA, Artavanis-Tsakonas S. Transcription factor networks in Drosophila melanogaster. Cell Rep 2014; 8:2031-2043. [PMID: 25242320 DOI: 10.1016/j.celrep.2014.08.038] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.7] [Received: 02/25/2014] [Revised: 06/09/2014] [Accepted: 08/16/2014] [Indexed: 11/15/2022] Open
Abstract
Specific cellular fates and functions depend on differential gene expression, which occurs primarily at the transcriptional level and is controlled by complex regulatory networks of transcription factors (TFs). TFs act through combinatorial interactions with other TFs, cofactors, and chromatin-remodeling proteins. Here, we define protein-protein interactions using a coaffinity purification/mass spectrometry method and study 459 Drosophila melanogaster transcription-related factors, representing approximately half of the established catalog of TFs. We probe this network in vivo, demonstrating functional interactions for many interacting proteins, and test the predictive value of our data set. Building on these analyses, we combine regulatory network inference models with physical interactions to define an integrated network that connects combinatorial TF protein interactions to the transcriptional regulatory network of the cell. We use this integrated network as a tool to connect the functional network of genetic modifiers related to mastermind, a transcriptional cofactor of the Notch pathway.
Affiliation(s)
- David Y Rhee
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Dong-Yeon Cho
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Bo Zhai
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Matthew Slattery
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Lijia Ma
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Julian Mintseris
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Christina Y Wong
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Kevin P White
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Susan E Celniker
  - Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Teresa M Przytycka
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Steven P Gygi
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Robert A Obar
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
- Spyros Artavanis-Tsakonas
  - Department of Cell Biology, Harvard Medical School, Boston, MA 02115, USA
  - Biogen Idec, Inc., Cambridge, MA 02142, USA
|
67
|
López Y, Vandenbon A, Nakai K. A set of structural features defines the cis-regulatory modules of antenna-expressed genes in Drosophila melanogaster. PLoS One 2014; 9:e104342. [PMID: 25153327 PMCID: PMC4143197 DOI: 10.1371/journal.pone.0104342] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/07/2014] [Accepted: 07/13/2014] [Indexed: 11/18/2022] Open
Abstract
Unraveling the biological information within the regulatory region (RR) of genes has become one of the major focuses of current genomic research. It has been hypothesized that RRs of co-expressed genes share similar architecture, but to the best of our knowledge, no studies have simultaneously examined multiple structural features, such as positioning of cis-regulatory elements relative to transcription start sites and to each other, and the order and orientation of regulatory motifs, to accurately describe overall cis-regulatory structure. In our work we present an improved computational method that builds a feature collection based on all of these structural features. We demonstrate the utility of this approach by modeling the cis-regulatory modules of antenna-expressed genes in Drosophila melanogaster. Six potential antenna-related motifs were predicted initially, including three that appeared to be novel. A feature set was created with the predicted motifs, where a correlation-based filter was used to remove irrelevant features, and a genetic algorithm was designed to optimize the feature set. Finally, a set of eight highly informative structural features was obtained for the RRs of antenna-expressed genes, achieving an area under the curve of 0.841. We used these features to score all D. melanogaster RRs for potentially unknown antenna-expressed genes sharing a similar regulatory structure. Validation of our predictions with an independent RNA sequencing dataset showed that 76.7% of genes with high scoring RRs were expressed in antenna. In addition, we found that the structural features we identified are highly conserved in RRs of orthologs in other Drosophila sibling species. This approach to identify tissue-specific regulatory structures showed comparable performance to previous approaches, but also uncovered additional interesting features because it also considered the order and orientation of motifs.
Affiliation(s)
- Yosvany López
  - Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Alexis Vandenbon
  - Immunology Frontier Research Center, Osaka University, Osaka, Japan
- Kenta Nakai
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
|
68
|
Van de Velde J, Heyndrickx KS, Vandepoele K. Inference of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis. The Plant Cell 2014; 26:2729-45. [PMID: 24989046 PMCID: PMC4145110 DOI: 10.1105/tpc.114.127001] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 05/05/2023]
Abstract
Transcriptional regulation plays an important role in establishing gene expression profiles during development or in response to (a)biotic stimuli. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity, and the identification of individual TFBS in genome sequences is a major goal to inferring regulatory networks. We have developed a phylogenetic footprinting approach for the identification of conserved noncoding sequences (CNSs) across 12 dicot plants. Whereas both alignment and non-alignment-based techniques were applied to identify functional motifs in a multispecies context, our method accounts for incomplete motif conservation as well as high sequence divergence between related species. We identified 69,361 footprints associated with 17,895 genes. Through the integration of known TFBS obtained from the literature and experimental studies, we used the CNSs to compile a gene regulatory network in Arabidopsis thaliana containing 40,758 interactions, of which two-thirds act through binding events located in DNase I hypersensitive sites. This network shows significant enrichment toward in vivo targets of known regulators, and its overall quality was confirmed using five different biological validation metrics. Finally, through the integration of detailed expression and function information, we demonstrate how static CNSs can be converted into condition-dependent regulatory networks, offering opportunities for regulatory gene annotation.
Affiliation(s)
- Jan Van de Velde
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium
- Ken S Heyndrickx
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium
- Klaas Vandepoele
  - Department of Plant Systems Biology, VIB, B-9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium
|
69
|
Murali T, Pacifico S, Finley RL. Integrating the interactome and the transcriptome of Drosophila. BMC Bioinformatics 2014; 15:177. [PMID: 24913703 PMCID: PMC4229734 DOI: 10.1186/1471-2105-15-177] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/29/2013] [Accepted: 05/28/2014] [Indexed: 12/29/2022] Open
Abstract
Background: Networks of interacting genes and gene products mediate most cellular and developmental processes. High throughput screening methods combined with literature curation are identifying many of the protein-protein interactions (PPI) and protein-DNA interactions (PDI) that constitute these networks. Most of the detection methods, however, fail to identify the in vivo spatial or temporal context of the interactions. Thus, the interaction data are a composite of the individual networks that may operate in specific tissues or developmental stages. Genome-wide expression data may be useful for filtering interaction data to identify the subnetworks that operate in specific spatial or temporal contexts. Here we take advantage of the extensive interaction and expression data available for Drosophila to analyze how interaction networks may be unique to specific tissues and developmental stages.
Results: We ranked genes on a scale from ubiquitously expressed to tissue or stage specific and examined their interaction patterns. Interestingly, ubiquitously expressed genes have many more interactions among themselves than do non-ubiquitously expressed genes both in PPI and PDI networks. While the PDI network is enriched for interactions between tissue-specific transcription factors and their tissue-specific targets, a preponderance of the PDI interactions are between ubiquitous and non-ubiquitously expressed genes and proteins. In contrast to PDI, PPI networks are depleted for interactions among tissue- or stage-specific proteins, which instead interact primarily with widely expressed proteins. In light of these findings, we present an approach to filter interaction data based on gene expression levels normalized across tissues or developmental stages. We show that this filter (the percent maximum or pmax filter) can be used to identify subnetworks that function within individual tissues or developmental stages.
Conclusions: These observations suggest that protein networks are frequently organized into hubs of widely expressed proteins to which are attached various tissue- or stage-specific proteins. This is consistent with earlier analyses of human PPI data and suggests a similar organization of interaction networks across species. This organization implies that tissue or stage specific networks can be best identified from interactome data by using filters designed to include both ubiquitously expressed and specifically expressed genes and proteins.
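The percent-maximum idea in this abstract can be illustrated with a minimal sketch. The abstract only states that expression is normalized across tissues or stages, so the details here are assumptions made for illustration: each gene's expression is scaled to a percentage of its own maximum, an interaction is kept for a tissue when both partners meet a cutoff, and the function names `percent_max` and `filter_interactions` and the default threshold of 50 are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Tuple

Expression = Dict[str, Dict[str, float]]  # gene -> tissue -> expression level


def percent_max(expr: Expression) -> Expression:
    """Scale each gene's expression to a percentage of its maximum across tissues.

    Assumption: this is one plausible reading of "percent maximum"; the
    abstract does not spell out the exact normalization.
    """
    scaled: Expression = {}
    for gene, by_tissue in expr.items():
        peak = max(by_tissue.values())
        scaled[gene] = {
            tissue: (100.0 * level / peak if peak > 0 else 0.0)
            for tissue, level in by_tissue.items()
        }
    return scaled


def filter_interactions(
    interactions: List[Tuple[str, str]],
    pmax: Expression,
    tissue: str,
    threshold: float = 50.0,  # hypothetical cutoff, chosen only for the example
) -> List[Tuple[str, str]]:
    """Keep interactions whose two partners both reach the cutoff in `tissue`."""
    kept = []
    for a, b in interactions:
        a_level = pmax.get(a, {}).get(tissue, 0.0)
        b_level = pmax.get(b, {}).get(tissue, 0.0)
        if a_level >= threshold and b_level >= threshold:
            kept.append((a, b))
    return kept


# Toy data: gene B is ubiquitous, A is brain-specific, C is gut-specific.
expr = {
    "A": {"gut": 10.0, "brain": 100.0},
    "B": {"gut": 50.0, "brain": 50.0},
    "C": {"gut": 80.0, "brain": 8.0},
}
edges = [("A", "B"), ("B", "C"), ("A", "C")]
scaled = percent_max(expr)
gut_net = filter_interactions(edges, scaled, "gut")
brain_net = filter_interactions(edges, scaled, "brain")
```

In this toy example the ubiquitous gene B stays in both tissue subnetworks because its scaled expression is 100 everywhere, which mirrors the abstract's point that a useful filter should retain widely expressed hubs alongside tissue-specific partners.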
Affiliation(s)
- Russell L Finley
  - Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, Michigan 48201, USA
|
70
|
Calero-Nieto FJ, Ng FS, Wilson NK, Hannah R, Moignard V, Leal-Cervantes AI, Jimenez-Madrid I, Diamanti E, Wernisch L, Göttgens B. Key regulators control distinct transcriptional programmes in blood progenitor and mast cells. EMBO J 2014; 33:1212-26. [PMID: 24760698 PMCID: PMC4168288 DOI: 10.1002/embj.201386825] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 09/06/2013] [Revised: 02/27/2014] [Accepted: 03/20/2014] [Indexed: 12/21/2022] Open
Abstract
Despite major advances in the generation of genome-wide binding maps, the mechanisms by which transcription factors (TFs) regulate cell type identity have remained largely obscure. Through comparative analysis of 10 key haematopoietic TFs in both mast cells and blood progenitors, we demonstrate that the largely cell type-specific binding profiles are not opportunistic, but instead contribute to cell type-specific transcriptional control, because (i) mathematical modelling of differential binding of shared TFs can explain differential gene expression, (ii) consensus binding sites are important for cell type-specific binding and (iii) knock-down of blood stem cell regulators in mast cells reveals mast cell-specific genes as direct targets. Finally, we show that the known mast cell regulators Mitf and c-fos likely contribute to the global reorganisation of TF binding profiles. Taken together therefore, our study elucidates how key regulatory TFs contribute to transcriptional programmes in several distinct mammalian cell types.
Affiliation(s)
- Fernando J Calero-Nieto
  - Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
- Felicia S Ng
  - Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
- Nicola K Wilson
  - Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
- Rebecca Hannah
  - Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
- Victoria Moignard
  - Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
- Ana I Leal-Cervantes
  - Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
- Isabel Jimenez-Madrid
  - Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
- Evangelia Diamanti
  - Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
- Lorenz Wernisch
  - MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK
- Berthold Göttgens
  - Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute, Cambridge Institute for Medical Research, Cambridge University, Cambridge, UK
|
71
|
Inferring gene regulatory networks by integrating ChIP-seq/chip and transcriptome data via LASSO-type regularization methods. Methods 2014; 67:294-303. [DOI: 10.1016/j.ymeth.2014.03.006] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 11/21/2013] [Revised: 03/04/2014] [Accepted: 03/05/2014] [Indexed: 01/14/2023] Open
|
72
|
Gupta A, Christensen RG, Bell HA, Goodwin M, Patel RY, Pandey M, Enuameh MS, Rayla AL, Zhu C, Thibodeau-Beganny S, Brodsky MH, Joung JK, Wolfe SA, Stormo GD. An improved predictive recognition model for Cys(2)-His(2) zinc finger proteins. Nucleic Acids Res 2014; 42:4800-12. [PMID: 24523353 PMCID: PMC4005693 DOI: 10.1093/nar/gku132] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Received: 12/05/2013] [Revised: 01/21/2014] [Accepted: 01/22/2014] [Indexed: 11/17/2022] Open
Abstract
Cys(2)-His(2) zinc finger proteins (ZFPs) are the largest family of transcription factors in higher metazoans. They also represent the most diverse family with regards to the composition of their recognition sequences. Although there are a number of ZFPs with characterized DNA-binding preferences, the specificity of the vast majority of ZFPs is unknown and cannot be directly inferred by homology due to the diversity of recognition residues present within individual fingers. Given the large number of unique zinc fingers and assemblies present across eukaryotes, a comprehensive predictive recognition model that could accurately estimate the DNA-binding specificity of any ZFP based on its amino acid sequence would have great utility. Toward this goal, we have used the DNA-binding specificities of 678 two-finger modules from both natural and artificial sources to construct a random forest-based predictive model for ZFP recognition. We find that our recognition model outperforms previously described determinant-based recognition models for ZFPs, and can successfully estimate the specificity of naturally occurring ZFPs with previously defined specificities.
Affiliation(s)
- Ankit Gupta, Ryan G. Christensen, Heather A. Bell, Mathew Goodwin, Ronak Y. Patel, Manishi Pandey, Metewo Selase Enuameh, Amy L. Rayla, Cong Zhu, Stacey Thibodeau-Beganny, Michael H. Brodsky, J. Keith Joung, Scot A. Wolfe, Gary D. Stormo (the same combined affiliation list is given for every author)
  - Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, MA 01605, USA
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
  - Department of Genetics, Washington University School of Medicine, St Louis, MO 63108, USA
  - Department of Biochemistry and Biology and Biotechnology, Worcester Polytechnic Institute, Worcester, MA 01609, USA
  - Molecular Pathology Unit, Center for Computational and Integrative Biology, and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA
  - Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
  - Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
|
73
|
Levinson M, Zhou Q. A penalized Bayesian approach to predicting sparse protein-DNA binding landscapes. Bioinformatics 2014; 30:636-43. [PMID: 24115169 DOI: 10.1093/bioinformatics/btt585] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION Cellular processes are controlled, directly or indirectly, by the binding of hundreds of different DNA binding factors (DBFs) to the genome. One key to deeper understanding of the cell is discovering where, when and how strongly these DBFs bind to the DNA sequence. Direct measurement of DBF binding sites (BSs; e.g. through ChIP-Chip or ChIP-Seq experiments) is expensive, noisy and not available for every DBF in every cell type. Naive and most existing computational approaches to detecting which DBFs bind in a set of genomic regions of interest often perform poorly, due to the high false discovery rates and restrictive requirements for prior knowledge. RESULTS We develop SparScape, a penalized Bayesian method for identifying DBFs active in the considered regions and predicting a joint probabilistic binding landscape. Using a sparsity-inducing penalization, SparScape is able to select a small subset of DBFs with enriched BSs in a set of DNA sequences from a much larger candidate set. This substantially reduces the false positives in prediction of BSs. Analysis of ChIP-Seq data in mouse embryonic stem cells and simulated data show that SparScape dramatically outperforms the naive motif scanning method and the comparable computational approaches in terms of DBF identification and BS prediction. AVAILABILITY AND IMPLEMENTATION SparScape is implemented in C++ with OpenMP (optional at compilation) and is freely available at 'www.stat.ucla.edu/∼zhou/Software.html' for academic use.
Affiliation(s)
- Matthew Levinson
- Department of Statistics, University of California, Los Angeles, CA 90095, USA
74
Li X, Zhao Y, Tian B, Jamaluddin M, Mitra A, Yang J, Rowicka M, Brasier AR, Kudlicki A. Modulation of gene expression regulated by the transcription factor NF-κB/RelA. J Biol Chem 2014; 289:11927-11944. [PMID: 24523406 DOI: 10.1074/jbc.m113.539965] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open
Abstract
Modulators (Ms) are proteins that modify the activity of transcription factors (TFs) and influence expression of their target genes (TGs). To discover modulators of NF-κB/RelA, we first identified 365 NF-κB/RelA-binding proteins using liquid chromatography-tandem mass spectrometry (LC-MS/MS). We used a probabilistic model to infer 8349 (M, NF-κB/RelA, TG) triplets and their modes of modulatory action from our combined LC-MS/MS and ChIP-Seq (ChIP followed by next generation sequencing) data, published RelA modulators and TGs, and a compendium of gene expression profiles. Hierarchical clustering of the derived modulatory network revealed functional subnetworks and suggested new pathways modulating RelA transcriptional activity. The modulators with the highest number of TGs and most non-random distribution of action modes (measured by Shannon entropy) are consistent with published reports. Our results provide a repertoire of testable hypotheses for experimental validation. One of the NF-κB/RelA modulators we identified is STAT1. The inferred (STAT1, NF-κB/RelA, TG) triplets were validated by LC-selected reaction monitoring-MS and the results of STAT1 deletion in human fibrosarcoma cells. Overall, we have identified 562 NF-κB/RelA modulators, which are potential drug targets, and clarified mechanisms of achieving NF-κB/RelA multiple functions through modulators. Our approach can be readily applied to other TFs.
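The abstract ranks modulators by how non-random their distribution of action modes is, measured by Shannon entropy. A minimal sketch of that measure follows, assuming each target gene of a modulator carries one categorical action-mode label; the label names and the example data are hypothetical and are not taken from the paper. Lower entropy corresponds to a more non-random (more concentrated) distribution.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical action-mode labels, one per target gene of a modulator.
uniform_modes = ["enhance", "attenuate", "invert", "none"]
skewed_modes = ["enhance"] * 7 + ["attenuate"]

h_uniform = shannon_entropy(uniform_modes)  # evenly spread over four modes
h_skewed = shannon_entropy(skewed_modes)    # concentrated on one mode
```

In this toy case the skewed modulator has the lower entropy, so it would rank as having the more non-random action-mode distribution.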
Affiliation(s)
- Xueling Li
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas 77555; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China
- Yingxin Zhao
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Center for Clinical Proteomics, University of Texas Medical Branch, Galveston, Texas 77555
- Bing Tian
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Internal Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Mohammad Jamaluddin
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Abhishek Mitra
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Jun Yang
- Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Internal Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Maga Rowicka
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas 77555
- Allan R Brasier
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Center for Clinical Proteomics, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Internal Medicine, University of Texas Medical Branch, Galveston, Texas 77555
- Andrzej Kudlicki
- Institute for Translational Sciences, University of Texas Medical Branch, Galveston, Texas 77555; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, Texas 77555; Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas 77555.
75
Jiang P, Singh M. CCAT: Combinatorial Code Analysis Tool for transcriptional regulation. Nucleic Acids Res 2013; 42:2833-47. [PMID: 24366875 PMCID: PMC3950699 DOI: 10.1093/nar/gkt1302] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
Combinatorial interplay among transcription factors (TFs) is an important mechanism by which transcriptional regulatory specificity is achieved. However, despite the increasing number of TFs for which either binding specificities or genome-wide occupancy data are known, knowledge about cooperativity between TFs remains limited. To address this, we developed a computational framework for predicting genome-wide co-binding between TFs (CCAT, Combinatorial Code Analysis Tool), and applied it to Drosophila melanogaster to uncover cooperativity among TFs during embryo development. Using publicly available TF binding specificity data and DNaseI chromatin accessibility data, we first predicted genome-wide binding sites for 324 TFs across five stages of D. melanogaster embryo development. We then applied CCAT in each of these developmental stages, and identified from 19 to 58 pairs of TFs in each stage whose predicted binding sites are significantly co-localized. We found that nearby binding sites for pairs of TFs predicted to cooperate were enriched in regions bound in relevant ChIP experiments, and were more evolutionarily conserved than other pairs. Further, we found that TFs tend to be co-localized with other TFs in a dynamic manner across developmental stages. All generated data as well as source code for our front-to-end pipeline are available at http://cat.princeton.edu.
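The co-localization idea in this abstract can be illustrated with a simple count of how many predicted sites for one TF have a predicted site for another TF nearby. This is only a sketch under stated assumptions: the function name, the 50 bp distance cutoff, and the coordinates are hypothetical, and the statistical test CCAT uses to call a pair significant is not reproduced here.

```python
from bisect import bisect_left, bisect_right


def count_colocalized(sites_a, sites_b, max_dist=50):
    """Count sites in sites_a with at least one site of sites_b within max_dist bp."""
    b_sorted = sorted(sites_b)
    n = 0
    for pos in sites_a:
        # Indices delimiting the sites of B that fall in [pos - max_dist, pos + max_dist].
        lo = bisect_left(b_sorted, pos - max_dist)
        hi = bisect_right(b_sorted, pos + max_dist)
        if hi > lo:
            n += 1
    return n


# Hypothetical binding-site coordinates on one chromosome.
tf_a_sites = [100, 500, 900]
tf_b_sites = [120, 1000]
n_close = count_colocalized(tf_a_sites, tf_b_sites, max_dist=50)
```

A significance call would additionally need a null model for the expected count, for example one based on shuffled site positions; that part is left out of the sketch.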
Affiliation(s)
- Peng Jiang
- Department of Computer Science, Princeton University, Princeton, 08540 NJ, USA and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, 08544 NJ, USA
76
Chandrasekaran S, Price ND. Metabolic constraint-based refinement of transcriptional regulatory networks. PLoS Comput Biol 2013; 9:e1003370. [PMID: 24348226 PMCID: PMC3857774 DOI: 10.1371/journal.pcbi.1003370] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2013] [Accepted: 10/16/2013] [Indexed: 01/01/2023] Open
Abstract
There is a strong need for computational frameworks that integrate different biological processes and data-types to unravel cellular regulation. Current efforts to reconstruct transcriptional regulatory networks (TRNs) focus primarily on proximal data such as gene co-expression and transcription factor (TF) binding. While such approaches enable rapid reconstruction of TRNs, the overwhelming combinatorics of possible networks limits identification of mechanistic regulatory interactions. Utilizing growth phenotypes and systems-level constraints to inform regulatory network reconstruction is an unmet challenge. We present our approach Gene Expression and Metabolism Integrated for Network Inference (GEMINI) that links a compendium of candidate regulatory interactions with the metabolic network to predict their systems-level effect on growth phenotypes. We then compare predictions with experimental phenotype data to select phenotype-consistent regulatory interactions. GEMINI makes use of the observation that only a small fraction of regulatory network states are compatible with a viable metabolic network, and outputs a regulatory network that is simultaneously consistent with the input genome-scale metabolic network model, gene expression data, and TF knockout phenotypes. GEMINI preferentially recalls gold-standard interactions (p-value = 10^-172), significantly better than using gene expression alone. We applied GEMINI to create an integrated metabolic-regulatory network model for Saccharomyces cerevisiae involving 25,000 regulatory interactions controlling 1597 metabolic reactions. The model quantitatively predicts TF knockout phenotypes in new conditions (p-value = 10^-14) and revealed potential condition-specific regulatory mechanisms.
Our results suggest that a metabolic constraint-based approach can be successfully used to help reconstruct TRNs from high-throughput data, and highlight the potential of using a biochemically-detailed mechanistic framework to integrate and reconcile inconsistencies across different data-types. The algorithm and associated data are available at https://sourceforge.net/projects/gemini-data/
Cellular networks, such as metabolic and transcriptional regulatory networks (TRNs), do not operate independently but work together in unison to determine cellular phenotypes. Further, the phenotype and architecture of one network constrains the topology of other networks. Hence, it is critical to study network components and interactions in the context of the entire cell. Typically, efforts to reconstruct TRNs focus only on immediately proximal data such as gene co-expression and transcription factor (TF)-binding. Herein, we take a different strategy by linking candidate TRNs with the metabolic network to predict systems-level responses such as growth phenotypes of TF knockout strains, and compare predictions with experimental phenotype data to select amongst the candidate TRNs. Our approach goes beyond traditional data integration approaches for network inference and refinement by using a predictive network model (metabolism) to refine another network model (regulation), thus providing an alternative avenue to this area of research. Understanding how the networks function together in a cell will pave the way for synthetic biology and has a wide range of applications in biotechnology, drug discovery and diagnostics. Further, we demonstrate how metabolic models can integrate and reconcile inconsistencies across different data-types.
Affiliation(s)
- Sriram Chandrasekaran
- Institute for Systems Biology, Seattle, Washington, United States of America
- Center for Biophysics and Computational Biology, University of Illinois, Urbana-Champaign, Illinois, United States of America
- Nathan D. Price
- Institute for Systems Biology, Seattle, Washington, United States of America
- Center for Biophysics and Computational Biology, University of Illinois, Urbana-Champaign, Illinois, United States of America
77
Zhong S, He X, Bar-Joseph Z. Predicting tissue specific transcription factor binding sites. BMC Genomics 2013; 14:796. [PMID: 24238150 PMCID: PMC3898213 DOI: 10.1186/1471-2164-14-796] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2013] [Accepted: 11/06/2013] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Studies of gene regulation often utilize genome-wide predictions of transcription factor (TF) binding sites. Most existing prediction methods are based on sequence information alone, ignoring biological contexts such as developmental stages and tissue types. Experimental methods to study in vivo binding, including ChIP-chip and ChIP-seq, can only study one transcription factor in a single cell type and under a specific condition in each experiment, and therefore cannot scale to determine the full set of regulatory interactions in mammalian transcriptional regulatory networks. RESULTS: We developed a new computational approach, PIPES, for predicting tissue-specific TF binding. PIPES integrates in vitro protein binding microarrays (PBMs), sequence conservation and tissue-specific epigenetic (DNase I hypersensitivity) information. We demonstrate that PIPES improves over existing methods on distinguishing between in vivo bound and unbound sequences using ChIP-seq data for 11 mouse TFs. In addition, our predictions are in good agreement with current knowledge of tissue-specific TF regulation. CONCLUSIONS: We provide a systematic map of computationally predicted tissue-specific binding targets for 284 mouse TFs across 55 tissue/cell types. Such a comprehensive resource is useful for researchers studying gene regulation.
Affiliation(s)
- Ziv Bar-Joseph
- Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
78
Roy S, Lagree S, Hou Z, Thomson JA, Stewart R, Gasch AP. Integrated module and gene-specific regulatory inference implicates upstream signaling networks. PLoS Comput Biol 2013; 9:e1003252. [PMID: 24146602 PMCID: PMC3798279 DOI: 10.1371/journal.pcbi.1003252] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2013] [Accepted: 08/17/2013] [Indexed: 11/19/2022] Open
Abstract
Regulatory networks that control gene expression are important in diverse biological contexts including stress response and development. Each gene's regulatory program is determined by module-level regulation (e.g. co-regulation via the same signaling system), as well as gene-specific determinants that can fine-tune expression. We present a novel approach, Modular regulatory network learning with per gene information (MERLIN), that infers regulatory programs for individual genes while probabilistically constraining these programs to reveal module-level organization of regulatory networks. Using edge-, regulator- and module-based comparisons of simulated networks of known ground truth, we find MERLIN reconstructs regulatory programs of individual genes as well or better than existing approaches of network reconstruction, while additionally identifying modular organization of the regulatory networks. We use MERLIN to dissect global transcriptional behavior in two biological contexts: yeast stress response and human embryonic stem cell differentiation. Regulatory modules inferred by MERLIN capture co-regulatory relationships between signaling proteins and downstream transcription factors thereby revealing the upstream signaling systems controlling transcriptional responses. The inferred networks are enriched for regulators with genetic or physical interactions, supporting the inference, and identify modules of functionally related genes bound by the same transcriptional regulators. Our method combines the strengths of per-gene and per-module methods to reveal new insights into transcriptional regulation in stress and development.
Affiliation(s)
- Sushmita Roy
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Wisconsin Institute for Discovery, Madison, Wisconsin, United States of America
- Stephen Lagree
- Department of Computer Science, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Zhonggang Hou
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- James A. Thomson
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Molecular, Cellular, and Developmental Biology, University of California Santa Barbara, Santa Barbara, California, United States of America
- Ron Stewart
- Morgridge Institute for Research, Madison, Wisconsin, United States of America
- Audrey P. Gasch
- Department of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
79
Xiong J, Zhou T. A Kalman-filter based approach to identification of time-varying gene regulatory networks. PLoS One 2013; 8:e74571. [PMID: 24116005 PMCID: PMC3792119 DOI: 10.1371/journal.pone.0074571] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2013] [Accepted: 08/04/2013] [Indexed: 11/18/2022] Open
Abstract
Motivation: Conventional identification methods for gene regulatory networks (GRNs) have overwhelmingly adopted static topology models, which remain unchanged over time, to represent the underlying molecular interactions of a biological system. However, GRNs are dynamic in response to physiological and environmental changes. Although there is a rich literature on modeling static or temporally invariant networks, how to systematically recover these temporally changing networks remains a major and pressing challenge. The purpose of this study is to suggest a two-step strategy that recovers time-varying GRNs. Results: This paper uses a switching auto-regressive model to describe the dynamics of time-varying GRNs and proposes a two-step strategy to recover their structure. In the first step, the change points are detected by a Kalman-filter based method. The observed time series are divided into several segments using these detection results, and each time series segment lying between two successive change points is associated with an individual static regulatory network. In the second step, conditional network structure identification methods are used to reconstruct the topology for each time interval. This two-step strategy efficiently decouples the change point detection problem from the topology inference problem. Simulation results show that the proposed strategy can detect the change points precisely and recover each individual topology structure effectively. Moreover, computational results with developmental data from Drosophila melanogaster show that the proposed change point detection procedure also works effectively in real-world applications and that its change point estimation accuracy exceeds that of other existing approaches, which suggests the strategy may also be helpful in solving actual GRN reconstruction problems.
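The hand-off between the two steps described above can be sketched as follows: once change points are known, the time series is cut into segments, and each segment is treated as coming from one static network. This sketch assumes the change points are already given as indices at which a new regime starts; it does not implement the Kalman-filter based detection or the per-segment network identification, and the function name and data are hypothetical.

```python
def split_at_change_points(series, change_points):
    """Split a time series into consecutive segments.

    change_points holds the indices at which a new regime starts.
    Empty segments (e.g. from a change point at index 0) are dropped.
    """
    bounds = [0] + sorted(change_points) + [len(series)]
    return [series[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical expression time series with regime changes at indices 3 and 6.
expression = [0.1, 0.2, 0.1, 1.5, 1.6, 1.4, 0.7, 0.8]
segments = split_at_change_points(expression, [3, 6])

# Each segment would then be passed to a static network inference method.
```

Keeping the segmentation separate from the inference step mirrors the decoupling the abstract describes: any static structure identification method can be applied to each segment independently.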
Affiliation(s)
- Jie Xiong
- Department of Automation, Tsinghua University, Beijing, China
- Tong Zhou
- Department of Automation and Tsinghua National Laboratory for Information Science and Technology(TNList), Tsinghua University, Beijing, China
80
Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing messages between biological networks to refine predicted interactions. PLoS One 2013; 8:e64832. [PMID: 23741402 PMCID: PMC3669401 DOI: 10.1371/journal.pone.0064832] [Citation(s) in RCA: 142] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2012] [Accepted: 04/17/2013] [Indexed: 01/10/2023] Open
Abstract
Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly people attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net.
Affiliation(s)
- Kimberly Glass
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- John Quackenbush
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Guo-Cheng Yuan
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
81
Samee AH, Sinha S. Evaluating thermodynamic models of enhancer activity on cellular resolution gene expression data. Methods 2013; 62:79-90. [PMID: 23624421 DOI: 10.1016/j.ymeth.2013.03.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2012] [Accepted: 03/04/2013] [Indexed: 11/18/2022] Open
Abstract
With the advent of high throughput sequencing and high resolution transcriptomic technologies, there exists today an unprecedented opportunity to understand gene regulation at a quantitative level. State of the art models of the relationship between regulatory sequence and gene expression have shown great promise, but also suffer from some major shortcomings. In this paper, we identify and address methodological challenges pertaining to quantitative modeling of gene expression from sequence, and test our models on the anterior-posterior patterning system in the Drosophila embryo. We first develop a framework to process cellular resolution three-dimensional gene expression data from the Drosophila embryo and create data sets on which quantitative models can be trained. Next we propose a new score, called 'weighted pattern generating potential' (w-PGP), to evaluate model predictions, and show its advantages over the two most common scoring schemes in use today. The model building exercise uses w-PGP as the evaluation score and adopts a systematic strategy to increase a model's complexity while guarding against over-fitting. Our model identifies three transcription factors--ZELDA, SLOPPY-PAIRED, and NUBBIN--that have not been previously incorporated in quantitative models of this system, as having significant regulatory influence. Finally, we show how fitting quantitative models on data sets comprising a handful of enhancers, as reported in earlier work, may lead to unreliable models.
Affiliation(s)
- Abul Hassan Samee
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
82
Van Nostrand EL, Kim SK. Integrative analysis of C. elegans modENCODE ChIP-seq data sets to infer gene regulatory interactions. Genome Res 2013; 23:941-53. [PMID: 23531767 PMCID: PMC3668362 DOI: 10.1101/gr.152876.112] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
The C. elegans modENCODE Consortium has defined in vivo binding sites for a large array of transcription factors by ChIP-seq. In this article, we present examples that illustrate how this compendium of ChIP-seq data can drive biological insights not possible with analysis of individual factors. First, we analyze the number of independent factors bound to the same locus, termed transcription factor complexity, and find that low-complexity sites are more likely to respond to altered expression of a single bound transcription factor. Next, we show that comparison of binding sites for the same factor across developmental stages can reveal insight into the regulatory network of that factor, as we find that the transcription factor UNC-62 has distinct binding profiles at different stages due to distinct cofactor co-association as well as tissue-specific alternative splicing. Finally, we describe an approach to infer potential regulators of gene expression changes found in profiling experiments (such as DNA microarrays) by screening these altered genes to identify significant enrichment for targets of a transcription factor identified in ChIP-seq data sets. After confirming that this approach can correctly identify the upstream regulator on expression data sets for which the regulator was previously known, we applied this approach to identify novel candidate regulators of transcriptional changes with age. The analysis revealed nine candidate aging regulators, of which three were previously known to have a role in longevity. We experimentally showed that two of the new candidate aging regulators can extend lifespan when overexpressed, indicating that this approach can identify novel functional regulators of complex processes.
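The screening step in this abstract, testing whether a set of altered genes is enriched for ChIP-seq targets of a transcription factor, can be illustrated with a one-sided hypergeometric test. The abstract does not name the test that was used, so the choice of a hypergeometric tail is an assumption, and the function name and numbers below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_targets, n_altered, n_overlap):
    """P(X >= n_overlap) when n_altered genes are drawn without replacement
    from n_genes, of which n_targets are ChIP-seq targets of the factor."""
    upper = min(n_targets, n_altered)
    total = comb(n_genes, n_altered)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(n_targets, k) * comb(n_genes - n_targets, n_altered - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 10 genes, 3 targets, 3 altered genes, all 3 overlap.
p_all_overlap = hypergeom_enrichment_p(10, 3, 3, 3)
# With an observed overlap of 0 the tail covers every outcome.
p_no_overlap = hypergeom_enrichment_p(20, 5, 5, 0)
```

When many factors are screened against the same gene set, the resulting p-values would normally need a multiple-testing correction before any factor is called a candidate regulator.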
Affiliation(s)
- Eric L Van Nostrand
- Department of Genetics and Department of Developmental Biology, Stanford University Medical Center, Stanford, California 94305, USA
83
Enuameh MS, Asriyan Y, Richards A, Christensen RG, Hall VL, Kazemian M, Zhu C, Pham H, Cheng Q, Blatti C, Brasefield JA, Basciotta MD, Ou J, McNulty JC, Zhu LJ, Celniker SE, Sinha S, Stormo GD, Brodsky MH, Wolfe SA. Global analysis of Drosophila Cys₂-His₂ zinc finger proteins reveals a multitude of novel recognition motifs and binding determinants. Genome Res 2013; 23:928-40. [PMID: 23471540 PMCID: PMC3668361 DOI: 10.1101/gr.151472.112] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
Cys2-His2 zinc finger proteins (ZFPs) are the largest group of transcription factors in higher metazoans. A complete characterization of these ZFPs and their associated target sequences is pivotal to fully annotate transcriptional regulatory networks in metazoan genomes. As a first step in this process, we have characterized the DNA-binding specificities of 129 zinc finger sets from Drosophila using a bacterial one-hybrid system. This data set contains the DNA-binding specificities for at least one encoded ZFP from 70 unique genes and 23 alternate splice isoforms representing the largest set of characterized ZFPs from any organism described to date. These recognition motifs can be used to predict genomic binding sites for these factors within the fruit fly genome. Subsets of fingers from these ZFPs were characterized to define their orientation and register on their recognition sequences, thereby allowing us to define the recognition diversity within this finger set. We find that the characterized fingers can specify 47 of the 64 possible DNA triplets. To confirm the utility of our finger recognition models, we employed subsets of Drosophila fingers in combination with an existing archive of artificial zinc finger modules to create ZFPs with novel DNA-binding specificity. These hybrids of natural and artificial fingers can be used to create functional zinc finger nucleases for editing vertebrate genomes.
Affiliation(s)
- Metewo Selase Enuameh
- Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
84
Batut P, Dobin A, Plessy C, Carninci P, Gingeras TR. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res 2013; 23:169-80. [PMID: 22936248 PMCID: PMC3530677 DOI: 10.1101/gr.139618.112] [Citation(s) in RCA: 141] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2012] [Accepted: 08/29/2012] [Indexed: 12/20/2022]
Abstract
Many eukaryotic genes possess multiple alternative promoters with distinct expression specificities. Therefore, comprehensively annotating promoters and deciphering their individual regulatory dynamics is critical for gene expression profiling applications and for our understanding of regulatory complexity. We introduce RAMPAGE, a novel promoter activity profiling approach that combines extremely specific 5'-complete cDNA sequencing with an integrated data analysis workflow, to address the limitations of current techniques. RAMPAGE features a streamlined protocol for fast and easy generation of highly multiplexed sequencing libraries, offers very high transcription start site specificity, generates accurate and reproducible promoter expression measurements, and yields extensive transcript connectivity information through paired-end cDNA sequencing. We used RAMPAGE in a genome-wide study of promoter activity throughout 36 stages of the life cycle of Drosophila melanogaster, and describe here a comprehensive data set that represents the first available developmental time-course of promoter usage. We found that >40% of developmentally expressed genes have at least two promoters and that alternative promoters generally implement distinct regulatory programs. Transposable elements, long proposed to play a central role in the evolution of their host genomes through their ability to regulate gene expression, contribute at least 1300 promoters shaping the developmental transcriptome of D. melanogaster. Hundreds of these promoters drive the expression of annotated genes, and transposons often impart their own expression specificity upon the genes they regulate. These observations provide support for the theory that transposons may drive regulatory innovation through the distribution of stereotyped cis-regulatory modules throughout their host genomes.
Affiliation(s)
- Philippe Batut
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.
85
Ciofani M, Madar A, Galan C, Sellars M, Mace K, Pauli F, Agarwal A, Huang W, Parkhurst CN, Muratet M, Newberry KM, Meadows S, Greenfield A, Yang Y, Jain P, Kirigin FK, Birchmeier C, Wagner EF, Murphy KM, Myers RM, Bonneau R, Littman DR. A validated regulatory network for Th17 cell specification. Cell 2012; 151:289-303. [PMID: 23021777 DOI: 10.1016/j.cell.2012.09.016] [Citation(s) in RCA: 925] [Impact Index Per Article: 71.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2012] [Revised: 09/12/2012] [Accepted: 09/17/2012] [Indexed: 12/29/2022]
Abstract
Th17 cells have critical roles in mucosal defense and are major contributors to inflammatory disease. Their differentiation requires the nuclear hormone receptor RORγt working with multiple other essential transcription factors (TFs). We have used an iterative systems approach, combining genome-wide TF occupancy, expression profiling of TF mutants, and expression time series to delineate the Th17 global transcriptional regulatory network. We find that cooperatively bound BATF and IRF4 contribute to initial chromatin accessibility and, with STAT3, initiate a transcriptional program that is then globally tuned by the lineage-specifying TF RORγt, which plays a focal deterministic role at key loci. Integration of multiple data sets allowed inference of an accurate predictive model that we computationally and experimentally validated, identifying multiple new Th17 regulators, including Fosl2, a key determinant of cellular plasticity. This interconnected network can be used to investigate new therapeutic approaches to manipulate Th17 functions in the setting of inflammatory disease.
Affiliation(s)
- Maria Ciofani
- Molecular Pathogenesis Program, The Kimmel Center for Biology and Medicine of the Skirball Institute, New York University School of Medicine, New York, NY 10016, USA
|
86
|
Spence JL, Wallihan S. Computational prediction of the polyQ and CAG repeat spinocerebellar ataxia network based on sequence identity to untranslated regions. Gene 2012; 509:273-81. [PMID: 22967711 DOI: 10.1016/j.gene.2012.07.068] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/11/2012] [Accepted: 07/30/2012] [Indexed: 01/01/2023]
Abstract
Computational prediction of biological networks would be a tremendous asset to systems biology and personalized medicine. In this paper, we use a moving window bioinformatic screen to identify transcripts with partial identity to the 5' and 3'UTRs of the polyQ spinocerebellar ataxia (SCA) genes ATXN1, ATXN2, ATXN3, ATXN7, TBP and CACNA1A and the CAG repeat expansion gene PPP2R2B. We find that the bioinformatic screen enriches for transcripts that encode proteins that interact and that have functions relevant to polyQ SCA. Transcription control and RNA binding are the primary functional groups represented in the proteins from the combined screens. The insulin growth factor pathway, the WNT pathway, long term potentiation, melanogenesis and ATM mediated DNA repair pathways were identified as important pathways. UGUUU repeats were identified as an abundant motif in the SCA network and PAXIP1, CELF2, CREBBP, EBF1, PLEKHG4, SRSF4, C5orf42, NFIA, STK24, and YWHAG were identified as statistically significant proteins in the polyQ and PPP2R2B network.
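The abstract does not specify how the moving window screen was implemented. As a rough illustration of the general idea only, the sketch below slides a fixed-length window along a UTR sequence and records the offsets whose substring also occurs in a candidate transcript. The function name `window_hits`, the window length, and the use of exact substring matching are all assumptions made for this example, not details taken from the paper.

```python
def window_hits(utr, transcript, window=9):
    """Return start offsets in `utr` whose `window`-length substring
    also occurs somewhere in `transcript`.

    Hypothetical helper: exact matching and the default window length
    are illustrative assumptions, not the published procedure.
    """
    hits = []
    # Slide the window one base at a time along the UTR.
    for start in range(len(utr) - window + 1):
        fragment = utr[start:start + window]
        if fragment in transcript:
            hits.append(start)
    return hits


# Toy sequences (invented for illustration).
toy_utr = "AAACAGCAGCAGTTT"
toy_transcript = "GGGCAGCAGCAGCCC"
shared_offsets = window_hits(toy_utr, toy_transcript, window=9)
```

A transcript with one or more hits would count as having partial identity to the UTR under this simplified definition; a real screen would more likely tolerate mismatches and score matches statistically.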
|
87
|
Abstract
Reconstructing gene regulatory networks from high-throughput data is a long-standing problem. Through the DREAM project (Dialogue on Reverse Engineering Assessment and Methods), we performed a comprehensive blind assessment of over thirty network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae, and in silico microarray data. We characterize performance, data requirements, and inherent biases of different inference approaches offering guidelines for both algorithm application and development. We observe that no single inference method performs optimally across all datasets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse datasets. Thereby, we construct high-confidence networks for E. coli and S. aureus, each comprising ~1700 transcriptional interactions at an estimated precision of 50%. We experimentally test 53 novel interactions in E. coli, of which 23 were supported (43%). Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
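The abstract reports that integrating predictions from multiple inference methods is robust, but it does not say how the integration was done. One simple way to combine ranked edge lists is to average each edge's rank across methods; the sketch below assumes that scheme. The function name `average_rank_consensus`, the penalty rank given to edges a method did not report, and the toy edge names are all hypothetical choices for this example.

```python
from collections import defaultdict


def average_rank_consensus(predictions):
    """Combine ranked edge lists from several methods by mean rank.

    predictions: list of lists; each inner list holds (regulator, target)
    edges ordered from most to least confident by one method.
    Assumption: an edge missing from a method's list is given the rank
    len(list) + 1, i.e. just below that method's last reported edge.
    Returns a list of (mean_rank, edge) sorted from best to worst.
    """
    all_edges = set()
    for ranked in predictions:
        all_edges.update(ranked)

    totals = defaultdict(float)
    for ranked in predictions:
        position = {edge: i + 1 for i, edge in enumerate(ranked)}
        missing_rank = len(ranked) + 1
        for edge in all_edges:
            totals[edge] += position.get(edge, missing_rank)

    n_methods = len(predictions)
    # Sorting tuples orders by mean rank, then by edge name for ties.
    return sorted((totals[edge] / n_methods, edge) for edge in all_edges)


# Toy rankings from three hypothetical inference methods.
method_a = [("TF1", "geneA"), ("TF2", "geneB"), ("TF3", "geneC")]
method_b = [("TF2", "geneB"), ("TF1", "geneA")]
method_c = [("TF1", "geneA"), ("TF3", "geneC"), ("TF2", "geneB")]
consensus = average_rank_consensus([method_a, method_b, method_c])
```

In this toy case the edge `("TF1", "geneA")` ranks first in the consensus because two of the three methods place it at the top, which illustrates how agreement across methods can outweigh any single method's ordering.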
|