1
|
Cho C, Lee D, Jeong D, Kim S, Kim MK, Srinivasan S. Characterization of radiation-resistance mechanism in Spirosoma montaniterrae DY10ᵀ in terms of transcriptional regulatory system. Sci Rep 2023; 13:4739. [PMID: 36959250 PMCID: PMC10036542 DOI: 10.1038/s41598-023-31509-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Accepted: 03/13/2023] [Indexed: 03/25/2023] Open
Abstract
To survive changes in the external environment, bacteria regulate the expression of many genes, including transcription factors (TFs). To characterize complex biological phenomena, a biological system-level approach is necessary. Here we utilized six computational biology methods to infer a regulatory network and to characterize the underlying biological mechanisms relevant to radiation resistance. In particular, we inferred the gene regulatory network (GRN) and operons of the radiation-resistant bacterium Spirosoma montaniterrae DY10ᵀ and identified the major regulators of radiation resistance. Our results showed that DNA repair and reactive oxygen species (ROS) scavenging mechanisms are key processes and that a Crp/Fnr family transcriptional regulator acts as a master regulatory TF in the early response to radiation.
Affiliation(s)
- Changyun Cho
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
| | - Dohoon Lee
- Bioinformatics Institute, Seoul National University, Seoul, 08826, Republic of Korea
- BK21 FOUR Intelligence Computing, Seoul National University, Seoul, 08826, Republic of Korea
| | - Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
| | - Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea
| | - Myung Kyum Kim
- Department of Bio & Environmental Technology, College of Natural Science, Seoul Women's University, Seoul, 01797, Republic of Korea.
| | - Sathiyaraj Srinivasan
- Department of Bio & Environmental Technology, College of Natural Science, Seoul Women's University, Seoul, 01797, Republic of Korea.
|
2
|
Visualization and assessment of model selection uncertainty. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
3
|
Cabrera-Garcia D, Warm D, de la Fuente P, Fernández-Sánchez MT, Novelli A, Villanueva-Balsera JM. Early prediction of developing spontaneous activity in cultured neuronal networks. Sci Rep 2021; 11:20407. [PMID: 34650146 PMCID: PMC8516856 DOI: 10.1038/s41598-021-99538-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/31/2021] [Accepted: 09/27/2021] [Indexed: 11/18/2022] Open
Abstract
Synchronization and bursting activity are intrinsic electrophysiological properties of in vivo and in vitro neural networks. During early development, cortical cultures exhibit a wide repertoire of synchronous bursting dynamics whose characterization may help to understand the parameters governing the transition from immature to mature networks. Here we used machine learning techniques to characterize and predict the developing spontaneous activity in mouse cortical neurons on microelectrode arrays (MEAs) during the first three weeks in vitro. Network activity at three stages of early development was defined by 18 electrophysiological features of spikes, bursts, synchrony, and connectivity. The variability of neuronal network activity during early development was investigated by applying k-means and self-organizing map (SOM) clustering analysis to features of bursts and synchrony. These electrophysiological features were predicted at the third week in vitro with high accuracy from those at earlier times using three machine learning models: Multivariate Adaptive Regression Splines, Support Vector Machines, and Random Forest. Our results indicate that initial patterns of electrical activity during the first week in vitro may already predetermine the final development of the neuronal network activity. The methodological approach used here may be applied to explore the biological mechanisms underlying the complex dynamics of spontaneous activity in developing neuronal cultures.
Affiliation(s)
- David Cabrera-Garcia
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain.
- Department of Synapse and Network Development, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, The Netherlands.
| | - Davide Warm
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain
- Institute of Physiology, University Medical Center of the Johannes Gutenberg University Mainz, Duesbergweg 6, 55128, Mainz, Germany
| | - Pablo de la Fuente
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain
| | - M Teresa Fernández-Sánchez
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain
| | - Antonello Novelli
- Department of Biochemistry and Molecular Biology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, 33006, Oviedo, Spain.
- Department of Psychology and University Institute of Biotechnology of Asturias (IUBA), Campus "El Cristo", University of Oviedo, Institute for Sanitary Research of the Princedom of Asturias (ISPA), 33006, Oviedo, Spain.
|
4
|
Abstract
Transcriptomes are known to organize themselves into gene co-expression clusters or modules where groups of genes display distinct patterns of coordinated or synchronous expression across independent biological samples. The functional significance of these co-expression clusters is suggested by the fact that highly coexpressed groups of genes tend to be enriched in genes involved in common functions and biological processes. While gene co-expression is widely assumed to reflect close regulatory proximity, the validity of this assumption remains unclear. Here we use a simple synthetic gene regulatory network (GRN) model and contrast the resulting co-expression structure produced by these networks with their known regulatory architecture and with the co-expression structure measured in available human expression data. Using randomization tests, we found that the levels of co-expression observed in simulated expression data were, just as with empirical data, significantly higher than expected by chance. When examining the source of correlated expression, we found that individual regulators, both in simulated and experimental data, fail, on average, to display correlated expression with their immediate targets. However, highly correlated gene pairs tend to share at least one common regulator, while most gene pairs sharing common regulators do not necessarily display correlated expression. Our results demonstrate that widespread co-expression naturally emerges in regulatory networks, and that it is a reliable and direct indicator of active co-regulation in a given cellular context.
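The randomization test mentioned in this abstract can be illustrated with a small sketch. It is not the authors' procedure: it assumes a simple label-shuffling permutation test on the Pearson correlation of two expression profiles, and the function names and toy expression values are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    Sample labels of y are shuffled to build the null distribution
    (an assumed randomization scheme, chosen for illustration).
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical expression values for two genes across eight samples.
gene_a = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1]
gene_b = [0.9, 2.0, 3.2, 3.9, 5.0, 6.1, 6.8, 8.3]
p_value = permutation_p_value(gene_a, gene_b)
```

A small p-value here only says the observed correlation is unlikely under shuffled sample labels; it says nothing about which regulator, if any, produces it.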
|
5
|
Zhang MQ. A personal journey on cracking the genomic codes. QUANTITATIVE BIOLOGY 2021. [DOI: 10.15302/j-qb-021-0245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
6
|
Green B, Lian H, Yu Y, Zu T. Ultra high-dimensional semiparametric longitudinal data analysis. Biometrics 2020; 77:903-913. [PMID: 32750150 DOI: 10.1111/biom.13348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/20/2019] [Revised: 06/08/2020] [Accepted: 07/21/2020] [Indexed: 11/30/2022]
Abstract
As ultra high-dimensional longitudinal data are becoming ever more prevalent in fields such as public health and bioinformatics, developing flexible methods with a sparse model is of high interest. In this setting, the dimension of the covariates can potentially grow exponentially as $\exp(n^{1/2})$ with respect to the number of clusters n. We consider a flexible semiparametric approach, namely, partially linear single-index models, for ultra high-dimensional longitudinal data. Most importantly, we allow not only the partially linear covariates but also the single-index covariates within the unknown flexible function estimated nonparametrically to be ultra high dimensional. Using penalized generalized estimating equations, this approach can capture correlation within subjects, can perform simultaneous variable selection and estimation with a smoothly clipped absolute deviation penalty, and can capture nonlinearity and potentially some interactions among predictors. We establish asymptotic theory for the estimators including the oracle property in ultra high dimension for both the partially linear and nonparametric components, and we present an efficient algorithm to handle the computational challenges. We show the effectiveness of our method and algorithm via a simulation study and a yeast cell cycle gene expression dataset.
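For readers unfamiliar with the smoothly clipped absolute deviation (SCAD) penalty named in this abstract, the sketch below evaluates the standard piecewise SCAD form. The function name is hypothetical, and the default a = 3.7 is the conventional choice in the SCAD literature; the abstract does not state which value the authors used.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for one coefficient.

    theta: coefficient value, lam: tuning parameter (> 0),
    a: shape parameter (> 2); 3.7 is a conventional default (assumption).
    """
    t = abs(theta)
    if t <= lam:
        # Small coefficients: same as the lasso penalty.
        return lam * t
    if t <= a * lam:
        # Middle range: quadratic blend that tapers the penalty growth.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no extra shrinkage.
    return lam * lam * (a + 1) / 2
```

The three pieces meet continuously at |theta| = lam and |theta| = a * lam, which is why large coefficients are left nearly unbiased while small ones are shrunk toward zero.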
Affiliation(s)
- Brittany Green
- Department of Computer Information Systems, University of Louisville, Louisville, Kentucky
| | - Heng Lian
- Department of Mathematics, City University of Hong Kong, Kowloon Tong, Hong Kong, China
| | - Yan Yu
- Department of Operations, Business Analytics, & Information Systems, University of Cincinnati, Cincinnati, Ohio
| | - Tianhai Zu
- Department of Operations, Business Analytics, & Information Systems, University of Cincinnati, Cincinnati, Ohio
|
7
|
Kreimer A, Yosef N. Evaluation of Davis et al.: Exploring Sequence of Determinants of Transcriptional Regulation-The Case of c-AMP Response Element. Cell Syst 2020; 11:2-4. [PMID: 32702318 DOI: 10.1016/j.cels.2020.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
One snapshot of the peer review process for "Dissection of c-AMP Response Element Architecture by Using Genomic and Episomal Massively Parallel Reporter Assays" (Davis et al., 2020).
Affiliation(s)
- Anat Kreimer
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94158, USA; Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences and Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA; Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard University, Boston, MA, USA.
|
8
|
Detection of cooperatively bound transcription factor pairs using ChIP-seq peak intensities and expectation maximization. PLoS One 2018; 13:e0199771. [PMID: 30016330 PMCID: PMC6049898 DOI: 10.1371/journal.pone.0199771] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/17/2018] [Accepted: 06/13/2018] [Indexed: 11/19/2022] Open
Abstract
Transcription factors (TFs) often work cooperatively, where the binding of one TF to DNA enhances the binding affinity of a second TF to a nearby location. Such cooperative binding is important for activating gene expression from promoters and enhancers in both prokaryotic and eukaryotic cells. Existing methods to detect cooperative binding of a TF pair rely on analyzing the sequence that is bound. We propose a method that uses, instead, only ChIP-seq peak intensities and an expectation maximization (CPI-EM) algorithm. We validate our method using ChIP-seq data from cells where one of a pair of TFs under consideration has been genetically knocked out. Our algorithm relies on our observation that cooperative TF-TF binding is correlated with weak binding of one of the TFs, which we demonstrate in a variety of cell types, including E. coli, S. cerevisiae and M. musculus cells. We show that this method performs significantly better than a predictor based only on the ChIP-seq peak distance of the TFs under consideration. This suggests that peak intensities contain information that can help detect the cooperative binding of a TF pair. CPI-EM also outperforms an existing sequence-based algorithm in detecting cooperative binding. The CPI-EM algorithm is available at https://github.com/vishakad/cpi-em.
|
9
|
Kreimer A, Zeng H, Edwards MD, Guo Y, Tian K, Shin S, Welch R, Wainberg M, Mohan R, Sinnott-Armstrong NA, Li Y, Eraslan G, AMIN TB, Goke J, Mueller NS, Kellis M, Kundaje A, Beer MA, Keles S, Gifford DK, Yosef N. Predicting gene expression in massively parallel reporter assays: A comparative study. Hum Mutat 2017; 38:1240-1250. [PMID: 28220625 PMCID: PMC5560998 DOI: 10.1002/humu.23197] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 08/26/2016] [Revised: 01/19/2017] [Accepted: 02/12/2017] [Indexed: 02/03/2023]
Abstract
In many human diseases, associated genetic changes tend to occur within noncoding regions, whose effect might be related to transcriptional control. A central goal in human genetics is to understand the function of such noncoding regions: given a region that is statistically associated with changes in gene expression (expression quantitative trait locus [eQTL]), does it in fact play a regulatory role? And if so, how is this role "coded" in its sequence? These questions were the subject of the Critical Assessment of Genome Interpretation eQTL challenge. Participants were given a set of sequences that flank eQTLs in humans and were asked to predict whether these are capable of regulating transcription (as evaluated by massively parallel reporter assays), and whether this capability changes between alternative alleles. Here, we report lessons learned from this community effort. By inspecting predictive properties in isolation, and conducting meta-analysis over the competing methods, we find that using chromatin accessibility and transcription factor binding as features in an ensemble of classifiers or regression models leads to the most accurate results. We then characterize the loci that are harder to predict, putting the spotlight on areas of weakness, which we expect to be the subject of future studies.
Affiliation(s)
- Anat Kreimer
- Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Department of Bioengineering and Therapeutic Sciences, Institute for Human Genetics, University of California, San Francisco, San Francisco, California, USA
| | - Haoyang Zeng
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Matthew D. Edwards
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Yuchun Guo
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Kevin Tian
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Sunyoung Shin
- Department of Statistics, Department of Biostatistics and Medical Informatics University of Wisconsin-Madison, Madison, Wisconsin, USA
| | - Rene Welch
- Department of Statistics, Department of Biostatistics and Medical Informatics University of Wisconsin-Madison, Madison, Wisconsin, USA
| | - Michael Wainberg
- Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
| | - Rahul Mohan
- Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
| | - Nicholas A. Sinnott-Armstrong
- Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
| | - Yue Li
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA
| | - Gökcen Eraslan
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1 85764 Neuherberg, Germany
| | - Talal Bin AMIN
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Jonathan Goke
- Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore
| | - Nikola S. Mueller
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstr. 1 85764 Neuherberg, Germany
| | - Manolis Kellis
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, Massachusetts 02139, USA
| | - Anshul Kundaje
- Department of Genetics, Stanford University School of Medicine, Department of Computer Science, Stanford, California 94305, USA
| | - Michael A Beer
- McKusick-Nathans Institute of Genetic Medicine, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sunduz Keles
- Department of Statistics, Department of Biostatistics and Medical Informatics University of Wisconsin-Madison, Madison, Wisconsin, USA
| | - David K. Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Nir Yosef
- Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Cambridge, MA, 02139
|
10
|
Kong FY, Zhu T, Li N, Cai YF, Zhou K, Wei X, Kou YB, You HJ, Zheng KY, Tang RX. Bioinformatics analysis of the proteins interacting with LASP-1 and their association with HBV-related hepatocellular carcinoma. Sci Rep 2017; 7:44017. [PMID: 28266596 PMCID: PMC5339786 DOI: 10.1038/srep44017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/26/2016] [Accepted: 02/02/2017] [Indexed: 12/11/2022] Open
Abstract
LIM and SH3 domain protein (LASP-1) is responsible for the development of several types of human cancers via interaction with other proteins; however, the precise biological functions of proteins interacting with LASP-1 are not fully clarified. Although the role of LASP-1 in hepatocarcinogenesis has been reported, the implication of LASP-1 interactors in HBV-related hepatocellular carcinoma (HCC) has not been clearly evaluated. We obtained information regarding LASP-1 interactors from public databases and published studies. Via bioinformatics analysis, we found that LASP-1 interactors were related to distinct molecular functions and associated with various biological processes. Through an integrated network analysis of the interactions and pathways of LASP-1 interactors, cross-talk between different proteins and associated pathways was found. In addition, microarray analysis showed that LASP-1 and several of its interactors are significantly altered in HBV-related HCC and could form a complex co-expression network. In the disease, LASP-1 and its interactors were further predicted to be regulated by a complex interaction network composed of different transcription factors. Moreover, numerous LASP-1 interactors were associated with various clinical factors and related to the survival and recurrence of HBV-related HCC. Taken together, these results could help enrich our understanding of LASP-1 interactors and their relationships with HBV-related HCC.
Affiliation(s)
- Fan-Yun Kong
- Jiangsu Key Laboratory of Brain Disease Bioinformation, Xuzhou Medical University, Xuzhou, Jiangsu, China; Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Ting Zhu
- Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Nan Li
- Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Yun-Fei Cai
- Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Kai Zhou
- Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Xiao Wei
- Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Yan-Bo Kou
- Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Hong-Juan You
- Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Kui-Yang Zheng
- Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Ren-Xian Tang
- Jiangsu Key Laboratory of Brain Disease Bioinformation, Xuzhou Medical University, Xuzhou, Jiangsu, China; Department of Pathogenic Biology and Immunology, Jiangsu Key Laboratory of Immunity and Metabolism, Xuzhou Medical University, Xuzhou, Jiangsu, China
|
11
|
Sikdar S, Datta S. A novel statistical approach for identification of the master regulator transcription factor. BMC Bioinformatics 2017; 18:79. [PMID: 28148240 PMCID: PMC5288875 DOI: 10.1186/s12859-017-1499-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 08/31/2016] [Accepted: 01/27/2017] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Transcription factors are known to play key roles in carcinogenesis and are therefore gaining popularity as potential therapeutic targets in drug development. A 'master regulator' transcription factor often appears to control most of the regulatory activities of the other transcription factors and the associated genes. This 'master regulator' transcription factor is at the top of the hierarchy of the transcriptomic regulation. Therefore, it is important to identify and target the master regulator transcription factor for proper understanding of the associated disease process and identifying the best therapeutic option. METHODS We present a novel two-step computational approach for identification of a master regulator transcription factor in a genome. At the first step of our method we test whether there exists any master regulator transcription factor in the system. We evaluate the concordance of two ranked lists of transcription factors using a statistical measure. If the concordance measure is statistically significant, we conclude that there is a master regulator. At the second step, our method identifies the master regulator transcription factor, if one exists. RESULTS In the simulation scenario, our method performs reasonably well in validating the existence of a master regulator when the number of subjects in each treatment group is reasonably large. In application to two real datasets, our method confirms the existence of master regulators and identifies biologically meaningful master regulators. R code implementing our method on sample test data can be found at http://www.somnathdatta.org/software. CONCLUSION We have developed a screening method for identifying the 'master regulator' transcription factor using only gene expression data. Understanding the regulatory structure and finding the master regulator help narrow the search space for identifying biomarkers for complex diseases such as cancer. In addition to identifying the master regulator, our method provides an overview of the regulatory structure of the transcription factors which control the global gene expression profiles and consequently the cell functioning.
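The abstract does not name the concordance measure applied to the two ranked lists of transcription factors. As an illustration only, the sketch below swaps in Spearman's rank correlation for two tie-free rankings of the same items; the function name and the TF labels are hypothetical and this is not the authors' statistic.

```python
def spearman_from_rankings(ranking_a, ranking_b):
    """Spearman correlation between two orderings of the same unique items.

    Each argument lists the items from top-ranked to bottom-ranked.
    Assumes no ties and no repeated items (illustrative simplification).
    """
    if sorted(ranking_a) != sorted(ranking_b):
        raise ValueError("both rankings must contain the same items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items to compare rankings")
    position_b = {item: i for i, item in enumerate(ranking_b)}
    # Sum of squared rank differences over all items.
    d_squared = sum((i - position_b[item]) ** 2
                    for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical TF rankings produced by two different criteria.
list_a = ["TF1", "TF2", "TF3", "TF4", "TF5"]
list_b = ["TF1", "TF3", "TF2", "TF4", "TF5"]
rho = spearman_from_rankings(list_a, list_b)
```

Judging whether such a value is statistically significant would need a null distribution, for example from permuting one of the lists, which is left out here.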
Affiliation(s)
- Sinjini Sikdar
- Department of Biostatistics, University of Florida, Gainesville, FL, 32611, USA
| | - Susmita Datta
- Department of Biostatistics, University of Florida, Gainesville, FL, 32611, USA.
|
12
|
Zhang K, Li N, Ainsworth RI, Wang W. Systematic identification of protein combinations mediating chromatin looping. Nat Commun 2016; 7:12249. [PMID: 27461729 PMCID: PMC4974460 DOI: 10.1038/ncomms12249] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/17/2016] [Accepted: 06/15/2016] [Indexed: 12/25/2022] Open
Abstract
Chromatin looping plays a pivotal role in gene expression and other biological processes through bringing distal regulatory elements into spatial proximity. The formation of chromatin loops is mainly mediated by DNA-binding proteins (DBPs) that bind to the interacting sites and form complexes in three-dimensional (3D) space. Previously, identification of DBP cooperation has been limited to those binding to neighbouring regions in the proximal linear genome (1D cooperation). Here we present the first study that integrates protein ChIP-seq and Hi-C data to systematically identify both the 1D- and 3D-cooperation between DBPs. We develop a new network model that allows identification of cooperation between multiple DBPs and reveals cell-type-specific and -independent regulations. Using this framework, we retrieve many known and previously unknown 3D-cooperations between DBPs in chromosomal loops that may be a key factor in influencing the 3D organization of chromatin.
Affiliation(s)
- Kai Zhang
- Graduate Program in Bioinformatics and Systems Biology, University of California, La Jolla, San Diego, California 92093-0359, USA
| | - Nan Li
- Department of Chemistry and Biochemistry, University of California, La Jolla, San Diego, California 92093-0359, USA
- Department of Cellular and Molecular Medicine, University of California, La Jolla, San Diego, California 92093-0359, USA
| | - Richard I. Ainsworth
- Department of Chemistry and Biochemistry, University of California, La Jolla, San Diego, California 92093-0359, USA
- Department of Cellular and Molecular Medicine, University of California, La Jolla, San Diego, California 92093-0359, USA
| | - Wei Wang
- Graduate Program in Bioinformatics and Systems Biology, University of California, La Jolla, San Diego, California 92093-0359, USA
- Department of Chemistry and Biochemistry, University of California, La Jolla, San Diego, California 92093-0359, USA
- Department of Cellular and Molecular Medicine, University of California, La Jolla, San Diego, California 92093-0359, USA
|
13
|
Siwo G, Rider A, Tan A, Pinapati R, Emrich S, Chawla N, Ferdig M. Prediction of fine-tuned promoter activity from DNA sequence. F1000Res 2016; 5:158. [PMID: 27347373 PMCID: PMC4916984 DOI: 10.12688/f1000research.7485.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Accepted: 02/08/2016] [Indexed: 12/16/2022] Open
Abstract
The quantitative prediction of transcriptional activity of genes using promoter sequence is fundamental to the engineering of biological systems for industrial purposes and understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties and spatial positions of promoter sequences, we achieved the best performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. Our method accurately predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) as compared to 33 laboratory-mutated variants of the promoters (r = 0.57) in a test set that was hidden from participants. Notably, our model differed substantially from the rest in 2 main ways: i) it did not explicitly utilize transcription factor binding information implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was entirely based on features extracted exclusively from the 100 bp region upstream from the translational start site demonstrating that this region encodes much of the overall promoter activity. 
The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring biological systems.
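The k-mer frequency features described in this abstract (single-nucleotide, trinucleotide, and tetranucleotide frequencies) can be computed along the following lines. This is a minimal sketch, not the authors' feature pipeline: the function name and the toy sequence are hypothetical, and overlapping windows are an assumed counting convention.

```python
from collections import Counter


def kmer_frequencies(sequence, k):
    """Relative frequency of each overlapping k-mer in a DNA sequence."""
    sequence = sequence.upper()
    n_windows = len(sequence) - k + 1
    if k < 1 or n_windows < 1:
        raise ValueError("k must be between 1 and the sequence length")
    counts = Counter(sequence[i:i + k] for i in range(n_windows))
    return {kmer: count / n_windows for kmer, count in counts.items()}


# Toy promoter fragment (hypothetical, far shorter than a 100 bp region).
promoter = "ATGGTA"
g_frequency = kmer_frequencies(promoter, 1)["G"]   # frequency of nucleotide G
trinucleotides = kmer_frequencies(promoter, 3)     # trinucleotide features
```

Setting k = 1 recovers single-nucleotide frequencies such as that of G, while k = 3 and k = 4 give the trinucleotide and tetranucleotide features.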
Affiliation(s)
- Geoffrey Siwo
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA; Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA; IBM TJ Watson Research Center, NY, USA; IBM Research-Africa, Johannesburg, South Africa
- Andrew Rider
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA; Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Asako Tan
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA; Epicentre, Madison, WI, USA
- Richard Pinapati
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA; Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Scott Emrich
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA; Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Nitesh Chawla
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA; Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
- Michael Ferdig
- Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA; Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA; Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA
|
14
|
Abstract
Transcriptional control of gene expression requires interactions between the cis-regulatory elements (CREs) controlling gene promoters. We developed a sensitive computational method to identify CRE combinations with conserved spacing that does not require genome alignments. When applied to seven sensu stricto and sensu lato Saccharomyces species, 80% of the predicted interactions displayed some evidence of combinatorial transcriptional behavior in several existing datasets including: (1) chromatin immunoprecipitation data for colocalization of transcription factors, (2) gene expression data for coexpression of predicted regulatory targets, and (3) gene ontology databases for common pathway membership of predicted regulatory targets. We tested several predicted CRE interactions with chromatin immunoprecipitation experiments in a wild-type strain and strains in which a predicted cofactor was deleted. Our experiments confirmed that transcription factor (TF) occupancy at the promoters of the CRE combination target genes depends on the predicted cofactor while occupancy of other promoters is independent of the predicted cofactor. Our method has the additional advantage of identifying regulatory differences between species. By analyzing the S. cerevisiae and S. bayanus genomes, we identified differences in combinatorial cis-regulation between the species and showed that the predicted changes in gene regulation explain several of the species-specific differences seen in gene expression datasets. In some instances, the same CRE combinations appear to regulate genes involved in distinct biological processes in the two different species. The results of this research demonstrate that (1) combinatorial cis-regulation can be inferred by multi-genome analysis and (2) combinatorial cis-regulation can explain differences in gene expression between species.
|
15
|
Li H, Chen D, Zhang J. Statistical analysis of combinatorial transcriptional regulatory motifs in human intron-containing promoter sequences. Comput Biol Chem 2013; 43:35-45. [DOI: 10.1016/j.compbiolchem.2012.12.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/23/2012] [Revised: 12/19/2012] [Accepted: 12/23/2012] [Indexed: 11/16/2022]
|
16
|
Wen J, Chen Z, Cai X. A biophysical model for identifying splicing regulatory elements and their interactions. PLoS One 2013; 8:e54885. [PMID: 23382993 PMCID: PMC3559881 DOI: 10.1371/journal.pone.0054885] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 10/18/2012] [Accepted: 12/17/2012] [Indexed: 11/18/2022] Open
Abstract
Alternative splicing (AS) of precursor mRNA (pre-mRNA) is a crucial step in the expression of most eukaryotic genes. Splicing factors (SFs) play an important role in AS regulation by binding to the cis-regulatory elements on the pre-mRNA. Although many SFs and their binding sites have been identified, their combinatorial regulatory effects remain to be elucidated. In this paper, we derive a biophysical model for AS regulation that integrates combinatorial signals of cis-acting splicing regulatory elements (SREs) and their interactions. We also develop a systematic framework for model inference. Applying the biophysical model to a human RNA-Seq data set, we demonstrate that our model can explain 49.1%–66.5% of the variance in the data, which is comparable to the best result achieved by biophysical models for transcription. In total, we identified 119 SRE pairs between different regions of cassette exons that may regulate exon or intron definition in splicing, and 77 SRE pairs from the same region that may arise from a long motif or two different SREs bound by different SFs. In particular, putative binding sites of polypyrimidine tract-binding protein (PTB), heterogeneous nuclear ribonucleoprotein (hnRNP) F/H and E/K are identified as interacting SRE pairs, and have been shown to be consistent with the interaction models proposed in previous experimental results. These results show that our biophysical model and inference method provide a means of quantitative modeling of splicing regulation and are a useful tool for identifying SREs and their interactions. The software package for model inference is available under an open source license.
Affiliation(s)
- Ji Wen
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida, United States of America
- Zhibin Chen
- Department of Microbiology and Immunology, University of Miami, Miami, Florida, United States of America
- Xiaodong Cai
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida, United States of America
|
17
|
Bayarsaihan D, Makeyev AV, Enkhmandakh B. Epigenetic modulation by TFII-I during embryonic stem cell differentiation. J Cell Biochem 2013; 113:3056-60. [PMID: 22628223 DOI: 10.1002/jcb.24202] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/03/2023]
Abstract
TFII-I transcription factors play an essential role during early vertebrate embryogenesis. Genome-wide mapping studies by ChIP-seq and ChIP-chip revealed that TFII-I primes multiple genomic loci in mouse embryonic stem cells and embryonic tissues. Moreover, many TFII-I-bound regions co-localize with H3K4me3/K27me3 bivalent chromatin within the promoters of lineage-specific genes. This minireview provides a summary of current knowledge regarding the function of TFII-I in epigenetic control of stem cell differentiation.
Affiliation(s)
- Dashzeveg Bayarsaihan
- Center for Regenerative Medicine and Skeletal Development, Department of Reconstructive Sciences, School of Dentistry, University of Connecticut, Farmington, CT 06030, USA.
|
18
|
Geeven G, van der Laan MJ, de Gunst MCM. Comparison of targeted maximum likelihood and shrinkage estimators of parameters in gene networks. Stat Appl Genet Mol Biol 2012; 11:Article 2. [PMID: 23023699 DOI: 10.1515/1544-6115.1728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Gene regulatory networks, in which edges between nodes describe interactions between transcription factors (TFs) and their target genes, model regulatory interactions that determine the cell-type and condition-specific expression of genes. Regression methods can be used to identify TF-target gene interactions from gene expression and DNA sequence data. The response variable, i.e. observed gene expression, is modeled as a function of many predictor variables simultaneously. In practice, it is generally not possible to select a single model that clearly achieves the best fit to the observed experimental data and the selected models typically contain overlapping sets of predictor variables. Moreover, parameters that represent the marginal effect of the individual predictors are not always present. In this paper, we use the statistical framework of estimation of variable importance to define variable importance as a parameter of interest and study two different estimators of this parameter in the context of gene regulatory networks. On yeast data we show that the resulting parameter has a biologically appealing interpretation. We apply the proposed methodology to mammalian gene expression data to gain insight into the temporal activity of TFs that underlie gene expression changes in F11 cells in response to Forskolin stimulation.
|
19
|
Martin D, Allagnat F, Gesina E, Caille D, Gjinovci A, Waeber G, Meda P, Haefliger JA. Specific silencing of the REST target genes in insulin-secreting cells uncovers their participation in beta cell survival. PLoS One 2012; 7:e45844. [PMID: 23029270 PMCID: PMC3447792 DOI: 10.1371/journal.pone.0045844] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 03/19/2012] [Accepted: 08/24/2012] [Indexed: 12/22/2022] Open
Abstract
The absence of the transcriptional repressor RE-1 Silencing Transcription Factor (REST) in insulin-secreting beta cells is a major cue for the specific expression of a large number of genes. These REST target genes were largely ascribed to a function of neurotransmission in a neuronal context, whereas their role in pancreatic beta cells has been poorly explored. To identify their functional significance, we have generated transgenic mice expressing REST in beta cells (RIP-REST mice), and previously discovered that REST target genes are essential to insulin exocytosis. Herein we characterized a novel line of RIP-REST mice featuring diabetes. In diabetic RIP-REST mice, high levels of REST were associated with postnatal beta cell apoptosis, which resulted in gradual beta cell loss and sustained hyperglycemia in adults. Moreover, adenoviral REST transduction in INS-1E cells led to increased cell death under control conditions, and sensitized cells to death induced by cytokines. Screening for REST target genes identified several anti-apoptotic genes bearing the binding motif RE-1 that were downregulated upon REST expression in INS-1E cells, including Gjd2, Mapk8ip1, Irs2, Ptprn, and Cdk5r2. Decreased levels of Cdk5r2 in beta cells of RIP-REST mice further confirmed that it is controlled by REST, in vivo. Using siRNA-mediated knock-down in INS-1E cells, we showed that Cdk5r2 protects beta cells against cytokines and palmitate-induced apoptosis. Together, these data document that a set of REST target genes, including Cdk5r2, is important for beta cell survival.
Affiliation(s)
- David Martin
- Service of Internal Medicine, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Florent Allagnat
- Service of Internal Medicine, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Emilie Gesina
- Ecole Polytechnique Fédérale de Lausanne, Faculté des Sciences de la Vie, Lausanne, Switzerland
- Dorothee Caille
- Department of Cell Physiology and Metabolism, University Medical Center, Geneva, Switzerland
- Asllan Gjinovci
- Department of Cell Physiology and Metabolism, University Medical Center, Geneva, Switzerland
- Gerard Waeber
- Service of Internal Medicine, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- Paolo Meda
- Department of Cell Physiology and Metabolism, University Medical Center, Geneva, Switzerland
|
20
|
McLeay RC, Lesluyes T, Cuellar Partida G, Bailey TL. Genome-wide in silico prediction of gene expression. Bioinformatics 2012; 28:2789-96. [PMID: 22954627 DOI: 10.1093/bioinformatics/bts529] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION Modelling the regulation of gene expression can provide insight into the regulatory roles of individual transcription factors (TFs) and histone modifications. Recently, Ouyang et al. in 2009 modelled gene expression levels in mouse embryonic stem (mES) cells using in vivo ChIP-seq measurements of TF binding. ChIP-seq TF binding data, however, are tissue-specific and relatively difficult to obtain. This limits the applicability of gene expression models that rely on ChIP-seq TF binding data. RESULTS In this study, we build regression-based models that relate gene expression to the binding of 12 different TFs, 7 histone modifications and chromatin accessibility (DNase I hypersensitivity) in two different tissues. We find that expression models based on computationally predicted TF binding can achieve similar accuracy to those using in vivo TF binding data and that including binding at weak sites is critical for accurate prediction of gene expression. We also find that incorporating histone modification and chromatin accessibility data results in additional accuracy. Surprisingly, we find that models that use no TF binding data at all, but only histone modification and chromatin accessibility data, can be as (or more) accurate than those based on in vivo TF binding data. AVAILABILITY AND IMPLEMENTATION All scripts, motifs and data presented in this article are available online at http://research.imb.uq.edu.au/t.bailey/supplementary_data/McLeay2011a.
Affiliation(s)
- Robert C McLeay
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
|
21
|
Dümcke S, Seizl M, Etzold S, Pirkl N, Martin DE, Cramer P, Tresch A. One Hand Clapping: detection of condition-specific transcription factor interactions from genome-wide gene activity data. Nucleic Acids Res 2012; 40:8883-92. [PMID: 22844089 PMCID: PMC3467085 DOI: 10.1093/nar/gks695] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022] Open
Abstract
We present One Hand Clapping (OHC), a method for the detection of condition-specific interactions between transcription factors (TFs) from genome-wide gene activity measurements. OHC is based on a mapping between transcription factors and their target genes. Given a single case–control experiment, it uses a linear regression model to assess whether the common targets of two arbitrary TFs behave differently than expected from the genes targeted by only one of the TFs. When applied to osmotic stress data in S. cerevisiae, OHC produces consistent results across three types of expression measurements: gene expression microarray data, RNA Polymerase II ChIP-chip binding data and messenger RNA synthesis rates. Among the eight novel, condition-specific TF pairs, we validate the interaction between Gcn4p and Arr1p experimentally. We apply OHC to a large gene activity dataset in S. cerevisiae and provide a compendium of condition-specific TF interactions.
Affiliation(s)
- Sebastian Dümcke
- Gene Center Munich and Department of Biochemistry, Ludwig-Maximilians-Universität München, Feodor-Lynen-Straße 25, 81377 Munich, Germany
|
22
|
A self-organized model for cell-differentiation based on variations of molecular decay rates. PLoS One 2012; 7:e36679. [PMID: 22693554 PMCID: PMC3365067 DOI: 10.1371/journal.pone.0036679] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/22/2011] [Accepted: 04/11/2012] [Indexed: 11/19/2022] Open
Abstract
Systemic properties of living cells are the result of molecular dynamics governed by so-called genetic regulatory networks (GRN). These networks capture all possible features of cells and are responsible for the immense levels of adaptation characteristic of living systems. At any point in time only small subsets of these networks are active. Any active subset of the GRN leads to the expression of particular sets of molecules (expression modes). The subsets of active networks change over time, leading to the observed complex dynamics of expression patterns. Understanding these dynamics is becoming increasingly important in systems biology and medicine. While the importance of transcription rates and catalytic interactions has been widely recognized in modeling genetic regulatory systems, the understanding of the role of degradation of biochemical agents (mRNA, protein) in regulatory dynamics remains limited. Recent experimental data suggest that there exists a functional relation between mRNA and protein decay rates and expression modes. In this paper we propose a model for the dynamics of successions of sequences of active subnetworks of the GRN. The model is able to reproduce key characteristics of molecular dynamics, including homeostasis, multi-stability, periodic dynamics, alternating activity, differentiability, and self-organized critical dynamics. Moreover, the model offers a natural explanation of the mechanism behind the relation between decay rates and expression modes. The model explains recent experimental observations that decay rates (or turnovers) vary between differentiated tissue classes at a general systemic level and highlights the role of intracellular decay rate control mechanisms in cell differentiation.
|
23
|
Chiang S, Swamy KB, Hsu TW, Tsai ZTY, Lu HHS, Wang D, Tsai HK. Analysis of the association between transcription factor binding site variants and distinct accompanying regulatory motifs in yeast. Gene 2012; 491:237-45. [DOI: 10.1016/j.gene.2011.08.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2011] [Accepted: 08/25/2011] [Indexed: 11/25/2022] Open
|
24
|
Geeven G, van Kesteren RE, Smit AB, de Gunst MCM. Identification of context-specific gene regulatory networks with GEMULA--gene expression modeling using LAsso. Bioinformatics 2011; 28:214-21. [PMID: 22106333 DOI: 10.1093/bioinformatics/btr641] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 01/05/2023]
Abstract
MOTIVATION Gene regulatory networks, in which edges between nodes describe interactions between transcriptional regulators and their target genes, determine the coordinated spatiotemporal expression of genes. Especially in higher organisms, context-specific combinatorial regulation by transcription factors (TFs) is believed to determine cellular states and fates. TF-target gene interactions can be studied using high-throughput techniques such as ChIP-chip or ChIP-Seq. These experiments are time and cost intensive, and further limited by, for instance, availability of high affinity TF antibodies. Hence, there is a practical need for methods that can predict TF-TF and TF-target gene interactions in silico, i.e. from gene expression and DNA sequence data alone. We propose GEMULA, a novel approach based on linear models to predict TF-gene expression associations and TF-TF interactions from experimental data. GEMULA is fast and considers a wide range of biologically plausible models that describe gene expression data as a function of predicted TF binding to gene promoters. RESULTS We show that models inferred with GEMULA are able to explain roughly 70% of the observed variation in gene expression in the yeast heat shock response. The functional relevance of the inferred TF-TF interactions in these models is validated by different sources of independent experimental evidence. We have also applied GEMULA to an in vitro model of neuronal outgrowth. Our findings confirm existing knowledge on gene regulatory interactions underlying neuronal outgrowth, but importantly also generate new insights into the temporal dynamics of this gene regulatory network that can now be addressed experimentally. AVAILABILITY The GEMULA R-package is available from http://www.few.vu.nl/~degunst/gemula_1.0.tar.gz.
Affiliation(s)
- Geert Geeven
- Department of Mathematics, Faculty of Sciences, Neuroscience Campus Amsterdam, VU University, De Boelelaan 1085, 1081 HV Amsterdam, The Netherlands.
|
25
|
Moyle-Heyrman G, Tims HS, Widom J. Structural constraints in collaborative competition of transcription factors against the nucleosome. J Mol Biol 2011; 412:634-46. [PMID: 21821044 DOI: 10.1016/j.jmb.2011.07.032] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 03/25/2011] [Revised: 07/03/2011] [Accepted: 07/16/2011] [Indexed: 01/13/2023]
Abstract
Cooperativity in transcription factor (TF) binding is essential in eukaryotic gene regulation and arises through diverse mechanisms. Here, we focus on one mechanism, collaborative competition, which is of interest because it arises both automatically (with no requirement for TF coevolution) and spontaneously (with no requirement for ATP-dependent nucleosome remodeling factors). Previous experimental studies of collaborative competition analyzed cases in which target sites for pairs of cooperating TFs were contained within the same side of the nucleosome. Here, we utilize new assays to measure cooperativity in protein binding to pairs of nucleosomal DNA target sites. We focus on the cases that are of greatest in vivo relevance, in which one binding site is located close to the end of a nucleosome and the other binding site is located at diverse positions throughout the nucleosome. Our results reveal energetically significant positive (favorable) cooperativity for pairs of sites on the same side of the nucleosome but, for the cases examined, energetically insignificant cooperativity between sites on opposite sides of the nucleosome. These findings imply a special significance for TF binding sites that are spaced within one-half nucleosome length (74 bp) or less along the genome and may prove useful for prediction of cooperatively acting TFs genome wide.
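The closing suggestion of this abstract, that TF binding sites spaced within one-half nucleosome length (74 bp) may help predict cooperatively acting TFs, can be sketched as a simple spacing filter. This is a hypothetical illustration only: the function name, the input format and the TF labels are assumptions, and the sketch ignores nucleosome positioning, which the study itself treats as essential.

```python
from itertools import combinations

# Half-nucleosome spacing quoted in the abstract (74 bp).
HALF_NUCLEOSOME_BP = 74


def nearby_site_pairs(sites, max_spacing=HALF_NUCLEOSOME_BP):
    """Return (tf_a, tf_b, spacing) for sites of different TFs within max_spacing bp.

    `sites` is a list of (tf_name, position_bp) tuples on one sequence.
    """
    ordered = sorted(sites, key=lambda site: site[1])
    pairs = []
    for (tf_a, pos_a), (tf_b, pos_b) in combinations(ordered, 2):
        spacing = pos_b - pos_a
        if tf_a != tf_b and spacing <= max_spacing:
            pairs.append((tf_a, tf_b, spacing))
    return pairs


# Made-up site positions, used only to exercise the function.
candidate_pairs = nearby_site_pairs(
    [("TF_A", 100), ("TF_B", 150), ("TF_C", 174), ("TF_D", 400)]
)
```

Pairs returned by such a filter would only be candidates for cooperativity; the abstract reports measured cooperativity for sites on the same side of a nucleosome, which a spacing threshold alone cannot confirm.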
|
26
|
Shiraishi Y, Okada-Hatakeyama M, Miyano S. A rank-based statistical test for measuring synergistic effects between two gene sets. Bioinformatics 2011; 27:2399-405. [PMID: 21700673 DOI: 10.1093/bioinformatics/btr382] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION Due to recent advances in high-throughput technologies, data on various types of genomic annotation have accumulated. These data will be crucially helpful for elucidating the combinatorial logic of transcription. Although several approaches have been proposed for inferring cooperativity among multiple factors, most approaches are haunted by the issues of normalization and threshold values. RESULTS In this article, we propose a rank-based non-parametric statistical test for measuring the effects between two gene sets. This method is free from the issues of normalization and threshold value determination for gene expression values. Furthermore, we have proposed an efficient Markov chain Monte Carlo method for calculating an approximate significance value of synergy. We have applied this approach for detecting synergistic combinations of transcription factor binding motifs and histone modifications. AVAILABILITY C implementation of the method is available from http://www.hgc.jp/~yshira/software/rankSynergy.zip. CONTACT yshira@hgc.jp SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuichi Shiraishi
- Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan.
|
27
|
Irie T, Park SJ, Yamashita R, Seki M, Yada T, Sugano S, Nakai K, Suzuki Y. Predicting promoter activities of primary human DNA sequences. Nucleic Acids Res 2011; 39:e75. [PMID: 21486745 PMCID: PMC3113590 DOI: 10.1093/nar/gkr173] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/17/2022] Open
Abstract
We developed a computer program that can predict the intrinsic promoter activities of primary human DNA sequences. We observed promoter activity using a quantitative luciferase assay and generated a prediction model using multiple linear regression. Our program achieved a prediction accuracy correlation coefficient of 0.87 between the predicted and observed promoter activities. We evaluated the prediction accuracy of the program using massive sequencing analysis of transcriptional start sites in vivo. We found that it is still difficult to predict transcript levels in a strictly quantitative manner in vivo; however, it was possible to select active promoters in a given cell from the other silent promoters. Using this program, we analyzed the transcriptional landscape of the entire human genome. We demonstrate that many human genomic regions have potential promoter activity, and the expression of some previously uncharacterized putatively non-protein-coding transcripts can be explained by our prediction model. Furthermore, we found that nucleosomes occasionally formed open chromatin structures with RNA polymerase II recruitment where the program predicted significant promoter activities, although no transcripts were observed.
Affiliation(s)
- Takuma Irie
- Department of Medical Genome Sciences, Graduate School of Frontier Sciences, the University of Tokyo, 5-1-5 Kashiwanoha, Kashiwashi, Chiba 277-8562, Japan
|
28
|
Bickel PJ, Boley N, Brown JB, Huang H, Zhang NR. Subsampling methods for genomic inference. Ann Appl Stat 2010. [DOI: 10.1214/10-aoas363] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Indexed: 11/19/2022]
|
29
|
Cromer D, Stark J, Christophides G. Hidden variable analysis of transcription factor cooperativity from microarray time courses. IET Syst Biol 2010; 4:131-44. [DOI: 10.1049/iet-syb.2009.0012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022] Open
|
30
|
ModuleMaster: A new tool to decipher transcriptional regulatory networks. Biosystems 2010; 99:79-81. [DOI: 10.1016/j.biosystems.2009.09.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 05/08/2009] [Revised: 08/19/2009] [Accepted: 09/30/2009] [Indexed: 01/30/2023]
|
31
|
ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc Natl Acad Sci U S A 2009; 106:21521-6. [PMID: 19995984 DOI: 10.1073/pnas.0904863106] [Citation(s) in RCA: 246] [Impact Index Per Article: 15.4] [Indexed: 01/07/2023] Open
Abstract
Next-generation sequencing has greatly increased the scope and the resolution of transcriptional regulation study. RNA sequencing (RNA-Seq) and ChIP-Seq experiments are now generating comprehensive data on transcript abundance and on regulator-DNA interactions. We propose an approach for an integrated analysis of these data based on feature extraction of ChIP-Seq signals, principal component analysis, and regression-based component selection. Compared with traditional methods, our approach not only offers higher power in predicting gene expression from ChIP-Seq data but also provides a way to capture cooperation among regulators. In mouse embryonic stem cells (ESCs), we find that a remarkably high proportion of variation in gene expression (65%) can be explained by the binding signals of 12 transcription factors (TFs). Two groups of TFs are identified. Whereas the first group (E2f1, Myc, Mycn, and Zfx) act as activators in general, the second group (Oct4, Nanog, Sox2, Smad1, Stat3, Tcfcp2l1, and Esrrb) may serve as either activator or repressor depending on the target. The two groups of TFs cooperate tightly to activate genes that are differentially up-regulated in ESCs. In the absence of binding by the first group, the binding of the second group is associated with genes that are repressed in ESCs and derepressed upon early differentiation.
|
32
|
Chuang CL, Hung K, Chen CM, Shieh GS. Uncovering transcriptional interactions via an adaptive fuzzy logic approach. BMC Bioinformatics 2009; 10:400. [PMID: 19961622 PMCID: PMC2797023 DOI: 10.1186/1471-2105-10-400] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 07/18/2009] [Accepted: 12/06/2009] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND To date, only a limited number of transcriptional regulatory interactions have been uncovered. In a pilot study integrating sequence data with microarray data, a position weight matrix (PWM) performed poorly in inferring transcriptional interactions (TIs), which represent physical interactions between transcription factors (TF) and upstream sequences of target genes. Inferring a TI means that the promoter sequence of a target is inferred to match the consensus sequence motifs of a potential TF, and their interaction type such as AT or RT is also predicted. Thus, a robust PWM (rPWM) was developed to search for consensus sequence motifs. In addition to rPWM, one feature extracted from ChIP-chip data was incorporated to identify potential TIs under specific conditions. An interaction type classifier was assembled to predict activation/repression of potential TIs using microarray data. This approach, combining an adaptive (learning) fuzzy inference system and an interaction type classifier to predict transcriptional regulatory networks, was named AdaFuzzy. RESULTS AdaFuzzy was applied to predict TIs using real genomics data from Saccharomyces cerevisiae. Following one of the latest advances in predicting TIs, constrained probabilistic sparse matrix factorization (cPSMF), and using 19 transcription factors (TFs), we compared AdaFuzzy to four well-known approaches using over-representation analysis and gene set enrichment analysis. AdaFuzzy outperformed these four algorithms. Furthermore, AdaFuzzy was shown to perform comparably to 'ChIP-experimental method' in inferring TIs identified by two sets of large scale ChIP-chip data, respectively. AdaFuzzy was also able to classify all predicted TIs into one or more of the four promoter architectures. The results coincided with known promoter architectures in yeast and provided insights into transcriptional regulatory mechanisms. 
CONCLUSION AdaFuzzy successfully integrates multiple types of data (sequence, ChIP, and microarray) to predict transcriptional regulatory networks. The validated success in the prediction results implies that AdaFuzzy can be applied to uncover TIs in yeast.
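As a rough illustration of the motif-matching step described above, the following sketch scores a promoter against a plain log-odds position weight matrix. It is not the rPWM or the AdaFuzzy classifier from the paper; the function names, the toy binding sites, the pseudocount, and the uniform background are all assumptions made for the example.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((c / total) / background)
                    for b, c in counts.items()})
    return pwm

def best_match(pwm, promoter):
    """Slide the PWM along the promoter and return (best score, offset)."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for i in range(len(promoter) - width + 1):
        score = sum(pwm[j][promoter[i + j]] for j in range(width))
        best = max(best, (score, i))
    return best

# Toy aligned sites (hypothetical, not taken from the paper).
sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
pwm = build_pwm(sites)
score, offset = best_match(pwm, "AAGTGACTCAGG")
print(f"best window starts at {offset} with log-odds score {score:.2f}")
```

A promoter would be called a candidate target when its best score exceeds some threshold; choosing that threshold is exactly where a plain PWM tends to be fragile, which is the weakness the abstract reports.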
Affiliation(s)
- Cheng-Long Chuang
- Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Kenneth Hung
- Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
| | - Chung-Ming Chen
- Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
| | - Grace S Shieh
- Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
|
33
|
Rhee JK, Joung JG, Chang JH, Fei Z, Zhang BT. Identification of cell cycle-related regulatory motifs using a kernel canonical correlation analysis. BMC Genomics 2009; 10 Suppl 3:S29. [PMID: 19958493 PMCID: PMC2788382 DOI: 10.1186/1471-2164-10-s3-s29] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Gene regulation is a key mechanism in higher eukaryotic cellular processes. One of the major challenges in gene regulation studies is to identify regulators affecting the expression of their target genes in specific biological processes. Despite their importance, regulators involved in diverse biological processes still remain largely unrevealed. In the present study, we propose a kernel-based approach to efficiently identify core regulatory elements involved in specific biological processes using gene expression profiles. RESULTS We developed a framework that can detect correlations between gene expression profiles and the upstream sequences on the basis of the kernel canonical correlation analysis (kernel CCA). Using a yeast cell cycle dataset, we demonstrated that upstream sequence patterns were closely related to gene expression profiles based on the canonical correlation scores obtained by measuring the correlation between them. Our results showed that the cell cycle-specific regulatory motifs could be found successfully based on the motif weights derived through kernel CCA. Furthermore, we identified co-regulatory motif pairs using the same framework. CONCLUSION Given expression profiles, our method was able to identify regulatory motifs involved in specific biological processes. The method could be applied to the elucidation of the unknown regulatory mechanisms associated with complex gene regulatory processes.
Affiliation(s)
- Je-Keun Rhee
- Graduate Program in Bioinformatics, Seoul National University, Seoul 151-744, Korea
- Center for Biointelligence Technology (CBIT), Seoul National University, Seoul 151-744, Korea
| | - Je-Gun Joung
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY 14853, USA
| | | | - Zhangjun Fei
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY 14853, USA
- USDA Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
| | - Byoung-Tak Zhang
- Graduate Program in Bioinformatics, Seoul National University, Seoul 151-744, Korea
- Center for Biointelligence Technology (CBIT), Seoul National University, Seoul 151-744, Korea
- School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Korea
|
34
|
Cheng H, Jiang L, Wu M, Liu Q. Inferring Transcriptional Interactions by the Optimal Integration of ChIP-chip and Knock-out Data. Bioinform Biol Insights 2009; 3:129-40. [PMID: 20140075 PMCID: PMC2808186 DOI: 10.4137/bbi.s3445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022] Open
Abstract
Combining heterogeneous data sources for reliable prediction of transcriptional regulation is a challenge. Here we present an easy but powerful method to integrate chromatin immunoprecipitation (ChIP)-chip and knock-out data. Since these two types of data provide complementary (physical and functional) information about transcription, a method combining them is expected to achieve high detection rates and very low false positive rates. We seek the optimal integration of these two data types using the hypergeometric distribution. We evaluate our method on yeast data and compare our predictions with YEASTRACT, high-quality ChIP-chip data, and the literature. The results show that even with low-quality ChIP-chip data, our method uncovers more relations than those previously inferred from high-quality data. Furthermore, our method achieves a low false positive rate. We find experimental and computational evidence in the literature for most transcription factor (TF)-gene relations uncovered by our method.
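The abstract does not spell out how the hypergeometric distribution is applied, so the sketch below shows only one common reading: testing whether the overlap between ChIP-bound genes and knock-out-affected genes for a TF is larger than chance. The function name and all gene counts are hypothetical.

```python
from math import comb

def hypergeom_pvalue(population, bound, affected, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    population: total number of genes considered
    bound:      genes bound by the TF in ChIP-chip data
    affected:   genes whose expression changes in the TF knock-out
    overlap:    genes in both sets
    """
    upper = min(bound, affected)
    total = comb(population, affected)
    tail = sum(comb(bound, k) * comb(population - bound, affected - k)
               for k in range(overlap, upper + 1))
    return tail / total

# Hypothetical counts for one TF (not taken from the paper).
p = hypergeom_pvalue(population=6000, bound=120, affected=200, overlap=25)
print(f"p-value for the observed overlap: {p:.3e}")
```

A small p-value suggests the physical (binding) and functional (knock-out) evidence agree more often than expected by chance, which is the kind of complementary support the method relies on.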
Affiliation(s)
- Haoyu Cheng
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China.
|
35
|
Wang Y, Zhang XS, Xia Y. Predicting eukaryotic transcriptional cooperativity by Bayesian network integration of genome-wide data. Nucleic Acids Res 2009; 37:5943-58. [PMID: 19661283 PMCID: PMC2764433 DOI: 10.1093/nar/gkp625] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 12/18/2022] Open
Abstract
Transcriptional cooperativity among several transcription factors (TFs) is believed to be the main mechanism of complexity and precision in transcriptional regulatory programs. Here, we present a Bayesian network framework to reconstruct a high-confidence whole-genome map of transcriptional cooperativity in Saccharomyces cerevisiae by integrating a comprehensive list of 15 genomic features. We design a Bayesian network structure to capture the dominant correlations among features and TF cooperativity, and introduce a supervised learning framework with a well-constructed gold-standard dataset. This framework allows us to assess the predictive power of each genomic feature, validate the superior performance of our Bayesian network compared to alternative methods, and integrate genomic features for optimal TF cooperativity prediction. Data integration reveals 159 high-confidence predicted cooperative relationships among 105 TFs, most of which are subsequently validated by literature search. The existing and predicted transcriptional cooperativities can be grouped into three categories based on the combination patterns of the genomic features, providing further biological insights into the different types of TF cooperativity. Our methodology is the first supervised learning approach for predicting transcriptional cooperativity, compares favorably to alternative unsupervised methodologies, and can be applied to other genomic data integration tasks where high-quality gold-standard positive data are scarce.
Affiliation(s)
- Yong Wang
- Bioinformatics Program, Department of Chemistry, Boston University, Boston, MA 02215, USA
|
36
|
Van Loo P, Marynen P. Computational methods for the detection of cis-regulatory modules. Brief Bioinform 2009; 10:509-24. [DOI: 10.1093/bib/bbp025] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Indexed: 01/06/2023] Open
|
37
|
Benita Y, Kikuchi H, Smith AD, Zhang MQ, Chung DC, Xavier RJ. An integrative genomics approach identifies Hypoxia Inducible Factor-1 (HIF-1)-target genes that form the core response to hypoxia. Nucleic Acids Res 2009; 37:4587-602. [PMID: 19491311 PMCID: PMC2724271 DOI: 10.1093/nar/gkp425] [Citation(s) in RCA: 372] [Impact Index Per Article: 23.3] [Indexed: 02/06/2023] Open
Abstract
The transcription factor Hypoxia-inducible factor 1 (HIF-1) plays a central role in the transcriptional response to oxygen flux. To gain insight into the molecular pathways regulated by HIF-1, it is essential to identify the downstream target genes. We report here a strategy to identify HIF-1-target genes based on an integrative genomic approach combining computational strategies and experimental validation. To identify HIF-1-target genes, microarray data sets were used to rank genes based on their differential response to hypoxia. The proximal promoters of these genes were then analyzed for the presence of conserved HIF-1-binding sites. Genes were scored and ranked based on their response to hypoxia and their HIF-binding site score. Using this strategy we recovered 41% of the previously confirmed HIF-1-target genes that responded to hypoxia in the microarrays and provide a catalogue of predicted HIF-1 targets. We present experimental validation for ANKRD37 as a novel HIF-1-target gene. Together these analyses demonstrate the potential to recover novel HIF-1-target genes and to discover mammalian regulatory elements operative in the context of microarray data sets.
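The abstract says genes were ranked on two kinds of evidence but does not give the scoring formula. The sketch below therefore assumes a simple rank-sum rule, purely to illustrate how an expression response and a binding-site score might be merged into one ordering. The gene labels and scores are made up.

```python
def combined_ranking(expression_score, binding_score):
    """Order genes by the sum of their two ranks (rank 1 = strongest).

    The rank-sum rule is an assumption for illustration; the paper's
    actual scoring scheme is not described in the abstract.
    """
    genes = sorted(set(expression_score) & set(binding_score))

    def ranks(scores):
        ordered = sorted(genes, key=lambda g: scores[g], reverse=True)
        return {g: r for r, g in enumerate(ordered, start=1)}

    expr_rank = ranks(expression_score)
    bind_rank = ranks(binding_score)
    # Ties in the rank sum are broken alphabetically for determinism.
    return sorted(genes, key=lambda g: (expr_rank[g] + bind_rank[g], g))

# Hypothetical scores: hypoxia fold change and binding-site match score.
expression = {"geneA": 3.0, "geneB": 2.0, "geneC": 1.0}
binding = {"geneA": 0.9, "geneB": 0.1, "geneC": 0.5}
ranking = combined_ranking(expression, binding)
print(ranking)
```

Genes that score well on only one criterion fall behind genes supported by both, which is the intent of combining the two lines of evidence.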
Affiliation(s)
- Yair Benita
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
|
38
|
Xiao Y, Segal MR. Identification of yeast transcriptional regulation networks using multivariate random forests. PLoS Comput Biol 2009; 5:e1000414. [PMID: 19543377 PMCID: PMC2691601 DOI: 10.1371/journal.pcbi.1000414] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 10/28/2008] [Accepted: 05/12/2009] [Indexed: 02/02/2023] Open
Abstract
The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression measures. An analysis of the multivariate random forest output reveals complex regulatory networks, which consist of cohesive, condition-dependent regulatory cliques. Each regulatory clique features homogeneous gene expression profiles and common motifs or synergistic motif groups. We apply our method to several yeast physiological processes: cell cycle, sporulation, and various stress conditions. Our technique displays excellent performance with regard to identifying known regulatory motifs, including high order interactions. In addition, we present evidence of the existence of an alternative MCB-binding pathway, which we confirm using data from two independent cell cycle studies and two other physiological processes. Finally, we have uncovered elaborate transcription regulation refinement mechanisms involving PAC and mRRPE motifs that govern essential rRNA processing. These include intriguing instances of differing motif dosages and differing combinatorial motif control that promote regulatory specificity in rRNA metabolism under differing physiological processes.
Affiliation(s)
- Yuanyuan Xiao
- Department of Epidemiology and Biostatistics, Center for Bioinformatics and Molecular Biostatistics, University of California, San Francisco, California, USA.
|
39
|
Bruce AW, López-Contreras AJ, Flicek P, Down TA, Dhami P, Dillon SC, Koch CM, Langford CF, Dunham I, Andrews RM, Vetrie D. Functional diversity for REST (NRSF) is defined by in vivo binding affinity hierarchies at the DNA sequence level. Genome Res 2009; 19:994-1005. [PMID: 19401398 DOI: 10.1101/gr.089086.108] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.5] [Indexed: 11/24/2022]
Abstract
The molecular events that contribute to, and result from, the in vivo binding of transcription factors to their cognate DNA sequence motifs in mammalian genomes are poorly understood. We demonstrate that variations within the DNA sequence motifs that bind the transcriptional repressor REST (NRSF) encode in vivo DNA binding affinity hierarchies that contribute to regulatory function during lineage-specific and developmental programs in fundamental ways. First, canonical sequence motifs for REST facilitate strong REST binding and control functional classes of REST targets that are common to all cell types, whilst atypical motifs participate in weak interactions and control those targets, which are cell- or tissue-specific. Second, variations in REST binding relate directly to variations in expression and chromatin configurations of REST's target genes. Third, REST clearance from its binding sites is also associated with variations in the RE1 motif. Finally, and most surprisingly, weak REST binding sites reside in DNA sequences that show the highest levels of constraint through evolution, thus facilitating their roles in maintaining tissue-specific functions. These relationships have never been reported in mammalian systems for any transcription factor.
Affiliation(s)
- Alexander W Bruce
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom.
|
40
|
Affiliation(s)
- Debopriya Das
- Life Sciences Division, Ernest O Lawrence Berkeley National Laboratory, Berkeley, California, United States of America.
|
41
|
Kim J, He X, Sinha S. Evolution of regulatory sequences in 12 Drosophila species. PLoS Genet 2009; 5:e1000330. [PMID: 19132088 PMCID: PMC2607023 DOI: 10.1371/journal.pgen.1000330] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Received: 07/19/2008] [Accepted: 12/05/2008] [Indexed: 01/07/2023] Open
Abstract
Characterization of the evolutionary constraints acting on cis-regulatory sequences is crucial to comparative genomics and provides key insights on the evolution of organismal diversity. We study the relationships among orthologous cis-regulatory modules (CRMs) in 12 Drosophila species, especially with respect to the evolution of transcription factor binding sites, and report statistical evidence in favor of key evolutionary hypotheses. Binding sites are found to have position-specific substitution rates. However, the selective forces at different positions of a site do not act independently, and the evidence suggests that constraints on sites are often based on their exact binding affinities. Binding site loss is seen to conform to a molecular clock hypothesis. The rate of site loss is transcription factor–specific and depends on the strength of binding and, in some cases, the presence of other binding sites in close proximity. Our analysis is based on a novel computational method for aligning orthologous CRMs on a tree, which rigorously accounts for alignment uncertainties and exploits binding site predictions through a unified probabilistic framework. Finally, we report weak purifying selection on short deletions, providing important clues about overall spatial constraints on CRMs. Our results present a complex picture of regulatory sequence evolution, with substantial plasticity that depends on a number of factors. The insights gained in this study will help us to understand the combinatorial control of gene regulation and how it evolves. They will pave the way for theoretical models that are cognizant of the important determinants of regulatory sequence evolution and will be critical in genome-wide identification of non-coding sequences under purifying or positive selection.
The spatial–temporal expression pattern of a gene, which is crucial to its function, is controlled by cis-regulatory DNA sequences. Forming the basic units of regulatory sequences are transcription factor binding sites, often organized into larger modules that determine gene expression in response to combinatorial environmental signals. Understanding the conservation and change of regulatory sequences is critical to our knowledge of the unity as well as diversity of animal development and phenotypes. In this paper, we study the evolution of sequences involved in the regulation of body patterning in the Drosophila embryo. We find that mutations of nucleotides within a binding site are constrained by evolutionary forces to preserve the site's binding affinity to the cognate transcription factor. Functional binding sites are frequently destroyed during evolution and the rate of loss across evolutionary spans is roughly constant. We also find that the evolutionary fate of a site strongly depends on its context; a pair of interacting sites are more likely to survive mutational forces than isolated sites. Together, these findings provide new insights and pose new challenges to our understanding of cis-regulatory sequences and their evolution.
Affiliation(s)
- Jaebum Kim
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Xin He
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
| | - Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
|
42
|
Kechris K, Li H. c-REDUCE: incorporating sequence conservation to detect motifs that correlate with expression. BMC Bioinformatics 2008; 9:506. [PMID: 19040743 PMCID: PMC2626603 DOI: 10.1186/1471-2105-9-506] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/21/2008] [Accepted: 11/28/2008] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Computational methods for characterizing novel transcription factor binding sites search for sequence patterns or "motifs" that appear repeatedly in genomic regions of interest. Correlation-based motif finding strategies are used to identify motifs that correlate with expression data and do not rely on promoter sequences from a pre-determined set of genes. RESULTS In this work, we describe a method for predicting motifs that combines the correlation-based strategy with phylogenetic footprinting, where motifs are identified by evaluating orthologous sequence regions from multiple species. Our method, c-REDUCE, can account for variability at a motif position inferred from evolutionary information. c-REDUCE has been tested on ChIP-chip data for yeast transcription factors and on gene expression data in Drosophila. CONCLUSION Our results indicate that utilizing sequence conservation information in addition to correlation-based methods improves the identification of known motifs.
Affiliation(s)
- Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver, 4200 East Ninth Avenue, B-119, Denver, CO 80262, USA
| | - Hao Li
- Department of Biochemistry and Biophysics, UCSF, 1700 4th Street, San Francisco, CA 94143, USA
- Center for Theoretical Biology, Peking University, Beijing 100871, PR China
|
43
|
Gertz J, Siggia ED, Cohen BA. Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 2008; 457:215-8. [PMID: 19029883 PMCID: PMC2677908 DOI: 10.1038/nature07521] [Citation(s) in RCA: 236] [Impact Index Per Article: 13.9] [Received: 07/16/2008] [Accepted: 10/01/2008] [Indexed: 11/09/2022]
Abstract
Transcription factor binding sites (TFBS) are being discovered at a rapid pace [1, 2]. We must now begin to turn our attention towards understanding how these sites work in combination to influence gene expression. Quantitative models that accurately predict gene expression from promoter sequence [3-5] will be a crucial part of solving this problem. Here we present such a model based on the analysis of synthetic promoter libraries in yeast. Thermodynamic models based only on the equilibrium binding of transcription factors to DNA and to each other captured a large fraction of the variation in expression in every library. Thermodynamic analysis of these libraries uncovered several phenomena in our system, including cooperativity and the effects of weak binding sites. When applied to the genome, a model of repression by Mig1, which was trained on synthetic promoters, predicts a number of Mig1-regulated genes that lack significant Mig1 binding sites in their promoters. The success of the thermodynamic approach suggests that the information encoded by combinations of cis-regulatory sites is interpreted primarily through simple protein-DNA and protein-protein interactions, with complicated biochemical reactions, such as nucleosome modifications, being downstream events. Quantitative analyses of synthetic promoter libraries will be an important tool in unraveling the rules underlying combinatorial cis-regulation.
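To make the idea of an equilibrium binding model concrete, the sketch below enumerates the occupancy states of a small promoter and assigns each a Boltzmann-weighted probability. It is a generic textbook-style model, not the authors' fitted model: the per-site weights (concentration times affinity), the single cooperativity factor applied to adjacent bound sites, and the function name are all assumptions.

```python
from itertools import product

def state_probabilities(weights, cooperativity=1.0):
    """Equilibrium probability of every occupancy state of a promoter.

    weights:       per-site statistical weights q_i = [TF] * K_i (assumed)
    cooperativity: extra factor for each pair of adjacent bound sites
    Returns a dict mapping occupancy tuples such as (1, 0) to probabilities.
    """
    states = {}
    for occupancy in product((0, 1), repeat=len(weights)):
        weight = 1.0
        for bound, q in zip(occupancy, weights):
            if bound:
                weight *= q
        # Apply the cooperativity factor to adjacent sites that are both bound.
        for left, right in zip(occupancy, occupancy[1:]):
            if left and right:
                weight *= cooperativity
        states[occupancy] = weight
    partition = sum(states.values())
    return {occ: w / partition for occ, w in states.items()}

# Two equal-strength sites with two-fold cooperativity (illustrative values).
probs = state_probabilities([1.0, 1.0], cooperativity=2.0)
for occupancy, prob in sorted(probs.items()):
    print(occupancy, round(prob, 3))
```

In a model of this kind, predicted expression is then taken to be some function of these state probabilities, for example the probability that a repressor site is unoccupied.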
Affiliation(s)
- Jason Gertz
- Center for Genome Sciences, Department of Genetics, Washington University in Saint Louis School of Medicine, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA
|
44
|
Niida A, Smith AD, Imoto S, Tsutsumi S, Aburatani H, Zhang MQ, Akiyama T. Integrative bioinformatics analysis of transcriptional regulatory programs in breast cancer cells. BMC Bioinformatics 2008; 9:404. [PMID: 18823535 PMCID: PMC2572072 DOI: 10.1186/1471-2105-9-404] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 01/17/2008] [Accepted: 09/29/2008] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Microarray technology has unveiled transcriptomic differences among tumors of various phenotypes and, in particular, has brought great progress in the molecular understanding of the phenotypic diversity of breast tumors. However, compared with the massive knowledge about the transcriptome, we have surprisingly little knowledge about the regulatory mechanisms underlying transcriptomic diversity. RESULTS To gain insights into the transcriptional programs that drive tumor progression, we integrated regulatory sequence data and expression profiles of breast cancer into a Bayesian network, and searched for cis-regulatory motifs statistically associated with given histological grades and prognosis. Our analysis found that motifs bound by ELK1, E2F, NRF1 and NFY are potential regulatory motifs that positively correlate with malignant progression of breast cancer. CONCLUSION The results suggest that these 4 motifs are principal regulatory motifs driving malignant progression of breast cancer. Our method offers a more concise description of transcriptome diversity among breast tumors with different clinical phenotypes.
Affiliation(s)
- Atsushi Niida
- Laboratory of Molecular and Genetic Information, Institute of Molecular and Cellular Biosciences, The University of Tokyo, Bunkyo-ku, Tokyo, 110-0032, Japan.
|
45
|
Shen L, Liu J, Wang W. GBNet: deciphering regulatory rules in the co-regulated genes using a Gibbs sampler enhanced Bayesian network approach. BMC Bioinformatics 2008; 9:395. [PMID: 18811979 PMCID: PMC2571992 DOI: 10.1186/1471-2105-9-395] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 07/08/2008] [Accepted: 09/24/2008] [Indexed: 12/19/2022] Open
Abstract
Background Combinatorial regulation of transcription factors (TFs) is important in determining the complex gene expression patterns particularly in higher organisms. Deciphering regulatory rules between cooperative TFs is a critical step towards understanding the mechanisms of combinatorial regulation. Results We present here a Bayesian network approach called GBNet to search for DNA motifs that may be cooperative in transcriptional regulation and the sequence constraints that these motifs may satisfy. We showed that GBNet outperformed the other available methods in the simulated and the yeast data. We also demonstrated the usefulness of GBNet on learning regulatory rules between YY1, a human TF, and its co-factors. Most of the rules learned by GBNet on YY1 and co-factors were supported by literature. In addition, a spacing constraint between YY1 and E2F was also supported by independent TF binding experiments. Conclusion We thus conclude that GBNet is a useful tool for deciphering the "grammar" of transcriptional regulation.
Affiliation(s)
- Li Shen
- Department of Chemistry and Biochemistry, University of California, San Diego, California, USA.
|
46
|
Knijnenburg TA, Wessels LFA, Reinders MJT. Combinatorial influence of environmental parameters on transcription factor activity. Bioinformatics 2008; 24:i172-81. [PMID: 18586711 PMCID: PMC2718633 DOI: 10.1093/bioinformatics/btn155] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
Motivation: Cells receive a wide variety of environmental signals, which are often processed combinatorially to generate specific genetic responses. Changes in transcript levels, as observed across different environmental conditions, can, to a large extent, be attributed to changes in the activity of transcription factors (TFs). However, in unraveling these transcription regulation networks, the actual environmental signals are often not incorporated into the model, simply because they have not been measured. The unquantified heterogeneity of the environmental parameters across microarray experiments frustrates regulatory network inference. Results: We propose an inference algorithm that models the influence of environmental parameters on gene expression. The approach is based on a yeast microarray compendium of chemostat steady-state experiments. Chemostat cultivation enables the accurate control and measurement of many of the key cultivation parameters, such as nutrient concentrations, growth rate and temperature. The observed transcript levels are explained by inferring the activity of TFs in response to combinations of cultivation parameters. The interplay between activated enhancers and repressors that bind a gene promoter determines the possible up- or downregulation of the gene. The model is translated into a linear integer optimization problem. The resulting regulatory network identifies the combinatorial effects of environmental parameters on TF activity and gene expression. Availability: The Matlab code is available from the authors upon request. Contact: t.a.knijnenburg@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- T A Knijnenburg
- Information and Communication Theory Group, Department of Mediamatics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands.
|
47
|
Zhou Q, Liu JS. Extracting sequence features to predict protein-DNA interactions: a comparative study. Nucleic Acids Res 2008; 36:4137-48. [PMID: 18556756 PMCID: PMC2475627 DOI: 10.1093/nar/gkn361] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 02/25/2008] [Revised: 05/16/2008] [Accepted: 05/21/2008] [Indexed: 11/12/2022] Open
Abstract
Predicting how and where proteins, especially transcription factors (TFs), interact with DNA is an important problem in biology. We present here a systematic study of predictive modeling approaches to the TF-DNA binding problem, which have been frequently shown to be more efficient than those methods only based on position-specific weight matrices (PWMs). In these approaches, a statistical relationship between genomic sequences and gene expression or ChIP-binding intensities is inferred through a regression framework; and influential sequence features are identified by variable selection. We examine a few state-of-the-art learning methods including stepwise linear regression, multivariate adaptive regression splines, neural networks, support vector machines, boosting and Bayesian additive regression trees (BART). These methods are applied to both simulated datasets and two whole-genome ChIP-chip datasets on the TFs Oct4 and Sox2, respectively, in human embryonic stem cells. We find that, with proper learning methods, predictive modeling approaches can significantly improve the predictive power and identify more biologically interesting features, such as TF-TF interactions, than the PWM approach. In particular, BART and boosting show the best and the most robust overall performance among all the methods.
Affiliation(s)
- Qing Zhou
- Department of Statistics, University of California, Los Angeles, CA 90095 and Department of Statistics, Harvard University, Cambridge, MA 02138, USA
| | - Jun S. Liu
- Department of Statistics, University of California, Los Angeles, CA 90095 and Department of Statistics, Harvard University, Cambridge, MA 02138, USA
|
48
|
Network-based global inference of human disease genes. Mol Syst Biol 2008; 4:189. [PMID: 18463613 PMCID: PMC2424293 DOI: 10.1038/msb.2008.27] [Citation(s) in RCA: 455] [Impact Index Per Article: 26.8] [Received: 09/18/2007] [Accepted: 03/17/2008] [Indexed: 01/04/2023] Open
Abstract
Deciphering the genetic basis of human diseases is an important goal of biomedical research. On the basis of the assumption that phenotypically similar diseases are caused by functionally related genes, we propose a computational framework that integrates human protein–protein interactions, disease phenotype similarities, and known gene–phenotype associations to capture the complex relationships between phenotypes and genotypes. We develop a tool named CIPHER to predict and prioritize disease genes, and we show that the global concordance between the human protein network and the phenotype network reliably predicts disease genes. Our method is applicable to genetically uncharacterized phenotypes, effective in the genome-wide scan of disease genes, and also extendable to explore gene cooperativity in complex diseases. The predicted genetic landscape of over 1000 human phenotypes reveals the global modular organization of phenotype–genotype relationships. The genome-wide prioritization of candidate genes for over 5000 human phenotypes, including those with under-characterized disease loci or even those lacking known association, is publicly released to facilitate future discovery of disease genes.
|
49
|
Hannenhalli S. Eukaryotic transcription factor binding sites--modeling and integrative search methods. Bioinformatics 2008; 24:1325-31. [PMID: 18426806 DOI: 10.1093/bioinformatics/btn198] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Indexed: 01/17/2023]
Abstract
A comprehensive knowledge of transcription factor binding sites (TFBS) is important for a mechanistic understanding of transcriptional regulation as well as for inferring gene regulatory networks. Because the DNA motif recognized by a transcription factor is typically short and degenerate, computational approaches for identifying binding sites based only on the sequence motif inevitably suffer from high error rates. Current state-of-the-art techniques for improving computational identification of binding sites can be broadly categorized into two classes: (1) approaches that aim to improve binding motif models by extracting maximal sequence information from experimentally determined binding sites and (2) approaches that supplement binding motif models with additional genomic or other attributes (such as evolutionary conservation). In this review we will discuss recent attempts to improve computational identification of TFBS through these two types of approaches and conclude with thoughts on future development.
Affiliation(s)
- Sridhar Hannenhalli
- Penn Center for Bioinformatics and Department of Genetics, University of Pennsylvania, Philadelphia, USA.
|
50
|
Aguilar D, Oliva B. Topological comparison of methods for predicting transcriptional cooperativity in yeast. BMC Genomics 2008; 9:137. [PMID: 18366726 PMCID: PMC2315657 DOI: 10.1186/1471-2164-9-137] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 10/01/2007] [Accepted: 03/25/2008] [Indexed: 11/10/2022]
Abstract
Background: The cooperative interaction between transcription factors has a decisive role in the control of the fate of the eukaryotic cell. Computational approaches for characterizing cooperative transcription factors in yeast, however, are based on different rationales and provide a low overlap between their results. Because the wealth of information contained in protein interaction networks and regulatory networks has proven highly effective in elucidating functional relationships between proteins, we compared different sets of cooperative transcription factor pairs (predicted by four different computational methods) within the frame of those networks.
Results: Our results show that the overlap between the sets of cooperative transcription factors predicted by the different methods is low yet significant. Cooperative transcription factors predicted by all methods are closer and more clustered in the protein interaction network than expected by chance. On the other hand, members of a cooperative transcription factor pair neither seemed to regulate each other nor shared similar regulatory inputs, although they do regulate similar groups of target genes.
Conclusion: Despite the different definitions of transcriptional cooperativity and the different computational approaches used to characterize cooperativity between transcription factors, the analysis of their roles in the framework of the protein interaction network and the regulatory network indicates a common denominator for the predictions under study. The knowledge of the shared topological properties of cooperative transcription factor pairs in both networks can be useful not only for designing better prediction methods but also for better understanding the complexities of transcriptional control in eukaryotes.
Affiliation(s)
- Daniel Aguilar
- Structural Bioinformatics Group (GRIB), IMIM-Universitat Pompeu Fabra, C/Doctor Aiguader, 88, Barcelona 08003, Spain.
|