Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: McLeay RC, Lesluyes T, Cuellar Partida G, Bailey TL. Genome-wide in silico prediction of gene expression. Bioinformatics 2012;28:2789-96. [PMID: 22954627 DOI: 10.1093/bioinformatics/bts529] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/14/2022]

Cited by Other Article(s)

Raun N, Jones SG, Kerr O, Keung C, Butler EF, Alka K, Krupski JD, Reid-Taylor RA, Ibrahim V, Williams M, Top D, Kramer JM. Trithorax regulates long-term memory in Drosophila through epigenetic maintenance of mushroom body metabolic state and translation capacity. PLoS Biol 2025;23:e3003004. [PMID: 39869640 PMCID: PMC11835295 DOI: 10.1371/journal.pbio.3003004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 02/18/2025] [Accepted: 01/06/2025] [Indexed: 01/29/2025] Open

Labani M, Beheshti A, O’Brien TA. GENet: A Graph-Based Model Leveraging Histone Marks and Transcription Factors for Enhanced Gene Expression Prediction. Genes (Basel) 2024;15:938. [PMID: 39062717 PMCID: PMC11275947 DOI: 10.3390/genes15070938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 07/28/2024] Open

Pianfetti E, Lovino M, Ficarra E, Martignetti L. MiREx: mRNA levels prediction from gene sequence and miRNA target knowledge. BMC Bioinformatics 2023;24:443. [PMID: 37993778 PMCID: PMC10666312 DOI: 10.1186/s12859-023-05560-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 11/06/2023] [Indexed: 11/24/2023] Open

Singh B, Kumar S, Elangovan A, Vasht D, Arya S, Duc NT, Swami P, Pawar GS, Raju D, Krishna H, Sathee L, Dalal M, Sahoo RN, Chinnusamy V. Phenomics based prediction of plant biomass and leaf area in wheat using machine learning approaches. Front Plant Sci 2023;14:1214801. [PMID: 37448870 PMCID: PMC10337996 DOI: 10.3389/fpls.2023.1214801] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Accepted: 06/07/2023] [Indexed: 07/15/2023]

Abstract

Introduction

Phenomics has emerged as an important tool to bridge the genotype-phenotype gap. Quantifying highly dynamic plant growth and its component traits across different growth phases will greatly help to dissect the genetic basis of biomass production. Models have recently been developed to predict biomass from RGB images. However, it is very challenging to find a model that performs stably across experiments. In this study, we recorded RGB and NIR images of wheat germplasm and Recombinant Inbred Lines (RILs) of Raj3765xHD2329, and examined the use of multimodal images from RGB and NIR sensors together with machine learning models to predict biomass and leaf area non-invasively.

Results

The image-based traits (i-Traits), comprising geometric features, RGB-based indices, RGB colour classes and NIR features, were categorized into architectural traits and physiological traits. A total of 77 i-Traits (35 architectural and 42 physiological) were selected for prediction of biomass and leaf area. We show that biomass-related traits such as fresh weight, dry weight and shoot area can be predicted accurately from RGB and NIR images using 16 machine learning models. We applied the models to two consecutive years of experiments and found that measurement accuracies were similar, suggesting that the models generalize. All biomass-related traits could be estimated with about 90% accuracy, but the performance of the BLASSO model was relatively stable and high across all traits and experiments. The R² of BLASSO was 0.96 for fresh weight prediction (both experiments), 0.90 (Experiment 1) and 0.93 (Experiment 2) for dry weight prediction, and 0.96 (Experiment 1) and 0.93 (Experiment 2) for shoot area prediction. The RMSRE of BLASSO was 0.53 (Experiment 1) and 0.24 (Experiment 2) for fresh weight prediction, 0.85 (Experiment 1) and 0.25 (Experiment 2) for dry weight prediction, and 0.59 (Experiment 1) and 0.53 (Experiment 2) for shoot area prediction.
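For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how R² and a relative root-mean-square error are commonly computed. It assumes that RMSRE stands for root mean squared relative error and that R² is the coefficient of determination; the abstract defines neither, so the study's exact formulas and scaling may differ. The function names and sample values are hypothetical and are not taken from the paper.

```python
import math

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmsre(observed, predicted):
    """Root mean squared relative error (assumed definition of RMSRE)."""
    squared_rel = [((p - o) / o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_rel) / len(squared_rel))

# Made-up values for illustration only; not data from the study.
observed_fresh_weight = [10.0, 20.0, 30.0, 40.0]
predicted_fresh_weight = [11.0, 19.0, 33.0, 38.0]

r2 = r_squared(observed_fresh_weight, predicted_fresh_weight)
err = rmsre(observed_fresh_weight, predicted_fresh_weight)
print(f"R^2 = {r2:.3f}, RMSRE = {err:.3f}")
```

Under these definitions a higher R² and a lower RMSRE both indicate a better fit, which is how the BLASSO figures above are read.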

Discussion

Based on the quantification power analysis of i-Traits, the determinants of biomass accumulation were identified; they include both architectural and physiological traits. The best predictor i-Trait was Area_SV for fresh weight and dry weight prediction, and projected shoot area for shoot area prediction. These results will help to identify the major determinants of biomass accumulation and dissect their genetic basis. In addition, non-invasive high-throughput estimation of plant growth during different phenological stages can identify hitherto uncovered genes for biomass production, which can be deployed in crop improvement to break the yield plateau.

Affiliation(s)

Biswabiplab Singh Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India
Sudhir Kumar Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India
Allimuthu Elangovan Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India
Devendra Vasht Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India
Sunny Arya Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India
Nguyen Trung Duc Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India Vietnam National University of Agriculture, Hanoi, Vietnam
Pooja Swami Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India
Godawari Shivaji Pawar Division of Agricultural Botany, Vasantrao Naik Marathwada Krishi Vidyapeeth, Parbhani, India
Dhandapani Raju Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India
Hari Krishna Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, India
Lekshmy Sathee Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India
Monika Dalal ICAR-National Institute for Plant Biotechnology, New Delhi, India
Rabi Narayan Sahoo Division of Agricultural Physics, ICAR-Indian Agricultural Research Institute, New Delhi, India
Viswanathan Chinnusamy Division of Plant Physiology and Nanaji Deshmukh Plant Phenomics Centre (NDPPC), Indian Council of Agricultural Research (ICAR)-Indian Agricultural Research Institute, New Delhi, India

Hecker D, Behjati Ardakani F, Karollus A, Gagneur J, Schulz MH. The adapted Activity-By-Contact model for enhancer-gene assignment and its application to single-cell data. Bioinformatics 2023;39:btad062. [PMID: 36708003 PMCID: PMC9931646 DOI: 10.1093/bioinformatics/btad062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 12/05/2022] [Accepted: 01/26/2023] [Indexed: 01/29/2023] Open

Abstract

MOTIVATION

Identifying regulatory regions in the genome is of great interest for understanding the epigenomic landscape in cells. One fundamental challenge in this context is to find the target genes whose expression is affected by the regulatory regions. A recent successful method is the Activity-By-Contact (ABC) model, which scores enhancer-gene interactions based on enhancer activity and the contact frequency of an enhancer to its target gene. However, it describes regulatory interactions entirely from a gene's perspective and does not account for all the candidate target genes of an enhancer. In addition, the ABC model requires two types of assays to measure enhancer activity, which limits its applicability. Moreover, no implementation is available that allows integration with transcription factor (TF) binding information or efficient analysis of single-cell data.

RESULTS

We demonstrate that the ABC score can yield a higher accuracy by adapting the enhancer activity according to the number of contacts the enhancer has to its candidate target genes and also by considering all annotated transcription start sites of a gene. Further, we show that the model is comparably accurate with only one assay to measure enhancer activity. We combined our generalized ABC model with TF binding information and illustrated an analysis of a single-cell ATAC-seq dataset of the human heart, where we were able to characterize cell type-specific regulatory interactions and predict gene expression based on TF affinities. All executed processing steps are incorporated into our new computational pipeline STARE.

AVAILABILITY AND IMPLEMENTATION

The software is available at https://github.com/schulzlab/STARE.

CONTACT

marcel.schulz@em.uni-frankfurt.de.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
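The scoring idea described in this abstract can be illustrated with a small sketch. It assumes the commonly cited form of the ABC score (activity times contact for one enhancer-gene pair, divided by the sum of activity times contact over all candidate enhancers of that gene). The second function shows one possible reading of "adapting the enhancer activity according to the number of contacts", namely scaling each enhancer's activity by the share of its total contact that goes to a given gene. Both functions, their names and the toy numbers are hypothetical and are not the STARE implementation.

```python
from collections import defaultdict

def abc_scores(activity, contact):
    """Classic-style ABC: A_e * C_eg normalised over all enhancers of gene g.

    activity: dict enhancer -> activity signal
    contact:  dict (enhancer, gene) -> contact frequency
    """
    gene_totals = defaultdict(float)
    for (enh, gene), c in contact.items():
        gene_totals[gene] += activity[enh] * c
    return {
        (enh, gene): activity[enh] * c / gene_totals[gene]
        for (enh, gene), c in contact.items()
        if gene_totals[gene] > 0
    }

def adapted_abc_scores(activity, contact):
    """Assumed adaptation: split each enhancer's activity across its genes
    in proportion to contact, then score as in abc_scores."""
    enhancer_contact = defaultdict(float)
    for (enh, _gene), c in contact.items():
        enhancer_contact[enh] += c
    adapted = {
        (enh, gene): activity[enh] * c / enhancer_contact[enh]
        for (enh, gene), c in contact.items()
        if enhancer_contact[enh] > 0
    }
    gene_totals = defaultdict(float)
    for (enh, gene), a in adapted.items():
        gene_totals[gene] += a * contact[(enh, gene)]
    return {
        (enh, gene): a * contact[(enh, gene)] / gene_totals[gene]
        for (enh, gene), a in adapted.items()
        if gene_totals[gene] > 0
    }

# Toy input: enhancer e1 contacts two genes, e2 contacts only gA.
activity = {"e1": 4.0, "e2": 2.0}
contact = {("e1", "gA"): 0.5, ("e2", "gA"): 0.5, ("e1", "gB"): 0.5}

classic = abc_scores(activity, contact)
adapted = adapted_abc_scores(activity, contact)
print(classic[("e1", "gA")], adapted[("e1", "gA")])
```

In the toy input the adapted variant lowers the score of e1 for gA relative to e2, because e1 divides its activity between two candidate genes; this is the enhancer-side perspective that the abstract says the original model lacks.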

Kang Y, Jung WJ, Brent MR. Predicting which genes will respond to transcription factor perturbations. G3 (Bethesda) 2022;12:jkac144. [PMID: 35666184 PMCID: PMC9339286 DOI: 10.1093/g3journal/jkac144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Accepted: 05/25/2022] [Indexed: 11/13/2022]

Abstract

The ability to predict which genes will respond to the perturbation of a transcription factor serves as a benchmark for our systems-level understanding of transcriptional regulatory networks. In previous work, machine learning models have been trained to predict static gene expression levels in a biological sample by using data from the same or similar samples, including data on their transcription factor binding locations, histone marks, or DNA sequence. We report on a different challenge: training machine learning models to predict which genes will respond to the perturbation of a transcription factor without using any data from the perturbed cells. We find that existing transcription factor location data (ChIP-seq) from human cells have very little detectable utility for predicting which genes will respond to perturbation of a transcription factor. Features of genes, including their preperturbation expression level and expression variation, are very useful for predicting responses to perturbation of any transcription factor. This shows that some genes are poised to respond to transcription factor perturbations and others are resistant, shedding light on why it has been so difficult to predict responses from binding locations. Certain histone marks, including H3K4me1 and H3K4me3, have some predictive power when located downstream of the transcription start site. However, the predictive power of histone marks is much less than that of gene expression level and expression variation. Sequence-based or epigenetic properties of genes strongly influence their tendency to respond to direct transcription factor perturbations, partially explaining the oft-noted difficulty of predicting responsiveness from transcription factor binding location data. These molecular features are largely reflected in and summarized by the gene's expression level and expression variation. Code is available at https://github.com/BrentLab/TFPertRespExplainer.

Schmidt F, Marx A, Baumgarten N, Hebel M, Wegner M, Kaulich M, Leisegang M, Brandes R, Göke J, Vreeken J, Schulz M. Integrative analysis of epigenetics data identifies gene-specific regulatory elements. Nucleic Acids Res 2021;49:10397-10418. [PMID: 34508352 PMCID: PMC8501997 DOI: 10.1093/nar/gkab798] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/11/2020] [Revised: 08/01/2021] [Accepted: 09/07/2021] [Indexed: 12/19/2022] Open

Affiliation(s)

Florian Schmidt Cluster of Excellence for Multimodal Computing and Interaction, Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany Graduate School of Computer Science, Saarland Informatics Campus, 66123 Saarbrücken, Germany Laboratory of Systems Biology and Data Analytics, Genome Institute of Singapore, 60 Biopolis Street, 138672 Singapore, Singapore
Alexander Marx Cluster of Excellence for Multimodal Computing and Interaction, Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany Graduate School of Computer Science, Saarland Informatics Campus, 66123 Saarbrücken, Germany International Max Planck Research School for Computer Science, Saarland Informatics Campus, 66123 Saarbrücken, Germany
Nina Baumgarten Institute for Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany German Center for Cardiovascular Research (DZHK), Partner site RheinMain, 60590 Frankfurt am Main, Germany
Marie Hebel Institute of Biochemistry II, Goethe University Frankfurt - Medical Faculty, University Hospital, 60590 Frankfurt am Main, Germany
Martin Wegner Institute of Biochemistry II, Goethe University Frankfurt - Medical Faculty, University Hospital, 60590 Frankfurt am Main, Germany
Manuel Kaulich Institute of Biochemistry II, Goethe University Frankfurt - Medical Faculty, University Hospital, 60590 Frankfurt am Main, Germany Frankfurt Cancer Institute, Goethe University, 60590 Frankfurt am Main, Germany
Matthias S Leisegang German Center for Cardiovascular Research (DZHK), Partner site RheinMain, 60590 Frankfurt am Main, Germany Institute for Cardiovascular Physiology, Goethe University, 60590 Frankfurt am Main, Germany
Ralf P Brandes German Center for Cardiovascular Research (DZHK), Partner site RheinMain, 60590 Frankfurt am Main, Germany Institute for Cardiovascular Physiology, Goethe University, 60590 Frankfurt am Main, Germany
Jonathan Göke Laboratory of Computational Transcriptomics, Genome Institute of Singapore, 60 Biopolis Street, 138672 Singapore, Singapore
Jilles Vreeken CISPA Helmholtz Center for Information Security, Saarland Informatics Campus, 66123 Saarbrücken, Germany Cluster of Excellence for Multimodal Computing and Interaction, Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
Marcel H Schulz Cluster of Excellence for Multimodal Computing and Interaction, Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany Institute for Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany German Center for Cardiovascular Research (DZHK), Partner site RheinMain, 60590 Frankfurt am Main, Germany

Agarwal V, Shendure J. Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks. Cell Rep 2021;31:107663. [PMID: 32433972 DOI: 10.1016/j.celrep.2020.107663] [Citation(s) in RCA: 120] [Impact Index Per Article: 30.0] [Received: 07/25/2018] [Revised: 06/11/2019] [Accepted: 04/28/2020] [Indexed: 01/06/2023] Open

Wang T, Guo Y, Liu S, Zhang C, Cui T, Ding K, Wang P, Wang X, Wang Z. KLF4, a Key Regulator of a Transitive Triplet, Acts on the TGF-β Signaling Pathway and Contributes to High-Altitude Adaptation of Tibetan Pigs. Front Genet 2021;12:628192. [PMID: 33936161 PMCID: PMC8082500 DOI: 10.3389/fgene.2021.628192] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/11/2020] [Accepted: 03/10/2021] [Indexed: 11/13/2022] Open

Scherer M, Schmidt F, Lazareva O, Walter J, Baumbach J, Schulz MH, List M. Machine learning for deciphering cell heterogeneity and gene regulation. Nat Comput Sci 2021;1:183-191. [PMID: 38183187 DOI: 10.1038/s43588-021-00038-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/15/2020] [Accepted: 02/08/2021] [Indexed: 12/14/2022]

Aflakparast M, Geeven G, de Gunst MCM. Bayesian mixture regression analysis for regulation of Pluripotency in ES cells. BMC Bioinformatics 2020;21:3. [PMID: 31898480 PMCID: PMC6941360 DOI: 10.1186/s12859-019-3331-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/04/2019] [Accepted: 12/17/2019] [Indexed: 11/10/2022] Open

Schmidt F, Schulz MH. On the problem of confounders in modeling gene expression. Bioinformatics 2019;35:711-719. [PMID: 30084962 PMCID: PMC6530814 DOI: 10.1093/bioinformatics/bty674] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/18/2018] [Revised: 06/21/2018] [Accepted: 08/02/2018] [Indexed: 01/01/2023] Open

The spatial binding model of the pioneer factor Oct4 with its target genes during cell reprogramming. Comput Struct Biotechnol J 2019;17:1226-1233. [PMID: 31921389 PMCID: PMC6944736 DOI: 10.1016/j.csbj.2019.09.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/19/2019] [Revised: 09/05/2019] [Accepted: 09/07/2019] [Indexed: 12/18/2022] Open

Read DF, Cook K, Lu YY, Le Roch KG, Noble WS. Predicting gene expression in the human malaria parasite Plasmodium falciparum using histone modification, nucleosome positioning, and 3D localization features. PLoS Comput Biol 2019;15:e1007329. [PMID: 31509524 PMCID: PMC6756558 DOI: 10.1371/journal.pcbi.1007329] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/11/2019] [Revised: 09/23/2019] [Accepted: 08/12/2019] [Indexed: 12/02/2022] Open

Feng ZX, Li QZ, Meng JJ. Modeling the relationship of diverse genomic signatures to gene expression levels with the regulation of long-range enhancer-promoter interactions. Biophys Rep 2019. [DOI: 10.1007/s41048-019-0089-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022] Open

Zhao Y, Schaafsma E, Cheng C. Applications of ENCODE data to Systematic Analyses via Data Integration. Curr Opin Syst Biol 2019;11:57-64. [PMID: 31011690 DOI: 10.1016/j.coisb.2018.08.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/26/2022]

Lu R, Rogan PK. Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations. F1000Res 2018;7:1933. [PMID: 31001412 PMCID: PMC6464064 DOI: 10.12688/f1000research.17363.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 12/05/2018] [Indexed: 10/12/2023] Open

Abstract

Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets.

Methods: Genes with correlated expression patterns across 53 tissues and TF targets were respectively identified from Bray-Curtis Similarity and TF knockdown experiments. Corresponding promoter sequences were reduced to DNase I-accessible intervals; TFBSs were then identified within these intervals using information theory-based position weight matrices for each TF (iPWMs) and clustered. Features from information-dense TFBS clusters predicted these genes with machine learning classifiers, which were evaluated for accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on cluster densities and the regulatory states of target genes.

Results: We initially chose the glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, to test this approach. SLC25A32 and TANK were found to exhibit the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the largest area under the Receiver Operating Characteristic (ROC) curve in detecting such genes. Target gene prediction was confirmed using siRNA knockdown of TFs, which was found to be more accurate than prediction after CRISPR/CAS9 inactivation. In silico mutation analyses of TFBSs also revealed that one or more information-dense TFBS clusters in promoters are required for accurate target gene prediction.

Conclusions: Machine learning based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
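The Methods mention Bray-Curtis Similarity as the measure used to find genes with correlated expression across tissues. As a minimal sketch, assuming the standard definition (one minus the Bray-Curtis dissimilarity, for non-negative values), the similarity between two expression profiles could be computed as follows. The function name and the three-tissue example values are hypothetical; the study compared profiles across 53 tissues and may have preprocessed the data differently.

```python
def bray_curtis_similarity(profile_a, profile_b):
    """1 - sum(|a_i - b_i|) / sum(a_i + b_i) for non-negative profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of tissues")
    total = sum(profile_a) + sum(profile_b)
    if total == 0:
        raise ValueError("similarity is undefined when both profiles are all zero")
    difference = sum(abs(a - b) for a, b in zip(profile_a, profile_b))
    return 1.0 - difference / total

# Made-up expression values for two genes across three tissues.
gene_x = [10.0, 0.0, 5.0]
gene_y = [8.0, 2.0, 5.0]

similarity = bray_curtis_similarity(gene_x, gene_y)
print(f"Bray-Curtis similarity: {similarity:.3f}")
```

A value of 1 means the two profiles are identical and a value of 0 means they share no expression in any tissue, so ranking genes by this score against a reference gene gives a list of the most similarly expressed genes.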

Lu R, Rogan PK. Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations. F1000Res 2018;7:1933. [PMID: 31001412 PMCID: PMC6464064 DOI: 10.12688/f1000research.17363.2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 03/28/2019] [Indexed: 12/20/2022] Open

Abstract

Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets using Machine Learning (ML).

Methods: Bray-Curtis Similarity was used to identify genes with correlated expression patterns across 53 tissues. TF targets from knockdown experiments were also analyzed by this approach to set up the ML framework. TFBSs were selected within DNase I-accessible intervals of corresponding promoter sequences using information theory-based position weight matrices (iPWMs) for each TF. Features from information-dense clusters of TFBSs were input to ML classifiers which predict these gene targets along with their accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on TFBS clustering and predict changes in gene regulation.

Results: The glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, was selected to test this approach. SLC25A32 and TANK exhibited the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the best performance in detecting such genes, based on Area Under the Receiver Operating Characteristic curve (ROC). TF target gene prediction was confirmed using siRNA knockdown, which was more accurate than CRISPR/CAS9 inactivation. TFBS mutation analyses revealed that accurate target gene prediction required at least 1 information-dense TFBS cluster.

Conclusions: ML based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.

Ng FSL, Ruau D, Wernisch L, Göttgens B. A graphical model approach visualizes regulatory relationships between genome-wide transcription factor binding profiles. Brief Bioinform 2018;19:162-173. [PMID: 27780826 PMCID: PMC5496675 DOI: 10.1093/bib/bbw102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/08/2015] [Indexed: 11/16/2022] Open

Zhang LQ, Li QZ. Estimating the effects of transcription factors binding and histone modifications on gene expression levels in human cells. Oncotarget 2018;8:40090-40103. [PMID: 28454114 PMCID: PMC5522221 DOI: 10.18632/oncotarget.16988] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 12/15/2016] [Accepted: 03/11/2017] [Indexed: 12/22/2022] Open

Jiang S, Mortazavi A. Integrating ChIP-seq with other functional genomics data. Brief Funct Genomics 2018;17:104-115. [PMID: 29579165 PMCID: PMC5888983 DOI: 10.1093/bfgp/ely002] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Indexed: 12/19/2022] Open

Kehl T, Schneider L, Schmidt F, Stöckel D, Gerstner N, Backes C, Meese E, Keller A, Schulz MH, Lenhof HP. RegulatorTrail: a web service for the identification of key transcriptional regulators. Nucleic Acids Res 2017;45:W146-W153. [PMID: 28472408 PMCID: PMC5570139 DOI: 10.1093/nar/gkx350] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/10/2017] [Revised: 04/07/2017] [Accepted: 04/20/2017] [Indexed: 12/14/2022] Open

Integrated analysis and transcript abundance modelling of H3K4me3 and H3K27me3 in developing secondary xylem. Sci Rep 2017;7:3370. [PMID: 28611454 PMCID: PMC5469831 DOI: 10.1038/s41598-017-03665-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 02/03/2017] [Accepted: 05/02/2017] [Indexed: 01/10/2023] Open

Song L, Huang SSC, Wise A, Castanon R, Nery JR, Chen H, Watanabe M, Thomas J, Bar-Joseph Z, Ecker JR. A transcription factor hierarchy defines an environmental stress response network. Science 2017;354:aag1550. [PMID: 27811239 PMCID: PMC5217750 DOI: 10.1126/science.aag1550] [Citation(s) in RCA: 343] [Impact Index Per Article: 42.9] [Received: 05/16/2016] [Accepted: 09/28/2016] [Indexed: 12/17/2022]

Schmidt F, Gasparoni N, Gasparoni G, Gianmoena K, Cadenas C, Polansky JK, Ebert P, Nordström K, Barann M, Sinha A, Fröhler S, Xiong J, Dehghani Amirabad A, Behjati Ardakani F, Hutter B, Zipprich G, Felder B, Eils J, Brors B, Chen W, Hengstler JG, Hamann A, Lengauer T, Rosenstiel P, Walter J, Schulz MH. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Res 2017;45:54-66. [PMID: 27899623 PMCID: PMC5224477 DOI: 10.1093/nar/gkw1061] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Received: 05/27/2016] [Revised: 10/18/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022] Open

Affiliation(s)

Florian Schmidt: Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Nina Gasparoni: Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Gilles Gasparoni: Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Kathrin Gianmoena: Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
Cristina Cadenas: Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
Julia K Polansky: Experimental Rheumatology, German Rheumatism Research Centre, Berlin, 10117, Germany
Peter Ebert: Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany; International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Karl Nordström: Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Matthias Barann: Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
Anupam Sinha: Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
Sebastian Fröhler: Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
Jieyi Xiong: Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
Azim Dehghani Amirabad: Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany; International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Fatemeh Behjati Ardakani: Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Barbara Hutter: Applied Bioinformatics, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Gideon Zipprich: Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Bärbel Felder: Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Jürgen Eils: Data Management and Genomics IT, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Benedikt Brors: Applied Bioinformatics, Deutsches Krebsforschungszentrum, Heidelberg, 69120, Germany
Wei Chen: Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, 13092, Germany
Jan G Hengstler: Leibniz Research Centre for Working Environment and Human Factors IfADo, Dortmund, 44139, Germany
Alf Hamann: International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Thomas Lengauer: Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany
Philip Rosenstiel: Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, 24105, Germany
Jörn Walter: Department of Genetics, University of Saarland, Saarbrücken, 66123, Germany
Marcel H Schulz: Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, 66123, Germany; Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, 66123, Germany

Budden DM, Crampin EJ. Distributed gene expression modelling for exploring variability in epigenetic function. BMC Bioinformatics 2016;17:446. [PMID: 27816056 PMCID: PMC5097851 DOI: 10.1186/s12859-016-1313-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2016] [Accepted: 10/25/2016] [Indexed: 11/10/2022] Open

Su WX, Li QZ, Zhang LQ, Fan GL, Wu CY, Yan ZH, Zuo YC. Gene expression classification using epigenetic features and DNA sequence composition in the human embryonic stem cell line H1. Gene 2016;592:227-234. [DOI: 10.1016/j.gene.2016.07.059] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 02/18/2016] [Revised: 06/20/2016] [Accepted: 07/23/2016] [Indexed: 01/01/2023]

E2F1 Orchestrates Transcriptomics and Oxidative Metabolism in Wharton's Jelly-Derived Mesenchymal Stem Cells from Growth-Restricted Infants. PLoS One 2016;11:e0163035. [PMID: 27631473 PMCID: PMC5025055 DOI: 10.1371/journal.pone.0163035] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/26/2016] [Accepted: 09/01/2016] [Indexed: 12/31/2022] Open

Narang V, Ramli MA, Singhal A, Kumar P, de Libero G, Poidinger M, Monterola C. Automated Identification of Core Regulatory Genes in Human Gene Regulatory Networks. PLoS Comput Biol 2015;11:e1004504. [PMID: 26393364 PMCID: PMC4578944 DOI: 10.1371/journal.pcbi.1004504] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/10/2015] [Accepted: 08/11/2015] [Indexed: 12/20/2022] Open

Abstract

Human gene regulatory networks (GRN) can be difficult to interpret due to a tangle of edges interconnecting thousands of genes. We constructed a general human GRN from extensive transcription factor and microRNA target data obtained from public databases. In a subnetwork of this GRN that is active during estrogen stimulation of MCF-7 breast cancer cells, we benchmarked automated algorithms for identifying core regulatory genes (transcription factors and microRNAs). Among these algorithms, we identified K-core decomposition, PageRank and betweenness centrality as the most effective for discovering core regulatory genes in the network, evaluated based on previously known roles of these genes in MCF-7 biology as well as on their ability to explain the up or down expression status of up to 70% of the remaining genes. Finally, we validated the use of the K-core algorithm for organizing the GRN in an easier-to-interpret layered hierarchy where more influential regulatory genes percolate towards the inner layers. The integrated human gene and miRNA network and software used in this study are provided as supplementary materials (S1 Data) accompanying this manuscript.

A gene regulatory network (GRN) represents how some genes encoding regulatory molecules such as transcription factors or microRNAs regulate the expression of other genes. Researchers commonly study GRNs involved in a specific biological process with the aim of identifying a few important regulatory genes. In higher organisms such as humans, a regulatory gene regulates multiple target genes and correspondingly any gene is regulated by multiple regulatory genes. Due to such multiplicity of interactions, a GRN usually resembles a tangled hairball wherein it is difficult to identify the few most influential regulatory genes. In this study, we show that network analysis algorithms such as K-core, PageRank and betweenness centrality are useful for identifying a few important or core regulatory genes in a GRN, and the K-core algorithm is also useful for organizing regulatory genes in a hierarchical layered structure where the most influential genes in a GRN are found within the innermost layer or core. These few core regulatory genes determine to a large extent the expression status of the remaining genes in the network. We illustrate a pragmatic application of this technique to GRNs reconstructed from genome-wide gene expression measurements in the MCF-7 human breast cancer cell line.

Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast. PLoS Comput Biol 2015;11:e1004418. [PMID: 26291518 PMCID: PMC4546298 DOI: 10.1371/journal.pcbi.1004418] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 11/19/2014] [Accepted: 06/29/2015] [Indexed: 11/19/2022] Open

Abstract

Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as a model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a “TF-agnostic” predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF-agnostic and likely describes the general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF-agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

Identification of transcription factor binding sites based on sequence motifs is typically accompanied by a high false positive rate. Increasing evidence suggests that there are many other factors besides DNA sequence that may affect the binding and interaction of TFs with DNA. Through the integration of sequence motif, chromatin state, and DNA structure properties, we show that TF binding can be better predicted. Moreover, considering chromatin state and DNA structure properties simultaneously yields a significant improvement. While the binding of some TFs can be readily predicted using either chromatin state information or DNA structure, other TFs need both. Thus, our findings provide insights on how different histone modifications and DNA structure properties may influence the binding of a particular TF and thus how TFs regulate gene expression. These features are referred to as sequence “intrinsic properties” because they can be predicted from sequences alone. These intrinsic properties can be used to build a TF binding prediction model that has a similar performance to considering all features. Moreover, the intrinsic property model allows TFBS predictions not only across TFs, but also across DNA-binding domain families that are present in most eukaryotes, suggesting that the model likely can be used across species.

Budden DM, Hurley DG, Crampin EJ. Modelling the conditional regulatory activity of methylated and bivalent promoters. Epigenetics Chromatin 2015;8:21. [PMID: 26097508 PMCID: PMC4474576 DOI: 10.1186/s13072-015-0013-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/10/2015] [Accepted: 06/10/2015] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Predictive modelling of gene expression is a powerful framework for the in silico exploration of transcriptional regulatory interactions through the integration of high-throughput -omics data. A major limitation of previous approaches is their inability to handle conditional interactions that emerge when genes are subject to different regulatory mechanisms. Although chromatin immunoprecipitation-based histone modification data are often used as proxies for chromatin accessibility, the association between these variables and expression often depends upon the presence of other epigenetic markers (e.g. DNA methylation or histone variants). These conditional interactions are poorly handled by previous predictive models and reduce the reliability of downstream biological inference.

RESULTS

We have previously demonstrated that integrating both transcription factor and histone modification data within a single predictive model is rendered ineffective by their statistical redundancy. In this study, we evaluate four proposed methods for quantifying gene-level DNA methylation levels and demonstrate that inclusion of these data in predictive modelling frameworks is also subject to this critical limitation in data integration. Based on the hypothesis that statistical redundancy in epigenetic data is caused by conditional regulatory interactions within a dynamic chromatin context, we construct a new gene expression model which is the first to improve prediction accuracy by unsupervised identification of latent regulatory classes. We show that DNA methylation and H2A.Z histone variant data can be interpreted in this way to identify and explore the signatures of silenced and bivalent promoters, substantially improving genome-wide predictions of mRNA transcript abundance and downstream biological inference across multiple cell lines.

CONCLUSIONS

Previous models of gene expression have been applied successfully to several important problems in molecular biology, including the discovery of transcription factor roles, identification of regulatory elements responsible for differential expression patterns and comparative analysis of the transcriptome across distant species. Our analysis supports our hypothesis that statistical redundancy in epigenetic data is partially due to conditional relationships between these regulators and gene expression levels. This analysis provides insight into the heterogeneous roles of H3K4me3 and H3K27me3 in the presence of the H2A.Z histone variant (implicated in cancer progression) and how these signatures change during lineage commitment and carcinogenesis.

Inference of transcriptional regulation in cancers. Proc Natl Acad Sci U S A 2015;112:7731-6. [PMID: 26056275 DOI: 10.1073/pnas.1424272112] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Indexed: 01/01/2023] Open

Ignatieva EV, Podkolodnaya OA, Orlov YL, Vasiliev GV, Kolchanov NA. Regulatory genomics: Combined experimental and computational approaches. RUSS J GENET+ 2015. [DOI: 10.1134/s1022795415040067] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]

Weber D, Heisig J, Kneitz S, Wolf E, Eilers M, Gessler M. Mechanisms of epigenetic and cell-type specific regulation of Hey target genes in ES cells and cardiomyocytes. J Mol Cell Cardiol 2015;79:79-88. [DOI: 10.1016/j.yjmcc.2014.11.004] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 08/20/2014] [Revised: 10/07/2014] [Accepted: 11/06/2014] [Indexed: 01/20/2023]

Budden DM, Hurley DG, Cursons J, Markham JF, Davis MJ, Crampin EJ. Predicting expression: the complementary power of histone modification and transcription factor binding data. Epigenetics Chromatin 2014;7:36. [PMID: 25489339 PMCID: PMC4258808 DOI: 10.1186/1756-8935-7-36] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 07/25/2014] [Accepted: 11/05/2014] [Indexed: 01/01/2023] Open

Abstract

Background

Transcription factors (TFs) and histone modifications (HMs) play critical roles in gene expression by regulating mRNA transcription. Modelling frameworks have been developed to integrate high-throughput omics data, with the aim of elucidating the regulatory logic that results from the interactions of DNA, TFs and HMs. These models have yielded an unexpected and poorly understood result: that TFs and HMs are statistically redundant in explaining mRNA transcript abundance at a genome-wide level.

Results

We constructed predictive models of gene expression by integrating RNA-sequencing, TF and HM chromatin immunoprecipitation sequencing and DNase I hypersensitivity data for two mammalian cell types. All models identified genome-wide statistical redundancy both within and between TFs and HMs, as previously reported. To investigate potential explanations, groups of genes were constructed for ontology-classified biological processes. Predictive models were constructed for each process to explore the distribution of statistical redundancy. We found significant variation in the predictive capacity of TFs and HMs across these processes and demonstrated the predictive power of HMs to be inversely proportional to process enrichment for housekeeping genes.

Conclusions

It is well established that the roles played by TFs and HMs are not functionally redundant. Instead, we attribute the statistical redundancy reported in this and previous genome-wide modelling studies to the heterogeneous distribution of HMs across chromatin domains. Furthermore, we conclude that statistical redundancy between individual TFs can be readily explained by nucleosome-mediated cooperative binding. This could possibly help the cell confer regulatory robustness by rejecting signalling noise and allowing control via multiple pathways.

Electronic supplementary material

The online version of this article (doi:10.1186/1756-8935-7-36) contains supplementary material, which is available to authorized users.

Angelini C, Costa V. Understanding gene regulatory mechanisms by integrating ChIP-seq and RNA-seq data: statistical solutions to biological problems. Front Cell Dev Biol 2014;2:51. [PMID: 25364758 PMCID: PMC4207007 DOI: 10.3389/fcell.2014.00051] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 05/29/2014] [Accepted: 09/01/2014] [Indexed: 11/15/2022] Open

Budden DM, Hurley DG, Crampin EJ. Predictive modelling of gene expression from transcriptional regulatory elements. Brief Bioinform 2014;16:616-28. [PMID: 25231769 DOI: 10.1093/bib/bbu034] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 07/02/2014] [Accepted: 08/20/2014] [Indexed: 12/15/2022] Open

O'Connor TR, Bailey TL. Creating and validating cis-regulatory maps of tissue-specific gene expression regulation. Nucleic Acids Res 2014;42:11000-10. [PMID: 25200088 PMCID: PMC4176179 DOI: 10.1093/nar/gku801] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022] Open

Noderer WL, Flockhart RJ, Bhaduri A, Diaz de Arce AJ, Zhang J, Khavari PA, Wang CL. Quantitative analysis of mammalian translation initiation sites by FACS-seq. Mol Syst Biol 2014;10:748. [PMID: 25170020 PMCID: PMC4299517 DOI: 10.15252/msb.20145136] [Citation(s) in RCA: 137] [Impact Index Per Article: 12.5] [Indexed: 01/03/2023] Open

Ling MHT, Poh CL. A predictor for predicting Escherichia coli transcriptome and the effects of gene perturbations. BMC Bioinformatics 2014;15:140. [PMID: 24884349 PMCID: PMC4038595 DOI: 10.1186/1471-2105-15-140] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/28/2013] [Accepted: 05/09/2014] [Indexed: 11/24/2022] Open

Abstract

Background

A means to predict the effects of gene over-expression, knockouts, and environmental stimuli in silico is useful for systems biologists to develop and test hypotheses. Several studies have predicted the expression of all Escherichia coli genes from sequences and reported a correlation of 0.301 between predicted and actual expression. However, these do not allow biologists to study the effects of gene perturbations on the native transcriptome.

Results

We developed a predictor of transcriptome-scale gene expression from a small number (n = 59) of known gene expressions using a gene co-expression network, which can be used to predict the effects of over-expressions and knockdowns on the E. coli transcriptome. In terms of transcriptome prediction, our results show that the correlation between predicted and actual expression values is 0.467, which is similar to the microarray intra-array variation (p-value = 0.348), suggesting that intra-array variation accounts for a substantial portion of the transcriptome prediction error. In terms of predicting the effects of gene perturbation(s), our results suggest that the expression of 83% of the genes affected by perturbation can be predicted within 40% error and that the correlation between predicted and actual expression values among the affected genes is 0.698. With the ability to predict the effects of gene perturbations, we demonstrated that our predictor has the potential to estimate the effects of varying gene expression level on the native transcriptome.

Conclusion

We present a potential means to predict an entire transcriptome and a tool to estimate the effects of gene perturbations for E. coli, which will aid biologists in hypothesis development. This study forms the baseline for future work in using gene co-expression networks for gene expression prediction.

Multiscale representation of genomic signals. Nat Methods 2014;11:689-94. [PMID: 24727652 PMCID: PMC4040162 DOI: 10.1038/nmeth.2924] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 10/12/2013] [Accepted: 02/24/2014] [Indexed: 12/30/2022]

Chen H, Lonardi S, Zheng J. Deciphering histone code of transcriptional regulation in malaria parasites by large-scale data mining. Comput Biol Chem 2014;50:3-10. [PMID: 24581698 DOI: 10.1016/j.compbiolchem.2014.01.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Accepted: 12/23/2013] [Indexed: 10/25/2022]

Comoglio F, Paro R. Combinatorial modeling of chromatin features quantitatively predicts DNA replication timing in Drosophila. PLoS Comput Biol 2014;10:e1003419. [PMID: 24465194 PMCID: PMC3900380 DOI: 10.1371/journal.pcbi.1003419] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 07/15/2013] [Accepted: 11/18/2013] [Indexed: 01/14/2023] Open