1
Yu Q, Zhang X, Hu Y, Chen S, Yang L. A Method for Predicting DNA Motif Length Based on Deep Learning. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:61-73. [PMID: 35275822 DOI: 10.1109/tcbb.2022.3158471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
A DNA motif is a sequence pattern shared by the DNA sequence segments that bind to a specific protein. Discovering motifs in a given DNA sequence dataset plays a vital role in studying gene expression regulation. The motif length is an important attribute of a DNA motif and directly affects the quality of the discovered motifs, yet determining it accurately remains a difficult challenge. We propose MotifLen, a new motif length prediction scheme based on supervised machine learning. First, we propose a method for constructing sample data for motif length prediction. Second, we construct a deep learning model for motif length prediction based on a convolutional neural network. Then, we describe how to apply the prediction model to a motif found by an existing motif discovery algorithm. The experimental results show the following: i) the prediction accuracy of MotifLen exceeds 90% on the validation set and is significantly higher than that of the compared methods on real datasets; ii) MotifLen can successfully optimize the motifs found by existing motif discovery algorithms; and iii) it can effectively improve the running-time performance of some existing motif discovery algorithms.
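CNN-based motif models such as the one described above usually take each DNA sequence as a one-hot matrix with one column per nucleotide. The sketch below shows that generic encoding plus fixed-length padding. The helper names (`one_hot`, `pad_to_length`) are hypothetical, and so is the choice to encode ambiguous bases such as N as all-zero rows. Neither is taken from the MotifLen paper.

```python
# Generic one-hot encoding of DNA for CNN input.
# Assumption: illustrative only; this is not MotifLen's published preprocessing.
BASES = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Return a len(seq) x 4 matrix; non-ACGT symbols (e.g. N) become all-zero rows."""
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:  # str.find returns -1 for symbols outside ACGT
            row[idx] = 1
        matrix.append(row)
    return matrix


def pad_to_length(matrix: list[list[int]], length: int) -> list[list[int]]:
    """Truncate or zero-pad a one-hot matrix so sequences can be stacked into a batch."""
    rows = matrix[:length]
    return rows + [[0, 0, 0, 0] for _ in range(length - len(rows))]


encoded = pad_to_length(one_hot("acgn"), 6)
```

In a CNN, each convolution kernel spans a fixed number of rows of this matrix, so the kernel width plays a role loosely similar to a motif length.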
2
A systematic study of HIF1A cofactors in hypoxic cancer cells. Sci Rep 2022; 12:18962. [PMID: 36347941 PMCID: PMC9643333 DOI: 10.1038/s41598-022-23060-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 10/25/2022] [Indexed: 11/09/2022] [Open Access]
Abstract
Hypoxia inducible factor 1 alpha (HIF1A) is a transcription factor (TF) that forms highly structural and functional protein-protein interactions with other TFs to promote gene expression in hypoxic cancer cells. However, despite the importance of these TF-TF interactions, we still lack a comprehensive view of many of the TF cofactors involved and how they cooperate. In this study, we systematically studied HIF1A cofactors in eight cancer cell lines using the computational motif mining tool, SIOMICS, and discovered 201 potential HIF1A cofactors, which included 21 of the 29 known HIF1A cofactors in public databases. These 201 cofactors were statistically and biologically significant, with 19 of the top 37 cofactors in our study directly validated in the literature. The remaining 18 were novel cofactors. These discovered cofactors can be essential to HIF1A's regulatory functions and may lead to the discovery of new therapeutic targets in cancer treatment.
3
Zheng H, Wang S, Li X, Hu H. INSISTC: Incorporating network structure information for single-cell type classification. Genomics 2022; 114:110480. [PMID: 36075505 DOI: 10.1016/j.ygeno.2022.110480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 08/30/2022] [Accepted: 09/04/2022] [Indexed: 11/27/2022]
Abstract
Uncovering gene regulatory mechanisms in individual cells can provide insight into cell heterogeneity and function. Recently accumulated single-cell RNA-seq data have made it possible to analyze gene regulation at single-cell resolution. Understanding cell-type-specific gene regulation can help identify cell types and states more accurately, and computational approaches that exploit such relationships are under development. Pioneering methods that integrate gene regulatory mechanism discovery with cell-type classification face challenges such as determining gene regulatory relationships and incorporating gene regulatory network structure. To fill this gap, we developed INSISTC, a computational method that incorporates gene regulatory network structure information into single-cell type classification. INSISTC can identify cell-type-specific gene regulatory mechanisms while performing single-cell type classification. INSISTC demonstrated its accuracy in cell type classification and its potential for providing insight into molecular mechanisms specific to individual cells. Compared with alternative methods, INSISTC showed complementary performance in gene regulation interpretation.
Affiliation(s)
- Hansi Zheng: Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Saidi Wang: Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li: Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu: Department of Computer Science, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
4
Wang S, Hu H, Li X. A systematic study of motif pairs that may facilitate enhancer-promoter interactions. J Integr Bioinform 2022; 19:jib-2021-0038. [PMID: 35130376 PMCID: PMC9069648 DOI: 10.1515/jib-2021-0038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/15/2021] [Accepted: 01/20/2022] [Indexed: 01/06/2023] [Open Access]
Abstract
Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and contribute to their physical interactions. However, our knowledge of such TF pairs is still limited. To fill this void, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer-promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in the enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched for motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitated the discovery of interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions, which can help generate meaningful hypotheses for experimental validation.
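One common way to ask whether two motifs co-occur more often than expected by chance is a one-sided hypergeometric test. The sketch below assumes simple counts over a set of interacting EP pairs. The function name and its parameters are hypothetical, and the abstract does not state which statistic EPmotifPair actually uses.

```python
from math import comb


def cooccurrence_pvalue(n_pairs: int, n_a: int, n_b: int, n_both: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_both).

    n_pairs: number of interacting EP pairs considered
    n_a:     pairs whose enhancer contains motif A
    n_b:     pairs whose promoter contains motif B
    n_both:  pairs with motif A in the enhancer AND motif B in the promoter
    Under independence, X ~ Hypergeometric(n_pairs, n_a, n_b).
    """
    denom = comb(n_pairs, n_b)
    tail = sum(
        # math.comb(n, k) is 0 when k > n, so impossible splits contribute nothing
        comb(n_a, i) * comb(n_pairs - n_a, n_b - i)
        for i in range(n_both, min(n_a, n_b) + 1)
    )
    return tail / denom


# Hypothetical counts: 30 co-occurrences where about 10 are expected (100 * 100 / 1000).
p = cooccurrence_pvalue(1000, 100, 100, 30)
```

Scanning many motif pairs this way would also call for a multiple-testing correction (for example, Bonferroni or Benjamini-Hochberg).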
Affiliation(s)
- Saidi Wang: Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu: Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li: Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
5
Wang Y, Liu L, Lin M. Psychiatric risk gene transcription factor 4 preferentially regulates cortical interneuron neurogenesis during early brain development. J Biomed Res 2022; 36:242-254. [PMID: 35965434 PMCID: PMC9376727 DOI: 10.7555/jbr.36.20220074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] [Open Access]
Abstract
Genetic variants within or near the transcription factor 4 gene (TCF4) are robustly implicated in psychiatric disorders including schizophrenia. However, the biological pleiotropy of TCF4 poses considerable obstacles to dissecting its potential relationship with these highly heterogeneous diseases. Through integrative transcriptomic analysis, we demonstrated that TCF4 is preferentially expressed in cortical interneurons during early brain development. Disruption of interneuron development may therefore underlie the contribution of TCF4 perturbation to a range of neurodevelopmental disorders. Here, we performed chromatin immunoprecipitation sequencing (ChIP-seq) of TCF4 on human medial ganglionic eminence-like organoids (hMGEOs) to identify genome-wide TCF4 binding sites, followed by integration of multi-omics data from human fetal brain. We observed preferential expression of the isoform TCF4-B over TCF4-A. De novo motif analysis showed that the 5916 identified TCF4 binding sites are significantly enriched for the E-box sequence. The expression levels of the predicted TCF4 targets are generally positively correlated with that of TCF4 in cortical interneurons, and these targets are primarily involved in biological processes related to neurogenesis. Interestingly, we found that TCF4 interacts with non-bHLH proteins such as FOS/JUN, which may underlie the functional specificity of TCF4 in hMGEOs. This study highlights the regulatory role of TCF4 in interneuron development and provides compelling evidence for the biological rationale linking TCF4 to the developing cortical interneuron and to psychiatric disorders.
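Claims of E-box enrichment like the one above are normally established with a motif tool against background sequences. The sketch below is a minimal, hypothetical illustration of what is being counted. It scans sequences for the E-box consensus CANNTG and reports matches per kilobase. Comparing this density between binding sites and matched background regions would give only a crude enrichment ratio. This is not the de novo motif pipeline used in the study.

```python
import re

# E-box consensus CANNTG. The pattern equals its own reverse complement,
# so scanning the forward strand also captures sites on the reverse strand.
# The zero-width lookahead lets overlapping matches be reported.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def ebox_sites(seq: str) -> list[int]:
    """0-based start positions of all (possibly overlapping) E-box matches."""
    return [m.start() for m in EBOX.finditer(seq.upper())]


def ebox_per_kb(seqs: list[str]) -> float:
    """E-box matches per kilobase over a set of sequences, e.g. ChIP-seq peak regions."""
    total_len = sum(len(s) for s in seqs)
    hits = sum(len(ebox_sites(s)) for s in seqs)
    return 1000 * hits / total_len if total_len else 0.0
```

For example, `ebox_sites("cacatgtg")` reports two overlapping sites, one at position 0 and one at position 2.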
Affiliation(s)
- Yuanyuan Wang: State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Liya Liu: Department of Neurobiology, School of Basic Medical Sciences, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Mingyan Lin: Department of Neurobiology, School of Basic Medical Sciences, Nanjing Medical University, Nanjing, Jiangsu 211166, China
  Mingyan Lin, Department of Neurobiology, School of Basic Medical Sciences, Nanjing Medical University, 101 Longmian Avenue, Jiangning District, Nanjing, Jiangsu 211166, China. Tel: +86-25-86869432, E-mail:
6
Shommo G, Apolloni B. A holistic miRNA-mRNA module discovery. Noncoding RNA Res 2021; 6:159-166. [PMID: 34703956 PMCID: PMC8521321 DOI: 10.1016/j.ncrna.2021.09.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/26/2021] [Revised: 09/20/2021] [Accepted: 09/20/2021] [Indexed: 11/16/2022] [Open Access]
Abstract
Biologists have understood the regulatory role of microRNAs (miRNAs) in messenger RNA (mRNA) gene expression well for decades, although the investigation of specific aspects is still in progress. In this paper we focus on miRNA-mRNA modules, where regulation jointly occurs in miRNA-mRNA pairs. Specifically, we propose a holistic procedure to identify miRNA-mRNA modules within a population of candidate pairs. Because current methods still leave issues open, we postpone any decision on the value of the module ingredients until the very end, i.e., the moment when the results are biologically exploited. This replaces chains of statistical tests with sequences of specially devised, evolving metrics on the possible solutions. The strategy is computationally expensive and therefore requires HPC implementations. The reward lies in the discovery of new modules. These may host non-differentially expressed miRNAs and mRNAs, as well as pairs containing genes that are currently considered untargeted. We apply the procedure to a Multiple Myeloma dataset that is publicly available on the GEO platform, as a template for a cancer instance analysis, and venture some biological interpretations. These results, together with the manageable computational cost, suggest that the discovery procedure may be profitably extended to a wide spectrum of diseases in which miRNA-mRNA interactions play a relevant role.
Affiliation(s)
- Ghada Shommo: Sudan University of Science and Technology, Department of Information Technology and Computer Science, Sudan
- Bruno Apolloni: Department of Computer Science, Via Comelico 39/41, 20135, Milano, Italy
7
Li JY, Jin S, Tu XM, Ding Y, Gao G. Identifying complex motifs in massive omics data with a variable-convolutional layer in deep neural network. Brief Bioinform 2021; 22:6312656. [PMID: 34219140 DOI: 10.1093/bib/bbab233] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/25/2021] [Revised: 05/25/2021] [Accepted: 05/28/2021] [Indexed: 01/10/2023] [Open Access]
Abstract
Motif identification is among the most common and essential computational tasks in bioinformatics and genomics. Here we propose a novel convolutional layer for deep neural networks, the variable convolutional (vConv) layer. It enables effective motif identification in high-throughput omics data by adaptively learning the kernel length from the data. Empirical evaluations on DNA-protein binding and DNase footprinting cases demonstrated that vConv-based networks outperform their canonical convolutional counterparts regardless of model complexity. Moreover, vConv can be readily integrated into multi-layer neural networks as an "in-place replacement" for the canonical convolutional layer. All source code is freely available on GitHub for academic use.
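The key idea above is that the kernel length is learned rather than fixed. One simple way to picture this uses a soft mask built from two sigmoids with hypothetical learnable boundaries `left` and `right`, applied here to a single-channel signal. The mask function, parameterization and multi-channel (one-hot DNA) handling of the published vConv layer may differ, so treat this as a conceptual sketch only.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_mask(kernel_len: int, left: float, right: float, sharpness: float = 5.0) -> list[float]:
    """Weights near 1 for positions between `left` and `right`, decaying to 0 outside.

    `left` and `right` are real-valued, so in a trainable model they could be
    updated by gradient descent, effectively changing the kernel's length.
    """
    return [sigmoid(sharpness * (i - left)) * sigmoid(sharpness * (right - i))
            for i in range(kernel_len)]


def masked_conv1d(signal: list[float], kernel: list[float], mask: list[float]) -> list[float]:
    """'Valid' 1D cross-correlation of a single-channel signal with kernel * mask."""
    weights = [k * m for k, m in zip(kernel, mask)]
    n = len(weights)
    return [sum(signal[j + i] * weights[i] for i in range(n))
            for j in range(len(signal) - n + 1)]


mask = soft_mask(10, left=2.5, right=6.5)  # roughly positions 3..6 stay active
```

Because the mask is smooth in `left` and `right`, a gradient can widen or narrow the effective kernel. A hard cutoff at integer positions would not provide such a gradient.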
Affiliation(s)
- Jing-Yi Li: Biomedical Pioneering Innovation Center & Beijing Advanced Innovation Center for Genomics, Center for Bioinformatics, and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Shen Jin: Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Xin-Ming Tu: Biomedical Pioneering Innovation Center & Beijing Advanced Innovation Center for Genomics, Center for Bioinformatics, and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Yang Ding: Biomedical Pioneering Innovation Center & Beijing Advanced Innovation Center for Genomics, Center for Bioinformatics, and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
- Ge Gao: Biomedical Pioneering Innovation Center & Beijing Advanced Innovation Center for Genomics, Center for Bioinformatics, and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China
8
Shared distal regulatory regions may contribute to the coordinated expression of human ribosomal protein genes. Genomics 2020; 112:2886-2893. [PMID: 32240723 DOI: 10.1016/j.ygeno.2020.03.028] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/07/2020] [Revised: 03/12/2020] [Accepted: 03/29/2020] [Indexed: 11/21/2022]
Abstract
To identify potential distal regulatory regions of human ribosomal protein genes (RPGs) and to understand their characteristics, we studied chromatin interactions in seven cell lines and four primary cell types. We identified 22,797 putative regulatory regions that directly or indirectly interact with human RPG promoters. A large proportion of these regions are present in only one cell line or cell type, implying that RPGs may be differentially regulated across experimental conditions. We also noticed that the same groups of RPGs share common regulatory regions across cell lines and cell types. These shared regulatory regions may contribute to the coordinated regulation of RPGs. By studying overrepresented motifs in the identified regulatory regions, we showed that about two dozen motifs in these regions are shared across cell lines and cell types. Our study sheds new light on the coordinated transcriptional regulation of human RPGs.
9
Essebier A, Lamprecht M, Piper M, Bodén M. Bioinformatics approaches to predict target genes from transcription factor binding data. Methods 2017; 131:111-119. [DOI: 10.1016/j.ymeth.2017.09.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/09/2017] [Revised: 08/29/2017] [Accepted: 09/03/2017] [Indexed: 12/28/2022] [Open Access]
10
Identification of cis-regulatory sequences reveals potential participation of lola and Deaf1 transcription factors in Anopheles gambiae innate immune response. PLoS One 2017; 12:e0186435. [PMID: 29028826 PMCID: PMC5640250 DOI: 10.1371/journal.pone.0186435] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 06/09/2017] [Accepted: 09/29/2017] [Indexed: 01/26/2023] [Open Access]
Abstract
The innate immune response of Anopheles gambiae involves the transcriptional upregulation of effector genes. Cis-regulatory sequences and their cognate binding factors therefore play essential roles in the mosquito’s immune response. However, the genetic control of the mosquito’s innate immune response is not yet fully understood. To gain further insight into the elements, factors and potential mechanisms involved, open chromatin profiling was carried out on A. gambiae-derived immune-responsive cells. Here, we report the identification of cis-regulatory sites, immunity-related transcription factor binding sites, and cis-regulatory modules. De novo motif discovery on this set of cis-regulatory sequences identified immunity-related motifs and cis-regulatory modules. These modules contain motifs similar to the binding sites of REL-, STAT-, lola- and Deaf1-type transcription factors. Sequence motifs similar to GAGA binding sites were found within a cis-regulatory module, together with immunity-related transcription factor binding sites. The presence of Deaf1- and lola-type binding sites, along with REL- and STAT-type binding sites, suggests that the immunity function of these two factors may be conserved in both Drosophila and Anopheles gambiae.
11
Wang Y, Goodison S, Li X, Hu H. Prognostic cancer gene signatures share common regulatory motifs. Sci Rep 2017; 7:4750. [PMID: 28684851 PMCID: PMC5500535 DOI: 10.1038/s41598-017-05035-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 02/10/2017] [Accepted: 06/01/2017] [Indexed: 11/29/2022] [Open Access]
Abstract
Scientists have discovered various prognostic gene signatures (GSs) in different cancer types. Surprisingly, different GSs from the same cancer type rarely share genes, even though they can be used to measure similar biological characteristics. To explain this paradox, we hypothesized that GSs from the same cancer type may be regulated by common regulatory motifs. To test this hypothesis, we carried out a comprehensive motif analysis of the prognostic GSs from five cancer types. We demonstrated that GSs share regulatory motifs both within individual cancer types and across cancer types. We also observed that transcription factors that likely bind to these shared motifs have prognostic functions in cancers. Moreover, 75% of the predicted cofactors of these transcription factors may have cancer-related functions, and some cofactors even have prognostic functions. In addition, common microRNAs regulate different GSs within and across cancer types, and several of these microRNAs are prognostic biomarkers for the corresponding cancer types. Our study suggests that GSs within and across cancer types share common regulatory mechanisms. This sheds light on the discovery of new prognostic GSs in cancers and on the regulatory mechanisms of cancers.
Affiliation(s)
- Ying Wang: Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Steve Goodison: Nonagen BioScience Corp, Jacksonville, FL 32216, USA; Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL 32224, USA
- Xiaoman Li: Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu: Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
12
Duffy DJ, Krstic A, Halasz M, Schwarzl T, Fey D, Iljin K, Mehta JP, Killick K, Whilde J, Turriziani B, Haapa-Paananen S, Fey V, Fischer M, Westermann F, Henrich KO, Bannert S, Higgins DG, Kolch W. Integrative omics reveals MYCN as a global suppressor of cellular signalling and enables network-based therapeutic target discovery in neuroblastoma. Oncotarget 2016; 6:43182-201. [PMID: 26673823 PMCID: PMC4791225 DOI: 10.18632/oncotarget.6568] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 11/15/2015] [Accepted: 11/23/2015] [Indexed: 12/12/2022] [Open Access]
Abstract
Despite intensive study, many mysteries remain about the MYCN oncogene's functions. Here we focus on MYCN's role in neuroblastoma, the most common extracranial childhood cancer. MYCN gene amplification occurs in 20% of cases, but other recurrent somatic mutations are rare. This scarcity of tractable targets has hampered efforts to develop new therapeutic options. We employed a multi-level omics approach to examine MYCN functioning and identify novel therapeutic targets for this largely undruggable oncogene. We used systems-medicine-based computational network reconstruction and analysis to integrate a range of omic techniques: sequencing-based transcriptomics, genome-wide chromatin immunoprecipitation, siRNA screening and interaction proteomics. This revealed that MYCN controls highly connected networks and primarily suppresses the activity of network components. MYCN's oncogenic functions are likely independent of its classical heterodimerisation partner, MAX. In particular, MYCN controls its own protein interaction network by transcriptionally regulating its binding partners. Our network-based approach identified vulnerable, therapeutically targetable nodes that function as critical regulators or effectors of MYCN in neuroblastoma. These were validated by siRNA knockdown screens, functional studies and patient data. We identified β-estradiol and MAPK/ERK as having functional cross-talk with MYCN and as novel targetable vulnerabilities of MYCN-amplified neuroblastoma. These results reveal surprising differences between the functioning of endogenous, overexpressed and amplified MYCN, and rationalise how different MYCN dosages can orchestrate cell fate decisions and cancerous outcomes. Importantly, this work describes a systems-level approach to systematically uncovering network-based vulnerabilities and therapeutic targets for multifactorial diseases by integrating disparate omic data types.
Affiliation(s)
- David J Duffy: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland; The Whitney Laboratory for Marine Bioscience, University of Florida, St. Augustine, Florida, USA
- Aleksandar Krstic: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Melinda Halasz: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Thomas Schwarzl: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland; European Molecular Biology Laboratory (EMBL), Meyerhofstraße, Heidelberg, Germany
- Dirk Fey: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Jai Prakash Mehta: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Kate Killick: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Jenny Whilde: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland
- Vidal Fey: VTT Technical Research Centre of Finland, Espoo, Finland
- Matthias Fischer: Department of Paediatric Haematology and Oncology and Center for Molecular Medicine Cologne (CMMC), University Hospital Cologne, Cologne, Germany
- Frank Westermann: Division of NB Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Kai-Oliver Henrich: Division of NB Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Steffen Bannert: Division of NB Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Desmond G Higgins: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland; Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin, Ireland; School of Medicine and Medical Science, University College Dublin, Belfield, Dublin, Ireland
- Walter Kolch: Systems Biology Ireland, University College Dublin, Belfield, Dublin, Ireland; Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin, Ireland; School of Medicine and Medical Science, University College Dublin, Belfield, Dublin, Ireland
13
PETModule: a motif module based approach for enhancer target gene prediction. Sci Rep 2016; 6:30043. [PMID: 27436110 PMCID: PMC4951774 DOI: 10.1038/srep30043] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/23/2016] [Accepted: 06/29/2016] [Indexed: 12/20/2022] [Open Access]
Abstract
The identification of enhancer-target gene (ETG) pairs is vital for understanding gene transcriptional regulation. Experimental approaches such as Hi-C have generated valuable resources of ETG pairs, and several computational methods have been developed to predict ETG interactions. Despite this progress, high-throughput experimental approaches are still costly, and existing computational approaches are still suboptimal and not easy to apply. Here we developed PETModule, a motif module-based approach that predicts ETG pairs. When tested on eight human cell types and two mouse cell types, a large number of its predictions were supported by Hi-C and/or ChIA-PET experiments. Compared with two recently developed approaches for ETG pair prediction, PETModule had a much better recall, a similar or better F1 score, and a larger area under the receiver operating characteristic curve. The PETModule tool is freely available at http://hulab.ucf.edu/research/projects/PETModule/.
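The recall and F1 comparison above depends on matching predicted pairs against an experimentally supported reference set. The sketch below is a generic metric computation that assumes pairs are given as hypothetical `(enhancer_id, gene_id)` tuples. It is not PETModule's evaluation code.

```python
def evaluate_pairs(predicted, reference):
    """Precision, recall and F1 of predicted (enhancer_id, gene_id) pairs
    against a reference set of experimentally supported pairs."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)  # predictions supported by the reference
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 2 of 4 predictions are supported; 2 of 3 reference pairs are recovered.
pred = [("e1", "g1"), ("e2", "g2"), ("e3", "g3"), ("e4", "g4")]
ref = [("e1", "g1"), ("e2", "g2"), ("e5", "g5")]
precision, recall, f1 = evaluate_pairs(pred, ref)
```

Hi-C and ChIA-PET are unlikely to capture every true interaction. Precision computed this way is therefore probably a conservative estimate.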
14
Integrative analyses shed new light on human ribosomal protein gene regulation. Sci Rep 2016; 6:28619. [PMID: 27346035 PMCID: PMC4921865 DOI: 10.1038/srep28619] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/01/2016] [Accepted: 06/06/2016] [Indexed: 12/05/2022] [Open Access]
Abstract
Ribosomal protein genes (RPGs) are important housekeeping genes that are well known for their coordinated expression. Previous studies of RPGs have been largely limited to their promoter regions. Recent high-throughput studies provide an unprecedented opportunity to study how human RPGs are transcriptionally modulated and how such transcriptional regulation may contribute to their coordinated expression in various tissues and cell types. By analyzing DNase I hypersensitive sites under 349 experimental conditions, we predicted 217 RPG regulatory regions in the human genome. More than 86.6% of these computationally predicted regulatory regions were partially corroborated by independent experimental measurements. Motif analyses of these predicted regulatory regions identified 31 DNA motifs, including 57.1% of the experimentally validated motifs in the literature that regulate RPGs. Interestingly, the majority of the predicted motifs were shared by the predicted distal and proximal regulatory regions of the same RPGs, which is likely a general mechanism of enhancer-promoter interactions. We also found that RPGs may be regulated differently in different cells, indicating that condition-specific RPG regulatory regions remain to be discovered and investigated. Our study advances the understanding of how RPGs are coordinately modulated, which sheds light on the general principles of gene transcriptional regulation in mammals.
15
Ma CW, Zhou LB, Zeng AP. Engineering Biomolecular Switches for Dynamic Metabolic Control. Adv Biochem Eng Biotechnol 2016; 162:45-76. [DOI: 10.1007/10_2016_9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/15/2022]
16
Systematic discovery of cofactor motifs from ChIP-seq data by SIOMICS. Methods 2015; 79-80:47-51. [DOI: 10.1016/j.ymeth.2014.08.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 05/02/2014] [Revised: 07/19/2014] [Accepted: 08/06/2014] [Indexed: 11/19/2022] [Open Access]
17
Lihu A, Holban T. A review of ensemble methods for de novo motif discovery in ChIP-Seq data. Brief Bioinform 2015; 16:964-73. [DOI: 10.1093/bib/bbv022] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/15/2015] [Indexed: 01/17/2023] [Open Access]
18
Zheng Y, Li X, Hu H. PreDREM: a database of predicted DNA regulatory motifs from 349 human cell and tissue samples. Database (Oxford) 2015; 2015:bav007. [PMID: 25725063 PMCID: PMC4343075 DOI: 10.1093/database/bav007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/03/2023]
Abstract
PreDREM is a database of DNA regulatory motifs and motif modules predicted from DNase I hypersensitive sites in 349 human cell and tissue samples. It contains 845–1325 predicted motifs per sample, resulting in a total of 2684 non-redundant motifs. In comparison with seven large collections of known motifs, more than 84% of the 2684 predicted motifs are similar to known motifs, and 54–76% of the known motifs are similar to the predicted motifs. PreDREM also stores 43 663–2 013 288 motif modules per sample, which provide the cofactor motifs of each predicted motif. Motifs of known interacting transcription factor (TF) pairs were taken from eight resources. On average, 84% of the motif pairs corresponding to these known interacting TF pairs are included in the predicted motif modules. Through its web interface, PreDREM allows users to browse motif information by tissue, dataset, individual non-redundant motif, etc. Users can also search for motifs and motif modules, for their instances in given genomic regions, and for the tissue or cell types in which a motif occurs. PreDREM thus provides a useful resource for understanding cell- and tissue-specific gene regulation in the human genome. Database URL: http://server.cs.ucf.edu/predrem/.
Affiliation(s)
- Yiyu Zheng
- Department of Electrical Engineering and Computer Science and Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li
- Department of Electrical Engineering and Computer Science and Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu
- Department of Electrical Engineering and Computer Science and Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
19
Zheng Y, Li X, Hu H. Comprehensive discovery of DNA motifs in 349 human cells and tissues reveals new features of motifs. Nucleic Acids Res 2015; 43:74-83. [PMID: 25505144 PMCID: PMC4288161 DOI: 10.1093/nar/gku1261] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 07/17/2014] [Revised: 11/13/2014] [Accepted: 11/17/2014] [Indexed: 01/15/2023]
Abstract
Comprehensive motif discovery under experimental conditions is critical for a global understanding of gene regulation. To generate a nearly complete list of human DNA motifs under given conditions, we employed a novel approach to discover significant co-occurring DNA motifs de novo in 349 human DNase I hypersensitive site datasets. We predicted 845 to 1325 motifs in each dataset, for a total of 2684 non-redundant motifs. These 2684 motifs contained 54.02 to 75.95% of the known motifs in seven large collections, including TRANSFAC. In each dataset, we also discovered 43 663 to 2 013 288 motif modules, i.e. groups of motifs whose binding sites co-occur in a significant number of short DNA regions. Compared with known interacting transcription factors in eight resources, the predicted motif modules on average included 84.23% of known interacting motifs. We further showed new features of the predicted motifs: for example, motifs enriched in proximal regions rarely overlapped with motifs enriched in distal regions, and motifs enriched in 5' distal regions were often also enriched in 3' distal regions. Finally, we observed that the 2684 predicted motifs classified the cell or tissue types of the datasets with an accuracy of 81.29%. The resources generated in this study are available at http://server.cs.ucf.edu/predrem/.
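The abstract defines a motif module as a group of motifs whose binding sites co-occur in a significant number of short DNA regions, but it does not state the statistic used. As a hedged illustration only, a common way to score one motif pair is a one-sided hypergeometric test on region counts. This sketch assumes the short regions are already fixed and that the presence of each motif per region has already been called; the function names are hypothetical and this is not the authors' method.

```python
from math import comb


def hypergeom_upper_tail(k, n_total, n_a, n_b):
    """P(X >= k) for X ~ Hypergeometric: n_total regions, n_a contain motif A,
    n_b regions drawn (those containing motif B). Exact integer arithmetic."""
    upper = min(n_a, n_b)
    if k > upper:
        return 0.0
    numerator = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(max(k, 0), upper + 1)
    )
    return numerator / comb(n_total, n_b)


def cooccurrence_test(regions_a, regions_b, n_regions):
    """regions_a / regions_b: sets of region indices that contain a site of
    motif A / motif B (hypothetical input format). Returns the observed
    number of shared regions and its one-sided p-value."""
    both = len(regions_a & regions_b)
    return both, hypergeom_upper_tail(both, n_regions, len(regions_a), len(regions_b))


# Usage: 100 short regions, each motif present in 10 of them, 8 shared.
shared, p = cooccurrence_test(set(range(10)), set(range(2, 12)), 100)
print(shared, p)
```

Under independence only about one shared region would be expected here, so eight shared regions gives a very small p-value. A genome-wide scan over many motif pairs would also need a multiple-testing correction.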
Affiliation(s)
- Yiyu Zheng
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li
- Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
20
Ding J, Li X, Hu H. MicroRNA modules prefer to bind weak and unconventional target sites. Bioinformatics 2014; 31:1366-74. [PMID: 25527098 PMCID: PMC4410656 DOI: 10.1093/bioinformatics/btu833] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 08/21/2014] [Accepted: 12/13/2014] [Indexed: 11/13/2022]
Abstract
Motivation: MicroRNAs (miRNAs) play critical roles in gene regulation. Although it is well known that multiple miRNAs may work as miRNA modules to synergistically regulate common target mRNAs, the understanding of miRNA modules is still in its infancy. Results: We employed recently generated high-throughput experimental data to study miRNA modules. We predicted 181 miRNA modules and 306 potential miRNA modules. We observed that the target sites of these predicted modules were in general weaker than those not bound by miRNA modules. We also discovered that miRNAs in predicted modules preferred to bind unconventional target sites rather than canonical sites. Surprisingly, contrary to a previous study, we found that most adjacent miRNA target sites from the same miRNA modules were not within the range of 10–130 nucleotides. Interestingly, the distance between target sites bound by miRNAs in the same module was shorter when the module bound unconventional rather than canonical sites. Our study sheds new light on miRNA binding and miRNA target sites, which will likely advance our understanding of miRNA regulation. Availability and implementation: The software miRModule can be freely downloaded at http://hulab.ucf.edu/research/projects/miRNA/miRModule. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: haihu@cs.ucf.edu or xiaoman@mail.ucf.edu.
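The abstract's spacing result asks whether adjacent target sites of miRNAs in the same module fall within 10–130 nucleotides. The following minimal sketch shows one way to compute that fraction for a single 3' UTR. It assumes that sites are given as (start position, miRNA name) pairs, that distance is measured between start coordinates, and that both ends of the 10–130 nt window are inclusive. These are illustrative assumptions and the function names are hypothetical, not part of miRModule.

```python
def adjacent_site_distances(sites, module):
    """sites: list of (start_position, mirna_name) on one 3' UTR.
    module: set of miRNA names forming one module.
    Returns distances (nt) between consecutive sites of module members."""
    starts = sorted(pos for pos, mirna in sites if mirna in module)
    return [b - a for a, b in zip(starts, starts[1:])]


def fraction_within(distances, low=10, high=130):
    """Fraction of distances inside [low, high]; 0.0 if there are none."""
    if not distances:
        return 0.0
    return sum(low <= d <= high for d in distances) / len(distances)


# Usage: miR-9 is not in the module, so only three sites are kept.
sites = [(5, "miR-1"), (40, "miR-2"), (300, "miR-1"), (120, "miR-9")]
dists = adjacent_site_distances(sites, {"miR-1", "miR-2"})
print(dists, fraction_within(dists))  # [35, 260] 0.5
```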
Affiliation(s)
- Jun Ding
- Department of Electrical Engineering and Computer Science and Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li
- Department of Electrical Engineering and Computer Science and Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu
- Department of Electrical Engineering and Computer Science and Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL 32816, USA