1
Wu Q, Li Y, Wang Q, Zhao X, Sun D, Liu B. Identification of DNA motif pairs on paired sequences based on composite heterogeneous graph. Front Genet 2024; 15:1424085. [PMID: 38952710 PMCID: PMC11215013 DOI: 10.3389/fgene.2024.1424085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Accepted: 05/22/2024] [Indexed: 07/03/2024] Open
Abstract
Motivation: The interaction between DNA motifs (DNA motif pairs) influences gene expression through partnership or competition in the process of gene regulation. Potential chromatin interactions between different DNA motifs have been implicated in various diseases. However, current methods for identifying DNA motif pairs rely on the recognition of single DNA motifs or on probabilities, which may result in locally optimal solutions and can be sensitive to the choice of initial values. A method for precisely identifying DNA motif pairs is still lacking. Results: Here, we propose a novel computational method for predicting DNA Motif Pairs based on Composite Heterogeneous Graph (MPCHG). This approach leverages a composite heterogeneous graph model to identify DNA motif pairs on paired sequences. Compared with existing methods, MPCHG greatly improves the accuracy of motif prediction. Furthermore, the predicted DNA motifs show higher DNase accessibility than the background sequences. Notably, the two DNA motifs forming a pair exhibit functional consistency. Importantly, the interacting TF pairs obtained from the predicted DNA motif pairs were significantly enriched with known interacting TF pairs, suggesting their potential contribution to chromatin interactions. Collectively, we believe that these identified DNA motif pairs hold substantial implications for revealing gene transcriptional regulation under long-range chromatin interactions.
Affiliation(s)
- Qiuqin Wu
  - School of Mathematics, Shandong University, Jinan, China
- Yang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Qi Wang
  - School of Mathematics, Shandong University, Jinan, China
- Xiaoyu Zhao
  - School of Mathematics, Shandong University, Jinan, China
- Duanchen Sun
  - School of Mathematics, Shandong University, Jinan, China
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, China
2
Liu Z, Wong HM, Chen X, Lin J, Zhang S, Yan S, Wang F, Li X, Wong KC. MotifHub: Detection of trans-acting DNA motif group with probabilistic modeling algorithm. Comput Biol Med 2024; 168:107753. [PMID: 38039889 DOI: 10.1016/j.compbiomed.2023.107753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/30/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023]
Abstract
BACKGROUND: Trans-acting factors, a group of proteins that can directly or indirectly recognize or bind to the 8-12 bp core sequence of cis-acting elements and regulate the transcription efficiency of target genes, are of special importance in transcription regulation. Progress in high-throughput chromatin capture technology (e.g., Hi-C) enables the identification of chromatin-interacting sequence groups in which trans-acting DNA motif groups can be discovered. The difficulty of the problem lies in the combinatorial nature of DNA sequence pattern matching and its underlying sequence pattern search space. METHOD: Here, we propose MotifHub for trans-acting DNA motif group discovery on grouped sequences. Specifically, the main approach is to develop probabilistic modeling that accommodates the stochastic nature of DNA motif patterns. RESULTS: Based on the modeling, we develop global sampling techniques based on EM and Gibbs sampling to address the global optimization challenge of model fitting with latent variables. The results show that our proposed approaches demonstrate promising performance with linear time complexities. CONCLUSION: MotifHub is a novel algorithm that considers the identification of both DNA co-binding motif groups and trans-acting TFs. Our study paves the way for identifying hub TFs of stem cell development (OCT4 and SOX2) and determining potential therapeutic targets of prostate cancer (FOXA1 and MYC). To ensure scientific reproducibility and long-term impact, its matrix-algebra-optimized source code is released at http://bioinfo.cs.cityu.edu.hk/MotifHub.
Affiliation(s)
- Zhe Liu
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Hiu-Man Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Xingjian Chen
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Jiecong Lin
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Shixiong Zhang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Shankai Yan
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Fuzhou Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong, China
3
Liu C, Song J, Ogata H, Akutsu T. MSNet-4mC: learning effective multi-scale representations for identifying DNA N4-methylcytosine sites. Bioinformatics 2022; 38:5160-5167. [PMID: 36205602 DOI: 10.1093/bioinformatics/btac671] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/20/2022] [Revised: 09/09/2022] [Accepted: 10/05/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: N4-methylcytosine (4mC) is an essential kind of epigenetic modification that regulates a wide range of biological processes. However, experimental methods for detecting 4mC sites are time-consuming and labor-intensive. As an alternative, computational methods that automatically identify 4mC sites with data analysis techniques have become a reasonable option. A major challenge is how to develop effective methods that fully exploit the complex interactions within DNA sequences to improve predictive capability. RESULTS: In this work, we propose MSNet-4mC, a lightweight neural network built upon convolutional operations with multi-scale receptive fields to perceive cross-element relationships over both short and long ranges of given DNA sequences. With strong imbalances in the number of candidates in different species in mind, we compute and apply class weights in the cross-entropy loss to balance the training process. Extensive benchmarking experiments show that our method achieves a significant performance improvement and outperforms other state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: The source code and models are freely available for download at https://github.com/LIU-CT/MSNet-4mC, implemented in Python and supported on Linux and Windows. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
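The class-weighting idea mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so the inverse-frequency formula below, the function names, and the toy labels and probabilities are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights (an assumed scheme): n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of the applied weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Toy, made-up data: three negatives (class 0) and one positive (class 1).
labels = [0, 0, 0, 1]
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]

w = class_weights(labels)  # the rare class receives the larger weight
loss = weighted_cross_entropy(probs, labels, w)
print(w, loss)
```

With these weights each class contributes the same total weight to the loss, so the rare class is not drowned out by the frequent one.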
Affiliation(s)
- Chunting Liu
  - Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Hiroyuki Ogata
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Tatsuya Akutsu
  - Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto 606-8501, Japan; Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
4
Wang S, Hu H, Li X. A systematic study of motif pairs that may facilitate enhancer-promoter interactions. J Integr Bioinform 2022; 19:jib-2021-0038. [PMID: 35130376 PMCID: PMC9069648 DOI: 10.1515/jib-2021-0038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/15/2021] [Accepted: 01/20/2022] [Indexed: 01/06/2023] Open
Abstract
Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and contribute to their physical interactions. However, to date, we have limited knowledge about such TF pairs. To fill this void, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer-promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched with motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitated the discovery of the interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions, which facilitate generating meaningful hypotheses for experimental validation.
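As a rough illustration of what "significantly co-occur" can mean for a motif pair, the sketch below counts how often one motif appears in the enhancer and another in the promoter of the same EP pair, then scores the overlap with a hypergeometric upper-tail test. The abstract does not state which statistical test the EPmotifPair pipeline uses, so the choice of test, the function names, and the toy data (motif labels `M1` and `M2`) are assumptions for illustration only.

```python
from math import comb


def cooccurrence_counts(ep_pairs, motif_a, motif_b):
    """Count EP pairs with motif_a in the enhancer, motif_b in the promoter, and both."""
    n_pairs = len(ep_pairs)
    a = sum(motif_a in enh for enh, _ in ep_pairs)
    b = sum(motif_b in pro for _, pro in ep_pairs)
    k = sum(motif_a in enh and motif_b in pro for enh, pro in ep_pairs)
    return n_pairs, a, b, k


def hypergeom_upper_tail(n_pairs, a, b, k):
    """P(X >= k) for X ~ Hypergeometric(population n_pairs, a marked items, b draws)."""
    hi = min(a, b)
    hits = sum(comb(a, i) * comb(n_pairs - a, b - i) for i in range(k, hi + 1))
    return hits / comb(n_pairs, b)


# Toy, made-up EP pairs: (motifs found in the enhancer, motifs found in the promoter).
toy_pairs = [
    ({"M1"}, {"M2"}),
    ({"M1"}, {"M2"}),
    (set(), set()),
    ({"M1"}, set()),
]

p_value = hypergeom_upper_tail(*cooccurrence_counts(toy_pairs, "M1", "M2"))
print(p_value)
```

In a genome-wide screen over many motif pairs, p-values of this kind would also need a multiple-testing correction before calling any pair significant.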
Affiliation(s)
- Saidi Wang
  - Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
- Xiaoman Li
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL, 32816, USA
5
Wong KC, Lin J, Li X, Lin Q, Liang C, Song YQ. Heterodimeric DNA motif synthesis and validations. Nucleic Acids Res 2019; 47:1628-1636. [PMID: 30590725 PMCID: PMC6393289 DOI: 10.1093/nar/gky1297] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/14/2018] [Revised: 12/04/2018] [Accepted: 12/19/2018] [Indexed: 02/06/2023] Open
Abstract
Bound by transcription factors, DNA motifs (i.e. transcription factor binding sites) are prevalent and important for gene regulation in different tissues at different developmental stages of eukaryotes. Although considerable effort has been devoted to elucidating monomeric DNA motif patterns, our knowledge of heterodimeric DNA motifs is still far from complete. Therefore, we propose a computational approach to synthesize a heterodimeric DNA motif from two monomeric DNA motifs. The approach is divided into two sequential components (Phases A and B). In Phase A, we develop inference models of how two monomeric DNA motifs can be oriented and overlapped with each other at the nucleotide level. In Phase B, given the two oriented monomeric DNA motifs, we further develop DNA-binding family-specific input-output hidden Markov models (IOHMMs) to synthesize a heterodimeric DNA motif. To validate the approach, we execute and cross-validate it on 618 experimentally verified heterodimeric DNA motifs across 49 DNA-binding family combinations. We observe that our approach can even "rescue" an existing heterodimeric DNA motif pattern (i.e. HOXB2_EOMES) previously published in Nature. Lastly, we apply the proposed approach to infer previously uncharacterized heterodimeric motifs. Their motif instances are supported by DNase accessibility, gene ontology, protein-protein interactions, in vivo ChIP-seq peaks, and even structural data from PDB. A public web server is built for open accessibility and scientific impact. Its address is http://motif.cs.cityu.edu.hk/custom/MotifKirin.
Affiliation(s)
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Jiecong Lin
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xiangtao Li
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Qiuzhen Lin
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- You-Qiang Song
  - School of Biomedical Sciences, University of Hong Kong, Pokfulam, Hong Kong SAR
6
Wong KC. MotifHyades: expectation maximization for de novo DNA motif pair discovery on paired sequences. Bioinformatics 2017. [DOI: 10.1093/bioinformatics/btx381] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022] Open
Affiliation(s)
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
7
Mourad R, Li L, Cuvier O. Uncovering direct and indirect molecular determinants of chromatin loops using a computational integrative approach. PLoS Comput Biol 2017; 13:e1005538. [PMID: 28542178 PMCID: PMC5462476 DOI: 10.1371/journal.pcbi.1005538] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 12/17/2016] [Revised: 06/07/2017] [Accepted: 04/28/2017] [Indexed: 12/11/2022] Open
Abstract
Chromosomal organization in 3D plays a central role in regulating cell-type specific transcriptional and DNA replication timing programs. Yet it remains unclear to what extent the resulting long-range contacts depend on specific molecular drivers. Here we propose a model that comprehensively assesses the influence on contacts of DNA-binding proteins, cis-regulatory elements and DNA consensus motifs. Using real data, we validate a large number of predictions for long-range contacts involving known architectural proteins and DNA motifs. Our model outperforms existing approaches including enrichment test, random forests and correlation, and it uncovers numerous novel long-range contacts in Drosophila and human. The model uncovers the orientation-dependent specificity for long-range contacts between CTCF motifs in Drosophila, highlighting its conserved property in 3D organization of metazoan genomes. Our model further unravels long-range contacts depending on co-factors recruited to DNA indirectly, as illustrated by the influence of cohesin in stabilizing long-range contacts between CTCF sites. It also reveals asymmetric contacts such as enhancer-promoter contacts that highlight opposite influences of the transcription factors EBF1, EGR1 or MEF2C depending on RNA Polymerase II pausing.
Affiliation(s)
- Raphaël Mourad
  - Laboratoire de Biologie Moléculaire Eucaryote (LBME), CNRS, Université Paul Sabatier (UPS), Toulouse, France
- Lang Li
  - Center for Computational Biology and Bioinformatics (CCBB), Indiana University, Indianapolis, Indiana, United States of America
- Olivier Cuvier
  - Laboratoire de Biologie Moléculaire Eucaryote (LBME), CNRS, Université Paul Sabatier (UPS), Toulouse, France
8
Wong KC. A Novel Approach to Predict Core Residues on Cancer-Related DNA-Binding Domains. Cancer Inform 2016; 15:1-7. [PMID: 27279732 PMCID: PMC4892203 DOI: 10.4137/cin.s39366] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/24/2016] [Revised: 05/04/2016] [Accepted: 05/08/2016] [Indexed: 11/05/2022] Open
Abstract
Protein-DNA interactions are involved in different cancer pathways. In particular, the DNA-binding domains of proteins can determine where and how gene regulatory regions are bound in different cell lines at different stages. Therefore, it is essential to develop a method to predict and locate the core residues on cancer-related DNA-binding domains. In this study, we propose a computational method to predict and locate core residues on DNA-binding domains. In particular, we have selected the cancer-related DNA-binding domains for in-depth studies, namely, winged Helix Turn Helix family, homeodomain family, and basic Helix-Loop-Helix family. The results demonstrate that the proposed method can predict the core residues involved in protein-DNA interactions, as verified by the existing structural data. Given its good performance, various aspects of the method are discussed and explored: for instance, different uses of prediction algorithm, different protein domains, and hotspot threshold setting.
Affiliation(s)
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
9
Zeidler S, Meckbach C, Tacke R, Raad FS, Roa A, Uchida S, Zimmermann WH, Wingender E, Gültas M. Computational Detection of Stage-Specific Transcription Factor Clusters during Heart Development. Front Genet 2016; 7:33. [PMID: 27047536 PMCID: PMC4804722 DOI: 10.3389/fgene.2016.00033] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 11/05/2015] [Accepted: 02/23/2016] [Indexed: 12/28/2022] Open
Abstract
Transcription factors (TFs) regulate gene expression in living organisms. In higher organisms, TFs often interact in non-random combinations with each other to control gene transcription. Understanding these interactions is key to deciphering the mechanisms underlying tissue development. The aim of this study was to analyze co-occurring transcription factor binding sites (TFBSs) in a time series dataset from a new cell-culture model of human heart muscle development, in order to identify common as well as specific co-occurring TFBS pairs in the promoter regions of regulated genes that may be essential for cardiac tissue developmental processes. To this end, we separated the available RNA-seq dataset into five temporally defined groups: (i) mesoderm induction stage; (ii) early cardiac specification stage; (iii) late cardiac specification stage; (iv) early cardiac maturation stage; (v) late cardiac maturation stage, where each of these stages is characterized by unique differentially expressed genes (DEGs). To identify TFBS pairs for each stage, we applied the MatrixCatch algorithm, which is a successful method to deduce experimentally described TFBS pairs in the promoters of the DEGs. Although DEGs in each stage are distinct, our results show that the TFBS pair networks predicted by MatrixCatch for all stages are quite similar. Thus, we extend the results of MatrixCatch utilizing a Markov clustering algorithm (MCL) to perform network analysis. Using our extended approach, we are able to separate the TFBS pair networks into several clusters to highlight stage-specific co-occurrences between TFBSs. Our approach has revealed clusters that are either common (NFAT or HMGIY clusters) or specific (SMAD or AP-1 clusters) for the individual stages. Several of these clusters are likely to play an important role during cardiomyogenesis.
Further, we have shown that the related TFs of TFBSs in the clusters indicate potential synergistic or antagonistic interactions to switch between different stages. Additionally, our results suggest that cardiomyogenesis follows the hourglass model, which was already proven for Arabidopsis and some vertebrates. This investigation helps us to better understand how each stage of cardiomyogenesis is affected by different combinations of TFs. Such knowledge may help to understand basic principles of stem cell differentiation into cardiomyocytes.
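The Markov clustering (MCL) step named in this abstract alternates two matrix operations on a column-stochastic matrix: expansion (squaring) and inflation (elementwise powering followed by renormalization). The sketch below is a minimal pure-Python illustration on a toy graph. The inflation value, the iteration count, the cluster-extraction threshold, and the six-node graph are assumptions for illustration and are not taken from the study.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, inflation=2.0, iterations=50):
    """Minimal Markov clustering: alternate expansion and inflation, then read clusters."""
    n = len(adj)
    # Self-loops keep every column non-empty and damp oscillations.
    m = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)  # expansion: random-walk flow spreads along paths
        m = normalize_columns([[x ** inflation for x in row] for row in m])  # inflation
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph: two disconnected triangles standing in for two groups of TFBSs.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
adj = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1.0

clusters = mcl(adj)
print(clusters)
```

Higher inflation values generally produce more, smaller clusters, so on a real TFBS pair network this parameter would need tuning.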
Affiliation(s)
- Sebastian Zeidler
  - University Medical Center Göttingen, Institute of Bioinformatics, Georg-August-University Göttingen, Göttingen, Germany; Heart Research Center Göttingen, University Medical Center Göttingen, Institute of Pharmacology and Toxicology, Georg-August-University Göttingen, Göttingen, Germany; DZHK (German Centre for Cardiovascular Research), Göttingen, Germany
- Cornelia Meckbach
  - University Medical Center Göttingen, Institute of Bioinformatics, Georg-August-University Göttingen, Göttingen, Germany
- Rebecca Tacke
  - University Medical Center Göttingen, Institute of Bioinformatics, Georg-August-University Göttingen, Göttingen, Germany
- Farah S Raad
  - Heart Research Center Göttingen, University Medical Center Göttingen, Institute of Pharmacology and Toxicology, Georg-August-University Göttingen, Göttingen, Germany; DZHK (German Centre for Cardiovascular Research), Göttingen, Germany
- Angelica Roa
  - Heart Research Center Göttingen, University Medical Center Göttingen, Institute of Pharmacology and Toxicology, Georg-August-University Göttingen, Göttingen, Germany; DZHK (German Centre for Cardiovascular Research), Göttingen, Germany
- Shizuka Uchida
  - Institute of Cardiovascular Regeneration, Goethe University Frankfurt, Frankfurt, Germany; DZHK (German Centre for Cardiovascular Research), Frankfurt, Germany
- Wolfram-Hubertus Zimmermann
  - Heart Research Center Göttingen, University Medical Center Göttingen, Institute of Pharmacology and Toxicology, Georg-August-University Göttingen, Göttingen, Germany; DZHK (German Centre for Cardiovascular Research), Göttingen, Germany
- Edgar Wingender
  - University Medical Center Göttingen, Institute of Bioinformatics, Georg-August-University Göttingen, Göttingen, Germany; DZHK (German Centre for Cardiovascular Research), Göttingen, Germany
- Mehmet Gültas
  - University Medical Center Göttingen, Institute of Bioinformatics, Georg-August-University Göttingen, Göttingen, Germany