1
Chuah CW, He W, Huang DS. GMean-a semi-supervised GRU and K-mean model for predicting the TF binding site. Sci Rep 2024; 14:2539. [PMID: 38291225 PMCID: PMC10827707 DOI: 10.1038/s41598-024-52933-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Accepted: 01/25/2024] [Indexed: 02/01/2024] Open
Abstract
A transcription factor binding site is a deoxyribonucleic acid (DNA) sequence to which transcription factors bind. Transcription factors are proteins that regulate gene transcription, and abnormal turnover of transcription factors can lead to uncontrolled cell growth. Discovering the relationships between transcription factors and DNA sequences is therefore an important component of bioinformatics research. Numerous deep learning and machine learning language models have been developed for this task. In this work, we propose GMean, a model for making predictions on unlabelled DNA sequences. GMean is a hybrid model that combines a gated recurrent unit (GRU) with K-mean clustering and is developed in three phases. Both labelled and unlabelled data are processed using k-mers and tokenization. The labelled data are used for training; the unlabelled data are used for testing and prediction. The experimental data consist of DNA experimental data from the GM12878, K562 and HepG2 cell lines. The results show that GMean is feasible and effective in predicting DNA sequences: the highest accuracy, 91.85%, is obtained for prediction between K562 and HepG2, followed by 89.13% for prediction between GM12878 and K562. The lowest accuracy, 88.80%, is obtained for prediction between HepG2 and GM12878.
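The k-mer and tokenization step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the value of k, the stride, or the vocabulary scheme, so `k=3`, `stride=1`, the helper names, and the use of id 0 for unknown k-mers are all assumptions made for illustration, not the authors' preprocessing.

```python
from itertools import product


def kmers(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers (k and stride are assumed values)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an integer id starting at 1.

    Id 0 is reserved (an assumption) for k-mers containing other symbols, such as N.
    """
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def tokenize(seq, vocab, k=3):
    """Convert a DNA sequence into a list of integer token ids, one per k-mer."""
    return [vocab.get(kmer, 0) for kmer in kmers(seq, k)]


vocab = build_vocab(3)
print(kmers("ACGTAC", 3))
print(tokenize("ACGTAC", vocab))
```

Integer sequences of this kind are the usual input to an embedding layer in front of a recurrent network such as a GRU; how GMean feeds them to its GRU and clustering stages is not described in the abstract.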
Affiliation(s)
- Chai Wen Chuah
  - Guangdong University of Science and Technology, Songsan Hu, Dongguang, 523070, Guangdong, China
- Wanxian He
  - Guangxi Academy of Sciences, 98 Daling Road, Nanning, 530007, Guangxi, China
- De-Shuang Huang
  - Eastern Institute for Advanced Study, Eastern Institute of Technology, Tongxin Road No. 568, Ningbo, 315201, China
2
Kartiganer Z, Rojas G, Riccio M, Tyree A, Noronha K, Wetzel M, Barnett J, McGann J, Garbarino J, Massucci D, Chafi NS, Decker S, McDaniels A, Sabina J, Levchenko D, Perez J, Ng C, Wang K. Improved cell-type identification and comprehensive mapping of regulatory features with spatial epigenomics 96-channel microfluidic platform. GEN BIOTECHNOLOGY 2023; 2:503-514. [PMID: 39380764 PMCID: PMC11460376 DOI: 10.1089/genbio.2023.0044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/10/2024]
Abstract
Gene expression is subject to epigenetic regulation and is dependent upon cellular context. Spatial omics tools can provide insight into cellular context; however, development has centered on spatial transcriptomics and proteomics. Deterministic barcoding in tissue for spatial omics sequencing (DBiT-seq) was the first spatial epigenomics platform at the cellular level. Here we present a comparison of spatial epigenomic profiling on both 50-channel and 96-channel platforms. The new 96-channel microfluidics chip design greatly improved precision in cell typing and identification of regulatory elements by spatial-ATAC-seq. Spatial mapping reveals the complexity of glial cell and neuronal localization within brain structures as well as cis-regulatory elements controlling cellular function. This technology streamlines spatial analysis of the epigenome and contributes a new layer of spatial omics to uncover the context-dependent regulatory mechanisms underpinning development, disease, and normal cellular function.
Affiliation(s)
- Katelyn Noronha
  - Commercial, AtlasXomics, Inc., New Haven, CT
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT
- James McGann
  - Bioinformatics, AtlasXomics, Inc., New Haven, CT
- David Levchenko
  - Commercial, AtlasXomics, Inc., New Haven, CT
  - Lafayette College, Easton, PA
- Jose Perez
  - Engineering, AtlasXomics, Inc., New Haven, CT
- Colin Ng
  - Commercial, AtlasXomics, Inc., New Haven, CT
3
Zou Z, Yoshimura Y, Yamanishi Y, Oki S. Elucidating disease-associated mechanisms triggered by pollutants via the epigenetic landscape using large-scale ChIP-Seq data. Epigenetics Chromatin 2023; 16:34. [PMID: 37743474 PMCID: PMC10518938 DOI: 10.1186/s13072-023-00510-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 09/19/2023] [Indexed: 09/26/2023] Open
Abstract
BACKGROUND Despite well-documented effects on human health, the action modes of environmental pollutants are incompletely understood. Although transcriptome-based approaches are widely used to predict associations between chemicals and disorders, the molecular cues regulating pollutant-derived gene expression changes remain unclear. Therefore, we developed a data-mining approach, termed "DAR-ChIPEA," to identify transcription factors (TFs) playing pivotal roles in the action modes of pollutants. METHODS Large-scale public ChIP-Seq data (human, n = 15,155; mouse, n = 13,156) were used to predict TFs that are enriched in the pollutant-induced differentially accessible genomic regions (DARs) obtained from epigenome analyses (ATAC-Seq). The resultant pollutant-TF matrices were then cross-referenced to a repository of TF-disorder associations to account for pollutant modes of action. We subsequently evaluated the performance of the proposed method using a chemical perturbation data set to compare the outputs of the DAR-ChIPEA and our previously developed differentially expressed gene (DEG)-ChIPEA methods using pollutant-induced DEGs as input. We then adopted the proposed method to predict disease-associated mechanisms triggered by pollutants. RESULTS The proposed approach outperformed other methods using the area under the receiver operating characteristic curve score. The mean score of the proposed DAR-ChIPEA was significantly higher than that of our previously described DEG-ChIPEA (0.7287 vs. 0.7060; Q = 5.278 × 10⁻⁴²; two-tailed Wilcoxon rank-sum test). The proposed approach further predicted TF-driven modes of action upon pollutant exposure, indicating that (1) TFs regulating Th1/2 cell homeostasis are integral in the pathophysiology of tributyltin-induced allergic disorders; (2) fine particulates (PM2.5) inhibit the binding of C/EBPs, Rela, and Spi1 to the genome, thereby perturbing normal blood cell differentiation and leading to immune dysfunction; and (3) lead induces fatty liver by disrupting the normal regulation of lipid metabolism by altering hepatic circadian rhythms. CONCLUSIONS Highlighting genome-wide chromatin change upon pollutant exposure to elucidate the epigenetic landscape of pollutant responses outperformed our previously described method that focuses on gene-adjacent domains only. Our approach has the potential to reveal pivotal TFs that mediate deleterious effects of pollutants, thereby facilitating the development of strategies to mitigate damage from environmental pollution.
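The evaluation metric named in this abstract, the area under the receiver operating characteristic curve (AUROC), can be computed directly from rank sums, which is also the statistic underlying the Wilcoxon rank-sum test. The sketch below is a generic illustration of that relationship on made-up scores; the function name `auroc` and the example data are hypothetical, and this is not the authors' evaluation pipeline.

```python
def auroc(scores, labels):
    """AUROC from the rank-sum (Mann-Whitney U) statistic.

    scores: predicted scores, higher meaning more likely positive.
    labels: 1 for positive, 0 for negative.
    Tied scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the group of tied scores starting at i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # U statistic of the positives
    return u / (n_pos * n_neg)


# Hypothetical toy data: two negatives and two positives.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

In the toy data, three of the four positive-negative pairs are ranked correctly, which gives 0.75. The abstract's Wilcoxon test compares the distributions of such scores between two methods, a separate step that this sketch does not perform.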
Affiliation(s)
- Zhaonan Zou
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, 53 Shogoin Kawahara-Cho, Sakyo-Ku, Kyoto, 606-8507, Japan
- Yuka Yoshimura
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, 53 Shogoin Kawahara-Cho, Sakyo-Ku, Kyoto, 606-8507, Japan
- Yoshihiro Yamanishi
  - Department of Complex Systems Science, Graduate School of Informatics, Nagoya University, Furo-Cho, Chikusa-Ku, Nagoya, 464-8602, Japan
- Shinya Oki
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, 53 Shogoin Kawahara-Cho, Sakyo-Ku, Kyoto, 606-8507, Japan
4
Iwata M, Kosai K, Ono Y, Oki S, Mimori K, Yamanishi Y. Regulome-based characterization of drug activity across the human diseasome. NPJ Syst Biol Appl 2022; 8:44. [DOI: 10.1038/s41540-022-00255-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 10/21/2022] [Indexed: 11/09/2022] Open
Abstract
Drugs are expected to restore the cell system from an impaired state to normalcy through disease treatment. However, the understanding of the gene regulatory machinery underlying drug activity and disease pathogenesis is far from complete. Here, we perform large-scale regulome analysis of various diseases in terms of gene regulatory machinery. Transcriptome signatures were converted into regulome signatures of transcription factors by integrating publicly available ChIP-seq data. Regulome-based correlations between diseases and their approved drugs were much clearer than the transcriptome-based correlations. For example, an inverse correlation was observed for cancers, whereas a positive correlation was observed for immune system diseases. After demonstrating the usefulness of the regulome-based drug discovery method in terms of accuracy and applicability, we predicted new drugs for non-small cell lung cancer and validated their anticancer activity in vitro. The proposed method is useful for understanding disease–disease relationships and for drug discovery.
5
Zou Z, Ohta T, Miura F, Oki S. ChIP-Atlas 2021 update: a data-mining suite for exploring epigenomic landscapes by fully integrating ChIP-seq, ATAC-seq and Bisulfite-seq data. Nucleic Acids Res 2022; 50:W175-W182. [PMID: 35325188 PMCID: PMC9252733 DOI: 10.1093/nar/gkac199] [Citation(s) in RCA: 203] [Impact Index Per Article: 67.7] [Received: 01/11/2022] [Revised: 02/21/2022] [Accepted: 03/22/2022] [Indexed: 01/07/2023] Open
Abstract
ChIP-Atlas (https://chip-atlas.org) is a web service providing both GUI- and API-based data-mining tools to reveal the architecture of the transcription regulatory landscape. ChIP-Atlas is powered by comprehensively integrating all data sets from high-throughput ChIP-seq and DNase-seq, a method for profiling chromatin regions accessible to DNase. In this update, we further collected all the ATAC-seq and whole-genome bisulfite-seq data for six model organisms (human, mouse, rat, fruit fly, nematode, and budding yeast) with the latest genome assemblies. These together with ChIP-seq data can be visualized with the Peak Browser tool and a genome browser to explore the epigenomic landscape of a query genomic locus, such as its chromatin accessibility, DNA methylation status, and protein–genome interactions. This epigenomic landscape can also be characterized for multiple genes and genomic loci by querying with the Enrichment Analysis tool, which, for example, revealed that inflammatory bowel disease-associated SNPs are the most significantly hypo-methylated in neutrophils. Therefore, ChIP-Atlas provides a panoramic view of the whole epigenomic landscape. All datasets are free to download via either a simple button on the web page or an API.
Affiliation(s)
- Zhaonan Zou
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, 53 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
  - Kyoto University Graduate Program for Medical Innovation, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
  - Kyoto University Graduate Division, Yoshida-Nihonmatsu-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Tazro Ohta
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Yata 1111, Mishima, Shizuoka 411-8540, Japan
- Fumihito Miura
  - Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, 3-1-1 Maidashi, Higashi-ku, Fukuoka 812-8582, Japan
- Shinya Oki
  - Department of Drug Discovery Medicine, Kyoto University Graduate School of Medicine, 53 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
  - Precursory Research for Embryonic Science and Technology, Japan Science and Technology Agency, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan