1. Sun Z, Wang M, Wang S, Kwong S. LEC-Codec: Learning-Based Genome Data Compression. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:2447-2458. [PMID: 39361454 DOI: 10.1109/tcbb.2024.3473899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
In this paper, we propose a Learning-based gEnome Codec (LEC), which is designed for high efficiency and enhanced flexibility. The LEC integrates several advanced technologies, including Group of Bases (GoB) compression, multi-stride coding and bidirectional prediction, all of which aim to optimize the balance between coding complexity and performance in lossless compression. The model used in the proposed codec is data-driven: it relies on deep neural networks to infer probabilities for each symbol, enabling fully parallel encoding and decoding with configurable complexity for diverse applications. Across a set of configurations trading off compression ratio against inference speed, experimental results show that the proposed method is very efficient in terms of compression performance and provides improved flexibility in real-world applications.
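To make the link between per-symbol probabilities and compressed size concrete, the sketch below computes the number of bits an ideal entropy coder would spend when each base is coded with probability p, namely the sum of -log2(p). It is an illustration only: the probabilities come from a simple adaptive order-k count model with Laplace smoothing, which is an assumption standing in for the neural network described in the abstract, and the names `ideal_code_length_bits` and `bits_per_base` are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def ideal_code_length_bits(seq, order=2):
    """Bits an ideal entropy coder would spend on seq under an adaptive
    order-k count model (a stand-in for a learned probability model)."""
    # One Laplace-smoothed count vector per context of up to `order` bases.
    counts = defaultdict(lambda: [1] * len(ALPHABET))
    total_bits = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - order):i]
        ctx_counts = counts[context]
        idx = ALPHABET.index(base)
        p = ctx_counts[idx] / sum(ctx_counts)  # predicted probability of this base
        total_bits += -math.log2(p)            # ideal cost of coding it
        ctx_counts[idx] += 1                   # update the model after coding
    return total_bits

def bits_per_base(seq, order=2):
    """Average ideal cost per base; 2.0 is the uncompressed baseline for ACGT."""
    return ideal_code_length_bits(seq, order) / len(seq)

if __name__ == "__main__":
    print(bits_per_base("ACGT" * 50))
```

A highly repetitive sequence costs far fewer than 2 bits per base under this model, because the predicted probabilities sharpen as the counts accumulate. A better predictor lowers the cost further, which is the quantity a learned model tries to minimize.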
2. Zeng Z, Hu J, Xiao G, Liu Y, Jia D, Wu G, Xie C, Li S, Bi X. Integrating network toxicology and molecular docking to explore the toxicity of the environmental pollutant butyl hydroxyanisole: An example of induction of chronic urticaria. Heliyon 2024; 10:e35409. [PMID: 39170477 PMCID: PMC11336633 DOI: 10.1016/j.heliyon.2024.e35409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 07/08/2024] [Accepted: 07/29/2024] [Indexed: 08/23/2024]
Abstract
The study aimed to comprehensively investigate the potential toxicity of environmental pollutants and the underlying molecular mechanisms, focusing on chronic urticaria (CU) induced by butylated hydroxyanisole (BHA) exposure, and to raise public awareness of the potential risks of environmental pollutants. We used the ChEMBL, STITCH, and SwissTargetPrediction databases to predict the targets of BHA, and the CTD, GeneCards, and OMIM databases to collect the relevant targets of CU. Ultimately, we identified 81 potential targets of BHA-induced CU and extracted 31 core targets, including TNF, SRC, CASP3, BCL2, IL2, and MMP9. GO and KEGG enrichment analyses revealed that these core targets were predominantly involved in cancer signaling, estrogen and endocrine resistance pathways. Furthermore, molecular docking confirmed the ability of BHA to bind to the core targets. BHA may drive the onset and development of CU by affecting multiple immune signaling pathways. Our study elucidated the molecular mechanisms of BHA toxicity and its role in CU induction, providing a basis for preventing and treating chronic urticaria associated with environmental BHA exposure.
Affiliation(s)
- Zhihao Zeng
  - School of the Fifth Clinical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, 510405, China
- Jiaoting Hu
  - Artemisinin Research Center, Guangzhou University of Chinese Medicine, Guangzhou, 510405, China
- Guanlin Xiao
  - Guangdong Provincial Engineering and Technology Research Institute of Traditional Chinese Medicine/Guangdong Provincial Key Laboratory of Research and Development in Traditional Chinese Medicine, Guangzhou, 510095, China
- Yanchang Liu
  - School of the Fifth Clinical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, 510405, China
- Dezheng Jia
  - School of the Fifth Clinical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, 510405, China
- Guangying Wu
  - School of the Fifth Clinical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, 510405, China
- Canhui Xie
  - School of the Fifth Clinical Medicine, Guangzhou University of Chinese Medicine, Guangzhou, 510405, China
- Sumei Li
  - Guangdong Provincial Engineering and Technology Research Institute of Traditional Chinese Medicine/Guangdong Provincial Key Laboratory of Research and Development in Traditional Chinese Medicine, Guangzhou, 510095, China
- Xiaoli Bi
  - Guangdong Provincial Engineering and Technology Research Institute of Traditional Chinese Medicine/Guangdong Provincial Key Laboratory of Research and Development in Traditional Chinese Medicine, Guangzhou, 510095, China
3. Peng H, Xu J, Liu K, Liu F, Zhang A, Zhang X. EIEPCF: accurate inference of functional gene regulatory networks by eliminating indirect effects from confounding factors. Brief Funct Genomics 2024; 23:373-383. [PMID: 37642217 DOI: 10.1093/bfgp/elad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/05/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023]
Abstract
Reconstructing functional gene regulatory networks (GRNs) is a primary prerequisite for understanding pathogenic mechanisms and curing diseases in animals, and it also provides an important foundation for cultivating vegetable and fruit varieties that are resistant to diseases and corrosion in plants. Many computational methods have been developed to infer GRNs, but most of the regulatory relationships between genes obtained by these methods are biased. Eliminating indirect effects in GRNs remains a significant challenge for researchers. In this work, we propose a novel approach for inferring functional GRNs, named EIEPCF (eliminating indirect effects produced by confounding factors). The method removes the influence of confounding factors on regulatory factors and target genes by measuring the similarity between their residuals. Validation of EIEPCF on simulation studies, the gold-standard networks provided by the DREAM3 Challenge and the real gene networks of Escherichia coli demonstrates that it achieves significantly higher accuracy than other popular computational methods for inferring GRNs. As a case study, we used EIEPCF to reconstruct the cold-resistance-specific GRN from cold-resistance gene expression data in Arabidopsis thaliana. The source code and data are available at https://github.com/zhanglab-wbgcas/EIEPCF.
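The idea of comparing residuals can be illustrated with a small sketch: regress both the regulator and the target on the confounders, then measure the similarity of what is left. This is not the EIEPCF implementation (see the repository above for that). The use of ordinary least squares and Pearson correlation is an assumption made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def residual(y, confounders):
    """Residual of y after least-squares regression on the confounders.
    An intercept column is added; OLS is an assumption of this sketch."""
    X = np.column_stack([np.ones(len(y)), confounders])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def residual_correlation(x, y, confounders):
    """Pearson correlation between the residuals of x and y
    (Pearson is a stand-in for whichever similarity measure is used)."""
    rx = residual(x, confounders)
    ry = residual(y, confounders)
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy data: x and y are both driven by the confounder z and by nothing else.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + 0.1 * rng.normal(size=500)
y = z + 0.1 * rng.normal(size=500)

if __name__ == "__main__":
    print("raw correlation:     ", float(np.corrcoef(x, y)[0, 1]))
    print("residual correlation:", residual_correlation(x, y, z))
```

In the toy data the raw correlation between x and y is high even though neither regulates the other; once z is regressed out, the residuals are close to uncorrelated, so the indirect edge would be dropped.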
Affiliation(s)
- Huixiang Peng
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
  - University of Chinese Academy of Sciences, Beijing 100049 China
- Jing Xu
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
  - University of Chinese Academy of Sciences, Beijing 100049 China
- Kangchen Liu
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
  - University of Chinese Academy of Sciences, Beijing 100049 China
- Fang Liu
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Aidi Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
- Xiujun Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074 China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan 430074, China
4. Subbaroyan A, Sil P, Martin OC, Samal A. Leveraging developmental landscapes for model selection in Boolean gene regulatory networks. Brief Bioinform 2023; 24:7145905. [PMID: 37114653 DOI: 10.1093/bib/bbad160] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/25/2023] [Revised: 03/26/2023] [Accepted: 04/03/2023] [Indexed: 04/29/2023]
Abstract
Boolean models are a well-established framework to model developmental gene regulatory networks (DGRNs) for acquisition of cellular identities. During the reconstruction of Boolean DGRNs, even if the network structure is given, there is generally a large number of combinations of Boolean functions that will reproduce the different cell fates (biological attractors). Here we leverage the developmental landscape to enable model selection on such ensembles using the relative stability of the attractors. First, we show that previously proposed measures of relative stability are strongly correlated, and we stress the usefulness of the one that best captures the cell state transitions via the mean first passage time (MFPT), as it also allows the construction of a cellular lineage tree. A property of great computational importance is the insensitivity of the different stability measures to changes in noise intensities. This allows us to use stochastic approaches to estimate the MFPT and thereby scale up the computations to large networks. Given this methodology, we revisit different Boolean models of Arabidopsis thaliana root development, showing that a most recent one does not respect the biologically expected hierarchy of cell states based on relative stabilities. We therefore developed an iterative greedy algorithm that searches for models which satisfy the expected hierarchy of cell states and found that its application to the root development model yields many models that meet this expectation. Our methodology thus provides new tools that can enable reconstruction of more realistic and accurate Boolean models of DGRNs.
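For readers unfamiliar with the term, the attractors of a Boolean network are the fixed points and cycles that every trajectory eventually falls into. The sketch below enumerates them by brute force for a two-gene toy network. It assumes synchronous updating and a hypothetical mutual-inhibition rule (`toggle_update`), neither of which is taken from the paper, and it does not compute relative stability or the MFPT.

```python
from itertools import product

def find_attractors(update, n_genes):
    """Enumerate attractors of a synchronous Boolean network by following
    the trajectory from every one of the 2**n_genes states."""
    attractors = set()
    for state in product((0, 1), repeat=n_genes):
        seen = {}        # state -> position at which it first appeared
        trajectory = []
        while state not in seen:
            seen[state] = len(trajectory)
            trajectory.append(state)
            state = update(state)
        # The trajectory re-entered a visited state: the tail is the attractor.
        cycle = trajectory[seen[state]:]
        # Rotate to a canonical start so each cycle is stored only once.
        start = cycle.index(min(cycle))
        attractors.add(tuple(cycle[start:] + cycle[:start]))
    return attractors

def toggle_update(state):
    """Hypothetical toy rule: two genes that repress each other."""
    a, b = state
    return (int(not b), int(not a))

if __name__ == "__main__":
    for attractor in sorted(find_attractors(toggle_update, 2)):
        print(attractor)
```

The toy network has two fixed points, (0, 1) and (1, 0), which play the role of two cell fates, plus a two-state cycle between (0, 0) and (1, 1) that arises from the synchronous update. Exhaustive enumeration is only feasible for small networks, which is one reason stochastic estimates are attractive at scale.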
Affiliation(s)
- Ajay Subbaroyan
  - The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
  - Homi Bhabha National Institute (HBNI), Mumbai, 400094, India
- Priyotosh Sil
  - The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
  - Homi Bhabha National Institute (HBNI), Mumbai, 400094, India
- Olivier C Martin
  - Université Paris-Saclay, CNRS, INRAE, Univ Evry, Institute of Plant Sciences Paris-Saclay (IPS2), 91405, Orsay, France
  - Université de Paris, CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), 91405, Orsay, France
- Areejit Samal
  - The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
  - Homi Bhabha National Institute (HBNI), Mumbai, 400094, India
5. Gomes R, Denison Kroschel A, Day S, Jansen R. High variation across E. coli hybrid isolates identified in metabolism-related biological pathways co-expressed with virulent genes. Gut Microbes 2023; 15:2228042. [PMID: 37417543 PMCID: PMC10332235 DOI: 10.1080/19490976.2023.2228042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Accepted: 06/12/2023] [Indexed: 07/08/2023]
Abstract
Virulent genes present in Escherichia coli (E. coli) can cause significant human diseases. These enteropathogenic E. coli (EPEC) and enterotoxigenic E. coli (ETEC) isolates with virulent genes show different expression levels when grown under diverse laboratory conditions. In this research, we have performed differential gene expression analysis using publicly available RNA-seq data on three pathogenic E. coli hybrid isolates in an attempt to characterize the variation in gene interactions that are altered by the presence or absence of virulent factors within the genome. Almost 26.7% of the common genes across these strains were found to be differentially expressed. Out of the 88 differentially expressed genes with virulent factors identified from PATRIC, nine were common in all these strains. A combination of Weighted Gene Co-Expression Network Analysis and Gene Ontology Enrichment Analysis reveals significant differences in gene co-expression involving virulent genes common among the three investigated strains. The co-expression pattern is observed to be especially variable among biological pathways involving metabolism-related genes. This suggests a potential difference in resource allocation or energy generation across the three isolates based on genomic variation.
Affiliation(s)
- Rahul Gomes
  - Department of Computer Science, University of Wisconsin-Eau Claire, Eau Claire, WI, USA
- Stephanie Day
  - Department of Earth, Environment, and Geospatial Sciences, North Dakota State University, Fargo, ND, USA
- Rick Jansen
  - Masonic Cancer Center, University of Minnesota-Twin Cities, Minneapolis, MN, USA
6. Jia Z, Zhang X. Accurate determination of causalities in gene regulatory networks by dissecting downstream target genes. Front Genet 2022; 13:923339. [PMID: 36568360 PMCID: PMC9768335 DOI: 10.3389/fgene.2022.923339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 11/08/2022] [Indexed: 12/12/2022]
Abstract
Accurate determination of causalities between genes is a challenge in the inference of gene regulatory networks (GRNs) from the gene expression profile. Although many methods have been developed for the reconstruction of GRNs, most of them are insufficient in determining causalities or regulatory directions. In this work, we present a novel method, namely, DDTG, to improve the accuracy of causality determination in GRN inference by dissecting downstream target genes. In the proposed method, the topology and hierarchy of GRNs are determined by mutual information and conditional mutual information, and the regulatory directions of GRNs are determined by Taylor formula-based regression. In addition, indirect interactions are removed with the sparseness of the network topology to improve the accuracy of network inference. The method is validated on the benchmark GRNs from DREAM3 and DREAM4 challenges. The results demonstrate the superior performance of the DDTG method on causality determination of GRNs compared to some popular GRN inference methods. This work provides a useful tool to infer the causal gene regulatory network.
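The role of mutual information (MI) and conditional mutual information (CMI) in separating direct from indirect links can be sketched as follows. This is a generic plug-in estimator for discrete data, not the DDTG procedure. It assumes expression values have already been discretized, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(*columns):
    """Joint Shannon entropy, in bits, of one or more discrete sequences,
    estimated from observed frequencies (plug-in estimator)."""
    joint = Counter(zip(*columns))
    n = sum(joint.values())
    return -sum(c / n * math.log2(c / n) for c in joint.values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

if __name__ == "__main__":
    # Toy chain x -> z -> y in which each gene simply copies its regulator.
    x = [0, 0, 1, 1]
    z = list(x)
    y = list(z)
    print(mutual_information(x, y))                 # dependence between x and y
    print(conditional_mutual_information(x, y, z))  # dependence left once z is known
```

In the toy chain, x and y share one full bit of information, yet the CMI given the intermediate gene z is zero. A link that vanishes under conditioning in this way is a candidate indirect interaction.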
Affiliation(s)
- Zhigang Jia
  - School of Mathematics and Statistics, Xinyang Normal University, Xinyang, China
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
- Xiujun Zhang
  - Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China
  - Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, China
  - *Correspondence: Xiujun Zhang