1
Zhang Y, Zhao J, Sun X, Zheng Y, Chen T, Wang Z. Leveraging independent component analysis to unravel transcriptional regulatory networks: A critical review and future directions. Biotechnol Adv 2025; 78:108479. [PMID: 39577573 DOI: 10.1016/j.biotechadv.2024.108479] [Received: 08/23/2024] [Revised: 11/11/2024] [Accepted: 11/14/2024] [Indexed: 11/24/2024]
Abstract
Transcriptional regulatory networks (TRNs) play a crucial role in exploring microbial life activities and complex regulatory mechanisms. The comprehensive reconstruction of TRNs requires the integration of large-scale experimental data, which poses significant challenges due to the complexity of regulatory relationships. The application of machine learning tools, such as clustering analysis, has been employed to investigate TRNs, but these methods have limitations in capturing both global and local co-expression effects. In contrast, Independent Component Analysis (ICA) has emerged as a powerful analysis algorithm for modularizing independently regulated gene sets in TRNs, allowing it to account for both global and local co-expression effects. In this review, we comprehensively summarize the application of ICA in unraveling TRNs and highlight the research progress in three key aspects: (1) extending TRNs with iModulon analysis; (2) elucidating the regulatory mechanisms triggered by environmental perturbation; and (3) exploring the mechanisms of transcriptional regulation triggered by changes in microbial physiological state. At the end of this review, we also address the challenges facing ICA in TRN analysis and outline future research directions to promote the advancement of ICA-based transcriptomics analysis in biotechnology and related fields.
Affiliation(s)
- Yuhan Zhang
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Jianxiao Zhao
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Xi Sun
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; School of Life Science, Ningxia University, Yinchuan 750021, China
- Yangyang Zheng
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Tao Chen
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China
- Zhiwen Wang
- Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China; School of Life Science, Ningxia University, Yinchuan 750021, China.
2
Yuan W, Li Y, Han Z, Chen Y, Xie J, Chen J, Bi Z, Xi J. Evolutionary Mechanism Based Conserved Gene Expression Biclustering Module Analysis for Breast Cancer Genomics. Biomedicines 2024; 12:2086. [PMID: 39335599 PMCID: PMC11428256 DOI: 10.3390/biomedicines12092086] [Received: 06/23/2024] [Revised: 08/23/2024] [Accepted: 09/02/2024] [Indexed: 09/30/2024]
Abstract
The identification of significant gene biclusters with particular expression patterns and the elucidation of functionally related genes within gene expression data has become a critical concern due to the vast amount of gene expression data generated by RNA sequencing technology. In this paper, a Conserved Gene Expression Module based on Genetic Algorithm (CGEMGA) is proposed. Breast cancer data from the TCGA database is used as the subject of this study. The p-values from Fisher's exact test are used as evaluation metrics to demonstrate the significance of different algorithms, including the Cheng and Church algorithm, CGEM algorithm, etc. In addition, the F-test is used to investigate the difference between our method and the CGEM algorithm. The computational cost of the different algorithms is further investigated by calculating the running time of each algorithm. Finally, the established driver genes and cancer-related pathways are used to validate the process. The results of 10 independent runs demonstrate that CGEMGA has a superior average p-value of 1.54 × 10⁻⁴ ± 3.06 × 10⁻⁵ compared to all other algorithms. Furthermore, our approach exhibits consistent performance across all methods. The F-test yields a p-value of 0.039, indicating a significant difference between our approach and the CGEM. Computational cost statistics also demonstrate that our approach has a significantly shorter average runtime of 5.22 × 10⁰ ± 1.65 × 10⁻¹ s compared to the other algorithms. Enrichment analysis indicates that the genes in our approach are significantly enriched for driver genes. Our algorithm is fast and robust, efficiently extracting co-expressed genes and associated co-expression condition biclusters from RNA-seq data.
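As a rough illustration of the kind of enrichment test this abstract refers to, the sketch below computes a one-sided Fisher's exact test p-value for the overlap between the genes in a bicluster and a set of known driver genes. It is not the authors' implementation: the function name `fisher_enrichment_p` and all counts in the usage lines are hypothetical, and the abstract does not say whether a one-sided or two-sided test was used.

```python
from math import comb


def fisher_enrichment_p(overlap, cluster_size, n_drivers, n_genes):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns P(X >= overlap), where X is the number of driver genes found
    when drawing `cluster_size` genes without replacement from `n_genes`
    genes, of which `n_drivers` are drivers.
    """
    upper = min(cluster_size, n_drivers)  # largest overlap that is possible
    total = comb(n_genes, cluster_size)   # number of equally likely draws
    tail = sum(
        comb(n_drivers, i) * comb(n_genes - n_drivers, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers, not taken from the paper:
# 20000 background genes, 500 known drivers, a 100-gene bicluster, 12 drivers in it.
p = fisher_enrichment_p(overlap=12, cluster_size=100, n_drivers=500, n_genes=20000)
print(f"enrichment p-value: {p:.3e}")
```

A smaller p-value means the bicluster contains more driver genes than would be expected from a random gene set of the same size.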
Affiliation(s)
- Wei Yuan
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Yaming Li
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Zhengpan Han
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Yu Chen
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Jinnan Xie
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Jianguo Chen
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Zhisheng Bi
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Jianing Xi
- School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
3
You C, Jiang S, Ding Y, Ye S, Zou X, Zhang H, Li Z, Chen F, Li Y, Ge X, Guo X. RNA barcode segments for SARS-CoV-2 identification from HCoVs and SARSr-CoV-2 lineages. Virol Sin 2024; 39:156-168. [PMID: 38253258 PMCID: PMC10877444 DOI: 10.1016/j.virs.2024.01.006] [Received: 04/08/2023] [Accepted: 01/17/2024] [Indexed: 01/24/2024]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the pathogen responsible for coronavirus disease 2019 (COVID-19), continues to evolve, giving rise to more variants and global reinfections. Previous research has demonstrated that barcode segments can effectively and cost-efficiently identify specific species within closely related populations. In this study, we designed and tested RNA barcode segments based on genetic evolutionary relationships to facilitate the efficient and accurate identification of SARS-CoV-2 from extensive virus samples, including human coronaviruses (HCoVs) and SARSr-CoV-2 lineages. Nucleotide sequences sourced from NCBI and GISAID were meticulously selected and curated to construct training sets, encompassing 1733 complete genome sequences of HCoVs and SARSr-CoV-2 lineages. Through genetic-level species testing, we validated the accuracy and reliability of the barcode segments for identifying SARS-CoV-2. Subsequently, 75 main and subordinate species-specific barcode segments for SARS-CoV-2, located in ORF1ab, S, E, ORF7a, and N coding sequences, were intercepted and screened based on single-nucleotide polymorphism sites and weighted scores. Post-testing, these segments exhibited high recall rates (nearly 100%), specificity (almost 30% at the nucleotide level), and precision (100%) performance on identification. They were eventually visualized using one- and two-dimensional combined barcodes and deposited in an online database (http://virusbarcodedatabase.top/). The successful integration of barcoding technology in SARS-CoV-2 identification provides valuable insights for future studies involving complete genome sequence polymorphism analysis. Moreover, this cost-effective and efficient identification approach also provides a valuable reference for future research endeavors related to virus surveillance.
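For readers unfamiliar with the three performance figures quoted in this abstract, the sketch below shows the standard confusion-matrix definitions of recall, specificity, and precision. It is only an illustration under stated assumptions: the function name `identification_metrics` and the counts in the usage lines are hypothetical, and the abstract's nucleotide-level specificity may be defined differently in the paper.

```python
def identification_metrics(tp, fp, tn, fn):
    """Recall, specificity and precision from confusion-matrix counts.

    tp: target sequences correctly identified as the target
    fp: non-target sequences wrongly identified as the target
    tn: non-target sequences correctly rejected
    fn: target sequences that were missed
    A metric whose denominator is zero is reported as NaN.
    """
    nan = float("nan")
    return {
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
    }


# Hypothetical counts, not taken from the paper.
metrics = identification_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall and precision both depend on the true positives, so they can be high together, while specificity is computed only from the non-target sequences.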
Affiliation(s)
- Changqiao You
- College of Biology, Hunan University, Changsha, 410082, China
- Shuai Jiang
- College of Biology, Hunan University, Changsha, 410082, China
- Yunyun Ding
- College of Biology, Hunan University, Changsha, 410082, China
- Shunxing Ye
- College of Bioscience and Biotechnology, Hunan Agricultural University, Changsha, 410128, China
- Xiaoxiao Zou
- College of Biology, Hunan University, Changsha, 410082, China
- Hongming Zhang
- College of Biology, Hunan University, Changsha, 410082, China
- Zeqi Li
- College of Biology, Hunan University, Changsha, 410082, China
- Fenglin Chen
- College of Biology, Hunan University, Changsha, 410082, China
- Yongliang Li
- College of Biology, Hunan University, Changsha, 410082, China.
- Xingyi Ge
- College of Biology, Hunan University, Changsha, 410082, China.
- Xinhong Guo
- College of Biology, Hunan University, Changsha, 410082, China.
4
Aidi MN, Wulandari C, Oktarina SD, Aditra TR, Ernawati F, Efriwati E, Nurjanah N, Rachmawati R, Julianti ED, Sundari D, Retiaty F, Arifin AY, Dewi RM, Nazaruddin N, Salimar S, Fuada N, Widodo Y, Setyawati B, Nurhidayati N, Sudikno S, Irawan IR, Widoretno W. Province clustering based on the percentage of communicable disease using the BCBimax biclustering algorithm. Geospat Health 2023; 18. [PMID: 37698368 DOI: 10.4081/gh.2023.1202] [Received: 03/28/2023] [Accepted: 08/09/2023] [Indexed: 09/13/2023]
Abstract
Indonesia needs to lower its high infectious disease rate. This requires reliable data and following their temporal changes across provinces. We investigated the benefits of surveying the epidemiological situation with the BCBimax biclustering algorithm using secondary data from a recent national scale survey of main infectious diseases from the National Basic Health Research (Riskesdas) covering 34 provinces in Indonesia. Hierarchical and k-means clustering can only handle one data source, but BCBimax biclustering can cluster rows and columns in a data matrix. Several experiments determined the best row and column threshold values, which is crucial for a useful result. The percentages of Indonesia's seven most common infectious diseases (ARI, pneumonia, diarrhoea, tuberculosis (TB), hepatitis, malaria, and filariasis) were ordered by province to form groups without considering proximity because clusters are usually far apart. ARI, pneumonia, and diarrhoea were divided into toddler and adult infections, making 10 target diseases instead of seven. The set of biclusters formed based on the presence and level of these diseases included 7 diseases with moderate to high disease levels, 5 diseases (formed by 2 clusters), 3 diseases, 2 diseases, and a final order that only included adult diarrhoea. In 6 of 8 clusters, diarrhoea was the most prevalent infectious disease in Indonesia, making its eradication a priority. Direct person-to-person infections like ARI, pneumonia, TB, and diarrhoea were found in 4-6 of 8 clusters. These diseases are more common and spread faster than vector-borne diseases like malaria and filariasis, making them more important.
Affiliation(s)
- Dian Sundari
- National Research and Innovation Agency, Jakarta.
- Fifi Retiaty
- National Research and Innovation Agency, Jakarta.
- Yekti Widodo
- National Research and Innovation Agency, Jakarta.