1
Bai L, Li Z, Tang C, Song C, Hu F. Hypergraph-based analysis of weighted gene co-expression hypernetwork. Front Genet 2025; 16:1560841. [PMID: 40255486 PMCID: PMC12006133 DOI: 10.3389/fgene.2025.1560841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2025] [Accepted: 03/19/2025] [Indexed: 04/22/2025]
Abstract
Background: With the rapid advancement of gene sequencing technologies, traditional weighted gene co-expression network analysis (WGCNA), which relies on pairwise gene relationships, struggles to capture higher-order interactions and exhibits low computational efficiency when handling large, complex datasets. Methods: To overcome these challenges, we propose a novel Weighted Gene Co-expression Hypernetwork Analysis (WGCHNA) based on a weighted hypergraph, where genes are modeled as nodes and samples as hyperedges. By calculating the hypergraph Laplacian matrix, WGCHNA generates a topological overlap matrix for module identification through hierarchical clustering. Results: On four gene expression datasets, WGCHNA outperforms WGCNA in module identification and functional enrichment. WGCHNA identifies biologically relevant modules with greater complexity, particularly in processes like neuronal energy metabolism linked to Alzheimer's disease. Additionally, functional enrichment analysis uncovers more comprehensive pathway hierarchies, revealing potential regulatory relationships and novel targets. Conclusion: WGCHNA effectively addresses WGCNA's limitations, providing superior accuracy in detecting gene modules and deeper insights for disease research, making it a powerful tool for analyzing complex biological systems.
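The hypergraph construction described in this abstract (genes as nodes, samples as hyperedges, then a hypergraph Laplacian) can be illustrated with a minimal sketch. The abstract does not say how the incidence matrix or the Laplacian is defined, so everything below is an assumption made for illustration: a gene is placed in a sample's hyperedge when its expression in that sample is at or above the gene's own median, and the Laplacian is the normalized form L = I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) of Zhou et al. The function name `hypergraph_laplacian` is hypothetical and is not taken from the WGCHNA paper.

```python
import numpy as np


def hypergraph_laplacian(expr, edge_weights=None):
    """Normalized hypergraph Laplacian for a genes x samples expression matrix.

    Assumption (not from the paper): gene g belongs to the hyperedge of
    sample s when expr[g, s] is at or above the median expression of gene g.
    """
    expr = np.asarray(expr, dtype=float)
    gene_median = np.median(expr, axis=1, keepdims=True)
    H = (expr >= gene_median).astype(float)      # incidence: genes x samples
    n_genes, n_samples = H.shape

    # Hyperedge weights default to 1 for every sample.
    if edge_weights is None:
        w = np.ones(n_samples)
    else:
        w = np.asarray(edge_weights, dtype=float)

    edge_degree = H.sum(axis=0)                  # number of genes per hyperedge
    node_degree = H @ w                          # weighted hyperedges per gene

    # Invert degrees, leaving zeros where a degree is zero.
    de_inv = np.divide(1.0, edge_degree,
                       out=np.zeros_like(edge_degree), where=edge_degree > 0)
    dv_inv_sqrt = np.divide(1.0, np.sqrt(node_degree),
                            out=np.zeros_like(node_degree), where=node_degree > 0)

    left = dv_inv_sqrt[:, None] * H              # Dv^(-1/2) H
    right = H.T * dv_inv_sqrt[None, :]           # H^T Dv^(-1/2)
    theta = left @ np.diag(w * de_inv) @ right
    return np.eye(n_genes) - theta
```

Under these assumptions the result is a symmetric genes x genes matrix whose eigenvalues lie between 0 and 1; how WGCHNA turns its Laplacian into a topological overlap matrix is not stated in the abstract and is not attempted here.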
Affiliation(s)
- Libing Bai
  - Computer College of Qinghai Normal University, Xining, Qinghai, China
  - The State Key Laboratory of Tibetan Intelligence, Qinghai, Xining, China
- Zongjin Li
  - College of Science, North China University of Science and Technology, Tangshan, China
- Chunyang Tang
  - Computer College of Qinghai Normal University, Xining, Qinghai, China
  - The State Key Laboratory of Tibetan Intelligence, Qinghai, Xining, China
- Changxin Song
  - Department of Mechanical Engineering and Information, Shanghai Urban Construction Vocational College, Shanghai, China
- Feng Hu
  - Computer College of Qinghai Normal University, Xining, Qinghai, China
  - The State Key Laboratory of Tibetan Intelligence, Qinghai, Xining, China

2
Aghaieabiane N, Koutis I. SGCP: a spectral self-learning method for clustering genes in co-expression networks. BMC Bioinformatics 2024; 25:230. [PMID: 38956463 PMCID: PMC11221046 DOI: 10.1186/s12859-024-05848-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 06/18/2024] [Indexed: 07/04/2024]
Abstract
BACKGROUND: A widely used approach for extracting information from gene expression data employs the construction of a gene co-expression network and the subsequent computational detection of gene clusters, called modules. WGCNA and related methods are the de facto standard for module detection. The purpose of this work is to investigate the applicability of more sophisticated algorithms toward the design of an alternative method with enhanced potential for extracting biologically meaningful modules. RESULTS: We present the self-learning gene clustering pipeline (SGCP), a spectral method for detecting modules in gene co-expression networks. SGCP incorporates multiple features that differentiate it from previous work, including a novel step that leverages gene ontology (GO) information in a self-learning step. Compared with widely used existing frameworks on 12 real gene expression datasets, we show that SGCP yields modules with higher GO enrichment. Moreover, SGCP assigns highest statistical importance to GO terms that are mostly different from those reported by the baselines. CONCLUSION: Existing frameworks for discovering clusters of genes in gene co-expression networks are based on relatively simple algorithmic components. SGCP relies on newer algorithmic techniques that enable the computation of highly enriched modules with distinctive characteristics, thus contributing a novel alternative tool for gene co-expression analysis.
Affiliation(s)
- Niloofar Aghaieabiane
  - Computer Science Department, New Jersey Institute of Technology, Newark, NJ, 07102, USA
- Ioannis Koutis
  - Computer Science Department, New Jersey Institute of Technology, Newark, NJ, 07102, USA

3
Kori M, Temiz K, Gov E. Network medicine approaches for identification of novel prognostic systems biomarkers and drug candidates for papillary thyroid carcinoma. J Cell Mol Med 2023; 27:4171-4180. [PMID: 37859510 PMCID: PMC10746936 DOI: 10.1111/jcmm.18002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2023] [Revised: 09/21/2023] [Accepted: 10/07/2023] [Indexed: 10/21/2023]
Abstract
Papillary thyroid carcinoma (PTC) is one of the most common endocrine carcinomas worldwide and the aetiology of this cancer is still not well understood. Therefore, it remains important to understand the disease mechanism and find prognostic biomarkers and/or drug candidates for PTC. Compared with approaches based on single-gene assessment, network medicine analysis offers great promise to address this need. Accordingly, in the present study, we performed differential co-expression network analysis using five transcriptome datasets from patients with PTC and healthy controls. Following meta-analysis of the transcriptome datasets, we uncovered common differentially expressed genes (DEGs) for PTC and, using these genes as proxies, found a highly clustered differentially expressed co-expressed module: a 'PTC-module'. Using independent data, we demonstrated the high prognostic capacity of the PTC-module and designated this module as a prognostic systems biomarker. In addition, using the nodes of the PTC-module, we performed drug repurposing and text mining analyses to identify novel drug candidates for the disease. We performed molecular docking simulations and identified 4-demethoxydaunorubicin hydrochloride, AS605240, BRD-A60245366, ER 27319 maleate, sinensetin, and TWS119 as novel drug candidates whose efficacy was also confirmed by in silico analyses. Consequently, we have highlighted here the need for differential co-expression analysis to gain a systems-level understanding of a complex disease, and we provide a candidate prognostic systems biomarker and novel drugs for PTC.
Affiliation(s)
- Medi Kori
  - Faculty of Health Sciences, Acibadem Mehmet Ali Aydinlar University, İstanbul, Türkiye
  - Department of Bioengineering, Marmara University, İstanbul, Türkiye
- Kubra Temiz
  - Department of Bioengineering, Adana Alparslan Turkes Science and Technology University, Adana, Türkiye
- Esra Gov
  - Department of Bioengineering, Adana Alparslan Turkes Science and Technology University, Adana, Türkiye

4
Weng Y, Ning P. Construction of a prognostic prediction model for renal clear cell carcinoma combining clinical traits. Sci Rep 2023; 13:3358. [PMID: 36849551 PMCID: PMC9970964 DOI: 10.1038/s41598-023-30020-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Accepted: 02/14/2023] [Indexed: 03/01/2023]
Abstract
Kidney renal clear cell carcinoma (KIRC) is one of the common malignant tumors of the urinary system. Patients with different risk levels differ in terms of disease progression patterns and disease regression, and high-risk patients have a poorer prognosis than low-risk patients. Therefore, it is essential to accurately screen high-risk patients and give accurate and timely treatment. Differential gene analysis, weighted correlation network analysis, protein-protein interaction network analysis, and univariate Cox analysis were performed sequentially on the training set. Next, the KIRC prognostic model was constructed using the least absolute shrinkage and selection operator (LASSO), and The Cancer Genome Atlas (TCGA) test set and the Gene Expression Omnibus dataset verified the model's validity. Finally, the constructed models were analyzed, including gene set enrichment analysis (GSEA) and immune analysis. The differences in pathways and immune functions between the high-risk and low-risk groups were observed to provide a reference for clinical treatment and diagnosis. A four-step key gene screen resulted in 17 key factors associated with disease prognosis, including 14 genes and 3 clinical features. The LASSO regression algorithm selected the seven most critical key factors to construct the model: age, grade, stage, GDF3, CASR, CLDN10, and COL9A2. In the training set, the accuracy of the model in predicting 1-, 2- and 3-year survival rates was 0.883, 0.819, and 0.830, respectively. In the test sets, the accuracy was 0.831, 0.801, and 0.791 on the TCGA dataset and 0.812, 0.809, and 0.851 on the GSE29609 dataset. Model scoring divided the sample into a high-risk group and a low-risk group. There were significant differences in disease progression and risk scores between the two groups. GSEA revealed that the enriched pathways in the high-risk group mainly included proteasome and primary immunodeficiency. Immunological analysis showed that CD8 (+) T cells, M1 macrophages, PDCD1, and CTLA4 were upregulated in the high-risk group. In contrast, antigen-presenting cell stimulation and T-cell co-suppression were more active in the high-risk group. This study added clinical characteristics to the construction of the KIRC prognostic model to improve prediction accuracy, which helps to assess patient risk more accurately. The differences in pathways and immunity between high- and low-risk groups were also analyzed to provide ideas for treating KIRC patients.
Affiliation(s)
- Yujie Weng
  - College of Computer and Information, Inner Mongolia Medical University, Hohhot, 010110, Inner Mongolia Autonomous Region, China
- Pengfei Ning
  - College of Computer and Information, Inner Mongolia Medical University, Hohhot, 010110, Inner Mongolia Autonomous Region, China

5
Zhang Y, Shi W, Sun Y. A functional gene module identification algorithm in gene expression data based on genetic algorithm and gene ontology. BMC Genomics 2023; 24:76. [PMID: 36797662 PMCID: PMC9936134 DOI: 10.1186/s12864-023-09157-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 01/31/2023] [Indexed: 02/18/2023]
Abstract
Since genes do not function individually, the gene module is considered an important tool for interpreting gene expression profiles. In order to consider both functional similarity and expression similarity in module identification, GMIGAGO, a functional Gene Module Identification algorithm based on Genetic Algorithm and Gene Ontology, was proposed in this work. GMIGAGO is an overlapping gene module identification algorithm, which mainly includes two stages: In the first stage (initial identification of gene modules), Improved Partitioning Around Medoids Based on Genetic Algorithm (PAM-GA) is used for the initial clustering on gene expression profiling, and traditional gene co-expression modules can be obtained. Only similarity of expression levels is considered at this stage. In the second stage (optimization of functional similarity within gene modules), Genetic Algorithm for Functional Similarity Optimization (FSO-GA) is used to optimize gene modules based on gene ontology, and functional similarity within gene modules can be improved. Without loss of generality, we compared GMIGAGO with state-of-the-art gene module identification methods on six gene expression datasets, and GMIGAGO identified the gene modules with the highest functional similarity (much higher than state-of-the-art algorithms). GMIGAGO was applied in BRCA, THCA, HNSC, COVID-19, Stem, and Radiation datasets, and it identified some interesting modules which performed important biological functions. The hub genes in these modules could be used as potential targets for diseases or radiation protection. In summary, GMIGAGO has excellent performance in mining molecular mechanisms, and it can also identify potential biomarkers for individual precision therapy.
Affiliation(s)
- Yan Zhang
  - College of Environmental Science and Engineering, Dalian Maritime University, 116026, Dalian, Liaoning, China
- Weiyu Shi
  - College of Maritime Economics & Management, Dalian Maritime University, 116026, Dalian, Liaoning, China
- Yeqing Sun
  - College of Environmental Science and Engineering, Dalian Maritime University, 116026, Dalian, Liaoning, China

6
Deng S, Shen S, Liu K, El-Ashram S, Alouffi A, Cenci-Goga BT, Ye G, Cao C, Luo T, Zhang H, Li W, Li S, Zhang W, Wu J, Chen C. Integrated bioinformatic analyses investigate macrophage-M1-related biomarkers and tuberculosis therapeutic drugs. Front Genet 2023; 14:1041892. [PMID: 36845395 PMCID: PMC9945105 DOI: 10.3389/fgene.2023.1041892] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/12/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023]
Abstract
Tuberculosis (TB) is a common infectious disease linked to host genetics and the innate immune response. It is vital to investigate new molecular mechanisms and efficient biomarkers for TB because the pathophysiology of the disease is still unclear and there are no precise diagnostic tools. This study downloaded three blood datasets from the GEO database, two of which (GSE19435 and GSE83456) were used to build a weighted gene co-expression network to search for hub genes associated with macrophage M1 using the CIBERSORT and WGCNA algorithms. Furthermore, 994 differentially expressed genes (DEGs) were extracted from healthy and TB samples, four of which were associated with macrophage M1, namely RTP4, CXCL10, CD38, and IFI44. They were confirmed to be upregulated in TB samples by external dataset validation (GSE34608) and quantitative real-time PCR analysis (qRT-PCR). CMap was used to predict potential therapeutic compounds for TB using 300 differentially expressed genes (150 downregulated and 150 upregulated genes), and six small molecules (RWJ-21757, phenamil, benzanthrone, TG-101348, metyrapone, and WT-161) with a higher confidence value were extracted. We used in-depth bioinformatics analysis to investigate significant macrophage M1-related genes and promising anti-TB therapeutic compounds. However, more clinical trials are necessary to determine their effect on TB.
Affiliation(s)
- Siqi Deng
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
- Shijie Shen
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
- Keyu Liu
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
- Saeed El-Ashram
  - Faculty of Science, Kafrelsheikh University, Kafr El-Sheikh, Egypt
- Abdulaziz Alouffi
  - King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
- Guomin Ye
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
- Chengzhang Cao
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
- Tingting Luo
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
- Hui Zhang
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
- Weimin Li
  - Beijing Chest Hospital, Capital Medical University, Beijing, China
- Siyuan Li
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
- Wanjiang Zhang
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
- Jiangdong Wu
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
  - *Correspondence: Jiangdong Wu; Chuangfu Chen
- Chuangfu Chen
  - Key Laboratory of Xinjiang Endemic and Ethnic Diseases Cooperated by Education Ministry with Xinjiang Province, Shihezi University, Shihezi, China
  - *Correspondence: Jiangdong Wu; Chuangfu Chen

7
Xiao G, Guan R, Cao Y, Huang Z, Xu Y. KISL: knowledge-injected semi-supervised learning for biological co-expression network modules. Front Genet 2023; 14:1151962. [PMID: 37205122 PMCID: PMC10185879 DOI: 10.3389/fgene.2023.1151962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 04/11/2023] [Indexed: 05/21/2023]
Abstract
The exploration of important biomarkers associated with cancer development is crucial for diagnosing cancer, designing therapeutic interventions, and predicting prognoses. The analysis of gene co-expression provides a systemic perspective on gene networks and can be a valuable tool for mining biomarkers. The main objective of co-expression network analysis is to discover highly synergistic sets of genes, and the most widely used method is weighted gene co-expression network analysis (WGCNA). WGCNA measures gene correlation with the Pearson correlation coefficient and uses hierarchical clustering to identify gene modules. The Pearson correlation coefficient reflects only the linear dependence between variables, and the main drawback of hierarchical clustering is that once two objects are clustered together, the process cannot be reversed. Hence, readjusting inappropriate cluster divisions is not possible. Existing co-expression network analysis methods rely on unsupervised methods that do not utilize prior biological knowledge for module delineation. Here we present a method for identification of outstanding modules in a co-expression network using a knowledge-injected semi-supervised learning approach (KISL), which utilizes a priori biological knowledge and a semi-supervised clustering method to address the issues in current GCN-based clustering methods. Because gene-gene relationships are complex, we introduce distance correlation to measure both the linear and non-linear dependence between genes. Eight RNA-seq datasets of cancer samples are used to validate its effectiveness. In all eight datasets, the KISL algorithm outperformed WGCNA on the silhouette coefficient, Calinski-Harabasz index and Davies-Bouldin index evaluation metrics. According to the results, KISL clusters had better cluster evaluation values and better gene module aggregation. Enrichment analysis of the recognized modules demonstrated their effectiveness in discovering modular structures in biological co-expression networks. In addition, as a general method, KISL can be applied to various co-expression network analyses based on similarity metrics. Source code for KISL and the related scripts are available online at https://github.com/Mowonhoo/KISL.git.
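Of the three cluster evaluation metrics named in this abstract, the silhouette coefficient is the simplest to write out. The sketch below is a plain NumPy illustration of the standard definition, not code from the KISL repository; the function name `silhouette_coefficient` and the use of Euclidean distance between gene profiles are assumptions made for the example.

```python
import numpy as np


def silhouette_coefficient(X, labels):
    """Mean silhouette over all points, using Euclidean distance (an assumption).

    For point i, a is the mean distance to the other members of its cluster,
    b is the smallest mean distance to any other cluster, and the score is
    (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    # Full pairwise Euclidean distance matrix.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue                              # singleton cluster: score stays 0
        a = dist[i, same].sum() / (n_same - 1)    # excludes the zero self-distance
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        largest = max(a, b)
        scores[i] = 0.0 if largest == 0 else (b - a) / largest
    return float(scores.mean())
```

Values near 1 indicate compact, well-separated modules, and values below 0 indicate points that sit closer to another cluster than to their own.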
Affiliation(s)
- Gangyi Xiao
  - College of Computer Science and Technology, Jilin University, Changchun, China
- Renchu Guan
  - College of Computer Science and Technology, Jilin University, Changchun, China
- Yangkun Cao
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Zhenyu Huang
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - *Correspondence: Ying Xu; Zhenyu Huang
- Ying Xu
  - School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong, China
  - *Correspondence: Ying Xu; Zhenyu Huang

8
Li H, Yang P. Identification of biomarkers related to neutrophils and two molecular subtypes of systemic lupus erythematosus. BMC Med Genomics 2022; 15:162. [PMID: 35858908 PMCID: PMC9297641 DOI: 10.1186/s12920-022-01306-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/13/2022] [Accepted: 06/27/2022] [Indexed: 11/18/2022]
Abstract
Background: Systemic lupus erythematosus (SLE), an autoimmune disease with complex pathogenesis, poses a considerable threat to women’s health. Increasing evidence indicates that neutrophils play an important role in the development and progression of lupus. Methods: Weighted correlation network analysis and single-sample gene set enrichment analysis (GSEA) were used to analyse SLE expression data from a comprehensive gene expression database and identify modules associated with neutrophils. Thereafter, the biomarkers most closely related to neutrophils were identified. We reclassified SLE into two molecular subtypes based on the aforementioned biomarkers and evaluated cell infiltration, molecular mechanisms, and signature pathways in each subtype. Results: The results showed significant differences in immunological characteristics between the two molecular subtypes of SLE. Hub genes were significantly upregulated in the NEUT-H subtype, and they may be associated with lupus activity. The GSEA revealed associations between our biomarkers and key metabolic pathways. Conclusions: Our study provides not only a classification for patients with SLE but also new cell and gene targets for immunotherapy, as well as a new experimental paradigm to explore immunotherapy for other autoimmune diseases. Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-022-01306-9.
Affiliation(s)
- Huiyan Li
  - Department of Rheumatology and Immunology, First Affiliated Hospital, China Medical University, Shenyang, 110001, China
- Pingting Yang
  - Department of Rheumatology and Immunology, First Affiliated Hospital, China Medical University, Shenyang, 110001, China

9
Sunkavalli A, McClure R, Genco C. Molecular Regulatory Mechanisms Drive Emergent Pathogenetic Properties of Neisseria gonorrhoeae. Microorganisms 2022; 10:922. [PMID: 35630366 PMCID: PMC9147433 DOI: 10.3390/microorganisms10050922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 04/25/2022] [Accepted: 04/26/2022] [Indexed: 12/05/2022]
Abstract
Neisseria gonorrhoeae is the causative agent of the sexually transmitted infection (STI) gonorrhea, with an estimated 87 million annual cases worldwide. N. gonorrhoeae predominantly colonizes the male and female genital tract (FGT). In the FGT, N. gonorrhoeae confronts fluctuating levels of nutrients and oxidative and non-oxidative antimicrobial defenses of the immune system, as well as the resident microbiome. One mechanism utilized by N. gonorrhoeae to adapt to this dynamic FGT niche is to modulate gene expression primarily through DNA-binding transcriptional regulators. Here, we describe the major N. gonorrhoeae transcriptional regulators, genes under their control, and how these regulatory processes lead to pathogenic properties of N. gonorrhoeae during natural infection. We also discuss the current knowledge of the structure, function, and diversity of the FGT microbiome and its influence on gonococcal survival and transcriptional responses orchestrated by its DNA-binding regulators. We conclude with recent multi-omics data and modeling tools and their application to FGT microbiome dynamics. Understanding the strategies utilized by N. gonorrhoeae to regulate gene expression and their impact on the emergent characteristics of this pathogen during infection has the potential to identify new effective strategies to both treat and prevent gonorrhea.
Affiliation(s)
- Ashwini Sunkavalli
  - Department of Immunology, Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA 02111, USA
- Ryan McClure
  - Pacific Northwest National Laboratory, Richland, WA 99354, USA
- Caroline Genco
  - Department of Immunology, Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA 02111, USA

10
Hou J, Ye X, Feng W, Zhang Q, Han Y, Liu Y, Li Y, Wei Y. Distance correlation application to gene co-expression network analysis. BMC Bioinformatics 2022; 23:81. [PMID: 35193539 PMCID: PMC8862277 DOI: 10.1186/s12859-022-04609-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/16/2019] [Accepted: 02/10/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: To construct gene co-expression networks, it is necessary to evaluate the correlation between different gene expression profiles. However, commonly used correlation metrics, including both linear (such as Pearson's correlation) and monotonic (such as Spearman's correlation) dependence metrics, are not enough to capture the nature of real biological systems. Hence, introducing a more informative correlation metric when constructing gene co-expression networks is still an interesting topic. RESULTS: In this paper, we compare distance correlation, a correlation metric integrating both linear and non-linear dependence, with three other typical metrics (Pearson's correlation, Spearman's correlation, and maximal information coefficient) on four different datasets: two microarray datasets (macrophage and liver) and two RNA-seq datasets (cervical cancer and pancreatic cancer). Among all the metrics, distance correlation is distribution free and provides better performance on complex relationships and better robustness to outliers. Furthermore, distance correlation is applied to Weighted Gene Co-expression Network Analysis (WGCNA) to construct a gene co-expression network analysis method which we named Distance Correlation-based Weighted Gene Co-expression Network Analysis (DC-WGCNA). Compared with traditional WGCNA, DC-WGCNA can enhance the result of enrichment analysis and improve the module stability. CONCLUSIONS: Distance correlation is better at revealing complex biological relationships between gene profiles compared with other correlation metrics, which contributes to more meaningful modules when analyzing gene co-expression networks. However, due to the high time complexity of distance correlation, the implementation requires more computer memory.
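The contrast this abstract draws between Pearson's correlation and distance correlation can be made concrete with a minimal sketch. The code below implements the standard sample distance correlation of Székely et al. for two one-dimensional profiles using double-centered distance matrices; it is an illustration only, not the DC-WGCNA implementation, and the function names are hypothetical.

```python
import numpy as np


def _centered_distances(values):
    """Pairwise absolute differences, double-centered (row, column, grand mean)."""
    v = np.asarray(values, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length 1-D profiles."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()        # squared sample distance covariance
    dvar_x = (a * a).mean()       # squared sample distance variance of x
    dvar_y = (b * b).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0                # a constant profile carries no dependence signal
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

For a symmetric profile such as `x = np.linspace(-1, 1, 21)` and `y = x ** 2`, Pearson's correlation is essentially zero while the distance correlation is clearly positive, which is the kind of non-linear dependence the abstract refers to. The two n x n distance matrices per gene pair also show where the extra memory and time cost comes from.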
Affiliation(s)
- Jie Hou
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China
  - College of Science, Heilongjiang Bayi Agricultural University, Xinfeng Road, Daqing, China
- Xiufen Ye
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China
- Weixing Feng
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China
- Qiaosheng Zhang
  - School of Computer Engineering, Jiangsu Ocean University, Cangwu Road, Lianyungang, China
- Yatong Han
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China
- Yusong Liu
  - College of Intelligent Systems Science and Engineering, Harbin Engineering University, Nantong Street, Harbin, China
- Yu Li
  - College of Science, Northeast Forestry University, Hexing Road, Harbin, China
- Yufen Wei
  - College of Science, Heilongjiang Bayi Agricultural University, Xinfeng Road, Daqing, China

11
Aghaieabiane N, Koutis I. A Novel Calibration Step in Gene Co-Expression Network Construction. Front Bioinform 2021; 1:704817. [PMID: 36303738 PMCID: PMC9581019 DOI: 10.3389/fbinf.2021.704817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2021] [Accepted: 10/22/2021] [Indexed: 12/02/2022]
Abstract
High-throughput technologies such as DNA microarrays and RNA-sequencing are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed to Gene Co-expression Networks (GCNs). In a GCN, nodes correspond to genes, and the weight of the connection between two nodes is a measure of similarity in the expression behavior of the two genes. In general, GCN construction and analysis includes three steps: 1) calculating a similarity value for each pair of genes; 2) using these similarity values to construct a fully connected weighted network; and 3) finding clusters of genes in the network, commonly called modules. The specific implementation of these three steps can significantly impact the final output and the downstream biological analysis. GCN construction is a well-studied topic. Existing algorithms rely on relatively simple statistical and mathematical tools to implement these steps. Currently, the WGCNA software package appears to be the most widely accepted standard. We hypothesize that the raw features provided by sequencing data can be leveraged to extract modules of higher quality. A novel preprocessing step of the gene expression data set is introduced that in effect calibrates the expression levels of individual genes before computing pairwise similarities. Further, the similarity is computed as an inner product of positive vectors. In experiments, this provides a significant improvement over WGCNA, as measured by aggregate p-values of the gene ontology term enrichment of the computed modules.
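The three generic steps listed in this abstract can be sketched in a few lines of NumPy. This is a deliberately simplified illustration, not the calibration method proposed by the authors and not WGCNA itself: similarity is taken as absolute Pearson correlation, the weighted network uses a soft-thresholding power `beta`, and step 3 is replaced by a simple stand-in (connected components of edges whose weight reaches `cut`, equivalent to cutting a single-linkage tree) rather than WGCNA's hierarchical clustering with dynamic tree cutting. The function name and both parameter defaults are assumptions.

```python
import numpy as np


def coexpression_modules(expr, beta=6, cut=0.5):
    """Toy three-step GCN pipeline on a genes x samples matrix.

    Returns the weighted adjacency matrix and one integer module label per gene.
    """
    expr = np.asarray(expr, dtype=float)

    # Step 1: a similarity value for each pair of genes.
    similarity = np.abs(np.corrcoef(expr))

    # Step 2: a fully connected weighted network (soft thresholding).
    adjacency = similarity ** beta
    np.fill_diagonal(adjacency, 0.0)

    # Step 3 (stand-in): modules as connected components of strong edges.
    n_genes = adjacency.shape[0]
    labels = np.full(n_genes, -1, dtype=int)
    next_label = 0
    for start in range(n_genes):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            gene = stack.pop()
            for neighbor in np.flatnonzero(adjacency[gene] >= cut):
                if labels[neighbor] == -1:
                    labels[neighbor] = next_label
                    stack.append(neighbor)
        next_label += 1
    return adjacency, labels
```

Because each step is a separate stage, any one of them can be swapped out, for example replacing the similarity in step 1 with a different dependence measure, which is the point the abstract makes about implementation choices affecting the final modules.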