1. Wang P, Liu F, Wang Y, Chen H, Liu T, Li M, Chen S, Wang D. Deciphering crucial salt-responsive genes in Brassica napus via statistical modeling and network analysis on dynamic transcriptomic data. PLANT PHYSIOLOGY AND BIOCHEMISTRY : PPB 2025; 220:109568. [PMID: 39903946 DOI: 10.1016/j.plaphy.2025.109568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Revised: 01/15/2025] [Accepted: 01/24/2025] [Indexed: 02/06/2025]
Abstract
Soil salinization severely impacts crop yields, threatening global food security. Understanding the salt stress response of Brassica napus (B. napus), a vital oilseed crop, is crucial for developing salt-tolerant varieties. This study aims to comprehensively characterize the dynamic transcriptomic response of B. napus seedlings to salt stress, identifying key genes and pathways involved in this process. RNA sequencing is performed on 43 B. napus seedling samples, including 24 controls and 19 salt-stressed plants, at time points of 0, 1, 3, 6, and 12 h. Differential expression analysis using 33 control experiments (CEs) identifies 39,330 differentially expressed genes (DEGs). Principal component analysis (PCA) and a novel penalized logistic regression with k-Shape clustering (PLRKSC) method identify 346 crucial DEGs. GO enrichment, differential co-expression network analysis, and functional validation through B. napus transformation verify the functional roles of the identified DEGs. The analysis reveals highly dynamic and tissue-specific expression patterns of DEGs under salt stress. The 346 crucial DEGs include genes involved in leaf and root development, stress-responsive transcription factors, and genes associated with the salt overly sensitive (SOS) pathway. Specifically, overexpression of RD26 (BnaC07g40860D) in B. napus significantly enhances salt tolerance, confirming its role in the salt stress response. This study provides a comprehensive understanding of the B. napus salt stress response at the transcriptomic level and identifies key candidate genes, such as RD26, for developing salt-tolerant varieties. The methodologies established can be applied to other omics studies of plant stress responses.
Affiliation(s)
- Pei Wang
  - State Key Laboratory of Crop Stress Adaption and Improvement, College of Agriculture, School of Life Sciences, School of Mathematics and Statistics, Henan University, Kaifeng, 475004, Henan, China; Henan Engineering Research Center for Industrial Internet of Things, Henan University, Zhengzhou, 450046, Henan, China
- Fei Liu
  - State Key Laboratory of Crop Stress Adaption and Improvement, College of Agriculture, School of Life Sciences, School of Mathematics and Statistics, Henan University, Kaifeng, 475004, Henan, China
- Yongfeng Wang
  - State Key Laboratory of Crop Stress Adaption and Improvement, College of Agriculture, School of Life Sciences, School of Mathematics and Statistics, Henan University, Kaifeng, 475004, Henan, China
- Hao Chen
  - State Key Laboratory of Crop Stress Adaption and Improvement, College of Agriculture, School of Life Sciences, School of Mathematics and Statistics, Henan University, Kaifeng, 475004, Henan, China
- Tong Liu
  - State Key Laboratory of Crop Stress Adaption and Improvement, College of Agriculture, School of Life Sciences, School of Mathematics and Statistics, Henan University, Kaifeng, 475004, Henan, China
- Mengyao Li
  - State Key Laboratory of Crop Stress Adaption and Improvement, College of Agriculture, School of Life Sciences, School of Mathematics and Statistics, Henan University, Kaifeng, 475004, Henan, China
- Shunjie Chen
  - State Key Laboratory of Crop Stress Adaption and Improvement, College of Agriculture, School of Life Sciences, School of Mathematics and Statistics, Henan University, Kaifeng, 475004, Henan, China
- Daojie Wang
  - State Key Laboratory of Crop Stress Adaption and Improvement, College of Agriculture, School of Life Sciences, School of Mathematics and Statistics, Henan University, Kaifeng, 475004, Henan, China
2. Yu X, Wu Z, Zhang N. Machine learning-driven discovery of novel therapeutic targets in diabetic foot ulcers. Mol Med 2024; 30:215. [PMID: 39543487 PMCID: PMC11562697 DOI: 10.1186/s10020-024-00955-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2024] [Accepted: 10/08/2024] [Indexed: 11/17/2024] Open
Abstract
BACKGROUND: To utilize machine learning for identifying treatment response genes in diabetic foot ulcers (DFU).
METHODS: Transcriptome data from patients with DFU were collected and subjected to comprehensive analysis. Initially, differential expression analysis was conducted to identify genes with significant changes in expression levels between DFU patients and healthy controls. Following this, enrichment analyses were performed to uncover biological pathways and processes associated with these differentially expressed genes. Machine learning algorithms, including feature selection and classification techniques, were then applied to the data to pinpoint key genes that play crucial roles in the pathogenesis of DFU. An independent transcriptome dataset was used to validate the key genes identified in our study. Further analysis of single-cell datasets was conducted to investigate changes in key genes at the single-cell level.
RESULTS: Through this integrated approach, SCUBE1 and RNF103-CHMP3 were identified as key genes significantly associated with DFU. SCUBE1 was found to be involved in immune regulation, playing a role in the body's response to inflammation and infection, which are common in DFU. RNF103-CHMP3 was linked to extracellular interactions, suggesting its involvement in cellular communication and tissue repair mechanisms essential for wound healing. The reliability of our analysis results was confirmed in the independent transcriptome dataset. Additionally, the expression of SCUBE1 and RNF103-CHMP3 was examined in single-cell transcriptome data, showing that these genes were significantly downregulated in the cured DFU patient group, particularly in NK cells and macrophages.
CONCLUSION: The identification of SCUBE1 and RNF103-CHMP3 as potential biomarkers for DFU marks a significant step forward in understanding the molecular basis of the disease. These genes offer new directions for both diagnosis and treatment, with the potential for developing targeted therapies that could enhance patient outcomes. This study underscores the value of integrating computational methods with biological data to uncover novel insights into complex diseases like DFU. Future research should focus on validating these findings in larger cohorts and exploring the therapeutic potential of targeting SCUBE1 and RNF103-CHMP3 in clinical settings.
Affiliation(s)
- Xin Yu
  - Pediatric Oncology of the First Hospital of Jilin University, Changchun, 130021, China
- Zhuo Wu
  - Microsurgery Department of PLA General Hospital, Beijing, 100853, China
- Nan Zhang
  - Burn Department of the First Hospital of Jilin University, No. 1 Xinmin Street, Chaoyang District, Changchun, 130021, Jilin Province, China
3. Lan Y, Peng Q, Shen J, Liu H. Elucidating common biomarkers and pathways of osteoporosis and aortic valve calcification: insights into new therapeutic targets. Sci Rep 2024; 14:27827. [PMID: 39537712 PMCID: PMC11560947 DOI: 10.1038/s41598-024-78707-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Accepted: 11/04/2024] [Indexed: 11/16/2024] Open
Abstract
BACKGROUND: Osteoporosis and aortic valve calcification, both prevalent in the elderly, have unclear common mechanisms. This study aims to uncover them through bioinformatics analysis.
METHODS: Microarray data from GEO were analyzed for osteoporosis and aortic valve calcification. Differential expression analysis identified co-expressed genes. SVM-RFE and random forest selected key genes. GO and KEGG enrichment analyses were performed, followed by immune infiltration and GSEA analyses. NetworkAnalyst was used to analyze microRNAs and TFs. HERB was used to predict drugs, and molecular docking assessed targeting potential.
RESULTS: Thirteen genes linked to osteoporosis and aortic valve calcification were identified. TNFSF11, KYNU, and HLA-DMB emerged as key genes. miRNAs, TFs, and drug predictions offered therapeutic insights. Molecular docking suggested 17-beta-estradiol and vitamin D3 as potential treatments.
CONCLUSION: The study clarifies shared mechanisms of osteoporosis and aortic valve calcification, identifies biomarkers, and highlights TNFSF11, KYNU, and HLA-DMB. It also suggests 17-beta-estradiol and vitamin D3 as potentially effective treatments.
Affiliation(s)
- Yujian Lan
  - School of Integrated Traditional Chinese and Western Medicine, Southwest Medical University, Luzhou, 646000, Sichuan, China
  - Department of Orthopaedics, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, 646000, Sichuan, China
- Qingping Peng
  - School of Integrated Traditional Chinese and Western Medicine, Southwest Medical University, Luzhou, 646000, Sichuan, China
  - Department of Orthopaedics, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, 646000, Sichuan, China
- Jianlin Shen
  - Department of Orthopaedics, Affiliated Hospital of Putian University, Putian, 351100, Fujian, China
  - Central Laboratory, Affiliated Hospital of Putian University, Putian, 351100, Fujian, China
- Huan Liu
  - Department of Orthopaedics, The Affiliated Traditional Chinese Medicine Hospital, Southwest Medical University, Luzhou, 646000, Sichuan, China
4. Tang S, Mao S, Chen Y, Tan F, Duan L, Pian C, Zeng X. LRBmat: A novel gut microbial interaction and individual heterogeneity inference method for colorectal cancer. J Theor Biol 2023; 571:111538. [PMID: 37257720 DOI: 10.1016/j.jtbi.2023.111538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 05/07/2023] [Accepted: 05/18/2023] [Indexed: 06/02/2023]
Abstract
The gut microbial community has been shown to play a significant role in various diseases, including colorectal cancer (CRC), which is a major public health concern worldwide. The accurate diagnosis and etiological analysis of CRC are crucial issues. Numerous methods have utilized gut microbiota to address these challenges; however, few have considered the complex interactions and individual heterogeneity of the gut microbiota, which are important issues in genetics and intestinal microbiology, particularly in high-dimensional cases. This paper presents a novel method called Binary matrix based on Logistic Regression (LRBmat) to address these concerns. The binary matrix in LRBmat can directly mitigate or eliminate the influence of heterogeneity, while also capturing information on gut microbial interactions with any order. LRBmat is highly adaptable and can be combined with any machine learning method to enhance its capabilities. The proposed method was evaluated using real CRC data and demonstrated superior classification performance compared to state-of-the-art methods. Furthermore, the association rules extracted from the binary matrix of the real data align well with biological properties and existing literature, thereby aiding in the etiological analysis of CRC.
Affiliation(s)
- Shan Tang
  - Department of Statistics, Hunan University, Changsha 410006, China
- Shanjun Mao
  - Department of Statistics, Hunan University, Changsha 410006, China
- Yangyang Chen
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Falong Tan
  - Department of Statistics, Hunan University, Changsha 410006, China
- Lihua Duan
  - Department of Rheumatology and Clinical Immunology, Jiangxi Provincial People's Hospital, The First Affiliated Hospital of Nanchang University, Nanchang 330006, China
- Cong Pian
  - College of Sciences, Nanjing Agricultural University, Nanjing 210095, China
- Xiangxiang Zeng
  - Department of Computer Science, Hunan University, Changsha 410086, China
5. Wang F, Liang D, Li Y, Ma S. Prior information-assisted integrative analysis of multiple datasets. Bioinformatics 2023; 39:btad452. [PMID: 37490475 PMCID: PMC10400378 DOI: 10.1093/bioinformatics/btad452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 05/13/2023] [Accepted: 07/24/2023] [Indexed: 07/27/2023] Open
Abstract
MOTIVATION: Analyzing genetic data to identify markers and construct predictive models is of great interest in biomedical research. However, limited by cost and sample availability, genetic studies often suffer from the "small sample size, high dimensionality" problem. To tackle this problem, an integrative analysis that collectively analyzes multiple datasets with compatible designs is often conducted. For regularizing estimation and selecting relevant variables, penalization and other regularization techniques are routinely adopted. "Blindly" searching over a vast number of variables may not be efficient.
RESULTS: We propose incorporating prior information to assist integrative analysis of multiple genetic datasets. To obtain accurate prior information, we adopt a convolutional neural network with an active learning strategy to label textual information from previous studies. Then the extracted prior information is incorporated using a group LASSO-based technique. We conducted a series of simulation studies that demonstrated the satisfactory performance of the proposed method. Finally, data on skin cutaneous melanoma are analyzed to establish practical utility.
AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/ldz7/PAIA. The data that support the findings in this article are openly available in TCGA (The Cancer Genome Atlas) at https://portal.gdc.cancer.gov/.
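As a rough illustration of the group LASSO idea named in the abstract above (this is not the authors' PAIA code, which lives at the GitHub link), the penalty's proximal step shrinks each group of coefficients toward zero as a block. The function name `group_soft_threshold`, the grouping, the coefficient values, and the threshold are all assumptions made for this sketch.

```python
import math


def group_soft_threshold(coefs, groups, lam):
    """Block soft-thresholding: proximal operator of lam * sum_g ||beta_g||_2.

    coefs  : list of floats (current coefficient vector)
    groups : list of index lists, one per group (hypothetical grouping)
    lam    : non-negative threshold
    """
    out = list(coefs)
    for idx in groups:
        norm = math.sqrt(sum(coefs[i] ** 2 for i in idx))
        # A group whose Euclidean norm is at most lam is zeroed out entirely;
        # otherwise every member of the group is scaled by the same factor.
        scale = 0.0 if norm <= lam else 1.0 - lam / norm
        for i in idx:
            out[i] = scale * coefs[i]
    return out


# Toy coefficients: the first group is large, the second is small.
beta = [3.0, 4.0, 0.1, -0.2]
shrunk = group_soft_threshold(beta, groups=[[0, 1], [2, 3]], lam=1.0)
print(shrunk)
```

In this toy call the first group (norm 5) is scaled by 0.8 and the second group (norm below the threshold) is removed as a whole, which is the group-level variable selection behavior that distinguishes group LASSO from coefficient-wise LASSO.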
Affiliation(s)
- Feifei Wang
  - Center for Applied Statistics, Renmin University of China, Beijing 100872, China
  - School of Statistics, Renmin University of China, Beijing 100872, China
  - Institute for Data Science in Health, Renmin University of China, Beijing 100872, China
- Dongzuo Liang
  - School of Statistics, Renmin University of China, Beijing 100872, China
  - RSS and China-Re Life Joint Lab on Public Health and Risk Management, Renmin University of China, Beijing 100872, China
- Yang Li
  - Center for Applied Statistics, Renmin University of China, Beijing 100872, China
  - School of Statistics, Renmin University of China, Beijing 100872, China
  - RSS and China-Re Life Joint Lab on Public Health and Risk Management, Renmin University of China, Beijing 100872, China
- Shuangge Ma
  - Department of Biostatistics, Yale University, New Haven, CT 06520, United States
6. Mohammed NN. Improved Regularized Multi-class Logistic Regression for Gene Classification with Optimal Kernel PCA and HC Algorithm. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2023; 1424:273-279. [PMID: 37486504 DOI: 10.1007/978-3-031-31982-2_31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/25/2023]
Abstract
A significant challenge in high-dimensional and big data analysis is the classification and prediction of the variables of interest. Massive genetic datasets are complex. Gene expression datasets are enriched with useful genes that are associated with specific diseases such as cancer. In this study, we used two gene expression datasets from the Gene Expression Omnibus and preprocessed them before classification. We used optimal kernel principal component analysis, in which the optimal kernel function was chosen, for dimensionality reduction and extraction of the most important features. The gene sets with a high validity index were collected using a combined hierarchical clustering and optimal kernel principal component analysis (KHC-RLR) algorithm. Logistic regression is one of the most common methods for classification, and it has been shown to be a useful classification approach for gene expression data analysis. In this study, we used multi-class logistic regression to classify the collected gene sets. We found that ordinary logistic regression caused a major overfitting problem; therefore, we used regularized multi-class logistic regression to classify the gene sets. The proposed KHC-RLR algorithm showed high performance and satisfactory accuracy measures.
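To make the overfitting remedy in the abstract above concrete, here is a minimal sketch of multi-class (softmax) logistic regression with an L2 penalty, fitted by plain gradient descent. It is not the KHC-RLR algorithm: the kernel PCA and clustering steps are omitted, the choice of an L2 penalty is an assumption (the abstract does not name the penalty), and the function names, toy data, and hyperparameters are all invented for illustration.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fit_softmax_l2(X, y, n_classes, lam=0.05, lr=0.5, epochs=500):
    """Gradient descent on mean cross-entropy + (lam / 2) * ||W||^2.

    The last column of each weight row is the intercept; it is not penalized.
    """
    n, d = len(X), len(X[0])
    W = [[0.0] * (d + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (d + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            row = list(xi) + [1.0]
            p = softmax([sum(w * v for w, v in zip(Wk, row)) for Wk in W])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == yi else 0.0)
                for j in range(d + 1):
                    grad[k][j] += err * row[j] / n
        for k in range(n_classes):
            for j in range(d + 1):
                penalty = lam * W[k][j] if j < d else 0.0
                W[k][j] -= lr * (grad[k][j] + penalty)
    return W


def predict(W, x):
    """Return the index of the class with the highest linear score."""
    row = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(Wk, row)) for Wk in W]
    return max(range(len(W)), key=lambda k: scores[k])


# Toy data: three well-separated classes in two (made-up) features.
X = [[0.0, 0.0], [0.2, 0.1], [2.0, 0.0], [2.1, 0.2], [0.0, 2.0], [0.1, 2.2]]
y = [0, 0, 1, 1, 2, 2]
W = fit_softmax_l2(X, y, n_classes=3)
preds = [predict(W, x) for x in X]
print(preds)
```

Raising `lam` pulls the weights closer to zero, trading some training fit for stability, which is the usual motivation for regularizing when genes far outnumber samples.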
Affiliation(s)
- Nwayyin Najat Mohammed
  - Department of Computer Science, University of Sulaimani, College of Science, Sulaymaniyah, Iraq
7. Ai N, Yang Z, Yuan H, Ouyang D, Miao R, Ji Y, Liang Y. A distributed sparse logistic regression with $L_{1/2}$ regularization for microarray biomarker discovery in cancer classification. Soft Comput 2022. [DOI: 10.1007/s00500-022-07551-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
8. Mendonca-Neto R, Li Z, Fenyo D, Silva CT, Nakamura FG, Nakamura EF. A Gene Selection Method Based on Outliers for Breast Cancer Subtype Classification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2547-2559. [PMID: 34860652 DOI: 10.1109/tcbb.2021.3132339] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Breast cancer is the second most common cancer type and is the leading cause of cancer-related deaths worldwide. Since it is a heterogeneous disease, subtyping breast cancer plays an important role in selecting a specific treatment. Gene expression data are a viable option for cancer subtype classification because they represent the state of a cell at the molecular level, but they generally comprise a relatively small number of samples compared with a large number of genes. Gene selection is a promising approach that addresses this uneven high-dimensional matrix of genes versus samples and plays an important role in the development of efficient cancer subtype classification. In this work, an innovative outlier-based gene selection (OGS) method is proposed to select relevant genes to efficiently and effectively classify breast cancer subtypes. Experiments show that our strategy achieves an F1 score of 1.0 for basal and 0.86 for HER2, the two subtypes with the worst prognoses. Compared to other methods, our proposed method achieves a better F1 score using 80% fewer genes. In general, our method selects only a few highly relevant genes, speeding up the classification and significantly improving the classifier's performance.
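For readers unfamiliar with the per-subtype F1 scores quoted in the abstract above, the sketch below shows how a one-versus-rest F1 score is computed from true and predicted labels. The function name `f1_for_label` and the six toy labels are hypothetical and are not data from the paper.

```python
def f1_for_label(y_true, y_pred, label):
    """One-versus-rest F1 score (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up for illustration only).
y_true = ["basal", "basal", "her2", "her2", "luminal", "luminal"]
y_pred = ["basal", "basal", "her2", "luminal", "luminal", "luminal"]

scores = {lab: f1_for_label(y_true, y_pred, lab) for lab in ("basal", "her2", "luminal")}
print(scores)
```

In this toy case the basal class is predicted perfectly, while the single missed HER2 sample lowers HER2 recall and therefore its F1 score.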
9. Li X, Wang Y, Ruiz R. A Survey on Sparse Learning Models for Feature Selection. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:1642-1660. [PMID: 32386172 DOI: 10.1109/tcyb.2020.2982445] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 06/11/2023]
Abstract
Feature selection is important in both machine learning and pattern recognition. Successfully selecting informative features can significantly increase learning accuracy and improve result comprehensibility. Various methods have been proposed to identify informative features from high-dimensional data by removing redundant and irrelevant features to improve classification accuracy. In this article, we systematically survey existing sparse learning models for feature selection from the perspectives of individual sparse feature selection and group sparse feature selection, and analyze the differences and connections among various sparse learning models. Promising research directions and topics on sparse learning models are analyzed.
10. Li L, Liu ZP. A connected network-regularized logistic regression model for feature selection. Appl Intell 2022. [DOI: 10.1007/s10489-021-02877-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
11. Cancer data classification by quantum-inspired immune clone optimization-based optimal feature selection using gene expression data: deep learning approach. DATA TECHNOLOGIES AND APPLICATIONS 2021. [DOI: 10.1108/dta-05-2020-0109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/22/2023]
Abstract
Purpose: Gene selection is considered a fundamental process in the bioinformatics field. Existing methodologies for cancer classification are mostly clinically based, and their diagnostic capability is limited. Nowadays, significant problems of cancer diagnosis are addressed using gene expression data, and researchers have introduced many ways to diagnose cancer appropriately and effectively. This paper aims to develop a cancer data classification model using gene expression data.
Design/methodology/approach: The proposed classification model involves three main phases: “(1) Feature extraction, (2) Optimal Feature Selection and (3) Classification”. Initially, five benchmark gene expression datasets are collected, and feature extraction is performed on the collected gene expression data. To reduce the length of the feature vectors, optimal feature selection is performed using a new meta-heuristic algorithm termed the quantum-inspired immune clone optimization algorithm (QICO). Once the relevant features are selected, classification is performed by a deep learning model, a recurrent neural network (RNN). Finally, the experimental analysis reveals that the proposed QICO-based feature selection model outperforms the other heuristic-based feature selection methods and that the optimized RNN outperforms the other machine learning methods.
Findings: The proposed QICO-RNN achieves the best outcomes at every learning percentage. At a learning percentage of 85, the accuracy of the proposed QICO-RNN was 3.2% better than RNN, 4.3% better than RF, 3.8% better than NB and 2.1% better than KNN for Dataset 1. For Dataset 2, at a learning percentage of 35, the accuracy of the proposed QICO-RNN was 13.3% better than RNN, 8.9% better than RF and 14.8% better than NB and KNN. Hence, the developed QICO algorithm performs well in accurately classifying cancer data using gene expression data.
Originality/value: This paper introduces a new optimal feature selection model using QICO and a QICO-based RNN for effective classification of cancer data using gene expression data. This is the first work that utilizes an optimal feature selection model using QICO and QICO-RNN for this purpose.