1
Yan B, Liao P, Han Z, Zhao J, Gao H, Liu Y, Chen F, Lei P. Association of aging related genes and immune microenvironment with major depressive disorder. J Affect Disord 2025; 369:706-717. [PMID: 39419187 DOI: 10.1016/j.jad.2024.10.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 09/06/2024] [Accepted: 10/14/2024] [Indexed: 10/19/2024]
Abstract
OBJECTIVE To study the relationship between aging-related genes (ARGs) and major depressive disorder (MDD). METHODS The MDD datasets GSE98793, GSE52790 and GSE39653 were obtained from the GEO database, and ARGs were obtained from the Human Aging Genome Resources database. Differentially expressed gene (DEG) screening and GO and KEGG enrichment analyses were performed to uncover the underlying mechanisms. To identify key ARGs associated with MDD (key ARG-DEGs), we employed machine learning methods such as LASSO, SVM, and Random Forest, as well as the plug-ins CytoHubba-MCC and MCODE. ssGSEA was used to analyze immune infiltration in MDD patients and healthy controls. Furthermore, we constructed a risk prediction nomogram model and ROC curves to assess the ability of the key ARG-DEGs to diagnose MDD, and predicted miRNAs and transcription factors (TFs) that might interact with them. Finally, a two-sample Mendelian randomization (MR) study was performed to confirm the association of the identified key ARG-DEGs with depression. RESULTS Differential expression analysis of ARGs between MDD and healthy controls identified eight ARG-DEGs. GO and KEGG analysis revealed that the pathways associated with these eight ARG-DEGs were primarily concentrated in the FoxO pathway, JAK-STAT pathway, PI3K-AKT pathway, and metabolic diseases. A comprehensive analysis further narrowed down the eight ARG-DEGs to four key ARG-DEGs: MMP9, IL7R, S100B, and EGF. Immune infiltration analysis indicated significant differences in CD8(+) T cells, macrophages, neutrophils, Th2 cells, and TIL cells between MDD and control groups, correlating with these four key ARG-DEGs. Based on these four key ARG-DEGs, a risk prediction model for MDD was developed. The miRNA-TF-mRNA interaction network of the key ARG-DEGs highlights the complexity of the regulatory process, providing valuable insights for future related research. The MR study suggested a potential causal relationship between MMP9 and the risk of depression.
CONCLUSION The process of aging, immune dysregulation, and MDD are closely interconnected. MMP9, IL7R, S100B, and EGF may be used as novel diagnostic biomarkers and potential therapeutic targets for MDD, especially MMP9.
Affiliation(s)
- Bo Yan
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China
- Pan Liao
- Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China; School of Medicine, Nankai University, Tianjin 300192, China
- Zhaoli Han
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China
- Jing Zhao
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China
- Han Gao
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China
- Yuan Liu
- Institute of Mental Health, Tianjin Anding Hospital, Mental Health Center of Tianjin Medical University, Tianjin 300222, China
- Fanglian Chen
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China.
- Ping Lei
- Department of Geriatrics, Tianjin Medical University General Hospital, Anshan Road No. 154, Tianjin 300052, China; Key Laboratory of Post-Trauma Neuro-Repair and Regeneration in Central Nervous System, Tianjin Key Laboratory of Injuries, Variations and Regeneration of Nervous System, Tianjin Neurological Institute, Ministry of Education, Tianjin 300052, China; School of Medicine, Nankai University, Tianjin 300192, China.
2
Mu KL, Ran F, Peng LQ, Zhou LL, Wu YT, Shao MH, Chen XG, Guo CM, Luo QM, Wang TJ, Liu YC, Liu G. Identification of diagnostic biomarkers of rheumatoid arthritis based on machine learning-assisted comprehensive bioinformatics and its correlation with immune cells. Heliyon 2024; 10:e35511. [PMID: 39170142 PMCID: PMC11336745 DOI: 10.1016/j.heliyon.2024.e35511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 07/29/2024] [Accepted: 07/30/2024] [Indexed: 08/23/2024]
Abstract
Background Rheumatoid arthritis (RA) is a chronic systemic autoimmune disease characterized by inflammatory cell infiltration, which can lead to chronic disability, joint destruction and loss of function. At present, the pathogenesis of RA is still unclear. The purpose of this study is to explore the potential biomarkers and immune molecular mechanisms of RA through machine learning-assisted bioinformatics analysis, in order to provide a reference for the early diagnosis and treatment of RA. Methods RA gene chips were screened from the public GEO database, and batch correction of different groups of RA gene chips was performed using Strawberry Perl. DEGs were obtained using the limma package of R software, and functional enrichment analyses such as gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), disease ontology (DO), and gene set enrichment analysis (GSEA) were performed. Three machine learning methods, least absolute shrinkage and selection operator regression (LASSO), support vector machine recursive feature elimination (SVM-RFE) and random forest (Random Forest), were used to identify potential biomarkers of RA. The validation group data set was used to verify and further confirm their expression and diagnostic value. In addition, the CIBERSORT algorithm was used to evaluate the infiltration of immune cells in RA and control samples, and the correlation between confirmed RA diagnostic biomarkers and immune cells was analyzed. Results Through feature screening, 79 key DEGs were obtained, mainly involving virus response, Parkinson's pathway, dermatitis and cell junction components. A total of 29 hub genes were screened by LASSO regression, 34 hub genes were screened by SVM-RFE, and 39 hub genes were screened by Random Forest. Combining the three algorithms, a total of 12 hub genes were obtained.
Through the expression and diagnostic value verification in the validation group data set, 7 genes that can be used as diagnostic biomarkers for RA were preliminarily confirmed. At the same time, the correlation analysis of immune cells found that γδT cells, CD4+ memory activated T cells, activated dendritic cells and other immune cells were positively correlated with multiple RA diagnostic biomarkers, while CD4+ naive T cells, regulatory T cells and other immune cells were negatively correlated with multiple RA diagnostic biomarkers. Conclusions The results of novel characteristic gene analysis of RA showed that KYNU, EVI2A, CD52, C1QB, BATF, AIM2 and NDC80 had good diagnostic and clinical value for the diagnosis of RA, and were closely related to immune cells. Therefore, these seven DEGs may become new diagnostic markers and immunotherapy markers for RA.
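The abstract above reports that three selectors produced 29, 34 and 39 candidates and that combining them left 12 hub genes. It does not say how the lists were combined; a common choice, assumed here, is to keep only the genes chosen by every selector. The sketch below illustrates that intersection step with placeholder gene labels (`GENE_A` and so on are hypothetical, not genes from the study), and `consensus_features` is an illustrative helper name.

```python
def consensus_features(*selections):
    """Return the features picked by every selector, in first-list order.

    Assumption: "combining" the selectors means taking their intersection.
    """
    if not selections:
        return []
    common = set(selections[0]).intersection(*selections[1:])
    return [feature for feature in selections[0] if feature in common]


# Placeholder outputs of three feature selectors (hypothetical labels).
lasso_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
svm_rfe_hits = ["GENE_B", "GENE_C", "GENE_E"]
rf_hits = ["GENE_C", "GENE_B", "GENE_F"]

hub_genes = consensus_features(lasso_hits, svm_rfe_hits, rf_hits)
print(hub_genes)  # ['GENE_B', 'GENE_C']
```

Taking the intersection is conservative: a gene survives only if all three methods agree, which trades recall for robustness against the quirks of any single selector.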
Affiliation(s)
- Le-qiang Peng
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Ling-li Zhou
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Yu-tong Wu
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Ming-hui Shao
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Xiang-gui Chen
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Chang-mao Guo
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Qiu-mei Luo
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Tian-jian Wang
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Yu-chen Liu
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
- Gang Liu
- Guizhou University of Traditional Chinese Medicine, Guiyang, 550025, Guizhou, China
3
Cao W, Howe BM, Wright DE, Ramanathan S, Rhodes NG, Korfiatis P, Amrami KK, Spinner RJ, Kline TL. Abnormal Brachial Plexus Differentiation from Routine Magnetic Resonance Imaging: An AI-based Approach. Neuroscience 2024; 546:178-187. [PMID: 38518925 DOI: 10.1016/j.neuroscience.2024.03.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 03/11/2024] [Accepted: 03/17/2024] [Indexed: 03/24/2024]
Abstract
Automatic identification of brachial plexus (BP) abnormalities from routine magnetic resonance imaging (MRI), to localize and identify a neurologic injury in clinical practice, is still a novel topic in brachial plexopathy. This study developed and evaluated an approach to differentiate abnormal BP with artificial intelligence (AI) over three commonly used MRI sequences, i.e. T1, FLUID-sensitive and post-gadolinium sequences. A BP dataset was collected by radiological experts and a semi-supervised artificial intelligence method (based on nnU-net) was used to segment the BP. Thereafter, a radiomics method was utilized to extract 107 shape and texture features from these ROIs. From various machine learning methods, we selected six widely recognized classifiers for training our BP models and assessing their efficacy. To optimize these models, we introduced a dynamic feature selection approach aimed at discarding redundant and less informative features. Our experimental findings demonstrated that, in the context of identifying abnormal BP cases, shape features displayed heightened sensitivity compared to texture features. Notably, both the Logistic classifier and Bagging classifier outperformed other methods in our study. These evaluations illuminated the exceptional performance of our model trained on FLUID-sensitive sequences, which notably exceeded the results of both T1 and post-gadolinium sequences. Crucially, our analysis highlighted that both its classification accuracy and its AUC score (area under the receiver operating characteristic curve) on the FLUID-sensitive sequence exceeded 90%. This outcome served as a robust experimental validation, affirming the substantial potential and strong feasibility of integrating AI into clinical practice.
Affiliation(s)
- Weiguo Cao
- Department of Radiology, Mayo Clinic, 200 First Street SW, Charlton 1, Rochester, MN 55905, USA
- Benjamin M Howe
- Department of Radiology, Mayo Clinic, 200 First Street SW, Charlton 1, Rochester, MN 55905, USA
- Darryl E Wright
- Department of Radiology, Mayo Clinic, 200 First Street SW, Charlton 1, Rochester, MN 55905, USA
- Sumana Ramanathan
- Department of Radiology, Mayo Clinic, 200 First Street SW, Charlton 1, Rochester, MN 55905, USA
- Nicholas G Rhodes
- Department of Radiology, Mayo Clinic, 200 First Street SW, Charlton 1, Rochester, MN 55905, USA
- Panagiotis Korfiatis
- Department of Radiology, Mayo Clinic, 200 First Street SW, Charlton 1, Rochester, MN 55905, USA
- Kimberly K Amrami
- Department of Radiology, Mayo Clinic, 200 First Street SW, Charlton 1, Rochester, MN 55905, USA
- Robert J Spinner
- Department of Neurological Surgery, Mayo Clinic, 200 First Street SW, Gonda 8, Rochester, MN 55905, USA
- Timothy L Kline
- Department of Radiology, Mayo Clinic, 200 First Street SW, Charlton 1, Rochester, MN 55905, USA
4
Li S, Liu S, Sun X, Hao L, Gao Q. Identification of endocrine-disrupting chemicals targeting key DCM-associated genes via bioinformatics and machine learning. Ecotoxicology and Environmental Safety 2024; 274:116168. [PMID: 38460409 DOI: 10.1016/j.ecoenv.2024.116168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 02/04/2024] [Accepted: 02/27/2024] [Indexed: 03/11/2024]
Abstract
Dilated cardiomyopathy (DCM) is a primary cause of heart failure (HF), with the incidence of HF increasing consistently in recent years. DCM pathogenesis involves a combination of inherited predisposition and environmental factors. Endocrine-disrupting chemicals (EDCs) are exogenous chemicals that interfere with endogenous hormone action and are capable of targeting various organs, including the heart. However, the impact of these disruptors on heart disease through their effects on genes remains underexplored. In this study, we aimed to explore key DCM-related genes using machine learning (ML) and the construction of a predictive model. Using the Gene Expression Omnibus (GEO) database, we screened differentially expressed genes (DEGs) and performed enrichment analyses of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways related to DCM. Through ML techniques combining maximum relevance minimum redundancy (mRMR) and least absolute shrinkage and selection operator (LASSO) logistic regression, we identified key genes for predicting DCM (IL1RL1, SEZ6L, SFRP4, COL22A1, RNASE2, HB). Based on these key genes, 79 EDCs with the potential to affect DCM were identified, among which 4 (3,4-dichloroaniline, fenitrothion, pyrene, and isoproturon) have not been previously associated with DCM. These findings establish a novel relationship between the EDCs mediated by key genes and the development of DCM.
Affiliation(s)
- Shu Li
- Department of Health and Intelligent Engineering, College of Health Management, China Medical University, Shenyang, Liaoning Province 110122, PR China
- Shuice Liu
- Department of Pharmacology, Shenyang Medical College, Shenyang, Liaoning Province 110001, PR China
- Xuefei Sun
- Department of Pharmaceutical Toxicology, School of Pharmacy, China Medical University, Shenyang 110122, PR China
- Liying Hao
- Department of Pharmaceutical Toxicology, School of Pharmacy, China Medical University, Shenyang 110122, PR China
- Qinghua Gao
- Department of Developmental Cell Biology, Key Laboratory of Cell Biology, Ministry of Public Health, and Key Laboratory of Medical Cell Biology, Ministry of Education, China Medical University, No. 77 Puhe Road, Shenyang North New Area, Shenyang, Liaoning Province, PR China
5
Aslam M, Rajbdad F, Azmat S, Li Z, Boudreaux JP, Thiagarajan R, Yao S, Xu J. A novel method for detection of pancreatic Ductal Adenocarcinoma using explainable machine learning. Computer Methods and Programs in Biomedicine 2024; 245:108019. [PMID: 38237450 DOI: 10.1016/j.cmpb.2024.108019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Revised: 01/09/2024] [Accepted: 01/10/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND AND OBJECTIVE Pancreatic Ductal Adenocarcinoma (PDAC) is a form of pancreatic cancer that is one of the primary causes of cancer-related deaths globally, with a five-year survival rate of less than 10%. The prognosis of pancreatic cancer has remained poor in the last four decades, mainly due to the lack of early diagnostic mechanisms. This study proposes a novel method for detecting PDAC using explainable and supervised machine learning from Raman spectroscopic signals. METHODS An insightful feature set consisting of statistical, peak, and extended empirical mode decomposition features is selected using the support vector machine recursive feature elimination method integrated with a correlation bias reduction. Explicable features successfully identified mutations in Kirsten rat sarcoma viral oncogene homolog (KRAS) and tumor suppressor protein 53 (TP53) in the fingerprint region for the first time in the literature. PDAC and normal pancreas are classified using K-nearest neighbor, linear discriminant analysis, and support vector machine classifiers. RESULTS This study achieved a classification accuracy of 98.5% using a nonlinear support vector machine. Our proposed method reduced test time by 28.5% and memory utilization by 85.6%, which reduces complexity significantly, and it is more accurate than the state-of-the-art method. The generalization of the proposed method is assessed by fifteen-fold cross-validation, and its performance is evaluated using accuracy, specificity, sensitivity, and receiver operating characteristic curves. CONCLUSIONS In this study, we proposed a method to detect and define the fingerprint region for PDAC using explainable machine learning. This simple, accurate, and efficient method for PDAC detection in mice could be generalized to examine human pancreatic cancer and provide a basis for precise chemotherapy for early cancer treatment.
Affiliation(s)
- Murtaza Aslam
- Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
- Fozia Rajbdad
- Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
- Shoaib Azmat
- Department of Electrical and Computer Engineering, COMSATS University Islamabad, Pakistan
- Zheng Li
- Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
- J Philip Boudreaux
- Department of Surgery, School of Medicine, Louisiana State University Health Sciences Center, New Orleans, LA 70112, USA
- Ramcharan Thiagarajan
- Department of Surgery, School of Medicine, Louisiana State University Health Sciences Center, New Orleans, LA 70112, USA
- Shaomian Yao
- Department of Comparative Biomedical Sciences, School of Veterinary Medicine, Louisiana State University, Baton Rouge, LA 70803, USA
- Jian Xu
- Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
6
Ding X, Li Y, Chen S. Maximum margin and global criterion based-recursive feature selection. Neural Netw 2024; 169:597-606. [PMID: 37956576 DOI: 10.1016/j.neunet.2023.10.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 06/19/2023] [Accepted: 10/22/2023] [Indexed: 11/15/2023]
Abstract
In this research paper, we aim to investigate and address the limitations of recursive feature elimination (RFE) and its variants in high-dimensional feature selection tasks. We identify two main challenges associated with these methods. Firstly, the feature ranking criterion utilized in these approaches is inconsistent with the maximum-margin theory. Secondly, the computation of the criterion is performed locally, lacking the ability to measure the importance of features globally. To overcome these challenges, we propose a novel feature ranking criterion called Maximum Margin and Global (MMG) criterion. This criterion utilizes the classification margin to determine the importance of features and computes it globally, enabling a more accurate assessment of feature importance. Moreover, we introduce an optimal feature subset evaluation algorithm that leverages the MMG criterion to determine the best subset of features. To enhance the efficiency of the proposed algorithms, we provide two alpha seeding strategies that significantly reduce computational costs while maintaining high accuracy. These strategies offer a practical means to expedite the feature selection process. Through extensive experiments conducted on ten benchmark datasets, we demonstrate that our proposed algorithms outperform current state-of-the-art methods. Additionally, the alpha seeding strategies yield significant speedups, further enhancing the efficiency of the feature selection process.
Affiliation(s)
- Xiaojian Ding
- College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China.
- Yi Li
- College of Economics and Management, Nanjing Agricultural University, Nanjing 210095, China
- Shilin Chen
- Thoracic Surgery, Nanjing Medical University Affiliated Cancer Hospital, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, Nanjing 221005, China
7
Ding X, Yang F, Ma F, Chen S. A Unified Multi-Class Feature Selection Framework for Microarray Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3725-3736. [PMID: 37698974 DOI: 10.1109/tcbb.2023.3314432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
In feature selection research, simultaneous multi-class feature selection technologies are popular because they simultaneously select informative features for all classes. Recursive feature elimination (RFE) methods are state-of-the-art binary feature selection algorithms. However, extending existing RFE algorithms to multi-class tasks may increase the computational cost and lead to performance degradation. With this motivation, we introduce a unified multi-class feature selection (UFS) framework for randomization-based neural networks to address these challenges. First, we propose a new multi-class feature ranking criterion using the output weights of neural networks. The heuristic underlying this criterion is that "the importance of a feature should be related to the magnitude of the output weights of a neural network". Subsequently, the UFS framework utilizes the original features to construct a training model based on a randomization-based neural network, ranks these features by the criterion of the norm of the output weights, and recursively removes a feature with the lowest ranking score. Extensive experiments on 15 real-world datasets suggest that our proposed framework outperforms state-of-the-art algorithms. The code of UFS is available at https://github.com/SVMrelated/UFS.git.
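The loop described in this abstract (train a randomization-based network on the current features, score each feature by the norm of its output weights, drop the lowest-scoring one, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' UFS code, which lives in the repository linked above. It assumes a random-vector-functional-link style network whose hidden layer is random and fixed, with direct input-to-output links so that every feature owns one row of output weights, and a ridge-regression solve for those weights. The function names, `n_hidden`, `ridge` and `seed` are all hypothetical.

```python
import numpy as np


def fit_output_weights(X, Y, n_hidden=32, ridge=1e-2, seed=0):
    """Fit only the output layer of a random-hidden-layer network.

    Assumed design: direct links from inputs to outputs plus a random
    tanh hidden layer. Returns an array of shape
    (n_features + n_hidden, n_classes).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    D = np.hstack([X, H])  # first n_features columns are the direct links
    A = D.T @ D + ridge * np.eye(D.shape[1])  # ridge term keeps A invertible
    return np.linalg.solve(A, D.T @ Y)


def recursive_feature_elimination(X, y, n_keep, **fit_kwargs):
    """Drop one feature per round until n_keep features remain."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets
    remaining = list(range(X.shape[1]))
    dropped = []
    while len(remaining) > n_keep:
        beta = fit_output_weights(X[:, remaining], Y, **fit_kwargs)
        # Score = L2 norm, across classes, of each feature's direct-link row.
        scores = np.linalg.norm(beta[: len(remaining)], axis=1)
        worst = int(np.argmin(scores))
        dropped.append(remaining.pop(worst))
    return remaining, dropped


# Toy data: 3 classes, 6 features, only feature 0 carries class signal.
rng = np.random.default_rng(1)
y_demo = np.repeat(np.arange(3), 20)
X_demo = rng.normal(size=(60, 6))
X_demo[:, 0] += 3.0 * y_demo
kept, dropped = recursive_feature_elimination(X_demo, y_demo, n_keep=2)
print("kept:", kept, "dropped in order:", dropped)
```

Because the score is a weight magnitude, features should be on comparable scales (for example standardized) before ranking. Taking the norm across all class columns is what makes a single ranking serve every class at once, which is the multi-class aspect the abstract emphasizes.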
8
Zou L, Meng L, Xu Y, Wang K, Zhang J. Revealing the diagnostic value and immune infiltration of senescence-related genes in endometriosis: a combined single-cell and machine learning analysis. Front Pharmacol 2023; 14:1259467. [PMID: 37860112 PMCID: PMC10583561 DOI: 10.3389/fphar.2023.1259467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Accepted: 09/05/2023] [Indexed: 10/21/2023]
Abstract
Introduction: Endometriosis is a prevalent and recurrent medical condition associated with symptoms such as pelvic discomfort, dysmenorrhea, and reproductive challenges. Furthermore, it has the potential to progress into a malignant state, significantly impacting the quality of life for affected individuals. Despite its significance, there is currently a lack of precise and non-invasive diagnostic techniques for this condition. Methods: In this study, we leveraged microarray datasets and employed a multifaceted approach. We conducted differential gene analysis, implemented weighted gene co-expression network analysis (WGCNA), and utilized machine learning algorithms, including random forest, support vector machine, and LASSO analysis, to comprehensively explore senescence-related genes (SRGs) associated with endometriosis. Discussion: Our comprehensive analysis, which also encompassed profiling of immune cell infiltration and single-cell analysis, highlights the therapeutic potential of this gene assemblage as promising targets for alleviating endometriosis. Furthermore, the integration of these biomarkers into diagnostic protocols promises to enhance diagnostic precision, offering a more effective diagnostic journey for future endometriosis patients in clinical settings. Results: Our meticulous investigation led to the identification of a cluster of genes, namely BAK1, LMNA, and FLT1, which emerged as potential discerning biomarkers for endometriosis. These biomarkers were subsequently utilized to construct an artificial neural network classifier model and were graphically represented in the form of a Nomogram.
Affiliation(s)
- Lian Zou
- Chongqing Emergency Medical Center, Department of Obstetrics and Gynecology in Chongqing University Central Hospital, Chongqing, China
- Lou Meng
- Chongqing Emergency Medical Center, Department of Obstetrics and Gynecology in Chongqing University Central Hospital, Chongqing, China
- Yan Xu
- Chongqing Emergency Medical Center, Department of Obstetrics and Gynecology in Chongqing University Central Hospital, Chongqing, China
- Kana Wang
- Department of Gynecology, West China Second Hospital of Sichuan University, Chengdu, China
- Jiawen Zhang
- Department of Gynecology, West China Second Hospital of Sichuan University, Chengdu, China
9
Zeng Y, Cao S, Li N, Tang J, Lin G. Identification of key lipid metabolism-related genes in Alzheimer's disease. Lipids Health Dis 2023; 22:155. [PMID: 37736681 PMCID: PMC10515010 DOI: 10.1186/s12944-023-01918-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 09/04/2023] [Indexed: 09/23/2023]
Abstract
BACKGROUND Alzheimer's disease (AD) represents profound degenerative conditions of the brain that cause significant deterioration in memory and cognitive function. Despite extensive research on the significant contribution of lipid metabolism to AD progression, the precise mechanisms remain incompletely understood. Hence, this study aimed to identify key differentially expressed lipid metabolism-related genes (DELMRGs) in AD progression. METHODS Comprehensive analyses were performed to determine key DELMRGs in AD compared to controls in GSE122063 dataset from Gene Expression Omnibus. Additionally, the ssGSEA algorithm was utilized for estimating immune cell levels. Subsequently, correlations between key DELMRGs and each immune cell were calculated specifically in AD samples. The key DELMRGs expression levels were validated via two external datasets. Furthermore, gene set enrichment analysis (GSEA) was utilized for deriving associated pathways of key DELMRGs. Additionally, miRNA-TF regulatory networks of the key DELMRGs were constructed using the miRDB, NetworkAnalyst 3.0, and Cytoscape software. Finally, based on key DELMRGs, AD samples were further segmented into two subclusters via consensus clustering, and immune cell patterns and pathway differences between the two subclusters were examined. RESULTS Seventy up-regulated and 100 down-regulated DELMRGs were identified. Subsequently, three key DELMRGs (DLD, PLPP2, and PLAAT4) were determined utilizing three algorithms [(i) LASSO, (ii) SVM-RFE, and (iii) random forest]. Specifically, PLPP2 and PLAAT4 were up-regulated, while DLD exhibited downregulation in AD cerebral cortex tissue. This was validated in two separate external datasets (GSE132903 and GSE33000). The AD group exhibited significantly altered immune cell composition compared to controls. In addition, GSEA identified various pathways commonly associated with three key DELMRGs. Moreover, the regulatory network of miRNA-TF for key DELMRGs was established. 
Finally, significant differences in immune cell levels and several pathways were identified between the two subclusters. CONCLUSION This study identified DLD, PLPP2, and PLAAT4 as key DELMRGs in AD progression, providing novel insights for AD prevention/treatment.
Affiliation(s)
- Youjie Zeng
- Department of Anesthesiology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China
- Si Cao
- Department of Anesthesiology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China
- Nannan Li
- Department of Nephrology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China
- Juan Tang
- Department of Nephrology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China
- Guoxin Lin
- Department of Anesthesiology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China
10
Kavitha S, Satheeshkumar J, Janani K, Amudha T, Rakkiyappan R. Ensemble feature selection using q-rung orthopair hesitant fuzzy multi criteria decision making extended to VIKOR. J EXP THEOR ARTIF IN 2023. [DOI: 10.1080/0952813x.2023.2183273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Affiliation(s)
- Kavitha S.
- Department of Computer Applications, Bharathiar University, Coimbatore, India
- Satheeshkumar J.
- Department of Computer Applications, Bharathiar University, Coimbatore, India
- Janani K.
- Department of Mathematics, Bharathiar University, Coimbatore, India
- Amudha T.
- Department of Computer Applications, Bharathiar University, Coimbatore, India
- Rakkiyappan R.
- Department of Mathematics, Bharathiar University, Coimbatore, India
11
Bajo-Morales J, Castillo-Secilla D, Herrera LJ, Caba O, Prados JC, Rojas I. Predicting COVID-19 Severity Integrating RNA-Seq Data Using Machine Learning Techniques. Curr Bioinform 2023; 18:221-231. [DOI: 10.2174/1574893617666220718110053] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/13/2022] [Revised: 05/21/2022] [Accepted: 05/31/2022] [Indexed: 11/22/2022]
Abstract
A fundamental challenge in the fight against COVID-19 is the development of reliable and accurate tools to predict disease progression in a patient. This information can be extremely useful in distinguishing hospitalized patients at higher risk of needing ICU care from patients with low severity. How SARS-CoV-2 infection will evolve is still unclear.
Methods:
A novel pipeline was developed that can integrate RNA-Seq data from different databases to obtain a genetic biomarker COVID-19 severity index using an artificial intelligence algorithm. Our pipeline ensures robustness through multiple cross-validation processes in different steps.
Results:
CD93, RPS24, PSCA, and CD300E were identified as a COVID-19 severity gene signature. Furthermore, using the obtained gene signature, an effective multi-class classifier capable of discriminating between control, outpatient, inpatient, and ICU COVID-19 patients was optimized, achieving an accuracy of 97.5%.
Conclusion:
In summary, during this research, a new intelligent pipeline was implemented with the goal of developing a specific gene signature that can detect the severity of patients suffering from COVID-19. Our approach to clinical decision support systems achieved excellent results, even when processing unseen samples. Our system can be of great clinical utility for planning, organizing and managing human and material resources, as well as for automatically classifying the severity of patients affected by COVID-19.
Affiliation(s)
- Javier Bajo-Morales
  - Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
  - Deuser Tech Group, Calle Islandia, 182-NAV 24A, Córdoba, 14014, Córdoba, Spain
- Daniel Castillo-Secilla
  - Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
  - Fujitsu Technology Solutions S.A, CoE Data Intelligence, Camino del Cerro de los Gamos, 1, Pozuelo de Alarcón, 28224, Madrid, Spain
- Luis Javier Herrera
  - Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
- Octavio Caba
  - Nuclear Medicine Department, IMIBIC, University Hospital Reina Sofia, Menéndez Pidal Avenue, 14004, Córdoba, Spain
- Jose Carlos Prados
  - Nuclear Medicine Department, IMIBIC, University Hospital Reina Sofia, Menéndez Pidal Avenue, 14004, Córdoba, Spain
- Ignacio Rojas
  - Department of Computer Architecture and Technology, University of Granada, C.I.T.I.C., Periodista Rafael Gómez Montero, 2, 18014, Granada, Spain
12
Liu K, Chen Q, Huang GH. An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF. Genes (Basel) 2023; 14:421. [PMID: 36833348 PMCID: PMC9957060 DOI: 10.3390/genes14020421] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/13/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 02/10/2023]
Abstract
Gene families, which are parts of a genome's information storage hierarchy, play a significant role in the development and diversity of multicellular organisms. Several studies have focused on the characteristics of gene families, such as function, homology, or phenotype. However, statistical and correlation analyses on the distribution of gene family members in the genome have yet to be conducted. Here, a novel framework incorporating gene family analysis and genome selection based on NMF-ReliefF is reported. Specifically, the proposed method starts by obtaining gene families from the TreeFam database and determining the number of gene families within the feature matrix. Then, NMF-ReliefF is used to select features from the gene feature matrix, which is a new feature selection algorithm that overcomes the inefficiencies of traditional methods. Finally, a support vector machine is utilized to classify the acquired features. The results show that the framework achieved an accuracy of 89.1% and an AUC of 0.919 on the insect genome test set. We also employed four microarray gene data sets to evaluate the performance of the NMF-ReliefF algorithm. The outcomes show that the proposed method may strike a delicate balance between robustness and discrimination. Additionally, the proposed method's categorization is superior to state-of-the-art feature selection approaches.
Affiliation(s)
- Kai Liu
  - College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
  - Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
  - College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Qi Chen
  - College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
  - Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
- Guo-Hua Huang
  - College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
  - Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
13
Zhao X, Zhao Y, Jiang Y, Zhang Q. Deciphering the endometrial immune landscape of RIF during the window of implantation from cellular senescence by integrated bioinformatics analysis and machine learning. Front Immunol 2022; 13:952708. [PMID: 36131919 PMCID: PMC9484583 DOI: 10.3389/fimmu.2022.952708] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/25/2022] [Accepted: 08/17/2022] [Indexed: 11/16/2022]
Abstract
Recurrent implantation failure (RIF) is an extremely thorny issue in in-vitro fertilization (IVF)-embryo transfer (ET). However, its intricate etiology and pathological mechanisms are still unclear. Nowadays, there has been extensive interest in cellular senescence in RIF, and its involvement in endometrial immune characteristics during the window of implantation (WOI) has captured scholars' growing concerns. Therefore, this study aims to probe into the pathological mechanism of RIF from cellular senescence and investigate the correlation between cellular senescence and endometrial immune characteristics during WOI based on bioinformatics combined with machine learning strategy, so as to elucidate the underlying pathological mechanisms of RIF and to explore novel treatment strategies for RIF. Firstly, the gene sets of GSE26787 and GSE111974 from the Gene Expression Omnibus (GEO) database were included for the weighted gene correlation network analysis (WGCNA), from which we concluded that the genes of the core module were closely related to cell fate decision and immune regulation. Subsequently, we identified 25 cellular senescence-associated differentially expressed genes (DEGs) in RIF by intersecting DEGs with cellular senescence-associated genes from the Cell Senescence (CellAge) database. Moreover, functional enrichment analysis was conducted to further reveal the specific molecular mechanisms by which these molecules regulate cellular senescence and immune pathways. Then, eight signature genes were determined by the machine learning method of support vector machine-recursive feature elimination (SVM-RFE), random forest (RF), and artificial neural network (ANN), comprising LATS1, EHF, DUSP16, ADCK5, PATZ1, DEK, MAP2K1, and ETS2, which were also validated in the testing gene set (GSE106602). 
Furthermore, distinct immune microenvironment abnormalities in the RIF endometrium during WOI were comprehensively explored and validated in GSE106602, including infiltrating immunocytes, immune function, and the expression profiling of human leukocyte antigen (HLA) genes and immune checkpoint genes. Moreover, the correlation between the eight signature genes with the endometrial immune landscape of RIF was also evaluated. After that, two distinct subtypes with significantly distinct immune infiltration characteristics were identified by consensus clustering analysis based on the eight signature genes. Finally, a "KEGG pathway-RIF signature genes-immune landscape" association network was constructed to intuitively uncover their connection. In conclusion, this study demonstrated that cellular senescence might play a pushing role in the pathological mechanism of RIF, which might be closely related to its impact on the immune microenvironment during the WOI phase. The exploration of the molecular mechanism of cellular senescence in RIF is expected to bring new breakthroughs for disease diagnosis and treatment strategies.
Affiliation(s)
- Xiaoxuan Zhao
  - Department of Traditional Chinese Medicine (TCM) Gynecology, Hangzhou Hospital of Traditional Chinese Medicine Affiliated to Zhejiang Chinese Medical University, Hangzhou, China
- Yang Zhao
  - College of Basic Medicine, Hebei College of Traditional Chinese Medicine, Shijiazhuang, China
- Yuepeng Jiang
  - College of Pharmacy, Zhejiang Chinese Medical University, Hangzhou, China
- Qin Zhang
  - Department of Traditional Chinese Medicine (TCM) Gynecology, Hangzhou Hospital of Traditional Chinese Medicine Affiliated to Zhejiang Chinese Medical University, Hangzhou, China
14
Mashrur FR, Rahman KM, Miya MTI, Vaidyanathan R, Anwar SF, Sarker F, Mamun KA. An intelligent neuromarketing system for predicting consumers' future choice from electroencephalography signals. Physiol Behav 2022; 253:113847. [PMID: 35594931 DOI: 10.1016/j.physbeh.2022.113847] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/02/2021] [Revised: 04/05/2022] [Accepted: 05/16/2022] [Indexed: 10/18/2022]
Abstract
Neuromarketing utilizes Brain-Computer Interface (BCI) technologies to provide insight into consumers' responses to marketing stimuli. To gain such insight, marketers spend about $400 billion annually on marketing, promotion, and advertisement using traditional marketing research tools. However, these tools, such as personal depth interviews, surveys, and focus group discussions, are expensive and frequently criticized for failing to extract actual consumer preferences. Neuromarketing, on the other hand, promises to overcome such constraints. In this work, an EEG-based neuromarketing framework is employed for predicting consumers' future choice (affective attitude) while they view E-commerce products. After preprocessing, three types of features, namely time, frequency, and time-frequency domain features, are extracted. Then, wrapper-based Support Vector Machine-Recursive Feature Elimination (SVM-RFE) along with correlation bias reduction is used for feature selection. Lastly, we use SVM for categorizing positive affective attitude and negative affective attitude. Experiments show that the frontal cortex achieves the best accuracy of 98.67±2.98, 98±3.22, and 98.67±3.52 for 5-fold, 10-fold, and leave-one-subject-out (LOSO) validation, respectively. In addition, among all the channels, Fz achieves the best accuracy of 90±7.81, 90.67±9.53, and 92.67±7.03 for 5-fold, 10-fold, and LOSO, respectively. This work opens the door for implementing such a neuromarketing framework using consumer-grade devices in a real-life setting for marketers. As a result, it is evident that EEG-based neuromarketing technologies can assist brands and enterprises in forecasting future consumer preferences accurately. Hence, it will pave the way for the creation of an intelligent marketing assistive system for neuromarketing applications in future.
Affiliation(s)
- Fazla Rabbi Mashrur
  - Advanced Intelligent Multidisciplinary Systems (AIMS) Lab, Institute for Advanced Research (IAR), United International University, Dhaka, Bangladesh.
- Ravi Vaidyanathan
  - Department of Mechanical Engineering and UK Dementia Research Institute Care, Research and Technology Centre (DRI-CR&T), Imperial College London, London, United Kingdom
- Syed Ferhat Anwar
  - Institute of Business Administration, University of Dhaka, Dhaka, Bangladesh
- Farhana Sarker
  - Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka, Bangladesh
- Khondaker A Mamun
  - Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh.
15
Qiu F, Zheng P, Heidari AA, Liang G, Chen H, Karim FK, Elmannai H, Lin H. Mutational Slime Mould Algorithm for Gene Selection. Biomedicines 2022; 10:2052. [PMID: 36009599 PMCID: PMC9406076 DOI: 10.3390/biomedicines10082052] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/22/2022] [Revised: 08/14/2022] [Accepted: 08/16/2022] [Indexed: 02/02/2023]
Abstract
A large volume of high-dimensional genetic data has been produced in modern medicine and biology fields. Data-driven decision-making is particularly crucial to clinical practice and relevant procedures. However, high-dimensional data in these fields increase the processing complexity and scale. Identifying representative genes and reducing the data's dimensions is often challenging. The purpose of gene selection is to eliminate irrelevant or redundant features to reduce the computational cost and improve classification accuracy. The wrapper gene selection model is based on a feature set, which can reduce the number of features and improve classification accuracy. This paper proposes a wrapper gene selection method based on the slime mould algorithm (SMA) to solve this problem. SMA is a new algorithm with a lot of application space in the feature selection field. This paper improves the original SMA by combining the Cauchy mutation mechanism with the crossover mutation strategy based on differential evolution (DE). Then, the transfer function converts the continuous optimizer into a binary version to solve the gene selection problem. Firstly, the continuous version of the method, ISMA, is tested on 33 classical continuous optimization problems. Then, the effect of the discrete version, or BISMA, was thoroughly studied by comparing it with other gene selection methods on 14 gene expression datasets. Experimental results show that the continuous version of the algorithm achieves an optimal balance between local exploitation and global search capabilities, and the discrete version of the algorithm has the highest accuracy when selecting the least number of genes.
Affiliation(s)
- Feng Qiu
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Pan Zheng
  - Information Systems, University of Canterbury, Christchurch 8014, New Zealand
- Ali Asghar Heidari
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Guoxi Liang
  - Department of Information Technology, Wenzhou Polytechnic, Wenzhou 325035, China
- Huiling Chen
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Faten Khalid Karim
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Hela Elmannai
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Haiping Lin
  - Department of Information Engineering, Hangzhou Vocational & Technical College, Hangzhou 310018, China
16
Limam H, Hasni O, Alaya IB. A novel hybrid approach for feature selection enhancement: COVID-19 case study. Comput Methods Biomech Biomed Engin 2022:1-15. [PMID: 35993576 DOI: 10.1080/10255842.2022.2112185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Feature selection is a promising Artificial Intelligence technique for screening, analysing, predicting, and tracking current COVID-19 patients and likely future patients. Significant applications have been developed to track data on confirmed, recovered, and death cases. In this work, we propose a new feature selection method based on a new way of hybridizing filter and wrapper methods. The proposed approach is expected to achieve high classification accuracy with a small feature subset. Specifically, the main contribution of this work is a four-step approach organized as follows: First, we consecutively remove duplicate and constant features. Then, we select the highest-ranked feature with Mutual Information. In the last step, we run the 'Backward Feature Elimination' algorithm to delete features from the active subset until a stopping criterion based on the degradation of classification performance is met. We applied the proposed approach to a COVID-19 dataset to test its ability to find the relevant feature for characterizing the disease, such as new cases infected with the virus, people vaccinated, and the number of deaths, to better assess the situation. For evaluation purposes, experiments are conducted first on the COVID-19 dataset, then on six large, high-dimensional benchmark datasets. The method's performance is tracked and measured on these datasets, and a comparison with many approaches is provided.
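The filter-then-wrapper procedure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 1-nearest-neighbour wrapper classifier, the leave-one-out scoring, and the `top_k` and `tolerance` parameters are all assumptions chosen to keep the example self-contained, and features are assumed to be discrete.

```python
from collections import Counter
from math import log


def drop_constant_and_duplicate(columns):
    """Filter steps: keep indices of columns that are neither constant nor exact duplicates."""
    seen, kept = set(), []
    for j, col in enumerate(columns):
        key = tuple(col)
        if len(set(col)) > 1 and key not in seen:
            seen.add(key)
            kept.append(j)
    return kept


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def loo_accuracy(columns, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Hamming distance)."""
    n = len(y)
    hits = 0
    for i in range(n):
        nearest = min(
            (k for k in range(n) if k != i),
            key=lambda k: sum(columns[j][i] != columns[j][k] for j in subset),
        )
        hits += y[nearest] == y[i]
    return hits / n


def hybrid_select(columns, y, top_k=3, tolerance=0.0):
    """Filter (cleaning + mutual-information ranking), then backward elimination."""
    kept = drop_constant_and_duplicate(columns)
    ranked = sorted(kept, key=lambda j: mutual_information(columns[j], y), reverse=True)
    active = ranked[:top_k]  # assumption: the wrapper starts from the top_k ranked features
    score = loo_accuracy(columns, y, active)
    while len(active) > 1:
        # Score every single-feature removal and pick the one that hurts least.
        trials = [(loo_accuracy(columns, y, [f for f in active if f != j]), j) for j in active]
        best_score, victim = max(trials)
        if best_score < score - tolerance:
            break  # stopping criterion: any further removal degrades accuracy
        active.remove(victim)
        score = best_score
    return active, score
```

Calling `hybrid_select(columns, y)` with `columns` given as a list of feature columns returns the surviving feature indices and their leave-one-out accuracy. The greedy loop rescans all candidates at every step, which is acceptable for a sketch but would be costly on high-dimensional data.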
Affiliation(s)
- Hela Limam
  - Institut Supérieur d'Informatique, Université de Tunis El Manar, Tunisia and Laboratoire BestMod, Institut Supérieur de Gestion de Tunis, Tunis, Tunisia
- Oumaima Hasni
  - Laboratoire BestMod, Institut Supérieur de Gestion de Tunis, Tunis, Tunisia
- Ines Ben Alaya
  - Higher Institute of Medical Technology of Tunis, Laboratory of Biophysics and Medical Technology, Tunis El Manar University, Tunis, Tunisia
17
Zhang J, Zhang S, Zhou Y, Qu Y, Hou T, Ge W, Zhang S. KLF9 and EPYC acting as feature genes for osteoarthritis and their association with immune infiltration. J Orthop Surg Res 2022; 17:365. [PMID: 35902862 PMCID: PMC9330685 DOI: 10.1186/s13018-022-03247-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/13/2022] [Accepted: 07/07/2022] [Indexed: 11/20/2022]
Abstract
Background: Osteoarthritis, a common degenerative disease of articular cartilage, is characterized by degeneration of articular cartilage, changes in subchondral bone structure, and formation of osteophytes, with main clinical manifestations including increasingly serious swelling, pain, stiffness, deformity, and mobility deficits of the knee joints. With the advent of the big data era, the processing of mass data has evolved into a hot topic and gained a solid foundation from the steadily developed and improved machine learning algorithms. Aiming to provide a reference for the diagnosis and treatment of osteoarthritis, this paper uses machine learning to identify the key feature genes of osteoarthritis and explore their relationship with immune infiltration, thereby revealing its pathogenesis at the molecular level. Methods: From the GEO database, the GSE55235 and GSE55457 datasets were taken as training sets and GSE98918 as a validation set. Differential gene expression in the training sets was analyzed, and a LASSO regression model and a support vector machine model were established by applying machine learning algorithms. The genes in the intersection of the two models were regarded as feature genes, the receiver operating characteristic (ROC) curve was drawn, and the results were verified using the validation set. In addition, the expression profile of osteoarthritis was analyzed for immunocyte infiltration, and the co-expression correlation between feature genes and immunocytes was examined. Conclusion: EPYC and KLF9 can be viewed as feature genes for osteoarthritis. The silencing of EPYC and the overexpression of KLF9 are associated with the occurrence of osteoarthritis and immunocyte infiltration.
Affiliation(s)
- Jiayin Zhang
  - Orthopedics, The Second Hospital of Jilin University, No. 218, Ziqiang Street, Nanguan District, Changchun, 130000, Jilin, China
- Shengjie Zhang
  - Orthopedics, The Second Hospital of Jilin University, No. 218, Ziqiang Street, Nanguan District, Changchun, 130000, Jilin, China
- Yu Zhou
  - Orthopedics, Changchun University of Chinese Medicine, No.1478, Gongnong Road, Chaoyang District, Changchun, 130117, Jilin, China
- Yuan Qu
  - Orthopedics, The Second Hospital of Jilin University, No. 218, Ziqiang Street, Nanguan District, Changchun, 130000, Jilin, China
- Tingting Hou
  - Orthopedics, The Second Hospital of Jilin University, No. 218, Ziqiang Street, Nanguan District, Changchun, 130000, Jilin, China
- Wanbao Ge
  - Orthopedics, The Second Hospital of Jilin University, No. 218, Ziqiang Street, Nanguan District, Changchun, 130000, Jilin, China
- Shanyong Zhang
  - Orthopedics, The Second Hospital of Jilin University, No. 218, Ziqiang Street, Nanguan District, Changchun, 130000, Jilin, China
  - Orthopedics, Changchun University of Chinese Medicine, No.1478, Gongnong Road, Chaoyang District, Changchun, 130117, Jilin, China
18
Ren J, Zhong N. Analysis of Enterprise Social Responsibility to Employee Psychological Satisfaction Based on Discriminant Least Square Regression. Front Psychol 2022; 13:925010. [PMID: 35880186 PMCID: PMC9307929 DOI: 10.3389/fpsyg.2022.925010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 06/09/2022] [Indexed: 11/23/2022]
Abstract
Employee psychological satisfaction is the satisfaction of perception of environmental factors at the psychological and physiological levels, that is, the employees' subjective response to the work situation. How to enhance employee loyalty and psychological satisfaction has always been a hot issue in theoretical and practical research. With the development of artificial intelligence (AI), many AI methods are widely used to find important factors that significantly influence the psychological satisfaction of employees. Feature selection methods, as one kind of AI model, can select discriminant features that are highly correlated with the outcome. In this study, we first construct 19 factors from enterprise social responsibility. Then we use a discriminant least square regression model to select the factors most relevant to employee psychological satisfaction. Our experimental results show that the psychological satisfaction of employees is closely related to salary, security, welfare, occupational health, and fairness. In addition, we find that discriminant least square regression performs better than the feature selection methods we compared it with, and the selected factors are more in line with our perceptions and expectations.
Affiliation(s)
- Junbao Ren
  - China Construction Bank Shandong Branch, Jinan, China
- Ni Zhong
  - Shandong Tudi Development Group Co., Ltd., Financial Management Division, Jinan, China
19
Ding X, Yang F, Zhong Y, Cao J. A Novel Recursive Gene Selection Method Based on Least Square Kernel Extreme Learning Machine. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2026-2038. [PMID: 33764877 DOI: 10.1109/tcbb.2021.3068846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
This paper presents a recursive feature elimination (RFE) mechanism to select the most informative genes with a least square kernel extreme learning machine (LSKELM) classifier. Describing the generalization ability of LSKELM in a way that is related to a small norm of weights, we propose a ranking criterion to evaluate the importance of genes by the norm of weights obtained by LSKELM. The proposed method, called LSKELM-RFE, first employs the original genes to build an LSKELM classifier, then ranks the genes according to their importance given by the norm of the output weights of LSKELM, and finally removes the "least important" gene. Benefiting from the random mapping mechanism of the extreme learning machine (ELM) kernel, no parameter of LSKELM-RFE needs to be manually tuned. A comparative study of our proposed algorithm and two other well-known RFE algorithms shows that LSKELM-RFE outperforms the other RFE algorithms in both computational cost and generalization ability.
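The elimination loop described in this abstract follows the general RFE pattern: fit a model, rank features by weight magnitude, drop the weakest, and repeat. The Python sketch below shows that loop only. As a clearly labeled stand-in, it uses a plain linear ridge regression rather than the LSKELM classifier of the paper, and `ridge_weights`, `rfe`, `lam`, and `n_keep` are hypothetical names introduced for illustration.

```python
def ridge_weights(rows, y, lam=1e-3):
    """Stand-in model (NOT LSKELM): solve (X^T X + lam*I) w = X^T y by Gauss-Jordan elimination."""
    d = len(rows[0])
    # Augmented matrix [X^T X + lam*I | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(d)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(d)
    ]
    for col in range(d):
        # Partial pivoting: move the row with the largest entry in this column into place.
        pivot = max(range(col, d), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(d):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][d] / a[i][i] for i in range(d)]


def rfe(rows, y, n_keep=1, fit=ridge_weights):
    """Recursive feature elimination: refit, drop the smallest-|weight| feature, repeat."""
    active = list(range(len(rows[0])))
    eliminated = []
    while len(active) > n_keep:
        sub = [[r[j] for j in active] for r in rows]
        w = fit(sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        eliminated.append(active.pop(weakest))
    return active, eliminated
```

`rfe(rows, y, n_keep=k)` returns the indices of the `k` surviving features and the order in which the others were removed. Passing a different `fit` function is the intended way to swap in another ranking model.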
20
Mashrur FR, Rahman KM, Miya MTI, Vaidyanathan R, Anwar SF, Sarker F, Mamun KA. BCI-Based Consumers' Choice Prediction From EEG Signals: An Intelligent Neuromarketing Framework. Front Hum Neurosci 2022; 16:861270. [PMID: 35693537 PMCID: PMC9177951 DOI: 10.3389/fnhum.2022.861270] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/24/2022] [Accepted: 05/02/2022] [Indexed: 11/29/2022]
Abstract
Neuromarketing relies on Brain Computer Interface (BCI) technology to gain insight into how customers react to marketing stimuli. Marketers spend about $750 billion annually on traditional marketing campaigns. They use traditional marketing research procedures such as Personal Depth Interviews, Surveys, Focused Group Discussions, and so on, which are frequently criticized for failing to extract true consumer preferences. On the other hand, Neuromarketing promises to overcome such constraints. This work proposes a machine learning framework for predicting consumers' purchase intention (PI) and affective attitude (AA) by analyzing EEG signals. In this work, EEG signals are collected from 20 healthy participants while administering three advertising stimuli settings: product, endorsement, and promotion. After preprocessing, features are extracted in three domains (time, frequency, and time-frequency). Then, after selecting features using the wrapper-based Recursive Feature Elimination method, a Support Vector Machine is used for categorizing positive and negative (AA and PI). The experimental results show that the proposed framework achieves an accuracy of 84 and 87.00% for PI and AA, ensuring the simulation of real-life results. In addition, AA and PI signals show N200 and N400 components when people tend to make a decision after viewing a static advertisement. Moreover, negative AA signals show more dispersion than positive AA signals. Furthermore, this work paves the way for implementing such a neuromarketing framework using consumer-grade EEG devices in a real-life setting. Therefore, it is evident that BCI-based neuromarketing technology can help brands and businesses effectively predict future consumer preferences. Hence, EEG-based neuromarketing technologies can assist brands and enterprises in accurately forecasting future consumer preferences.
Affiliation(s)
- Fazla Rabbi Mashrur
  - Advanced Intelligent Multidisciplinary Systems (AIMS) Lab, Institute for Advanced Research (IAR), United International University, Dhaka, Bangladesh
- Ravi Vaidyanathan
  - Department of Mechanical Engineering and UK Dementia Research Institute Care, Research and Technology Centre (DRI-CR&T), Imperial College London, London, United Kingdom
- Syed Ferhat Anwar
  - Institute of Business Administration, University of Dhaka, Dhaka, Bangladesh
- Farhana Sarker
  - Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka, Bangladesh
- Khondaker A. Mamun
  - Advanced Intelligent Multidisciplinary Systems (AIMS) Lab, Institute for Advanced Research (IAR), United International University, Dhaka, Bangladesh
  - Department of Computer Science & Engineering, United International University, Dhaka, Bangladesh
21
Data-Driven Parameter Selection and Modeling for Concrete Carbonation. MATERIALS 2022; 15:ma15093351. [PMID: 35591685 PMCID: PMC9102323 DOI: 10.3390/ma15093351] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/07/2022] [Revised: 04/21/2022] [Accepted: 04/26/2022] [Indexed: 12/10/2022]
Abstract
Concrete carbonation is known as a stochastic process. Its uncertainties mainly result from parameters that are not considered in prediction models. Parameter selection, therefore, is important. In this paper, based on 8204 sets of data, statistical methods and machine learning techniques were applied to choose appropriate influence factors in terms of three aspects: (1) the correlation between factors and concrete carbonation; (2) factors’ influence on the uncertainties of carbonation depth; and (3) the correlation between factors. Both single parameters and parameter groups were evaluated quantitatively. The results showed that compressive strength had the highest correlation with carbonation depth and that using the aggregate–cement ratio as the parameter significantly reduced the dispersion of carbonation depth to a low level. Machine learning models showed that the selected parameter groups had great potential to improve the performance of models with fewer parameters. This paper also developed machine learning carbonation models and simplified them to propose a practical model. The results showed that this concise model had a high accuracy on both accelerated and natural carbonation test datasets. For natural carbonation datasets, the mean absolute error of the practical model was 1.56 mm.
22
Mu T, Wang H, Wang C, Liang Z, Shao X. Auto-CASH: A meta-learning embedding approach for autonomous classification algorithm selection. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.01.040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
23
Wang Y, Gao X, Ru X, Sun P, Wang J. A hybrid feature selection algorithm and its application in bioinformatics. PeerJ Comput Sci 2022; 8:e933. [PMID: 35494789 PMCID: PMC9044222 DOI: 10.7717/peerj-cs.933] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/21/2022] [Accepted: 03/03/2022] [Indexed: 06/14/2023]
Abstract
Feature selection is an independent technology for high-dimensional datasets that has been widely applied in a variety of fields. With the vast expansion of information, such as bioinformatics data, there has been an urgent need to investigate more effective and accurate methods involving feature selection in recent decades. Here, we proposed the hybrid MMPSO method, which combines a feature ranking method with a heuristic search method, to obtain an optimal subset that can be used for higher classification accuracy. In this study, ten datasets obtained from the UCI Machine Learning Repository were analyzed to demonstrate the superiority of our method. The MMPSO algorithm outperformed other algorithms in terms of classification accuracy while utilizing the same number of features. Then we applied the method to a biological dataset containing gene expression information about liver hepatocellular carcinoma (LIHC) samples obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx). On the basis of the MMPSO algorithm, we identified an 18-gene signature that performed well in distinguishing normal samples from tumours. Nine of the 18 differentially expressed genes were significantly up-regulated in LIHC tumour samples, and the area under the curve (AUC) of the combination of seven genes (ADRA2B, ERAP2, NPC1L1, PLVAP, POMC, PYROXD2, TRIM29) in distinguishing tumours from normal samples was greater than 0.99. Six genes (ADRA2B, PYROXD2, CACHD1, FKBP1B, PRKD1 and RPL7AP6) were significantly correlated with survival time. The MMPSO algorithm can be used to effectively extract features from a high-dimensional dataset, which will provide new clues for identifying biomarkers or therapeutic targets from biological data and more perspectives in tumor research.
Collapse
Affiliation(s)
- Yangyang Wang
- School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Xiaoguang Gao
- School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Xinxin Ru
- School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Pengzhan Sun
- School of Electronics and Information, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Jihan Wang
- Institute of Medical Research, Northwestern Polytechnical University, Xi’an, Shaanxi, China
|
24
|
A combinatory algorithm for identifying genes in childhood acute lymphoblastic leukemia. Gene Reports 2022. [DOI: 10.1016/j.genrep.2021.101433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
25
|
Guo H, Yan F, Li P, Li M. Determination of Storage Period of Harvested Plums by Near‐Infrared Spectroscopy and Quality attributes. J Food Process Pres 2022. [DOI: 10.1111/jfpp.16504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Huixin Guo
- College of Food Science and Nutritional Engineering China Agricultural University Beijing 100083 China
- Fang Yan
- College of Software and Information Beijing Information Technology College Beijing 100015 China
- Pingzhen Li
- College of Information Shanxi University of Finance and Economics Taiyuan 030006 China
- Ming Li
- School of Biotechnology and Food Science Tianjin University of Commerce Tianjin 300134 China
|
26
|
Abstract
For short text classification, insufficient labeled data, data sparsity, and imbalanced classification have become three major challenges. To address them, we propose multiple weak supervision, which can label unlabeled data automatically. Unlike prior work, the proposed method generates probabilistic labels through a conditional independent model. Experiments were conducted to verify the effectiveness of multiple weak supervision. According to experimental results on public datasets, real datasets, and synthetic datasets, the unlabeled imbalanced short text classification problem can be solved effectively by multiple weak supervision. Notably, recall and F1-score can be improved without reducing precision by adding distant supervision clustering, which can be used to meet different application needs.
|
27
|
Tian C, Tang Z, Zhang H, Gao X, Xie Y. Operating condition recognition based on temporal cumulative distribution function and AdaBoost-extreme learning machine in zinc flotation process. Powder Technol 2022. [DOI: 10.1016/j.powtec.2021.09.078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
28
|
A class-specific metaheuristic technique for explainable relevant feature selection. Machine Learning with Applications 2021. [DOI: 10.1016/j.mlwa.2021.100142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022] Open
|
29
|
Use of relevancy and complementary information for discriminatory gene selection from high-dimensional gene expression data. PLoS One 2021; 16:e0230164. [PMID: 34613963 PMCID: PMC8494339 DOI: 10.1371/journal.pone.0230164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/22/2020] [Accepted: 09/21/2021] [Indexed: 12/22/2022] Open
Abstract
With the advent of high-throughput technologies, life sciences are generating a huge amount of varied biomolecular data. Global gene expression profiles provide a snapshot of all the genes that are transcribed in a cell or in a tissue under a particular condition. The high dimensionality of such gene expression data (i.e., a very large number of features/genes analyzed with far fewer samples) makes it difficult to identify de novo the key genes (biomarkers) that truly contribute to a particular phenotype or condition, such as cancer. In the existing literature, mutual information (MI) is one of the most successful criteria for identifying key genes from gene expression data. However, the correction of MI for finite samples has not been taken into account in this regard. It is also important to incorporate dynamic discretization of genes for more relevant gene selection, although this is not considered in the available methods. Besides, current studies usually suggest removing redundant genes, which is particularly inappropriate for biological data, as a group of genes may connect to each other for downstream proteins. Thus, genes that provide additional useful information about the disease need to be included even if they are redundant. Addressing these issues, we propose a Mutual information based Gene Selection method (MGS) for selecting informative genes. Moreover, to rank the selected genes, we extend MGS and propose two ranking methods: MGSf, based on frequency, and MGSrf, based on Random Forest. The proposed method not only obtained better classification rates on gene expression datasets derived from different gene expression studies than recently reported methods, but also detected key genes relevant to pathways with a causal relationship to the disease, which indicates that it will also be able to find the responsible genes in data for an unknown disease.
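The mutual information criterion named in this abstract can be illustrated with a minimal sketch. It computes the plain plug-in MI estimate between one already-discretized gene and the class label. The function name and the toy data are hypothetical, and neither the finite-sample correction nor the dynamic discretization described by the authors is implemented here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs only.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Toy data (hypothetical): a gene discretized into "hi"/"lo" and a binary class.
informative_gene = ["hi", "hi", "lo", "lo"]
uninformative_gene = ["hi", "lo", "hi", "lo"]
labels = [1, 1, 0, 0]

mi_informative = mutual_information(informative_gene, labels)
mi_uninformative = mutual_information(uninformative_gene, labels)
```

A gene whose discretized level determines the class carries one full bit about a balanced binary label, while a gene that is independent of the class carries none, which is the basis for ranking genes by this score.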
|
30
|
Wang F, Wang X. A novel feature selection algorithm based on damping oscillation theory. PLoS One 2021; 16:e0255307. [PMID: 34358234 PMCID: PMC8345869 DOI: 10.1371/journal.pone.0255307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2021] [Accepted: 07/13/2021] [Indexed: 11/18/2022] Open
Abstract
Feature selection is an important task in big data analysis and information retrieval processing. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed, which is used to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm whose position update formula has been improved in order to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory to find an optimal feature subset. Therefore, MKMDIGWO achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that the MKMDIGWO algorithm achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than other algorithms on 10 data sets.
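The filter stage described in this abstract (Kendall coefficient for relevance, Euclidean distance for redundancy) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Kendall tau-a, adds the two terms with equal weight, assumes features are scaled to [0, 1], and selects greedily. The abstract does not give the authors' actual scoring formula, and the function and feature names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / total pairs."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            s += 1
        elif prod < 0:
            s -= 1
    return s / (n * (n - 1) / 2)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def greedy_filter(features, labels, k):
    """Pick k features: high |tau| with the label, far from features already picked.

    features maps a name to a list of values assumed to lie in [0, 1].
    """
    relevance = {name: abs(kendall_tau(col, labels)) for name, col in features.items()}
    selected = [max(relevance, key=relevance.get)]
    max_dist = sqrt(len(labels))  # largest possible distance for values in [0, 1]

    def score(name):
        mean_dist = sum(euclidean(features[name], features[s]) for s in selected) / len(selected)
        return relevance[name] + mean_dist / max_dist  # assumed equal weighting

    while len(selected) < k:
        candidates = [name for name in features if name not in selected]
        selected.append(max(candidates, key=score))
    return selected


# Toy data (hypothetical): f2 is a near-duplicate of f1, f3 is weaker but different.
toy_features = {
    "f1": [0.1, 0.2, 0.8, 0.9],
    "f2": [0.15, 0.25, 0.8, 0.9],
    "f3": [0.9, 0.1, 0.6, 1.0],
}
toy_labels = [0, 0, 1, 1]
chosen = greedy_filter(toy_features, toy_labels, k=2)
```

In the toy data, the near-duplicate f2 is passed over in favour of f3, because its distance to the already selected f1 is almost zero even though its relevance is just as high.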
Affiliation(s)
- Fujun Wang
- School of Electronic and Information Engineering, Liaoning Technical University, Huludao, People’s Republic of China
- Key Laboratory of Preparation and Application of Environmentally Friendly Materials, Chinese Ministry of Education, Jilin Normal University, Changchun, People’s Republic of China
- Xing Wang
- School of Electronic and Information Engineering, Liaoning Technical University, Huludao, People’s Republic of China
|
31
|
Tan X, Wu X, Han M, Wang L, Xu L, Li B, Yuan Y. Yeast autonomously replicating sequence (ARS): Identification, function, and modification. Eng Life Sci 2021. [DOI: 10.1002/elsc.202000085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open
Affiliation(s)
- Xiao‐Yu Tan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology Tianjin University Tianjin P. R. China
- Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin P. R. China
- Xiao‐Le Wu
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology Tianjin University Tianjin P. R. China
- Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin P. R. China
- Ming‐Zhe Han
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology Tianjin University Tianjin P. R. China
- Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin P. R. China
- Li Wang
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology Tianjin University Tianjin P. R. China
- Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin P. R. China
- Li Xu
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology Tianjin University Tianjin P. R. China
- Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin P. R. China
- Bing‐Zhi Li
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology Tianjin University Tianjin P. R. China
- Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin P. R. China
- Ying‐Jin Yuan
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), School of Chemical Engineering and Technology Tianjin University Tianjin P. R. China
- Synthetic Biology Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin) Tianjin University Tianjin P. R. China
|
32
|
Li R, Yang J, Li L, Shen F, Zou T, Wang H, Wang X, Li J, Deng C, Huang X, Wang C, He Z, Lu F, Zeng L, Chen H. Integrating Multilevel Functional Characteristics Reveals Aberrant Neural Patterns during Audiovisual Emotional Processing in Depression. Cereb Cortex 2021; 32:1-14. [PMID: 34642754 DOI: 10.1093/cercor/bhab185] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/08/2021] [Revised: 05/25/2021] [Accepted: 05/29/2021] [Indexed: 11/14/2022] Open
Abstract
Emotion dysregulation is one of the core features of major depressive disorder (MDD). However, most studies in depression have focused on unimodal emotion processing, whereas emotional perception in daily life is highly dependent on multimodal sensory inputs. Here, we propose a novel multilevel discriminative framework to identify the altered neural patterns in processing audiovisual emotion in MDD. Seventy-four participants underwent functional magnetic resonance imaging while performing an audiovisual emotional task. Three levels of whole-brain functional features were extracted for each subject: task-evoked activation, task-modulated connectivity, and combined activation and connectivity. Support vector machine classification and prediction models were built to distinguish MDD patients from controls and to evaluate clinical relevance. We found that complex neural networks, including the emotion regulation network (prefrontal areas and limbic-subcortical regions) and the multisensory integration network (lateral temporal cortex and motor areas), had discriminative power. Moreover, by integrating comprehensive information on local and interactive processes, multilevel models could lead to a substantial increase in classification accuracy and in the prediction of depression severity. Together, we highlight the high representational capacity of machine learning algorithms to characterize the complex network abnormalities associated with emotional regulation and multisensory integration in MDD. These findings provide novel evidence for the neural mechanisms underlying multimodal emotion dysregulation in depression.
Affiliation(s)
- Rong Li
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Jiale Yang
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Liyuan Li
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Fei Shen
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Ting Zou
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Hongyu Wang
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Xuyang Wang
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Jiyi Li
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Chijun Deng
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Xinju Huang
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Chong Wang
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Zongling He
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Fengmei Lu
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Ling Zeng
- School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
- Huafu Chen
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, High-Field Magnetic Resonance Brain Imaging Key Laboratory of Sichuan Province, University of Electronic Science and Technology of China, Chengdu, 610054, PR China
- Sichuan Provincial Center for Mental Health, The Center of Psychosomatic Medicine of Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu 611731, PR China
|
33
|
Zhang S, Wang L, Zhao L, Li M, Liu M, Li K, Bin Y, Xia J. An improved DNA-binding hot spot residues prediction method by exploring interfacial neighbor properties. BMC Bioinformatics 2021; 22:253. [PMID: 34000983 PMCID: PMC8130120 DOI: 10.1186/s12859-020-03871-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/29/2020] [Accepted: 11/09/2020] [Indexed: 11/29/2022] Open
Abstract
Background: DNA-binding hot spots are dominant and fundamental residues that contribute most of the binding free energy yet account for only a small portion of protein–DNA interfaces. As experimental methods for identifying hot spots are time-consuming and costly, high-efficiency computational approaches are emerging as alternatives to experimental methods. Results: Herein, we present a new computational method, termed inpPDH, for hot spot prediction. To improve the prediction performance, we extract hybrid features which incorporate traditional features and new interfacial neighbor properties. To remove redundant and irrelevant features, a two-step feature selection strategy is employed. Finally, a subset of 7 optimal features is chosen to construct the predictor using a support vector machine. The results on the benchmark dataset show that the proposed method yields significantly better prediction accuracy than previously published methods. Moreover, a user-friendly web server for inpPDH has been established and is freely available at http://bioinfo.ahu.edu.cn/inpPDH. Conclusions: We have developed an accurate, improved prediction model, inpPDH, for hot spot residues in protein–DNA binding interfaces, given the structure of a protein–DNA complex. Moreover, we identify a comprehensive and useful feature subset, including the proposed interfacial neighbor features, that is well suited to identifying hot spot residues. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of interfacial neighbor features and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues in protein–DNA complexes. Supplementary information: Supplementary information accompanies this paper at 10.1186/s12859-020-03871-1.
Affiliation(s)
- Sijia Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- Lihua Wang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Le Zhao
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Menglu Li
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Mengya Liu
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Ke Li
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Yannan Bin
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- Junfeng Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
|
34
|
Pal JK, Ray SS, Pal SK. Identifying Drug Resistant miRNAs Using Entropy Based Ranking. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:973-984. [PMID: 31398129 DOI: 10.1109/tcbb.2019.2933205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
MicroRNAs play an important role in controlling drug sensitivity and resistance in cancer. Identification of responsible miRNAs for drug resistance can enhance the effectiveness of treatment. A new set theoretic entropy measure (SPEM) is defined to determine the relevance and level of confidence of miRNAs in deciding their drug resistant nature. Here, a pattern is represented by a pair of values. One of them implies the degree of its belongingness (fuzzy membership) to a class and the other represents the actual class of origin (crisp membership). A measure, called granular probability, is defined that determines the confidence level of having a particular pair of membership values. The granules used to compute the said probability are formed by a histogram based method where each bin of a histogram is considered as one granule. The width and number of the bins are automatically determined by the algorithm. The set thus defined, comprising a pair of membership values and the confidence level for having them, is used for the computation of SPEM and thereby identifying the drug resistant miRNAs. The efficiency of SPEM is demonstrated extensively on six data sets. While the achieved F-score in classifying sensitive and resistant samples ranges between 0.31 & 0.50 using all the miRNAs by SVM classifier, the same score varies from 0.67 to 0.94 using only the top 1 percent drug resistant miRNAs. Superiority of the proposed method as compared to some existing ones is established in terms of F-score. The significance of the top 1 percent miRNAs in corresponding cancer is also verified by the different articles based on biological investigations. Source code of SPEM is available at http://www.jayanta.droppages.com/SPEM.html.
|
35
|
|
36
|
Effrosynidis D, Arampatzis A. An evaluation of feature selection methods for environmental data. Ecol Inform 2021. [DOI: 10.1016/j.ecoinf.2021.101224] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 10/22/2022]
|
37
|
Masoudi-Sobhanzadeh Y, Motieghader H, Omidi Y, Masoudi-Nejad A. A machine learning method based on the genetic and world competitive contests algorithms for selecting genes or features in biological applications. Sci Rep 2021; 11:3349. [PMID: 33558580 PMCID: PMC7870651 DOI: 10.1038/s41598-021-82796-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/09/2020] [Accepted: 01/25/2021] [Indexed: 01/30/2023] Open
Abstract
Gene/feature selection is an essential preprocessing step for creating models using machine learning techniques. It also plays a critical role in different biological applications such as the identification of biomarkers. Although many feature/gene selection algorithms and methods have been introduced, they may suffer from problems such as parameter tuning or low level of performance. To tackle such limitations, in this study, a universal wrapper approach is introduced based on our introduced optimization algorithm and the genetic algorithm (GA). In the proposed approach, candidate solutions have variable lengths, and a support vector machine scores them. To show the usefulness of the method, thirteen classification and regression-based datasets with different properties were chosen from various biological scopes, including drug discovery, cancer diagnostics, clinical applications, etc. Our findings confirmed that the proposed method outperforms most of the other currently used approaches and can also free the users from difficulties related to the tuning of various parameters. As a result, users may optimize their biological applications such as obtaining a biomarker diagnostic kit with the minimum number of genes and maximum separability power.
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran (GRID grid.412888.f; ISNI 0000 0001 2174 8913)
- Habib Motieghader
- Department of Bioinformatics, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran (GRID grid.459617.8; ISNI 0000 0004 0494 2783)
- Department of Basic Sciences, Gowgan Educational Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran (GRID grid.459617.8; ISNI 0000 0004 0494 2783)
- Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Fort Lauderdale, Florida, 33328 USA (GRID grid.261241.2; ISNI 0000 0001 2168 8324)
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran (GRID grid.46072.37; ISNI 0000 0004 0612 7950)
|
38
|
Comprehensive classification models based on amygdala radiomic features for Alzheimer's disease and mild cognitive impairment. Brain Imaging Behav 2021; 15:2377-2386. [PMID: 33537928 DOI: 10.1007/s11682-020-00434-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/10/2020] [Revised: 11/21/2020] [Accepted: 12/17/2020] [Indexed: 11/26/2022]
Abstract
The amygdala is an important part of the medial temporal lobe and plays a pivotal role in emotional and cognitive function. The aim of this study was to build and validate comprehensive classification models based on amygdala radiomic features for Alzheimer's disease (AD) and amnestic mild cognitive impairment (aMCI). For the amygdala, 3360 radiomic features were extracted from 97 AD patients, 53 aMCI patients and 45 normal controls (NCs) on three-dimensional T1-weighted magnetization-prepared rapid gradient echo (MPRAGE) images. We used maximum relevance and minimum redundancy (mRMR) and the least absolute shrinkage and selection operator (LASSO) to select the features. Multivariable logistic regression analysis was performed to build three classification models (AD-NC group, AD-aMCI group, and aMCI-NC group). Finally, internal validation was performed. After two steps of feature selection, 5 radiomic features remained in the AD-NC group, and 16 features remained in the AD-aMCI group and in the aMCI-NC group. The proposed logistic classification analysis based on amygdala radiomic features achieved an accuracy of 0.90 and an area under the ROC curve (AUC) of 0.93 for AD vs. NC classification, an accuracy of 0.81 and an AUC of 0.84 for AD vs. aMCI classification, and an accuracy of 0.75 and an AUC of 0.80 for aMCI vs. NC classification. Amygdala radiomic features might be early biomarkers for detecting microstructural brain tissue changes during the course of AD and aMCI. Logistic classification analysis demonstrated promising classification performance for clinical applications among AD, aMCI and NC groups.
|
39
|
Venkatesh B, Anuradha J. A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data. International Journal of Knowledge-Based and Intelligent Engineering Systems 2021. [DOI: 10.3233/kes-190134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
Abstract
In microarray data, it is difficult to achieve high classification accuracy because of high dimensionality and irrelevant, noisy data. Such data also contain many gene expression features but few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique is used to aggregate the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used to select the optimal features, and an RBF kernel-based Support Vector Machine (SVM) classifier is used as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets. Accuracy, Recall, Precision, and F1-Score are used as evaluation metrics. The experimental results show that the proposed method outperforms the other feature selection methods.
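The filter-phase rank aggregation described in this abstract might look roughly like the sketch below. It rests on assumptions not stated in the abstract: each filter is taken to return a best-first ordering, a Gaussian membership centred on rank 1 converts each rank to a score, and the scores are summed across filters. The function names, the `sigma` parameter, and the gene names are hypothetical.

```python
from math import exp


def gaussian_membership(rank, sigma):
    """Membership in the 'top-ranked' fuzzy set; equals 1.0 at rank 1 and decays with rank."""
    return exp(-((rank - 1) ** 2) / (2 * sigma ** 2))


def aggregate_ranks(rankings, sigma=2.0):
    """Combine several best-first orderings into one consensus ordering.

    rankings is a list of lists of feature names, one list per filter method.
    """
    scores = {}
    for ordering in rankings:
        for position, name in enumerate(ordering, start=1):
            scores[name] = scores.get(name, 0.0) + gaussian_membership(position, sigma)
    # Highest total membership first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Toy orderings (hypothetical) standing in for Relief, mRMR, and FC outputs.
relief_order = ["g1", "g2", "g3"]
mrmr_order = ["g2", "g1", "g3"]
fc_order = ["g1", "g3", "g2"]
consensus = aggregate_ranks([relief_order, mrmr_order, fc_order])
```

The consensus list would then be truncated to a candidate pool and passed to the wrapper phase; the particle swarm search and SVM evaluation are outside the scope of this sketch.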
|
40
|
Das S, Rai SN. Statistical Approach for Biologically Relevant Gene Selection from High-Throughput Gene Expression Data. Entropy (Basel, Switzerland) 2020; 22:E1205. [PMID: 33286973 PMCID: PMC7712650 DOI: 10.3390/e22111205] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/08/2020] [Revised: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 12/16/2022]
Abstract
Selection of biologically relevant genes from high-dimensional expression data is a key research problem in gene expression genomics. Most of the available gene selection methods are based on either a relevancy or a redundancy measure, and are usually judged through post-selection classification accuracy. In these methods, genes are ranked on a single high-dimensional expression dataset, which leads to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach that combines a support vector machine with Maximum Relevance and Minimum Redundancy under a sound statistical setup for the selection of biologically relevant genes. Here, the genes were selected through statistical significance values computed using a nonparametric test statistic under a bootstrap-based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach against nine existing competitive methods was carried out on six different real crop gene expression datasets. This performance analysis was carried out under three comparison settings, i.e., subject classification and biologically relevant criteria based on quantitative trait loci and gene ontology. Our analytical results showed that the proposed approach selects genes that are more biologically relevant than those selected by the existing methods. Moreover, the proposed approach was also found to be better with respect to the competitive existing methods. The proposed statistical approach provides a framework for combining filter and wrapper methods of gene selection.
Affiliation(s)
- Samarendra Das
- Division of Statistical Genetics, Indian Council of Agricultural Research (ICAR)-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India;
- Netaji Subhas-Indian Council of Agricultural Research (ICAR) International Fellow, Indian Council of Agricultural Research, Krishi Bhawan, New Delhi 110001, India
- Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40292, USA
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Shesh N. Rai
- Biostatistics and Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY 40292, USA
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Alcohol Research Center, University of Louisville, Louisville, KY 40292, USA
- Department of Hepatobiology and Toxicology, University of Louisville, Louisville, KY 40292, USA
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40292, USA
- Wendell Cherry Chair in Clinical Trial Research, University of Louisville, Louisville, KY 40292, USA
|
41
|
A mixed integer linear programming support vector machine for cost-effective feature selection. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106145] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022]
|
42
|
Wei G, Zhao J, Feng Y, He A, Yu J. A novel hybrid feature selection method based on dynamic feature importance. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106337] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 01/06/2023]
|
43
|
Sparse evolutionary deep learning with over one million artificial neurons on commodity hardware. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05136-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
Abstract
Artificial neural networks (ANNs) have emerged as hot topics in the research community. Despite the success of ANNs, it is challenging to train and deploy modern ANNs on commodity hardware due to the ever-increasing model size and the unprecedented growth in the data volumes. Particularly for microarray data, the very high dimensionality and the small number of samples make it difficult for machine learning techniques to handle. Furthermore, specialized hardware such as graphics processing unit (GPU) is expensive. Sparse neural networks are the leading approaches to address these challenges. However, off-the-shelf sparsity-inducing techniques either operate from a pretrained model or enforce the sparse structure via binary masks. The training efficiency of sparse neural networks cannot be obtained practically. In this paper, we introduce a technique allowing us to train truly sparse neural networks with fixed parameter count throughout training. Our experimental results demonstrate that our method can be applied directly to handle high-dimensional data, while achieving higher accuracy than the traditional two-phase approaches. Moreover, we have been able to create truly sparse multilayer perceptron models with over one million neurons and to train them on a typical laptop without GPU (https://github.com/dcmocanu/sparse-evolutionary-artificial-neural-networks/tree/master/SET-MLP-Sparse-Python-Data-Structures), this being way beyond what is possible with any state-of-the-art technique.
|
44
|
Granular Mining and Big Data Analytics: Rough Models and Challenges. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES INDIA SECTION A-PHYSICAL SCIENCES 2020. [DOI: 10.1007/s40010-018-0578-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
45
|
Hybrid-Recursive Feature Elimination for Efficient Feature Selection. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10093211] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]
Abstract
As datasets continue to increase in size, it is important to select the optimal feature subset from the original dataset to obtain the best performance in machine learning tasks. Highly dimensional datasets that have an excessive number of features can cause low performance in such tasks. Overfitting is a typical problem. In addition, datasets that are of high dimensionality can create shortages in space and require high computing power, and models fitted to such datasets can produce low classification accuracies. Thus, it is necessary to select a representative subset of features by utilizing an efficient selection method. Many feature selection methods have been proposed, including recursive feature elimination. In this paper, a hybrid-recursive feature elimination method is presented which combines the feature-importance-based recursive feature elimination methods of the support vector machine, random forest, and generalized boosted regression algorithms. From the experiments, we confirm that the performance of the proposed method is superior to that of the three single recursive feature elimination methods.
|
46
|
Wang S, Cao Z, Li M, Yue Y. G-DipC: An Improved Feature Representation Method for Short Sequences to Predict the Type of Cargo in Cell-Penetrating Peptides. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:739-747. [PMID: 31352350 DOI: 10.1109/tcbb.2019.2930993] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/10/2023]
Abstract
Cell-penetrating peptides (CPPs) are functional short peptides with high carrying capacity. CPP sequences with targeting functions enable the highly efficient delivery of drugs to target cells. This paper focuses on predicting the cargo category of CPPs: a biocomputational model is constructed to efficiently distinguish the category of cargo carried by CPPs, acting as macromolecular carriers, among the seven known deliverable cargo categories. Based on dipeptide composition (DipC), an improved feature representation method, general dipeptide composition (G-DipC), is proposed for short peptide sequences; it can effectively increase the abundance of features represented. Linear discriminant analysis (LDA) is then applied to mine important low-dimensional features of G-DipC, and a predictive model is built with the XGBoost algorithm. Experimental results with five-fold cross-validation show that G-DipC improves accuracy by 25 and 5 percent compared with amino acid composition (AAC) and DipC, respectively. G-DipC is even found to be better than tripeptide composition (TipC). Thus, the proposed model provides a novel resource for the study of cell-penetrating peptides, and the improved dipeptide composition G-DipC can be widely adapted to determine the feature representation of other biological sequences.
|
47
|
Zhou L, Song X, Yu DJ, Sun J. Sequence-based Detection of DNA-binding Proteins using Multiple-view Features Allied with Feature Selection. Mol Inform 2020; 39:e2000006. [PMID: 32144887 DOI: 10.1002/minf.202000006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/13/2020] [Accepted: 03/05/2020] [Indexed: 12/12/2022]
Abstract
DNA-binding proteins play essential roles in many molecular functions and in gene regulation. Therefore, it is highly desirable to develop effective computational techniques for detecting DNA-binding proteins. In this paper, we propose a new method, iDBP-DEP, which performs DNA-binding prediction using discriminative features derived, with feature selection, from multi-view feature sources including evolutionary profile, dipeptide composition, and physicochemical properties. We evaluated iDBP-DEP on two benchmark datasets, i.e., PDB1075 and PDB594, using a rigorous jackknife test. Compared with the state-of-the-art sequence-based DNA-binding predictors, the proposed iDBP-DEP achieved 1.8 % and 3.0 % improvements in accuracy (Acc) and Matthews Correlation Coefficient (MCC), respectively, on the PDB1075 dataset, and 7.4 % and 14.8 % improvements in Acc and MCC, respectively, on PDB594. The independent validation test with PDB186 shows that the proposed method achieved the best performance on Acc (80.1 %) and MCC (0.684), which further demonstrates the robustness of iDBP-DEP for the detection of DNA-binding proteins. Datasets and codes used in this study are freely available at https://githup.com/Zll-codeside/iDBP-DEP.
Affiliation(s)
- Liling Zhou
- School of Internet of Things Engineering, Jiangnan University, Wuxi, China
- Xiaoning Song
- School of Internet of Things Engineering, Jiangnan University, Wuxi, China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Jun Sun
- School of Internet of Things Engineering, Jiangnan University, Wuxi, China
|
48
|
Han X, Li D, Liu P, Wang L. Feature selection by recursive binary gravitational search algorithm optimization for cancer classification. Soft comput 2020. [DOI: 10.1007/s00500-019-04203-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
|
49
|
Poduval M, Ghose A, Manchanda S, Bagaria V, Sinha A. Artificial Intelligence and Machine Learning: A New Disruptive Force in Orthopaedics. Indian J Orthop 2020; 54:109-122. [PMID: 32257027 PMCID: PMC7096590 DOI: 10.1007/s43465-019-00023-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 08/10/2019] [Accepted: 09/18/2019] [Indexed: 02/04/2023]
Abstract
Orthopaedics as a surgical discipline requires a combination of good clinical acumen, good surgical skill, reasonable physical strength and, most of all, a good understanding of technology. The last few decades have seen rapid adoption of new technologies into orthopaedic practice: power tools, new implants, CAD-CAM design, 3-D printing and additive manufacturing, to name just a few. The new disruption in orthopaedics at present is undoubtedly the advent of artificial intelligence and robotics. As these technologies take root and innovative applications continue to be incorporated into mainstream orthopaedics as we know it today, it is imperative to understand the basics of artificial intelligence and the work being done in the field today. This article takes the form of a loosely structured narrative review and introduces the reader to key concepts in the field of artificial intelligence, as well as some of the directions in its application in orthopaedics. Some of the recent work is summarised, and we conclude with our viewpoint on why we must consider artificial intelligence a disruptive, positive influence on orthopaedic surgery.
Affiliation(s)
- Murali Poduval
- Tata Consultancy Services, Unit 129/130, SDF V, SEEPZ, Andheri East, Mumbai, 400093 India
- Avik Ghose
- TCS Research and Innovation, Tata Consultancy Services, Kolkata, 700160 India
- Sanjeev Manchanda
- TCS Research and Innovation, Tata Consultancy Services, Unit 129/130, SEEPZ, Andheri East, Mumbai, 400096 India
- Aniruddha Sinha
- TCS Research and Innovation, Tata Consultancy Services, Kolkata, 700160 India
|
50
|
Zhang Y, Li A, He J, Wang M. A Novel MKL Method for GBM Prognosis Prediction by Integrating Histopathological Image and Multi-Omics Data. IEEE J Biomed Health Inform 2020; 24:171-179. [DOI: 10.1109/jbhi.2019.2898471] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]
|