Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Tsai CF, Sung YT. Ensemble feature selection in high dimension, low sample size datasets: Parallel and serial combination approaches. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106097] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]

Cited by Other Article(s)

Nejadshamsi S, Karami V, Ghourchian N, Armanfard N, Bergman H, Grad R, Wilchesky M, Khanassov V, Vedel I, Abbasgholizadeh Rahimi S. Development and Feasibility Study of HOPE Model for Prediction of Depression Among Older Adults Using Wi-Fi-based Motion Sensor Data: Machine Learning Study. JMIR Aging 2025;8:e67715. [PMID: 40053734 PMCID: PMC11914842 DOI: 10.2196/67715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2024] [Revised: 12/12/2024] [Accepted: 12/19/2024] [Indexed: 03/09/2025] Open

Abstract

BACKGROUND

Depression, characterized by persistent sadness and loss of interest in daily activities, greatly reduces quality of life. Early detection is vital for effective treatment and intervention. While many studies use wearable devices to classify depression based on physical activity, these often rely on intrusive methods. Additionally, most depression classification studies involve large participant groups and use single-stage classifiers without explainability.

OBJECTIVE

This study aims to assess the feasibility of classifying depression using nonintrusive Wi-Fi-based motion sensor data using a novel machine learning model on a limited number of participants. We also conduct an explainability analysis to interpret the model's predictions and identify key features associated with depression classification.

METHODS

In this study, we recruited adults aged 65 years and older through web-based and in-person methods, supported by a McGill University health care facility directory. Participants provided consent, and we collected 6 months of activity and sleep data via nonintrusive Wi-Fi-based sensors, along with Edmonton Frailty Scale and Geriatric Depression Scale data. For depression classification, we proposed a HOPE (Home-Based Older Adults' Depression Prediction) machine learning model with feature selection, dimensionality reduction, and classification stages, evaluating various model combinations using accuracy, sensitivity, precision, and F1-score. Shapley additive explanations and local interpretable model-agnostic explanations were used to explain the model's predictions.
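The four evaluation metrics named in this abstract (accuracy, sensitivity, precision, and F1-score) follow from the counts of a binary confusion matrix. The sketch below is only an illustration under stated assumptions: the function name `binary_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of predicted positives that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1-score: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, chosen only to show the call.
example = binary_metrics(tp=9, fp=1, fn=1, tn=9)
```

With very few participants, as in this study, each of these ratios rests on a small number of cases, so the metrics are best read together rather than individually.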

RESULTS

A total of 6 participants were enrolled in this study; however, 2 participants withdrew later due to internet connectivity issues. Among the 4 remaining participants, 3 participants were classified as not having depression, while 1 participant was identified as having depression. The most accurate classification model, which combined sequential forward selection for feature selection, principal component analysis for dimensionality reduction, and a decision tree for classification, achieved an accuracy of 87.5%, sensitivity of 90%, and precision of 88.3%, effectively distinguishing individuals with and those without depression. The explainability analysis revealed that the most influential features in depression classification, in order of importance, were "average sleep duration," "total number of sleep interruptions," "percentage of nights with sleep interruptions," "average duration of sleep interruptions," and "Edmonton Frailty Scale."

CONCLUSIONS

The findings from this preliminary study demonstrate the feasibility of using Wi-Fi-based motion sensors for depression classification and highlight the effectiveness of our proposed HOPE machine learning model, even with a small sample size. These results suggest the potential for further research with a larger cohort for more comprehensive validation. Additionally, the nonintrusive data collection method and model architecture proposed in this study offer promising applications in remote health monitoring, particularly for older adults who may face challenges in using wearable devices. Furthermore, the importance of sleep patterns identified in our explainability analysis aligns with findings from previous research, emphasizing the need for more in-depth studies on the role of sleep in mental health, as suggested in the explainable machine learning study.

Affiliation(s)

Shayan Nejadshamsi Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
Vania Karami Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
Negar Ghourchian Aerial Technologies, Montreal, QC, Canada
Narges Armanfard Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada Department of Electrical and Computer Engineering, Faculty of Engineering, McGill University, Montreal, QC, Canada
Howard Bergman Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
Roland Grad Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
Machelle Wilchesky Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada Donald Berman Maimonides Centre for Research in Aging, Montreal, QC, Canada
Vladimir Khanassov Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
Isabelle Vedel Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada
Samira Abbasgholizadeh Rahimi Mila-Quebec Artificial Intelligence Institute, Montreal, QC, Canada Family Medicine Department, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada Faculty of Dental Medicine and Oral Health Sciences, McGill University, Montreal, Canada

Ghavidel A, Pazos P. Machine learning (ML) techniques to predict breast cancer in imbalanced datasets: a systematic review. J Cancer Surviv 2025;19:270-294. [PMID: 37749361 DOI: 10.1007/s11764-023-01465-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Accepted: 09/09/2023] [Indexed: 09/27/2023]

Gómez-Martínez V, Chushig-Muzo D, Veierød MB, Granja C, Soguero-Ruiz C. Ensemble feature selection and tabular data augmentation with generative adversarial networks to enhance cutaneous melanoma identification and interpretability. BioData Min 2024;17:46. [PMID: 39478549 PMCID: PMC11526724 DOI: 10.1186/s13040-024-00397-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2024] [Accepted: 10/09/2024] [Indexed: 11/02/2024] Open

Abstract

BACKGROUND

Cutaneous melanoma is the most aggressive form of skin cancer, responsible for most skin cancer-related deaths. Recent advances in artificial intelligence, together with the availability of public dermoscopy image datasets, have made it possible to assist dermatologists in melanoma identification. While image feature extraction holds potential for melanoma detection, it often leads to high-dimensional data. Furthermore, most image datasets present the class imbalance problem, where a few classes have numerous samples, whereas others are under-represented.

METHODS

In this paper, we propose to combine ensemble feature selection (FS) methods and data augmentation with conditional tabular generative adversarial networks (CTGAN) to enhance melanoma identification in imbalanced datasets. We employed dermoscopy images from two public datasets, PH2 and Derm7pt, which contain melanoma and not-melanoma lesions. To capture intrinsic information from skin lesions, we conducted two feature extraction (FE) approaches, namely handcrafted and embedding features. For the former, color, geometric, and first-, second-, and higher-order texture features were extracted, whereas for the latter, embeddings were obtained using ResNet-based models. To alleviate the high dimensionality resulting from FE, ensemble FS with filter methods was used and evaluated. For data augmentation, we conducted a progressive analysis of the imbalance ratio (IR), related to the number of synthetic samples created, and evaluated the impact on the predictive results. To gain interpretability on predictive models, we used SHAP, bootstrap resampling statistical tests, and UMAP visualizations.
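The link between the imbalance ratio and the number of synthetic samples can be shown with a small calculation. The abstract does not define IR, so this sketch assumes IR = minority count / majority count (so that IR = 1.0 means balanced classes); the function name `synthetic_samples_needed` and the class counts are hypothetical and are not taken from the paper.

```python
import math


def synthetic_samples_needed(n_minority, n_majority, target_ir):
    """Return how many synthetic minority samples reach a target imbalance ratio.

    Assumes IR = n_minority / n_majority, with 0 < target_ir <= 1.
    This definition is an assumption made for illustration.
    """
    if not 0 < target_ir <= 1:
        raise ValueError("target_ir must be in (0, 1]")
    # Minority size required to reach the target ratio. The small epsilon
    # guards against floating-point noise before rounding up.
    required_minority = math.ceil(target_ir * n_majority - 1e-9)
    # No samples are needed if the ratio is already met.
    return max(0, required_minority - n_minority)


# Hypothetical dataset: 30 minority and 120 majority samples (IR = 0.25).
for ir in (0.25, 0.5, 0.75, 1.0):
    print(ir, synthetic_samples_needed(30, 120, ir))
```

Stepping the target ratio upward in this way mirrors the idea of a progressive analysis, where the model is re-evaluated at each level of augmentation.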

RESULTS

The combination of ensemble FS, CTGAN, and linear models achieved the best predictive results, achieving AUCROC values of 87% (with support vector machine and IR=0.9) and 76% (with LASSO and IR=1.0) for the PH2 and Derm7pt, respectively. We also identified that melanoma lesions were mainly characterized by features related to color, while not-melanoma lesions were characterized by texture features.

CONCLUSIONS

Our results demonstrate the effectiveness of ensemble FS and synthetic data in the development of models that accurately identify melanoma. This research advances skin lesion analysis, contributing to both melanoma detection and the interpretation of main features for its identification.

Long D, Chan M, Han M, Kamdar Z, Ma RK, Tsai PY, Francisco AB, Barrow J, Shackelford DB, Yarchoan M, McBride MJ, Orre LM, Vacanti NM, Gujral TS, Sethupathy P. Proteo-metabolomics and patient tumor slice experiments point to amino acid centrality for rewired mitochondria in fibrolamellar carcinoma. Cell Rep Med 2024;5:101699. [PMID: 39208801 PMCID: PMC11528240 DOI: 10.1016/j.xcrm.2024.101699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2024] [Revised: 06/12/2024] [Accepted: 08/03/2024] [Indexed: 09/04/2024]

Vahed SZ, Khatibi SMH, Saadat YR, Emdadi M, Khodaei B, Alishani MM, Boostani F, Dizaj SM, Pirmoradi S. Introducing effective genes in lymph node metastasis of breast cancer patients using SHAP values based on the mRNA expression data. PLoS One 2024;19:e0308531. [PMID: 39150915 PMCID: PMC11329117 DOI: 10.1371/journal.pone.0308531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2024] [Accepted: 07/24/2024] [Indexed: 08/18/2024] Open

Abstract

OBJECTIVE

Breast cancer, a global concern predominantly impacting women, poses a significant threat when not identified early. While survival rates for breast cancer patients are typically favorable, the emergence of regional metastases markedly diminishes survival prospects. Detecting metastases and comprehending their molecular underpinnings are crucial for tailoring effective treatments and improving patient survival outcomes.

METHODS

Various artificial intelligence methods and techniques were employed in this study to achieve accurate outcomes. Initially, the data was organized and underwent hold-out cross-validation, data cleaning, and normalization. Subsequently, feature selection was conducted using ANOVA and binary Particle Swarm Optimization (PSO). During the analysis phase, the discriminative power of the selected features was evaluated using machine learning classification algorithms. Finally, the selected features were considered, and the SHAP algorithm was utilized to identify the most significant features for enhancing the decoding of dominant molecular mechanisms in lymph node metastases.

RESULTS

In this study, five main steps were followed for the analysis of mRNA expression data: reading, preprocessing, feature selection, classification, and SHAP algorithm. The RF classifier utilized the candidate mRNAs to differentiate between negative and positive categories with an accuracy of 61% and an AUC of 0.6. During the SHAP process, intriguing relationships between the selected mRNAs and positive/negative lymph node status were discovered. The results indicate that GDF5, BAHCC1, LCN2, FGF14-AS2, and IDH2 are among the top five most impactful mRNAs based on their SHAP values.

CONCLUSION

The prominent mRNAs identified, including GDF5, BAHCC1, LCN2, FGF14-AS2, and IDH2, are implicated in lymph node metastasis. This study holds promise for providing insight into key candidate genes that could significantly impact early detection of, and tailored therapeutic strategies for, lymph node metastasis in patients with breast cancer.

Jain S, Safo SE. DeepIDA-GRU: a deep learning pipeline for integrative discriminant analysis of cross-sectional and longitudinal multiview data with applications to inflammatory bowel disease classification. Brief Bioinform 2024;25:bbae339. [PMID: 39007595 PMCID: PMC11771283 DOI: 10.1093/bib/bbae339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2023] [Revised: 02/29/2024] [Accepted: 06/28/2024] [Indexed: 07/16/2024] Open

Tang S, Li Z. EEG complexity measures for detecting mind wandering during video-based learning. Sci Rep 2024;14:8209. [PMID: 38589498 PMCID: PMC11001605 DOI: 10.1038/s41598-024-58889-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2024] [Accepted: 04/04/2024] [Indexed: 04/10/2024] Open

Pirmoradi S, Hosseiniyan Khatibi SM, Zununi Vahed S, Homaei Rad H, Khamaneh AM, Akbarpour Z, Seyedrezazadeh E, Teshnehlab M, Chapman KR, Ansarin K. Unraveling the link between PTBP1 and severe asthma through machine learning and association rule mining method. Sci Rep 2023;13:15399. [PMID: 37717070 PMCID: PMC10505163 DOI: 10.1038/s41598-023-42581-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2023] [Accepted: 09/12/2023] [Indexed: 09/18/2023] Open

Wu X, Jia W. Multimodal deep learning as a next challenge in nutrition research: tailoring fermented dairy products based on cytidine diphosphate-diacylglycerol synthase-mediated lipid metabolism. Crit Rev Food Sci Nutr 2023;64:12272-12283. [PMID: 37615630 DOI: 10.1080/10408398.2023.2248633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/25/2023]

Jiang X, Hu Z, Wang S, Zhang Y. Deep Learning for Medical Image-Based Cancer Diagnosis. Cancers (Basel) 2023;15:3608. [PMID: 37509272 PMCID: PMC10377683 DOI: 10.3390/cancers15143608] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Revised: 07/10/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023] Open

Abstract

(1) Background: The application of deep learning technology to realize cancer diagnosis based on medical images is one of the research hotspots in the field of artificial intelligence and computer vision. Due to the rapid development of deep learning methods, cancer diagnosis requires very high accuracy and timeliness as well as the inherent particularity and complexity of medical imaging. A comprehensive review of relevant studies is necessary to help readers better understand the current research status and ideas. (2) Methods: Five radiological images, including X-ray, ultrasound (US), computed tomography (CT), magnetic resonance imaging (MRI), positron emission computed tomography (PET), and histopathological images, are reviewed in this paper. The basic architecture of deep learning and classical pretrained models are comprehensively reviewed. In particular, advanced neural networks emerging in recent years, including transfer learning, ensemble learning (EL), graph neural network, and vision transformer (ViT), are introduced. Five overfitting prevention methods are summarized: batch normalization, dropout, weight initialization, and data augmentation. The application of deep learning technology in medical image-based cancer analysis is sorted out. (3) Results: Deep learning has achieved great success in medical image-based cancer diagnosis, showing good results in image classification, image reconstruction, image detection, image segmentation, image registration, and image synthesis. However, the lack of high-quality labeled datasets limits the role of deep learning and faces challenges in rare cancer diagnosis, multi-modal image fusion, model explainability, and generalization. (4) Conclusions: There is a need for more public standard databases for cancer. The pre-training model based on deep neural networks has the potential to be improved, and special attention should be paid to the research of multimodal data fusion and supervised paradigm. Technologies such as ViT, ensemble learning, and few-shot learning will bring surprises to cancer diagnosis based on medical images.

Khan Mamun MMR, Elfouly T. Detection of Cardiovascular Disease from Clinical Parameters Using a One-Dimensional Convolutional Neural Network. Bioengineering (Basel) 2023;10:796. [PMID: 37508823 PMCID: PMC10376462 DOI: 10.3390/bioengineering10070796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Revised: 06/29/2023] [Accepted: 06/30/2023] [Indexed: 07/30/2023] Open

Lin Y, Ma J, Sun DW, Cheng JH, Wang Q. A pH-Responsive colourimetric sensor array based on machine learning for real-time monitoring of beef freshness. Food Control 2023. [DOI: 10.1016/j.foodcont.2023.109729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/12/2023]

Dimitsaki S, Gavriilidis GI, Dimitriadis VK, Natsiavas P. Benchmarking of Machine Learning classifiers on plasma proteomic for COVID-19 severity prediction through interpretable artificial intelligence. Artif Intell Med 2023;137:102490. [PMID: 36868685 PMCID: PMC9846931 DOI: 10.1016/j.artmed.2023.102490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2022] [Revised: 01/10/2023] [Accepted: 01/11/2023] [Indexed: 01/19/2023]

Abstract

The SARS-CoV-2 pandemic highlighted the need for software tools that could facilitate patient triage regarding potential disease severity or even death. In this article, an ensemble of Machine Learning (ML) algorithms is evaluated in terms of predicting the severity of a patient's condition using plasma proteomics and clinical data as input. An overview of AI-based technical developments to support COVID-19 patient management is presented, outlining the landscape of relevant technical developments. Based on this review, the use of an ensemble of ML algorithms that analyze clinical and biological data (i.e., plasma proteomics) of COVID-19 patients is designed and deployed to evaluate the potential use of AI for early COVID-19 patient triage. The proposed pipeline is evaluated using three publicly available datasets for training and testing. Three ML "tasks" are defined, and several algorithms are tested through a hyperparameter tuning method to identify the highest-performance models. As overfitting is one of the typical pitfalls for such approaches (mainly due to the size of the training/validation datasets), a variety of evaluation metrics are used to mitigate this risk. In the evaluation procedure, recall scores ranged from 0.6 to 0.74 and F1-score from 0.62 to 0.75. The best performance is observed via Multi-Layer Perceptron (MLP) and Support Vector Machines (SVM) algorithms. Additionally, input data (proteomics and clinical data) were ranked based on corresponding Shapley additive explanation (SHAP) values and evaluated for their prognostic capacity and immuno-biological credence. This "interpretable" approach revealed that our ML models could discern critical COVID-19 cases predominantly based on patients' age and plasma proteins related to B cell dysfunction, hyper-activation of inflammatory pathways like Toll-like receptors, and hypo-activation of developmental and immune pathways like SCF/c-Kit signaling. Finally, the computational workflow is corroborated in an independent dataset, which confirms MLP superiority along with the implication of the abovementioned predictive biological pathways. Regarding limitations of the presented ML pipeline, the datasets used in this study contain fewer than 1000 observations and a significant number of input features, hence constituting a high-dimensional low-sample (HDLS) dataset, which could be sensitive to overfitting. An advantage of the proposed pipeline is that it combines biological data (plasma proteomics) with clinical-phenotypic data. Thus, in principle, the presented approach could enable patient triage in a timely fashion if used on already trained models. However, larger datasets and further systematic validation are needed to confirm the potential clinical value of this approach. The code is available on GitHub: https://github.com/inab-certh/Predicting-COVID-19-severity-through-interpretable-AI-analysis-of-plasma-proteomics.

Re-ranking and TOPSIS-based ensemble feature selection with multi-stage aggregation for text categorization. Pattern Recognit Lett 2023. [DOI: 10.1016/j.patrec.2023.02.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/03/2023]

Wu Y, Zhu D, Wang X. Tree enhanced deep adaptive network for cancer prediction with high dimension low sample size microarray data. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]

Improved swarm-optimization-based filter-wrapper gene selection from microarray data for gene expression tumor classification. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01117-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Nematzadeh H, García-Nieto J, Navas-Delgado I, Aldana-Montes JF. Automatic frequency-based feature selection using discrete weighted evolution strategy. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Colombelli F, Kowalski TW, Recamonde-Mendoza M. A hybrid ensemble feature selection design for candidate biomarkers discovery from transcriptome profiles. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/15/2022]

Panels of mRNAs and miRNAs for decoding molecular mechanisms of Renal Cell Carcinoma (RCC) subtypes utilizing Artificial Intelligence approaches. Sci Rep 2022;12:16393. [PMID: 36180558 PMCID: PMC9525704 DOI: 10.1038/s41598-022-20783-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2022] [Accepted: 09/19/2022] [Indexed: 11/12/2022] Open

Abstract

Renal Cell Carcinoma (RCC) encompasses three histological subtypes, including clear cell RCC (KIRC), papillary RCC (KIRP), and chromophobe RCC (KICH), each of which has different clinical courses, genetic/epigenetic drivers, and therapeutic responses. This study aimed to identify the significant mRNA and microRNA panels involved in the pathogenesis of RCC subtypes. The mRNA and microRNA transcript profiles were obtained from The Cancer Genome Atlas (TCGA), which included 611 ccRCC patients, 321 pRCC patients, and 89 chRCC patients for mRNA data and 616 patients in the ccRCC subtype, 326 patients in the pRCC subtype, and 91 patients in the chRCC subtype for miRNA data. To identify mRNAs and miRNAs, feature selection based on filter and graph algorithms was applied. Then, a deep model was used to classify the subtypes of RCC. Finally, an association rule mining algorithm was used to disclose features with significant roles in triggering the molecular mechanisms that cause RCC subtypes. Panels of 77 mRNAs and 73 miRNAs could discriminate the KIRC, KIRP, and KICH subtypes from each other with 92% (F1-score ≥ 0.9, AUC ≥ 0.89) and 95% accuracy (F1-score ≥ 0.93, AUC ≥ 0.95), respectively. The Association Rule Mining analysis identified miR-28 (repeat count = 2642) and CSN7A (repeat count = 5794), along with miR-125a (repeat count = 2591) and NMD3 (repeat count = 2306), as having the highest repeat counts in the KIRC and KIRP rules, respectively. This study found new panels of mRNAs and miRNAs to distinguish among RCC subtypes, which were able to provide new insights into the mechanisms underlying the initiation and progression of KIRC and KIRP. The proposed mRNA and miRNA panels have high potential to serve as biomarkers of RCC subtypes and should be examined in future clinical studies.

Network-based dimensionality reduction of high-dimensional, low-sample-size datasets. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Prasetiyowati MI, Maulidevi NU, Surendro K. The accuracy of Random Forest performance can be improved by conducting a feature selection with a balancing strategy. PeerJ Comput Sci 2022;8:e1041. [PMID: 35875646 PMCID: PMC9299283 DOI: 10.7717/peerj-cs.1041] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2021] [Accepted: 06/22/2022] [Indexed: 06/12/2023]

An ensemble framework for microarray data classification based on feature subspace partitioning. Comput Biol Med 2022;148:105820. [PMID: 35872409 DOI: 10.1016/j.compbiomed.2022.105820] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2022] [Revised: 06/05/2022] [Accepted: 07/03/2022] [Indexed: 12/14/2022]

Pudjihartono N, Fadason T, Kempa-Liehr AW, O'Sullivan JM. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. FRONTIERS IN BIOINFORMATICS 2022;2:927312. [PMID: 36304293 PMCID: PMC9580915 DOI: 10.3389/fbinf.2022.927312] [Citation(s) in RCA: 172] [Impact Index Per Article: 57.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2022] [Accepted: 06/03/2022] [Indexed: 01/14/2023] Open

An Effective Ensemble Automatic Feature Selection Method for Network Intrusion Detection. INFORMATION 2022. [DOI: 10.3390/info13070314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open

Aguilera A, Pezoa R, Rodríguez-Delherbe A. A novel ensemble feature selection method for pixel-level segmentation of HER2 overexpression. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00774-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Hoffmann Souza ML, da Costa CA, de Oliveira Ramos G, da Rosa Righi R. A feature identification method to explain anomalies in condition monitoring. COMPUT IND 2021. [DOI: 10.1016/j.compind.2021.103528] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Yang P, Huang H, Liu C. Feature selection revisited in the single-cell era. Genome Biol 2021;22:321. [PMID: 34847932 PMCID: PMC8638336 DOI: 10.1186/s13059-021-02544-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2021] [Accepted: 11/15/2021] [Indexed: 12/13/2022] Open

Ma B, Tang Q, Qin Y, Bashir MF. Policyholder cluster divergence based differential premium in diabetes insurance. MANAGERIAL AND DECISION ECONOMICS 2021;42:1793-1807. [DOI: 10.1002/mde.3345] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Accepted: 04/04/2021] [Indexed: 01/04/2025]

Jiang Z, Zhang Y, Wang J. A multi-surrogate-assisted dual-layer ensemble feature selection algorithm. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Chen X, Wang Q, Zhuang S. Ensemble dimension reduction based on spectral disturbance for subspace clustering. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107182] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Feature selection via max-independent ratio and min-redundant ratio based on adaptive weighted kernel density estimation. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.03.049] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Wang Z, Tsai CF, Lin WC. Data cleaning issues in class imbalanced datasets: instance selection and missing values imputation for one-class classifiers. DATA TECHNOLOGIES AND APPLICATIONS 2021. [DOI: 10.1108/dta-01-2021-0027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Abstract

PURPOSE

Class imbalance learning, which exists in many domain problem datasets, is an important research topic in data mining and machine learning. One-class classification techniques, which aim to identify anomalies as the minority class from the normal data as the majority class, are one representative solution for class imbalanced datasets. Since one-class classifiers are trained using only normal data to create a decision boundary for later anomaly detection, the quality of the training set, i.e. the majority class, is one key factor that affects the performance of one-class classifiers.

DESIGN/METHODOLOGY/APPROACH

In this paper, we focus on two data cleaning or preprocessing methods to address class imbalanced datasets. The first method examines whether performing instance selection to remove some noisy data from the majority class can improve the performance of one-class classifiers. The second method combines instance selection and missing value imputation, where the latter is used to handle incomplete datasets that contain missing values.

FINDINGS

The experimental results are based on 44 class imbalanced datasets; three instance selection algorithms, including IB3, DROP3 and the GA, the CART decision tree for missing value imputation, and three one-class classifiers, which include OCSVM, IFOREST and LOF, show that if the instance selection algorithm is carefully chosen, performing this step could improve the quality of the training data, which makes one-class classifiers outperform the baselines without instance selection. Moreover, when class imbalanced datasets contain some missing values, combining missing value imputation and instance selection, regardless of which step is first performed, can maintain similar data quality as datasets without missing values.

ORIGINALITY/VALUE

The novelty of this paper is to investigate the effect of performing instance selection on the performance of one-class classifiers, which has never been done before. Moreover, this study is the first attempt to consider the scenario of missing values that exist in the training set for training one-class classifiers. In this case, performing missing value imputation and instance selection with different orders are compared.

Qu Y, Wang P, Liu B, Song C, Wang D, Yang H, Zhang Z, Chen P, Kang X, Du K, Yao H, Zhou B, Han T, Zuo N, Han Y, Lu J, Yu C, Zhang X, Jiang T, Zhou Y, Liu Y. AI4AD: Artificial intelligence analysis for Alzheimer's disease classification based on a multisite DTI database. BRAIN DISORDERS 2021. [DOI: 10.1016/j.dscb.2021.100005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open