1
Zhou X, Chen Y, Heidari AA, Chen H, Chen X. Rough hypervolume-driven feature selection with groupwise intelligent sampling for detecting clinical characterization of lupus nephritis. Artif Intell Med 2025; 160:103042. [PMID: 39673961 DOI: 10.1016/j.artmed.2024.103042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 09/06/2024] [Accepted: 11/23/2024] [Indexed: 12/16/2024]
Abstract
Systemic lupus erythematosus (SLE) is an autoimmune inflammatory disease. Lupus nephritis (LN) is a major risk factor for morbidity and mortality in SLE. Proliferative and pure membranous LN have different prognoses and may require different treatments. This study proposes a binary rough hypervolume-driven spherical evolution algorithm with groupwise intelligent sampling (bRGSE). The efficient dimensionality reduction capability of bRGSE is verified across twelve public datasets, with feature dimensions ranging from seven hundred to fifty thousand. The experimental results indicate that bRGSE performs better than seven high-performing alternatives. Then, bRGSE was combined with adaptive boosting (AdaBoost) to form a new model (bRGSE_AdaBoost), which analyzed clinical records collected from 110 patients with LN. Experimental results show that the proposed bRGSE_AdaBoost can identify the most critical indicators, including urine latent blood, white blood cells, endogenous creatinine clearing rate, and age. These indicators may help differentiate between proliferative LN and membranous LN. The proposed bRGSE algorithm is an efficient dimensionality reduction method. The developed bRGSE_AdaBoost model, a computer-aided model, achieved an accuracy of 96.687% and is expected to provide early warning for the treatment and diagnosis of LN.
Affiliation(s)
- Xinsen Zhou: Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Yi Chen: Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Ali Asghar Heidari: School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Huiling Chen: Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Xiaowei Chen: Department of Rheumatology and Immunology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China
2
Jiang L, Jia L, Wang Y, Wu Y, Yue J. Adap-BDCM: Adaptive Bilinear Dynamic Cascade Model for Classification Tasks on CNV Datasets. Interdiscip Sci 2024; 16:1019-1037. [PMID: 38758306 DOI: 10.1007/s12539-024-00635-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 04/18/2024] [Accepted: 04/23/2024] [Indexed: 05/18/2024]
Abstract
Copy number variation (CNV) is an essential genetic driving factor of cancer formation and progression, making intelligent classification based on CNV feasible. However, there are a few challenges in the current machine learning and deep learning methods, such as the design of base classifier combination schemes in ensemble methods and the selection of layers of neural networks, which often result in low accuracy. Therefore, an adaptive bilinear dynamic cascade model (Adap-BDCM) is developed to further enhance the accuracy and applicability of these methods for intelligent classification on CNV datasets. In this model, a feature selection module is introduced to mitigate the interference of redundant information, and a bilinear model based on the gated attention mechanism is proposed to extract more beneficial deep fusion features. Furthermore, an adaptive base classifier selection scheme is designed to overcome the difficulty of manually designing base classifier combinations and enhance the applicability of the model. Lastly, a novel feature fusion scheme with an attribute recall submodule is constructed, effectively avoiding getting stuck in local solutions and missing some valuable information. Numerous experiments have demonstrated that our Adap-BDCM model exhibits optimal performance in cancer classification, stage prediction, and recurrence on CNV datasets. This study can assist physicians in making diagnoses faster and better.
Affiliation(s)
- Liancheng Jiang: College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Liye Jia: College of Computer Science and Technology, Taiyuan Normal University, Taiyuan, 030619, China
- Yizhen Wang: College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Yongfei Wu: College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
- Junhong Yue: College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, 030600, China
3
Longkumer I, Mazumder DH. A novel parallel feature rank aggregation algorithm for gene selection applied to microarray data classification. Comput Biol Chem 2024; 112:108182. [PMID: 39197395 DOI: 10.1016/j.compbiolchem.2024.108182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 07/07/2024] [Accepted: 08/22/2024] [Indexed: 09/01/2024]
Abstract
Microarray data often comprises numerous genes, yet not all genes are relevant for predicting cancer. Feature selection becomes a crucial step to reduce the high dimensionality in these kinds of data. While no single feature selection method consistently outperforms others across diverse domains, the combination of multiple feature selectors or rankers tends to produce more effective results compared to relying on a single ranker alone. However, this approach can be computationally expensive, particularly when handling a large quantity of features. Hence, this paper presents a parallel feature rank aggregation that uses the Borda count as the rank aggregator. The concept of vertically partitioning the data along the feature space was adapted to ease the parallel execution of the aggregation task. Features were selected based on the final aggregated rank list, and their classification performances were evaluated. The model's execution time was also observed across multiple worker nodes of the cluster. The experiment was conducted on six benchmark microarray datasets. The results show the capability of the proposed distributed framework compared to the sequential version in all the cases. They also illustrate the improved accuracy of the proposed method and its ability to select a minimal number of genes.
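The Borda count named in this abstract can be illustrated with a minimal Python sketch. The function name `borda_aggregate`, the gene labels, and the three toy rankings are hypothetical and not taken from the paper; the paper's vertical partitioning and cluster execution are not reproduced here.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate ranked feature lists (best feature first) with the Borda count.

    In a list of n features, the feature at position p (0-based) earns
    n - 1 - p points. Points are summed over all rankers.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - 1 - position
    # Highest total first; ties are broken by feature name for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical output of three rankers over four genes.
rankers = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
aggregated = borda_aggregate(rankers)
top_k = aggregated[:2]  # keep the two best-ranked genes
print(aggregated, top_k)
```

Because each ranker's points depend only on its own list, the per-ranker scoring could in principle be computed independently and summed afterwards, which is one reason this aggregator suits parallel execution.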
Affiliation(s)
- Imtisenla Longkumer: National Institute of Technology Nagaland, Chumukedima, Dimapur, Nagaland 797103, India
4
A multiple criteria ensemble pruning method for binary classification based on D-S theory of evidence. Int J Mach Learn Cyb 2023. [DOI: 10.1007/s13042-022-01690-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
5
Bareen MA, Joshi S, Sahu JK, Prakash S, Bhandari B. Correlating process parameters and print accuracy of 3D-printable heat acid coagulated milk semisolids and polyol matrix: implications for testing methods. Food Res Int 2023; 167:112661. [PMID: 37087248 DOI: 10.1016/j.foodres.2023.112661] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2022] [Revised: 02/01/2023] [Accepted: 03/05/2023] [Indexed: 03/11/2023]
Abstract
The primary additive manufacturing (AM) technique for all high-viscosity food composites is extrusion-based. Therefore, understanding the impact of process parameters involved is crucial in fulfilling the demand characteristics of the printed constructs. In this regard, a correlation between print accuracy and critical 3D printing (3DP) process variables as a strategy for expediting the selection of 3D printable food inks has the most potential for success. This paper studies the effectiveness of using heat-acid coagulated milk semisolids and polyol matrix as 3D printable food ink for high-quality prints. The study focused on the critical material properties and conducted rheological characterization and particle size distribution analysis. The study obtained the effective range of printing parameters for various process variables using a mathematical model that employed finite element analysis (FEA) to define the flow field characteristics. The dimensional accuracy of the printed constructs under different process variables was determined by utilizing image processing methods. A multi-objective optimization was carried out using the desirability function method to obtain the key correlations between the process parameters for the best-printed construct.
6
Re-ranking and TOPSIS-based ensemble feature selection with multi-stage aggregation for text categorization. Pattern Recognit Lett 2023. [DOI: 10.1016/j.patrec.2023.02.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2023]
7
Przybyła-Kasperek M, Kusztal K. New Classification Method for Independent Data Sources Using Pawlak Conflict Model and Decision Trees. Entropy (Basel, Switzerland) 2022; 24:1604. [PMID: 36359694 PMCID: PMC9689716 DOI: 10.3390/e24111604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Revised: 10/31/2022] [Accepted: 11/01/2022] [Indexed: 06/16/2023]
Abstract
The research concerns data collected in independent sets, more specifically, in local decision tables. A possible approach to managing these data is to build local classifiers based on each table individually. In the literature, many approaches toward combining the final prediction results of independent classifiers can be found, but insufficient efforts have been made on the study of tables' cooperation and coalitions' formation. The importance of such an approach was expected on two levels. First, the impact on the quality of classification: the ability to build combined classifiers for coalitions of tables should allow for the learning of more generalized concepts. In turn, this should have an impact on the quality of classification of new objects. Second, combining tables into coalitions will result in reduced computational complexity, since a reduced number of classifiers will be built. The paper proposes a new method for creating coalitions of local tables and generating an aggregated classifier for each coalition. Coalitions are generated by determining certain characteristics of attribute values occurring in local tables and applying the Pawlak conflict analysis model. In the study, the classification and regression trees with Gini index are built based on the aggregated table for one coalition. The system bears a hierarchical structure, as in the next stage the decisions generated by the classifiers for coalitions are aggregated using majority voting. The classification quality of the proposed system was compared with an approach that does not use local data cooperation and coalition creation. The structure of the system is parallel and decision trees are built independently for local tables. In the paper, it was shown that the proposed approach provides a significant improvement in classification quality and execution time.
The Wilcoxon test confirmed that differences in accuracy rate of the results obtained for the proposed method and results obtained without coalitions are significant, with a p level = 0.005. The average accuracy rate values obtained for the proposed approach and the approach without coalitions are, respectively, 0.847 and 0.812, so the difference is quite large. Moreover, the algorithm implementing the proposed approach performed up to 21 times faster than the algorithm implementing the approach without using coalitions.
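The final aggregation stage described in this abstract, majority voting over the decisions of the coalition-level classifiers, might look like the following Python sketch. The function name `majority_vote` and the example labels are hypothetical; coalition formation with the Pawlak conflict model and the training of the decision trees are not shown.

```python
from collections import Counter


def majority_vote(decisions):
    """Return the label(s) chosen most often by the coalition classifiers.

    A list is returned so that ties stay visible to the caller instead of
    being resolved silently; how ties are handled in the paper is not stated
    in the abstract.
    """
    counts = Counter(decisions)
    top = max(counts.values())
    return sorted(label for label, count in counts.items() if count == top)


# Hypothetical decisions from three coalition classifiers for one test object.
clear_winner = majority_vote(["sick", "healthy", "sick"])
tie = majority_vote(["sick", "healthy"])
print(clear_winner, tie)
```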
8
Li X, Luo C. Neighborhood rough cognitive networks. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
9
Liu K, Li T, Yang X, Yang X, Liu D. Neighborhood rough set based ensemble feature selection with cross-class sample granulation. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
10
Matharaarachchi S, Domaratzki M, Muthukumarana S. Minimizing features while maintaining performance in data classification problems. PeerJ Comput Sci 2022; 8:e1081. [PMID: 36262135 PMCID: PMC9575878 DOI: 10.7717/peerj-cs.1081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Accepted: 08/10/2022] [Indexed: 06/16/2023]
Abstract
High dimensional classification problems have gained increasing attention in machine learning, and feature selection has become essential in executing machine learning algorithms. In general, most feature selection methods compare the scores of several feature subsets and select the one that gives the maximum score. There may be other selections of a lower number of features with a lower score, yet the difference is negligible. This article proposes and applies an extended version of such feature selection methods, which selects a smaller feature subset with similar performance to the original subset under a pre-defined threshold. It further validates the suggested extended version of the Principal Component Loading Feature Selection (PCLFS-ext) results by simulating data for several practical scenarios with different numbers of features and different imbalance rates on several classification methods. Our simulated results show that the proposed method outperforms the original PCLFS and existing Recursive Feature Elimination (RFE) by giving reasonable feature reduction on various data sets, which is important in some applications.
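The core idea in this abstract, accepting a smaller feature subset whose score is within a pre-defined threshold of the best score, can be sketched in a few lines of Python. This is only an illustration of that selection rule: the function name, the subset sizes, and the scores are hypothetical, and the PCLFS ranking step itself is not implemented.

```python
def smallest_subset_within_threshold(scores_by_size, threshold):
    """Pick the smallest subset size whose score is close enough to the best.

    scores_by_size maps a subset size (number of features) to the score
    obtained with that subset. A size is eligible when its score is at most
    `threshold` below the maximum score.
    """
    best = max(scores_by_size.values())
    eligible = [size for size, score in scores_by_size.items()
                if best - score <= threshold]
    return min(eligible)


# Hypothetical validation scores for four candidate subset sizes.
scores = {5: 0.90, 10: 0.93, 20: 0.935, 40: 0.94}
chosen = smallest_subset_within_threshold(scores, threshold=0.02)
print(chosen)  # 10 features instead of the 40 that give the maximum score
```

With a threshold of 0, the rule reduces to the usual behaviour of picking a subset with the maximum score.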
Affiliation(s)
- Mike Domaratzki: Computer Science, University of Western Ontario, London, Ontario, Canada
11
Ensemble feature selection for multi‐label text classification: An intelligent order statistics approach. Int J Intell Syst 2022. [DOI: 10.1002/int.23044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
12
Bayati H, Dowlatshahi MB, Hashemi A. MSSL: a memetic-based sparse subspace learning algorithm for multi-label classification. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01616-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
13
14
Using Feature Selection with Machine Learning for Generation of Insurance Insights. Applied Sciences-Basel 2022. [DOI: 10.3390/app12063209] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 02/06/2023]
Abstract
Insurance is a data-rich sector, hosting large volumes of customer data that is analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets by their nature, however, are often of poor quality with noisy subsets of data (or features). Choosing the right features of data is a significant pre-processing step in the creation of machine learning models. The inclusion of irrelevant and redundant features has been demonstrated to affect the performance of learning models. In this article, we propose a framework for improving predictive machine learning techniques in the insurance sector via the selection of relevant features. The experimental results, based on five publicly available real insurance datasets, show the importance of applying feature selection for the removal of noisy features before performing machine learning techniques, to allow the algorithm to focus on influential features. An additional business benefit is the revelation of the most and least important features in the datasets. These insights can prove useful for decision making and strategy development in areas/business problems that are not limited to the direct target of the downstream algorithms. In our experiments, machine learning techniques based on a set of selected features suggested by feature selection algorithms outperformed the full feature set for a set of real insurance datasets. Specifically, 20% and 50% of features in our five datasets had improved downstream clustering and classification performance when compared to whole datasets. This indicates the potential for feature selection in the insurance sector to both improve model performance and to highlight influential features for business insights.
15
Genetic Programming-Based Feature Selection for Emotion Classification Using EEG Signal. Journal of Healthcare Engineering 2022; 2022:8362091. [PMID: 35299691 PMCID: PMC8923795 DOI: 10.1155/2022/8362091] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/01/2022] [Accepted: 01/25/2022] [Indexed: 11/18/2022]
Abstract
COVID-19 has resulted in one of the most significant worldwide lockdowns, affecting human mental health. Therefore, emotion recognition is becoming one of the essential research areas among researchers worldwide. Early diagnosis and efficacious treatment of negative emotions are the only way to save people from mental health problems. Genetic programming, a very important research area of artificial intelligence, proves its potential in almost every field. Therefore, in this study, a genetic programming-based feature selection (FSGP) technique is proposed. A fourteen-channel EEG device gives 70 features for the input brain signal; with the help of GP, all the irrelevant and redundant features are separated, and 32 relevant features are selected. The proposed model achieves a classification accuracy of 85%, which outperforms prior works.
16
Evaluation of Feature Selection Methods on Psychosocial Education Data Using Additive Ratio Assessment. Electronics 2021. [DOI: 10.3390/electronics11010114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Artificial intelligence, particularly machine learning, is the fastest-growing research trend in educational fields. Machine learning shows an impressive performance in many prediction models, including psychosocial education. The capability of machine learning to discover hidden patterns in large datasets encourages researchers to invent data with high-dimensional features. In contrast, not all features are needed by machine learning, and in many cases, high-dimensional features decrease the performance of machine learning. The feature selection method is one of the appropriate approaches to reducing the features to ensure machine learning works efficiently. Various selection methods have been proposed, but research to determine the essential subset feature in psychosocial education has not been established thus far. This research investigated and proposed methods to determine the best feature selection method in the domain of psychosocial education. We used a multi-criteria decision system (MCDM) approach with Additive Ratio Assessment (ARAS) to rank seven feature selection methods. The proposed model evaluated the best feature selection method using nine criteria from the performance metrics provided by machine learning. The experimental results showed that the ARAS is promising for evaluating and recommending the best feature selection method for psychosocial education data using the teacher’s psychosocial risk levels dataset.
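To make the ranking step concrete, here is a minimal Python sketch of Additive Ratio Assessment as it is commonly described: add an ideal alternative, take reciprocals of cost criteria, normalize each criterion by its column sum, weight, sum, and divide by the ideal alternative's score. The function name, the two criteria, the weights, and the three toy alternatives are assumptions for illustration; the study itself ranks seven feature selection methods on nine criteria, and its exact weighting is not given in the abstract.

```python
def aras_utilities(matrix, weights, benefit):
    """Return the ARAS utility degree K_i for each alternative (row).

    matrix  : rows are alternatives, columns are criteria (positive values)
    weights : one weight per criterion, summing to 1
    benefit : True if larger is better for that criterion, False for cost
    """
    n = len(weights)
    columns = list(zip(*matrix))
    # Ideal alternative: best observed value on every criterion.
    ideal = [max(col) if benefit[j] else min(col)
             for j, col in enumerate(columns)]
    rows = [ideal] + [list(row) for row in matrix]
    # Cost criteria are inverted so that larger is always better.
    inverted = [[v if benefit[j] else 1.0 / v for j, v in enumerate(row)]
                for row in rows]
    col_sums = [sum(row[j] for row in inverted) for j in range(n)]
    # Weighted sum of the normalized values for each row.
    s = [sum(weights[j] * row[j] / col_sums[j] for j in range(n))
         for row in inverted]
    # Utility degree relative to the ideal alternative (row 0).
    return [s_i / s[0] for s_i in s[1:]]


# Hypothetical scores: columns are accuracy (benefit) and runtime in s (cost).
methods = [[0.90, 12.0], [0.85, 20.0], [0.80, 15.0]]
utilities = aras_utilities(methods, weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(utilities)), key=lambda i: -utilities[i])
print(utilities, ranking)
```

In this toy input the first method is best on both criteria, so it coincides with the ideal alternative and its utility degree is 1; the other methods score below 1 and are ranked by how close they come.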
17
Hashemi A, Bagher Dowlatshahi M, Nezamabadi-pour H. An efficient Pareto-based feature selection algorithm for multi-label classification. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.09.052] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/17/2023]