1
Houssein EH, Samee NA, Mahmoud NF, Hussain K. Dynamic Coati Optimization Algorithm for Biomedical Classification Tasks. Comput Biol Med 2023; 164:107237. [PMID: 37467535 DOI: 10.1016/j.compbiomed.2023.107237] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/12/2023] [Revised: 06/13/2023] [Accepted: 07/07/2023] [Indexed: 07/21/2023]
Abstract
Medical datasets often contain numerous irrelevant and redundant features across a collection of patient records, many of which are unnecessary for medical decision-making. Moreover, a large amount of data increases dimensionality and degrades the performance of machine learning classifiers. Numerous approaches have recently been proposed to address this issue, and the results indicate that feature selection can be a successful remedy. To meet the various needs of input patterns, medical diagnostic tasks typically involve learning a suitable classification model. The classification performance of the k-Nearest Neighbors (kNN) classifier is typically degraded by an abundance of irrelevant features among the input variables. To simplify the kNN classifier, feature selection is used to search for the essential attributes of the input variables. This paper presents a dynamic form of the Coati Optimization Algorithm (DCOA) as a feature selection technique in which each iteration of the optimization process involves the introduction of a different feature. We enhance the exploration and exploitation capability of DCOA by employing dynamic opposing candidate solutions. The most notable property of DCOA is that, in contrast to most popular metaheuristic algorithms, it does not require any preliminary parameter fine-tuning. The CEC'22 test suite and nine medical datasets with various dimension sizes were used to evaluate the performance of the original COA and the proposed dynamic version. The statistical results were validated using the Bonferroni-Dunn test and Kendall's W test and showed the superiority of DCOA over seven well-known metaheuristic algorithms, with an overall accuracy of 89.7%, a feature selection of 24%, a sensitivity of 93.35%, a specificity of 96.81%, and a precision of 93.90%.
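The abstract does not give the DCOA update rules, so the following is only a minimal sketch of two generic ingredients it mentions: an opposing candidate solution and a kNN-based wrapper fitness. The classic opposition formula `lower + upper - x`, the 0.5 threshold, the weight `alpha`, the leave-one-out evaluation, and the toy data are all assumptions chosen for illustration, not the authors' formulation.

```python
def opposite(x, lower, upper):
    """Classic opposition-based learning: reflect each coordinate within its bounds."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def to_mask(x, threshold=0.5):
    """Threshold a continuous position in [0, 1] into a binary feature mask (assumed rule)."""
    return [xi > threshold for xi in x]


def knn_loo_error(rows, labels, mask, k=1):
    """Leave-one-out error of a k-NN classifier restricted to the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0  # no feature selected: treat as the worst case
    errors = 0
    for i, row in enumerate(rows):
        # Squared Euclidean distance to every other sample, paired with its label.
        dists = sorted(
            (sum((row[j] - other[j]) ** 2 for j in cols), labels[m])
            for m, other in enumerate(rows)
            if m != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        errors += predicted != labels[i]
    return errors / len(rows)


def fitness(x, rows, labels, alpha=0.99):
    """Weighted sum of classification error and selected-feature ratio (lower is better)."""
    mask = to_mask(x)
    ratio = sum(mask) / len(mask)
    return alpha * knn_loo_error(rows, labels, mask) + (1 - alpha) * ratio


def keep_better(x, lower, upper, rows, labels):
    """Evaluate a candidate and its opposite, and keep whichever has the lower fitness."""
    x_opp = opposite(x, lower, upper)
    return min((x, x_opp), key=lambda c: fitness(c, rows, labels))


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
rows = [[0.0, 0.9], [0.1, 0.1], [0.2, 0.8], [1.0, 0.25], [1.1, 0.85], [1.2, 0.12]]
labels = [0, 0, 0, 1, 1, 1]
candidate = [0.9, 0.2]  # selects only feature 0
best = keep_better(candidate, [0.0, 0.0], [1.0, 1.0], rows, labels)
```

On this toy data the original candidate is kept, because its opposite selects only the noisy feature and misclassifies most samples.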
Affiliation(s)
- Essam H Houssein
  - Faculty of Computers and Information, Minia University, Minia, Egypt.
- Nagwan Abdel Samee
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
- Noha F Mahmoud
  - Rehabilitation Sciences Department, Health and Rehabilitation Sciences College, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
- Kashif Hussain
  - Department of Science and Engineering, Solent University, East Park Terrace, Southampton, SO14 0YN, United Kingdom.
2
Kaur S, Kumar Y, Koul A, Kumar Kamboj S. A Systematic Review on Metaheuristic Optimization Techniques for Feature Selections in Disease Diagnosis: Open Issues and Challenges. Arch Comput Methods Eng 2022; 30:1863-1895. [PMID: 36465712 PMCID: PMC9702927 DOI: 10.1007/s11831-022-09853-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/28/2022] [Accepted: 11/15/2022] [Indexed: 06/17/2023]
Abstract
Today's computing world needs techniques capable of solving a wide variety of problems, and metaheuristic algorithms are among those that can provide practical solutions. Due to their efficiency, metaheuristic algorithms are now applied to healthcare data to diagnose diseases practically and with better results than traditional methods. In this study, a systematic search was performed in which 173 papers from research databases such as Scopus, Web of Science, PubMed, PsycINFO, and others were considered impactful in diagnosing diseases using metaheuristic techniques. Ten metaheuristic techniques have been studied: spider monkey optimization, the shuffled frog leaping algorithm, the cuckoo search algorithm, ant lion optimization, lion optimization, moth flame optimization, the bat-inspired algorithm, the grey wolf algorithm, whale optimization, and dragonfly optimization, as used for selecting and optimizing features to predict heart disease, Alzheimer's disease, brain disorders, diabetes, chronic disease features, liver disease, COVID-19, etc. In addition, a framework is presented that describes the phases behind the execution of metaheuristic techniques to predict diseases. The study's primary goal is to present the contributions of researchers by demonstrating their methodology for predicting diseases using the metaheuristic techniques mentioned above. Their work is then compared and evaluated using accuracy, precision, F1 score, error rate, sensitivity, specificity, area under the curve, etc., to help researchers choose the right field and methods for predicting diseases in the future.
Affiliation(s)
- Sukhpreet Kaur
  - Department of Computer Science and Engineering, CGC Landran, Mohali, India
- Yogesh Kumar
  - Department of Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
- Apeksha Koul
  - Department of Computer Science and Engineering, Punjabi University, Patiala, India
3
Hybrid binary COOT algorithm with simulated annealing for feature selection in high-dimensional microarray data. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07780-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
4
Alyasseri ZAA, Alomari OA, Al-Betar MA, Makhadmeh SN, Doush IA, Awadallah MA, Abasi AK, Elnagar A. Recent advances of bat-inspired algorithm, its versions and applications. Neural Comput Appl 2022; 34:16387-16422. [PMID: 35971379 PMCID: PMC9366842 DOI: 10.1007/s00521-022-07662-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/17/2021] [Accepted: 07/18/2022] [Indexed: 11/25/2022]
Abstract
The bat-inspired algorithm (BA) is a robust swarm intelligence algorithm that has found success in many problem domains. The main idea of BA is inspired by the ecosystem of bats. This review paper scanned and analysed the state-of-the-art research conducted using BA from 2017 to 2021. BA has very impressive characteristics: it is easy to use, conceptually simple, flexible and adaptable, consistent, and sound and complete. It has strong operators that incorporate the natural selection principle through a survival-of-the-fittest rule within the intensification step, attracted by the local-best solution. Initially, the growth of recent solid works published in Scopus-indexed articles is summarized in terms of the number of BA-based journal articles published per year, citations, top authors working with BA, top institutions, and top countries. After that, the different versions of BA, such as binary, modified, hybridized, and multiobjective BA, are highlighted in line with the complex nature of optimization problems. The successful applications of BA are reviewed and summarized, such as electrical and power systems, wireless and network systems, environment and materials engineering, classification and clustering, structural and mechanical engineering, feature selection, image and signal processing, robotics, medical and healthcare, scheduling, and many others. A critical analysis of the limitations and shortcomings of BA is also given. Open-source BA code is provided to enrich the review. Finally, the BA review is concluded, and possible future directions for upcoming developments are suggested, such as utilizing BA in dynamic, robust, multiobjective, and large-scale optimization, as well as improving BA performance through structured populations, parameter tuning, memetic strategies, and selection mechanisms. The reader of this review will be able to determine the best domains and applications for BA and to justify their BA-related contributions.
Affiliation(s)
- Zaid Abdi Alkareem Alyasseri
  - ECE Department, Faculty of Engineering, University of Kufa, P.O. Box 21, Najaf, Iraq
  - College of Engineering, University of Warith Al-Anbiyaa, Karbala, Iraq
  - Information Research and Development Center (ITRDC), University of Kufa, Najaf, Iraq
- Mohammed Azmi Al-Betar
  - Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
  - Department of Information Technology, Al-Huson University College, Al-Balqa Applied University, Irbid, Jordan
- Sharif Naser Makhadmeh
  - Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
- Iyad Abu Doush
  - Department of Computing, College of Engineering and Applied Sciences, American University of Kuwait, Salmiya, Kuwait
  - Computer Science Department, Yarmouk University, Irbid, Jordan
- Mohammed A. Awadallah
  - Department of Computer Science, Al-Aqsa University, P.O. Box 4051, Gaza, Palestine
  - Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates
- Ammar Kamal Abasi
  - Machine Learning Department, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates
- Ashraf Elnagar
  - Department of Computer Science, University of Sharjah, Sharjah, United Arab Emirates
5
Azadifar S, Rostami M, Berahmand K, Moradi P, Oussalah M. Graph-based relevancy-redundancy gene selection method for cancer diagnosis. Comput Biol Med 2022; 147:105766. [PMID: 35779479 DOI: 10.1016/j.compbiomed.2022.105766] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/10/2022] [Revised: 06/12/2022] [Accepted: 06/18/2022] [Indexed: 11/26/2022]
Abstract
Nowadays, microarray data processing is one of the most important applications in molecular biology for cancer diagnosis. A major task in microarray data processing is gene selection, which aims to find a subset of genes with the least inner similarity and most relevant to the target class. Removing unnecessary, redundant, or noisy data reduces the data dimensionality. This research advocates a graph theoretic-based gene selection method for cancer diagnosis. Both unsupervised and supervised modes use well-known and successful social network approaches such as the maximum weighted clique criterion and edge centrality to rank genes. The suggested technique has two goals: (i) to maximize the relevancy of the chosen genes with the target class and (ii) to reduce their inner redundancy. A maximum weighted clique is chosen in a repetitive way in each iteration of this procedure. The appropriate genes are then chosen from among the existing features in this maximum clique using edge centrality and gene relevance. In the experiment, several datasets consisting of Colon, Leukemia, SRBCT, Prostate Tumor, and Lung Cancer, with different properties, are used to demonstrate the efficacy of the developed model. Our performance is compared to that of renowned filter-based gene selection approaches for cancer diagnosis whose results demonstrate a clear superiority.
Affiliation(s)
- Saeid Azadifar
  - Department of Computer Engineering, University of Khajeh Nasir Toosi, Tehran, Iran
- Mehrdad Rostami
  - Centre for Machine Vision and Signal Processing, University of Oulu, Oulu, Finland.
- Kamal Berahmand
  - School of Computer Science, Faculty of Science, Queensland University of Technology (QUT), Brisbane, Australia
- Parham Moradi
  - Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran
- Mourad Oussalah
  - Centre for Machine Vision and Signal Processing, University of Oulu, Oulu, Finland
  - Research Unit of Medical Imaging, Physics, and Technology, Faculty of Medicine, University of Oulu, Finland
6
An enhanced binary Rat Swarm Optimizer based on local-best concepts of PSO and collaborative crossover operators for feature selection. Comput Biol Med 2022; 147:105675. [PMID: 35687926 DOI: 10.1016/j.compbiomed.2022.105675] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/16/2022] [Revised: 05/24/2022] [Accepted: 05/26/2022] [Indexed: 11/22/2022]
Abstract
In this paper, an enhanced binary version of the Rat Swarm Optimizer (RSO) is proposed to deal with Feature Selection (FS) problems. FS is an important data reduction step in data mining which finds the most representative features from the entire data. Many FS-based swarm intelligence algorithms have been used to tackle FS. However, the door is still open for further investigations since no FS method gives cutting-edge results for all cases. In this paper, a recent swarm intelligence metaheuristic method called RSO which is inspired by the social and hunting behavior of a group of rats is enhanced and explored for FS problems. The binary enhanced RSO is built based on three successive modifications: i) an S-shape transfer function is used to develop binary RSO algorithms; ii) the local search paradigm of particle swarm optimization is used with the iterative loop of RSO to boost its local exploitation; iii) three crossover mechanisms are used and controlled by a switch probability to improve the diversity. Based on these enhancements, three versions of RSO are produced, referred to as Binary RSO (BRSO), Binary Enhanced RSO (BERSO), and Binary Enhanced RSO with Crossover operators (BERSOC). To assess the performance of these versions, a benchmark of 24 datasets from various domains is used. The proposed methods are assessed concerning the fitness value, number of selected features, classification accuracy, specificity, sensitivity, and computational time. The best performance is achieved by BERSOC followed by BERSO and then BRSO. These proposed versions are comparatively assessed against 25 well-regarded metaheuristic methods and five filter-based approaches. The obtained results underline their superiority by producing new best results for some datasets.
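The first modification above, an S-shaped transfer function, is a common way to turn a continuous metaheuristic into a binary one. The sketch below shows the usual sigmoid form as an illustration; the exact function used in the paper is not stated in the abstract, and the names `s_shape` and `binarize` are hypothetical.

```python
import math
import random


def s_shape(v):
    """Sigmoid transfer function mapping a real value to a probability in (0, 1).

    Written in a numerically stable form so large negative inputs do not overflow.
    """
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def binarize(position, rng):
    """Select feature j (bit 1) with probability s_shape(position[j])."""
    return [1 if rng.random() < s_shape(v) else 0 for v in position]


rng = random.Random(0)  # fixed seed so the example is reproducible
mask = binarize([50.0, -50.0, 0.0], rng)
# Strongly positive components are almost always selected, strongly negative
# ones almost never; a component at 0.0 is selected with probability 0.5.
```

The resulting bit vector can then be scored by any wrapper fitness, for example a classifier's error on the selected features.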
7
Alrefai N, Ibrahim O. Optimized feature selection method using particle swarm intelligence with ensemble learning for cancer classification based on microarray datasets. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07147-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
8
Issa M, Helmi AM, Elsheikh AH, Abd Elaziz M. A biological sub-sequences detection using integrated BA-PSO based on infection propagation mechanism: Case study COVID-19. Expert Syst Appl 2022; 189:116063. [PMID: 34690450 PMCID: PMC8527645 DOI: 10.1016/j.eswa.2021.116063] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/29/2021] [Revised: 10/09/2021] [Accepted: 10/09/2021] [Indexed: 05/11/2023]
Abstract
The longest common consecutive subsequences (LCCS) play a vital role in revealing the biological relationships between DNA/RNA sequences, especially newly discovered ones such as COVID-19. The fragmented local aligner technique (FLAT) is an accelerated version of the local pairwise sequence alignment algorithm based on meta-heuristic algorithms. The performance of FLAT needs to be enhanced, since the huge length of biological sequences leads to trapping in local optima. This paper introduces a modified version of FLAT that improves the performance of the BA algorithm by integrating it with the particle swarm optimization (PSO) algorithm through a novel infection mechanism. The proposed algorithm, named BPINF, finds the best-explored solution using BA operators; this solution can infect the agents during the exploitation phase, which then use PSO operators to move toward it instead of moving toward the best-exploited solution. Hence, moving the solutions toward the two best solutions increases the diversity of generated solutions and avoids trapping in local optima. The infection can be propagated through the agents, where each infected agent can transfer the infection to other non-infected agents, which enhances the diversification of generated solutions. FLAT using the proposed technique (BPINF) was validated by detecting LCCS between a set of real biological sequences of huge length, including COVID-19 and other well-known viruses. The performance of BPINF was compared to the enhanced versions of BA in the literature and the relevant studies of FLAT. It found the LCCS with the highest percentage (88%), which is better than other state-of-the-art methods.
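For reference, the quantity that FLAT and BPINF approximate can be computed exactly on short sequences. The sketch below is the textbook dynamic-programming solution for the longest common consecutive subsequence (longest common substring), not the FLAT or BPINF method; the function name is hypothetical. Its O(len(a) * len(b)) cost is what makes heuristic search attractive for very long genomes.

```python
def longest_common_consecutive(a, b):
    """Return one longest run of characters that appears contiguously in both a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: match length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended one position earlier in both strings.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_consecutive("GATTACA", "TTTACAG"))  # prints TTACA
```

Only two rows of the table are kept in memory, so space is linear in the length of the second sequence.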
Affiliation(s)
- Mohamed Issa
  - Computer and Systems Department, Faculty of Engineering, Zagazig University, Zagazig 44519, Egypt
- Ahmed M Helmi
  - Computer and Systems Department, Faculty of Engineering, Zagazig University, Zagazig 44519, Egypt
  - Engineering and Information Technology College, Buraydah Private Colleges, Buraydah 51418, Saudi Arabia
- Ammar H Elsheikh
  - Department of Production Engineering and Mechanical Design, Tanta University, Tanta 31527, Egypt
- Mohamed Abd Elaziz
  - Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt
  - Artificial Intelligence Research Center (AIRC), Ajman University, Ajman 346, United Arab Emirates
  - Faculty of Computer Science & Engineering, Galala University, Egypt
9
Abdulwahab HM, Ajitha S, Saif MAN. Feature selection techniques in the context of big data: taxonomy and analysis. Appl Intell 2022. [DOI: 10.1007/s10489-021-03118-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/24/2023]
10
Pashaei E, Pashaei E. An efficient binary chimp optimization algorithm for feature selection in biomedical data classification. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06775-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/15/2023]
11
Al-Dyani WZ, Ahmad FK, Kamaruddin SS. Improvements of bat algorithm for optimal feature selection: A systematic literature review. Intell Data Anal 2022. [DOI: 10.3233/ida-205455] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
The Bat Algorithm (BA) has been extensively applied as an optimal Feature Selection (FS) technique for solving a wide variety of optimization problems due to its impressive characteristics compared to other swarm intelligence methods. Nevertheless, BA still suffers from several problems, such as poor exploration, falling into local optima, and having many parameters that need to be controlled appropriately. Consequently, many researchers have proposed different techniques to handle such problems. However, there is a lack of systematic reviews of BA that could shed light on its variants. Several review papers have been reported in the literature, but such studies were neither systematic nor comprehensive enough. Most studies did not report specifically which components of BA were modified. The range of improvements made to BA varies, which often makes it difficult to accomplish any enhancement if this is not properly addressed. Given such limitations, this study aims to review and analyse the latest improvements and recent variants of BA for optimal feature selection. The study employed a standard systematic literature review method on four scientific databases, namely IEEE Xplore, ACM, Springer, and Science Direct. As a result, 147 research publications from the last ten years were collected, investigated, and summarized. Several critical and significant findings based on the reviewed literature are reported in this paper, which can be used as a guideline by scientists for further research.
Affiliation(s)
- Wafa Zubair Al-Dyani
  - School of Computing, College of Arts and Science, Universiti Utara Malaysia, Sintok Kedah, Malaysia
  - Department of Computer Science, College of Computing and Information Technology, Hadhramout University, Hadhramout, Yemen
- Farzana Kabir Ahmad
  - School of Computing, College of Arts and Science, Universiti Utara Malaysia, Sintok Kedah, Malaysia
- Siti Sakira Kamaruddin
  - School of Computing, College of Arts and Science, Universiti Utara Malaysia, Sintok Kedah, Malaysia
12
Cooperative Evolution of China's Excellent Innovative Research Groups from the Perspective of Innovation Ecosystem: Taking an "Environmental Biogeochemistry" Research Innovation Group as a Case Study. Int J Environ Res Public Health 2021; 18:12584. [PMID: 34886310 PMCID: PMC8656764 DOI: 10.3390/ijerph182312584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2021] [Revised: 11/26/2021] [Accepted: 11/27/2021] [Indexed: 11/17/2022]
Abstract
Research, understanding, and prediction of complex systems is an important starting point for human beings to tackle major problems and emergencies such as global warming and COVID-19. Research on innovation ecosystem is an important part of research on complex systems. With the rapid development of sophisticated industries, the rise of innovative countries, and the newly developed innovation theory, innovation ecosystem has become a new explanation and new paradigm for adapting to today's global innovation cooperation network and the scientific development of complex systems, which is also in line with China's concept of building an innovative country and promoting comprehensive innovation and international cooperation with scientific and technological innovation as the core. The Innovative Research Group at Peking University is the most representative scientific and technological innovation team in the frontier field of basic research in China. The characteristics of its organization mechanism and dynamic evolution connotation are consistent with the characteristics and evolution of innovation ecosystem. An excellent innovative research group is regarded as a small innovation ecosystem. We selected the "Environmental Biogeochemistry" Innovation Research Group at Peking University as a typical case in order to understand and analyze the evolution of cooperation among scientific and technological innovation teams, improve the healthy development as well as internal and external governance of this special small innovation ecosystem, promote the expansion of an innovation team cooperation network and the improvement of cooperation quality, promote the linkage supports of funding and management departments, and improve their scientific and technological governance abilities. Through scientometrics, visual analysis of knowledge maps, and an exploratory case study, we study the evolution process and development law of team cooperation. 
It is found that the main node authors of the cooperation network maintain strong cooperation frequency and centrality, and gradually strengthen with the expansion of the cooperation network and the evolution of time. Driven by the internal cooperative governance of the team and the external governance of the funding and management departments, this group has gradually formed a healthy, orderly, open, and cooperative special innovation ecosystem, which is conducive to the stability and sustainable development of the national innovation ecosystem and the global innovation ecosystem.
13
A Scoping Review of Ontologies Relevant to Design Strategies in Response to the UN Sustainable Development Goals (SDGs). Sustainability 2021. [DOI: 10.3390/su131810012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
Since the initiation of the 2030 Agenda for Sustainable Development in 2015, academia and industry have been taking action to seek how to address the Sustainable Development Goals (SDGs) via research, practice, and community engagement. Due to the UN SDGs comprising comprehensive domain-centric ontologies for reaching a consensus on their achievement, so far there has been a literature gap on how and what product design strategies can help achieve which of the SDGs. Inspired by the implication of creating a better world with design, this study conducted a scoping review to synthesize existing design strategies toward the implementation of the SDGs. More than 110 design strategies/methods were collected and synthesized as evidence to map onto the ontological domains of the SDGs. The results indicate that Goals 8, 9, 11, and 12 can be correspondingly addressed by the current body of design strategies, whereas a gap exists in the design strategies to address Goals 15, 16, and 17. Most of the corresponding strategies can be workable to Goals 3, 4, 6, and 7 to a certain extent and, in a broad sense, are in line with the contextual implications of Goals 1, 2, 5, 10, 13, and 14. This study provides a useful starting point for researchers to explore how design has been contributing to the sustainability goals. It also contributes to existing knowledge of the design discipline by providing methodological guidance for researchers and practitioners to conduct further research and practice on the UN SDGs.
14
Gene selection for microarray data classification based on Gray Wolf Optimizer enhanced with TRIZ-inspired operators. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107034] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 12/13/2022]
15
Pashaei E, Pashaei E. Gene selection using hybrid dragonfly black hole algorithm: A case study on RNA-seq COVID-19 data. Anal Biochem 2021; 627:114242. [PMID: 33974890 DOI: 10.1016/j.ab.2021.114242] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/04/2020] [Revised: 04/12/2021] [Accepted: 05/02/2021] [Indexed: 11/18/2022]
Abstract
This paper introduces a new hybrid approach (DBH) for solving the gene selection problem that incorporates the strengths of two existing metaheuristics: the binary dragonfly algorithm (BDF) and the binary black hole algorithm (BBHA). This hybridization aims to identify a limited and stable set of discriminative genes without sacrificing classification accuracy, whereas most current methods have encountered challenges in extracting disease-related information from a vast number of redundant genes. The proposed approach first applies the minimum redundancy maximum relevancy (MRMR) filter method to reduce the dimensionality of the feature space and then utilizes the suggested hybrid DBH algorithm to determine a smaller set of significant genes. The proposed approach was evaluated on eight benchmark gene expression datasets and then compared against the latest state-of-the-art techniques to demonstrate the algorithm's efficiency. The comparative study shows that the proposed approach achieves a significant improvement over existing methods in terms of classification accuracy and the number of selected genes. Moreover, the performance of the suggested method was examined on real RNA-Seq coronavirus-related gene expression data of asthmatic patients for selecting the most significant genes in order to improve the discriminative accuracy of angiotensin-converting enzyme 2 (ACE2). ACE2, as a coronavirus receptor, is a biomarker that helps to distinguish infected patients from uninfected ones in order to identify subgroups at risk for COVID-19. The results indicate that the suggested MRMR-DBH approach represents a very promising framework for finding a new combination of the most discriminative genes with high classification accuracy.
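The MRMR pre-filtering step can be illustrated with a small greedy selector. This is a minimal sketch assuming discretized features, mutual information as both the relevance and redundancy measure, and the difference form of the criterion (relevance minus mean redundancy); the paper's exact variant is not given in the abstract, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def mrmr(columns, target, k):
    """Greedily pick k feature indices maximizing relevance minus mean redundancy."""
    relevance = [mutual_information(col, target) for col in columns]
    # Start from the single most relevant feature (first index wins ties).
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(columns)):
        def score(j):
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        remaining = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Toy data: gene 1 duplicates gene 0, gene 2 carries complementary information.
target = [0, 0, 1, 1, 2, 2, 3, 3]
genes = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
print(mrmr(genes, target, 2))  # prints [0, 2]
```

The duplicate gene is skipped even though it is as relevant as the others, because its redundancy with the already selected gene cancels its relevance. The reduced set would then be handed to the metaheuristic search stage.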
Affiliation(s)
- Elnaz Pashaei
  - Department of Software Engineering, Istanbul Aydin University, Istanbul, Turkey.
- Elham Pashaei
  - Department of Computer Engineering, Istanbul Gelisim University, Istanbul, Turkey.
16
Lai CM, Huang HP. A gene selection algorithm using simplified swarm optimization with multi-filter ensemble technique. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106994] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/17/2023]
17
Abiodun EO, Alabdulatif A, Abiodun OI, Alawida M, Alabdulatif A, Alkhawaldeh RS. A systematic review of emerging feature selection optimization methods for optimal text classification: the present state and prospective opportunities. Neural Comput Appl 2021; 33:15091-15118. [PMID: 34404964 PMCID: PMC8361413 DOI: 10.1007/s00521-021-06406-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/07/2021] [Accepted: 07/31/2021] [Indexed: 02/07/2023]
Abstract
Specialized data preparation techniques, including data cleaning, outlier detection, missing value imputation, and feature selection (FS), among others, are required to get the most out of data and, consequently, to obtain the optimal performance of predictive models for classification tasks. FS is a vital and indispensable technique that enables the model to perform faster, eliminate noisy data, remove redundancy, reduce overfitting, improve precision, and increase generalization on testing data. While conventional FS techniques have been leveraged for classification tasks in the past few decades, they fail to optimally reduce the high dimensionality of the feature space of texts, thus breeding inefficient predictive models. Emerging technologies such as metaheuristic and hyper-heuristic optimization methods provide a new paradigm for FS due to their efficiency in improving classification accuracy, computational demands, and storage, as well as in solving complex optimization problems in less time. However, little is known about best practices for case-to-case usage of emerging FS methods. The literature remains crowded with both clear and unclear findings on leveraging effective methods, which, if not applied accurately, alter precision, real-world feasibility, and the predictive model's overall performance. This paper reviews the present state of FS with respect to metaheuristic and hyper-heuristic methods. Through a systematic literature review of over 200 articles, we set out the most recent findings and trends to enlighten analysts, practitioners, and researchers in the field of data analytics seeking clarity in understanding and implementing effective FS optimization methods for improved text classification tasks.
Affiliation(s)
- Esther Omolara Abiodun
- School of Computer Sciences, Universiti Sains Malaysia, George Town, Malaysia; Department of Computer Sciences, University of Abuja, Abuja, Nigeria
- Abdulatif Alabdulatif
- Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Oludare Isaac Abiodun
- School of Computer Sciences, Universiti Sains Malaysia, George Town, Malaysia; Department of Computer Sciences, University of Abuja, Abuja, Nigeria
- Moatsum Alawida
- School of Computer Sciences, Universiti Sains Malaysia, George Town, Malaysia; Department of Computer Sciences, Abu Dhabi University, Abu Dhabi, UAE
- Abdullah Alabdulatif
- Computer Department, College of Sciences and Arts, Qassim University, P.O. Box 53, Al-Rass, Saudi Arabia
- Rami S. Alkhawaldeh
- Department of Computer Information Systems, The University of Jordan, Aqaba, 77110 Jordan
|
18
|
Abu Khurmaa R, Aljarah I, Sharieh A. An intelligent feature selection approach based on moth flame optimization for medical diagnosis. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05483-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/21/2022]
|
19
|
World competitive contest-based artificial neural network: A new class-specific method for classification of clinical and biological datasets. Genomics 2020; 113:541-552. [PMID: 32991962 PMCID: PMC7521912 DOI: 10.1016/j.ygeno.2020.09.047] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/15/2020] [Revised: 09/05/2020] [Accepted: 09/22/2020] [Indexed: 12/26/2022]
Abstract
Many data mining methods have been proposed to generate computer-aided diagnostic systems, which may detect diseases in their early stages by categorizing the data into appropriate classes. Considering the importance of a suitable classifier, the present study introduces an efficient approach based on the World Competitive Contests (WCC) algorithm and a multi-layer perceptron artificial neural network (ANN). Unlike previously introduced methods, each of which develops a universal model for all classes of data, our proposed approach generates a single specific model for each individual class. The experimental results show that the proposed method (ANNWCC), which can be applied to both balanced and unbalanced datasets, achieves an average five-fold cross-validation accuracy of more than 76% (without feature selection) and 90% (with feature selection) on the 13 clinical and biological datasets. The findings also indicate that, under different conditions, our proposed method can produce better results than some state-of-the-art meta-heuristic algorithms and methods in terms of various statistical and classification measurements. To classify the clinical and biological data, a multi-layer ANN and the WCC algorithm were combined. It was shown that developing a specific model for each individual class of data may yield better results than creating a universal model for all of the existing data classes. In addition, some efficient algorithms proved to be essential for generating acceptable biological results, and the methods' performance was found to be enhanced by fuzzifying or normalizing the biological data.
- We combined multi-layer artificial neural networks and world competitive contests algorithms to classify biological datasets.
- The proposed method has been investigated on 13 clinical datasets with different properties.
- Efficient models may yield better classification models and health diagnostic systems.
- Feature selection methods can improve the performance of a model in separating case and control samples.
|
20
|
Integration of multi-objective PSO based feature selection and node centrality for medical datasets. Genomics 2020; 112:4370-4384. [PMID: 32717320 DOI: 10.1016/j.ygeno.2020.07.027] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 03/06/2020] [Revised: 06/22/2020] [Accepted: 07/14/2020] [Indexed: 01/19/2023]
Abstract
In the past decades, the rapid growth of computer and database technologies has led to the rapid growth of large-scale medical datasets. On the other hand, medical applications with high-dimensional datasets that require high speed and accuracy are rapidly increasing. Feature selection is a dimensionality reduction approach that can increase the accuracy of disease diagnosis and reduce its computational complexity. In this paper, a novel PSO-based multi-objective feature selection method is proposed. The proposed method consists of three main phases. In the first phase, the original features are represented as a graph. In the second phase, feature centralities are calculated for all nodes in the graph. In the third phase, an improved PSO-based search process is used for the final feature selection. The results on five medical datasets indicate that the proposed method outperforms previous related methods in terms of efficiency and effectiveness.
|
21
|
Shukla AK. Feature selection inspired by human intelligence for improving classification accuracy of cancer types. Comput Intell 2020. [DOI: 10.1111/coin.12341] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Affiliation(s)
- Alok Kumar Shukla
- Department of Computer Science & Engineering, G.L. Bajaj Institute of Technology and Management, Gr. Noida, India
|
22
|
Abasi AK, Khader AT, Al-Betar MA, Naim S, Alyasseri ZAA, Makhadmeh SN. A novel hybrid multi-verse optimizer with K-means for text documents clustering. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-04945-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/29/2022]
|