1
Ikotun AM, Habyarimana F, Ezugwu AE. Cluster validity indices for automatic clustering: A comprehensive review. Heliyon 2025; 11:e41953. [PMID: 39897868 PMCID: PMC11787482 DOI: 10.1016/j.heliyon.2025.e41953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 01/08/2025] [Accepted: 01/13/2025] [Indexed: 02/04/2025]
Abstract
A cluster validity index (CVI) is an integral part of clustering algorithms. It evaluates the inter-cluster separation and intra-cluster cohesion of candidate clusters to determine the quality of potential solutions. Several CVIs have been suggested for both classical clustering algorithms and automatic metaheuristic-based clustering algorithms. Different CVIs exhibit different characteristics, depending on the mathematical models they employ to compute the values of the various cluster attributes. Metaheuristic-based automatic clustering algorithms use a CVI as the fitness function in their optimization procedure to evaluate the quality of each candidate cluster solution. This study presents a systematic review of the CVIs used as fitness functions in metaheuristic-based automatic clustering algorithms. Identifying, reporting, and analysing the various CVIs is important for determining which ones give the best performance in a metaheuristic-based automatic clustering algorithm. The review also includes an experimental study of the performance of some common CVIs on synthetic datasets with varied characteristics, as well as on real-life datasets, using the SOSK-means automatic clustering algorithm. It aims to assist researchers in identifying and selecting the most suitable CVIs for their specific application areas.
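To make the idea of a CVI as a fitness function concrete, the sketch below computes the Davies-Bouldin index, one widely used CVI that combines cohesion and separation. The abstract does not say which indices the review covers, so the choice of index, the function names, and the toy data are all assumptions made for illustration.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition (lower is better).

    `clusters` is a list of at least two non-empty clusters with distinct
    centroids; each cluster is a list of coordinate tuples.
    """
    centers = [centroid(c) for c in clusters]
    # Cohesion: mean distance from each member to its cluster centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    # For each cluster, keep its worst (largest) scatter-to-separation ratio.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k

# Two hypothetical candidate partitions of the same four points.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
mixed = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]

# Used as a fitness function, the index favours the candidate with the lower value.
best = min([tight, mixed], key=davies_bouldin)
```

In a metaheuristic-based automatic clustering loop, a function like `davies_bouldin` would score every candidate solution in the population, including candidates with different numbers of clusters.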
Affiliation(s)
- Abiodun M. Ikotun
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Faustin Habyarimana
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Absalom E. Ezugwu
  - Unit for Data Science and Computing, North-West University, 11 Hoffman Street, Potchefstroom, 2520, North-West, South Africa
2
Fu Z, Li Z, Li Y, Chen H. MICFOA: A Novel Improved Catch Fish Optimization Algorithm with Multi-Strategy for Solving Global Problems. Biomimetics (Basel) 2024; 9:509. [PMID: 39329532 PMCID: PMC11430388 DOI: 10.3390/biomimetics9090509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 08/18/2024] [Accepted: 08/19/2024] [Indexed: 09/28/2024]
Abstract
Catch fish optimization algorithm (CFOA) is a newly proposed meta-heuristic algorithm based on human behaviors. CFOA shows better performance on multiple test functions and clustering problems. However, CFOA shows poor performance in some cases, and there is still room for improvement in convergence accuracy, getting rid of local traps, and so on. To further enhance the performance of CFOA, a multi-strategy improved catch fish optimization algorithm (MICFOA) is proposed in this paper. In the exploration phase, we propose a Lévy-based differential independent search strategy to enhance the global search capability of the algorithm while minimizing the impact on the convergence speed. Secondly, in the exploitation phase, a weight-balanced selection mechanism is used to maintain population diversity, enhance the algorithm's ability to get rid of local optima during the search process, and effectively boost the convergence accuracy. Furthermore, the structure of CFOA is also modified in this paper. A fishermen position replacement strategy is added at the end of the algorithm as a way to strengthen the robustness of the algorithm. To evaluate the performance of MICFOA, a comprehensive comparison with nine other metaheuristic algorithms is performed on the 10/30/50/100 dimensions of the CEC 2017 test functions and the 10/20 dimensions of the CEC2022 test functions. Statistical experiments show that MICFOA has more significant dominance in numerical optimization problems, and its overall performance outperforms the CFOA, PEOA, TLBO, COA, ARO, EDO, YDSE, and other state-of-the-art algorithms such as LSHADE, JADE, IDE-EDA, and APSM-jSO.
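The abstract does not specify how MICFOA draws its Lévy-distributed steps or how they enter the position update. As a rough illustration of the general idea only, the sketch below uses Mantegna's algorithm, a common way to generate heavy-tailed Lévy-like steps in metaheuristics. The `perturb` move, its `scale` parameter, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random

def levy_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm (typically 1 < beta < 2)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step as u / |v|**(1/beta)."""
    u = rng.gauss(0.0, levy_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def perturb(position, best, scale=0.01, beta=1.5, rng=random):
    """Hypothetical exploration move: a Levy-scaled jump relative to the best solution."""
    return [x + scale * levy_step(beta, rng) * (x - b) for x, b in zip(position, best)]
```

Most draws are small, which preserves local refinement, while occasional very large draws let a search agent jump out of a local optimum.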
Affiliation(s)
- Zhihao Fu
  - School of Electronic Information Engineering, Hankou University, Wuhan 430212, China
- Zhichun Li
  - Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong 999077, China
- Yongkang Li
  - School of Electronic Information Engineering, Hankou University, Wuhan 430212, China
- Haoyu Chen
  - College of Petroleum Engineering, Xi'an Shiyou University, Xi'an 710065, China
3
Martin-Calle D, Pierre-Louis O. Domain convexification: A simple model for invasion processes. Phys Rev E 2023; 108:044108. [PMID: 37978705 DOI: 10.1103/physreve.108.044108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 09/13/2023] [Indexed: 11/19/2023]
Abstract
We propose an invasion model where domains grow up to their convex hulls and merge when they overlap. This model can be seen as a continuum and isotropic counterpart of bootstrap percolation models. From numerical investigations of the model starting with randomly deposited overlapping disks on a plane, we find an invasion transition that occurs via macroscopic avalanches. The disk concentration threshold and the width of the transition are found to decrease as the system size is increased. Our results are consistent with a vanishing threshold in the limit of infinitely large system sizes. However, this limit could not be investigated by simulations. For finite initial concentrations of disks, the cluster size distribution presents a power-law tail characterized by an exponent that varies approximately linearly with the initial concentration of disks. These results at finite initial concentration open novel directions for the understanding of the transition in systems of finite size. Furthermore, we find that the domain area distribution has oscillations with discontinuities. In addition, the deviation from circularity of large domains is constant. Finally, we compare our results to experimental observations on de-adhesion of graphene induced by the intercalation of nanoparticles.
Affiliation(s)
- David Martin-Calle
  - Institut Lumière Matière, Université de Lyon, Université Claude Bernard Lyon 1, CNRS UMR5306, Campus de la Doua, F-69622 Villeurbanne, France
- Olivier Pierre-Louis
  - Institut Lumière Matière, Université de Lyon, Université Claude Bernard Lyon 1, CNRS UMR5306, Campus de la Doua, F-69622 Villeurbanne, France
4
K-means Clustering Algorithms: A Comprehensive Review, Variants Analysis, and Advances in the Era of Big Data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
5
Duan Y, Liu C, Li S, Guo X, Yang C. An automatic affinity propagation clustering based on improved equilibrium optimizer and t-SNE for high-dimensional data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
6
Zorarpacı E. Data clustering using leaders and followers optimization and differential evolution. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
7
Boosting k-means clustering with symbiotic organisms search for automatic clustering problems. PLoS One 2022; 17:e0272861. [PMID: 35951672 PMCID: PMC9371361 DOI: 10.1371/journal.pone.0272861] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/08/2022] [Accepted: 07/28/2022] [Indexed: 11/19/2022]
Abstract
The K-means clustering algorithm is an iterative unsupervised learning algorithm that tries to partition a given dataset into k predefined, distinct, non-overlapping clusters, where each data point belongs to only one group. However, its performance is affected by its sensitivity to the initial cluster centroids, with the possibility of converging to a local optimum, and by the need to specify the number of clusters as an input parameter. Recently, the hybridization of metaheuristic algorithms with the K-means algorithm has been explored to address these problems and effectively improve the algorithm's performance. Nonetheless, most metaheuristic algorithms require rigorous parameter tuning to achieve an optimum result. This paper proposes a hybrid clustering method that combines the well-known symbiotic organisms search (SOS) algorithm with K-means, using SOS as a global search metaheuristic to generate the optimum initial cluster centroids for K-means. The SOS algorithm is close to a parameter-free metaheuristic, with excellent search quality, that only requires initialising a single control parameter. The performance of the proposed algorithm is investigated by comparing it with the classical SOS, classical K-means, and other existing hybrid clustering algorithms on eleven UCI Machine Learning Repository datasets and one artificial dataset. The results of the extensive computational experiments show improved performance of the hybrid SOSK-means for solving automatic clustering compared with the standard K-means, symbiotic organisms search clustering methods, and other hybrid clustering approaches.
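The sensitivity to initial centroids that motivates this hybrid can be shown on a tiny example. The sketch below is a plain Lloyd-style K-means that takes its starting centroids as an argument; it does not implement SOS. The function names and the four-point dataset are assumptions for illustration, and in a hybrid of the kind described the `init_centroids` argument is where centroids proposed by a global search would be passed in.

```python
import math

def assign(points, centroids):
    """Group each point with its nearest centroid (ties go to the lower index)."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        groups[nearest].append(p)
    return groups

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's iterations from the given starting centroids."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        groups = assign(points, centroids)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def sse(centroids, groups):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, c) ** 2 for c, g in zip(centroids, groups) for p in g)

# Four corners of a 4 x 1 rectangle: the natural clusters are the two short sides.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
poor_start = sse(*kmeans(points, [(2, 0), (2, 1)]))      # stuck in a local optimum
good_start = sse(*kmeans(points, [(0, 0.5), (4, 0.5)]))  # reaches the better partition
```

Both runs converge immediately, but the first stays at a partition with a much larger within-cluster error, which is the behaviour a global search over starting centroids is meant to avoid.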
8
Automatic clustering based on dynamic parameters harmony search optimization algorithm. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01065-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
9
Jiménez P, Roldán JC, Corchuelo R. On exploring data lakes by finding compact, isolated clusters. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.12.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
10
Duan Y, Liu C, Li S, Guo X, Yang C. Gradient-based elephant herding optimization for cluster analysis. Appl Intell 2022; 52:11606-11637. [PMID: 35106027 PMCID: PMC8795968 DOI: 10.1007/s10489-021-03020-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/15/2021] [Indexed: 11/17/2022]
Abstract
Clustering analysis is essential for obtaining valuable information from a predetermined dataset. However, traditional clustering methods tend to fall into local optima and depend heavily on the quality of the initial solution. Given these defects, a novel clustering method called gradient-based elephant herding optimization for cluster analysis (GBEHO) is proposed. A well-defined set of heuristics is introduced to select the initial centroids instead of selecting random initial points. Specifically, the elephant herding optimization algorithm (EHO) is combined with the gradient-based algorithm GBO for assigning initial cluster centers across the search space. Second, to overcome the imbalance between exploration and exploitation in the original EHO, the initialized population is improved by introducing Gaussian chaos mapping. In addition, two operators, i.e., random wandering and variation operators, are set to adjust the location update strategy of the agents. Nine synthetic and real-world datasets are adopted to evaluate the effectiveness of the proposed algorithm and the other metaheuristic algorithms. The results show that the proposed algorithm ranks first among the 10 algorithms. It is also extensively compared with state-of-the-art techniques, using four evaluation criteria: accuracy rate, specificity, detection rate, and F-measure. The obtained results clearly indicate the excellent performance of GBEHO, while its stability is also more prominent.
11
Salehan A, Deldari A. Corona virus optimization (CVO): a novel optimization algorithm inspired from the Corona virus pandemic. The Journal of Supercomputing 2022; 78:5712-5743. [PMID: 34629744 PMCID: PMC8489174 DOI: 10.1007/s11227-021-04100-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 09/20/2021] [Indexed: 05/11/2023]
Abstract
This research introduces a new probabilistic and meta-heuristic optimization approach inspired by the Corona virus pandemic. Corona is an infection that originates from an unknown animal virus, which is of three known types and COVID-19 has been rapidly spreading since late 2019. Based on the SIR model, the virus can easily transmit from one person to several, causing an epidemic over time. Considering the characteristics and behavior of this virus, the current paper presents an optimization algorithm called Corona virus optimization (CVO) which is feasible, effective, and applicable. A set of benchmark functions evaluates the performance of this algorithm for discrete and continuous problems by comparing the results with those of other well-known optimization algorithms. The CVO algorithm aims to find suitable solutions to application problems by solving several continuous mathematical functions as well as three continuous and discrete applications. Experimental results denote that the proposed optimization method has a credible, reasonable, and acceptable performance.
Affiliation(s)
- Alireza Salehan
  - Department of Computer Engineering, University of Torbat Heydarieh, Torbat Heydarieh, Iran
- Arash Deldari
  - Department of Computer Engineering, University of Torbat Heydarieh, Torbat Heydarieh, Iran
12
K-Means-Based Nature-Inspired Metaheuristic Algorithms for Automatic Data Clustering Problems: Recent Advances and Future Directions. Applied Sciences-Basel 2021. [DOI: 10.3390/app112311246] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
The K-means clustering algorithm is a partitional clustering algorithm that has been widely used in many applications for traditional clustering because of its simplicity and low computational complexity. This clustering technique depends on the user specifying the number of clusters to generate from the dataset, which affects the clustering results. Moreover, random initialization of cluster centers can cause it to converge to a local minimum. Automatic clustering is a recent approach to clustering in which the number of clusters does not need to be specified. In automatic clustering, natural clusters existing in datasets are identified without any background information about the data objects. Nature-inspired metaheuristic optimization algorithms have been deployed in recent times to overcome the challenges the traditional clustering algorithm faces in handling automatic data clustering. Some nature-inspired metaheuristic algorithms have been hybridized with the traditional K-means algorithm to boost its performance and its capability to handle automatic data clustering problems. This study aims to identify, retrieve, summarize, and analyze recently proposed studies related to the improvement of the K-means clustering algorithm with nature-inspired optimization techniques. A quest approach for article selection was adopted, which led to the identification and selection of 147 related studies from different reputable academic avenues and databases. Moreover, the analysis revealed that, although the K-means algorithm has been well researched in the literature, its superiority over several well-established state-of-the-art clustering algorithms in terms of speed, accessibility, simplicity of use, and applicability to clustering problems with unlabeled and nonlinearly separable datasets was clearly observed in the study. The current study also evaluated and discussed some of the well-known weaknesses of the K-means clustering algorithm, for which the existing improvement methods were conceptualized. Notably, the current systematic review and analysis of existing literature on K-means enhancement approaches presents possible perspectives in the clustering analysis research domain and serves as a comprehensive source of information regarding the K-means algorithm and its variants for the research community.
13
Jiménez P, Roldán JC, Corchuelo R. A clustering approach to extract data from HTML tables. Inf Process Manag 2021. [DOI: 10.1016/j.ipm.2021.102683] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
14
Anomaly Detection in Automotive Industry Using Clustering Methods—A Case Study. Applied Sciences-Basel 2021. [DOI: 10.3390/app11219868] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
In automotive industries, pricing anomalies may occur for components of different products despite their similar physical characteristics, which raises the company's total production cost. However, detecting such discrepancies is often neglected, since finding the problems requires examining thousands of pieces, which often present inconsistencies when specified by the product engineering team. In this investigation, we propose a solution for a real case study. Our strategy uses a set of clustering algorithms to group components by similarity: K-Means, K-Medoids, Fuzzy C-Means (FCM), Hierarchical, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Self-Organizing Maps (SOM), Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Differential Evolution (DE). We observed that the methods could automatically group parts according to the physical characteristics present in the material master data, allowing anomaly detection and identification, which can consequently lead to cost reduction. The computational results indicate that the Hierarchical approach performed best on 1 of 6 evaluation metrics and placed second on four other indices, considering the Borda count method. K-Medoids won on most metrics but ranked second overall because of its poor performance on the SI-index. Ultimately, the proposal made it possible to identify mistakes in the specification and pricing of some items in the company.
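The Borda count mentioned in the abstract aggregates several per-metric rankings into one overall ranking. The sketch below is a minimal version under the standard scoring rule, where a candidate in position p of an n-item ranking receives n - 1 - p points. The candidate labels and rankings are made-up toy data, not results from the paper, chosen to show how a method that wins most metrics can still lose overall.

```python
from collections import defaultdict

def borda_count(rankings):
    """Aggregate best-to-worst rankings into (candidate, score) pairs, highest score first."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical rankings of four methods on three metrics.
toy_rankings = [
    ["A", "B", "C", "D"],  # metric 1: A wins
    ["A", "B", "D", "C"],  # metric 2: A wins
    ["B", "C", "D", "A"],  # metric 3: A comes last
]
overall = borda_count(toy_rankings)
```

Here `A` wins two of the three metrics but finishes behind `B`, because a single last place costs it more points than `B` loses from two second places.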
15
Chattopadhyay S, Kundu R, Singh PK, Mirjalili S, Sarkar R. Pneumonia detection from lung X-ray images using local search aided sine cosine algorithm based deep feature selection method. Int J Intell Syst 2021. [DOI: 10.1002/int.22703] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/15/2022]
Affiliation(s)
- Rohit Kundu
  - Department of Electrical Engineering, Jadavpur University, Kolkata, India
- Pawan Kumar Singh
  - Department of Information Technology, Jadavpur University, Kolkata, India
- Seyedali Mirjalili
  - Centre for Artificial Intelligence Research and Optimization, Torrens University, Fortitude Valley, Queensland, Australia
  - Yonsei Frontier Lab, Yonsei University, Seoul, Korea
- Ram Sarkar
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
16
Abstract
Models of computation are fundamental notions in computer science; consequently, they have been the subject of countless research papers, with numerous novel models proposed even in recent years. Among the multitude of approaches, many draw inspiration from biological processes observed in nature. P systems, or membrane systems, draw an analogy between communication in computing and the flow of information that can be perceived in living organisms. These systems serve as a basis for various concepts, ranging from computational economics and robotics to data clustering techniques. This paper focuses on one such use of these systems: membrane system-based clustering. As the amount of data stored worldwide grows, clustering algorithms also have to handle more and more data. Bringing these methods closer to the data, their main element, provides several benefits in addressing this issue. Database systems offer their users, for instance, well-integrated security features and more direct control over the data itself. Our goal is to give a perspective on how algorithms written in this way behave when the type of database management system is given (e.g., NoSQL) but the corporation or research team can still choose the specific system, so that a better-founded decision can be made about which database management system to connect. For this purpose, we explore the possibilities of a clustering algorithm based on P systems when used alongside NoSQL database systems, which are designed to manage big data. Variants over two competing databases, MongoDB and Redis, are evaluated and compared to identify the advantages and limitations of using such a solution in these systems.
17
José-García A, Handl J, Gómez-Flores W, Garza-Fabre M. An evolutionary many-objective approach to multiview clustering using feature and relational data. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107425] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
18
Dey A, Dey S, Bhattacharyya S, Platos J, Snasel V. Quantum inspired meta-heuristic approaches for automatic clustering of colour images. Int J Intell Syst 2021. [DOI: 10.1002/int.22494] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Affiliation(s)
- Alokananda Dey
  - Department of Computer Science and Engineering, RCC Institute of Information Technology, Kolkata, India
- Jan Platos
  - Faculty of Electrical Engineering and Computer Science, VSB Technical University of Ostrava, Ostrava, Czech Republic
- Vaclav Snasel
  - Faculty of Electrical Engineering and Computer Science, VSB Technical University of Ostrava, Ostrava, Czech Republic
19
Chen JX, Gong YJ, Chen WN, Li M, Zhang J. Elastic Differential Evolution for Automatic Data Clustering. IEEE Transactions on Cybernetics 2021; 51:4134-4147. [PMID: 31613788 DOI: 10.1109/tcyb.2019.2941707] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
In many practical applications, it is crucial to perform automatic data clustering without knowing the number of clusters in advance. The evolutionary computation paradigm is good at dealing with this task, but the existing algorithms suffer from several deficiencies, such as encoding redundancy and the cross-dimension learning error. In this article, we propose a novel elastic differential evolution algorithm to solve automatic data clustering. Unlike traditional methods, the proposed algorithm considers each clustering layout as a whole and adapts the cluster number and cluster centroids inherently through variable-length encoding and the evolution operators. The encoding scheme contains no redundancy. To enable individuals of different lengths to exchange information properly, we develop a subspace crossover and a two-phase mutation operator. The operators employ the basic method of differential evolution and, in addition, consider the spatial information of cluster layouts to generate offspring solutions. In particular, each dimension of the parameter vector interacts with its correlated dimensions, which not only adapts the cluster number but also avoids the cross-dimension learning error. The experimental results show that our algorithm outperforms the state-of-the-art algorithms in that it is able to identify the correct number of clusters and obtain a good cluster validation value.
20
Luo W, Zhu W, Ni L, Qiao Y, Yuan Y. SCA2: Novel Efficient Swarm Clustering Algorithm. IEEE Transactions on Emerging Topics in Computational Intelligence 2021. [DOI: 10.1109/tetci.2019.2961190] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
21
Mittal H, Pandey AC, Saraswat M, Kumar S, Pal R, Modwel G. A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets. Multimedia Tools and Applications 2021; 81:35001-35026. [PMID: 33584121 PMCID: PMC7870780 DOI: 10.1007/s11042-021-10594-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/10/2020] [Revised: 01/07/2021] [Accepted: 01/21/2021] [Indexed: 06/12/2023]
Abstract
Image segmentation is an essential phase of computer vision in which useful information is extracted from an image, for tasks that range from finding objects while moving across a room to detecting abnormalities in a medical image. As image pixels are generally unlabelled, the commonly used approach for segmentation is clustering. This paper reviews various existing clustering-based image segmentation methods. Two main clustering methods have been surveyed, namely hierarchical and partitional clustering methods. As partitional clustering is computationally better, further study is done from the perspective of methods belonging to this class. Further, the literature divides partitional clustering methods into three categories, namely K-means-based methods, histogram-based methods, and meta-heuristic-based methods. A survey of various performance parameters for the quantitative evaluation of segmentation results is also included. Finally, the publicly available benchmark datasets for image segmentation are briefly described.
Affiliation(s)
- Himanshu Mittal
  - Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
- Mukesh Saraswat
  - Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
- Sumit Kumar
  - Amity University, Noida, Uttar Pradesh, India
- Raju Pal
  - Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
- Garv Modwel
  - Valeo India Private Limited, Chennai, Tamil Nadu, India
22
Jahangoshai Rezaee M, Eshkevari M, Saberi M, Hussain O. GBK-means clustering algorithm: An improvement to the K-means algorithm based on the bargaining game. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106672] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]
23
Optimal Operation of Unbalanced Microgrid Utilizing Copula-Based Stochastic Simultaneous Unit Commitment and Distribution Feeder Reconfiguration Approach. Arabian Journal for Science and Engineering 2021. [DOI: 10.1007/s13369-020-04965-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/12/2023]
24
Abstract
This paper presents a comprehensive survey of the meta-heuristic optimization algorithms on the text clustering applications and highlights its main procedures. These Artificial Intelligence (AI) algorithms are recognized as promising swarm intelligence methods due to their successful ability to solve machine learning problems, especially text clustering problems. This paper reviews all of the relevant literature on meta-heuristic-based text clustering applications, including many variants, such as basic, modified, hybridized, and multi-objective methods. As well, the main procedures of text clustering and critical discussions are given. Hence, this review reports its advantages and disadvantages and recommends potential future research paths. The main keywords that have been considered in this paper are text, clustering, meta-heuristic, optimization, and algorithm.
25
Ratnakumar R, Nanda SJ. A high speed roller dung beetles clustering algorithm and its architecture for real-time image segmentation. Appl Intell 2021. [DOI: 10.1007/s10489-020-02067-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/22/2022]
26
Siegmund D, Fu B, José-García A, Salahuddin A, Kuijper A. Detection of Fiber Defects Using Keypoints and Deep Learning. Int J Pattern Recogn 2020. [DOI: 10.1142/s0218001421500166] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
Due to the deforming and dynamically changing textile fibers, the quality assurance of cleaned industrial textiles is still a mostly manual task. Usually, textiles need to be spread flat, in order to detect defects using computer vision inspection methods. Already known methods for detecting defects on such inhomogeneous, voluminous surfaces use mainly supervised methods based on deep neural networks and require lots of labeled training data. In contrast, we present a novel unsupervised method, based on SURF keypoints, that does not require any training data. We propose using their location, number and orientation in order to group them into geographically close clusters. Keypoint clusters also indicate the exact position of the defect at the same time. We furthermore compared our approach to supervised methods using deep learning. The presented processing pipeline shows how normalization and classification methods need to be combined, in order to reliably detect fiber defects such as cuts and holes. We evaluate the performance of our system in real-world settings with images of piles of textiles, taken in stereo vision. Our results show that our novel unsupervised classification method using keypoint clustering achieves comparable results to other supervised methods.
Affiliation(s)
- Dirk Siegmund
  - Fraunhofer Institute for Computer Graphics Research (IGD), Fraunhoferstrasse 5, 64283 Darmstadt, Germany
- Biying Fu
  - Fraunhofer Institute for Computer Graphics Research (IGD), Fraunhoferstrasse 5, 64283 Darmstadt, Germany
- Adán José-García
  - Center for Research and Advanced Studies of the National Polytechnic Institute, Km. 5.5 Carretera Cd. Victoria-Soto la Marina, 87130 Cd. Victoria, Tamaulipas, México
- Ahmad Salahuddin
  - Fraunhofer Institute for Computer Graphics Research (IGD), Fraunhoferstrasse 5, 64283 Darmstadt, Germany
- Arjan Kuijper
  - Fraunhofer Institute for Computer Graphics Research (IGD), Fraunhoferstrasse 5, 64283 Darmstadt, Germany
  - Technische Universität Darmstadt, Fachbereich Informatik, Hochschulstr. 10, 64289 Darmstadt, Germany
| |
Collapse
|
27
|
García-Vico ÁM, Charte F, González P, Elizondo D, Carmona CJ. E2PAMEA: A fast evolutionary algorithm for extracting fuzzy emerging patterns in big data environments. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.07.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
|
28
|
Ezugwu AE, Shukla AK, Agbaje MB, Oyelade ON, José-García A, Agushaka JO. Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05395-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/22/2022]
|
29
|
Zhuang H, Cui J, Liu T, Wang H. A physical model inspired density peak clustering. PLoS One 2020; 15:e0239406. [PMID: 32970727 PMCID: PMC7514087 DOI: 10.1371/journal.pone.0239406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2020] [Accepted: 09/05/2020] [Indexed: 12/02/2022] Open
Abstract
Clustering is an important data mining technique that plays a vital role in bioscience, social networks, and network analysis. As a clustering algorithm based on density and distance, density peak clustering is extensively used to solve practical problems. The algorithm assumes that a cluster center has a larger local density and is farther away from points of higher density. However, the density peak clustering algorithm is highly sensitive to density and distance and cannot accurately identify clusters in a dataset with significant differences in cluster structure. In addition, the algorithm's allocation strategy can easily cause attached allocation errors in data point allocation. To solve these problems, this study proposes a potential-field-diffusion-based density peak clustering algorithm. Compared with existing clustering algorithms, the advantages of the proposed algorithm are threefold: 1) The potential field concept is introduced, and a density measure based on the potential field's diffusion is proposed. The cluster center can be accurately selected using this measure. 2) The algorithm defines the judgment conditions for similar points and adopts different allocation strategies for dissimilar points to avoid attached errors in data point allocation. 3) This study conducted many experiments on synthetic and real-world datasets. The results demonstrate that the proposed algorithm achieves excellent clustering results and is suitable for complex datasets of different sizes, dimensions, and shapes. In addition, it performs particularly well on variable-density and nonconvex datasets.
Affiliation(s)
- Hui Zhuang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Jiancong Cui
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Taoran Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Hong Wang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Shandong Normal University, Jinan, China
|
30
|
Molina D, Poyatos J, Ser JD, García S, Hussain A, Herrera F. Comprehensive Taxonomies of Nature- and Bio-inspired Optimization: Inspiration Versus Algorithmic Behavior, Critical Analysis Recommendations. Cognit Comput 2020. [DOI: 10.1007/s12559-020-09730-8] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Indexed: 11/30/2022]
|
31
|
Ezugwu AE. Nature-inspired metaheuristic techniques for automatic clustering: a survey and performance study. SN APPLIED SCIENCES 2020. [DOI: 10.1007/s42452-020-2073-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022] Open
|
32
|
Evolutionary multi-objective automatic clustering enhanced with quality metrics and ensemble strategy. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105018] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 11/23/2022]
|
33
|
Ezugwu AES, Agbaje MB, Aljojo N, Els R, Chiroma H, Elaziz MA. A Comparative Performance Study of Hybrid Firefly Algorithms for Automatic Data Clustering. IEEE ACCESS 2020; 8:121089-121118. [DOI: 10.1109/access.2020.3006173] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 09/02/2023]
|
34
|
Qilu Z, Zongmin L, Junyu D. Unsupervised representation learning with Laplacian pyramid auto-encoders. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105851] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
|
35
|
|
36
|
|
37
|
Al-Sahaf H, Bi Y, Chen Q, Lensen A, Mei Y, Sun Y, Tran B, Xue B, Zhang M. A survey on evolutionary machine learning. J R Soc N Z 2019. [DOI: 10.1080/03036758.2019.1609052] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 10/26/2022]
Affiliation(s)
- Harith Al-Sahaf
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Ying Bi
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Qi Chen
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Andrew Lensen
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Yi Mei
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Yanan Sun
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Binh Tran
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Bing Xue
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
- Mengjie Zhang
  - School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
|
38
|
Tran CT, Zhang M, Andreae P, Xue B, Bui LT. Improving performance of classification on incomplete data using feature selection and clustering. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.09.026] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 12/01/2022]
|
39
|
Zabihi F, Nasiri B. A Novel History-driven Artificial Bee Colony Algorithm for Data Clustering. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.06.013] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 10/28/2022]
|
40
|
Inga E, Campaña M, Hincapié R, Moscoso-Zea O. Optimal Deployment of FiWi Networks Using Heuristic Method for Integration Microgrids with Smart Metering. SENSORS 2018; 18:s18082724. [PMID: 30126233 PMCID: PMC6111294 DOI: 10.3390/s18082724] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/02/2018] [Revised: 08/12/2018] [Accepted: 08/15/2018] [Indexed: 11/16/2022]
Abstract
The unpredictable increase in electrical demand affects the quality of the energy throughout the network. A solution to the problem is the increase of distributed generation units, which burn fossil fuels. While this is an immediate solution to the problem, the ecosystem is affected by the emission of CO₂. A promising solution is the integration of Distributed Renewable Energy Sources (DRES) with the conventional electrical system, thus introducing the concept of Smart Microgrids (SMG). These SMGs require a safe, reliable and technically planned two-way communication system. This paper presents a heuristic based on planning capable of providing a bidirectional communication that is near optimal. The model follows the structure of a hybrid Fiber-Wireless (FiWi) network with the purpose of obtaining information of electrical parameters that help us to manage the use of energy by integrating conventional electrical system with SMG. The optimization model is based on clustering techniques, through the construction of balanced conglomerates. The method is used for the development of the clusters along with the Nearest-Neighbor Spanning Tree algorithm (N-NST). Additionally, the Optimal Delay Balancing (ODB) model will be used to minimize the end to end delay of each grouping. In addition, the heuristic observes real design parameters such as: capacity and coverage. Using the Dijkstra algorithm, the routes are built following the shortest path. Therefore, this paper presents a heuristic able to plan the deployment of Smart Meters (SMs) through a tree-like hierarchical topology for the integration of SMG at the lowest cost.
Affiliation(s)
- Esteban Inga
  - Electrical Engineering, Universidad Politécnica Salesiana, Quito EC170146, Ecuador.
- Miguel Campaña
  - Electrical Engineering, Universidad Politécnica Salesiana, Quito EC170146, Ecuador.
- Roberto Hincapié
  - Department of Telecommunications, Universidad Pontificia Bolivariana, Medellín 050031, Colombia.
- Oswaldo Moscoso-Zea
  - Faculty of Engineering, Universidad Tecnológica Equinoccial, Quito EC170147, Ecuador.
|
41
|
Integrating fitness predator optimizer with multi-objective PSO for dynamic partitional clustering. PROGRESS IN ARTIFICIAL INTELLIGENCE 2018. [DOI: 10.1007/s13748-018-0157-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
|
42
|
|
43
|
Özbakır L, Turna F. Clustering performance comparison of new generation meta-heuristic algorithms. Knowl Based Syst 2017. [DOI: 10.1016/j.knosys.2017.05.023] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 10/19/2022]
|
44
|
|
45
|
Automatic data clustering using continuous action-set learning automata and its application in segmentation of images. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2016.12.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/15/2022]
|
46
|
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2017; 2016:2647389. [PMID: 28042291 PMCID: PMC5153549 DOI: 10.1155/2016/2647389] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 06/29/2016] [Accepted: 10/17/2016] [Indexed: 11/18/2022]
Abstract
To address the shortcoming that the fuzzy c-means algorithm (FCM) needs to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. First, a density-based algorithm was put forward. According to the characteristics of the dataset, the algorithm automatically determined the maximum possible number of clusters instead of using the empirical rule [Formula: see text], and obtained the optimal initial cluster centroids, mitigating the limitation of FCM that randomly selected cluster centroids lead the algorithm to converge to a local minimum. Second, by introducing a penalty function, the paper proposed a new fuzzy clustering validity index based on fuzzy compactness and separation. The penalty ensured that, when the number of clusters approached the number of objects in the dataset, the value of the validity index did not decrease monotonically toward zero, a behavior that would cause the estimate of the optimal number of clusters to lose its robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters through an iterative trial-and-error process. Finally, experiments on the UCI, KDD Cup 1999, and synthetic datasets showed that the method not only effectively determined the optimal number of clusters but also reduced the number of FCM iterations while producing a stable clustering result.
|