1
Ikotun AM, Habyarimana F, Ezugwu AE. Cluster validity indices for automatic clustering: A comprehensive review. Heliyon 2025; 11:e41953. [PMID: 39897868 PMCID: PMC11787482 DOI: 10.1016/j.heliyon.2025.e41953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Revised: 01/08/2025] [Accepted: 01/13/2025] [Indexed: 02/04/2025] [Open Access]
Abstract
The cluster validity index (CVI) is an integral part of clustering algorithms. It evaluates inter-cluster separation and intra-cluster cohesion of candidate clusters to determine the quality of potential solutions. Several cluster validity indices have been suggested for both classical clustering algorithms and automatic metaheuristic-based clustering algorithms. Different cluster validity indices exhibit different characteristics based on the mathematical models they employ in determining the values for the various cluster attributes. Metaheuristic-based automatic clustering algorithms use a cluster validity index as the fitness function in their optimization procedure to evaluate the quality of candidate cluster solutions. A systematic review of the cluster validity indices used as fitness functions in metaheuristic-based automatic clustering algorithms is presented in this study. Identifying, reporting, and analysing various cluster validity indices is important in classifying the best CVIs for optimum performance of a metaheuristic-based automatic clustering algorithm. This review also includes an experimental study on the performance of some common cluster validity indices on some synthetic datasets with varied characteristics as well as real-life datasets using the SOSK-means automatic clustering algorithm. This review aims to assist researchers in identifying and selecting the most suitable CVIs for their specific application areas.
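To make the notion of combining intra-cluster cohesion with inter-cluster separation concrete, the following is a minimal sketch of one widely used index, the Davies-Bouldin index, in plain Python. It is not taken from the reviewed paper; the function name `davies_bouldin` and the toy data are assumptions chosen for illustration, and the sketch assumes no two cluster centroids coincide.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower values mean compact, well-separated clusters."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    # Centroid of each cluster (component-wise mean).
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Cohesion: mean distance from each member to its cluster centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    # For every cluster, keep the worst ratio of combined scatter to
    # centroid separation, then average those worst cases.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical toy data: two tight groups far apart along the x-axis.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(davies_bouldin(toy_points, [0, 0, 1, 1]))  # natural partition
print(davies_bouldin(toy_points, [0, 1, 0, 1]))  # poor partition scores higher
```

Used as a fitness function, an automatic clustering algorithm would minimise this value over candidate partitions; indices that should be maximised are handled by flipping the sign.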
Affiliation(s)
- Abiodun M. Ikotun
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Faustin Habyarimana
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Absalom E. Ezugwu
  - Unit for Data Science and Computing, North-West University, 11 Hoffman Street, Potchefstroom, 2520, North-West, South Africa
2
Brito da Silva LE, Rayapati N, Wunsch DC. iCVI-ARTMAP: Using Incremental Cluster Validity Indices and Adaptive Resonance Theory Reset Mechanism to Accelerate Validation and Achieve Multiprototype Unsupervised Representations. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:9757-9770. [PMID: 35353707 DOI: 10.1109/tnnls.2022.3160381] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
This article presents an adaptive resonance theory predictive mapping (ARTMAP) model, which uses incremental cluster validity indices (iCVIs) to perform unsupervised learning, namely, iCVI-ARTMAP. Incorporating iCVIs into the decision-making and many-to-one mapping capabilities of this adaptive resonance theory (ART)-based model can improve the choices of clusters to which samples are incrementally assigned. These improvements are accomplished by intelligently performing the operations of swapping sample assignments between clusters, splitting and merging clusters, and caching the values of variables when iCVI values need to be recomputed. Using recursive formulations enables iCVI-ARTMAP to considerably reduce the computational burden associated with cluster validity index (CVI)-based offline clustering. In this work, six iCVI-ARTMAP variants were realized via the integration of one information-theoretic and five sum-of-squares-based iCVIs into fuzzy ARTMAP. With proper choice of iCVI, iCVI-ARTMAP either outperformed or performed comparably to three ART-based and four non-ART-based clustering algorithms in experiments using benchmark datasets of different natures. Naturally, the performance of iCVI-ARTMAP is subject to the selected iCVI and its suitability to the data at hand; fortunately, it is a general model in which other iCVIs can be easily embedded.
3
Brito da Silva LE, Rayapati N, Wunsch DC. Incremental Cluster Validity Index-Guided Online Learning for Performance and Robustness to Presentation Order. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:6686-6700. [PMID: 36256718 DOI: 10.1109/tnnls.2022.3212345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
In streaming data applications, the incoming samples are processed and discarded, and therefore, intelligent decision-making is crucial for the performance of lifelong learning systems. In addition, the order in which the samples arrive may heavily affect the performance of incremental learners. The recently introduced incremental cluster validity indices (iCVIs) provide valuable aid in addressing this class of problems. Their primary use case has been cluster quality monitoring; nonetheless, they have been recently integrated into a streaming clustering method. In this context, the work presented here introduces the first adaptive resonance theory (ART)-based model that uses iCVIs for unsupervised and semi-supervised online learning. Moreover, it shows how to use iCVIs to regulate ART vigilance via an iCVI-based match tracking mechanism. The model achieves improved accuracy and robustness to ordering effects by integrating an online iCVI module as module B of a topological ART predictive mapping (TopoARTMAP), hence the name iCVI-TopoARTMAP, and by using iCVI-driven postprocessing heuristics at the end of each learning step. The online iCVI module provides assignments of input samples to clusters at each iteration in accordance with any of the several iCVIs. The iCVI-TopoARTMAP maintains useful properties shared by the ART predictive mapping (ARTMAP) models, such as stability, immunity to catastrophic forgetting, and the many-to-one mapping capability via the map field module. The performance and robustness to the presentation order of iCVI-TopoARTMAP were evaluated via experiments with synthetic and real-world datasets.
4
Modeling Soil Temperature for Different Days Using Novel Quadruplet Loss-Guided LSTM. Computational Intelligence and Neuroscience 2022; 2022:9016823. [PMID: 35222636 PMCID: PMC8872672 DOI: 10.1155/2022/9016823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2022] [Accepted: 01/22/2022] [Indexed: 11/17/2022]
Abstract
Soil temperature (Ts), a key variable in geoscience studies, has generated growing interest among researchers. Many factors affect the spatiotemporal variation of Ts, which poses immense challenges for Ts estimation. To enrich the information processed by the loss function and achieve better estimation performance, the paper designed a new long short-term memory model that uses a quadruplet loss function as an intelligence tool for data processing (QL-LSTM). The model combines the traditional squared-error loss function with distance metric learning between the sample features, which allows the samples to be analyzed more closely to improve estimation accuracy. We applied the meteorological data from Laegern and Fluehli stations at 5, 10, and 15 cm depth on the 1st, 5th, and 15th day separately to verify the performance of the proposed soil temperature estimation model. The input variables of the proposed model include radiation, air temperature, vapor pressure deficit, wind speed, air pressure, and past Ts data. The performance of the model was tested by several error evaluation indices, including root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe model efficiency coefficient (NS), Willmott Index of Agreement (WI), and Legates and McCabe index (LMI). As the test results at different soil depths show, our model generally outperformed the four existing advanced estimation models, namely, backpropagation neural networks, extreme learning machines, support vector regression, and LSTM. Furthermore, as experiments show, the proposed model achieved the best performance at the 15 cm depth of soil on the 1st day at Laegern station, where it achieved higher WI (0.998), NS (0.995), and LMI (0.938) values and lower RMSE (0.312) and MAE (0.239) values. Consequently, the QL-LSTM model is recommended for estimating daily Ts profiles on the 1st, 5th, and 15th days.
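The five error evaluation indices named in this abstract have standard textbook definitions, which can be sketched as follows. This is an illustrative sketch only: the abstract does not state which variants the paper used, so the forms below (Willmott's original index of agreement and the absolute-error form of the Legates and McCabe index) are assumptions, as are the function name `error_indices` and the sample values.

```python
import math


def error_indices(observed, predicted):
    """Return RMSE, MAE, NS, WI and LMI for paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n

    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))
    # Spread of the observations around their mean (assumed non-zero).
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    abs_dev = sum(abs(o - mean_obs) for o in observed)
    # Willmott's "potential error" term.
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )

    return {
        "RMSE": math.sqrt(sq_err / n),   # root mean square error
        "MAE": abs_err / n,              # mean absolute error
        "NS": 1.0 - sq_err / sq_dev,     # Nash-Sutcliffe efficiency
        "WI": 1.0 - sq_err / potential,  # Willmott index of agreement
        "LMI": 1.0 - abs_err / abs_dev,  # Legates and McCabe index
    }


# Hypothetical soil temperatures in degrees Celsius, for illustration only.
observed_ts = [10.2, 11.0, 12.4, 13.1, 12.0]
predicted_ts = [10.0, 11.3, 12.1, 13.4, 12.2]
for name, value in error_indices(observed_ts, predicted_ts).items():
    print(f"{name}: {value:.3f}")
```

RMSE and MAE are errors in the units of the data, so lower is better, while NS, WI and LMI approach 1 for a perfect model; predicting the observed mean everywhere gives 0 for all three under these definitions.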
5
Gagolewski M, Bartoszuk M, Cena A. Are cluster validity measures (in)valid? Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.10.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
6
Chen JX, Gong YJ, Chen WN, Li M, Zhang J. Elastic Differential Evolution for Automatic Data Clustering. IEEE Transactions on Cybernetics 2021; 51:4134-4147. [PMID: 31613788 DOI: 10.1109/tcyb.2019.2941707] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
In many practical applications, it is crucial to perform automatic data clustering without knowing the number of clusters in advance. The evolutionary computation paradigm is good at dealing with this task, but the existing algorithms encounter several deficiencies, such as the encoding redundancy and the cross-dimension learning error. In this article, we propose a novel elastic differential evolution algorithm to solve automatic data clustering. Unlike traditional methods, the proposed algorithm considers each clustering layout as a whole and adapts the cluster number and cluster centroids inherently through the variable-length encoding and the evolution operators. The encoding scheme contains no redundancy. To enable the individuals of different lengths to exchange information properly, we develop a subspace crossover and a two-phase mutation operator. The operators employ the basic method of differential evolution and, in addition, they consider the spatial information of cluster layouts to generate offspring solutions. Particularly, each dimension of the parameter vector interacts with its correlated dimensions, which not only adapts the cluster number but also avoids the cross-dimension learning error. The experimental results show that our algorithm outperforms the state-of-the-art algorithms in that it is able to identify the correct number of clusters and obtain a good cluster validation value.
7
Liu Y, Jiang Y, Hou T, Liu F. A new robust fuzzy clustering validity index for imbalanced data sets. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.08.041] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/15/2022]
8
Zhou S, Liu F, Song W. Estimating the Optimal Number of Clusters Via Internal Validity Index. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10427-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
9
Xu X, Ding S, Wang L, Wang Y. A robust density peaks clustering algorithm with density-sensitive similarity. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106028] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/01/2022]
10
An Algorithm for the Evolutionary-Fuzzy Generation of on-Line Signature Hybrid Descriptors. Journal of Artificial Intelligence and Soft Computing Research 2020. [DOI: 10.2478/jaiscr-2020-0012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022] [Open Access]
Abstract
In biometrics, methods which are able to precisely adapt to the biometric features of users are much sought after. They use various methods of artificial intelligence, in particular methods from the group of soft computing. In this paper, we focus on on-line signature verification. Such signatures are complex objects described not only by the shape but also by the dynamics of the signing process. In standard devices used for signature acquisition (with an LCD touch screen) these dynamics may include pen velocity, but sometimes other types of signals are also available, e.g. pen pressure on the screen surface (e.g. in graphic tablets), the angle between the pen and the screen surface, etc. The precision of the on-line signature dynamics processing has been a motivational springboard for developing methods that use signature partitioning. Partitioning uses a well-known principle of decomposing the problem into smaller ones. In this paper, we propose a new partitioning algorithm that uses capabilities of the algorithms based on populations and fuzzy systems. Evolutionary-fuzzy partitioning eliminates the need to average dynamic waveforms in created partitions because it replaces them. Evolutionary separation of partitions results in a better matching of partitions with reference signatures, eliminates disproportions between the number of points describing dynamics in partitions, eliminates the impact of random values, and separates partitions related to the signing stage and its dynamics (e.g. high and low velocity of signing, where high and low are imprecise, fuzzy concepts). The operation of the presented algorithm has been tested using the well-known BioSecure DS2 database of real dynamic signatures.
11
Zhou S, Liu F. A novel internal cluster validity index. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-191361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Affiliation(s)
- Shibing Zhou
  - Department of Computer Science and Technology, Jiangnan University, Wuxi, Jiangsu, P.R. China
  - Jiangsu Provincial Engineering Laboratory for Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu, P.R. China
- Fei Liu
  - Institute of Automation, Jiangnan University, Wuxi, Jiangsu, P.R. China
12
Ezugwu AE. Nature-inspired metaheuristic techniques for automatic clustering: a survey and performance study. SN Applied Sciences 2020. [DOI: 10.1007/s42452-020-2073-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022] [Open Access]
13
A novel internal validity index based on the cluster centre and the nearest neighbour cluster. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.06.033] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 11/20/2022]
14
Short-Term Forecasting of the Output Power of a Building-Integrated Photovoltaic System Using a Metaheuristic Approach. Energies 2018. [DOI: 10.3390/en11051260] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 11/16/2022]
15
López-Rubio E, Palomo EJ, Ortega-Zamorano F. Unsupervised learning by cluster quality optimization. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.01.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/18/2022]
16
Yao L, Weng KS. Imputation of incomplete data using adaptive ellipsoids with linear regression. Journal of Intelligent & Fuzzy Systems 2015. [DOI: 10.3233/ifs-151592] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
17
18
Ozturk C, Hancer E, Karaboga D. Dynamic clustering with improved binary artificial bee colony algorithm. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2014.11.040] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Indexed: 10/24/2022]
19
Garcia-Ceja E, Brena RF, Carrasco-Jimenez JC, Garrido L. Long-term activity recognition from wristwatch accelerometer data. Sensors 2014; 14:22500-24. [PMID: 25436652 PMCID: PMC4299024 DOI: 10.3390/s141222500] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 07/30/2014] [Revised: 10/18/2014] [Accepted: 11/14/2014] [Indexed: 11/17/2022]
Abstract
With the development of wearable devices that have several embedded sensors, it is possible to collect data that can be analyzed in order to understand the user's needs and provide personalized services. Examples of these types of devices are smartphones, fitness bracelets, and smartwatches, to mention just a few. In recent years, several works have used these devices to recognize simple activities like running, walking, sleeping, and other physical activities. There has also been research on recognizing complex activities like cooking, sporting, and taking medication, but these generally require the installation of external sensors that may become obtrusive to the user. In this work we used acceleration data from a wristwatch in order to identify long-term activities. We compare the use of Hidden Markov Models and Conditional Random Fields for the segmentation task. We also added prior knowledge about the duration of the activities to the models by coding it as constraints, and sequence patterns were added in the form of feature functions. We also performed subclassing in order to deal with the problem of intra-class fragmentation, which arises when the same label is applied to activities that are conceptually the same but very different from the acceleration point of view.
Affiliation(s)
- Enrique Garcia-Ceja
  - Tecnológico de Monterrey, Campus Monterrey, Av. Eugenio Garza Sada 2501 Sur, Monterrey 64849, Mexico.
- Ramon F Brena
  - Tecnológico de Monterrey, Campus Monterrey, Av. Eugenio Garza Sada 2501 Sur, Monterrey 64849, Mexico.
- Jose C Carrasco-Jimenez
  - Tecnológico de Monterrey, Campus Monterrey, Av. Eugenio Garza Sada 2501 Sur, Monterrey 64849, Mexico.
- Leonardo Garrido
  - Tecnológico de Monterrey, Campus Monterrey, Av. Eugenio Garza Sada 2501 Sur, Monterrey 64849, Mexico.
20
Clustering of web search results based on the cuckoo search algorithm and Balanced Bayesian Information Criterion. Inf Sci (N Y) 2014. [DOI: 10.1016/j.ins.2014.05.047] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Indexed: 11/15/2022]
21
Özgüler AB, Yildiz A. Foraging swarms as Nash equilibria of dynamic games. IEEE Transactions on Cybernetics 2014; 44:979-987. [PMID: 24122615 DOI: 10.1109/tcyb.2013.2283102] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]
Abstract
The question of whether foraging swarms can form as a result of a noncooperative game played by individuals is shown here to have an affirmative answer. A dynamic game played by N agents in 1-D motion is introduced and models, for instance, a foraging ant colony. Each agent controls its velocity to minimize its total work done in a finite time interval. The game is shown to have a unique Nash equilibrium under two different foraging location specifications, and both equilibria display many features of a foraging swarm behavior observed in biological swarms. Explicit expressions are derived for pairwise distances between individuals of the swarm, swarm size, and swarm center location during foraging.
22
Grid topologies for the self-organizing map. Neural Netw 2014; 56:35-48. [PMID: 24861385 DOI: 10.1016/j.neunet.2014.05.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/05/2014] [Revised: 04/28/2014] [Accepted: 05/01/2014] [Indexed: 11/20/2022]
Abstract
The original Self-Organizing Feature Map (SOFM) has been extended in many ways to suit different goals and application domains. However, the topologies of the map lattice that can be found in the literature are nearly always square or, more rarely, hexagonal. In this paper we study alternative grid topologies, which are derived from the geometrical theory of tessellations. Experimental results are presented for unsupervised clustering, color image segmentation and classification tasks, which show that the differences among the topologies are statistically significant in most cases, and that the optimal topology depends on the problem at hand. A theoretical interpretation of these results is also developed.
23
Bao Y, Xiong T, Hu Z. PSO-MISMO modeling strategy for multistep-ahead time series prediction. IEEE Transactions on Cybernetics 2014; 44:655-668. [PMID: 23846512 DOI: 10.1109/tcyb.2013.2265084] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 06/02/2023]
Abstract
Multistep-ahead time series prediction is one of the most challenging research topics in the field of time series modeling and prediction, and is continually under research. Recently, the multiple-input several multiple-outputs (MISMO) modeling strategy has been proposed as a promising alternative for multistep-ahead time series prediction, exhibiting advantages compared with the two currently dominating strategies, the iterated and the direct strategies. Built on the established MISMO strategy, this paper proposes a particle swarm optimization (PSO)-based MISMO modeling strategy, which is capable of determining the number of sub-models in a self-adaptive mode, with varying prediction horizons. Rather than deriving crisp divides with equal-size s prediction horizons from the established MISMO, the proposed PSO-MISMO strategy, implemented with neural networks, employs a heuristic to create flexible divides with varying sizes of prediction horizons and to generate corresponding sub-models, providing considerable flexibility in model construction, which has been validated with simulated and real datasets.
24
Lopez-Rubio E. Improving the quality of self-organizing maps by self-intersection avoidance. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:1253-1265. [PMID: 24808565 DOI: 10.1109/tnnls.2013.2254127] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 06/03/2023]
Abstract
The quality of self-organizing maps is always a key issue to practitioners. Smooth maps convey information about input data sets in a clear manner. Here a method is presented to modify the learning algorithm of self-organizing maps to reduce the number of topology errors, so that the obtained map has better quality at the expense of increased quantization error. It is based on avoiding maps that self-intersect or nearly do so, as these states are related to low quality. Our approach is tested with synthetic data and real data from visualization, pattern recognition and computer vision applications, with satisfactory results.
25
Lee JS, Olafsson S. A meta-learning approach for determining the number of clusters with consideration of nearest neighbors. Inf Sci (N Y) 2013. [DOI: 10.1016/j.ins.2012.12.033] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]