1. Zhang R, Li X, Wu T, Zhao Y. Data Clustering via Uncorrelated Ridge Regression. IEEE Trans Neural Netw Learn Syst 2021; 32:450-456. [PMID: 32275606 DOI: 10.1109/tnnls.2020.2978755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Ridge regression is frequently used in both supervised and semisupervised learning. However, a trivial solution may occur when ridge regression is applied directly to clustering. To address this issue, an uncorrelated constraint is introduced into ridge regression while embedding the manifold structure. In particular, we choose the uncorrelated constraint over the orthogonal constraint, since a closed-form solution can then be obtained. In addition to the proposed uncorrelated ridge regression, a soft pseudo label with an l1-ball constraint is used for clustering. Moreover, a new strategy, a rescaling technique, is proposed so that the optimal scaling within the uncorrelated constraint is achieved automatically, avoiding the inconvenience of tuning it manually. Equipped with the rescaled uncorrelated ridge regression and the soft label, a novel clustering method is developed by solving the related clustering model. Extensive experiments illustrate the effectiveness of the proposed method.
2. Zhang R, Nie F, Guo M, Wei X, Li X. Joint Learning of Fuzzy k-Means and Nonnegative Spectral Clustering With Side Information. IEEE Trans Image Process 2019; 28:2152-2162. [PMID: 30475719 DOI: 10.1109/tip.2018.2882925] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/09/2023]
Abstract
As one of the most widely used clustering techniques, fuzzy k-means (FKM) assigns every data point to each cluster with a certain degree of membership. However, the conventional FKM approach relies on a squared data-fitting term, which is sensitive to outliers, and it ignores prior information. In this paper, we develop a novel and robust fuzzy k-means clustering algorithm, namely, joint learning of fuzzy k-means and nonnegative spectral clustering with side information. The proposed method combines fuzzy k-means and nonnegative spectral clustering into a unified model, which can further exploit prior knowledge of data pairs so that both the quality of the affinity graph and the clustering performance are improved. In addition, to enhance robustness, an adaptive loss function is adopted in the objective function, since it smoothly interpolates between the l1-norm and the l2-norm. Finally, experimental results on benchmark datasets verify the effectiveness and superiority of our clustering method.
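For orientation, the "degree of membership" idea can be sketched with the textbook fuzzy k-means (fuzzy c-means) updates. This is a minimal sketch of the conventional squared-distance formulation only, not the joint model with side information described in the abstract; the function names and the fuzzifier default `m=2.0` are assumptions made for illustration.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0, eps=1e-12):
    """Textbook fuzzy k-means membership update (illustrative, not the paper's model).

    X: (n, d) data, centers: (c, d) cluster centers, m > 1: fuzzifier.
    Returns U of shape (n, c); each row sums to 1.
    """
    # Squared Euclidean distance from every point to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, eps)  # guard against division by zero at a center
    # u_ij is proportional to d_ij^(-2/(m-1)), i.e. d2_ij^(-1/(m-1)).
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def update_centers(X, U, m=2.0):
    """Centers as means of the data weighted by memberships raised to m."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]
```

A point equidistant from two centers receives membership 0.5 in each, while a point lying on a center receives membership close to 1 for that cluster.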
3. Brito da Silva LE, Wunsch DC. An Information-Theoretic-Cluster Visualization for Self-Organizing Maps. IEEE Trans Neural Netw Learn Syst 2018; 29:2595-2613. [PMID: 28534793 DOI: 10.1109/tnnls.2017.2699674] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/07/2023]
Abstract
Improved data visualization will be a significant tool to enhance cluster analysis. In this paper, an information-theoretic-based method for cluster visualization using self-organizing maps (SOMs) is presented. The information-theoretic visualization (IT-vis) has the same structure as the unified distance matrix, but instead of depicting Euclidean distances between adjacent neurons, it displays the similarity between the distributions associated with adjacent neurons. Each SOM neuron has an associated subset of the data set whose cardinality controls the granularity of the IT-vis and with which the first- and second-order statistics are computed and used to estimate their probability density functions. These are used to calculate the similarity measure, based on Renyi's quadratic cross entropy and cross information potential (CIP). The introduced visualizations combine the low computational cost and kernel estimation properties of the representative CIP and the data structure representation of a single-linkage-based grouping algorithm to generate an enhanced SOM-based visualization. The visual quality of the IT-vis is assessed by comparing it with other visualization methods for several real-world and synthetic benchmark data sets. Thus, this paper also contains a significant literature survey. The experiments demonstrate the IT-vis cluster revealing capabilities, in which cluster boundaries are sharply captured. Additionally, the information-theoretic visualizations are used to perform clustering of the SOM. Compared with other methods, IT-vis of large SOMs yielded the best results in this paper, for which the quality of the final partitions was evaluated using external validity indices.
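As a rough illustration of the similarity measure named above, the cross information potential (CIP) between two densities is the integral of their product, and Renyi's quadratic cross entropy is its negative logarithm. The sketch below assumes each neuron's data subset is summarized by a Gaussian with the subset's mean and covariance, in which case the integral has a closed form; that Gaussian assumption and the function names are illustrative and may differ from the estimator used in the paper.

```python
import numpy as np

def cross_information_potential(mu_p, cov_p, mu_q, cov_q):
    """CIP = integral of p(x) q(x) dx for two Gaussians (assumed model).

    For Gaussians this equals a Gaussian density with covariance
    cov_p + cov_q evaluated at the mean difference mu_p - mu_q.
    """
    diff = np.asarray(mu_p, dtype=float) - np.asarray(mu_q, dtype=float)
    cov = np.asarray(cov_p, dtype=float) + np.asarray(cov_q, dtype=float)
    d = diff.shape[0]
    _, logdet = np.linalg.slogdet(cov)          # log-determinant of the summed covariance
    quad = diff @ np.linalg.solve(cov, diff)    # Mahalanobis-type quadratic form
    return float(np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)))

def renyi_quadratic_cross_entropy(mu_p, cov_p, mu_q, cov_q):
    """Renyi's quadratic cross entropy: -log of the CIP."""
    return -np.log(cross_information_potential(mu_p, cov_p, mu_q, cov_q))
```

Under this assumption, adjacent neurons whose data subsets overlap strongly give a large CIP (low cross entropy), and well-separated subsets give a CIP near zero.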
4. Self-Organizing Hidden Markov Model Map (SOHMMM): Biological Sequence Clustering and Cluster Visualization. Methods Mol Biol 2017. [PMID: 28224492 DOI: 10.1007/978-1-4939-6753-7_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
The present study devises mapping methodologies and projection techniques that visualize and demonstrate biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolical sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination/synergy with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework is in position to analyze automatically and directly raw sequence data. This analysis is carried out with little, or even complete absence of, prior information/domain knowledge.
5. Wang FY, Takatsuka M. Self-organizing Map (SOM) Based Data Navigation for Identifying Shape Similarities of Graphic Logos. Neural Process Lett 2014. [DOI: 10.1007/s11063-014-9375-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
6. Ferles C, Stafylopatis A. Self-Organizing Hidden Markov Model Map (SOHMMM). Neural Netw 2013; 48:133-47. [DOI: 10.1016/j.neunet.2013.07.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/12/2011] [Revised: 06/08/2013] [Accepted: 07/31/2013] [Indexed: 10/26/2022]
7. Liao K, Liu G, Xiao L, Liu C. A sample-based hierarchical adaptive K-means clustering method for large-scale video retrieval. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2013.05.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 10/26/2022]
8. Katwal SB, Gore JC, Marois R, Rogers BP. Unsupervised spatiotemporal analysis of fMRI data using graph-based visualizations of self-organizing maps. IEEE Trans Biomed Eng 2013; 60:2472-83. [PMID: 23613020 DOI: 10.1109/tbme.2013.2258344] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]
Abstract
We present novel graph-based visualizations of self-organizing maps for unsupervised functional magnetic resonance imaging (fMRI) analysis. A self-organizing map is an artificial neural network model that transforms high-dimensional data into a low-dimensional (often a 2-D) map using unsupervised learning. However, a postprocessing scheme is necessary to correctly interpret similarity between neighboring node prototypes (feature vectors) on the output map and delineate clusters and features of interest in the data. In this paper, we used graph-based visualizations to capture fMRI data features based upon 1) the distribution of data across the receptive fields of the prototypes (density-based connectivity); and 2) temporal similarities (correlations) between the prototypes (correlation-based connectivity). We applied this approach to identify task-related brain areas in an fMRI reaction time experiment involving a visuo-manual response task, and we correlated the time-to-peak of the fMRI responses in these areas with reaction time. Visualization of self-organizing maps outperformed independent component analysis and voxelwise univariate linear regression analysis in identifying and classifying relevant brain regions. We conclude that the graph-based visualizations of self-organizing maps help in advanced visualization of cluster boundaries in fMRI data enabling the separation of regions with small differences in the timings of their brain responses.
Affiliation(s)
- Santosh B Katwal
- Department of Electrical Engineering and Institute of Imaging Science (VUIIS), Vanderbilt University, Nashville, TN 37212, USA.
9. Measuring relative timings of brain activities using fMRI. Neuroimage 2012; 66:436-48. [PMID: 23110880 DOI: 10.1016/j.neuroimage.2012.10.052] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 07/21/2012] [Revised: 10/04/2012] [Accepted: 10/06/2012] [Indexed: 11/20/2022] Open
Abstract
Functional MRI (fMRI) has previously been shown to be able to measure hundreds of milliseconds differences in timings of activities in different brain regions, even though the underlying blood oxygenation level-dependent (BOLD) response is delayed and dispersed on the order of seconds. This capability may contribute towards the study of communication within the brain by assessing the temporal sequences of various brain processes (mental chronometry). The practical limit of fMRI for detecting the relative timing of brain activity is not known. We aimed to detect fine differences in the timings of brain activities beyond those previously measured from fMRI data in human subjects. We introduced known delays between the onsets of visual stimuli in a controlled, sparse event-related design and investigated if the temporal shifts in the corresponding average BOLD signals were detectable. To maximize sensitivity, we used high spatial and temporal resolution fMRI at ultrahigh field (7 T), in conjunction with a novel data-driven technique for voxel selection using graph-based visualizations of self-organizing maps and Granger causality to measure relative timing. This approach detected timing differences as small as 28ms in visual cortex in individual subjects. For signal extraction, the self-organizing map approach outperformed other common techniques including independent component analysis, voxelwise univariate linear regression analysis and a separate localizer scan. For relative timing measurement, Granger causality outperformed time-to-peak calculations derived from an inverse logit curve fit. We conclude that high-resolution imaging at ultrahigh field, signal extraction via self-organizing map, and appropriate use of Granger causality permit the detection of small timing differences in fMRI data, despite the intrinsically slow hemodynamic response.
10. Taşdemir K, Milenov P, Tapsall B. Topology-based hierarchical clustering of self-organizing maps. IEEE Trans Neural Netw 2012; 22:474-85. [PMID: 21356611 DOI: 10.1109/tnn.2011.2107527] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 11/06/2022]
Abstract
A powerful method in the analysis of datasets where there are many natural clusters with varying statistics such as different sizes, shapes, density distribution, overlaps, etc., is the use of self-organizing maps (SOMs). However, further processing tools, such as visualization and interactive clustering, are often necessary to capture the clusters from the learned SOM knowledge. A recent visualization scheme (CONNvis) and its interactive clustering utilize the data topology for SOM knowledge representation by using a connectivity matrix (a weighted Delaunay graph), CONN. In this paper, we propose an automated clustering method for SOMs, which is a hierarchical agglomerative clustering of CONN. We determine the number of clusters either by using cluster validity indices or by prior knowledge on the datasets. We show that, for the datasets used in this paper, data-topology-based hierarchical clustering can produce better partitioning than hierarchical clustering based solely on distance information.
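To make the connectivity matrix more concrete, the sketch below builds a CONN-style matrix from data and SOM prototypes. It assumes the commonly cited definition in which the entry for prototypes i and j counts the data points whose best-matching unit (BMU) and second BMU are i and j in either order; the abstract does not spell this out, so treat the definition and the function name `conn_matrix` as assumptions. The hierarchical agglomerative clustering step that the paper applies to CONN is not shown.

```python
import numpy as np

def conn_matrix(X, prototypes):
    """Build a CONN-style connectivity matrix (assumed definition).

    X: (n, d) data, prototypes: (k, d) SOM prototype vectors, k >= 2.
    conn[i, j] counts points whose BMU/second-BMU pair is (i, j) or (j, i).
    """
    # Squared distances from each point to each prototype.
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(d2, axis=1)
    bmu, second = order[:, 0], order[:, 1]
    k = prototypes.shape[0]
    counts = np.zeros((k, k), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(counts, (bmu, second), 1)
    return counts + counts.T  # symmetrize: both orderings contribute
```

Large entries indicate prototype pairs that share many data points along their common boundary, which is the kind of topology information a distance-only linkage does not see.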
Affiliation(s)
- Kadim Taşdemir
- European Commission Joint Research Centre, Institute for Environment and Sustainability, Monitoring Agricultural Resources Unit, Ispra 21027, Italy.
11. Nie F, Zeng Z, Tsang IW, Xu D, Zhang C. Spectral embedded clustering: a framework for in-sample and out-of-sample spectral clustering. IEEE Trans Neural Netw 2011; 22:1796-808. [PMID: 21965198 DOI: 10.1109/tnn.2011.2162000] [Citation(s) in RCA: 187] [Impact Index Per Article: 13.4] [Indexed: 11/10/2022]
Abstract
Spectral clustering (SC) methods have been successfully applied to many real-world applications. The success of these SC methods is largely based on the manifold assumption, namely, that two nearby data points in the high-density region of a low-dimensional data manifold have the same cluster label. However, such an assumption might not always hold on high-dimensional data. When the data do not exhibit a clear low-dimensional manifold structure (e.g., high-dimensional and sparse data), the clustering performance of SC will be degraded and become even worse than K-means clustering. In this paper, motivated by the observation that the true cluster assignment matrix for high-dimensional data can always be embedded in a linear space spanned by the data, we propose the spectral embedded clustering (SEC) framework, in which a linearity regularization is explicitly added into the objective function of SC methods. More importantly, the proposed SEC framework can naturally deal with out-of-sample data. We also present a new Laplacian matrix constructed from a local regression of each pattern and incorporate it into our SEC framework to capture both local and global discriminative information for clustering. Comprehensive experiments on eight real-world high-dimensional datasets demonstrate the effectiveness and advantages of our SEC framework over existing SC methods and K-means-based clustering methods. Our SEC framework significantly outperforms SC using the Nyström algorithm on unseen data.
Affiliation(s)
- Feiping Nie
- University of Texas, Arlington, TX 76019, USA.
12. Zhang J, Wang X, Kruger U, Wang FY. Principal curve algorithms for partitioning high-dimensional data spaces. IEEE Trans Neural Netw 2011; 22:367-80. [PMID: 21193373 DOI: 10.1109/tnn.2010.2100408] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Abstract
Most partitioning algorithms iteratively partition a space into cells that contain underlying linear or nonlinear structures using linear partitioning strategies. The compactness of each cell depends on how well the (locally) linear partitioning strategy approximates the intrinsic structure. To partition a compact structure for complex data in a nonlinear context, this paper proposes a nonlinear partition strategy. This is a principal curve tree (PC-tree), which is implemented iteratively. Given that a PC passes through the middle of the data distribution, it allows for partitioning based on the arc length of the PC. To enhance the partitioning of a given space, a residual version of the PC-tree algorithm is developed, denoted here as the principal component analysis tree (PCR-tree) algorithm. Because of its residual property, the PCR-tree can yield the intrinsic dimension of high-dimensional data. Comparisons presented in this paper confirm that the proposed PC-tree and PCR-tree approaches show a better performance than several other competing partitioning algorithms in terms of vector quantization error and nearest neighbor search. The comparison also shows that the proposed algorithms outperform competing linear methods in total average coverage which measures the nonlinear compactness of partitioning algorithms.
Affiliation(s)
- Junping Zhang
- Shanghai Key Laboratory of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai 200433, China.