1
|
Montambault B, Appleby G, Rogers J, Brumar CD, Li M, Chang R. DimBridge: Interactive Explanation of Visual Patterns in Dimensionality Reductions with Predicate Logic. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:207-217. [PMID: 39312423 DOI: 10.1109/tvcg.2024.3456391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/25/2024]
Abstract
Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact with visual patterns in a projection and retrieve corresponding data patterns. DimBridge supports several interactions, allowing users to perform various analyses, from contrasting multiple clusters to explaining complex latent structures. Leveraging first-order predicate logic, DimBridge identifies subspaces in the original dimensions relevant to a queried pattern and provides an interface for users to visualize and interact with them. We demonstrate how DimBridge can help users overcome the challenges associated with interpreting visual patterns in projections.
Collapse
|
2
|
Wu L, Yuan L, Zhao G, Lin H, Li SZ. Deep Clustering and Visualization for End-to-End High-Dimensional Data Analysis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:8543-8554. [PMID: 35263258 DOI: 10.1109/tnnls.2022.3151498] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
High-dimensional data analysis for exploration and discovery includes two fundamental tasks: deep clustering and data visualization. When these two associated tasks are done separately, as is often the case thus far, disagreements can occur among the tasks in terms of geometry preservation. Namely, the clustering process is often accompanied by the corruption of the geometric structure, whereas visualization aims to preserve the data geometry for better interpretation. Therefore, how to achieve deep clustering and data visualization in an end-to-end unified framework is an important but challenging problem. In this article, we propose a novel neural network-based method, called deep clustering and visualization (DCV), to accomplish the two associated tasks end-to-end to resolve their disagreements. The DCV framework consists of two nonlinear dimensionality reduction (NLDR) transformations: 1) one from the input data space to latent feature space for clustering and 2) the other from the latent feature space to the final 2-D space for visualization. Importantly, the first NLDR transformation is mainly optimized by one Clustering Loss, allowing arbitrary corruption of the geometric structure for better clustering, while the second NLDR transformation is optimized by one Geometry-Preserving Loss to recover the corrupted geometry for better visualization. Extensive comparative results show that the DCV framework outperforms other leading clustering-visualization algorithms in terms of both quantitative evaluation metrics and qualitative visualization.
Collapse
|
3
|
Fujiwara T, Liu TP. Contrastive multiple correspondence analysis (cMCA): Using contrastive learning to identify latent subgroups in political parties. PLoS One 2023; 18:e0287180. [PMID: 37428735 PMCID: PMC10332614 DOI: 10.1371/journal.pone.0287180] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Accepted: 05/31/2023] [Indexed: 07/12/2023] Open
Abstract
Scaling methods have long been utilized to simplify and cluster high-dimensional data. However, the general latent spaces across all predefined groups derived from these methods sometimes do not fall into researchers' interest regarding specific patterns within groups. To tackle this issue, we adopt an emerging analysis approach called contrastive learning. We contribute to this growing field by extending its ideas to multiple correspondence analysis (MCA) in order to enable an analysis of data often encountered by social scientists-containing binary, ordinal, and nominal variables. We demonstrate the utility of contrastive MCA (cMCA) by analyzing two different surveys of voters in the U.S. and U.K. Our results suggest that, first, cMCA can identify substantively important dimensions and divisions among subgroups that are overlooked by traditional methods; second, for other cases, cMCA can derive latent traits that emphasize subgroups seen moderately in those derived by traditional methods.
Collapse
|
4
|
Eckelt K, Hinterreiter A, Adelberger P, Walchshofer C, Dhanoa V, Humer C, Heckmann M, Steinparz C, Streit M. Visual Exploration of Relationships and Structure in Low-Dimensional Embeddings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:3312-3326. [PMID: 35254984 DOI: 10.1109/tvcg.2022.3156760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
In this work, we propose an interactive visual approach for the exploration and formation of structural relationships in embeddings of high-dimensional data. These structural relationships, such as item sequences, associations of items with groups, and hierarchies between groups of items, are defining properties of many real-world datasets. Nevertheless, most existing methods for the visual exploration of embeddings treat these structures as second-class citizens or do not take them into account at all. In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which relationships between items and/or groups are visually highlighted. The original high-dimensional data for single items, groups of items, or differences between connected items and groups are accessible through additional summary visualizations. We carefully tailored these summary and difference visualizations to the various data types and semantic contexts. During their exploratory analysis, users can externalize their insights by setting up additional groups and relationships between items and/or groups. We demonstrate the utility and potential impact of our approach by means of two use cases and multiple examples from various domains.
Collapse
|
5
|
Gebrye H, Wang Y, Li F. Traffic data extraction and labeling for machine learning based attack detection in IoT networks. INT J MACH LEARN CYB 2023. [DOI: 10.1007/s13042-022-01765-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
|
6
|
Liu Q, Ren Y, Zhu Z, Li D, Ma X, Li Q. RankAxis: Towards a Systematic Combination of Projection and Ranking in Multi-Attribute Data Exploration. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:701-711. [PMID: 36155453 DOI: 10.1109/tvcg.2022.3209463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Projection and ranking are frequently used analysis techniques in multi-attribute data exploration. Both families of techniques help analysts with tasks such as identifying similarities between observations and determining ordered subgroups, and have shown good performances in multi-attribute data exploration. However, they often exhibit problems such as distorted projection layouts, obscure semantic interpretations, and non-intuitive effects produced by selecting a subset of (weighted) attributes. Moreover, few studies have attempted to combine projection and ranking into the same exploration space to complement each other's strengths and weaknesses. For this reason, we propose RankAxis, a visual analytics system that systematically combines projection and ranking to facilitate the mutual interpretation of these two techniques and jointly support multi-attribute data exploration. A real-world case study, expert feedback, and a user study demonstrate the efficacy of RankAxis.
Collapse
|
7
|
Li J, Zhou CQ. Incorporation of Human Knowledge into Data Embeddings to Improve Pattern Significance and Interpretability. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:723-733. [PMID: 36155441 DOI: 10.1109/tvcg.2022.3209382] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Embedding is a common technique for analyzing multi-dimensional data. However, the embedding projection cannot always form significant and interpretable visual structures that foreshadow underlying data patterns. We propose an approach that incorporates human knowledge into data embeddings to improve pattern significance and interpretability. The core idea is (1) externalizing tacit human knowledge as explicit sample labels and (2) adding a classification loss in the embedding network to encode samples' classes. The approach pulls samples of the same class with similar data features closer in the projection, leading to more compact (significant) and class-consistent (interpretable) visual structures. We give an embedding network with a customized classification loss to implement the idea and integrate the network into a visualization system to form a workflow that supports flexible class creation and pattern exploration. Patterns found on open datasets in case studies, subjects' performance in a user study, and quantitative experiment results illustrate the general usability and effectiveness of the approach.
Collapse
|
8
|
Ying L, Shu X, Deng D, Yang Y, Tang T, Yu L, Wu Y. MetaGlyph: Automatic Generation of Metaphoric Glyph-based Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:331-341. [PMID: 36179002 DOI: 10.1109/tvcg.2022.3209447] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Glyph-based visualization achieves an impressive graphic design when associated with comprehensive visual metaphors, which help audiences effectively grasp the conveyed information through revealing data semantics. However, creating such metaphoric glyph-based visualization (MGV) is not an easy task, as it requires not only a deep understanding of data but also professional design skills. This paper proposes MetaGlyph, an automatic system for generating MGVs from a spreadsheet. To develop MetaGlyph, we first conduct a qualitative analysis to understand the design of current MGVs from the perspectives of metaphor embodiment and glyph design. Based on the results, we introduce a novel framework for generating MGVs by metaphoric image selection and an MGV construction. Specifically, MetaGlyph automatically selects metaphors with corresponding images from online resources based on the input data semantics. We then integrate a Monte Carlo tree search algorithm that explores the design of an MGV by associating visual elements with data dimensions given the data importance, semantic relevance, and glyph non-overlap. The system also provides editing feedback that allows users to customize the MGVs according to their design preferences. We demonstrate the use of MetaGlyph through a set of examples, one usage scenario, and validate its effectiveness through a series of expert interviews.
Collapse
|
9
|
Xia J, Huang L, Lin W, Zhao X, Wu J, Chen Y, Zhao Y, Chen W. Interactive Visual Cluster Analysis by Contrastive Dimensionality Reduction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:734-744. [PMID: 36166528 DOI: 10.1109/tvcg.2022.3209423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
We propose a contrastive dimensionality reduction approach (CDR) for interactive visual cluster analysis. Although dimensionality reduction of high-dimensional data is widely used in visual cluster analysis in conjunction with scatterplots, there are several limitations on effective visual cluster analysis. First, it is non-trivial for an embedding to present clear visual cluster separation when keeping neighborhood structures. Second, as cluster analysis is a subjective task, user steering is required. However, it is also non-trivial to enable interactions in dimensionality reduction. To tackle these problems, we introduce contrastive learning into dimensionality reduction for high-quality embedding. We then redefine the gradient of the loss function to the negative pairs to enhance the visual cluster separation of embedding results. Based on the contrastive learning scheme, we employ link-based interactions to steer embeddings. After that, we implement a prototype visual interface that integrates the proposed algorithms and a set of visualizations. Quantitative experiments demonstrate that CDR outperforms existing techniques in terms of preserving correct neighborhood structures and improving visual cluster separation. The ablation experiment demonstrates the effectiveness of gradient redefinition. The user study verifies that CDR outperforms t-SNE and UMAP in the task of cluster identification. We also showcase two use cases on real-world datasets to present the effectiveness of link-based interactions.
Collapse
|
10
|
Yuan J, Chan GYY, Barr B, Overton K, Rees K, Nonato LG, Bertini E, Silva CT. SUBPLEX: A Visual Analytics Approach to Understand Local Model Explanations at the Subpopulation Level. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2022; 42:24-36. [PMID: 37015716 DOI: 10.1109/mcg.2022.3199727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Understanding the interpretation of machine learning (ML) models has been of paramount importance when making decisions with societal impacts, such as transport control, financial activities, and medical diagnosis. While local explanation techniques are popular methods to interpret ML models on a single instance, they do not scale to the understanding of a model's behavior on the whole dataset. In this article, we outline the challenges and needs of visually analyzing local explanations and propose SUBPLEX, a visual analytics approach to help users understand local explanations with subpopulation visual analysis. SUBPLEX provides steerable clustering and projection visualization techniques that allow users to derive interpretable subpopulations of local explanations with users' expertise. We evaluate our approach through two use cases and experts' feedback.
Collapse
|
11
|
Kaur R, Gabrijelčič D. Behavior segmentation of electricity consumption patterns: A cluster analytical approach. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
12
|
Xia J, Zhang Y, Song J, Chen Y, Wang Y, Liu S. Revisiting Dimensionality Reduction Techniques for Visual Cluster Analysis: An Empirical Study. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:529-539. [PMID: 34587015 DOI: 10.1109/tvcg.2021.3114694] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Dimensionality Reduction (DR) techniques can generate 2D projections and enable visual exploration of cluster structures of high-dimensional datasets. However, different DR techniques would yield various patterns, which significantly affect the performance of visual cluster analysis tasks. We present the results of a user study that investigates the influence of different DR techniques on visual cluster analysis. Our study focuses on the most concerned property types, namely the linearity and locality, and evaluates twelve representative DR techniques that cover the concerned properties. Four controlled experiments were conducted to evaluate how the DR techniques facilitate the tasks of 1) cluster identification, 2) membership identification, 3) distance comparison, and 4) density comparison, respectively. We also evaluated users' subjective preference of the DR techniques regarding the quality of projected clusters. The results show that: 1) Non-linear and Local techniques are preferred in cluster identification and membership identification; 2) Linear techniques perform better than non-linear techniques in density comparison; 3) UMAP (Uniform Manifold Approximation and Projection) and t-SNE (t-Distributed Stochastic Neighbor Embedding) perform the best in cluster identification and membership identification; 4) NMF (Nonnegative Matrix Factorization) has competitive performance in distance comparison; 5) t-SNLE (t-Distributed Stochastic Neighbor Linear Embedding) has competitive performance in density comparison.
Collapse
|
13
|
Fujiwara T, Wei X, Zhao J, Ma KL. Interactive Dimensionality Reduction for Comparative Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:758-768. [PMID: 34591765 DOI: 10.1109/tvcg.2021.3114807] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Finding the similarities and differences between groups of datasets is a fundamental analysis task. For high-dimensional data, dimensionality reduction (DR) methods are often used to find the characteristics of each group. However, existing DR methods provide limited capability and flexibility for such comparative analysis as each method is designed only for a narrow analysis target, such as identifying factors that most differentiate groups. This paper presents an interactive DR framework where we integrate our new DR method, called ULCA (unified linear comparative analysis), with an interactive visual interface. ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks. To provide flexibility for comparative analysis, we develop an optimization algorithm that enables analysts to interactively refine ULCA results. Additionally, the interactive visualization interface facilitates interpretation and refinement of the ULCA results. We evaluate ULCA and the optimization algorithm to show their efficiency as well as present multiple case studies using real-world datasets to demonstrate the usefulness of this framework.
Collapse
|
14
|
IXVC: An interactive pipeline for explaining visual clusters in dimensionality reduction visualizations with decision trees. ARRAY 2021. [DOI: 10.1016/j.array.2021.100080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
|
15
|
Xia J, Lin W, Jiang G, Wang Y, Chen W, Schreck T. Visual Clustering Factors in Scatterplots. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2021; 41:79-89. [PMID: 34310292 DOI: 10.1109/mcg.2021.3098804] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Cluster analysis is an important technique in data analysis. However, there is no encompassing theory on scatterplots to evaluate clustering. Human visual perception is regarded as a gold standard to evaluate clustering. The cluster analysis based on human visual perception requires the participation of many probands, to obtain diverse data, and hence is a challenge to do. We contribute an empirical and data-driven study on human perception for visual clustering of large scatterplot data. First, we systematically construct and label a large, publicly available scatterplot dataset. Second, we carry out a qualitative analysis based on the dataset and summarize the influence of visual factors on clustering perception. Third, we use the labeled datasets to train a deep neural network for modeling human visual clustering perception. Our experiments show that the data-driven model successfully models the human visual perception, and outperforms conventional clustering algorithms in synthetic and real datasets.
Collapse
|
16
|
SemanticAxis: exploring multi-attribute data by semantic construction and ranking analysis. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-020-00733-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
17
|
Natsukawa H, Deyle ER, Pao GM, Koyamada K, Sugihara G. A Visual Analytics Approach for Ecosystem Dynamics based on Empirical Dynamic Modeling. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:506-516. [PMID: 33026998 DOI: 10.1109/tvcg.2020.3028956] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
An important approach for scientific inquiry across many disciplines involves using observational time series data to understand the relationships between key variables to gain mechanistic insights into the underlying rules that govern the given system. In real systems, such as those found in ecology, the relationships between time series variables are generally not static; instead, these relationships are dynamical and change in a nonlinear or state-dependent manner. To further understand such systems, we investigate integrating methods that appropriately characterize these dynamics (i.e., methods that measure interactions as they change with time-varying system states) with visualization techniques that can help analyze the behavior of the system. Here, we focus on empirical dynamic modeling (EDM) as a state-of-the-art method that specifically identifies causal variables and measures changing state-dependent relationships between time series variables. Instead of using approaches centered on parametric equations, EDM is an equation-free approach that studies systems based on their dynamic attractors. We propose a visual analytics system to support the identification and mechanistic interpretation of system states using an EDM-constructed dynamic graph. This work, as detailed in four analysis tasks and demonstrated with a GUI, provides a novel synthesis of EDM and visualization techniques such as brush-link visualization and visual summarization to interpret dynamic graphs representing ecosystem dynamics. We applied our proposed system to ecological simulation data and real data from a marine mesocosm study as two key use cases. Our case studies show that our visual analytics tools support the identification and interpretation of the system state by the user, and enable us to discover both confirmatory and new findings in ecosystem dynamics. Overall, we demonstrated that our system can facilitate an understanding of how systems function beyond the intuitive analysis of high-dimensional information based on specific domain knowledge.
Collapse
|
18
|
Reipschlager P, Flemisch T, Dachselt R. Personal Augmented Reality for Information Visualization on Large Interactive Displays. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1182-1192. [PMID: 33052863 DOI: 10.1109/tvcg.2020.3030460] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
In this work we propose the combination of large interactive displays with personal head-mounted Augmented Reality (AR) for information visualization to facilitate data exploration and analysis. Even though large displays provide more display space, they are challenging with regard to perception, effective multi-user support, and managing data density and complexity. To address these issues and illustrate our proposed setup, we contribute an extensive design space comprising first, the spatial alignment of display, visualizations, and objects in AR space. Next, we discuss which parts of a visualization can be augmented. Finally, we analyze how AR can be used to display personal views in order to show additional information and to minimize the mutual disturbance of data analysts. Based on this conceptual foundation, we present a number of exemplary techniques for extending visualizations with AR and discuss their relation to our design space. We further describe how these techniques address typical visualization problems that we have identified during our literature research. To examine our concepts, we introduce a generic AR visualization framework as well as a prototype implementing several example techniques. In order to demonstrate their potential, we further present a use case walkthrough in which we analyze a movie data set. From these experiences, we conclude that the contributed techniques can be useful in exploring and understanding multivariate data. We are convinced that the extension of large displays with AR for information visualization has a great potential for data analysis and sense-making.
Collapse
|
19
|
Wenskovitch J, North C. An Examination of Grouping and Spatial Organization Tasks for High-Dimensional Data Exploration. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1742-1752. [PMID: 33031038 DOI: 10.1109/tvcg.2020.3028890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
How do analysts think about grouping and spatial operations? This overarching research question incorporates a number of points for investigation, including understanding how analysts begin to explore a dataset, the types of grouping/spatial structures created and the operations performed on them, the relationship between grouping and spatial structures, the decisions analysts make when exploring individual observations, and the role of external information. This work contributes the design and results of such a study, in which a group of participants are asked to organize the data contained within an unfamiliar quantitative dataset. We identify several overarching approaches taken by participants to design their organizational space, discuss the interactions performed by the participants, and propose design recommendations to improve the usability of future high-dimensional data exploration tools that make use of grouping (clustering) and spatial (dimension reduction) operations.
Collapse
|
20
|
Ma Y, Maciejewski R. Visual Analysis of Class Separations With Locally Linear Segments. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:241-253. [PMID: 32746282 DOI: 10.1109/tvcg.2020.3011155] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
High-dimensional labeled data widely exists in many real-world applications such as classification and clustering. One main task in analyzing such datasets is to explore class separations and class boundaries derived from machine learning models. Dimension reduction techniques are commonly applied to support analysts in exploring the underlying decision boundary structures by depicting a low-dimensional representation of the data distributions from multiple classes. However, such projection-based analyses are limited due to their lack of ability to show separations in complex non-linear decision boundary structures and can suffer from heavy distortion and low interpretability. To overcome these issues of separability and interpretability, we propose a visual analysis approach that utilizes the power of explainability from linear projections to support analysts when exploring non-linear separation structures. Our approach is to extract a set of locally linear segments that approximate the original non-linear separations. Unlike traditional projection-based analysis where the data instances are mapped to a single scatterplot, our approach supports the exploration of complex class separations through multiple local projection results. We conduct case studies on two labeled datasets to demonstrate the effectiveness of our approach.
Collapse
|
21
|
Multi-assignment clustering: Machine learning from a biological perspective. J Biotechnol 2020; 326:1-10. [PMID: 33285150 DOI: 10.1016/j.jbiotec.2020.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2020] [Accepted: 12/03/2020] [Indexed: 11/21/2022]
Abstract
A common approach for analyzing large-scale molecular data is to cluster objects sharing similar characteristics. This assumes that genes with highly similar expression profiles are likely participating in a common molecular process. Biological systems are extremely complex and challenging to understand, with proteins having multiple functions that sometimes need to be activated or expressed in a time-dependent manner. Thus, the strategies applied for clustering of these molecules into groups are of key importance for translation of data to biologically interpretable findings. Here we implemented a multi-assignment clustering (MAsC) approach that allows molecules to be assigned to multiple clusters, rather than single ones as in commonly used clustering techniques. When applied to high-throughput transcriptomics data, MAsC increased power of the downstream pathway analysis and allowed identification of pathways with high biological relevance to the experimental setting and the biological systems studied. Multi-assignment clustering also reduced noise in the clustering partition by excluding genes with a low correlation to all of the resulting clusters. Together, these findings suggest that our methodology facilitates translation of large-scale molecular data into biological knowledge. The method is made available as an R package on GitLab (https://gitlab.com/wolftower/masc).
Collapse
|
22
|
Yu S, Yang D, Hao Y, Lian M, Zang Y. Visual Analysis of Merchandise Sales Trend Based on Online Transaction Log. INT J PATTERN RECOGN 2020. [DOI: 10.1142/s0218001420590363] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Online transaction log records the relevant information of the users, commodities and transactions, as well as changes over time, which can help analysts understand commodities’ sales. The existing visualization methods mainly analyze the purchase behavior from the perspective of users, while analyzing the sales trend of commodities can better help merchants to make business decisions. Based on the transaction log, this paper puts forward the visual analysis framework of commodity sales trend and the corresponding data processing algorithm. The concepts of volatility and dynamic performance of sales trend are proposed, through which the multi-dimensional sales data of time-oriented are displayed in two-dimensional space. The “Feature Ring” is designed to display the detailed sales information of the products. Based on the above methods, a visual analysis system is designed and implemented. The usability and validity of the visualization methods are verified by using JD online transaction data. The visualization methods enable manufacturers to formulate production plans and carry out product research and develop better.
Collapse
Affiliation(s)
- Shidong Yu
- Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Department of Electrical Engineering, Yingkou Institute of Technology, Yingkou 115014, P. R. China
| | - Dongsheng Yang
- Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, P. R. China
| | - Ying Hao
- Department of Electrical Engineering, Yingkou Institute of Technology, Yingkou 115014, P. R. China
- Department of Information Science, Dalian Maritime University, Dalian 116026, P. R. China
| | - Mengjia Lian
- Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| | - Ying Zang
- Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, P. R. China
- University of Chinese Academy of Sciences, Beijing 100049, P. R. China
| |
Collapse
|
23
|
Visual Analytics for Dimension Reduction and Cluster Analysis of High Dimensional Electronic Health Records. INFORMATICS 2020. [DOI: 10.3390/informatics7020017] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
Recent advancement in EHR-based (Electronic Health Record) systems has resulted in producing data at an unprecedented rate. The complex, growing, and high-dimensional data available in EHRs creates great opportunities for machine learning techniques such as clustering. Cluster analysis often requires dimension reduction to achieve efficient processing time and mitigate the curse of dimensionality. Given a wide range of techniques for dimension reduction and cluster analysis, it is not straightforward to identify which combination of techniques from both families leads to the desired result. The ability to derive useful and precise insights from EHRs requires a deeper understanding of the data, intermediary results, configuration parameters, and analysis processes. Although these tasks are often tackled separately in existing studies, we present a visual analytics (VA) system, called Visual Analytics for Cluster Analysis and Dimension Reduction of High Dimensional Electronic Health Records (VALENCIA), to address the challenges of high-dimensional EHRs in a single system. VALENCIA brings a wide range of cluster analysis and dimension reduction techniques, integrate them seamlessly, and make them accessible to users through interactive visualizations. It offers a balanced distribution of processing load between users and the system to facilitate the performance of high-level cognitive tasks in such a way that would be difficult without the aid of a VA system. Through a real case study, we have demonstrated how VALENCIA can be used to analyze the healthcare administrative dataset stored at ICES. This research also highlights what needs to be considered in the future when developing VA systems that are designed to derive deep and novel insights into EHRs.
Collapse
|
24
|
Kammer D, Keck M, Grunder T, Maasch A, Thom T, Kleinsteuber M, Groh R. Glyphboard: Visual Exploration of High-Dimensional Data Combining Glyphs with Dimensionality Reduction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1661-1671. [PMID: 31985425 DOI: 10.1109/tvcg.2020.2969060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Rigorous data science is interdisciplinary at its core. In order to make sense of high-dimensional data, data scientists need to enter into a dialogue with domain experts. We present Glyphboard, a visualization tool that aims to support this dialogue. Glyphboard is a zoomable user interface that combines well-known methods such as dimensionality reduction and glyph-based visualizations in a novel, seamless, and integrated tool. While the dimensionality reduction affords a quick overview over the data, glyph-based visualizations are able to show the most relevant dimensions in the data set at one glance. We contribute an open-source prototype of Glyphboard, a general exchange format for high-dimensional data, and a case study with nine data scientists and domain experts from four exemplary domains in order to evaluate how the different visualization and interaction features of Glyphboard are used.
Collapse
|
25
|
Fujiwara T, Kwon OH, Ma KL. Supporting Analysis of Dimensionality Reduction Results with Contrastive Learning. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:45-55. [PMID: 31425080 DOI: 10.1109/tvcg.2019.2934251] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Dimensionality reduction (DR) is frequently used for analyzing and visualizing high-dimensional data as it provides a good first glance of the data. However, to interpret the DR result for gaining useful insights from the data, it would take additional analysis effort such as identifying clusters and understanding their characteristics. While there are many automatic methods (e.g., density-based clustering methods) to identify clusters, effective methods for understanding a cluster's characteristics are still lacking. A cluster can be mostly characterized by its distribution of feature values. Reviewing the original feature values is not a straightforward task when the number of features is large. To address this challenge, we present a visual analytics method that effectively highlights the essential features of a cluster in a DR result. To extract the essential features, we introduce an enhanced usage of contrastive principal component analysis (cPCA). Our method, called ccPCA (contrasting clusters in PCA), can calculate each feature's relative contribution to the contrast between one cluster and other clusters. With ccPCA, we have created an interactive system including a scalable visualization of clusters' feature contributions. We demonstrate the effectiveness of our method and system with case studies using several publicly available datasets.
Collapse
|
26
|
Saket B, Huron S, Perin C, Endert A. Investigating Direct Manipulation of Graphical Encodings as a Method for User Interaction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:482-491. [PMID: 31442983 DOI: 10.1109/tvcg.2019.2934534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We investigate direct manipulation of graphical encodings as a method for interacting with visualizations. There is an increasing interest in developing visualization tools that enable users to perform operations by directly manipulating graphical encodings rather than external widgets such as checkboxes and sliders. Designers of such tools must decide which direct manipulation operations should be supported, and identify how each operation can be invoked. However, we lack empirical guidelines for how people convey their intended operations using direct manipulation of graphical encodings. We address this issue by conducting a qualitative study that examines how participants perform 15 operations using direct manipulation of standard graphical encodings. From this study, we 1) identify a list of strategies people employ to perform each operation, 2) observe commonalities in strategies across operations, and 3) derive implications to help designers leverage direct manipulation of graphical encoding as a method for user interaction.
Collapse
|
27
|
AirInsight: Visual Exploration and Interpretation of Latent Patterns and Anomalies in Air Quality Data. SUSTAINABILITY 2019. [DOI: 10.3390/su11102944] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Nowadays, huge volume of air quality data provides unprecedented opportunities for analyzing pollution. However, due to the high complexity, most traditional analytical methods focus on abstracting data, so these techniques discard the original structure and limit the understanding of the results. Visual analysis is a powerful technique for exploring unknown patterns since it retains the details of the original data and gives visual feedback to users. In this paper, we focus on air quality data and propose the AirInsight design, an interactive visual analytic system for recognizing, exploring, and summarizing regular patterns, as well as detecting, classifying, and interpreting abnormal cases. Based on the time-varying and multivariate features of air quality data, a dimension reduction method Composite Least Square Projection (CLSP) is proposed, which allows appreciating and interpreting the data patterns in the context of attributes. On the basis of the observed regular patterns, multiple abnormal cases are further detected, including the multivariate anomalies by the proposed Noise Hierarchical Clustering (NHC) method, abruptly changing timestamps by Time diversity (TD) indicator, and cities with unique patterns by the Geographical Surprise (GS) measure. Moreover, we combine TD and GS to group anomalies based on their underlying spatiotemporal correlations. AirInsight includes multiple coordinated views and rich interactive functions to provide contextual information from different aspects and facilitate a comprehensive understanding. In particular, a pair of glyphs are designed that provide a visual representation of the temporal variation in air quality conditions for a user-selected city. Experiments show that CLSP improves the accuracy of Least Square Projection (LSP) and that NHC has the ability to separate noises. Meanwhile, several case studies and task-based user evaluation demonstrate that our system is effective and practical for exploring and interpreting multivariate spatiotemporal patterns and anomalies in air quality data.
Collapse
|
28
|
Long-Term Regional Environmental Risk Assessment and Future Scenario Projection at Ningbo, China Coupling the Impact of Sea Level Rise. SUSTAINABILITY 2019. [DOI: 10.3390/su11061560] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Regional environmental risk (RER) denotes potential threats to the natural environment, human health and socioeconomic development caused by specific risks. It is valuable to assess long-term RER in coastal areas with the increasing effects of global change. We proposed a new approach to assess coastal RER considering spatial factors using principal component analysis (PCA) and used a future land use simulation (FLUS) model to project future RER scenarios considering the impact of sea level rise (SLR). In our study, the RER status was classified in five levels as highest, high, medium, low and lowest. We evaluated the 30 m × 30 m gridded spatial pattern of the long-term RER at Ningbo of China by assessing its 1975–2015 history and projecting this to 2020–2050. Our results show that RER at Ningbo has increased substantially over the past 40 years and will slowly increase over the next 35 years. Ningbo’s city center and district centers are exposed to medium-to-highest RER, while the suburban areas are exposed to lowest-to-medium lower RER. Storm surges will lead to strong RER increases along the Ningbo coast, with the low-lying northern coast being more affected than the mountainous southern coast. RER at Ningbo is affected principally by the combined effects of increased human activity, rapid population growth, rapid industrialization, and unprecedented urbanization. This study provides early warnings to support practical regulation for disaster mitigation and environmental protection.
Collapse
|
29
|
Exploring linear projections for revealing clusters, outliers, and trends in subsets of multi-dimensional datasets. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2018.08.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
30
|
Self JZ, Dowling M, Wenskovitch J, Crandell I, Wang M, House L, Leman S, North C. Observation-Level and Parametric Interaction for High-Dimensional Data Analysis. ACM T INTERACT INTEL 2018. [DOI: 10.1145/3158230] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Abstract
Exploring high-dimensional data is challenging. Dimension reduction algorithms, such as weighted multidimensional scaling, support data exploration by projecting datasets to two dimensions for visualization. These projections can be explored through parametric interaction, tweaking underlying parameterizations, and observation-level interaction, directly interacting with the points within the projection. In this article, we present the results of a controlled usability study determining the differences, advantages, and drawbacks among parametric interaction, observation-level interaction, and their combination. The study assesses both interaction technique effects on domain-specific high-dimensional data analyses performed by non-experts of statistical algorithms. This study is performed using Andromeda, a tool that enables both parametric and observation-level interaction to provide in-depth data exploration. The results indicate that the two forms of interaction serve different, but complementary, purposes in gaining insight through steerable dimension reduction algorithms.
Collapse
|