1
|
Montambault B, Appleby G, Rogers J, Brumar CD, Li M, Chang R. DimBridge: Interactive Explanation of Visual Patterns in Dimensionality Reductions with Predicate Logic. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:207-217. [PMID: 39312423 DOI: 10.1109/tvcg.2024.3456391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/25/2024]
Abstract
Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact with visual patterns in a projection and retrieve corresponding data patterns. DimBridge supports several interactions, allowing users to perform various analyses, from contrasting multiple clusters to explaining complex latent structures. Leveraging first-order predicate logic, DimBridge identifies subspaces in the original dimensions relevant to a queried pattern and provides an interface for users to visualize and interact with them. We demonstrate how DimBridge can help users overcome the challenges associated with interpreting visual patterns in projections.
Collapse
|
2
|
Zhao J, Liu X, Tang H, Wang X, Yang S, Liu D, Chen Y, Chen YV. Mesoscopic structure graphs for interpreting uncertainty in non-linear embeddings. Comput Biol Med 2024; 182:109105. [PMID: 39265479 DOI: 10.1016/j.compbiomed.2024.109105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2024] [Revised: 07/06/2024] [Accepted: 09/01/2024] [Indexed: 09/14/2024]
Abstract
Probabilistic-based non-linear dimensionality reduction (PB-NL-DR) methods, such as t-SNE and UMAP, are effective in unfolding complex high-dimensional manifolds, allowing users to explore and understand the structural patterns of data. However, due to the trade-off between global and local structure preservation and the randomness during computation, these methods may introduce false neighborhood relationships, known as distortion errors and misleading visualizations. To address this issue, we first conduct a detailed survey to illustrate the design space of prior layout enrichment visualizations for interpreting DR results, and then propose a node-link visualization technique, ManiGraph. This technique rethinks the neighborhood fidelity between the high- and low-dimensional spaces by constructing dynamic mesoscopic structure graphs and measuring region-adapted trustworthiness. ManiGraph also addresses the overplotting issue in scatterplot visualization for large-scale datasets and supports examining in unsupervised scenarios. We demonstrate the effectiveness of ManiGraph in different analytical cases, including generic machine learning using 3D toy data illustrations and fashion-MNIST, a computational biology study using a single-cell RNA sequencing dataset, and a deep learning-enabled colorectal cancer study with histopathology-MNIST.
Collapse
Affiliation(s)
- Junhan Zhao
- Harvard Medical School, Boston, 02114, MA, USA; Harvard T.H.Chan School of Public Health, Boston, 02114, MA, USA; Purdue University, West Lafayette, 47907, IN, USA.
| | - Xiang Liu
- Purdue University, West Lafayette, 47907, IN, USA; Indiana University School of Medicine, Indianapolis, 46202, IN, USA.
| | - Hongping Tang
- Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, 518048, China.
| | - Xiyue Wang
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
| | - Sen Yang
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
| | - Donfang Liu
- Rochester Institute of Technology, Rochester, 14623, NY, USA.
| | - Yijiang Chen
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
| | | |
Collapse
|
3
|
Eckelt K, Hinterreiter A, Adelberger P, Walchshofer C, Dhanoa V, Humer C, Heckmann M, Steinparz C, Streit M. Visual Exploration of Relationships and Structure in Low-Dimensional Embeddings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:3312-3326. [PMID: 35254984 DOI: 10.1109/tvcg.2022.3156760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
In this work, we propose an interactive visual approach for the exploration and formation of structural relationships in embeddings of high-dimensional data. These structural relationships, such as item sequences, associations of items with groups, and hierarchies between groups of items, are defining properties of many real-world datasets. Nevertheless, most existing methods for the visual exploration of embeddings treat these structures as second-class citizens or do not take them into account at all. In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which relationships between items and/or groups are visually highlighted. The original high-dimensional data for single items, groups of items, or differences between connected items and groups are accessible through additional summary visualizations. We carefully tailored these summary and difference visualizations to the various data types and semantic contexts. During their exploratory analysis, users can externalize their insights by setting up additional groups and relationships between items and/or groups. We demonstrate the utility and potential impact of our approach by means of two use cases and multiple examples from various domains.
Collapse
|
4
|
Espadoto M, Appleby G, Suh A, Cashman D, Li M, Scheidegger C, Anderson EW, Chang R, Telea AC. UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:1559-1572. [PMID: 34748493 DOI: 10.1109/tvcg.2021.3125576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of multi-dimensional spaces on a 2D screen. Although many such methods exist, comparably little work has been done on generalizable methods of inverse-projection - the process of mapping the projected points, or more generally, the projection space back to the original high-dimensional space. In this article we present NNInv, a deep learning technique with the ability to approximate the inverse of any projection or mapping. NNInv learns to reconstruct high-dimensional data from any arbitrary point on a 2D projection space, giving users the ability to interact with the learned high-dimensional representation in a visual analytics system. We provide an analysis of the parameter space of NNInv, and offer guidance in selecting these parameters. We extend validation of the effectiveness of NNInv through a series of quantitative and qualitative analyses. We then demonstrate the method's utility by applying it to three visualization tasks: interactive instance interpolation, classifier agreement, and gradient visualization.
Collapse
|
5
|
Bibal A, Delchevalerie V, Frénay B. DT-SNE: t-SNE Discrete Visualizations as Decision Tree Structures. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
|
6
|
iHELP: interactive hierarchical linear projections for interpreting non-linear projections. J Vis (Tokyo) 2022. [DOI: 10.1007/s12650-022-00900-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
7
|
Heimerl F, Kralj C, Moller T, Gleicher M. embComp: Visual Interactive Comparison of Vector Embeddings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2953-2969. [PMID: 33347410 DOI: 10.1109/tvcg.2020.3045918] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
This article introduces embComp, a novel approach for comparing two embeddings that capture the similarity between objects, such as word and document embeddings. We survey scenarios where comparing these embedding spaces is useful. From those scenarios, we derive common tasks, introduce visual analysis methods that support these tasks, and combine them into a comprehensive system. One of embComp's central features are overview visualizations that are based on metrics for measuring differences in the local structure around objects. Summarizing these local metrics over the embeddings provides global overviews of similarities and differences. Detail views allow comparison of the local structure around selected objects and relating this local information to the global views. Integrating and connecting all of these components, embComp supports a range of analysis workflows that help understand similarities and differences between embedding spaces. We assess our approach by applying it in several use cases, including understanding corpora differences via word vector embeddings, and understanding algorithmic differences in generating embeddings.
Collapse
|
8
|
Ghosh A, Nashaat M, Miller J, Quader S. VisExPreS: A Visual Interactive Toolkit for User-Driven Evaluations of Embeddings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2791-2807. [PMID: 33211658 DOI: 10.1109/tvcg.2020.3039106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Although popularly used in big-data analytics, dimensionality reduction is a complex, black-box technique whose outcome is difficult to interpret and evaluate. In recent years, a number of quantitative and visual methods have been proposed for analyzing low-dimensional embeddings. On the one hand, quantitative methods associate numeric identifiers to qualitative characteristics of these embeddings; and, on the other hand, visual techniques allow users to interactively explore these embeddings and make decisions. However, in the former case, users do not have control over the analysis, while in the latter case. assessment decisions are entirely dependent on the user's perception and expertise. In order to bridge the gap between the two, in this article, we present VisExPreS, a visual interactive toolkit that enables a user-driven assessment of low-dimensional embeddings. VisExPreS is based on three novel techniques namely PG-LAPS, PG-GAPS, and RepSubset, that generate interpretable explanations of the preserved local and global structures in embeddings. In the first two techniques, the VisExPreS system proactively guides users during every step of the analysis. We demonstrate the utility of VisExPreS in interpreting, analyzing, and evaluating embeddings from different dimensionality reduction algorithms using multiple case studies and an extensive user study.
Collapse
|
9
|
Liu Z, Wang Y, Bernard J, Munzner T. Visualizing Graph Neural Networks With CorGIE: Corresponding a Graph to Its Embedding. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2500-2516. [PMID: 35120005 DOI: 10.1109/tvcg.2022.3148197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
Graph neural networks (GNNs) are a class of powerful machine learning tools that model node relations for making predictions of nodes or links. GNN developers rely on quantitative metrics of the predictions to evaluate a GNN, but similar to many other neural networks, it is difficult for them to understand if the GNN truly learns characteristics of a graph as expected. We propose an approach to corresponding an input graph to its node embedding (aka latent space), a common component of GNNs that is later used for prediction. We abstract the data and tasks, and develop an interactive multi-view interface called CorGIE to instantiate the abstraction. As the key function in CorGIE, we propose the K-hop graph layout to show topological neighbors in hops and their clustering structure. To evaluate the functionality and usability of CorGIE, we present how to use CorGIE in two usage scenarios, and conduct a case study with five GNN experts. Availability: Open-source code at https://github.com/zipengliu/corgie-ui/, supplemental materials & video at https://osf.io/tr3sb/.
Collapse
|
10
|
Sohns JT, Schmitt M, Jirasek F, Hasse H, Leitte H. Attribute-based Explanation of Non-Linear Embeddings of High-Dimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:540-550. [PMID: 34587086 DOI: 10.1109/tvcg.2021.3114870] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Embeddings of high-dimensional data are widely used to explore data, to verify analysis results, and to communicate information. Their explanation, in particular with respect to the input attributes, is often difficult. With linear projects like PCA the axes can still be annotated meaningfully. With non-linear projections this is no longer possible and alternative strategies such as attribute-based color coding are required. In this paper, we review existing augmentation techniques and discuss their limitations. We present the Non-Linear Embeddings Surveyor (NoLiES) that combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting. Rangesets use a set-based visualization approach for binned attribute values that enable the user to quickly observe structure and detect outliers. We detail the link between algebraic topology and rangesets and demonstrate the utility of NoLiES in case studies with various challenges (complex attribute value distribution, many attributes, many data points) and a real-world application to understand latent features of matrix completion in thermodynamics.
Collapse
|
11
|
Construct boundaries and place labels for multi-class scatterplots. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-021-00791-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
12
|
SemanticAxis: exploring multi-attribute data by semantic construction and ranking analysis. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-020-00733-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
13
|
Bian R, Xue Y, Zhou L, Zhang J, Chen B, Weiskopf D, Wang Y. Implicit Multidimensional Projection of Local Subspaces. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1558-1568. [PMID: 33048698 DOI: 10.1109/tvcg.2020.3030368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
We propose a visualization method to understand the effect of multidimensional projection on local subspaces, using implicit function differentiation. Here, we understand the local subspace as the multidimensional local neighborhood of data points. Existing methods focus on the projection of multidimensional data points, and the neighborhood information is ignored. Our method is able to analyze the shape and directional information of the local subspace to gain more insights into the global structure of the data through the perception of local structures. Local subspaces are fitted by multidimensional ellipses that are spanned by basis vectors. An accurate and efficient vector transformation method is proposed based on analytical differentiation of multidimensional projections formulated as implicit functions. The results are visualized as glyphs and analyzed using a full set of specifically-designed interactions supported in our efficient web-based visualization tool. The usefulness of our method is demonstrated using various multi- and high-dimensional benchmark datasets. Our implicit differentiation vector transformation is evaluated through numerical comparisons; the overall method is evaluated through exploration examples and use cases.
Collapse
|
14
|
Kang B, García García D, Lijffijt J, Santos-Rodríguez R, De Bie T. Conditional t-SNE: more informative t-SNE embeddings. Mach Learn 2020; 110:2905-2940. [PMID: 34840420 PMCID: PMC8599264 DOI: 10.1007/s10994-020-05917-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2020] [Revised: 07/28/2020] [Accepted: 09/19/2020] [Indexed: 11/25/2022]
Abstract
Dimensionality reduction and manifold learning methods such as t-distributed stochastic neighbor embedding (t-SNE) are frequently used to map high-dimensional data into a two-dimensional space to visualize and explore that data. Going beyond the specifics of t-SNE, there are two substantial limitations of any such approach: (1) not all information can be captured in a single two-dimensional embedding, and (2) to well-informed users, the salient structure of such an embedding is often already known, preventing that any real new insights can be obtained. Currently, it is not known how to extract the remaining information in a similarly effective manner. We introduce conditional t-SNE (ct-SNE), a generalization of t-SNE that discounts prior information in the form of labels. This enables obtaining more informative and more relevant embeddings. To achieve this, we propose a conditioned version of the t-SNE objective, obtaining an elegant method with a single integrated objective. We show how to efficiently optimize the objective and study the effects of the extra parameter that ct-SNE has over t-SNE. Qualitative and quantitative empirical results on synthetic and real data show ct-SNE is scalable, effective, and achieves its goal: it allows complementary structure to be captured in the embedding and provided new insights into real data.
Collapse
Affiliation(s)
- Bo Kang
- Department of Electronics and Information Systems, IDLab, Ghent University, Ghent, Belgium
| | | | - Jefrey Lijffijt
- Department of Electronics and Information Systems, IDLab, Ghent University, Ghent, Belgium
| | | | - Tijl De Bie
- Department of Electronics and Information Systems, IDLab, Ghent University, Ghent, Belgium
| |
Collapse
|
15
|
Chatzimparmpas A, Martins RM, Kerren A. t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:2696-2714. [PMID: 32305922 DOI: 10.1109/tvcg.2020.2986996] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this article, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study where the tool's effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and making its results better understandable.
Collapse
|
16
|
Han Q, Thom D, John M, Koch S, Heimerl F, Ertl T. Visual Quality Guidance for Document Exploration with Focus+Context Techniques. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:2715-2731. [PMID: 30676964 DOI: 10.1109/tvcg.2019.2895073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Magic lens based focus+context techniques are powerful means for exploring document spatializations. Typically, they only offer additional summarized or abstracted views on focused documents. As a consequence, users might miss important information that is either not shown in aggregated form or that never happens to get focused. In this work, we present the design process and user study results for improving a magic lens based document exploration approach with exemplary visual quality cues to guide users in steering the exploration and support them in interpreting the summarization results. We contribute a thorough analysis of potential sources of information loss involved in these techniques, which include the visual spatialization of text documents, user-steered exploration, and the visual summarization. With lessons learned from previous research, we highlight the various ways those information losses could hamper the exploration. Furthermore, we formally define measures for the aforementioned different types of information losses and bias. Finally, we present the visual cues to depict these quality measures that are seamlessly integrated into the exploration approach. These visual cues guide users during the exploration and reduce the risk of misinterpretation and accelerate insight generation. We conclude with the results of a controlled user study and discuss the benefits and challenges of integrating quality guidance in exploration techniques.
Collapse
|
17
|
Lespinats S, De Clerck O, Colange B, Gorelova V, Grando D, Maréchal E, Van Der Straeten D, Rébeillé F, Bastien O. Phylogeny and Sequence Space: A Combined Approach to Analyze the Evolutionary Trajectories of Homologous Proteins. The Case Study of Aminodeoxychorismate Synthase. Acta Biotheor 2020; 68:139-156. [PMID: 31312977 DOI: 10.1007/s10441-019-09352-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2019] [Accepted: 07/10/2019] [Indexed: 11/27/2022]
Abstract
During the course of evolution, variations of a protein sequence is an ongoing phenomenon however limited by the need to maintain its structural and functional integrity. Deciphering the evolutionary path of a protein is thus of fundamental interest. With the development of new methods to visualize high dimension spaces and the improvement of phylogenetic analysis tools, it is possible to study the evolutionary trajectories of proteins in the sequence space. Using the data-driven high-dimensional scaling method, we show that it is possible to predict and represent potential evolutionary trajectories by representing phylogenetic trees into a 3D projection of the sequence space. With the case of the aminodeoxychorismate synthase, an enzyme involved in folate synthesis, we show that this representation raises interesting questions about the complexity of the evolution of a given biological function, in particular concerning its capacity to explore the sequence space.
Collapse
Affiliation(s)
| | - Olivier De Clerck
- Department of Biology, Phycology Research Group, Ghent University, Krijgslaan 281, 9000, Ghent, Belgium
| | - Benoît Colange
- Univ. Grenoble Alpes, INES, 73375, Le Bourget du Lac, France
| | - Vera Gorelova
- Department of Biology, Laboratory of Functional Plant Biology, Ghent University, K.L Ledeganckstraat 35, 9000, Ghent, Belgium
- Department of Botany and Plant Biology, Laboratory of Plant Biochemistry and Physiology, University of Geneva, Quai E. Ansermet 30, 1211, Geneva, Switzerland
| | - Delphine Grando
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
| | - Eric Maréchal
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
| | - Dominique Van Der Straeten
- Department of Biology, Laboratory of Functional Plant Biology, Ghent University, K.L Ledeganckstraat 35, 9000, Ghent, Belgium
| | - Fabrice Rébeillé
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France
| | - Olivier Bastien
- Univ. Grenoble Alpes, CEA, CNRS, INRA, BIG-LPCV, 38000, Grenoble, France.
- Laboratoire de Physiologie Cellulaire Végétale, Département Réponse et Dynamique Cellulaire, CEA Grenoble, UMR 5168, CNRS-CEA-INRA-Université J. Fourier, 17 rue des Martyrs, 38054, Grenoble Cedex 09, France.
| |
Collapse
|
18
|
Fujiwara T, Kwon OH, Ma KL. Supporting Analysis of Dimensionality Reduction Results with Contrastive Learning. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:45-55. [PMID: 31425080 DOI: 10.1109/tvcg.2019.2934251] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Dimensionality reduction (DR) is frequently used for analyzing and visualizing high-dimensional data as it provides a good first glance of the data. However, to interpret the DR result for gaining useful insights from the data, it would take additional analysis effort such as identifying clusters and understanding their characteristics. While there are many automatic methods (e.g., density-based clustering methods) to identify clusters, effective methods for understanding a cluster's characteristics are still lacking. A cluster can be mostly characterized by its distribution of feature values. Reviewing the original feature values is not a straightforward task when the number of features is large. To address this challenge, we present a visual analytics method that effectively highlights the essential features of a cluster in a DR result. To extract the essential features, we introduce an enhanced usage of contrastive principal component analysis (cPCA). Our method, called ccPCA (contrasting clusters in PCA), can calculate each feature's relative contribution to the contrast between one cluster and other clusters. With ccPCA, we have created an interactive system including a scalable visualization of clusters' feature contributions. We demonstrate the effectiveness of our method and system with case studies using several publicly available datasets.
Collapse
|
19
|
Interactive visual data exploration with subjective feedback: an information-theoretic approach. Data Min Knowl Discov 2019. [DOI: 10.1007/s10618-019-00655-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
Abstract
Abstract
Visual exploration of high-dimensional real-valued datasets is a fundamental task in exploratory data analysis (EDA). Existing projection methods for data visualization use predefined criteria to choose the representation of data. There is a lack of methods that (i) use information on what the user has learned from the data and (ii) show patterns that she does not know yet. We construct a theoretical model where identified patterns can be input as knowledge to the system. The knowledge syntax here is intuitive, such as “this set of points forms a cluster”, and requires no knowledge of maths. This background knowledge is used to find a maximum entropy distribution of the data, after which the user is provided with data projections for which the data and the maximum entropy distribution differ the most, hence showing the user aspects of data that are maximally informative given the background knowledge. We study the computational performance of our model and present use cases on synthetic and real data. We find that the model allows the user to learn information efficiently from various data sources and works sufficiently fast in practice. In addition, we provide an open source EDA demonstrator system implementing our model with tailored interactive visualizations. We conclude that the information theoretic approach to EDA where patterns observed by a user are formalized as constraints provides a principled, intuitive, and efficient basis for constructing an EDA system.
Collapse
|
20
|
Castermans T, Verbeek K, Speckmann B, Westenberg MA, Koopman R, Wang S, van den Berg H, Betti A. SolarView: Low Distortion Radial Embedding with a Focus. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:2969-2982. [PMID: 30106733 DOI: 10.1109/tvcg.2018.2865361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
We propose a novel type of low distortion radial embedding which focuses on one specific entity and its closest neighbors. Our embedding preserves near-exact distances to the focus entity and aims to minimize distortion between the other entities. We present an interactive exploration tool SolarView which places the focus entity at the center of a "solar system" and embeds its neighbors guided by concentric circles. SolarView provides an implementation of our novel embedding and several state-of-the-art dimensionality reduction and embedding techniques, which we adapted to our setting in various ways. We experimentally evaluated our embedding and compared it to these state-of-the-art techniques. The results show that our embedding competes with these techniques and achieves low distortion in practice. Our method performs particularly well when the visualization, and hence the embedding, adheres to the solar system design principle of our application. Nonetheless-as with all dimensionality reduction techniques-the distortion may be high. We leverage interaction techniques to give clear visual cues that allow users to accurately judge distortion. We illustrate the use of SolarView by exploring the high-dimensional metric space of bibliographic entity similarities.
Collapse
|
21
|
Nonato LG, Aupetit M. Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:2650-2673. [PMID: 29994258 DOI: 10.1109/tvcg.2018.2846735] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Visual analysis of multidimensional data requires expressive and effective ways to reduce data dimensionality to encode them visually. Multidimensional projections (MDP) figure among the most important visualization techniques in this context, transforming multidimensional data into scatter plots whose visual patterns reflect some notion of similarity in the original data. However, MDP come with distortions that make these visual patterns not trustworthy, hindering users to infer actual data characteristics. Moreover, the patterns present in the scatter plots might not be enough to allow a clear understanding of multidimensional data, motivating the development of layout enrichment methodologies to operate together with MDP. This survey attempts to cover the main aspects of MDP as a visualization and visual analytic tool. It provides detailed analysis and taxonomies as to the organization of MDP techniques according to their main properties and traits, discussing the impact of such properties for visual perception and other human factors. The survey also approaches the different types of distortions that can result from MDP mappings and it overviews existing mechanisms to quantitatively evaluate such distortions. A qualitative analysis of the impact of distortions on the different analytic tasks performed by users when exploring multidimensional data through MDP is also presented. Guidelines for choosing the best MDP for an intended task are also provided as a result of this analysis. Finally, layout enrichment schemes to debunk MDP distortions and/or reveal relevant information not directly inferable from the scatter plot are reviewed and discussed in the light of new taxonomies. We conclude the survey providing future research axes to fill discovered gaps in this domain.
Collapse
|
22
|
|
23
|
Lai C, Zhao Y, Yuan X. Exploring high-dimensional data through locally enhanced projections. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2018.08.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
24
|
Weng D, Chen R, Deng Z, Wu F, Chen J, Wu Y. SRVis: Towards Better Spatial Integration in Ranking Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:459-469. [PMID: 30188825 DOI: 10.1109/tvcg.2018.2865126] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Interactive ranking techniques have substantially promoted analysts' ability in making judicious and informed decisions effectively based on multiple criteria. However, the existing techniques cannot satisfactorily support the analysis tasks involved in ranking large-scale spatial alternatives, such as selecting optimal locations for chain stores, where the complex spatial contexts involved are essential to the decision-making process. Limitations observed in the prior attempts of integrating rankings with spatial contexts motivate us to develop a context-integrated visual ranking technique. Based on a set of generic design requirements we summarized by collaborating with domain experts, we propose SRVis, a novel spatial ranking visualization technique that supports efficient spatial multi-criteria decision-making processes by addressing three major challenges in the aforementioned context integration, namely, a) the presentation of spatial rankings and contexts, b) the scalability of rankings' visual representations, and c) the analysis of context-integrated spatial rankings. Specifically, we encode massive rankings and their cause with scalable matrix-based visualizations and stacked bar charts based on a novel two-phase optimization framework that minimizes the information loss, and the flexible spatial filtering and intuitive comparative analysis are adopted to enable the in-depth evaluation of the rankings and assist users in selecting the best spatial alternative. The effectiveness of the proposed technique has been evaluated and demonstrated with an empirical study of optimization methods, two case studies, and expert interviews.
Collapse
|
25
|
Faust R, Glickenstein D, Scheidegger C. DimReader: Axis lines that explain non-linear projections. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:481-490. [PMID: 30136997 DOI: 10.1109/tvcg.2018.2865194] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Non-linear dimensionality reduction (NDR) methods such as LLE and t-SNE are popular with visualization researchers and experienced data analysts, but present serious problems of interpretation. In this paper, we present DimReader, a technique that recovers readable axes from such techniques. DimReader is based on analyzing infinitesimal perturbations of the dataset with respect to variables of interest. The perturbations define exactly how we want to change each point in the original dataset and we measure the effect that these changes have on the projection. The recovered axes are in direct analogy with the axis lines (grid lines) of traditional scatterplots. We also present methods for discovering perturbations on the input data that change the projection the most. The calculation of the perturbations is efficient and easily integrated into programs written in modern programming languages. We present results of DimReader on a variety of NDR methods and datasets both synthetic and real-life, and show how it can be used to compare different NDR methods. Finally, we discuss limitations of our proposal and situations where further research is needed.
Collapse
|
26
|
Lehmann DJ, Theisel H. The LloydRelaxer: An Approach to Minimize Scaling Effects for Multivariate Projections. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:2424-2439. [PMID: 28534778 DOI: 10.1109/tvcg.2017.2705189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Star Coordinates are a popular projection technique in order to analyze and to disclose characteristic patterns of multidimensional data. Unfortunately, the shape, appearance, and distribution of such patterns are strongly affected by the given scaling of the data and can mislead the projection-based data analysis. In an extreme case, patterns might be more related to the choice of scaling than to the data themselves. Thus, we present the LloydRelaxer : a tool to minimize scaling-based effects in Star Coordinates. Our algorithm enforces a scaling configuration for which the data explains the observed patterns better than any scaling of them could do. It does so by an iterative minimizing and optimization process based on Voronoi diagrams and on the Lloyd relaxation within the projection space. We evaluate and test our approach by real benchmark multidimensional data of the UCI data repository.
Collapse
|
27
|
|
28
|
Dragicevic P, Jansen Y. Blinded with Science or Informed by Charts? A Replication Study. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:781-790. [PMID: 28866535 DOI: 10.1109/tvcg.2017.2744298] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
We provide a reappraisal of Tal and Wansink's study "Blinded with Science", where seemingly trivial charts were shown to increase belief in drug efficacy, presumably because charts are associated with science. Through a series of four replications conducted on two crowdsourcing platforms, we investigate an alternative explanation, namely, that the charts allowed participants to better assess the drug's efficacy. Considered together, our experiments suggest that the chart seems to have indeed promoted understanding, although the effect is likely very small. Meanwhile, we were unable to replicate the original study's findings, as text with chart appeared to be no more persuasive - and sometimes less persuasive - than text alone. This suggests that the effect may not be as robust as claimed and may need specific conditions to be reproduced. Regardless, within our experimental settings and considering our study as a whole (), the chart's contribution to understanding was clearly larger than its contribution to persuasion.
Collapse
|
29
|
Kwon BC, Eysenbach B, Verma J, Ng K, De Filippi C, Stewart WF, Perer A. Clustervision: Visual Supervision of Unsupervised Clustering. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:142-151. [PMID: 28866567 DOI: 10.1109/tvcg.2017.2745085] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exist a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large amount of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
Collapse
|
30
|
Bernard J, Hutter M, Zeppelzauer M, Fellner D, Sedlmair M. Comparing Visual-Interactive Labeling with Active Learning: An Experimental Study. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:298-308. [PMID: 28866560 DOI: 10.1109/tvcg.2017.2744818] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Labeling data instances is an important task in machine learning and visual analytics. Both fields provide a broad set of labeling strategies, whereby machine learning (and in particular active learning) follows a rather model-centered approach and visual analytics employs rather user-centered approaches (visual-interactive labeling). Both approaches have individual strengths and weaknesses. In this work, we conduct an experiment with three parts to assess and compare the performance of these different labeling strategies. In our study, we (1) identify different visual labeling strategies for user-centered labeling, (2) investigate strengths and weaknesses of labeling strategies for different labeling tasks and task complexities, and (3) shed light on the effect of using different visual encodings to guide the visual-interactive labeling process. We further compare labeling of single versus multiple instances at a time, and quantify the impact on efficiency. We systematically compare the performance of visual interactive labeling with that of active learning. Our main findings are that visual-interactive labeling can outperform active learning, given the condition that dimension reduction separates well the class distributions. Moreover, using dimension reduction in combination with additional visual encodings that expose the internal state of the learning model turns out to improve the performance of visual-interactive labeling.
Collapse
|
31
|
Xia J, Ye F, Chen W, Wang Y, Chen W, Ma Y, Tung AKH. LDSScanner: Exploratory Analysis of Low-Dimensional Structures in High-Dimensional Datasets. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:236-245. [PMID: 28866522 DOI: 10.1109/tvcg.2017.2744098] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Many approaches for analyzing a high-dimensional dataset assume that the dataset contains specific structures, e.g., clusters in linear subspaces or non-linear manifolds. This yields a trial-and-error process to verify the appropriate model and parameters. This paper contributes an exploratory interface that supports visual identification of low-dimensional structures in a high-dimensional dataset, and facilitates the optimized selection of data models and configurations. Our key idea is to abstract a set of global and local feature descriptors from the neighborhood graph-based representation of the latent low-dimensional structure, such as pairwise geodesic distance (GD) among points and pairwise local tangent space divergence (LTSD) among pointwise local tangent spaces (LTS). We propose a new LTSD-GD view, which is constructed by mapping LTSD and GD to the axis and axis using 1D multidimensional scaling, respectively. Unlike traditional dimensionality reduction methods that preserve various kinds of distances among points, the LTSD-GD view presents the distribution of pointwise LTS ( axis) and the variation of LTS in structures (the combination of axis and axis). We design and implement a suite of visual tools for navigating and reasoning about intrinsic structures of a high-dimensional dataset. Three case studies verify the effectiveness of our approach.
Collapse
|
32
|
Pienta R, Hohman F, Endert A, Tamersoy A, Roundy K, Gates C, Navathe S, Chau DH. VIGOR: Interactive Visual Exploration of Graph Query Results. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:215-225. [PMID: 28866563 DOI: 10.1109/tvcg.2017.2744898] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Finding patterns in graphs has become a vital challenge in many domains from biological systems, network security, to finance (e.g., finding money laundering rings of bankers and business owners). While there is significant interest in graph databases and querying techniques, less research has focused on helping analysts make sense of underlying patterns within a group of subgraph results. Visualizing graph query results is challenging, requiring effective summarization of a large number of subgraphs, each having potentially shared node-values, rich node features, and flexible structure across queries. We present VIGOR, a novel interactive visual analytics system, for exploring and making sense of query results. VIGOR uses multiple coordinated views, leveraging different data representations and organizations to streamline analysts sensemaking process. VIGOR contributes: (1) an exemplar-based interaction technique, where an analyst starts with a specific result and relaxes constraints to find other similar results or starts with only the structure (i.e., without node value constraints), and adds constraints to narrow in on specific results; and (2) a novel feature-aware subgraph result summarization. Through a collaboration with Symantec, we demonstrate how VIGOR helps tackle real-world problems through the discovery of security blindspots in a cybersecurity dataset with over 11,000 incidents. We also evaluate VIGOR with a within-subjects study, demonstrating VIGOR's ease of use over a leading graph database management system, and its ability to help analysts understand their results at higher speed and make fewer errors.
Collapse
|
33
|
Turkay C, Slingsby A, Lahtinen K, Butt S, Dykes J. Supporting theoretically-grounded model building in the social sciences through interactive visualisation. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.11.087] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
34
|
Liu S, Maljovec D, Wang B, Bremer PT, Pascucci V. Visualizing High-Dimensional Data: Advances in the Past Decade. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:1249-1268. [PMID: 28113321 DOI: 10.1109/tvcg.2016.2640960] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
Massive simulations and arrays of sensing devices, in combination with increasing computing resources, have generated large, complex, high-dimensional datasets used to study phenomena across numerous fields of study. Visualization plays an important role in exploring such datasets. We provide a comprehensive survey of advances in high-dimensional data visualization that focuses on the past decade. We aim at providing guidance for data practitioners to navigate through a modular view of the recent advances, inspiring the creation of new visualizations along the enriched visualization pipeline, and identifying future opportunities for visualization research.
Collapse
|
35
|
Sacha D, Zhang L, Sedlmair M, Lee JA, Peltonen J, Weiskopf D, North SC, Keim DA. Visual Interaction with Dimensionality Reduction: A Structured Literature Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:241-250. [PMID: 27875141 DOI: 10.1109/tvcg.2016.2598495] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Dimensionality Reduction (DR) is a core building block in visualizing multidimensional data. For DR techniques to be useful in exploratory data analysis, they need to be adapted to human needs and domain-specific problems, ideally, interactively, and on-the-fly. Many visual analytics systems have already demonstrated the benefits of tightly integrating DR with interactive visualizations. Nevertheless, a general, structured understanding of this integration is missing. To address this, we systematically studied the visual analytics and visualization literature to investigate how analysts interact with automatic DR techniques. The results reveal seven common interaction scenarios that are amenable to interactive control such as specifying algorithmic constraints, selecting relevant features, or choosing among several DR algorithms. We investigate specific implementations of visual analysis systems integrating DR, and analyze ways that other machine learning methods have been combined with DR. Summarizing the results in a "human in the loop" process model provides a general lens for the evaluation of visual interactive DR systems. We apply the proposed model to study and classify several systems previously described in the literature, and to derive future research opportunities.
Collapse
|