1
|
Wyss A, Morgenshtern G, Hirsch-Husler A, Bernard J. DaedalusData: Exploration, Knowledge Externalization and Labeling of Particles in Medical Manufacturing - A Design Study. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:54-64. [PMID: 39312428 DOI: 10.1109/tvcg.2024.3456329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/25/2024]
Abstract
In medical diagnostics of both early disease detection and routine patient care, particle-based contamination of in-vitro diagnostics consumables poses a significant threat to patients. Objective data-driven decision-making on the severity of contamination is key for reducing patient risk, while saving time and cost in quality assessment. Our collaborators introduced us to their quality control process, including particle data acquisition through image recognition, feature extraction, and attributes reflecting the production context of particles. Shortcomings in the current process are limitations in exploring thousands of images, data-driven decision making, and ineffective knowledge externalization. Following the design study methodology, our contributions are a characterization of the problem space and requirements, the development and validation of DaedalusData, a comprehensive discussion of our study's learnings, and a generalizable framework for knowledge externalization. DaedalusData is a visual analytics system that enables domain experts to explore particle contamination patterns, label particles in label alphabets, and externalize knowledge through semi-supervised label-informed data projections. The results of our case study and user study show high usability of DaedalusData and its efficient support of experts in generating comprehensive overviews of thousands of particles, labeling of large quantities of particles, and externalizing knowledge to augment the dataset further. Reflecting on our approach, we discuss insights on dataset augmentation via human knowledge externalization, and on the scalability and trade-offs that come with the adoption of this approach in practice.
Collapse
|
2
|
Zhao J, Liu X, Tang H, Wang X, Yang S, Liu D, Chen Y, Chen YV. Mesoscopic structure graphs for interpreting uncertainty in non-linear embeddings. Comput Biol Med 2024; 182:109105. [PMID: 39265479 DOI: 10.1016/j.compbiomed.2024.109105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2024] [Revised: 07/06/2024] [Accepted: 09/01/2024] [Indexed: 09/14/2024]
Abstract
Probabilistic-based non-linear dimensionality reduction (PB-NL-DR) methods, such as t-SNE and UMAP, are effective in unfolding complex high-dimensional manifolds, allowing users to explore and understand the structural patterns of data. However, due to the trade-off between global and local structure preservation and the randomness during computation, these methods may introduce false neighborhood relationships, known as distortion errors and misleading visualizations. To address this issue, we first conduct a detailed survey to illustrate the design space of prior layout enrichment visualizations for interpreting DR results, and then propose a node-link visualization technique, ManiGraph. This technique rethinks the neighborhood fidelity between the high- and low-dimensional spaces by constructing dynamic mesoscopic structure graphs and measuring region-adapted trustworthiness. ManiGraph also addresses the overplotting issue in scatterplot visualization for large-scale datasets and supports examining in unsupervised scenarios. We demonstrate the effectiveness of ManiGraph in different analytical cases, including generic machine learning using 3D toy data illustrations and fashion-MNIST, a computational biology study using a single-cell RNA sequencing dataset, and a deep learning-enabled colorectal cancer study with histopathology-MNIST.
Collapse
Affiliation(s)
- Junhan Zhao
- Harvard Medical School, Boston, 02114, MA, USA; Harvard T.H.Chan School of Public Health, Boston, 02114, MA, USA; Purdue University, West Lafayette, 47907, IN, USA.
| | - Xiang Liu
- Purdue University, West Lafayette, 47907, IN, USA; Indiana University School of Medicine, Indianapolis, 46202, IN, USA.
| | - Hongping Tang
- Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, 518048, China.
| | - Xiyue Wang
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
| | - Sen Yang
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
| | - Donfang Liu
- Rochester Institute of Technology, Rochester, 14623, NY, USA.
| | - Yijiang Chen
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
| | | |
Collapse
|
3
|
Dennig FL, Miller M, Keim DA, El-Assady M. FS/DS: A Theoretical Framework for the Dual Analysis of Feature Space and Data Space. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5165-5182. [PMID: 37342951 DOI: 10.1109/tvcg.2023.3288356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/23/2023]
Abstract
With the surge of data-driven analysis techniques, there is a rising demand for enhancing the exploration of large high-dimensional data by enabling interactions for the joint analysis of features (i.e., dimensions). Such a dual analysis of the feature space and data space is characterized by three components, 1) a view visualizing feature summaries, 2) a view that visualizes the data records, and 3) a bidirectional linking of both plots triggered by human interaction in one of both visualizations, e.g., Linking & Brushing. Dual analysis approaches span many domains, e.g., medicine, crime analysis, and biology. The proposed solutions encapsulate various techniques, such as feature selection or statistical analysis. However, each approach establishes a new definition of dual analysis. To address this gap, we systematically reviewed published dual analysis methods to investigate and formalize the key elements, such as the techniques used to visualize the feature space and data space, as well as the interaction between both spaces. From the information elicited during our review, we propose a unified theoretical framework for dual analysis, encompassing all existing approaches extending the field. We apply our proposed formalization describing the interactions between each component and relate them to the addressed tasks. Additionally, we categorize the existing approaches using our framework and derive future research directions to advance dual analysis by including state-of-the-art visual analysis techniques to improve data exploration.
Collapse
|
4
|
Chen Y, Wu C, Zhang Q, Wu D. Review of visual analytics methods for food safety risks. NPJ Sci Food 2023; 7:49. [PMID: 37699926 PMCID: PMC10497676 DOI: 10.1038/s41538-023-00226-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2023] [Accepted: 08/31/2023] [Indexed: 09/14/2023] Open
Abstract
With the availability of big data for food safety, more and more advanced data analysis methods are being applied to risk analysis and prewarning (RAPW). Visual analytics, which has emerged in recent years, integrates human and machine intelligence into the data analysis process in a visually interactive manner, helping researchers gain insights into large-scale data and providing new solutions for RAPW. This review presents the developments in visual analytics for food safety RAPW in the past decade. Firstly, the data sources, data characteristics, and analysis tasks in the food safety field are summarized. Then, data analysis methods for four types of analysis tasks: association analysis, risk assessment, risk prediction, and fraud identification, are reviewed. After that, the visualization and interaction techniques are reviewed for four types of characteristic data: multidimensional, hierarchical, associative, and spatial-temporal data. Finally, opportunities and challenges in this area are proposed, such as the visual analysis of multimodal food safety data, the application of artificial intelligence techniques in the visual analysis pipeline, etc.
Collapse
Affiliation(s)
- Yi Chen
- Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing, 100048, China.
| | - Caixia Wu
- Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing, 100048, China
| | - Qinghui Zhang
- Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing, 100048, China
| | - Di Wu
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast, Northern Ireland, UK
| |
Collapse
|
5
|
Wang AX, Chukova SS, Nguyen BP. Ensemble k-Nearest Neighbors based on Centroid Displacement. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/09/2023]
|
6
|
Liu Q, Ren Y, Zhu Z, Li D, Ma X, Li Q. RankAxis: Towards a Systematic Combination of Projection and Ranking in Multi-Attribute Data Exploration. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:701-711. [PMID: 36155453 DOI: 10.1109/tvcg.2022.3209463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Projection and ranking are frequently used analysis techniques in multi-attribute data exploration. Both families of techniques help analysts with tasks such as identifying similarities between observations and determining ordered subgroups, and have shown good performances in multi-attribute data exploration. However, they often exhibit problems such as distorted projection layouts, obscure semantic interpretations, and non-intuitive effects produced by selecting a subset of (weighted) attributes. Moreover, few studies have attempted to combine projection and ranking into the same exploration space to complement each other's strengths and weaknesses. For this reason, we propose RankAxis, a visual analytics system that systematically combines projection and ranking to facilitate the mutual interpretation of these two techniques and jointly support multi-attribute data exploration. A real-world case study, expert feedback, and a user study demonstrate the efficacy of RankAxis.
Collapse
|
7
|
What can scatterplots teach us about doing data science better? INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00362-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
|
8
|
Multidimensional data visualization applying a variety-oriented scatterplot selection technique. J Vis (Tokyo) 2022. [DOI: 10.1007/s12650-022-00871-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/15/2022]
|
9
|
IXVC: An interactive pipeline for explaining visual clusters in dimensionality reduction visualizations with decision trees. ARRAY 2021. [DOI: 10.1016/j.array.2021.100080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
|
10
|
Garrison L, Muller J, Schreiber S, Oeltze-Jafra S, Hauser H, Bruckner S. DimLift: Interactive Hierarchical Data Exploration Through Dimensional Bundling. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:2908-2922. [PMID: 33544674 DOI: 10.1109/tvcg.2021.3057519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
The identification of interesting patterns and relationships is essential to exploratory data analysis. This becomes increasingly difficult in high dimensional datasets. While dimensionality reduction techniques can be utilized to reduce the analysis space, these may unintentionally bury key dimensions within a larger grouping and obfuscate meaningful patterns. With this work we introduce DimLift, a novel visual analysis method for creating and interacting with dimensional bundles. Generated through an iterative dimensionality reduction or user-driven approach, dimensional bundles are expressive groups of dimensions that contribute similarly to the variance of a dataset. Interactive exploration and reconstruction methods via a layered parallel coordinates plot allow users to lift interesting and subtle relationships to the surface, even in complex scenarios of missing and mixed data types. We exemplify the power of this technique in an expert case study on clinical cohort data alongside two additional case examples from nutrition and ecology.
Collapse
|
11
|
Abstract
Feature Analysis has become a very critical task in data analysis and visualization. Graph structures are very flexible in terms of representation and may encode important information on features but are challenging in regards to layout being adequate for analysis tasks. In this study, we propose and develop similarity-based graph layouts with the purpose of locating relevant patterns in sets of features, thus supporting feature analysis and selection. We apply a tree layout in the first step of the strategy, to accomplish node placement and overview based on feature similarity. By drawing the remainder of the graph edges on demand, further grouping and relationships among features are revealed. We evaluate those groups and relationships in terms of their effectiveness in exploring feature sets for data analysis. Correlation of features with a target categorical attribute and feature ranking are added to support the task. Multidimensional projections are employed to plot the dataset based on selected attributes to reveal the effectiveness of the feature set. Our results have shown that the tree-graph layout framework allows for a number of observations that are very important in user-centric feature selection, and not easy to observe by any other available tool. They provide a way of finding relevant and irrelevant features, spurious sets of noisy features, groups of similar features, and opposite features, all of which are essential tasks in different scenarios of data analysis. Case studies in application areas centered on documents, images and sound data demonstrate the ability of the framework to quickly reach a satisfactory compact representation from a larger feature set.
Collapse
|
12
|
Zhang Y, Yu C, Wang R, Liu X. Visual dimension analysis based on dimension subdivision. J Vis (Tokyo) 2020. [DOI: 10.1007/s12650-020-00694-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
13
|
Nonato LG, Aupetit M. Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:2650-2673. [PMID: 29994258 DOI: 10.1109/tvcg.2018.2846735] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Visual analysis of multidimensional data requires expressive and effective ways to reduce data dimensionality to encode them visually. Multidimensional projections (MDP) figure among the most important visualization techniques in this context, transforming multidimensional data into scatter plots whose visual patterns reflect some notion of similarity in the original data. However, MDP come with distortions that make these visual patterns not trustworthy, hindering users to infer actual data characteristics. Moreover, the patterns present in the scatter plots might not be enough to allow a clear understanding of multidimensional data, motivating the development of layout enrichment methodologies to operate together with MDP. This survey attempts to cover the main aspects of MDP as a visualization and visual analytic tool. It provides detailed analysis and taxonomies as to the organization of MDP techniques according to their main properties and traits, discussing the impact of such properties for visual perception and other human factors. The survey also approaches the different types of distortions that can result from MDP mappings and it overviews existing mechanisms to quantitatively evaluate such distortions. A qualitative analysis of the impact of distortions on the different analytic tasks performed by users when exploring multidimensional data through MDP is also presented. Guidelines for choosing the best MDP for an intended task are also provided as a result of this analysis. Finally, layout enrichment schemes to debunk MDP distortions and/or reveal relevant information not directly inferable from the scatter plot are reviewed and discussed in the light of new taxonomies. We conclude the survey providing future research axes to fill discovered gaps in this domain.
Collapse
|
14
|
Wu B, Smith JS, Wilamowski BM, Nelms RM. DCMDS-RV: density-concentrated multi-dimensional scaling for relation visualization. J Vis (Tokyo) 2018. [DOI: 10.1007/s12650-018-0532-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
15
|
Lai C, Zhao Y, Yuan X. Exploring high-dimensional data through locally enhanced projections. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2018.08.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
16
|
Exploring linear projections for revealing clusters, outliers, and trends in subsets of multi-dimensional datasets. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2018.08.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
17
|
Rauber PE, Falcão AX, Telea AC. Projections as visual aids for classification system design. INFORMATION VISUALIZATION 2018; 17:282-305. [PMID: 30263012 PMCID: PMC6131729 DOI: 10.1177/1473871617713337] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Dimensionality reduction is a compelling alternative for high-dimensional data visualization. This method provides insight into high-dimensional feature spaces by mapping relationships between observations (high-dimensional vectors) to low (two or three) dimensional spaces. These low-dimensional representations support tasks such as outlier and group detection based on direct visualization. Supervised learning, a subfield of machine learning, is also concerned with observations. A key task in supervised learning consists of assigning class labels to observations based on generalization from previous experience. Effective development of such classification systems depends on many choices, including features descriptors, learning algorithms, and hyperparameters. These choices are not trivial, and there is no simple recipe to improve classification systems that perform poorly. In this context, we first propose the use of visual representations based on dimensionality reduction (projections) for predictive feedback on classification efficacy. Second, we propose a projection-based visual analytics methodology, and supportive tooling, that can be used to improve classification systems through feature selection. We evaluate our proposal through experiments involving four datasets and three representative learning algorithms.
Collapse
Affiliation(s)
- Paulo E Rauber
- Department of Mathematics and Computing
Science, University of Groningen, Groningen, The Netherlands
- University of Campinas, Campinas,
Brazil
| | | | - Alexandru C Telea
- Department of Mathematics and Computing
Science, University of Groningen, Groningen, The Netherlands
| |
Collapse
|
18
|
Dowling M, Wenskovitch J, Fry JT, Leman S, House L, North C. SIRIUS: Dual, Symmetric, Interactive Dimension Reductions. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:172-182. [PMID: 30136978 DOI: 10.1109/tvcg.2018.2865047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Much research has been done regarding how to visualize and interact with observations and attributes of high-dimensional data for exploratory data analysis. From the analyst's perceptual and cognitive perspective, current visualization approaches typically treat the observations of the high-dimensional dataset very differently from the attributes. Often, the attributes are treated as inputs (e.g., sliders), and observations as outputs (e.g., projection plots), thus emphasizing investigation of the observations. However, there are many cases in which analysts wish to investigate both the observations and the attributes of the dataset, suggesting a symmetry between how analysts think about attributes and observations. To address this, we define SIRIUS (Symmetric Interactive Representations In a Unified System), a symmetric, dual projection technique to support exploratory data analysis of high-dimensional data. We provide an example implementation of SIRIUS and demonstrate how this symmetry affords additional insights.
Collapse
|
19
|
Zhou Z, Ye Z, Yu J, Chen W. Cluster-aware arrangement of the parallel coordinate plots. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2017.10.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
20
|
Zhou L, Weiskopf D. Indexed-Points Parallel Coordinates Visualization of Multivariate Correlations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:1997-2010. [PMID: 28459690 DOI: 10.1109/tvcg.2017.2698041] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
We address the problem of visualizing multivariate correlations in parallel coordinates. We focus on multivariate correlation in the form of linear relationships between multiple variables. Traditional parallel coordinates are well prepared to show negative correlations between two attributes by distinct visual patterns. However, it is difficult to recognize positive correlations in parallel coordinates. Furthermore, there is no support to highlight multivariate correlations in parallel coordinates. In this paper, we exploit the indexed point representation of p -flats (planes in multidimensional data) to visualize local multivariate correlations in parallel coordinates. Our method yields clear visual signatures for negative and positive correlations alike, and it supports large datasets. All information is shown in a unified parallel coordinates framework, which leads to easy and familiar user interactions for analysts who have experience with traditional parallel coordinates. The usefulness of our method is demonstrated through examples of typical multidimensional datasets.
Collapse
|
21
|
Li C, Baciu G, Han Y. StreamMap: Smooth Dynamic Visualization of High-Density Streaming Points. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:1381-1393. [PMID: 28212089 DOI: 10.1109/tvcg.2017.2668409] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Interactive visualization of streaming points for real-time scatterplots and linear blending of correlation patterns is increasingly becoming the dominant mode of visual analytics for both big data and streaming data from active sensors and broadcasting media. To better visualize and interact with inter-stream patterns, it is generally necessary to smooth out gaps or distortions in the streaming data. Previous approaches either animate the points directly or present a sampled static heat-map. We propose a new approach, called StreamMap, to smoothly blend high-density streaming points and create a visual flow that emphasizes the density pattern distributions. In essence, we present three new contributions for the visualization of high-density streaming points. The first contribution is a density-based method called super kernel density estimation that aggregates streaming points using an adaptive kernel to solve the overlapping problem. The second contribution is a robust density morphing algorithm that generates several smooth intermediate frames for a given pair of frames. The third contribution is a trend representation design that can help convey the flow directions of the streaming points. The experimental results on three datasets demonstrate the effectiveness of StreamMap when dynamic visualization and visual analysis of trend patterns on streaming points are required.
Collapse
|
22
|
Wang B, Mueller K. The Subspace Voyager: Exploring High-Dimensional Data along a Continuum of Salient 3D Subspaces. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:1204-1222. [PMID: 28252403 DOI: 10.1109/tvcg.2017.2672987] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Analyzing high-dimensional data and finding hidden patterns is a difficult problem and has attracted numerous research efforts. Automated methods can be useful to some extent but bringing the data analyst into the loop via interactive visual tools can help the discovery process tremendously. An inherent problem in this effort is that humans lack the mental capacity to truly understand spaces exceeding three spatial dimensions. To keep within this limitation, we describe a framework that decomposes a high-dimensional data space into a continuum of generalized 3D subspaces. Analysts can then explore these 3D subspaces individually via the familiar trackball interface while using additional facilities to smoothly transition to adjacent subspaces for expanded space comprehension. Since the number of such subspaces suffers from combinatorial explosion, we provide a set of data-driven subspace selection and navigation tools which can guide users to interesting subspaces and views. A subspace trail map allows users to manage the explored subspaces, keep their bearings, and return to interesting subspaces and views. Both trackball and trail map are each embedded into a word cloud of attribute labels which aid in navigation. We demonstrate our system via several use cases in a diverse set of application areas-cluster analysis and refinement, information discovery, and supervised training of classifiers. We also report on a user study that evaluates the usability of the various interactions our system provides.
Collapse
|
23
|
Sarikaya A, Gleicher M. Scatterplots: Tasks, Data, and Designs. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:402-412. [PMID: 28866528 DOI: 10.1109/tvcg.2017.2744184] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Traditional scatterplots fail to scale as the complexity and amount of data increases. In response, there exist many design options that modify or expand the traditional scatterplot design to meet these larger scales. This breadth of design options creates challenges for designers and practitioners who must select appropriate designs for particular analysis goals. In this paper, we help designers in making design choices for scatterplot visualizations. We survey the literature to catalog scatterplot-specific analysis tasks. We look at how data characteristics influence design decisions. We then survey scatterplot-like designs to understand the range of design options. Building upon these three organizations, we connect data characteristics, analysis tasks, and design choices in order to generate challenges, open questions, and example best practices for the effective design of scatterplots.
Collapse
|
24
|
Xia J, Ye F, Chen W, Wang Y, Chen W, Ma Y, Tung AKH. LDSScanner: Exploratory Analysis of Low-Dimensional Structures in High-Dimensional Datasets. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:236-245. [PMID: 28866522 DOI: 10.1109/tvcg.2017.2744098] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Many approaches for analyzing a high-dimensional dataset assume that the dataset contains specific structures, e.g., clusters in linear subspaces or non-linear manifolds. This yields a trial-and-error process to verify the appropriate model and parameters. This paper contributes an exploratory interface that supports visual identification of low-dimensional structures in a high-dimensional dataset, and facilitates the optimized selection of data models and configurations. Our key idea is to abstract a set of global and local feature descriptors from the neighborhood graph-based representation of the latent low-dimensional structure, such as pairwise geodesic distance (GD) among points and pairwise local tangent space divergence (LTSD) among pointwise local tangent spaces (LTS). We propose a new LTSD-GD view, which is constructed by mapping LTSD and GD to the axis and axis using 1D multidimensional scaling, respectively. Unlike traditional dimensionality reduction methods that preserve various kinds of distances among points, the LTSD-GD view presents the distribution of pointwise LTS ( axis) and the variation of LTS in structures (the combination of axis and axis). We design and implement a suite of visual tools for navigating and reasoning about intrinsic structures of a high-dimensional dataset. Three case studies verify the effectiveness of our approach.
Collapse
|
25
|
High-dimensional data visualization by interactive construction of low-dimensional parallel coordinate plots. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2017. [DOI: 10.1016/j.jvlc.2017.03.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
26
|
|
27
|
Xia J, Jiang G, Zhang Y, Li R, Chen W. Visual subspace clustering based on dimension relevance. ACTA ACUST UNITED AC 2017. [DOI: 10.1016/j.jvlc.2017.05.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
28
|
Liu S, Maljovec D, Wang B, Bremer PT, Pascucci V. Visualizing High-Dimensional Data: Advances in the Past Decade. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:1249-1268. [PMID: 28113321 DOI: 10.1109/tvcg.2016.2640960] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
Massive simulations and arrays of sensing devices, in combination with increasing computing resources, have generated large, complex, high-dimensional datasets used to study phenomena across numerous fields of study. Visualization plays an important role in exploring such datasets. We provide a comprehensive survey of advances in high-dimensional data visualization that focuses on the past decade. We aim at providing guidance for data practitioners to navigate through a modular view of the recent advances, inspiring the creation of new visualizations along the enriched visualization pipeline, and identifying future opportunities for visualization research.
Collapse
|
29
|
Rauber PE, Fadel SG, Falcao AX, Telea AC. Visualizing the Hidden Activity of Artificial Neural Networks. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:101-110. [PMID: 27875137 DOI: 10.1109/tvcg.2016.2598838] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
In machine learning, pattern classification assigns high-dimensional vectors (observations) to classes based on generalization from examples. Artificial neural networks currently achieve state-of-the-art results in this task. Although such networks are typically used as black-boxes, they are also widely believed to learn (high-dimensional) higher-level representations of the original observations. In this paper, we propose using dimensionality reduction for two tasks: visualizing the relationships between learned representations of observations, and visualizing the relationships between artificial neurons. Through experiments conducted in three traditional image classification benchmark datasets, we show how visualization can provide highly valuable feedback for network designers. For instance, our discoveries in one of these datasets (SVHN) include the presence of interpretable clusters of learned representations, and the partitioning of artificial neurons into groups with apparently related discriminative roles.
Collapse
|
30
|
Chen H, Chen W, Mei H, Liu Z, Zhou K, Chen W, Gu W, Ma KL. Visual Abstraction and Exploration of Multi-class Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2014; 20:1683-1692. [PMID: 26356882 DOI: 10.1109/tvcg.2014.2346594] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Scatterplots are widely used to visualize scatter dataset for exploring outliers, clusters, local trends, and correlations. Depicting multi-class scattered points within a single scatterplot view, however, may suffer from heavy overdraw, making it inefficient for data analysis. This paper presents a new visual abstraction scheme that employs a hierarchical multi-class sampling technique to show a feature-preserving simplification. To enhance the density contrast, the colors of multiple classes are optimized by taking the multi-class point distributions into account. We design a visual exploration system that supports visual inspection and quantitative analysis from different perspectives. We have applied our system to several challenging datasets, and the results demonstrate the efficiency of our approach.
Collapse
|
31
|
Zgraggen E, Zeleznik R, Drucker SM. PanoramicData: Data Analysis through Pen & Touch. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2014; 20:2112-2121. [PMID: 26356925 DOI: 10.1109/tvcg.2014.2346293] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Interactively exploring multidimensional datasets requires frequent switching among a range of distinct but inter-related tasks (e.g., producing different visuals based on different column sets, calculating new variables, and observing the interactions between sets of data). Existing approaches either target specific different problem domains (e.g., data-transformation or data-presentation) or expose only limited aspects of the general exploratory process; in either case, users are forced to adopt coping strategies (e.g., arranging windows or using undo as a mechanism for comparison instead of using side-by-side displays) to compensate for the lack of an integrated suite of exploratory tools. PanoramicData (PD) addresses these problems by unifying a comprehensive set of tools for visual data exploration into a hybrid pen and touch system designed to exploit the visualization advantages of large interactive displays. PD goes beyond just familiar visualizations by including direct UI support for data transformation and aggregation, filtering and brushing. Leveraging an unbounded whiteboard metaphor, users can combine these tools like building blocks to create detailed interactive visual display networks in which each visualization can act as a filter for others. Further, by operating directly on relational-databases, PD provides an approachable visual language that exposes a broad set of the expressive power of SQL including functionally complete logic filtering, computation of aggregates and natural table joins. To understand the implications of this novel approach, we conducted a formative user study with both data and visualization experts. The results indicated that the system provided a fluid and natural user experience for probing multi-dimensional data and was able to cover the full range of queries that the users wanted to pose.
Collapse
|