1
Rave H, Molchanov V, Linsen L. De-Cluttering Scatterplots With Integral Images. IEEE Transactions on Visualization and Computer Graphics 2025; 31:2114-2126. [PMID: 38526894 DOI: 10.1109/tvcg.2024.3381453] [Indexed: 03/27/2024]
Abstract
Scatterplots provide a visual representation of bivariate data (or 2D embeddings of multivariate data) that allows for effective analyses of data dependencies, clusters, trends, and outliers. Unfortunately, classical scatterplots suffer from scalability issues, since growing data sizes eventually lead to overplotting and visual clutter on a screen with a fixed resolution, which hinders the data analysis process. We propose an algorithm that compensates for irregular sample distributions by a smooth transformation of the scatterplot's visual domain. Our algorithm evaluates the scatterplot's density distribution to compute a regularization mapping based on integral images of the rasterized density function. The mapping preserves the samples' neighborhood relations. Few regularization iterations suffice to achieve a nearly uniform sample distribution that efficiently uses the available screen space. We further propose approaches to visually convey the transformation that was applied to the scatterplot and compare them in a user study. We present a novel parallel algorithm for fast GPU-based integral-image computation, which allows for integrating our de-cluttering approach into interactive visual data analysis systems.
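The integral image (summed-area table) named in this abstract stores, for every cell, the total of all cells above and to the left of it, so the mass inside any axis-aligned rectangle of the rasterized density can be read with four lookups. The Python sketch below only illustrates that general data structure under stated assumptions: points are pre-normalized to the unit square, the density is a plain count grid, and `equalize_x` is a hypothetical one-axis histogram-equalization mapping. It is not the authors' regularization mapping or their parallel GPU algorithm.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def rasterize(points: Sequence[Tuple[float, float]], width: int, height: int) -> Grid:
    """Count points per cell; coordinates are assumed to lie in [0, 1]."""
    grid = [[0.0] * width for _ in range(height)]
    for x, y in points:
        col = min(int(x * width), width - 1)    # clamp x == 1.0 into the last column
        row = min(int(y * height), height - 1)
        grid[row][col] += 1.0
    return grid


def integral_image(grid: Grid) -> Grid:
    """Summed-area table: sat[r][c] = sum of grid[0..r][0..c]."""
    h, w = len(grid), len(grid[0])
    sat = [[0.0] * w for _ in range(h)]
    for r in range(h):
        row_sum = 0.0
        for c in range(w):
            row_sum += grid[r][c]
            sat[r][c] = row_sum + (sat[r - 1][c] if r > 0 else 0.0)
    return sat


def rect_sum(sat: Grid, r0: int, c0: int, r1: int, c1: int) -> float:
    """Total of the inclusive cell rectangle [r0..r1] x [c0..c1], in O(1)."""
    total = sat[r1][c1]
    if r0 > 0:
        total -= sat[r0 - 1][c1]
    if c0 > 0:
        total -= sat[r1][c0 - 1]
    if r0 > 0 and c0 > 0:
        total += sat[r0 - 1][c0 - 1]
    return total


def equalize_x(sat: Grid) -> List[float]:
    """Hypothetical 1D mapping: each column's right edge moves to the fraction
    of all samples at or left of it, so dense columns widen and sparse ones shrink."""
    last_row = sat[-1]   # cumulative column totals over all rows
    total = last_row[-1]
    return [v / total for v in last_row]
```

For instance, `rect_sum(sat, 0, 0, len(sat) - 1, len(sat[0]) - 1)` returns the total point count, and any sub-rectangle costs the same four lookups regardless of its size.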
2
Li S, Liu G, Wei T, Jia S, Zhang J. EvoVis: A Visual Analytics Method to Understand the Labeling Iterations in Data Programming. IEEE Transactions on Visualization and Computer Graphics 2025; 31:1802-1817. [PMID: 38416617 DOI: 10.1109/tvcg.2024.3370654] [Indexed: 03/01/2024]
Abstract
Obtaining high-quality labeled training data poses a significant bottleneck in the domain of machine learning. Data programming has emerged as a new paradigm to address this issue by converting human knowledge into labeling functions (LFs) to quickly produce low-cost probabilistic labels. To ensure the quality of labeled data, data programmers commonly iterate LFs for many rounds until satisfactory performance is achieved. However, the challenge in understanding the labeling iterations stems from interpreting the intricate relationships between data programming elements, exacerbated by their many-to-many and directed characteristics, inconsistent formats, and the large scale of data typically involved in labeling tasks. These complexities may impede the evaluation of label quality, identification of areas for improvement, and the effective optimization of LFs for acquiring high-quality labeled data. In this article, we introduce EvoVis, a visual analytics method for multi-class text labeling tasks. It seamlessly integrates relationship analysis and temporal overview to display contextual and historical information on a single screen, aiding in explaining the labeling iterations in data programming. We assessed its utility and effectiveness through case studies and user studies. The results indicate that EvoVis can effectively assist data programmers in understanding labeling iterations and improving the quality of labeled data, as evidenced by an increase of 0.16 in the average F1 score when compared to the default analysis tool.
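Data programming, as described in this abstract, turns several labeling functions into probabilistic labels. The Python sketch below assumes keyword-matching labeling functions and combines their votes by simple normalized counting; frameworks such as Snorkel instead fit a label model that weighs each function by its estimated accuracy. The class names, keyword rules, and the macro-averaged F1 helper are hypothetical illustrations, not EvoVis's implementation or the evaluation protocol used in the article.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence

ABSTAIN = None
LabelingFunction = Callable[[str], Optional[str]]


def probabilistic_labels(texts: Sequence[str], lfs: Sequence[LabelingFunction],
                         classes: Sequence[str]) -> List[Dict[str, float]]:
    """Per text, normalize the non-abstaining LF votes into class probabilities.
    Texts on which every LF abstains get a uniform distribution."""
    result = []
    for text in texts:
        votes = Counter(lf(text) for lf in lfs)
        votes.pop(ABSTAIN, None)  # abstentions carry no evidence
        total = sum(votes.values())
        if total == 0:
            result.append({c: 1.0 / len(classes) for c in classes})
        else:
            result.append({c: votes.get(c, 0) / total for c in classes})
    return result


def macro_f1(gold: Sequence[str], pred: Sequence[str], classes: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical keyword LFs for a three-class news-topic task.
CLASSES = ["sports", "politics", "tech"]
LFS: List[LabelingFunction] = [
    lambda t: "sports" if "match" in t else ABSTAIN,
    lambda t: "politics" if "election" in t else ABSTAIN,
    lambda t: "tech" if "gpu" in t else ABSTAIN,
    lambda t: "sports" if "goal" in t else ABSTAIN,
]
```

In this simplified setting, one labeling iteration amounts to editing `LFS` and re-checking `macro_f1` against a small gold-labeled set.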
3
Chen X, Wang Y, Bao H, Lu K, Jo J, Fu CW, Fekete JD. Visualization-Driven Illumination for Density Plots. IEEE Transactions on Visualization and Computer Graphics 2025; 31:1631-1644. [PMID: 39527427 DOI: 10.1109/tvcg.2024.3495695] [Indexed: 11/16/2024]
Abstract
We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions, while avoiding artifacts in the density field's colors. When visualizing large and dense discrete point samples, scatterplots and dot density maps often suffer from overplotting, and density plots are commonly employed to provide aggregated views while revealing underlying structures. Yet, in such density plots, existing illumination models may produce color distortion and hide details in low-density regions, making it challenging to look up density values, compare them, and find outliers. The key novelty in this work includes (i) a visualization-driven illumination model that inherently supports density-plot-specific analysis tasks and (ii) a new image composition technique to reduce the interference between the image shading and the color-encoded density values. To demonstrate the effectiveness of our technique, we conducted a quantitative study, an empirical evaluation of our technique in a controlled study, and two case studies, exploring twelve datasets with up to two million data point samples.
4
Manz T, Lekschas F, Greene E, Finak G, Gehlenborg N. A General Framework for Comparing Embedding Visualizations Across Class-Label Hierarchies. IEEE Transactions on Visualization and Computer Graphics 2025; 31:283-293. [PMID: 39255153 PMCID: PMC11875997 DOI: 10.1109/tvcg.2024.3456370] [Indexed: 09/12/2024]
Abstract
Projecting high-dimensional vectors into two dimensions for visualization, known as embedding visualization, facilitates perceptual reasoning and interpretation. Comparing multiple embedding visualizations drives decision-making in many domains, but traditional comparison methods are limited by a reliance on direct point correspondences. This requirement precludes comparisons without point correspondences, such as two different datasets of annotated images, and fails to capture meaningful higher-level relationships among point groups. To address these shortcomings, we propose a general framework for comparing embedding visualizations based on shared class labels rather than individual points. Our approach partitions points into regions corresponding to three key class concepts (confusion, neighborhood, and relative size) to characterize intra- and inter-class relationships. Informed by a preliminary user study, we implemented our framework using perceptual neighborhood graphs to define these regions and introduced metrics to quantify each concept. We demonstrate the generality of our framework with usage scenarios from machine learning and single-cell biology, highlighting our metrics' ability to draw insightful comparisons across label hierarchies. To assess the effectiveness of our approach, we conducted an evaluation study with five machine learning researchers and six single-cell biologists using an interactive and scalable prototype built with Python, JavaScript, and Rust. Our metrics enable more structured comparisons through visual guidance and increased participants' confidence in their findings.
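Because this framework compares embeddings through shared class labels rather than point correspondences, a class-level metric can be computed independently in each embedding and the results compared afterwards. The Python sketch below uses a hypothetical k-nearest-neighbor "confusion" score (the average share of a point's k nearest neighbors that carry a different label) as a simple stand-in. It is not the paper's perceptual neighborhood graph or its published metrics, and its brute-force distance computation only suits small examples.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def class_confusion(points: Sequence[Point], labels: Sequence[str], k: int = 3) -> Dict[str, float]:
    """Per class, the mean fraction of each point's k nearest neighbours
    (Euclidean, brute force) that carry a different label.
    Assumes at least two points."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for i, p in enumerate(points):
        # Sort other points by distance; ties are broken by index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours = [j for _, j in dists[:k]]
        foreign = sum(1 for j in neighbours if labels[j] != labels[i])
        sums[labels[i]] += foreign / len(neighbours)
        counts[labels[i]] += 1
    return {c: sums[c] / counts[c] for c in counts}


def compare_embeddings(emb_a: Sequence[Point], labels_a: Sequence[str],
                       emb_b: Sequence[Point], labels_b: Sequence[str],
                       k: int = 3) -> Dict[str, float]:
    """Confusion difference (B minus A) for classes present in both embeddings.
    The embeddings need not share any points, only class labels."""
    conf_a = class_confusion(emb_a, labels_a, k)
    conf_b = class_confusion(emb_b, labels_b, k)
    return {c: conf_b[c] - conf_a[c] for c in conf_a.keys() & conf_b.keys()}
```

A positive difference for a class indicates that, under this hypothetical score, the class is more intermixed with others in embedding B than in embedding A.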
5
Tseng C, Wang AZ, Quadri GJ, Szafir DA. Shape It Up: An Empirically Grounded Approach for Designing Shape Palettes. IEEE Transactions on Visualization and Computer Graphics 2025; 31:349-359. [PMID: 39283798 DOI: 10.1109/tvcg.2024.3456385] [Indexed: 03/05/2025]
Abstract
Shape is commonly used to distinguish between categories in multi-class scatterplots. However, existing guidelines for choosing effective shape palettes rely largely on intuition and do not consider how these needs may change as the number of categories increases. Unlike color, shapes cannot be represented by a numerical space, making it difficult to propose general guidelines or design heuristics for using shape effectively. This paper presents a series of four experiments evaluating the efficiency of 39 shapes across three tasks: relative mean judgment, expert preference, and correlation estimation. Our results show that conventional means for reasoning about shapes, such as filled versus unfilled, are insufficient to inform effective palette design. Further, even expert palettes vary significantly in their use of shape and corresponding effectiveness. To support effective shape palette design, we developed a model based on pairwise relations between shapes in our experiments and the number of shapes required for a given design. We embed this model in a palette design tool to give designers agency over shape selection while incorporating empirical elements of perceptual performance captured in our study. Our model advances understanding of shape perception in visualization contexts and provides practical design guidelines that can help improve categorical data encodings.
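This abstract describes a model built on pairwise relations between shapes and on the number of shapes a design needs. As a hedged illustration of how such pairwise data could drive palette selection, the Python sketch below greedily adds the shape that maximizes its minimum pairwise score against the shapes already chosen. The score table, its [0, 1] range, and the greedy max-min rule are assumptions made for the example; they are not the authors' model or experimental data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def greedy_palette(shapes: List[str], score: Dict[FrozenSet[str], float], n: int) -> List[str]:
    """Pick n shapes, greedily maximizing the minimum pairwise score.
    score[frozenset({a, b})] is a hypothetical discriminability value."""
    if n < 2 or n > len(shapes):
        raise ValueError("n must be between 2 and the number of shapes")
    # Seed with the single most distinguishable pair.
    palette = list(max(combinations(shapes, 2), key=lambda ab: score[frozenset(ab)]))
    while len(palette) < n:
        candidates = [s for s in shapes if s not in palette]
        # The weakest pairing with the current palette decides each candidate.
        palette.append(max(candidates,
                           key=lambda s: min(score[frozenset((s, p))] for p in palette)))
    return palette


# Hypothetical pairwise scores, for illustration only.
SHAPES = ["circle", "square", "triangle", "cross"]
SCORES = {
    frozenset({"circle", "square"}): 0.3,
    frozenset({"circle", "triangle"}): 0.6,
    frozenset({"circle", "cross"}): 0.9,
    frozenset({"square", "triangle"}): 0.5,
    frozenset({"square", "cross"}): 0.4,
    frozenset({"triangle", "cross"}): 0.7,
}
```

A greedy max-min rule is cheap and easy to explain, but it is not guaranteed to find the best palette of a given size.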
6
Zhang Z, Yang F, Cheng R, Ma Y. ParetoTracker: Understanding Population Dynamics in Multi-Objective Evolutionary Algorithms Through Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 2025; 31:820-830. [PMID: 39255166 DOI: 10.1109/tvcg.2024.3456142] [Indexed: 09/12/2024]
Abstract
Multi-objective evolutionary algorithms (MOEAs) have emerged as powerful tools for solving complex optimization problems characterized by multiple, often conflicting, objectives. While advancements have been made in computational efficiency as well as diversity and convergence of solutions, a critical challenge persists: the internal evolutionary mechanisms are opaque to human users. Drawing upon the successes of explainable AI in explaining complex algorithms and models, we argue that the need to understand the underlying evolutionary operators and population dynamics within MOEAs aligns well with a visual analytics paradigm. This paper introduces ParetoTracker, a visual analytics framework designed to support the comprehension and inspection of population dynamics in the evolutionary processes of MOEAs. Informed by a preliminary literature review and expert interviews, the framework establishes a multi-level analysis scheme that supports user engagement and exploration ranging from examining overall trends in performance metrics to conducting fine-grained inspections of evolutionary operations. In contrast to conventional practices that require manual plotting of solutions for each generation, ParetoTracker facilitates the examination of temporal trends and dynamics across consecutive generations in an integrated visual interface. The effectiveness of the framework is demonstrated through case studies and expert interviews focused on widely adopted benchmark optimization problems.
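Inspecting population dynamics in an MOEA usually starts from which individuals are non-dominated in each generation. The Python sketch below shows the standard Pareto-dominance test for minimization objectives and a hypothetical helper, `front_sizes`, that tracks the size of the non-dominated set across generations, the kind of per-generation summary a tool like ParetoTracker could plot. It is an O(n^2) illustration, not the framework's implementation.

```python
from typing import List, Sequence

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Pareto dominance for minimization: a is no worse than b in every
    objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(population: Sequence[Objectives]) -> List[int]:
    """Indices of non-dominated individuals, via an O(n^2) pairwise check."""
    return [
        i for i, p in enumerate(population)
        if not any(dominates(q, p) for j, q in enumerate(population) if j != i)
    ]


def front_sizes(generations: Sequence[Sequence[Objectives]]) -> List[int]:
    """Hypothetical per-generation summary: size of the non-dominated set."""
    return [len(pareto_front(pop)) for pop in generations]
```

Identical objective vectors do not dominate each other under this definition, so duplicates on the front are all kept.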
7
Hilasaca GM, Marcilio-Jr WE, Eler DM, Martins RM, Paulovich FV. A Grid-Based Method for Removing Overlaps of Dimensionality Reduction Scatterplot Layouts. IEEE Transactions on Visualization and Computer Graphics 2024; 30:5733-5749. [PMID: 37647195 DOI: 10.1109/tvcg.2023.3309941] [Indexed: 09/01/2023]
Abstract
Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional datasets. Despite their popularity, such scatterplots suffer from occlusion, especially when informative glyphs are used to represent data instances, potentially obfuscating critical information for the analysis under execution. Different strategies have been devised to address this issue, either producing overlap-free layouts that lack the powerful capabilities of contemporary DR techniques in uncovering interesting data patterns or eliminating overlaps as a post-processing strategy. Despite the good results of post-processing techniques, most of the best methods typically expand or distort the scatterplot area, thus reducing glyphs' size (sometimes) to unreadable dimensions, defeating the purpose of removing overlaps. This article presents Distance Grid (DGrid), a novel post-processing strategy to remove overlaps from DR layouts that faithfully preserves the original layout's characteristics and bounds the minimum glyph sizes. We show that DGrid surpasses the state-of-the-art in overlap removal (through an extensive comparative evaluation considering multiple different metrics) while also being one of the fastest techniques, especially for large datasets. A user study with 51 participants also shows that DGrid is consistently ranked among the top techniques for preserving the original scatterplots' visual characteristics and the aesthetics of the final results.
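Grid-based overlap removal gives every glyph its own cell, so glyphs drawn at cell centers cannot overlap, while trying to keep the relative order of the original layout. The Python sketch below is a simplified, hypothetical assignment: it sorts points by y into the rows of a near-square grid, then sorts each row by x. It conveys the general idea but is not the DGrid algorithm.

```python
import math
from typing import Dict, Sequence, Tuple


def grid_assign(points: Sequence[Tuple[float, float]]) -> Dict[int, Tuple[int, int]]:
    """Assign each point index a unique (row, col) cell of a near-square grid.
    Row 0 holds the points with the smallest y; within a row, columns follow x."""
    n = len(points)
    cols = max(1, math.ceil(math.sqrt(n)))
    # Global order by y (x breaks ties) fixes which row each point lands in.
    order = sorted(range(n), key=lambda i: (points[i][1], points[i][0]))
    cells: Dict[int, Tuple[int, int]] = {}
    for start in range(0, n, cols):
        # Within one row, order by x (y breaks ties) to keep horizontal order.
        row_ids = sorted(order[start:start + cols], key=lambda i: (points[i][0], points[i][1]))
        for col, i in enumerate(row_ids):
            cells[i] = (start // cols, col)
    return cells
```

Unlike this sketch, methods evaluated in the article also aim to preserve the layout's empty regions and overall shape, which a dense row-by-row packing ignores.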
8
Sharma M, Masood TB, Thygesen SS, Linares M, Hotz I, Natarajan V. Continuous Scatterplot Operators for Bivariate Analysis and Study of Electronic Transitions. IEEE Transactions on Visualization and Computer Graphics 2024; 30:3532-3544. [PMID: 37021886 DOI: 10.1109/tvcg.2023.3237768] [Indexed: 06/19/2023]
Abstract
Electronic transitions in molecules, driven by the absorption or emission of light, are complex quantum mechanical processes. Their study plays an important role in the design of novel materials. A common yet challenging task in such studies is to determine the nature of electronic transitions, namely which subgroups of the molecule are involved in the transition by donating or accepting electrons, followed by an investigation of the variation in the donor-acceptor behavior for different transitions or conformations of the molecules. In this article, we present a novel approach for the analysis of a bivariate field and show its applicability to the study of electronic transitions. This approach is based on two novel operators, the continuous scatterplot (CSP) lens operator and the CSP peel operator, that enable effective visual analysis of bivariate fields. Both operators can be applied independently or together to facilitate analysis. The operators motivate the design of control polygon inputs to extract fiber surfaces of interest in the spatial domain. The CSPs are annotated with a quantitative measure to further support the visual analysis. We study different molecular systems and demonstrate how the CSP peel and CSP lens operators help identify and study donor and acceptor characteristics in molecular systems.
9
Guo Q, Chen Y. The Effects of Visual Complexity and Task Difficulty on the Comprehensive Cognitive Efficiency of Cluster Separation Tasks. Behav Sci (Basel) 2023; 13:827. [PMID: 37887477 PMCID: PMC10604666 DOI: 10.3390/bs13100827] [Received: 05/09/2023] [Revised: 09/24/2023] [Accepted: 09/28/2023] [Indexed: 10/28/2023]
Abstract
Cluster separation is required to perform multi-class visual statistics tasks and plays an essential role in information processing in visualization. This cognitive behavioral study investigated the cluster separation task and the effects of visual complexity and task difficulty on it. A total of 32 college students (18 men and 14 women, with ages ranging from 18 to 25 years; mean = 21.2, SD = 3.9) participated in this study. The observers' average response accuracy, reaction time, mental effort, and comprehensive cognitive efficiency were measured as functions of three levels of visual complexity and task difficulty. The levels of visual complexity and task difficulty were quantified via an optimized complexity evaluation method and a discrimination judgment task, respectively. The results showed that visual complexity and task difficulty significantly influenced comprehensive cognitive efficiency. Moreover, a strong interaction was observed between the effects of visual complexity and task difficulty. However, there was no positive linear relationship between mental effort and complexity level. Furthermore, two-dimensional color × shape redundant coding showed higher cognitive efficiency at low task difficulty levels. In contrast, one-dimensional color encoding showed higher cognitive efficiency at higher task difficulty levels. The findings of this study provide valuable insights for designing more efficient and user-friendly visualizations.
Affiliation(s)
- Qi Guo, School of Art Design and Media, East China University of Science and Technology, Shanghai 200030, China
10
Quadri GJ, Nieves JA, Wiernik BM, Rosen P. Automatic Scatterplot Design Optimization for Clustering Identification. IEEE Transactions on Visualization and Computer Graphics 2023; 29:4312-4327. [PMID: 35816525 DOI: 10.1109/tvcg.2022.3189883] [Indexed: 06/15/2023]
Abstract
Scatterplots are among the most widely used visualization techniques. Compelling scatterplot visualizations improve understanding of data by leveraging visual perception to boost awareness when performing specific visual analytic tasks. Design choices in scatterplots, such as graphical encodings or data aspects, can directly impact decision-making quality for low-level tasks like clustering. Hence, constructing frameworks that consider both the perceptions of the visual encodings and the task being performed enables optimizing visualizations to maximize efficacy. In this article, we propose an automatic tool to optimize the design factors of scatterplots to reveal the most salient cluster structure. Our approach leverages the merge tree data structure to identify the clusters and optimize the choice of subsampling algorithm, sampling rate, marker size, and marker opacity used to generate a scatterplot image. We validate our approach with user and case studies that show it efficiently provides high-quality scatterplot designs from a large parameter space.
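A merge tree records how the peaks of a density merge as a threshold is lowered. The height difference between a peak and the level at which it merges into a taller one (its persistence) is a common measure of how salient a cluster is. The Python sketch below computes this 0-dimensional persistence for a 1D density profile (for example, a histogram) using union-find and the elder rule. Restricting to one dimension and to a histogram input are simplifications for illustration; the authors' pipeline, including how they optimize subsampling, marker size, and opacity, is not reproduced here.

```python
from typing import List, Sequence, Tuple


def peak_persistence(density: Sequence[float]) -> List[Tuple[int, float]]:
    """0-dimensional persistence of the superlevel-set merge tree of a 1D signal.
    Returns (peak_index, persistence) pairs, most persistent first; the global
    maximum gets persistence float('inf')."""
    n = len(density)
    parent = list(range(n))  # union-find forest; each root is its component's peak
    active = [False] * n

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs: List[Tuple[int, float]] = []
    # Sweep from the highest sample downward (growing superlevel sets).
    for i in sorted(range(n), key=lambda k: -density[k]):
        active[i] = True
        for j in (i - 1, i + 1):
            if not (0 <= j < n and active[j]):
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            if ri == i:
                # i has not joined a component yet: attach it to the higher neighbour.
                parent[i] = rj
                continue
            # Two components meet at i; the one with the lower peak dies here.
            old, young = (ri, rj) if density[ri] >= density[rj] else (rj, ri)
            pairs.append((young, density[young] - density[i]))
            parent[young] = old
    if n:
        pairs.append((find(0), float("inf")))
    return sorted(pairs, key=lambda p: -p[1])
```

Peaks whose persistence exceeds a chosen threshold could then be counted as clusters; that threshold is a free, hypothetical parameter of this sketch.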
11
Piccolotto N, Bögl M, Miksch S. Visual Parameter Space Exploration in Time and Space. Computer Graphics Forum: Journal of the European Association for Computer Graphics 2023; 42:e14785. [PMID: 38505647 PMCID: PMC10947302 DOI: 10.1111/cgf.14785] [Indexed: 03/21/2024]
Abstract
Computational models, such as simulations, are central to a wide range of fields in science and industry. Those models take input parameters and produce some output. To fully exploit their utility, relations between parameters and outputs must be understood. These include, for example, which parameter setting produces the best result (optimization) or which ranges of parameter settings produce a wide variety of results (sensitivity). Such tasks are often difficult to achieve for various reasons, for example, the size of the parameter space, and can be supported with visual analytics. In this paper, we survey visual parameter space exploration (VPSE) systems involving spatial and temporal data. We focus on interactive visualizations and user interfaces. Through thematic analysis of the surveyed papers, we identify common workflow steps and approaches to support them. We also identify topics for future work that will help enable VPSE on a greater variety of computational models.
Affiliation(s)
- Nikolaus Piccolotto, TU Wien, Institute of Visual Computing and Human-Centered Technology, Wien, Austria
- Markus Bögl, TU Wien, Institute of Visual Computing and Human-Centered Technology, Wien, Austria
- Silvia Miksch, TU Wien, Institute of Visual Computing and Human-Centered Technology, Wien, Austria
12
Gleicher M, Riveiro M, von Landesberger T, Deussen O, Chang R, Gillman C, Rhyne TM. A Problem Space for Designing Visualizations. IEEE Computer Graphics and Applications 2023; 43:111-120. [PMID: 37432777 DOI: 10.1109/mcg.2023.3267213] [Indexed: 07/13/2023]
Abstract
Visualization researchers and visualization professionals seek appropriate abstractions of visualization requirements that permit considering visualization solutions independently from specific problems. Abstractions can help us design, analyze, organize, and evaluate the things we create. The literature has many task structures (taxonomies, typologies, etc.), design spaces, and related "frameworks" that provide abstractions of the problems a visualization is meant to address. In this Visualization Viewpoints article, we introduce a different one, a problem space that complements existing frameworks by focusing on the needs that a visualization is meant to solve. We believe it provides a valuable conceptual tool for designing and discussing visualizations.
13
Eckelt K, Hinterreiter A, Adelberger P, Walchshofer C, Dhanoa V, Humer C, Heckmann M, Steinparz C, Streit M. Visual Exploration of Relationships and Structure in Low-Dimensional Embeddings. IEEE Transactions on Visualization and Computer Graphics 2023; 29:3312-3326. [PMID: 35254984 DOI: 10.1109/tvcg.2022.3156760] [Indexed: 05/27/2023]
Abstract
In this work, we propose an interactive visual approach for the exploration and formation of structural relationships in embeddings of high-dimensional data. These structural relationships, such as item sequences, associations of items with groups, and hierarchies between groups of items, are defining properties of many real-world datasets. Nevertheless, most existing methods for the visual exploration of embeddings treat these structures as second-class citizens or do not take them into account at all. In our proposed analysis workflow, users explore enriched scatterplots of the embedding, in which relationships between items and/or groups are visually highlighted. The original high-dimensional data for single items, groups of items, or differences between connected items and groups are accessible through additional summary visualizations. We carefully tailored these summary and difference visualizations to the various data types and semantic contexts. During their exploratory analysis, users can externalize their insights by setting up additional groups and relationships between items and/or groups. We demonstrate the utility and potential impact of our approach by means of two use cases and multiple examples from various domains.
14
Li Z, Shi R, Liu Y, Long S, Guo Z, Jia S, Zhang J. Dual Space Coupling Model Guided Overlap-Free Scatterplot. IEEE Transactions on Visualization and Computer Graphics 2023; 29:657-667. [PMID: 36260569 DOI: 10.1109/tvcg.2022.3209459] [Indexed: 06/16/2023]
Abstract
Overdraw in scatterplots seriously interferes with visual tasks. Existing methods, such as data sampling, node dispersion, subspace mapping, and visual abstraction, cannot guarantee the correspondence and consistency between the data points that reflect the intrinsic original data distribution and the corresponding visual units that reveal the presented data distribution, thus failing to obtain an overlap-free scatterplot with an unbiased and lossless data distribution. A dual space coupling model is proposed in this paper to represent the complex bilateral relationship between data space and visual space theoretically and analytically. Under the guidance of the model, an overlap-free scatterplot method is developed through the integration of the following: a geometry-based data transformation algorithm, namely DistributionTranscriptor; an efficient spatial mutual exclusion guided view transformation algorithm, namely PolarPacking; and an overlap-free oriented visual encoding configuration model and a radius adjustment tool, namely frdraw. Our method can ensure complete and accurate information transfer between the two spaces, maintaining consistency between the newly created scatterplot and the original data distribution on global and local features. Quantitative evaluation shows substantial gains in computational efficiency compared with state-of-the-art methods. Three applications involving pattern enhancement, interaction improvement, and overdraw mitigation of trajectory visualization demonstrate the broad prospects of our method.
15
Li S, Yu J, Li M, Liu L, Zhang XL, Yuan X. A Framework for Multiclass Contour Visualization. IEEE Transactions on Visualization and Computer Graphics 2023; 29:353-362. [PMID: 36194705 DOI: 10.1109/tvcg.2022.3209482] [Indexed: 06/16/2023]
Abstract
Multiclass contour visualization is often used to interpret complex data attributes in fields such as weather forecasting, computational fluid dynamics, and artificial intelligence. However, effective and accurate representations of underlying data patterns and correlations can be challenging in multiclass contour visualization, primarily due to the inevitable visual clutter and occlusions when the number of classes is large. To address this issue, visualization designers must carefully choose design parameters to make visualizations more comprehensible. With this goal in mind, we propose a framework for multiclass contour visualization. The framework has two components: a set of four visualization design parameters, which are developed based on an extensive review of literature on contour visualization, and a declarative domain-specific language (DSL) for creating multiclass contour renderings, which enables fast exploration of those design parameters. A task-oriented user study was conducted to assess how those design parameters affect users' interpretations of real-world data. The study results offer suggestions on how to choose values for the design parameters in multiclass contour visualization.
16
Sarma A, Guo S, Hoffswell J, Rossi R, Du F, Koh E, Kay M. Evaluating the Use of Uncertainty Visualisations for Imputations of Data Missing At Random in Scatterplots. IEEE Transactions on Visualization and Computer Graphics 2023; 29:602-612. [PMID: 36166557 DOI: 10.1109/tvcg.2022.3209348] [Indexed: 06/16/2023]
Abstract
Most real-world datasets contain missing values, yet most exploratory data analysis (EDA) systems only support visualising complete cases. This omission may lead the user to biased analyses and insights. Imputation techniques can help estimate the value of a missing data point, but they introduce additional uncertainty. In this work, we investigate the effects of visualising imputed values in charts using different ways of representing data imputations and imputation uncertainty: no imputation, mean, 95% confidence intervals, probability density plots, gradient intervals, and hypothetical outcome plots. We focus on scatterplots, a commonly used chart type, and conduct a crowdsourced study with 202 participants. We measure users' bias and precision in performing two tasks (estimating averages and detecting trends) and their self-reported confidence in performing these tasks. Our results suggest that, when estimating averages, uncertainty representations may reduce bias but at the cost of decreasing precision. When estimating trends, only hypothetical outcome plots may lead to a small probability of reducing bias while increasing precision. Participants in every uncertainty representation condition were less certain about their responses compared to the baseline. The findings point towards potential trade-offs in using uncertainty encodings for datasets with a large number of missing values. This paper and the associated analysis materials are available at: https://osf.io/q4y5r/.
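Two of the representations studied in this abstract, mean imputation with a 95% confidence interval and hypothetical outcome plots, can be sketched in a few lines. The Python code below marks missing entries as `None`, uses a normal approximation (mean ± 1.96 standard errors) for the interval, and draws each hypothetical outcome from a normal distribution fitted to the observed values. These modeling choices are illustrative assumptions, not the imputation method used in the study.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def mean_impute(values: Sequence[Optional[float]]) -> Tuple[List[float], Tuple[float, float]]:
    """Fill missing entries (None) with the observed mean and return a
    normal-approximation 95% confidence interval for that mean.
    Needs at least two observed values (statistics.stdev requirement)."""
    observed = [v for v in values if v is not None]
    m = statistics.mean(observed)
    half_width = 1.96 * statistics.stdev(observed) / len(observed) ** 0.5
    filled = [m if v is None else v for v in values]
    return filled, (m - half_width, m + half_width)


def hypothetical_outcomes(values: Sequence[Optional[float]], draws: int,
                          seed: int = 0) -> List[List[float]]:
    """One completed dataset per draw (one animation frame in a hypothetical
    outcome plot); each missing value is sampled from a normal distribution
    fitted to the observed values."""
    rng = random.Random(seed)  # seeded so the frames are reproducible
    observed = [v for v in values if v is not None]
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    return [[rng.gauss(mu, sd) if v is None else v for v in values] for _ in range(draws)]
```

The interval describes uncertainty about the mean, whereas each hypothetical outcome samples a plausible individual value, which is one reason the two encodings can lead viewers to different judgments.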
17
Quadri GJ, Rosen P. A Survey of Perception-Based Visualization Studies by Task. IEEE Transactions on Visualization and Computer Graphics 2022; 28:5026-5048. [PMID: 34283717 DOI: 10.1109/tvcg.2021.3098240] [Indexed: 06/13/2023]
Abstract
Knowledge of human perception has long been incorporated into visualizations to enhance their quality and effectiveness. The last decade, in particular, has seen an increase in perception-based visualization research studies. Despite this recent progress, the visualization community lacks a comprehensive guide to contextualize the results. In this report, we provide a systematic and comprehensive review of research studies on perception related to visualization. This survey reviews perception-focused visualization studies since 1980 and summarizes their research developments focusing on low-level tasks, further breaking techniques down by visual encoding and visualization type. In particular, we focus on how perception is used to evaluate the effectiveness of visualizations, to help readers understand and apply the principles of perception to their visualization designs through a task-optimized approach. We conclude our report with a summary of the weaknesses and open research questions in the area.
18
Pandey A, Srinivasan A, Setlur V. MEDLEY: Intent-based Recommendations to Support Dashboard Composition. IEEE Transactions on Visualization and Computer Graphics 2022; PP:1135-1145. [PMID: 36194711 DOI: 10.1109/tvcg.2022.3209421] [Indexed: 06/16/2023]
Abstract
Despite the ever-growing popularity of dashboards across a wide range of domains, authoring them remains a tedious and complex process. Current tools offer considerable support for creating individual visualizations but provide limited support for discovering groups of visualizations that can be collectively useful for composing analytic dashboards. To address this problem, we present MEDLEY, a mixed-initiative interface that assists in dashboard composition by recommending dashboard collections (i.e., logically grouped sets of views and filtering widgets) that map to specific analytical intents. Users can specify dashboard intents (namely, measure analysis, change analysis, category analysis, or distribution analysis) explicitly through an input panel in the interface or implicitly by selecting data attributes and views of interest. The system recommends collections based on these analytic intents, from which views and widgets can be selected to compose a variety of dashboards. MEDLEY also provides a lightweight direct manipulation interface for configuring interactions between views in a dashboard. Based on a study with 13 participants performing both targeted and open-ended tasks, we discuss how MEDLEY's recommendations guide dashboard composition and facilitate different user workflows. Observations from the study identify potential directions for future work, including combining manual view specification with dashboard recommendations and designing natural language interfaces for dashboard authoring.
|
19
|
Shen L, Shen E, Tai Z, Xu Y, Dong J, Wang J. Visual Data Analysis with Task-Based Recommendations. DATA SCIENCE AND ENGINEERING 2022; 7:354-369. [PMID: 36117680 PMCID: PMC9470074 DOI: 10.1007/s41019-022-00195-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 07/25/2022] [Accepted: 08/28/2022] [Indexed: 06/15/2023]
Abstract
General visualization recommendation systems typically make design decisions for the dataset automatically. However, most of them can only prune meaningless visualizations and fail to recommend targeted results. This paper contributes TaskVis, a task-oriented visualization recommendation system that allows users to select their tasks precisely in the interface. We first compile a task base of 18 classical analytic tasks from a survey of both academia and industry. On this basis, we maintain a rule base that extends empirical wisdom with our targeted modeling of the analytic tasks. Our rule-based approach then enumerates all candidate visualizations through answer set programming. The generated charts can then be ranked by four ranking schemes. Furthermore, we introduce a task-based combination recommendation strategy that leverages a set of visualizations to collectively give a brief overview of the dataset. Finally, we evaluate TaskVis through a series of use cases and a user study.
Affiliation(s)
- Yihao Xu
- Tsinghua University, Beijing, China
|
20
|
What can scatterplots teach us about doing data science better? INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00362-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
21
|
Hong MH, Witt JK, Szafir DA. The Weighted Average Illusion: Biases in Perceived Mean Position in Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:987-997. [PMID: 34596541 DOI: 10.1109/tvcg.2021.3114783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Scatterplots can encode a third dimension by using additional channels such as size or color (e.g., bubble charts). We explore a potential misinterpretation of trivariate scatterplots, which we call the weighted average illusion, in which the locations of larger and darker points are given more weight in estimates of the x- and y-means. This systematic bias is sensitive to a designer's choice of the size or lightness ranges mapped onto the data. In this paper, we quantify this bias across varying size/lightness ranges and data correlations. We discuss possible explanations for its cause by measuring the attention given to individual data points using a vision science technique called the centroid method. Our work illustrates how ensemble processing mechanisms and mental shortcuts can significantly distort visual summaries of data and can lead to misjudgments such as the weighted average illusion.
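As a rough illustration of the bias described in this abstract, the Python sketch below contrasts the true (unweighted) mean position of a set of points with a size-weighted mean, which is what a viewer who gives larger marks proportionally more weight would report. The proportional weighting and the function names are assumptions made for illustration; they are not the model measured in the paper.

```python
from typing import Sequence, Tuple


def mean_position(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Unweighted mean of the point positions (the statistically correct answer)."""
    n = len(xs)
    return sum(xs) / n, sum(ys) / n


def weighted_mean_position(xs: Sequence[float], ys: Sequence[float],
                           weights: Sequence[float]) -> Tuple[float, float]:
    """Mean position in which each point counts in proportion to its weight
    (e.g., its mark size). A hypothetical model of a viewer who over-weights
    large marks, not the paper's fitted model."""
    total = sum(weights)
    wx = sum(w * x for w, x in zip(weights, xs)) / total
    wy = sum(w * y for w, y in zip(weights, ys)) / total
    return wx, wy


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 4.0]
sizes = [1.0, 1.0, 1.0, 5.0]  # the last point is drawn much larger

print(mean_position(xs, ys))                  # (2.5, 2.5)
print(weighted_mean_position(xs, ys, sizes))  # (3.25, 3.25): pulled toward the big mark
```

Shrinking the size range (e.g., a largest weight of 2.0 instead of 5.0) narrows the gap between the two estimates, in the spirit of the abstract's point that the bias depends on the chosen size or lightness range.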
|
22
|
No matter how you mark the points on the fever curve – threatening shapes do not add to threat of climate change. CURRENT PSYCHOLOGY 2021. [DOI: 10.1007/s12144-021-02553-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Graphs have become an increasingly important means of representing data, for instance, when communicating data on climate change. However, graph characteristics might significantly affect graph comprehension. The goal of the present work was to test whether the marking forms usually depicted on line graphs can affect graph evaluation. As past work suggests that triangular forms might be related to threat, we compared the effects of five marking forms (triangles, circles, squares, rhombi, and asterisks) on subjective assessments. Participants in Study 1 (N = 314) received 5 different line graphs about climate change, each using one of the 5 marking forms. In Study 1, threat and arousal ratings of the graphs with triangular markers were not higher than those of graphs with the other marking symbols. Participants in Study 2 (N = 279) received the same graphs, but without labels, and did rate the graphs with triangle point markers as more threatening. Testing whether local rather than global spatial attention would lead to an effect of marker shape in climate graphs, Study 3 (N = 307) found that a task requiring participants to process a specific data point on the graph (rather than just the line graph as a whole) did not produce an effect either. These results suggest that marking symbols can in principle affect threat and arousal ratings, but not in the context of climate change. Hence, in graphs on climate change, the choice of point markers need not take potential side effects on threat and arousal into account. Such effects seem to be restricted to graphs in which form aspects face less competition from content when judgments are made.
|
23
|
Construct boundaries and place labels for multi-class scatterplots. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-021-00791-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
24
|
Wang J, Cai X, Su J, Liao Y, Wu Y. What makes a scatterplot hard to comprehend: data size and pattern salience matter. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-021-00778-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
|
25
|
Reimann D, Blech C, Ram N, Gaschler R. Visual Model Fit Estimation in Scatterplots: Influence of Amount and Decentering of Noise. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3834-3838. [PMID: 33444142 DOI: 10.1109/tvcg.2021.3051853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Scatterplots that include a model enable visual estimation of model-data fit. In Experiment 1 (N = 62), we quantified the influence of noise level on subjective misfit and found a negatively accelerated relationship. Experiment 2 showed that decentering of noise only mildly reduced fit ratings. The results have consequences for model evaluation.
|
26
|
Devakumar A, Modh J, Saket B, Baumer EPS, De Choudhury M. A Review on Strategies for Data Collection, Reflection, and Communication in Eating Disorder Apps. PROCEEDINGS OF THE SIGCHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS. CHI CONFERENCE 2021; 2021:547. [PMID: 35615054 PMCID: PMC9128313 DOI: 10.1145/3411764.3445670] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 08/22/2023]
Abstract
Eating disorders (EDs) have the highest mortality of any mental illness. Today, mobile health apps offer ED patients a promising means of managing their condition. Apps enable users to monitor their eating habits, thoughts, and feelings, and offer analytic insights for behavior change. However, scholars have critiqued the clinical validity of these apps, and their underlying design principles are not well understood. Through a review of 34 ED apps, we uncovered 11 different types of data that ED apps collect and 9 strategies they employ to support collection and reflection. Drawing upon personal health informatics and visualization frameworks, we found that most apps did not adhere to best practices on what data should be collected from and reflected back to users and how, or on how data-driven insights should be communicated. Our review offers suggestions for improving the design of ED apps so that they can be useful and meaningful in ED recovery.
Affiliation(s)
- Jay Modh
- Georgia Institute of Technology, Atlanta, GA, USA
|
27
|
Reipschlager P, Flemisch T, Dachselt R. Personal Augmented Reality for Information Visualization on Large Interactive Displays. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1182-1192. [PMID: 33052863 DOI: 10.1109/tvcg.2020.3030460] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
In this work, we propose combining large interactive displays with personal head-mounted Augmented Reality (AR) for information visualization to facilitate data exploration and analysis. Even though large displays provide more display space, they pose challenges with regard to perception, effective multi-user support, and managing data density and complexity. To address these issues and illustrate our proposed setup, we contribute an extensive design space. First, it covers the spatial alignment of the display, visualizations, and objects in AR space. Next, we discuss which parts of a visualization can be augmented. Finally, we analyze how AR can be used to display personal views that show additional information and minimize mutual disturbance between data analysts. Based on this conceptual foundation, we present a number of exemplary techniques for extending visualizations with AR and discuss their relation to our design space. We further describe how these techniques address typical visualization problems that we identified during our literature research. To examine our concepts, we introduce a generic AR visualization framework as well as a prototype implementing several example techniques. To demonstrate their potential, we further present a use case walkthrough in which we analyze a movie data set. From these experiences, we conclude that the contributed techniques can be useful for exploring and understanding multivariate data. We are convinced that extending large displays with AR for information visualization has great potential for data analysis and sense-making.
|
28
|
Yang Y, Cordeil M, Beyer J, Dwyer T, Marriott K, Pfister H. Embodied Navigation in Immersive Abstract Data Visualization: Is Overview+Detail or Zooming Better for 3D Scatterplots? IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1214-1224. [PMID: 33048730 DOI: 10.1109/tvcg.2020.3030427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Abstract data has no natural scale, so interactive data visualizations must provide techniques that allow the user to choose their viewpoint and scale. Such techniques are well established in desktop visualization tools. The two most common techniques are zoom+pan and overview+detail. However, how best to enable the analyst to navigate and view abstract data at different levels of scale in immersive environments has not previously been studied. We report the findings of the first systematic study of immersive navigation techniques for 3D scatterplots. We tested four conditions that represent our best attempt to adapt standard 2D navigation techniques to data visualization in an immersive environment while still providing standard immersive navigation techniques through physical movement and teleportation. We compared a room-sized visualization with a zooming interface, each with and without an overview. We find significant differences in participants' response times and accuracy for a number of standard visual analysis tasks. Both zoom and overview provide benefits over standard locomotion support alone (i.e., physical movement and pointer teleportation). However, which variation is superior depends on the task. We obtain a more nuanced understanding of the results by analyzing them in terms of a time-cost model for the different components of navigation: way-finding, travel, number of travel steps, and context switching.
|
29
|
Tao W, Hou X, Sah A, Battle L, Chang R, Stonebraker M. Kyrix-S: Authoring Scalable Scatterplot Visualizations of Big Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:401-411. [PMID: 33048700 DOI: 10.1109/tvcg.2020.3030372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Static scatterplots often suffer from the overdraw problem on big datasets, where object overlap causes undesirable visual clutter. Zooming can help alleviate this problem: with multiple zoom levels, more screen real estate is available, allowing objects to be placed less densely. We call this type of visualization a scalable scatterplot visualization, or SSV for short. Despite the potential of SSVs, existing systems and toolkits fall short in supporting the authoring of SSVs due to three limitations. First, many systems have limited scalability, assuming that data fit in the memory of one computer. Second, too much developer work is required, e.g., writing custom code to generate mark layouts or render objects. Third, many systems focus on only a small subset of the SSV design space (e.g., supporting a specific type of visual mark). To address these limitations, we have developed Kyrix-S, a system for easy authoring of SSVs at scale. Based on an existing survey of scatterplot tasks and designs, Kyrix-S derives a declarative grammar that enables the specification of a variety of SSVs in a few tens of lines of code. The declarative grammar is supported by a distributed layout algorithm that automatically places visual marks onto zoom levels. We store data in a multi-node database and use multi-node spatial indexes to achieve interactive browsing of large SSVs. Extensive experiments show that 1) Kyrix-S enables interactive browsing of SSVs with billions of objects, with response times under 500 ms, and 2) Kyrix-S achieves a 4X-9X reduction in specification compared to a state-of-the-art authoring system.
|
30
|
Yuan J, Xiang S, Xia J, Yu L, Liu S. Evaluation of Sampling Methods for Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1720-1730. [PMID: 33074820 DOI: 10.1109/tvcg.2020.3030432] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Given a scatterplot with tens of thousands of points or more, a natural question is which sampling method should be used to create a small but "good" scatterplot that provides a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods to preserve the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling perform comparably to the three multi-class sampling strategies in preserving class density; 3) outlier-biased density-based sampling, recursive subdivision-based sampling, and blue noise sampling perform best at keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
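To make the contrast between two of the evaluated strategy families concrete, the Python sketch below shows plain random sampling next to a simple dart-throwing approximation of blue-noise (Poisson-disk) sampling. These are textbook versions written for illustration; the radius, the fixed seeds, and the function names are assumptions, and the study's implementations may differ.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def random_sample(points: List[Point], k: int, seed: int = 0) -> List[Point]:
    """Uniform random sampling without replacement: every point is equally likely
    to be kept, so dense regions stay proportionally dense."""
    rng = random.Random(seed)
    return rng.sample(points, min(k, len(points)))


def blue_noise_sample(points: List[Point], radius: float, seed: int = 0) -> List[Point]:
    """Dart-throwing approximation of blue-noise sampling: visit the points in
    random order and keep one only if it lies at least `radius` away from every
    point kept so far, which yields an even spatial spread."""
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)
    kept: List[Point] = []
    for p in order:
        if all(math.dist(p, q) >= radius for q in kept):
            kept.append(p)
    return kept


# A dense 20 x 20 grid of points with spacing 0.1.
grid = [(i * 0.1, j * 0.1) for i in range(20) for j in range(20)]
print(len(random_sample(grid, 50)))        # 50
print(len(blue_noise_sample(grid, 0.35)))  # a count well below 400, evenly spread
```

The minimum-distance rule is what lets blue-noise sampling keep the outline of a point cloud while thinning its dense core, whereas random sampling keeps relative densities but can leave sparse regions unrepresented.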
|
31
|
Zhang D, Sarvghad A, Miklau G. Investigating Visual Analysis of Differentially Private Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1786-1796. [PMID: 33074813 DOI: 10.1109/tvcg.2020.3030369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Differential Privacy (DP) is an emerging privacy model with increasing popularity in many domains. It works by adding carefully calibrated noise to data, which blurs information about individuals while preserving overall statistics about the population. Theoretically, it is possible to produce robust privacy-preserving visualizations by plotting differentially private data. However, noise-induced data perturbations can alter visual patterns and impact the utility of a private visualization. We still know little about the challenges and opportunities for visual data exploration and analysis using private visualizations. As a first step towards filling this gap, we conducted a crowdsourced experiment, measuring participants' performance under three levels of privacy (high, low, non-private) for combinations of eight analysis tasks and four visualization types (bar chart, pie chart, line chart, scatter plot). Our findings show that participants' accuracy on summary tasks (e.g., finding clusters in data) was higher than on value tasks (e.g., retrieving a certain value). We also found that under DP, pie charts and line charts offer similar or better accuracy than bar charts. In this work, we contribute the results of our empirical study investigating the task-based effectiveness of basic private visualizations, a dichotomous model for defining and measuring user success in performing visual analysis tasks under DP, and a set of distribution metrics for tuning the noise injection to improve the utility of private visualizations.
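The abstract only says that the noise is "carefully calibrated". One standard way to do this is the Laplace mechanism, which adds Laplace noise with scale sensitivity/epsilon to each released value. The Python sketch below applies it to the bar heights of a histogram; the mechanism, the epsilon values, and the function names are illustrative assumptions, not necessarily what the study used.

```python
import random
from typing import Dict, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale): the difference of two i.i.d.
    exponential variables with mean `scale` is Laplace-distributed."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_histogram(counts: Dict[str, int], epsilon: float,
                      sensitivity: float = 1.0,
                      seed: Optional[int] = None) -> Dict[str, float]:
    """Laplace mechanism applied to each bin of a histogram.

    `sensitivity` = 1 assumes each individual contributes to exactly one bin, so
    adding or removing one person changes a single count by at most 1. A smaller
    `epsilon` means stronger privacy and therefore larger noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {label: count + laplace_noise(scale, rng) for label, count in counts.items()}


counts = {"A": 120, "B": 45, "C": 8}
print(private_histogram(counts, epsilon=1.0, seed=7))   # mild noise
print(private_histogram(counts, epsilon=0.05, seed=7))  # heavy noise; small bars may swap order
```

The second call hints at why value tasks (reading off a specific bar) could suffer more under strong privacy than summary tasks: large relative noise on small bars changes individual values far more than it changes the overall shape.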
|
32
|
Quadri GJ, Rosen P. Modeling the Influence of Visual Density on Cluster Perception in Scatterplots Using Topology. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1829-1839. [PMID: 33048695 DOI: 10.1109/tvcg.2020.3030365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Scatterplots are used for a variety of visual analytics tasks, including cluster identification, and the visual encodings used in a scatterplot play a decisive role in how visually separated clusters appear. For visualization designers, optimizing the visual encodings is crucial to maximizing the clarity of data. This requires accurately modeling human perception of cluster separation, which remains challenging. We present a multi-stage user study focusing on four factors that influence cluster identification in scatterplots: the distribution size of clusters, the number of points, the size of points, and the opacity of points. From these parameters, we constructed two models, a distance-based model and a density-based model, using the merge tree data structure from Topological Data Analysis. Our analysis demonstrates that these factors play an important role in the number of clusters perceived, and it verifies that the distance-based and density-based models can reasonably estimate the number of clusters a user observes. Finally, we demonstrate how these models can be used to optimize visual encodings on real-world data.
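A merge tree records how connected components of the data join as a scale parameter grows. As a loose illustration of a distance-based cluster count (not the authors' fitted model), the Python sketch below joins every pair of points closer than a threshold and counts the resulting components with union-find, which corresponds to cutting the single-linkage (0-dimensional) merge tree at that threshold. How the threshold should depend on point size, opacity, or cluster spread is what the paper models; here it is a hypothetical free parameter.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def count_clusters(points: Sequence[Point], threshold: float) -> int:
    """Count connected components when every pair of points closer than
    `threshold` is joined (single linkage). Brute-force O(n^2) pair check."""
    parent: List[int] = list(range(len(points)))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two components
    return sum(1 for i in range(len(points)) if find(i) == i)


pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (10.0, 0.0), (10.5, 0.0)]
print(count_clusters(pts, 1.0))   # 2
print(count_clusters(pts, 0.4))   # 5
print(count_clusters(pts, 20.0))  # 1
```

Sweeping the threshold from small to large traverses the merge tree from the leaves (every point on its own) to the root (one cluster); a perceptual model then has to decide where along that sweep a viewer's judgment lies.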
|
33
|
Reimann D, Blech C, Gaschler R. Visual Model Fit Estimation in Scatterplots and Distribution of Attention. Exp Psychol 2020; 67:292-302. [PMID: 33274658 DOI: 10.1027/1618-3169/a000499] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
Scatterplots are ubiquitous data graphs and can be used to depict how well data fit a quantitative theory. We investigated which information people use for such estimates. In Experiment 1 (N = 25), we tested the influence of slope and noise on the perceived fit between a linear model and data points. Additionally, eye tracking was used to analyze the deployment of attention. Visual fit estimation might mimic one or the other statistical estimate: if participants were influenced by noise only, this would suggest that their subjective judgment was similar to root mean square error; if slope was relevant, subjective estimation would mimic variance explained. While the influence of noise on estimated fit was stronger, we also found an influence of slope. As most fixations fell in the center of the scatterplot, in Experiment 2 (N = 51) we tested whether the location of noise affects judgment. Indeed, high noise influenced the judgment of fit more strongly when it was located in the middle of the scatterplot. Visual fit estimates seem to be driven by the center of the scatterplot and to mimic variance explained.
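The two candidate statistics named in this abstract react differently to slope, which is what makes the slope manipulation informative. For a fixed linear model, root mean square error (RMSE) depends only on the residuals, whereas variance explained (R², computed here as 1 - SS_res / SS_tot for the given model rather than a refitted regression) also grows with the spread of the data along the model. The Python sketch below adds identical, made-up residuals to a shallow and a steep line; the values are illustrative and not taken from the experiments.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error: depends only on the residuals y - y_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Proportion of variance explained: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


x = [0.0, 1.0, 2.0, 3.0, 4.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.0]  # identical residuals for both slopes

for slope in (0.5, 2.0):
    model = [slope * xi for xi in x]
    data = [m + e for m, e in zip(model, noise)]
    print(slope, round(rmse(data, model), 3), round(r_squared(data, model), 3))
# 0.5 0.447 0.6
# 2.0 0.447 0.973
```

A viewer whose judgment tracked RMSE would rate both plots alike; one whose judgment tracked variance explained would rate the steeper plot as fitting better, which is the pattern the slope effect in Experiment 1 points toward.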
Affiliation(s)
- Daniel Reimann
- Department of Psychology, FernUniversität in Hagen, Hagen, Germany
- Christine Blech
- Department of Psychology, FernUniversität in Hagen, Hagen, Germany
- Robert Gaschler
- Department of Psychology, FernUniversität in Hagen, Hagen, Germany
|
34
|
Representing Data Visualization Goals and Tasks through Meta-Modeling to Tailor Information Dashboards. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10072306] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]
Abstract
Information dashboards are everywhere. They support knowledge discovery in a huge variety of contexts and domains. Although powerful, these tools can be complex, not only for end users but also for developers and designers. Information dashboards encode complex datasets into different visual marks to ease knowledge discovery. Because choosing the wrong design could compromise the entire dashboard's effectiveness, selecting the appropriate encoding or configuration for each potential context, user, or data domain is a crucial task. For these reasons, there is a need to automate the recommendation of visualizations and dashboard configurations in order to deliver tools adapted to their context. Recommendations can be based on different aspects, such as user characteristics, the data domain, or the goals and tasks to be achieved or carried out through the visualizations. This work presents a dashboard meta-model that abstracts all these factors and integrates a visualization task taxonomy to account for the different actions that can be performed with information dashboards. This meta-model has been used to design a domain-specific language for specifying dashboard requirements in a structured way. The ultimate goal is a dashboard generation pipeline that delivers dashboards adapted to any context, such as the educational context, in which a lot of data are generated and several actors are involved (students, teachers, managers, etc.) who want to reach different insights regarding their learning performance or learning methodologies.
|
35
|
Lu M, Wang S, Lanir J, Fish N, Yue Y, Cohen-Or D, Huang H. Winglets: Visualizing Association with Uncertainty in Multi-class Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:770-779. [PMID: 31562094 DOI: 10.1109/tvcg.2019.2934811] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
This work proposes Winglets, an enhancement to the classic scatterplot that makes multiple classes perceptually more pronounced by improving the perception of points' association with their related clusters and of the uncertainty of that association. Designed as a pair of dual-sided strokes attached to a data point, Winglets leverage the Gestalt principle of Closure to shape the perception of the form of the clusters, rather than using an explicit divisive encoding. Through a subtle design of two dominant attributes, length and orientation, Winglets enable viewers to mentally complete the clusters. A controlled user study was conducted to examine the efficiency of Winglets in conveying cluster association and the uncertainty of certain points. The results show that Winglets form a more prominent association of points into clusters and improve the perception of association uncertainty.
|
36
|
Chen X, Ge T, Zhang J, Chen B, Fu CW, Deussen O, Wang Y. A Recursive Subdivision Technique for Sampling Multi-class Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:729-738. [PMID: 31442987 DOI: 10.1109/tvcg.2019.2934541] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/10/2023]
Abstract
We present a non-uniform recursive sampling technique for multi-class scatterplots, with the specific goal of faithfully presenting relative data and class densities while preserving major outliers in the plots. Our technique is based on a customized binary kd-tree, in which leaf nodes are created by recursively subdividing the underlying multi-class density map. By backtracking, we merge leaf nodes until they encompass points of all classes for our subsequently applied outlier-aware multi-class sampling strategy. A quantitative evaluation shows that our approach better preserves both outliers and relative densities in multi-class scatterplots than previous approaches, and several case studies demonstrate the effectiveness of our approach in exploring complex, real-world data.
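A heavily simplified Python sketch of the recursive-subdivision idea follows. It partitions raw points (not the paper's multi-class density map) with median splits along alternating axes and then draws one representative per leaf; the backtracking merge and the outlier-aware multi-class sampling described in the abstract are omitted. `leaf_size` and the one-per-leaf rule are illustrative assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kd_leaves(points: List[Point], leaf_size: int, depth: int = 0) -> List[List[Point]]:
    """Recursively split at the median along x, then y, then x, ... until every
    leaf holds at most `leaf_size` points."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    if len(points) <= leaf_size:
        return [points]
    axis = depth % 2                      # alternate between x (0) and y (1)
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2               # both halves are non-empty here
    return (kd_leaves(ordered[:mid], leaf_size, depth + 1)
            + kd_leaves(ordered[mid:], leaf_size, depth + 1))


def sample_per_leaf(points: List[Point], leaf_size: int, seed: int = 0) -> List[Point]:
    """Keep one random representative from each non-empty leaf. Because median
    splits give leaves of similar size, the sample roughly follows data density."""
    rng = random.Random(seed)
    return [rng.choice(leaf) for leaf in kd_leaves(points, leaf_size) if leaf]


gen = random.Random(1)
pts = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(1000)]
leaves = kd_leaves(pts, leaf_size=16)
sample = sample_per_leaf(pts, leaf_size=16)
print(len(leaves), len(sample))  # one representative per leaf
```

Median splits make leaves small where data are dense and large where data are sparse, which is why even this naive one-per-leaf rule thins dense regions while still sampling the periphery; the paper's outlier-aware strategy goes further than this.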
|
37
|
Xiong C, Shapiro J, Hullman J, Franconeri S. Illusion of Causality in Visualized Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:853-862. [PMID: 31425111 DOI: 10.1109/tvcg.2019.2934399] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Students who eat breakfast more frequently tend to have a higher grade point average. From these data, many people might confidently state that a before-school breakfast program would lead to higher grades. This is a reasoning error, because correlation does not necessarily indicate causation: X and Y can be correlated without one directly causing the other. While this error is pervasive, its prevalence might be amplified or mitigated by the way the data are presented to a viewer. Across three crowdsourced experiments, we examined whether the way simple data relations are presented would mitigate this reasoning error. The first experiment tested examples similar to the breakfast-GPA relation, varying in the plausibility of the causal link. We asked participants to rate their level of agreement that the relation was correlated, which they appropriately rated as high. However, participants also expressed high agreement with a causal interpretation of the data. Support for the causal interpretation was not equally strong across visualization types: causality ratings were highest for text descriptions and bar graphs, but weaker for scatter plots. But is this effect driven by bar graphs aggregating data into two groups or by the visual encoding type? We isolated data aggregation from visual encoding type and examined their individual effects on perceived causality. Overall, different visualization designs offer different cognitive reasoning affordances for the same data. Higher levels of data aggregation tend to be associated with higher perceived causality. Participants perceived line and dot visual encodings as more causal than bar encodings. Our results demonstrate how some visualization designs trigger stronger causal links, while choosing others can help mitigate unwarranted perceptions of causality.
|
38
|
Wei Y, Mei H, Zhao Y, Zhou S, Lin B, Jiang H, Chen W. Evaluating Perceptual Bias During Geometric Scaling of Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:321-331. [PMID: 31403425 DOI: 10.1109/tvcg.2019.2934208] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
Scatterplots are frequently scaled to fit display areas in multi-view and multi-device data analysis environments. A common scaling method is to enlarge or shrink the entire scatterplot together with the points inside it, synchronously and proportionally. This process is called geometric scaling. However, geometric scaling of scatterplots may cause a perceptual bias; that is, the perceived and physical values of visual features may become dissociated under geometric scaling. For example, if a scatterplot is projected from a laptop to a large projector screen, observers may feel that the scatterplot shown on the projector has fewer points than the one viewed on the laptop. This paper presents an evaluation study on the perceptual bias of visual features in scatterplots caused by geometric scaling. The study focuses on three fundamental visual features (i.e., numerosity, correlation, and cluster separation) and three hypotheses formulated on the basis of our experience. We carefully design three controlled experiments using well-prepared synthetic data and recruit participants to complete the experiments based on their subjective experience. With a detailed analysis of the experimental results, we obtain a set of instructive findings. First, geometric scaling causes a bias that has a linear relationship with the scale ratio. Second, no significant difference exists between the biases measured from normally and uniformly distributed scatterplots. Third, changing the point radius can correct the bias to a certain extent. These findings can inform design decisions for scatterplots in various scenarios.
|
39
|
Luo X, Yuan Y, Zhang K, Xia J, Zhou Z, Chang L, Gu T. Enhancing statistical charts: toward better data visualization and analysis. J Vis (Tokyo) 2019. [DOI: 10.1007/s12650-019-00569-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/26/2022]
|
40
|
Raidou RG, Groller ME, Eisemann M. Relaxing Dense Scatter Plots with Pixel-Based Mappings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:2205-2216. [PMID: 30892214 DOI: 10.1109/tvcg.2019.2903956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Scatter plots are the most commonly employed technique for the visualization of bivariate data. Despite their versatility and expressiveness in showing data aspects, such as clusters, correlations, and outliers, scatter plots face a main problem. For large and dense data, the representation suffers from clutter due to overplotting. This is often partially solved with the use of density plots. Yet, data overlap may occur in certain regions of a scatter or density plot, while other regions may be partially, or even completely empty. Adequate pixel-based techniques can be employed for effectively filling the plotting space, giving an additional notion of the numerosity of data motifs or clusters. We propose the Pixel-Relaxed Scatter Plots, a new and simple variant, to improve the display of dense scatter plots, using pixel-based, space-filling mappings. Our Pixel-Relaxed Scatter Plots make better use of the plotting canvas, while avoiding data overplotting, and optimizing space coverage and insight in the presence and size of data motifs. We have employed different methods to map scatter plot points to pixels and to visually present this mapping. We demonstrate our approach on several synthetic and realistic datasets, and we discuss the suitability of our technique for different tasks. Our conducted user evaluation shows that our Pixel-Relaxed Scatter Plots can be a useful enhancement to traditional scatter plots.
|
41
|
Abstract
The contact center industry represents a large proportion of many countries' economies. For example, 4% of the entire working population of the United States and the UK is employed in this sector. As in most modern industries, contact centers generate gigabytes of operational data that require analysis to provide insight and to improve efficiency. Visualization is a valuable approach to data analysis, enabling trends and correlations to be discovered, particularly when using scatterplots. We present a feature-rich application that visualizes large call center data sets using scatterplots that support millions of points. The application features a scatterplot matrix that provides an overview of the call center data attributes and an animation of call start and end times. It uses both CPU and GPU acceleration for processing and filtering. We illustrate the use of the Open Computing Language (OpenCL) to harness a commodity graphics card for fast filtering of fields with multiple attributes. We demonstrate the application with millions of call events from a month's worth of real-world data and report domain expert feedback from our industry partner.
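Multi-attribute filtering is data-parallel because each call event is tested independently against every range predicate. That independence is what makes a GPU kernel (OpenCL, in the abstract) a good fit. The following plain-Python sketch shows only the filtering logic on a columnar layout and does not attempt any GPU code. The function name `filter_events` and the field names in the usage note are hypothetical.

```python
def filter_events(columns, ranges):
    """Return a keep/drop flag per row for inclusive range predicates.

    columns: dict mapping attribute name -> list of values (all the same length).
    ranges: dict mapping attribute name -> (low, high), both bounds inclusive.
    """
    n = len(next(iter(columns.values())))
    keep = [True] * n
    for name, (low, high) in ranges.items():
        values = columns[name]
        for i in range(n):
            # Each row is tested independently; on a GPU, one work-item per row.
            if keep[i] and not (low <= values[i] <= high):
                keep[i] = False
    return keep
```

For instance, with `columns = {"duration": [30, 120, 600, 45], "queue": [1, 2, 1, 1]}`, the call `filter_events(columns, {"duration": (0, 300), "queue": (1, 1)})` returns `[True, False, False, True]`. Storing attributes column-wise keeps each predicate's reads contiguous, which is also the layout a GPU buffer would typically use.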
|
42
|
Exploring linear projections for revealing clusters, outliers, and trends in subsets of multi-dimensional datasets. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2018.08.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]
|
43
|
Wang Y, Chen X, Ge T, Bao C, Sedlmair M, Fu CW, Deussen O, Chen B. Optimizing Color Assignment for Perception of Class Separability in Multiclass Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:820-829. [PMID: 30136963 DOI: 10.1109/tvcg.2018.2864912] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
An appropriate choice of colors significantly aids viewers in understanding the structures in multiclass scatterplots, and it becomes more important as the number of data points and groups grows. An appropriate color mapping is also an important parameter for creating an aesthetically pleasing scatterplot. Currently, users of visualization software routinely rely on color mappings that have been predefined by the software. A default color mapping, however, cannot ensure optimal perceptual separability between groups and may sometimes even lead to a misinterpretation of the data. In this paper, we present an effective approach for color assignment from a set of given colors that is designed to optimize the perception of scatterplots. Our approach takes into account the spatial relationships, density, and degree of overlap between point clusters, as well as the background color. For this purpose, we use a genetic algorithm that efficiently finds good color assignments. We implemented an interactive color assignment system with three extensions of the basic method that incorporate top-K suggestions, user-defined color subsets, and classes of interest in the optimization. To demonstrate the effectiveness of our assignment technique, we conducted a numerical study and a controlled user study comparing our approach with default color assignments, and our findings were verified by two expert studies. The results show that our approach supports users in distinguishing the number of clusters faster and more precisely than default assignment methods.
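A genetic algorithm over color assignments can be sketched as a search over permutations that map classes to a given palette. The version below is a simplified stand-in for the paper's method, under several assumptions. The fitness rewards color contrast between class pairs, weighted more heavily when the class centroids are close. Density, overlap, and background terms are omitted. Color distance is plain Euclidean distance in whatever space the palette is given in; a perceptual space such as CIELAB would be the natural choice. The palette is assumed to have exactly one color per class, with at least two classes. All function names are hypothetical.

```python
import math
import random


def fitness(assign, centroids, colors):
    """Higher is better: spatially close classes should get contrasting colors."""
    score = 0.0
    k = len(centroids)
    for i in range(k):
        for j in range(i + 1, k):
            spatial = math.dist(centroids[i], centroids[j])
            contrast = math.dist(colors[assign[i]], colors[assign[j]])
            score += contrast / (1.0 + spatial)
    return score


def order_crossover(a, b, rng):
    """OX1 crossover: copy a slice from parent a, fill the rest in b's order."""
    n = len(a)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    child = [None] * n
    child[lo:hi] = a[lo:hi]
    fill = [g for g in b if g not in child[lo:hi]]
    for idx in range(n):
        if child[idx] is None:
            child[idx] = fill.pop(0)
    return child


def assign_colors(centroids, colors, pop_size=30, generations=100,
                  mutation=0.2, seed=0):
    """Return assign, where assign[i] is the palette index used for class i."""
    rng = random.Random(seed)
    k = len(centroids)
    default = list(range(k))  # the "software default": class i gets color i
    population = [default[:]]
    while len(population) < pop_size:
        perm = default[:]
        rng.shuffle(perm)
        population.append(perm)

    def key(p):
        return fitness(p, centroids, colors)

    for _ in range(generations):
        population.sort(key=key, reverse=True)
        elite = population[: pop_size // 2]  # elitism: the fitter half survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < mutation:
                i, j = rng.sample(range(k), 2)  # swap mutation keeps a permutation
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = elite + children
    return max(population, key=key)
```

The initial population contains the default assignment, and the fitter half always survives. The returned assignment is therefore never worse than the default under this fitness.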
|
44
|
Jo J, Vernier F, Dragicevic P, Fekete JD. A Declarative Rendering Model for Multiclass Density Maps. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:470-480. [PMID: 30136987 DOI: 10.1109/tvcg.2018.2865141] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Multiclass maps are scatterplots, multidimensional projections, or thematic geographic maps in which data points have a categorical attribute in addition to two quantitative attributes. This categorical attribute is often rendered using shape or color, which does not scale when overplotting occurs. As the number of data points increases, multiclass maps must resort to data aggregation to remain readable. We present multiclass density maps: multiple 2D histograms, one computed for each category value. Multiclass density maps are meant as a building block to improve the expressiveness and scalability of multiclass map visualization. In this article, we first present a short survey of aggregated multiclass maps, mainly from cartography. We then introduce a declarative model (a simple yet expressive JSON grammar associated with visual semantics) that specifies a wide design space of visualizations for multiclass density maps. Our declarative model is expressive and can be efficiently implemented in visualization front-ends such as modern web browsers. Furthermore, it can be reconfigured dynamically to support data exploration tasks without recomputing the raw data. Finally, we demonstrate how our model can reproduce examples from the past and support exploring data at scale.
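The core data structure described here is one 2D histogram per category value. The sketch below builds those histograms and then applies a single rendering rule, "dominant category per bin", to illustrate why keeping the per-class counts separate lets the view be reconfigured without touching the raw points. The function names and the winner-takes-all rule are illustrative assumptions; the paper's JSON grammar is not reproduced here.

```python
def multiclass_density_maps(points, width, height, x_range, y_range):
    """Bin (x, y, category) points into one height x width count grid per category.

    x_range, y_range: (min, max) of the plotted domain; points outside are skipped.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    maps = {}
    for x, y, cat in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # outside the plotted domain
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid = maps.setdefault(cat, [[0] * width for _ in range(height)])
        grid[row][col] += 1
    return maps


def dominant_category(maps, width, height):
    """One possible rendering rule: the category with the highest count per bin.

    Empty bins map to None; ties go to the category inserted into `maps` first.
    """
    out = [[None] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            best, best_count = None, 0
            for cat, grid in maps.items():
                if grid[row][col] > best_count:
                    best, best_count = cat, grid[row][col]
            out[row][col] = best
    return out
```

Because `dominant_category` reads only the aggregated grids, it can be swapped for another rule (such as proportional color blending) without re-binning the data.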
|
45
|
Wang Y, Wang Z, Fu CW, Schmauder H, Deussen O, Weiskopf D. Image-Based Aspect Ratio Selection. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:840-849. [PMID: 30137008 DOI: 10.1109/tvcg.2018.2865266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/08/2023]
Abstract
Selecting a good aspect ratio is crucial for effective 2D diagrams. There are several aspect ratio selection methods for function plots and line charts, but only a few can handle general, discrete diagrams such as 2D scatter plots. These methods either lack a perceptual foundation or rely heavily on intermediate isoline representations, which depend on choosing the right isovalues and are time-consuming to compute. This paper introduces a general image-based approach for selecting aspect ratios for a wide variety of 2D diagrams, ranging from scatter plots and density function plots to line charts. Our approach is derived from Federer's co-area formula and a line integral representation, which enable us to directly construct image-based versions of existing selection methods using density fields. In contrast to previous methods, our approach bypasses isoline computation, so it is faster to compute while still following the perceptual foundation for selecting aspect ratios. Furthermore, this approach is complemented by an anisotropic kernel density estimation to construct density fields, allowing us to characterize data patterns more faithfully, such as the subgroups in scatterplots or dense regions in time series. We demonstrate the effectiveness of our approach by quantitatively comparing it to previous methods and revisiting a prior user study. Finally, we present extensions for ROI banking, multi-scale banking, and the application to image data.
|