1
|
Rave H, Molchanov V, Linsen L. De-Cluttering Scatterplots With Integral Images. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:2114-2126. [PMID: 38526894 DOI: 10.1109/tvcg.2024.3381453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/27/2024]
Abstract
Scatterplots provide a visual representation of bivariate data (or 2D embeddings of multivariate data) that allows for effective analyses of data dependencies, clusters, trends, and outliers. Unfortunately, classical scatterplots suffer from scalability issues, since growing data sizes eventually lead to overplotting and visual clutter on a screen with a fixed resolution, which hinders the data analysis process. We propose an algorithm that compensates for irregular sample distributions by a smooth transformation of the scatterplot's visual domain. Our algorithm evaluates the scatterplot's density distribution to compute a regularization mapping based on integral images of the rasterized density function. The mapping preserves the samples' neighborhood relations. Few regularization iterations suffice to achieve a nearly uniform sample distribution that efficiently uses the available screen space. We further propose approaches to visually convey the transformation that was applied to the scatterplot and compare them in a user study. We present a novel parallel algorithm for fast GPU-based integral-image computation, which allows for integrating our de-cluttering approach into interactive visual data analysis systems.
Collapse
|
2
|
Chen X, Wang Y, Bao H, Lu K, Jo J, Fu CW, Fekete JD. Visualization-Driven Illumination for Density Plots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:1631-1644. [PMID: 39527427 DOI: 10.1109/tvcg.2024.3495695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2024]
Abstract
We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions, while avoiding artifacts in the density field's colors. When visualizing large and dense discrete point samples, scatterplots and dot density maps often suffer from overplotting, and density plots are commonly employed to provide aggregated views while revealing underlying structures. Yet, in such density plots, existing illumination models may produce color distortion and hide details in low-density regions, making it challenging to look up density values, compare them, and find outliers. The key novelty in this work includes (i) a visualization-driven illumination model that inherently supports density-plot-specific analysis tasks and (ii) a new image composition technique to reduce the interference between the image shading and the color-encoded density values. To demonstrate the effectiveness of our technique, we conducted a quantitative study, an empirical evaluation of our technique in a controlled study, and two case studies, exploring twelve datasets with up to two million data point samples.
Collapse
|
3
|
Tseng C, Wang AZ, Quadri GJ, Szafir DA. Shape It Up: An Empirically Grounded Approach for Designing Shape Palettes. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:349-359. [PMID: 39283798 DOI: 10.1109/tvcg.2024.3456385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/05/2025]
Abstract
Shape is commonly used to distinguish between categories in multi-class scatterplots. However, existing guidelines for choosing effective shape palettes rely largely on intuition and do not consider how these needs may change as the number of categories increases. Unlike color, shapes can not be represented by a numerical space, making it difficult to propose general guidelines or design heuristics for using shape effectively. This paper presents a series of four experiments evaluating the efficiency of 39 shapes across three tasks: relative mean judgment tasks, expert preference, and correlation estimation. Our results show that conventional means for reasoning about shapes, such as filled versus unfilled, are insufficient to inform effective palette design. Further, even expert palettes vary significantly in their use of shape and corresponding effectiveness. To support effective shape palette design, we developed a model based on pairwise relations between shapes in our experiments and the number of shapes required for a given design. We embed this model in a palette design tool to give designers agency over shape selection while incorporating empirical elements of perceptual performance captured in our study. Our model advances understanding of shape perception in visualization contexts and provides practical design guidelines that can help improve categorical data encodings.
Collapse
|
4
|
Panavas L, Crnovrsanin T, Adams JL, Ullman J, Sargavad A, Tory M, Dunne C. Investigating the Visual Utility of Differentially Private Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5370-5385. [PMID: 37405888 PMCID: PMC11296227 DOI: 10.1109/tvcg.2023.3292391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/07/2023]
Abstract
Increasingly, visualization practitioners are working with, using, and studying private and sensitive data. There can be many stakeholders interested in the resulting analyses-but widespread sharing of the data can cause harm to individuals, companies, and organizations. Practitioners are increasingly turning to differential privacy to enable public data sharing with a guaranteed amount of privacy. Differential privacy algorithms do this by aggregating data statistics with noise, and this now-private data can be released visually with differentially private scatterplots. While the private visual output is affected by the algorithm choice, privacy level, bin number, data distribution, and user task, there is little guidance on how to choose and balance the effect of these parameters. To address this gap, we had experts examine 1,200 differentially private scatterplots created with a variety of parameter choices and tested their ability to see aggregate patterns in the private output (i.e. the visual utility of the chart). We synthesized these results to provide easy-to-use guidance for visualization practitioners releasing private data through scatterplots. Our findings also provide a ground truth for visual utility, which we use to benchmark automated utility metrics from various fields. We demonstrate how multi-scale structural similarity (MS-SSIM), the metric most strongly correlated with our study's utility results, can be used to optimize parameter selection.
Collapse
|
5
|
Hilasaca GM, Marcilio-Jr WE, Eler DM, Martins RM, Paulovich FV. A Grid-Based Method for Removing Overlaps of Dimensionality Reduction Scatterplot Layouts. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5733-5749. [PMID: 37647195 DOI: 10.1109/tvcg.2023.3309941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/01/2023]
Abstract
Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional datasets. Despite their popularity, such scatterplots suffer from occlusion, especially when informative glyphs are used to represent data instances, potentially obfuscating critical information for the analysis under execution. Different strategies have been devised to address this issue, either producing overlap-free layouts that lack the powerful capabilities of contemporary DR techniques in uncovering interesting data patterns or eliminating overlaps as a post-processing strategy. Despite the good results of post-processing techniques, most of the best methods typically expand or distort the scatterplot area, thus reducing glyphs' size (sometimes) to unreadable dimensions, defeating the purpose of removing overlaps. This article presents Distance Grid (DGrid), a novel post-processing strategy to remove overlaps from DR layouts that faithfully preserves the original layout's characteristics and bounds the minimum glyph sizes. We show that DGrid surpasses the state-of-the-art in overlap removal (through an extensive comparative evaluation considering multiple different metrics) while also being one of the fastest techniques, especially for large datasets. A user study with 51 participants also shows that DGrid is consistently ranked among the top techniques for preserving the original scatterplots' visual characteristics and the aesthetics of the final results.
Collapse
|
6
|
Wang Y, Bace M, Bulling A. Scanpath Prediction on Information Visualisations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:3902-3914. [PMID: 37022407 DOI: 10.1109/tvcg.2023.3242293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
We propose Unified Model of Saliency and Scanpaths (UMSS)- a model that learns to predict multi-duration saliency and scanpaths (i.e. sequences of eye fixations) on information visualisations. Although scanpaths provide rich information about the importance of different visualisation elements during the visual exploration process, prior work has been limited to predicting aggregated attention statistics, such as visual saliency. We present in-depth analyses of gaze behaviour for different information visualisation elements (e.g. Title, Label, Data) on the popular MASSVIS dataset. We show that while, overall, gaze patterns are surprisingly consistent across visualisations and viewers, there are also structural differences in gaze dynamics for different elements. Informed by our analyses, UMSS first predicts multi-duration element-level saliency maps, then probabilistically samples scanpaths from them. Extensive experiments on MASSVIS show that our method consistently outperforms state-of-the-art methods with respect to several, widely used scanpath and saliency evaluation metrics. Our method achieves a relative improvement in sequence score of 11.5% for scanpath prediction, and a relative improvement in Pearson correlation coefficient of up to 23.6% for saliency prediction. These results are auspicious and point towards richer user models and simulations of visual attention on visualisations without the need for any eye tracking equipment.
Collapse
|
7
|
Quadri GJ, Nieves JA, Wiernik BM, Rosen P. Automatic Scatterplot Design Optimization for Clustering Identification. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:4312-4327. [PMID: 35816525 DOI: 10.1109/tvcg.2022.3189883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Scatterplots are among the most widely used visualization techniques. Compelling scatterplot visualizations improve understanding of data by leveraging visual perception to boost awareness when performing specific visual analytic tasks. Design choices in scatterplots, such as graphical encodings or data aspects, can directly impact decision-making quality for low-level tasks like clustering. Hence, constructing frameworks that consider both the perceptions of the visual encodings and the task being performed enables optimizing visualizations to maximize efficacy. In this article, we propose an automatic tool to optimize the design factors of scatterplots to reveal the most salient cluster structure. Our approach leverages the merge tree data structure to identify the clusters and optimize the choice of subsampling algorithm, sampling rate, marker size, and marker opacity used to generate a scatterplot image. We validate our approach with user and case studies that show it efficiently provides high-quality scatterplot designs from a large parameter space.
Collapse
|
8
|
Li Z, Shi R, Liu Y, Long S, Guo Z, Jia S, Zhang J. Dual Space Coupling Model Guided Overlap-Free Scatterplot. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:657-667. [PMID: 36260569 DOI: 10.1109/tvcg.2022.3209459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
The overdraw problem of scatterplots seriously interferes with the visual tasks. Existing methods, such as data sampling, node dispersion, subspace mapping, and visual abstraction, cannot guarantee the correspondence and consistency between the data points that reflect the intrinsic original data distribution and the corresponding visual units that reveal the presented data distribution, thus failing to obtain an overlap-free scatterplot with unbiased and lossless data distribution. A dual space coupling model is proposed in this paper to represent the complex bilateral relationship between data space and visual space theoretically and analytically. Under the guidance of the model, an overlap-free scatterplot method is developed through integration of the following: a geometry-based data transformation algorithm, namely DistributionTranscriptor; an efficient spatial mutual exclusion guided view transformation algorithm, namely PolarPacking; an overlap-free oriented visual encoding configuration model and a radius adjustment tool, namely frdraw. Our method can ensure complete and accurate information transfer between the two spaces, maintaining consistency between the newly created scatterplot and the original data distribution on global and local features. Quantitative evaluation proves our remarkable progress on computational efficiency compared with the state-of-the-art methods. Three applications involving pattern enhancement, interaction improvement, and overdraw mitigation of trajectory visualization demonstrate the broad prospects of our method.
Collapse
|
9
|
Quadri GJ, Rosen P. A Survey of Perception-Based Visualization Studies by Task. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:5026-5048. [PMID: 34283717 DOI: 10.1109/tvcg.2021.3098240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Knowledge of human perception has long been incorporated into visualizations to enhance their quality and effectiveness. The last decade, in particular, has shown an increase in perception-based visualization research studies. With all of this recent progress, the visualization community lacks a comprehensive guide to contextualize their results. In this report, we provide a systematic and comprehensive review of research studies on perception related to visualization. This survey reviews perception-focused visualization studies since 1980 and summarizes their research developments focusing on low-level tasks, further breaking techniques down by visual encoding and visualization type. In particular, we focus on how perception is used to evaluate the effectiveness of visualizations, to help readers understand and apply the principles of perception of their visualization designs through a task-optimized approach. We concluded our report with a summary of the weaknesses and open research questions in the area.
Collapse
|
10
|
Lees JA, Tonkin-Hill G, Yang Z, Corander J. Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210237. [PMID: 35989601 PMCID: PMC9393562 DOI: 10.1098/rstb.2021.0237] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase at an accelerated pace given the advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods to enable rapid exploration of the structure of a population based on different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across different genomes. Here, we present Mandrake, an efficient implementation of a dimensional reduction method tailored for the needs of large-scale population genomics. Mandrake is capable of visualizing population structure from millions of whole genomes, and we illustrate its usefulness with several datasets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/). This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’.
Collapse
Affiliation(s)
- John A Lees
- MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London W2 1PG, UK.,European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton CB10 1SD, UK
| | | | - Zhirong Yang
- Department of Computer Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway.,Aalto University, 02150 Espoo, Finland
| | - Jukka Corander
- Department of Biostatistics, University of Oslo, 0317 Oslo, Norway.,Parasites and Microbes, Wellcome Sanger Institute, Cambridge CB10 1SA, UK.,Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, 00100 Helsinki, Finland
| |
Collapse
|
11
|
Sohns JT, Schmitt M, Jirasek F, Hasse H, Leitte H. Attribute-based Explanation of Non-Linear Embeddings of High-Dimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:540-550. [PMID: 34587086 DOI: 10.1109/tvcg.2021.3114870] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Embeddings of high-dimensional data are widely used to explore data, to verify analysis results, and to communicate information. Their explanation, in particular with respect to the input attributes, is often difficult. With linear projects like PCA the axes can still be annotated meaningfully. With non-linear projections this is no longer possible and alternative strategies such as attribute-based color coding are required. In this paper, we review existing augmentation techniques and discuss their limitations. We present the Non-Linear Embeddings Surveyor (NoLiES) that combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting. Rangesets use a set-based visualization approach for binned attribute values that enable the user to quickly observe structure and detect outliers. We detail the link between algebraic topology and rangesets and demonstrate the utility of NoLiES in case studies with various challenges (complex attribute value distribution, many attributes, many data points) and a real-world application to understand latent features of matrix completion in thermodynamics.
Collapse
|
12
|
Hong MH, Witt JK, Szafir DA. The Weighted Average Illusion: Biases in Perceived Mean Position in Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:987-997. [PMID: 34596541 DOI: 10.1109/tvcg.2021.3114783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Scatterplots can encode a third dimension by using additional channels like size or color (e.g. bubble charts). We explore a potential misinterpretation of trivariate scatterplots, which we call the weighted average illusion, where locations of larger and darker points are given more weight toward x- and y-mean estimates. This systematic bias is sensitive to a designer's choice of size or lightness ranges mapped onto the data. In this paper, we quantify this bias against varying size/lightness ranges and data correlations. We discuss possible explanations for its cause by measuring attention given to individual data points using a vision science technique called the centroid method. Our work illustrates how ensemble processing mechanisms and mental shortcuts can significantly distort visual summaries of data, and can lead to misjudgments like the demonstrated weighted average illusion.
Collapse
|
13
|
Heine C. Towards Modeling Visualization Processes as Dynamic Bayesian Networks. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1000-1010. [PMID: 33074817 DOI: 10.1109/tvcg.2020.3030395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Visualization designs typically need to be evaluated with user studies, because their suitability for a particular task is hard to predict. What the field of visualization is currently lacking are theories and models that can be used to explain why certain designs work and others do not. This paper outlines a general framework for modeling visualization processes that can serve as the first step towards such a theory. It surveys related research in mathematical and computational psychology and argues for the use of dynamic Bayesian networks to describe these time-dependent, probabilistic processes. It is discussed how these models could be used to aid in design evaluation. The development of concrete models will be a long process. Thus, the paper outlines a research program sketching how to develop prototypes and their extensions from existing models, controlled experiments, and observational studies.
Collapse
|
14
|
Lu K, Feng M, Chen X, Sedlmair M, Deussen O, Lischinski D, Cheng Z, Wang Y. Palettailor: Discriminable Colorization for Categorical Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:475-484. [PMID: 33048720 DOI: 10.1109/tvcg.2020.3030406] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
We present an integrated approach for creating and assigning color palettes to different visualizations such as multi-class scatterplots, line, and bar charts. While other methods separate the creation of colors from their assignment, our approach takes data characteristics into account to produce color palettes, which are then assigned in a way that fosters better visual discrimination of classes. To do so, we use a customized optimization based on simulated annealing to maximize the combination of three carefully designed color scoring functions: point distinctness, name difference, and color discrimination. We compare our approach to state-of-the-art palettes with a controlled user study for scatterplots and line charts, furthermore we performed a case study. Our results show that Palettailor, as a fully-automated approach, generates color palettes with a higher discrimination quality than existing approaches. The efficiency of our optimization allows us also to incorporate user modifications into the color selection process.
Collapse
|
15
|
Yuan J, Xiang S, Xia J, Yu L, Liu S. Evaluation of Sampling Methods for Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1720-1730. [PMID: 33074820 DOI: 10.1109/tvcg.2020.3030432] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but "good" scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods in preserving the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling have comparable performance with the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
Collapse
|
16
|
Quadri GJ, Rosen P. Modeling the Influence of Visual Density on Cluster Perception in Scatterplots Using Topology. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1829-1839. [PMID: 33048695 DOI: 10.1109/tvcg.2020.3030365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Scatterplots are used for a variety of visual analytics tasks, including cluster identification, and the visual encodings used on a scatterplot play a deciding role on the level of visual separation of clusters. For visualization designers, optimizing the visual encodings is crucial to maximizing the clarity of data. This requires accurately modeling human perception of cluster separation, which remains challenging. We present a multi-stage user study focusing on four factors-distribution size of clusters, number of points, size of points, and opacity of points-that influence cluster identification in scatterplots. From these parameters, we have constructed two models, a distance-based model, and a density-based model, using the merge tree data structure from Topological Data Analysis. Our analysis demonstrates that these factors play an important role in the number of clusters perceived, and it verifies that the distance-based and density-based models can reasonably estimate the number of clusters a user observes. Finally, we demonstrate how these models can be used to optimize visual encodings on real-world data.
Collapse
|
17
|
Synergy between research on ensemble perception, data visualization, and statistics education: A tutorial review. Atten Percept Psychophys 2021; 83:1290-1311. [PMID: 33389673 DOI: 10.3758/s13414-020-02212-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/17/2020] [Indexed: 11/08/2022]
Abstract
In the age of big data, we are constantly inventing new data visualizations to consolidate massive amounts of numerical information into smaller and more digestible visual formats. These data visualizations use various visual features to convey quantitative information, such as spatial position in scatter plots, color saturation in heat maps, and area in dot maps. These data visualizations are typically composed of ensembles, or groups of related objects, that together convey information about a data set. Ensemble perception, or one's ability to perceive summary statistics from an ensemble, such as the mean, has been used as a foundation for understanding and explaining the effectiveness of certain data visualizations. However, research in data visualization has revealed some perceptual biases and conceptual difficulties people face when trying to utilize the information in these graphs. In this tutorial review, we will provide a broad overview of research conducted in ensemble perception, discuss how principles of ensemble encoding have been applied to the research in data visualization, and showcase the barriers graphs can pose to learning statistical concepts, using histograms as a specific example. The goal of this tutorial review is to highlight possible connections between three areas of research-ensemble perception, data visualization, and statistics education-and to encourage research in the practical applications of ensemble perception in solving real-world problems in statistics education.
Collapse
|
18
|
Chen X, Ge T, Zhang J, Chen B, Fu CW, Deussen O, Wang Y. A Recursive Subdivision Technique for Sampling Multi-class Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:729-738. [PMID: 31442987 DOI: 10.1109/tvcg.2019.2934541] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We present a non-uniform recursive sampling technique for multi-class scatterplots, with the specific goal of faithfully presenting relative data and class densities, while preserving major outliers in the plots. Our technique is based on a customized binary kd-tree, in which leaf nodes are created by recursively subdividing the underlying multi-class density map. By backtracking, we merge leaf nodes until they encompass points of all classes for our subsequently applied outlier-aware multi-class sampling strategy. A quantitative evaluation shows that our approach can better preserve outliers and at the same time relative densities in multi-class scatterplots compared to the previous approaches, several case studies demonstrate the effectiveness of our approach in exploring complex and real world data.
Collapse
|
19
|
Hu R, Sha T, Van Kaick O, Deussen O, Huang H. Data Sampling in Multi-view and Multi-class Scatterplots via Set Cover Optimization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:739-748. [PMID: 31443021 DOI: 10.1109/tvcg.2019.2934799] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We present a method for data sampling in scatterplots by jointly optimizing point selection for different views or classes. Our method uses space-filling curves (Z-order curves) that partition a point set into subsets that, when covered each by one sample, provide a sampling or coreset with good approximation guarantees in relation to the original point set. For scatterplot matrices with multiple views, different views provide different space-filling curves, leading to different partitions of the given point set. For multi-class scatterplots, the focus on either per-class distribution or global distribution provides two different partitions of the given point set that need to be considered in the selection of the coreset. For both cases, we convert the coreset selection problem into an Exact Cover Problem (ECP), and demonstrate with quantitative and qualitative evaluations that an approximate solution that solves the ECP efficiently is able to provide high-quality samplings.
Collapse
|
20
|
Raidou RG, Groller ME, Eisemann M. Relaxing Dense Scatter Plots with Pixel-Based Mappings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:2205-2216. [PMID: 30892214 DOI: 10.1109/tvcg.2019.2903956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Scatter plots are the most commonly employed technique for the visualization of bivariate data. Despite their versatility and expressiveness in showing data aspects, such as clusters, correlations, and outliers, scatter plots face a main problem. For large and dense data, the representation suffers from clutter due to overplotting. This is often partially solved with the use of density plots. Yet, data overlap may occur in certain regions of a scatter or density plot, while other regions may be partially, or even completely empty. Adequate pixel-based techniques can be employed for effectively filling the plotting space, giving an additional notion of the numerosity of data motifs or clusters. We propose the Pixel-Relaxed Scatter Plots, a new and simple variant, to improve the display of dense scatter plots, using pixel-based, space-filling mappings. Our Pixel-Relaxed Scatter Plots make better use of the plotting canvas, while avoiding data overplotting, and optimizing space coverage and insight in the presence and size of data motifs. We have employed different methods to map scatter plot points to pixels and to visually present this mapping. We demonstrate our approach on several synthetic and realistic datasets, and we discuss the suitability of our technique for different tasks. Our conducted user evaluation shows that our Pixel-Relaxed Scatter Plots can be a useful enhancement to traditional scatter plots.
Collapse
|
21
|
Correll M, Li M, Kindlmann G, Scheidegger C. Looks Good To Me: Visualizations As Sanity Checks. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:830-839. [PMID: 30136960 DOI: 10.1109/tvcg.2018.2864907] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Famous examples such as Anscombe's Quartet highlight that one of the core benefits of visualizations is allowing people to discover visual patterns that might otherwise be hidden by summary statistics. This visual inspection is particularly important in exploratory data analysis, where analysts can use visualizations such as histograms and dot plots to identify data quality issues. Yet, these visualizations are driven by parameters such as histogram bin size or mark opacity that have a great deal of impact on the final visual appearance of the chart, but are rarely optimized to make important features visible. In this paper, we show that data flaws have varying impact on the visual features of visualizations, and that the adversarial or merely uncritical setting of design parameters of visualizations can obscure the visual signatures of these flaws. Drawing on the framework of Algebraic Visualization Design, we present the results of a crowdsourced study showing that common visualization types can appear to reasonably summarize distributional data while hiding large and important flaws such as missing data and extraneous modes. We make use of these results to propose additional best practices for visualizations of distributions for data quality tasks.
Collapse
|
22
|
Jo J, Vernier F, Dragicevic P, Fekete JD. A Declarative Rendering Model for Multiclass Density Maps. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:470-480. [PMID: 30136987 DOI: 10.1109/tvcg.2018.2865141] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Multiclass maps are scatterplots, multidimensional projections, or thematic geographic maps where data points have a categorical attribute in addition to two quantitative attributes. This categorical attribute is often rendered using shape or color, which does not scale when overplotting occurs. When the number of data points increases, multiclass maps must resort to data aggregation to remain readable. We present multiclass density maps: multiple 2D histograms computed for each of the category values. Multiclass density maps are meant as a building block to improve the expressiveness and scalability of multiclass map visualization. In this article, we first present a short survey of aggregated multiclass maps, mainly from cartography. We then introduce a declarative model-a simple yet expressive JSON grammar associated with visual semantics-that specifies a wide design space of visualizations for multiclass density maps. Our declarative model is expressive and can be efficiently implemented in visualization front-ends such as modern web browsers. Furthermore, it can be reconfigured dynamically to support data exploration tasks without recomputing the raw data. Finally, we demonstrate how our model can be used to reproduce examples from the past and support exploring data at scale.
Collapse
|
23
|
Valdez AC, Ziefle M, Sedlmair M. Priming and Anchoring Effects in Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:584-594. [PMID: 28866525 DOI: 10.1109/tvcg.2017.2744138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
We investigate priming and anchoring effects on perceptual tasks in visualization. Priming or anchoring effects depict the phenomena that a stimulus might influence subsequent human judgments on a perceptual level, or on a cognitive level by providing a frame of reference. Using visual class separability in scatterplots as an example task, we performed a set of five studies to investigate the potential existence of priming and anchoring effects. Our findings show that-under certain circumstances-such effects indeed exist. In other words, humans judge class separability of the same scatterplot differently depending on the scatterplot(s) they have seen before. These findings inform future work on better understanding and more accurately modeling human perception of visual patterns.
Collapse
|
24
|
Hollt T, Pezzotti N, van Unen V, Koning F, Lelieveldt BPF, Vilanova A. CyteGuide: Visual Guidance for Hierarchical Single-Cell Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:739-748. [PMID: 28866537 DOI: 10.1109/tvcg.2017.2744318] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Single-cell analysis through mass cytometry has become an increasingly important tool for immunologists to study the immune system in health and disease. Mass cytometry creates a high-dimensional description vector for single cells by time-of-flight measurement. Recently, t-Distributed Stochastic Neighborhood Embedding (t-SNE) has emerged as one of the state-of-the-art techniques for the visualization and exploration of single-cell data. Ever increasing amounts of data lead to the adoption of Hierarchical Stochastic Neighborhood Embedding (HSNE), enabling the hierarchical representation of the data. Here, the hierarchy is explored selectively by the analyst, who can request more and more detail in areas of interest. Such hierarchies are usually explored by visualizing disconnected plots of selections in different levels of the hierarchy. This poses problems for navigation, by imposing a high cognitive load on the analyst. In this work, we present an interactive summary-visualization to tackle this problem. CyteGuide guides the analyst through the exploration of hierarchically represented single-cell data, and provides a complete overview of the current state of the analysis. We conducted a two-phase user study with domain experts that use HSNE for data exploration. We first studied their problems with their current workflow using HSNE and the requirements to ease this workflow in a field study. These requirements have been the basis for our visual design. In the second phase, we verified our proposed solution in a user evaluation.
Collapse
|