1
|
Rave H, Molchanov V, Linsen L. De-Cluttering Scatterplots With Integral Images. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:2114-2126. [PMID: 38526894 DOI: 10.1109/tvcg.2024.3381453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/27/2024]
Abstract
Scatterplots provide a visual representation of bivariate data (or 2D embeddings of multivariate data) that allows for effective analyses of data dependencies, clusters, trends, and outliers. Unfortunately, classical scatterplots suffer from scalability issues, since growing data sizes eventually lead to overplotting and visual clutter on a screen with a fixed resolution, which hinders the data analysis process. We propose an algorithm that compensates for irregular sample distributions by a smooth transformation of the scatterplot's visual domain. Our algorithm evaluates the scatterplot's density distribution to compute a regularization mapping based on integral images of the rasterized density function. The mapping preserves the samples' neighborhood relations. Few regularization iterations suffice to achieve a nearly uniform sample distribution that efficiently uses the available screen space. We further propose approaches to visually convey the transformation that was applied to the scatterplot and compare them in a user study. We present a novel parallel algorithm for fast GPU-based integral-image computation, which allows for integrating our de-cluttering approach into interactive visual data analysis systems.
Collapse
|
2
|
Chen X, Wang Y, Bao H, Lu K, Jo J, Fu CW, Fekete JD. Visualization-Driven Illumination for Density Plots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:1631-1644. [PMID: 39527427 DOI: 10.1109/tvcg.2024.3495695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2024]
Abstract
We present a novel visualization-driven illumination model for density plots, a new technique to enhance density plots by effectively revealing the detailed structures in high- and medium-density regions and outliers in low-density regions, while avoiding artifacts in the density field's colors. When visualizing large and dense discrete point samples, scatterplots and dot density maps often suffer from overplotting, and density plots are commonly employed to provide aggregated views while revealing underlying structures. Yet, in such density plots, existing illumination models may produce color distortion and hide details in low-density regions, making it challenging to look up density values, compare them, and find outliers. The key novelty in this work includes (i) a visualization-driven illumination model that inherently supports density-plot-specific analysis tasks and (ii) a new image composition technique to reduce the interference between the image shading and the color-encoded density values. To demonstrate the effectiveness of our technique, we conducted a quantitative study, an empirical evaluation of our technique in a controlled study, and two case studies, exploring twelve datasets with up to two million data point samples.
Collapse
|
3
|
Mildau K, Ehlers H, Meisenburg M, Del Pup E, Koetsier RA, Torres Ortega LR, de Jonge NF, Singh KS, Ferreira D, Othibeng K, Tugizimana F, Huber F, van der Hooft JJJ. Effective data visualization strategies in untargeted metabolomics. Nat Prod Rep 2024. [PMID: 39620439 PMCID: PMC11610048 DOI: 10.1039/d4np00039k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2024] [Indexed: 12/11/2024]
Abstract
Covering: 2014 to 2023 for metabolomics, 2002 to 2023 for information visualizationLC-MS/MS-based untargeted metabolomics is a rapidly developing research field spawning increasing numbers of computational metabolomics tools assisting researchers with their complex data processing, analysis, and interpretation tasks. In this article, we review the entire untargeted metabolomics workflow from the perspective of information visualization, visual analytics and visual data integration. Data visualization is a crucial step at every stage of the metabolomics workflow, where it provides core components of data inspection, evaluation, and sharing capabilities. However, due to the large number of available data analysis tools and corresponding visualization components, it is hard for both users and developers to get an overview of what is already available and which tools are suitable for their analysis. In addition, there is little cross-pollination between the fields of data visualization and metabolomics, leaving visual tools to be designed in a secondary and mostly ad hoc fashion. With this review, we aim to bridge the gap between the fields of untargeted metabolomics and data visualization. First, we introduce data visualization to the untargeted metabolomics field as a topic worthy of its own dedicated research, and provide a primer on cutting-edge visualization research into data visualization for both researchers as well as developers active in metabolomics. We extend this primer with a discussion of best practices for data visualization as they have emerged from data visualization studies. Second, we provide a practical roadmap to the visual tool landscape and its use within the untargeted metabolomics field. Here, for several computational analysis stages within the untargeted metabolomics workflow, we provide an overview of commonly used visual strategies with practical examples. In this context, we will also outline promising areas for further research and development. We end the review with a set of recommendations for developers and users on how to make the best use of visualizations for more effective and transparent communication of results.
Collapse
Affiliation(s)
- Kevin Mildau
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
| | - Henry Ehlers
- Visualization Group, Institute of Visual Computing and Human-Centered Technology, TU Wien, Vienna, Austria.
| | - Mara Meisenburg
- Adaptation Physiology Group, Wageningen University & Research, Wageningen, The Netherlands
| | - Elena Del Pup
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
| | - Robert A Koetsier
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
| | | | - Niek F de Jonge
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
| | - Kumar Saurabh Singh
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
- Maastricht University Faculty of Science and Engineering, Plant Functional Genomics Maastricht, Limburg, The Netherlands
- Faculty of Environment, Science and Economy, University of Exeter, Penryl Cornwall, UK
| | | | - Kgalaletso Othibeng
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
| | - Fidele Tugizimana
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
| | - Florian Huber
- Centre for Digitalisation and Digitality, Düsseldorf University of Applied Sciences, Düsseldorf, Germany
| | - Justin J J van der Hooft
- Bioinformatics Group, Wageningen University & Research, Wageningen, The Netherlands.
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
| |
Collapse
|
4
|
Zhao J, Liu X, Tang H, Wang X, Yang S, Liu D, Chen Y, Chen YV. Mesoscopic structure graphs for interpreting uncertainty in non-linear embeddings. Comput Biol Med 2024; 182:109105. [PMID: 39265479 DOI: 10.1016/j.compbiomed.2024.109105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2024] [Revised: 07/06/2024] [Accepted: 09/01/2024] [Indexed: 09/14/2024]
Abstract
Probabilistic-based non-linear dimensionality reduction (PB-NL-DR) methods, such as t-SNE and UMAP, are effective in unfolding complex high-dimensional manifolds, allowing users to explore and understand the structural patterns of data. However, due to the trade-off between global and local structure preservation and the randomness during computation, these methods may introduce false neighborhood relationships, known as distortion errors and misleading visualizations. To address this issue, we first conduct a detailed survey to illustrate the design space of prior layout enrichment visualizations for interpreting DR results, and then propose a node-link visualization technique, ManiGraph. This technique rethinks the neighborhood fidelity between the high- and low-dimensional spaces by constructing dynamic mesoscopic structure graphs and measuring region-adapted trustworthiness. ManiGraph also addresses the overplotting issue in scatterplot visualization for large-scale datasets and supports examining in unsupervised scenarios. We demonstrate the effectiveness of ManiGraph in different analytical cases, including generic machine learning using 3D toy data illustrations and fashion-MNIST, a computational biology study using a single-cell RNA sequencing dataset, and a deep learning-enabled colorectal cancer study with histopathology-MNIST.
Collapse
Affiliation(s)
- Junhan Zhao
- Harvard Medical School, Boston, 02114, MA, USA; Harvard T.H.Chan School of Public Health, Boston, 02114, MA, USA; Purdue University, West Lafayette, 47907, IN, USA.
| | - Xiang Liu
- Purdue University, West Lafayette, 47907, IN, USA; Indiana University School of Medicine, Indianapolis, 46202, IN, USA.
| | - Hongping Tang
- Shenzhen Maternity and Child Healthcare Hospital, Shenzhen, 518048, China.
| | - Xiyue Wang
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
| | - Sen Yang
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
| | - Donfang Liu
- Rochester Institute of Technology, Rochester, 14623, NY, USA.
| | - Yijiang Chen
- Stanford University School of Medicine, Stanford, 94304, CA, USA.
| | | |
Collapse
|
5
|
Hilasaca GM, Marcilio-Jr WE, Eler DM, Martins RM, Paulovich FV. A Grid-Based Method for Removing Overlaps of Dimensionality Reduction Scatterplot Layouts. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5733-5749. [PMID: 37647195 DOI: 10.1109/tvcg.2023.3309941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/01/2023]
Abstract
Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional datasets. Despite their popularity, such scatterplots suffer from occlusion, especially when informative glyphs are used to represent data instances, potentially obfuscating critical information for the analysis under execution. Different strategies have been devised to address this issue, either producing overlap-free layouts that lack the powerful capabilities of contemporary DR techniques in uncovering interesting data patterns or eliminating overlaps as a post-processing strategy. Despite the good results of post-processing techniques, most of the best methods typically expand or distort the scatterplot area, thus reducing glyphs' size (sometimes) to unreadable dimensions, defeating the purpose of removing overlaps. This article presents Distance Grid (DGrid), a novel post-processing strategy to remove overlaps from DR layouts that faithfully preserves the original layout's characteristics and bounds the minimum glyph sizes. We show that DGrid surpasses the state-of-the-art in overlap removal (through an extensive comparative evaluation considering multiple different metrics) while also being one of the fastest techniques, especially for large datasets. A user study with 51 participants also shows that DGrid is consistently ranked among the top techniques for preserving the original scatterplots' visual characteristics and the aesthetics of the final results.
Collapse
|
6
|
Quadri GJ, Nieves JA, Wiernik BM, Rosen P. Automatic Scatterplot Design Optimization for Clustering Identification. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:4312-4327. [PMID: 35816525 DOI: 10.1109/tvcg.2022.3189883] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Scatterplots are among the most widely used visualization techniques. Compelling scatterplot visualizations improve understanding of data by leveraging visual perception to boost awareness when performing specific visual analytic tasks. Design choices in scatterplots, such as graphical encodings or data aspects, can directly impact decision-making quality for low-level tasks like clustering. Hence, constructing frameworks that consider both the perceptions of the visual encodings and the task being performed enables optimizing visualizations to maximize efficacy. In this article, we propose an automatic tool to optimize the design factors of scatterplots to reveal the most salient cluster structure. Our approach leverages the merge tree data structure to identify the clusters and optimize the choice of subsampling algorithm, sampling rate, marker size, and marker opacity used to generate a scatterplot image. We validate our approach with user and case studies that show it efficiently provides high-quality scatterplot designs from a large parameter space.
Collapse
|
7
|
Otten TM, Grimm SE, Ramaekers B, Joore MA. Comprehensive Review of Methods to Assess Uncertainty in Health Economic Evaluations. PHARMACOECONOMICS 2023; 41:619-632. [PMID: 36943674 PMCID: PMC10163110 DOI: 10.1007/s40273-023-01242-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 01/11/2023] [Indexed: 05/06/2023]
Abstract
Uncertainty assessment is a cornerstone in model-based health economic evaluations (HEEs) that inform reimbursement decisions. No comprehensive overview of available uncertainty assessment methods currently exists. We aimed to review methods for uncertainty assessment for use in model-based HEEs, by conducting a snowballing review. We categorised all methods according to their stage of use relating to uncertainty assessment (identification, analysis, communication). Additionally, we classified identification methods according to sources of uncertainty, and subdivided analysis and communication methods according to their purpose. The review identified a total of 80 uncertainty methods: 30 identification, 28 analysis, and 22 communication methods. Uncertainty identification methods exist to address uncertainty from different sources. Most identification methods were developed with the objective to assess related concepts such as validity, model quality, and relevance. Almost all uncertainty analysis and communication methods required uncertainty to be quantified and inclusion of uncertainties in probabilistic analysis. Our review can help analysts and decision makers in selecting uncertainty assessment methods according to their aim and purpose of the assessment. We noted a need for further clarification of terminology and guidance on the use of (combinations of) methods to identify uncertainty and related concepts such as validity and quality. A key finding is that uncertainty assessment relies heavily on quantification, which may necessitate increased use of expert elicitation and/or the development of methods to assess unquantified uncertainty.
Collapse
Affiliation(s)
- Thomas Michael Otten
- Department of Clinical Epidemiology and Medical Technology Assessment (KEMTA), P. Debyelaan 25, Oxford Building, PO Box 5800a, Maastricht, Limburg, The Netherlands.
| | - Sabine E Grimm
- Department of Clinical Epidemiology and Medical Technology Assessment (KEMTA), P. Debyelaan 25, Oxford Building, PO Box 5800a, Maastricht, Limburg, The Netherlands
| | - Bram Ramaekers
- Department of Clinical Epidemiology and Medical Technology Assessment (KEMTA), P. Debyelaan 25, Oxford Building, PO Box 5800a, Maastricht, Limburg, The Netherlands
| | - Manuela A Joore
- Department of Clinical Epidemiology and Medical Technology Assessment (KEMTA), P. Debyelaan 25, Oxford Building, PO Box 5800a, Maastricht, Limburg, The Netherlands
| |
Collapse
|
8
|
Brandao ILS, van der Molen J, van der Wal D. Effects of offshore wind farms on suspended particulate matter derived from satellite remote sensing. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 866:161114. [PMID: 36586698 DOI: 10.1016/j.scitotenv.2022.161114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/03/2022] [Revised: 11/15/2022] [Accepted: 12/18/2022] [Indexed: 06/17/2023]
Affiliation(s)
- I L S Brandao
- NIOZ Royal Netherlands Institute for Sea Research, Dept of Estuarine and Delta Systems, P.O. Box 140, 4400 AC Yerseke, the Netherlands.
| | - J van der Molen
- NIOZ Royal Netherlands Institute for Sea Research, Dept of Coastal Systems, P.O. Box 59, 1790 AB Den Burg, Texel, the Netherlands.
| | - D van der Wal
- NIOZ Royal Netherlands Institute for Sea Research, Dept of Estuarine and Delta Systems, P.O. Box 140, 4400 AC Yerseke, the Netherlands; Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, P.O. Box 217, 7500 AE Enschede, the Netherlands.
| |
Collapse
|
9
|
Li Z, Shi R, Liu Y, Long S, Guo Z, Jia S, Zhang J. Dual Space Coupling Model Guided Overlap-Free Scatterplot. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:657-667. [PMID: 36260569 DOI: 10.1109/tvcg.2022.3209459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
The overdraw problem of scatterplots seriously interferes with the visual tasks. Existing methods, such as data sampling, node dispersion, subspace mapping, and visual abstraction, cannot guarantee the correspondence and consistency between the data points that reflect the intrinsic original data distribution and the corresponding visual units that reveal the presented data distribution, thus failing to obtain an overlap-free scatterplot with unbiased and lossless data distribution. A dual space coupling model is proposed in this paper to represent the complex bilateral relationship between data space and visual space theoretically and analytically. Under the guidance of the model, an overlap-free scatterplot method is developed through integration of the following: a geometry-based data transformation algorithm, namely DistributionTranscriptor; an efficient spatial mutual exclusion guided view transformation algorithm, namely PolarPacking; an overlap-free oriented visual encoding configuration model and a radius adjustment tool, namely frdraw. Our method can ensure complete and accurate information transfer between the two spaces, maintaining consistency between the newly created scatterplot and the original data distribution on global and local features. Quantitative evaluation proves our remarkable progress on computational efficiency compared with the state-of-the-art methods. Three applications involving pattern enhancement, interaction improvement, and overdraw mitigation of trajectory visualization demonstrate the broad prospects of our method.
Collapse
|
10
|
Li S, Yu J, Li M, Liu L, Zhang XL, Yuan X. A Framework for Multiclass Contour Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:353-362. [PMID: 36194705 DOI: 10.1109/tvcg.2022.3209482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Multiclass contour visualization is often used to interpret complex data attributes in such fields as weather forecasting, computational fluid dynamics, and artificial intelligence. However, effective and accurate representations of underlying data patterns and correlations can be challenging in multiclass contour visualization, primarily due to the inevitable visual cluttering and occlusions when the number of classes is significant. To address this issue, visualization design must carefully choose design parameters to make visualization more comprehensible. With this goal in mind, we proposed a framework for multiclass contour visualization. The framework has two components: a set of four visualization design parameters, which are developed based on an extensive review of literature on contour visualization, and a declarative domain-specific language (DSL) for creating multiclass contour rendering, which enables a fast exploration of those design parameters. A task-oriented user study was conducted to assess how those design parameters affect users' interpretations of real-world data. The study results offered some suggestions on the value choices of design parameters in multiclass contour visualization.
Collapse
|
11
|
Zhu Z, Shen Y, Zhu S, Zhang G, Liang R, Sun G. Towards better pattern enhancement in temporal evolving set visualization. J Vis (Tokyo) 2022. [DOI: 10.1007/s12650-022-00896-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
12
|
What can scatterplots teach us about doing data science better? INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00362-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
|
13
|
Sohns JT, Schmitt M, Jirasek F, Hasse H, Leitte H. Attribute-based Explanation of Non-Linear Embeddings of High-Dimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:540-550. [PMID: 34587086 DOI: 10.1109/tvcg.2021.3114870] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Embeddings of high-dimensional data are widely used to explore data, to verify analysis results, and to communicate information. Their explanation, in particular with respect to the input attributes, is often difficult. With linear projects like PCA the axes can still be annotated meaningfully. With non-linear projections this is no longer possible and alternative strategies such as attribute-based color coding are required. In this paper, we review existing augmentation techniques and discuss their limitations. We present the Non-Linear Embeddings Surveyor (NoLiES) that combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting. Rangesets use a set-based visualization approach for binned attribute values that enable the user to quickly observe structure and detect outliers. We detail the link between algebraic topology and rangesets and demonstrate the utility of NoLiES in case studies with various challenges (complex attribute value distribution, many attributes, many data points) and a real-world application to understand latent features of matrix completion in thermodynamics.
Collapse
|
14
|
Li C, Baciu G, Wang Y, Chen J, Wang C. DDLVis: Real-time Visual Query of Spatiotemporal Data Distribution via Density Dictionary Learning. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:1062-1072. [PMID: 34587020 DOI: 10.1109/tvcg.2021.3114762] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Visual query of spatiotemporal data is becoming an increasingly important function in visual analytics applications. Various works have been presented for querying large spatiotemporal data in real time. However, the real-time query of spatiotemporal data distribution is still an open challenge. As spatiotemporal data become larger, methods of aggregation, storage and querying become critical. We propose a new visual query system that creates a low-memory storage component and provides real-time visual interactions of spatiotemporal data. We first present a peak-based kernel density estimation method to produce the data distribution for the spatiotemporal data. Then a novel density dictionary learning approach is proposed to compress temporal density maps and accelerate the query calculation. Moreover, various intuitive query interactions are presented to interactively gain patterns. The experimental results obtained on three datasets demonstrate that the presented system offers an effective query for visual analytics of spatiotemporal data.
Collapse
|
15
|
Zhao Y, Wang Y, Zhang J, Fu CW, Xu M, Moritz D. KD-Box: Line-segment-based KD-tree for Interactive Exploration of Large-scale Time-Series Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:890-900. [PMID: 34587082 DOI: 10.1109/tvcg.2021.3114865] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Time-series data-usually presented in the form of lines-plays an important role in many domains such as finance, meteorology, health, and urban informatics. Yet, little has been done to support interactive exploration of large-scale time-series data, which requires a clutter-free visual representation with low-latency interactions. In this paper, we contribute a novel line-segment-based KD-tree method to enable interactive analysis of many time series. Our method enables not only fast queries over time series in selected regions of interest but also a line splatting method for efficient computation of the density field and selection of representative lines. Further, we develop KD-Box, an interactive system that provides rich interactions, e.g., timebox, attribute filtering, and coordinated multiple views. We demonstrate the effectiveness of KD-Box in supporting efficient line query and density field computation through a quantitative comparison and show its usefulness for interactive visual analysis on several real-world datasets.
Collapse
|
16
|
Hong MH, Witt JK, Szafir DA. The Weighted Average Illusion: Biases in Perceived Mean Position in Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:987-997. [PMID: 34596541 DOI: 10.1109/tvcg.2021.3114783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Scatterplots can encode a third dimension by using additional channels like size or color (e.g. bubble charts). We explore a potential misinterpretation of trivariate scatterplots, which we call the weighted average illusion, where locations of larger and darker points are given more weight toward x- and y-mean estimates. This systematic bias is sensitive to a designer's choice of size or lightness ranges mapped onto the data. In this paper, we quantify this bias against varying size/lightness ranges and data correlations. We discuss possible explanations for its cause by measuring attention given to individual data points using a vision science technique called the centroid method. Our work illustrates how ensemble processing mechanisms and mental shortcuts can significantly distort visual summaries of data, and can lead to misjudgments like the demonstrated weighted average illusion.
Collapse
|
17
|
Construct boundaries and place labels for multi-class scatterplots. J Vis (Tokyo) 2021. [DOI: 10.1007/s12650-021-00791-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
18
|
IXVC: An interactive pipeline for explaining visual clusters in dimensionality reduction visualizations with decision trees. ARRAY 2021. [DOI: 10.1016/j.array.2021.100080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
|
19
|
Zhao J, Liu X, Guo C, Qian ZC, Chen YV. Phoenixmap: An Abstract Approach to Visualize 2D Spatial Distributions. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:2000-2014. [PMID: 31603789 DOI: 10.1109/tvcg.2019.2945960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
The multidimensional nature of spatial data poses a challenge for visualization. In this paper, we introduce Phoenixmap, a simple abstract visualization method to address the issue of visualizing multiple spatial distributions at once. The Phoenixmap approach starts by identifying the enclosed outline of the point collection, then assigns different widths to outline segments according to the segments' corresponding inside regions. Thus, one 2D distribution is represented as an outline with varied thicknesses. Phoenixmap is capable of overlaying multiple outlines and comparing them across categories of objects in a 2D space. We chose heatmap as a benchmark spatial visualization method and conducted user studies to compare performances among Phoenixmap, heatmap, and dot distribution map. Based on the analysis and participant feedback, we demonstrate that Phoenixmap 1) allows users to perceive and compare spatial distribution data efficiently; 2) frees up graphics space with a concise form that can provide visualization design possibilities like overlapping; and 3) provides a good quantitative perceptual estimating capability given the proper legends. Finally, we discuss several possible applications of Phoenixmap and present one visualization of multiple species of birds' active regions in a nature preserve.
Collapse
|
20
|
Tao W, Hou X, Sah A, Battle L, Chang R, Stonebraker M. Kyrix-S: Authoring Scalable Scatterplot Visualizations of Big Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:401-411. [PMID: 33048700 DOI: 10.1109/tvcg.2020.3030372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Static scatterplots often suffer from the overdraw problem on big datasets where object overlap causes undesirable visual clutter. The use of zooming in scatterplots can help alleviate this problem. With multiple zoom levels, more screen real estate is available, allowing objects to be placed in a less crowded way. We call this type of visualization scalable scatterplot visualizations, or SSV for short. Despite the potential of SSVs, existing systems and toolkits fall short in supporting the authoring of SSVs due to three limitations. First, many systems have limited scalability, assuming that data fits in the memory of one computer. Second, too much developer work, e.g., using custom code to generate mark layouts or render objects, is required. Third, many systems focus on only a small subset of the SSV design space (e.g. supporting a specific type of visual marks). To address these limitations, we have developed Kyrix-S, a system for easy authoring of SSVs at scale. Kyrix-S derives a declarative grammar that enables specification of a variety of SSVs in a few tens of lines of code, based on an existing survey of scatterplot tasks and designs. The declarative grammar is supported by a distributed layout algorithm which automatically places visual marks onto zoom levels. We store data in a multi-node database and use multi-node spatial indexes to achieve interactive browsing of large SSVs. Extensive experiments show that 1) Kyrix-S enables interactive browsing of SSVs of billions of objects, with response times under 500ms and 2) Kyrix-S achieves 4X-9X reduction in specification compared to a state-of-the-art authoring system.
Collapse
|
21
|
Yuan J, Xiang S, Xia J, Yu L, Liu S. Evaluation of Sampling Methods for Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1720-1730. [PMID: 33074820 DOI: 10.1109/tvcg.2020.3030432] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Given a scatterplot with tens of thousands of points or even more, a natural question is which sampling method should be used to create a small but "good" scatterplot for a better abstraction. We present the results of a user study that investigates the influence of different sampling strategies on multi-class scatterplots. The main goal of this study is to understand the capability of sampling methods in preserving the density, outliers, and overall shape of a scatterplot. To this end, we comprehensively review the literature and select seven typical sampling strategies as well as eight representative datasets. We then design four experiments to understand the performance of different strategies in maintaining: 1) region density; 2) class density; 3) outliers; and 4) overall shape in the sampling results. The results show that: 1) random sampling is preferred for preserving region density; 2) blue noise sampling and random sampling have comparable performance with the three multi-class sampling strategies in preserving class density; 3) outlier biased density based sampling, recursive subdivision based sampling, and blue noise sampling perform the best in keeping outliers; and 4) blue noise sampling outperforms the others in maintaining the overall shape of a scatterplot.
Collapse
|
22
|
Ma Y, Fan A, He J, Nelakurthi AR, Maciejewski R. A Visual Analytics Framework for Explaining and Diagnosing Transfer Learning Processes. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1385-1395. [PMID: 33035164 DOI: 10.1109/tvcg.2020.3028888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Many statistical learning models hold an assumption that the training data and the future unlabeled data are drawn from the same distribution. However, this assumption is difficult to fulfill in real-world scenarios and creates barriers in reusing existing labels from similar application domains. Transfer Learning is intended to relax this assumption by modeling relationships between domains, and is often applied in deep learning applications to reduce the demand for labeled data and training time. Despite recent advances in exploring deep learning models with visual analytics tools, little work has explored the issue of explaining and diagnosing the knowledge transfer process between deep learning models. In this paper, we present a visual analytics framework for the multi-level exploration of the transfer learning processes when training deep neural networks. Our framework establishes a multi-aspect design to explain how the learned knowledge from the existing model is transferred into the new learning task when training deep neural networks. Based on a comprehensive requirement and task analysis, we employ descriptive visualization with performance measures and detailed inspections of model behaviors from the statistical, instance, feature, and model structure levels. We demonstrate our framework through two case studies on image classification by fine-tuning AlexNets to illustrate how analysts can utilize our framework.
Collapse
|
23
|
Rapp T, Peters C, Dachsbacher C. Visual Analysis of Large Multivariate Scattered Data using Clustering and Probabilistic Summaries. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1580-1590. [PMID: 33048705 DOI: 10.1109/tvcg.2020.3030379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Rapidly growing data sizes of scientific simulations pose significant challenges for interactive visualization and analysis techniques. In this work, we propose a compact probabilistic representation to interactively visualize large scattered datasets. In contrast to previous approaches that represent blocks of volumetric data using probability distributions, we model clusters of arbitrarily structured multivariate data. In detail, we discuss how to efficiently represent and store a high-dimensional distribution for each cluster. We observe that it suffices to consider low-dimensional marginal distributions for two or three data dimensions at a time to employ common visual analysis techniques. Based on this observation, we represent high-dimensional distributions by combinations of low-dimensional Gaussian mixture models. We discuss the application of common interactive visual analysis techniques to this representation. In particular, we investigate several frequency-based views, such as density plots in 1D and 2D, density-based parallel coordinates, and a time histogram. We visualize the uncertainty introduced by the representation, discuss a level-of-detail mechanism, and explicitly visualize outliers. Furthermore, we propose a spatial visualization by splatting anisotropic 3D Gaussians for which we derive a closed-form solution. Lastly, we describe the application of brushing and linking to this clustered representation. Our evaluation on several large, real-world datasets demonstrates the scaling of our approach.
Collapse
|
24
|
Quadri GJ, Rosen P. Modeling the Influence of Visual Density on Cluster Perception in Scatterplots Using Topology. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1829-1839. [PMID: 33048695 DOI: 10.1109/tvcg.2020.3030365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Scatterplots are used for a variety of visual analytics tasks, including cluster identification, and the visual encodings used on a scatterplot play a deciding role on the level of visual separation of clusters. For visualization designers, optimizing the visual encodings is crucial to maximizing the clarity of data. This requires accurately modeling human perception of cluster separation, which remains challenging. We present a multi-stage user study focusing on four factors-distribution size of clusters, number of points, size of points, and opacity of points-that influence cluster identification in scatterplots. From these parameters, we have constructed two models, a distance-based model, and a density-based model, using the merge tree data structure from Topological Data Analysis. Our analysis demonstrates that these factors play an important role in the number of clusters perceived, and it verifies that the distance-based and density-based models can reasonably estimate the number of clusters a user observes. Finally, we demonstrate how these models can be used to optimize visual encodings on real-world data.
Collapse
|
25
|
Ma Y, Maciejewski R. Visual Analysis of Class Separations With Locally Linear Segments. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:241-253. [PMID: 32746282 DOI: 10.1109/tvcg.2020.3011155] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
High-dimensional labeled data widely exists in many real-world applications such as classification and clustering. One main task in analyzing such datasets is to explore class separations and class boundaries derived from machine learning models. Dimension reduction techniques are commonly applied to support analysts in exploring the underlying decision boundary structures by depicting a low-dimensional representation of the data distributions from multiple classes. However, such projection-based analyses are limited due to their lack of ability to show separations in complex non-linear decision boundary structures and can suffer from heavy distortion and low interpretability. To overcome these issues of separability and interpretability, we propose a visual analysis approach that utilizes the power of explainability from linear projections to support analysts when exploring non-linear separation structures. Our approach is to extract a set of locally linear segments that approximate the original non-linear separations. Unlike traditional projection-based analysis where the data instances are mapped to a single scatterplot, our approach supports the exploration of complex class separations through multiple local projection results. We conduct case studies on two labeled datasets to demonstrate the effectiveness of our approach.
Collapse
|
26
|
Patashnik O, Lu M, Bermano AH, Cohen-Or D. Temporal scatterplots. COMPUTATIONAL VISUAL MEDIA 2020; 6:385-400. [PMID: 33194253 PMCID: PMC7648217 DOI: 10.1007/s41095-020-0197-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Accepted: 09/12/2020] [Indexed: 06/11/2023]
Abstract
Visualizing high-dimensional data on a 2D canvas is generally challenging. It becomes significantly more difficult when multiple time-steps are to be presented, as the visual clutter quickly increases. Moreover, the challenge to perceive the significant temporal evolution is even greater. In this paper, we present a method to plot temporal high-dimensional data in a static scatterplot; it uses the established PCA technique to project data from multiple time-steps. The key idea is to extend each individual displacement prior to applying PCA, so as to skew the projection process, and to set a projection plane that balances the directions of temporal change and spatial variance. We present numerous examples and various visual cues to highlight the data trajectories, and demonstrate the effectiveness of the method for visualizing temporal data.
Collapse
Affiliation(s)
| | - Min Lu
- Shenzhen University, Shenzhen, China
| | | | | |
Collapse
|
27
|
Phenotyping chronic tinnitus patients using self-report questionnaire data: cluster analysis and visual comparison. Sci Rep 2020; 10:16411. [PMID: 33009468 PMCID: PMC7532444 DOI: 10.1038/s41598-020-73402-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2020] [Accepted: 09/16/2020] [Indexed: 12/02/2022] Open
Abstract
Chronic tinnitus is a complex, multi-factorial symptom that requires careful assessment and management. Evidence-based therapeutic approaches involve audiological and psychological treatment components. However, not everyone benefits from treatment. The identification and characterisation of patient subgroups (or “phenotypes”) may provide clinically relevant information. Due to the large number of assessment tools, data-driven methods appear to be promising. The acceptance of these empirical results can be further strengthened by a comprehensive visualisation. In this study, we used cluster analysis to identify distinct tinnitus phenotypes based on self-report questionnaire data and implemented a visualisation tool to explore phenotype idiosyncrasies. 1228 patients with chronic tinnitus from the Charité Tinnitus Center in Berlin were included. At baseline, each participant completed 14 questionnaires measuring tinnitus distress, -loudness, frequency and location, depressivity, perceived stress, quality of life, physical and mental health, pain perception, somatic symptom expression and coping attitudes. Four distinct patient phenotypes emerged from clustering: avoidant group (56.8%), psychosomatic group (14.1%), somatic group (15.2%), and distress group (13.9%). Radial bar- and line charts allowed for visual inspection and juxtaposition of major phenotype characteristics. The phenotypes differed in terms of clinical information including psychological symptoms, quality of life, coping attitudes, stress, tinnitus-related distress and pain, as well as socio-demographics. Our findings suggest that identifiable patient subgroups and their visualisation may allow for stratified treatment strategies and research designs.
Collapse
|
28
|
Zhang Y, Yu C, Wang R, Liu X. Visual dimension analysis based on dimension subdivision. J Vis (Tokyo) 2020. [DOI: 10.1007/s12650-020-00694-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
29
|
Lu M, Wang S, Lanir J, Fish N, Yue Y, Cohen-Or D, Huang H. Winglets: Visualizing Association with Uncertainty in Multi-class Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:770-779. [PMID: 31562094 DOI: 10.1109/tvcg.2019.2934811] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
This work proposes Winglets, an enhancement to the classic scatterplot to better perceptually pronounce multiple classes by improving the perception of association and uncertainty of points to their related cluster. Designed as a pair of dual-sided strokes belonging to a data point, Winglets leverage the Gestalt principle of Closure to shape the perception of the form of the clusters, rather than use an explicit divisive encoding. Through a subtle design of two dominant attributes, length and orientation, Winglets enable viewers to perform a mental completion of the clusters. A controlled user study was conducted to examine the efficiency of Winglets in perceiving the cluster association and the uncertainty of certain points. The results show Winglets form a more prominent association of points into clusters and improve the perception of associating uncertainty.
Collapse
|
30
|
Chen X, Ge T, Zhang J, Chen B, Fu CW, Deussen O, Wang Y. A Recursive Subdivision Technique for Sampling Multi-class Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:729-738. [PMID: 31442987 DOI: 10.1109/tvcg.2019.2934541] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We present a non-uniform recursive sampling technique for multi-class scatterplots, with the specific goal of faithfully presenting relative data and class densities, while preserving major outliers in the plots. Our technique is based on a customized binary kd-tree, in which leaf nodes are created by recursively subdividing the underlying multi-class density map. By backtracking, we merge leaf nodes until they encompass points of all classes for our subsequently applied outlier-aware multi-class sampling strategy. A quantitative evaluation shows that our approach can better preserve outliers and at the same time relative densities in multi-class scatterplots compared to the previous approaches, several case studies demonstrate the effectiveness of our approach in exploring complex and real world data.
Collapse
|
31
|
Ma Y, Xie T, Li J, Maciejewski R. Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1075-1085. [PMID: 31478859 DOI: 10.1109/tvcg.2019.2934631] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Machine learning models are currently being deployed in a variety of real-world applications where model predictions are used to make decisions about healthcare, bank loans, and numerous other critical tasks. As the deployment of artificial intelligence technologies becomes ubiquitous, it is unsurprising that adversaries have begun developing methods to manipulate machine learning models to their advantage. While the visual analytics community has developed methods for opening the black box of machine learning models, little work has focused on helping the user understand their model vulnerabilities in the context of adversarial attacks. In this paper, we present a visual analytics framework for explaining and exploring model vulnerabilities to adversarial attacks. Our framework employs a multi-faceted visualization scheme designed to support the analysis of data poisoning attacks from the perspective of models, data instances, features, and local structures. We demonstrate our framework through two case studies on binary classifiers and illustrate model vulnerabilities with respect to varying attack strategies.
Collapse
|
32
|
Liu S, Wang D, Maljovec D, Anirudh R, Thiagarajan JJ, Jacobs SA, Van Essen BC, Hysom D, Yeom JS, Gaffney J, Peterson L, Robinson PB, Bhatia H, Pascucci V, Spears BK, Bremer PT. Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:291-300. [PMID: 31484123 DOI: 10.1109/tvcg.2019.2934594] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the utilization of black box models (e.g., deep neural networks) calls for advanced techniques in exploring and interpreting model behaviors. Second, the rapid growth in computing has produced enormous datasets that require techniques that can handle millions or more samples. Although some solutions to these interpretability challenges have been proposed, they typically do not scale beyond thousands of samples, nor do they provide the high-level intuition scientists are looking for. Here, we present the first scalable solution to explore and analyze high-dimensional functions often encountered in the scientific data analysis pipeline. By combining a new streaming neighborhood graph construction, the corresponding topology computation, and a novel data aggregation scheme, namely topology aware datacubes, we enable interactive exploration of both the topological and the geometric aspect of high-dimensional data. Following two use cases from high-energy-density (HED) physics and computational biology, we demonstrate how these capabilities have led to crucial new insights in both applications.
Collapse
|
33
|
Hu R, Sha T, Van Kaick O, Deussen O, Huang H. Data Sampling in Multi-view and Multi-class Scatterplots via Set Cover Optimization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:739-748. [PMID: 31443021 DOI: 10.1109/tvcg.2019.2934799] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We present a method for data sampling in scatterplots by jointly optimizing point selection for different views or classes. Our method uses space-filling curves (Z-order curves) that partition a point set into subsets that, when covered each by one sample, provide a sampling or coreset with good approximation guarantees in relation to the original point set. For scatterplot matrices with multiple views, different views provide different space-filling curves, leading to different partitions of the given point set. For multi-class scatterplots, the focus on either per-class distribution or global distribution provides two different partitions of the given point set that need to be considered in the selection of the coreset. For both cases, we convert the coreset selection problem into an Exact Cover Problem (ECP), and demonstrate with quantitative and qualitative evaluations that an approximate solution that solves the ECP efficiently is able to provide high-quality samplings.
Collapse
|
34
|
Nonato LG, Aupetit M. Multidimensional Projection for Visual Analytics: Linking Techniques with Distortions, Tasks, and Layout Enrichment. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:2650-2673. [PMID: 29994258 DOI: 10.1109/tvcg.2018.2846735] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Visual analysis of multidimensional data requires expressive and effective ways to reduce data dimensionality to encode them visually. Multidimensional projections (MDP) figure among the most important visualization techniques in this context, transforming multidimensional data into scatter plots whose visual patterns reflect some notion of similarity in the original data. However, MDP come with distortions that make these visual patterns not trustworthy, hindering users to infer actual data characteristics. Moreover, the patterns present in the scatter plots might not be enough to allow a clear understanding of multidimensional data, motivating the development of layout enrichment methodologies to operate together with MDP. This survey attempts to cover the main aspects of MDP as a visualization and visual analytic tool. It provides detailed analysis and taxonomies as to the organization of MDP techniques according to their main properties and traits, discussing the impact of such properties for visual perception and other human factors. The survey also approaches the different types of distortions that can result from MDP mappings and it overviews existing mechanisms to quantitatively evaluate such distortions. A qualitative analysis of the impact of distortions on the different analytic tasks performed by users when exploring multidimensional data through MDP is also presented. Guidelines for choosing the best MDP for an intended task are also provided as a result of this analysis. Finally, layout enrichment schemes to debunk MDP distortions and/or reveal relevant information not directly inferable from the scatter plot are reviewed and discussed in the light of new taxonomies. We conclude the survey providing future research axes to fill discovered gaps in this domain.
Collapse
|
35
|
Luo X, Yuan Y, Zhang K, Xia J, Zhou Z, Chang L, Gu T. Enhancing statistical charts: toward better data visualization and analysis. J Vis (Tokyo) 2019. [DOI: 10.1007/s12650-019-00569-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
36
|
Raidou RG, Groller ME, Eisemann M. Relaxing Dense Scatter Plots with Pixel-Based Mappings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:2205-2216. [PMID: 30892214 DOI: 10.1109/tvcg.2019.2903956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Scatter plots are the most commonly employed technique for the visualization of bivariate data. Despite their versatility and expressiveness in showing data aspects, such as clusters, correlations, and outliers, scatter plots face a main problem. For large and dense data, the representation suffers from clutter due to overplotting. This is often partially solved with the use of density plots. Yet, data overlap may occur in certain regions of a scatter or density plot, while other regions may be partially, or even completely empty. Adequate pixel-based techniques can be employed for effectively filling the plotting space, giving an additional notion of the numerosity of data motifs or clusters. We propose the Pixel-Relaxed Scatter Plots, a new and simple variant, to improve the display of dense scatter plots, using pixel-based, space-filling mappings. Our Pixel-Relaxed Scatter Plots make better use of the plotting canvas, while avoiding data overplotting, and optimizing space coverage and insight in the presence and size of data motifs. We have employed different methods to map scatter plot points to pixels and to visually present this mapping. We demonstrate our approach on several synthetic and realistic datasets, and we discuss the suitability of our technique for different tasks. Our conducted user evaluation shows that our Pixel-Relaxed Scatter Plots can be a useful enhancement to traditional scatter plots.
Collapse
|
37
|
Abstract
The contact center industry represents a large proportion of many country’s economies. For example, 4% of the entire United States and UK’s working population is employed in this sector. As in most modern industries, contact centers generate gigabytes of operational data that require analysis to provide insight and to improve efficiency. Visualization is a valuable approach to data analysis, enabling trends and correlations to be discovered, particularly when using scatterplots. We present a feature-rich application that visualizes large call center data sets using scatterplots that support millions of points. The application features a scatterplot matrix to provide an overview of the call center data attributes, animation of call start and end times, and utilizes both the CPU and GPU acceleration for processing and filtering. We illustrate the use of the Open Computing Language (OpenCL) to utilize a commodity graphics card for the fast filtering of fields with multiple attributes. We demonstrate the use of the application with millions of call events from a month’s worth of real-world data and report domain expert feedback from our industry partner.
Collapse
|
38
|
Zhang J, Wang Y, Molino P, Li L, Ebert DS. Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:364-373. [PMID: 30130197 DOI: 10.1109/tvcg.2018.2864499] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Interpretation and diagnosis of machine learning models have gained renewed interest in recent years with breakthroughs in new approaches. We present Manifold, a framework that utilizes visual analysis techniques to support interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner. Conventional techniques usually focus on visualizing the internal logic of a specific model type (i.e., deep neural networks), lacking the ability to extend to a more complex scenario where different model types are integrated. To this end, Manifold is designed as a generic framework that does not rely on or access the internal logic of the model and solely observes the input (i.e., instances or features) and the output (i.e., the predicted result and probability distribution). We describe the workflow of Manifold as an iterative process consisting of three major phases that are commonly involved in the model development and diagnosis process: inspection (hypothesis), explanation (reasoning), and refinement (verification). The visual components supporting these tasks include a scatterplot-based visual summary that overviews the models' outcome and a customizable tabular view that reveals feature discrimination. We demonstrate current applications of the framework on the classification and regression tasks and discuss other potential machine learning use scenarios where Manifold can be applied.
Collapse
|
39
|
Liao H, Wu Y, Chen L, Chen W. Cluster-Based Visual Abstraction for Multivariate Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:2531-2545. [PMID: 28945596 DOI: 10.1109/tvcg.2017.2754480] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
The use of scatterplots is an important method for multivariate data visualization. The point distribution on the scatterplot, along with variable values represented by each point, can help analyze underlying patterns in data. However, determining the multivariate data variation on a scatterplot generated using projection methods, such as multidimensional scaling, is difficult. Furthermore, the point distribution becomes unclear when the data scale is large and clutter problems occur. These conditions can significantly decrease the usability of scatterplots on multivariate data analysis. In this study, we present a cluster-based visual abstraction method to enhance the visualization of multivariate scatterplots. Our method leverages an adapted multilabel clustering method to provide abstractions of high quality for scatterplots. An image-based method is used to deal with large scale data problem. Furthermore, a suite of glyphs is designed to visualize the data at different levels of detail and support data exploration. The view coordination between the glyph-based visualization and the table lens can effectively enhance the multivariate data analysis. Through numerical evaluations for data abstraction quality, case studies and a user study, we demonstrate the effectiveness and usability of the proposed techniques for multivariate data analysis on scatterplots.
Collapse
|
40
|
Jo J, Vernier F, Dragicevic P, Fekete JD. A Declarative Rendering Model for Multiclass Density Maps. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:470-480. [PMID: 30136987 DOI: 10.1109/tvcg.2018.2865141] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Multiclass maps are scatterplots, multidimensional projections, or thematic geographic maps where data points have a categorical attribute in addition to two quantitative attributes. This categorical attribute is often rendered using shape or color, which does not scale when overplotting occurs. When the number of data points increases, multiclass maps must resort to data aggregation to remain readable. We present multiclass density maps: multiple 2D histograms computed for each of the category values. Multiclass density maps are meant as a building block to improve the expressiveness and scalability of multiclass map visualization. In this article, we first present a short survey of aggregated multiclass maps, mainly from cartography. We then introduce a declarative model-a simple yet expressive JSON grammar associated with visual semantics-that specifies a wide design space of visualizations for multiclass density maps. Our declarative model is expressive and can be efficiently implemented in visualization front-ends such as modern web browsers. Furthermore, it can be reconfigured dynamically to support data exploration tasks without recomputing the raw data. Finally, we demonstrate how our model can be used to reproduce examples from the past and support exploring data at scale.
Collapse
|
41
|
Lehmann DJ, Theisel H. The LloydRelaxer: An Approach to Minimize Scaling Effects for Multivariate Projections. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:2424-2439. [PMID: 28534778 DOI: 10.1109/tvcg.2017.2705189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Star Coordinates are a popular projection technique in order to analyze and to disclose characteristic patterns of multidimensional data. Unfortunately, the shape, appearance, and distribution of such patterns are strongly affected by the given scaling of the data and can mislead the projection-based data analysis. In an extreme case, patterns might be more related to the choice of scaling than to the data themselves. Thus, we present the LloydRelaxer : a tool to minimize scaling-based effects in Star Coordinates. Our algorithm enforces a scaling configuration for which the data explains the observed patterns better than any scaling of them could do. It does so by an iterative minimizing and optimization process based on Voronoi diagrams and on the Lloyd relaxation within the projection space. We evaluate and test our approach by real benchmark multidimensional data of the UCI data repository.
Collapse
|
42
|
Boivin V, Deschamps-Francoeur G, Couture S, Nottingham RM, Bouchard-Bourelle P, Lambowitz AM, Scott MS, Abou-Elela S. Simultaneous sequencing of coding and noncoding RNA reveals a human transcriptome dominated by a small number of highly expressed noncoding genes. RNA (NEW YORK, N.Y.) 2018; 24:950-965. [PMID: 29703781 PMCID: PMC6004057 DOI: 10.1261/rna.064493.117] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/16/2017] [Accepted: 04/24/2018] [Indexed: 06/01/2023]
Abstract
Comparing the abundance of one RNA molecule to another is crucial for understanding cellular functions but most sequencing techniques can target only specific subsets of RNA. In this study, we used a new fragmented ribodepleted TGIRT sequencing method that uses a thermostable group II intron reverse transcriptase (TGIRT) to generate a portrait of the human transcriptome depicting the quantitative relationship of all classes of nonribosomal RNA longer than 60 nt. Comparison between different sequencing methods indicated that FRT is more accurate in ranking both mRNA and noncoding RNA than viral reverse transcriptase-based sequencing methods, even those that specifically target these species. Measurements of RNA abundance in different cell lines using this method correlate with biochemical estimates, confirming tRNA as the most abundant nonribosomal RNA biotype. However, the single most abundant transcript is 7SL RNA, a component of the signal recognition particle. Structured noncoding RNAs (sncRNAs) associated with the same biological process are expressed at similar levels, with the exception of RNAs with multiple functions like U1 snRNA. In general, sncRNAs forming RNPs are hundreds to thousands of times more abundant than their mRNA counterparts. Surprisingly, only 50 sncRNA genes produce half of the non-rRNA transcripts detected in two different cell lines. Together the results indicate that the human transcriptome is dominated by a small number of highly expressed sncRNAs specializing in functions related to translation and splicing.
Collapse
Affiliation(s)
- Vincent Boivin
- Département de biochimie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec J1E 4K8, Canada
| | - Gabrielle Deschamps-Francoeur
- Département de biochimie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec J1E 4K8, Canada
| | - Sonia Couture
- Département de microbiologie et d'infectiologie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec J1E 4K8, Canada
| | - Ryan M Nottingham
- Institute for Cellular and Molecular Biology and Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, USA
| | - Philia Bouchard-Bourelle
- Département de biochimie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec J1E 4K8, Canada
| | - Alan M Lambowitz
- Institute for Cellular and Molecular Biology and Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas 78712, USA
| | - Michelle S Scott
- Département de biochimie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec J1E 4K8, Canada
| | - Sherif Abou-Elela
- Département de microbiologie et d'infectiologie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec J1E 4K8, Canada
| |
Collapse
|
43
|
Li C, Baciu G, Han Y. StreamMap: Smooth Dynamic Visualization of High-Density Streaming Points. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:1381-1393. [PMID: 28212089 DOI: 10.1109/tvcg.2017.2668409] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Interactive visualization of streaming points for real-time scatterplots and linear blending of correlation patterns is increasingly becoming the dominant mode of visual analytics for both big data and streaming data from active sensors and broadcasting media. To better visualize and interact with inter-stream patterns, it is generally necessary to smooth out gaps or distortions in the streaming data. Previous approaches either animate the points directly or present a sampled static heat-map. We propose a new approach, called StreamMap, to smoothly blend high-density streaming points and create a visual flow that emphasizes the density pattern distributions. In essence, we present three new contributions for the visualization of high-density streaming points. The first contribution is a density-based method called super kernel density estimation that aggregates streaming points using an adaptive kernel to solve the overlapping problem. The second contribution is a robust density morphing algorithm that generates several smooth intermediate frames for a given pair of frames. The third contribution is a trend representation design that can help convey the flow directions of the streaming points. The experimental results on three datasets demonstrate the effectiveness of StreamMap when dynamic visualization and visual analysis of trend patterns on streaming points are required.
Collapse
|
44
|
Wang Y, Han F, Zhu L, Deussen O, Chen B. Line Graph or Scatter Plot? Automatic Selection of Methods for Visualizing Trends in Time Series. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:1141-1154. [PMID: 28092562 DOI: 10.1109/tvcg.2017.2653106] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Line graphs are usually considered to be the best choice for visualizing time series data, whereas sometimes also scatter plots are used for showing main trends. So far there are no guidelines that indicate which of these visualization methods better display trends in time series for a given canvas. Assuming that the main information in a time series is its overall trend, we propose an algorithm that automatically picks the visualization method that reveals this trend best. This is achieved by measuring the visual consistency between the trend curve represented by a LOESS fit and the trend described by a scatter plot or a line graph. To measure the consistency between our algorithm and user choices, we performed an empirical study with a series of controlled experiments that show a large correspondence. In a factor analysis we furthermore demonstrate that various visual and data factors have effects on the preference for a certain type of visualization.
Collapse
|
45
|
Sarikaya A, Gleicher M. Scatterplots: Tasks, Data, and Designs. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:402-412. [PMID: 28866528 DOI: 10.1109/tvcg.2017.2744184] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Traditional scatterplots fail to scale as the complexity and amount of data increases. In response, there exist many design options that modify or expand the traditional scatterplot design to meet these larger scales. This breadth of design options creates challenges for designers and practitioners who must select appropriate designs for particular analysis goals. In this paper, we help designers in making design choices for scatterplot visualizations. We survey the literature to catalog scatterplot-specific analysis tasks. We look at how data characteristics influence design decisions. We then survey scatterplot-like designs to understand the range of design options. Building upon these three organizations, we connect data characteristics, analysis tasks, and design choices in order to generate challenges, open questions, and example best practices for the effective design of scatterplots.
Collapse
|
46
|
Kwon BC, Eysenbach B, Verma J, Ng K, De Filippi C, Stewart WF, Perer A. Clustervision: Visual Supervision of Unsupervised Clustering. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:142-151. [PMID: 28866567 DOI: 10.1109/tvcg.2017.2745085] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exist a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large amount of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
Collapse
|
47
|
Wenskovitch J, Crandell I, Ramakrishnan N, House L, Leman S, North C. Towards a Systematic Combination of Dimension Reduction and Clustering in Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:131-141. [PMID: 28866581 DOI: 10.1109/tvcg.2017.2745258] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Dimension reduction algorithms and clustering algorithms are both frequently used techniques in visual analytics. Both families of algorithms assist analysts in performing related tasks regarding the similarity of observations and finding groups in datasets. Though initially used independently, recent works have incorporated algorithms from each family into the same visualization systems. However, these algorithmic combinations are often ad hoc or disconnected, working independently and in parallel rather than integrating some degree of interdependence. A number of design decisions must be addressed when employing dimension reduction and clustering algorithms concurrently in a visualization system, including the selection of each algorithm, the order in which they are processed, and how to present and interact with the resulting projection. This paper contributes an overview of combining dimension reduction and clustering into a visualization system, discussing the challenges inherent in developing a visualization system that makes use of both families of algorithms.
Collapse
|
48
|
Micallef L, Palmas G, Oulasvirta A, Weinkauf T. Towards Perceptual Optimization of the Visual Design of Scatterplots. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:1588-1599. [PMID: 28252407 DOI: 10.1109/tvcg.2017.2674978] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Designing a good scatterplot can be difficult for non-experts in visualization, because they need to decide on many parameters, such as marker size and opacity, aspect ratio, color, and rendering order. This paper contributes to research exploring the use of perceptual models and quality metrics to set such parameters automatically for enhanced visual quality of a scatterplot. A key consideration in this paper is the construction of a cost function to capture several relevant aspects of the human visual system, examining a scatterplot design for some data analysis task. We show how the cost function can be used in an optimizer to search for the optimal visual design for a user's dataset and task objectives (e.g., "reliable linear correlation estimation is more important than class separation"). The approach is extensible to different analysis tasks. To test its performance in a realistic setting, we pre-calibrated it for correlation estimation, class separation, and outlier detection. The optimizer was able to produce designs that achieved a level of speed and success comparable to that of those using human-designed presets (e.g., in R or MATLAB). Case studies demonstrate that the approach can adapt a design to the data, to reveal patterns without user intervention.
Collapse
|
49
|
Kew W, Blackburn JW, Clarke DJ, Uhrín D. Interactive van Krevelen diagrams - Advanced visualisation of mass spectrometry data of complex mixtures. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2017; 31:658-662. [PMID: 28063248 PMCID: PMC5324645 DOI: 10.1002/rcm.7823] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/11/2016] [Revised: 01/05/2017] [Accepted: 01/05/2017] [Indexed: 05/11/2023]
Affiliation(s)
- William Kew
- EaStCHEM, School of ChemistryUniversity of EdinburghEdinburghEH9 3FJUK
| | | | - David J. Clarke
- EaStCHEM, School of ChemistryUniversity of EdinburghEdinburghEH9 3FJUK
| | - Dušan Uhrín
- EaStCHEM, School of ChemistryUniversity of EdinburghEdinburghEH9 3FJUK
| |
Collapse
|
50
|
Liu S, Maljovec D, Wang B, Bremer PT, Pascucci V. Visualizing High-Dimensional Data: Advances in the Past Decade. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:1249-1268. [PMID: 28113321 DOI: 10.1109/tvcg.2016.2640960] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
Massive simulations and arrays of sensing devices, in combination with increasing computing resources, have generated large, complex, high-dimensional datasets used to study phenomena across numerous fields of study. Visualization plays an important role in exploring such datasets. We provide a comprehensive survey of advances in high-dimensional data visualization that focuses on the past decade. We aim at providing guidance for data practitioners to navigate through a modular view of the recent advances, inspiring the creation of new visualizations along the enriched visualization pipeline, and identifying future opportunities for visualization research.
Collapse
|