1. Braun D, Chang R, Gleicher M, von Landesberger T. Beware of Validation by Eye: Visual Validation of Linear Trends in Scatterplots. IEEE Transactions on Visualization and Computer Graphics 2025; 31:787-797. PMID: 39255144. DOI: 10.1109/tvcg.2024.3456305.
Abstract
Visual validation of regression models in scatterplots is a common practice for assessing model quality, yet its efficacy remains unquantified. We conducted two empirical experiments to investigate individuals' ability to visually validate linear regression models (linear trends) and to examine the impact of common visualization designs on validation quality. The first experiment showed that the level of accuracy for visual estimation of slope (i.e., fitting a line to data) is higher than for visual validation of slope (i.e., accepting a shown line). Notably, we found bias toward slopes that are "too steep" in both cases. This led to the novel insight that participants naturally assessed regression using orthogonal distances between the points and the line (i.e., ODR regression) rather than the common vertical distances (OLS regression). In the second experiment, we investigated whether incorporating common designs for regression visualization (error lines, bounding boxes, and confidence intervals) would improve visual validation. Even though error lines reduced validation bias, results failed to show the desired improvements in accuracy for any design. Overall, our findings suggest caution in using visual model validation for linear trends in scatterplots.
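The OLS-versus-ODR distinction behind this finding is easy to reproduce numerically. The following is a minimal sketch in Python (illustrative only, not the authors' experimental code; the data and noise level are invented) comparing a vertical-distance fit with an orthogonal-distance fit on the same synthetic points:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 0.5 * x + rng.normal(0.0, 0.2, 200)   # true slope: 0.5

# OLS: minimizes vertical distances between points and line.
ols_slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)

# ODR (total least squares): minimizes orthogonal distances; the fitted
# line follows the leading eigenvector of the covariance matrix.
cov = np.cov(np.vstack([x, y]), bias=True)
eigvals, eigvecs = np.linalg.eigh(cov)
vx, vy = eigvecs[:, np.argmax(eigvals)]
odr_slope = vy / vx

print(f"OLS slope: {ols_slope:.3f}  ODR slope: {odr_slope:.3f}")
# With noise in y alone, the ODR fit comes out steeper than the OLS fit,
# the same direction as the "too steep" bias reported above.
```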
2. Shatz I. Assumption-checking rather than (just) testing: The importance of visualization and effect size in statistical diagnostics. Behav Res Methods 2024; 56:826-845. PMID: 36869217. PMCID: PMC10830673. DOI: 10.3758/s13428-023-02072-x.
Abstract
Statistical methods generally have assumptions (e.g., normality in linear regression models). Violations of these assumptions can cause various issues, like statistical errors and biased estimates, whose impact can range from inconsequential to critical. Accordingly, it is important to check these assumptions, but this is often done in a flawed way. Here, I first present a prevalent but problematic approach to diagnostics: testing assumptions using null hypothesis significance tests (e.g., the Shapiro-Wilk test of normality). Then, I consolidate and illustrate the issues with this approach, primarily using simulations. These issues include statistical errors (i.e., false positives, especially with large samples, and false negatives, especially with small samples), false binarity, limited descriptiveness, misinterpretation (e.g., of a p-value as an effect size), and potential testing failure due to unmet test assumptions. Finally, I synthesize the implications of these issues for statistical diagnostics and provide practical recommendations for improving such diagnostics. Key recommendations include maintaining awareness of the issues with assumption tests (while recognizing they can be useful), using appropriate combinations of diagnostic methods (including visualization and effect sizes) while recognizing their limitations, and distinguishing between testing and checking assumptions. Additional recommendations include judging assumption violations as a complex spectrum (rather than a simplistic binary), using programmatic tools that increase replicability and decrease researcher degrees of freedom, and sharing the material and rationale involved in the diagnostics.
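The core error pattern (false positives with large samples for trivial violations) can be demonstrated in a few lines. A minimal sketch using scipy (the distribution and sample sizes are arbitrary illustrative choices, not taken from the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A t-distribution with 20 degrees of freedom is only trivially non-normal.
for n in (50, 500, 5000):
    x = rng.standard_t(df=20, size=n)
    w, p = stats.shapiro(x)
    print(f"n={n:5d}  W={w:.4f}  p={p:.4f}  excess kurtosis={stats.kurtosis(x):+.3f}")

# Typical outcome: as n grows, the Shapiro-Wilk p-value shrinks and the
# "violation" is flagged, even though the practical deviation from
# normality (an effect-size-like quantity) remains negligible.
```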
3. Quadri GJ, Nieves JA, Wiernik BM, Rosen P. Automatic Scatterplot Design Optimization for Clustering Identification. IEEE Transactions on Visualization and Computer Graphics 2023; 29:4312-4327. PMID: 35816525. DOI: 10.1109/tvcg.2022.3189883.
Abstract
Scatterplots are among the most widely used visualization techniques. Compelling scatterplot visualizations improve understanding of data by leveraging visual perception to boost awareness when performing specific visual analytic tasks. Design choices in scatterplots, such as graphical encodings or data aspects, can directly impact decision-making quality for low-level tasks like clustering. Hence, constructing frameworks that consider both the perception of the visual encodings and the task being performed makes it possible to optimize visualizations for maximum efficacy. In this article, we propose an automatic tool that optimizes the design factors of scatterplots to reveal the most salient cluster structure. Our approach leverages the merge tree data structure to identify the clusters and to optimize the choice of subsampling algorithm, sampling rate, marker size, and marker opacity used to generate a scatterplot image. We validate our approach with user and case studies showing that it efficiently provides high-quality scatterplot designs from a large parameter space.
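While the paper's optimization is driven by merge trees, the general shape of the approach, searching a design space against a quality score, can be conveyed with a deliberately crude toy. Everything below (the overdraw proxy, the candidate grid, the target value) is invented for illustration and is not the authors' method:

```python
import numpy as np

def overdraw(points, marker_px, canvas_px=400):
    """Toy proxy for overplotting: expected marker coverage per pixel."""
    return len(points) * marker_px**2 / canvas_px**2

rng = np.random.default_rng(2)
pts = rng.normal(size=(50_000, 2))

# Grid-search sampling rate and marker size toward a target coverage of ~1.
candidates = [(r, s) for r in (0.01, 0.05, 0.1, 0.5, 1.0) for s in (1, 2, 4, 8)]
best = min(candidates,
           key=lambda c: abs(overdraw(pts[: int(len(pts) * c[0])], c[1]) - 1.0))
print("chosen (sampling rate, marker size):", best)
```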
4. Quadri GJ, Rosen P. A Survey of Perception-Based Visualization Studies by Task. IEEE Transactions on Visualization and Computer Graphics 2022; 28:5026-5048. PMID: 34283717. DOI: 10.1109/tvcg.2021.3098240.
Abstract
Knowledge of human perception has long been incorporated into visualizations to enhance their quality and effectiveness. The last decade, in particular, has shown an increase in perception-based visualization research studies. With all of this recent progress, however, the visualization community lacks a comprehensive guide to contextualize these results. In this report, we provide a systematic and comprehensive review of research studies on perception related to visualization. The survey reviews perception-focused visualization studies since 1980 and summarizes their research developments, focusing on low-level tasks and further breaking techniques down by visual encoding and visualization type. In particular, we focus on how perception is used to evaluate the effectiveness of visualizations, to help readers understand and apply the principles of perception in their visualization designs through a task-optimized approach. We conclude the report with a summary of the weaknesses and open research questions in the area.
5. Kristiansen YS, Garrison L, Bruckner S. Semantic Snapping for Guided Multi-View Visualization Design. IEEE Transactions on Visualization and Computer Graphics 2022; 28:43-53. PMID: 34591769. DOI: 10.1109/tvcg.2021.3114860.
Abstract
Visual information displays are typically composed of multiple visualizations that are used to facilitate an understanding of the underlying data. A common example is the dashboard, frequently used in domains such as finance, process monitoring, and business intelligence. However, users may not be aware of existing guidelines and may lack expert design knowledge when composing such multi-view visualizations. In this paper, we present semantic snapping, an approach to help non-expert users design effective multi-view visualizations from sets of pre-existing views. When a particular view is placed on a canvas, it is "aligned" with the remaining views: not with respect to its geometric layout, but based on aspects of the visual encoding itself, such as how data dimensions are mapped to channels. Our method uses an on-the-fly procedure to detect and suggest resolutions for conflicting, misleading, or ambiguous designs, as well as to provide suggestions for alternative presentations. With this approach, users can be guided to avoid common pitfalls encountered when composing visualizations. Our examples and case studies demonstrate the usefulness and validity of the approach.
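As a rough, hypothetical illustration of the kind of conflict such a procedure might detect (this is invented, not the paper's algorithm), consider flagging a data field that two views map to different visual channels:

```python
from typing import Dict, List, Tuple

def encoding_conflicts(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Report fields that two view specs map to different channels."""
    return [(field, ch_a, ch_b)
            for ch_a, field in a.items()
            for ch_b, other in b.items()
            if field == other and ch_a != ch_b]

view_a = {"x": "month", "y": "sales", "color": "region"}
view_b = {"x": "month", "color": "sales"}
print(encoding_conflicts(view_a, view_b))  # [('sales', 'y', 'color')]
```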
6. Kayongo P, Sun G, Hartline J, Hullman J. Visualization Equilibrium. IEEE Transactions on Visualization and Computer Graphics 2022; 28:465-474. PMID: 34587069. DOI: 10.1109/tvcg.2021.3114842.
Abstract
In many real-world strategic settings, people use information displays to make decisions. In these settings, an information provider chooses which information to provide to strategic agents and how to present it, and agents formulate a best response based on the information and their anticipation of how others will behave. We contribute the results of a controlled online experiment examining how the provision and presentation of information impacts people's decisions in a congestion game. Our experiment compares how different visualization approaches for displaying this information (including bar charts and hypothetical outcome plots) and different information conditions (including whether the visualized information is private or public, i.e., available to all agents) affect decision making and welfare. We characterize the effects of visualization anticipation, referring to changes in behavior when an agent goes from being the only one with access to a visualization to knowing that others also have access to it to guide their decisions. We also empirically identify the visualization equilibrium, i.e., the visualization for which the visualized outcome of agents' decisions matches the realized decisions of the agents who view it. We reflect on the implications of visualization equilibria and visualization anticipation for designing information displays for real-world strategic settings.
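The fixed-point idea can be sketched with a toy two-route congestion game (cost functions and agent counts are invented): agents who see each route's displayed cost drift toward the cheaper route, and the flow settles where the displayed and realized costs agree.

```python
N = 100  # agents choosing between routes A and B

def cost_a(k):          # latency of route A under load k
    return 1.0 + 0.05 * k

def cost_b(k):          # latency of route B under load k
    return 2.0 + 0.02 * k

k_a = N                 # start with everyone on route A
for _ in range(1000):   # one agent per round best-responds to displayed costs
    if cost_a(k_a) > cost_b(N - k_a) and k_a > 0:
        k_a -= 1
    elif cost_a(k_a) < cost_b(N - k_a) and k_a < N:
        k_a += 1

print(f"load on A: {k_a}, costs: {cost_a(k_a):.2f} vs {cost_b(N - k_a):.2f}")
# The flow converges near the point where both routes cost the same, i.e.,
# where the visualized prediction matches the realized behavior.
```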
7. Muller J, Garrison L, Ulbrich P, Schreiber S, Bruckner S, Hauser H, Oeltze-Jafra S. Integrated Dual Analysis of Quantitative and Qualitative High-Dimensional Data. IEEE Transactions on Visualization and Computer Graphics 2021; 27:2953-2966. PMID: 33534707. DOI: 10.1109/tvcg.2021.3056424.
Abstract
The Dual Analysis framework is a powerful enabling technology for the exploration of high-dimensional quantitative data, treating data dimensions as first-class objects that can be explored in tandem with data values. In this article, we extend the Dual Analysis framework through the joint treatment of quantitative (numerical) and qualitative (categorical) dimensions. Computing common measures for all dimensions allows us to visualize both quantitative and qualitative dimensions in the same view, enabling a natural joint treatment of mixed data during interactive visual exploration and analysis. Several measures of variation for nominal qualitative data can also be applied to ordinal qualitative and quantitative data. For example, instead of measuring variability from a mean or median, other measures assess inter-data variation or average variation from a mode. In this work, we demonstrate how these measures can be integrated into the Dual Analysis framework to explore and generate hypotheses about high-dimensional mixed data. A medical case study using clinical routine data of patients suffering from Cerebral Small Vessel Disease (CSVD), conducted with a senior neurologist and a medical student, shows that a joint Dual Analysis approach for quantitative and qualitative data can rapidly lead to new insights, based on which new hypotheses may be generated.
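One standard mode-based measure of this kind is the variation ratio, the share of observations outside the modal category. A minimal sketch (the measure is classical; the paper integrates several such measures, and this snippet is not taken from it):

```python
from collections import Counter

def variation_ratio(values):
    """1 - (modal frequency / n): 0 means no variation; values near 1
    mean the mode barely dominates."""
    counts = Counter(values)
    return 1.0 - max(counts.values()) / len(values)

print(variation_ratio(["a", "a", "a", "b"]))   # 0.25
print(variation_ratio(["a", "b", "c", "d"]))   # 0.75
```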
8. A graph for every analysis: Mapping visuals onto common analyses using flexplot. Behav Res Methods 2021; 53:1876-1894. PMID: 33634423. DOI: 10.3758/s13428-020-01520-2.
Abstract
For decades, statisticians and methodologists have urged researchers to make far heavier use of graphical analysis. Despite cogent and passionate recommendations, there has been no graphical revolution. Instead, researchers rely heavily on misleading graphics that violate visual processing heuristics. Perhaps the main reason for the persistence of deceptive graphics is software: most packages familiar to psychological researchers suffer from poor defaults and limited capabilities. Visualization is also ancillary to statistical analysis in these tools, which creates an incentive not to produce graphics at all. In this paper, we argue that every statistical analysis must have an accompanying graphic, and we introduce the point-and-click software Flexplot, available in both JASP and Jamovi. We then present the theoretical framework that guides Flexplot and show how to perform the most common statistical analyses in the psychological literature.
9. Ondov BD, Yang F, Kay M, Elmqvist N, Franconeri S. Revealing Perceptual Proxies with Adversarial Examples. IEEE Transactions on Visualization and Computer Graphics 2021; 27:1073-1083. PMID: 33095716. DOI: 10.1109/tvcg.2020.3030429.
Abstract
Data visualizations convert numbers into visual marks so that our visual system can extract data from an image instead of raw numbers. Clearly, the visual system does not compute these values as a computer would, as an arithmetic mean or a correlation. Instead, it extracts these patterns using perceptual proxies: heuristic shortcuts over the visual marks, such as a center of mass or a shape envelope. Understanding which proxies people use would lead to more effective visualizations. We present the results of a series of crowdsourced experiments that measure how well a set of candidate proxies can explain human performance when comparing the mean and range of pairs of data series presented as bar charts. We generated datasets in which the correct answer (the series with the larger arithmetic mean or range) was pitted against an "adversarial" series that should be seen as larger if the viewer uses a particular candidate proxy. We used both Bayesian logistic regression models and a robust Bayesian mixed-effects linear model to measure how strongly each adversarial proxy could drive viewers to answer incorrectly, and whether different individuals may use different proxies. Finally, we attempted to construct adversarial datasets from scratch, using an iterative crowdsourcing procedure to perform black-box optimization.
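The adversarial construction is easy to sketch. Below, a hypothetical pair of bar-chart series pits the arithmetic mean against a tallest-bar candidate proxy; the numbers are invented, and the actual studies used a broader proxy set plus crowdsourced optimization:

```python
import numpy as np

rng = np.random.default_rng(3)

# "correct" has the larger arithmetic mean; "adversarial" has a single tall
# decoy bar, so a viewer relying on a max-height proxy would pick it instead.
correct = 6.0 + rng.normal(0.0, 0.2, 10)
adversarial = 4.0 + rng.normal(0.0, 0.2, 10)
adversarial[0] = 10.0

print(f"means: {correct.mean():.2f} vs {adversarial.mean():.2f}")  # ~6.0 vs ~4.6
print(f"maxes: {correct.max():.2f} vs {adversarial.max():.2f}")    # ~6.3 vs 10.0
```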
10. Quadri GJ, Rosen P. Modeling the Influence of Visual Density on Cluster Perception in Scatterplots Using Topology. IEEE Transactions on Visualization and Computer Graphics 2021; 27:1829-1839. PMID: 33048695. DOI: 10.1109/tvcg.2020.3030365.
Abstract
Scatterplots are used for a variety of visual analytics tasks, including cluster identification, and the visual encodings used in a scatterplot play a deciding role in the level of visual separation of clusters. For visualization designers, optimizing the visual encodings is crucial to maximizing the clarity of data. This requires accurately modeling human perception of cluster separation, which remains challenging. We present a multi-stage user study focusing on four factors that influence cluster identification in scatterplots: the distribution size of clusters, the number of points, the size of points, and the opacity of points. From these parameters, we constructed two models, a distance-based model and a density-based model, using the merge tree data structure from Topological Data Analysis. Our analysis demonstrates that these factors play an important role in the number of clusters perceived, and it verifies that the distance-based and density-based models can reasonably estimate the number of clusters a user observes. Finally, we demonstrate how these models can be used to optimize visual encodings on real-world data.
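A distance-based merge tree over point data is closely related to the single-linkage hierarchy, so the flavor of the distance-based model can be approximated with scipy (the data and the threshold are invented; the paper's models are fit to perceptual study data):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
pts = np.vstack([
    rng.normal([0.0, 0.0], 0.3, size=(100, 2)),   # blob 1
    rng.normal([3.0, 3.0], 0.3, size=(100, 2)),   # blob 2
])

# Cut the single-linkage hierarchy (a merge tree over inter-point distances)
# at a distance threshold standing in for perceptual separability.
Z = linkage(pts, method="single")
labels = fcluster(Z, t=0.8, criterion="distance")
print("clusters at threshold 0.8:", labels.max())   # 2
```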
11. Somarakis A, Ijsselsteijn ME, Luk SJ, Kenkhuis B, de Miranda NFCC, Lelieveldt BPF, Hollt T. Visual cohort comparison for spatial single-cell omics-data. IEEE Transactions on Visualization and Computer Graphics 2021; 27:733-743. PMID: 33112747. DOI: 10.1109/tvcg.2020.3030336.
Abstract
Spatially resolved omics data enable researchers to precisely distinguish cell types in tissue and explore their spatial interactions, enabling a deep understanding of tissue functionality. To understand what causes or exacerbates a disease and to identify related biomarkers, clinical researchers regularly perform large-scale cohort studies, which require comparing such data at the cellular level. In such studies, with little a priori knowledge of what to expect in the data, explorative data analysis is a necessity. Here, we present an interactive visual analysis workflow for the comparison of cohorts of spatially resolved omics data. Our workflow allows the comparative analysis of two cohorts at multiple levels of detail, from the simple abundance of contained cell types, through complex co-localization patterns, to the individual comparison of complete tissue images. As a result, the workflow enables the identification of cohort-differentiating features, as well as outlier samples, at any stage of the workflow. During the development of the workflow, we continuously consulted domain experts. To show the effectiveness of the workflow, we conducted multiple case studies with domain experts from different application areas and with different data modalities.
12. Rahman P, Nandi A, Hebert C. Amplifying Domain Expertise in Clinical Data Pipelines. JMIR Med Inform 2020; 8:e19612. PMID: 33151150. PMCID: PMC7677017. DOI: 10.2196/19612.
Abstract
Digitization of health records has allowed the health care domain to adopt data-driven algorithms for decision support. There are multiple people involved in this process: a data engineer who processes and restructures the data, a data scientist who develops statistical models, and a domain expert who informs the design of the data pipeline and consumes its results for decision support. Although there are multiple data interaction tools for data scientists, few exist that allow domain experts to interact with data meaningfully. Designing systems for domain experts requires careful thought because they have different needs and characteristics from other end users. There should be an increased emphasis on optimizing the experts' interaction by directing them to high-impact data tasks and reducing the total task completion time. We refer to this optimization as amplifying domain expertise. Although there is active research in making machine learning models more explainable and usable, it focuses on the final outputs of the model. In the clinical domain, however, expert involvement is needed at every pipeline step: curation, cleaning, and analysis. To this end, we review literature from the database, human-computer interaction, and visualization communities to demonstrate the challenges and solutions at each stage of the data pipeline. Next, we present a taxonomy of expertise amplification that can be applied when building systems for domain experts, comprising summarization, guidance, interaction, and acceleration. Finally, we demonstrate the use of our taxonomy with a case study.
Affiliation(s)
- Arnab Nandi: The Ohio State University, Columbus, OH, United States
13. Lavalle A, Teruel MA, Maté A, Trujillo J. Fostering Sustainability through Visualization Techniques for Real-Time IoT Data: A Case Study Based on Gas Turbines for Electricity Production. Sensors 2020; 20:4556. PMID: 32823870. PMCID: PMC7472268. DOI: 10.3390/s20164556.
Abstract
Improving sustainability is a key concern for industrial development. Industry has recently been benefiting from the rise of IoT technologies, leading to improvements in the monitoring and breakdown prevention of industrial equipment. To properly achieve this monitoring and prevention, visualization techniques are of paramount importance. However, visualizing real-time IoT sensor data has always been challenging, especially when such data originate from sensors of different natures. To tackle this issue, we propose a methodology that aims to help users visually locate and understand the failures that could arise in a production process. This methodology collects, in a guided manner, user goals and the requirements of the production process, analyzes the incoming data from IoT sensors, and automatically derives the most suitable visualization type for each context. This approach helps users identify whether the production process is running as expected, enabling them to make the most sustainable decision in each situation. Finally, to assess the suitability of our proposal, we present a case study based on gas turbines for electricity generation.
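A hypothetical reading of the "derive the most suitable visualization type for each context" step is a rule table keyed on the collected goals and data characteristics; this sketch is invented for illustration and is not the paper's derivation logic:

```python
def suggest_visualization(goal: str, data_kind: str) -> str:
    """Map an (analysis goal, data kind) context to a chart type."""
    rules = {
        ("trend", "time-series"): "line chart",
        ("comparison", "categorical"): "bar chart",
        ("distribution", "numeric"): "histogram",
        ("relationship", "numeric"): "scatterplot",
    }
    return rules.get((goal, data_kind), "data table")  # safe fallback

print(suggest_visualization("trend", "time-series"))  # line chart
```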
Affiliation(s)
- Ana Lavalle (corresponding author), Miguel A. Teruel, Alejandro Maté, Juan Trujillo: Lucentia Research, DLSI, University of Alicante, Carretera San Vicente del Raspeig s/n, 03690 Alicante, Spain; Lucentia Lab, Avda. Pintor Pérez Gil, N-16, 03540 Alicante, Spain
14. Wang Y, Wang Z, Liu T, Correll M, Cheng Z, Deussen O, Sedlmair M. Improving the Robustness of Scagnostics. IEEE Transactions on Visualization and Computer Graphics 2020; 26:759-769. PMID: 31443018. DOI: 10.1109/tvcg.2019.2934796.
Abstract
In this paper, we examine the robustness of scagnostics through a series of theoretical and empirical studies. First, we investigate the sensitivity of scagnostics by applying perturbing operations to more than 60M synthetic and real-world scatterplots. We found that two scagnostic measures, Outlying and Clumpy, are overly sensitive to data binning. To understand how these measures align with human judgments of visual features, we conducted a study with 24 participants, which revealed that (i) humans are not sensitive to small perturbations of the data that cause large changes in both measures, and (ii) the perception of clumpiness heavily depends on per-cluster topologies and structures. Motivated by these results, we propose Robust Scagnostics (RScag), which combines adaptive binning with a hierarchy-based form of scagnostics. An analysis shows that RScag improves on the robustness of the original scagnostics, aligns better with human judgments, and is as fast as the traditional measures.
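The binning sensitivity at issue is easy to reproduce with a generic grid-based summary (this stands in for, and is much cruder than, the hexagonal binning used by the actual Outlying and Clumpy measures):

```python
import numpy as np

rng = np.random.default_rng(5)
pts = rng.normal(size=(1000, 2))

def occupied_cells(points, bins=40):
    """Count non-empty cells of a fixed grid over the data's bounding box."""
    h, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    return int((h > 0).sum())

jitter = rng.normal(scale=1e-3, size=pts.shape)   # visually imperceptible
print(occupied_cells(pts), occupied_cells(pts + jitter))
# The tiny perturbation can shift bin edges and flip points across cell
# borders, changing the summary although the scatterplot looks identical;
# this is the kind of instability that adaptive binning in RScag targets.
```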