1
|
Hografer M, Schulz HJ. Tailorable Sampling for Progressive Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:4809-4824. [PMID: 37204960 DOI: 10.1109/tvcg.2023.3278084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Progressive visual analytics (PVA) allows analysts to maintain their flow during otherwise long-running computations by producing early, incomplete results that refine over time, for example, by running the computation over smaller partitions of the data. These partitions are created using sampling, whose goal is to draw samples of the dataset such that the progressive visualization becomes as useful as possible as soon as possible. What makes the visualization useful depends on the analysis task and, accordingly, some task-specific sampling methods have been proposed for PVA to address this need. However, as analysts see more and more of their data during the progression, the analysis task at hand often changes, which means that analysts need to restart the computation to switch the sampling method, causing them to lose their analysis flow. This poses a clear limitation to the proposed benefits of PVA. Hence, we propose a pipeline for PVA-sampling that allows tailoring the data partitioning to analysis scenarios by switching out modules in a way that does not require restarting the analysis. To that end, we characterize the problem of PVA-sampling, formalize the pipeline in terms of data structures, discuss on-the-fly tailoring, and present additional examples demonstrating its usefulness.
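The general idea of refining a result over sampled partitions can be illustrated with a minimal sketch. It assumes plain uniform random sampling without replacement and a running mean as the computation; the function name `progressive_mean` and its parameters are hypothetical and are not taken from the paper, whose pipeline supports task-specific, switchable sampling modules.

```python
import random


def progressive_mean(values, chunk_size, seed=0):
    """Yield (rows_seen, running_mean) after each sampled partition.

    Assumption: partitions come from a uniform random shuffle, which is
    only one of many possible sampling strategies.
    """
    order = list(range(len(values)))
    random.Random(seed).shuffle(order)  # random order = sampling without replacement
    total = 0.0
    seen = 0
    for start in range(0, len(order), chunk_size):
        chunk = order[start:start + chunk_size]
        total += sum(values[i] for i in chunk)
        seen += len(chunk)
        # An early, incomplete result that is refined by every later chunk.
        yield seen, total / seen


data = [float(x) for x in range(1, 101)]
estimates = list(progressive_mean(data, chunk_size=10))
```

Each yielded pair could drive one update of a progressive view; the final pair covers the whole dataset and therefore equals the exact mean.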
|
2
|
Dennig FL, Miller M, Keim DA, El-Assady M. FS/DS: A Theoretical Framework for the Dual Analysis of Feature Space and Data Space. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:5165-5182. [PMID: 37342951 DOI: 10.1109/tvcg.2023.3288356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2023]
Abstract
With the surge of data-driven analysis techniques, there is a rising demand for enhancing the exploration of large high-dimensional data by enabling interactions for the joint analysis of features (i.e., dimensions). Such a dual analysis of the feature space and data space is characterized by three components: 1) a view visualizing feature summaries, 2) a view that visualizes the data records, and 3) a bidirectional linking of both plots triggered by human interaction in either visualization, e.g., Linking & Brushing. Dual analysis approaches span many domains, e.g., medicine, crime analysis, and biology. The proposed solutions encapsulate various techniques, such as feature selection or statistical analysis. However, each approach establishes a new definition of dual analysis. To address this gap, we systematically reviewed published dual analysis methods to investigate and formalize the key elements, such as the techniques used to visualize the feature space and data space, as well as the interaction between both spaces. From the information elicited during our review, we propose a unified theoretical framework for dual analysis, encompassing all existing approaches extending the field. We apply our proposed formalization describing the interactions between each component and relate them to the addressed tasks. Additionally, we categorize the existing approaches using our framework and derive future research directions to advance dual analysis by including state-of-the-art visual analysis techniques to improve data exploration.
|
3
|
Representation and analysis of time-series data via deep embedding and visual exploration. J Vis (Tokyo) 2022. [DOI: 10.1007/s12650-022-00890-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
|
4
|
Procopio M, Mosca A, Scheidegger C, Wu E, Chang R. Impact of Cognitive Biases on Progressive Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:3093-3112. [PMID: 33434132 DOI: 10.1109/tvcg.2021.3051013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Progressive visualization is fast becoming a technique in the visualization community to help users interact with large amounts of data. With progressive visualization, users can examine intermediate results of complex or long-running computations without waiting for the computation to complete. While this has been shown to be beneficial to users, recent research has identified potential risks. For example, users may misjudge the uncertainty in the intermediate results and draw incorrect conclusions or see patterns that are not present in the final results. In this article, we conduct a comprehensive set of studies to quantify the advantages and limitations of progressive visualization. Based on a recent report by Micallef et al., we examine four types of cognitive biases that can occur with progressive visualization: uncertainty bias, illusion bias, control bias, and anchoring bias. The results of the studies suggest a cautious but promising use of progressive visualization: while there can be significant savings in task completion time, accuracy can be negatively affected in certain conditions. These findings confirm earlier reports of the benefits and drawbacks of progressive visualization and that continued research into mitigating the effects of cognitive biases is necessary.
|
5
|
Fujiwara T, Sakamoto N, Nonaka J, Ma KL. A Visual Analytics Approach for Hardware System Monitoring with Streaming Functional Data Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:2338-2349. [PMID: 35394909 DOI: 10.1109/tvcg.2022.3165348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Many real-world applications involve analyzing time-dependent phenomena, which are intrinsically functional, consisting of curves varying over a continuum (e.g., time). When analyzing continuous data, functional data analysis (FDA) provides substantial benefits, such as the ability to study the derivatives and to restrict the ordering of data. However, continuous data inherently has infinite dimensions, and for a long time series, FDA methods often suffer from high computational costs. The analysis problem becomes even more challenging when updating the FDA results for continuously arriving data. In this paper, we present a visual analytics approach for monitoring and reviewing time series data streamed from a hardware system with a focus on identifying outliers by using FDA. To perform FDA while addressing the computational problem, we introduce new incremental and progressive algorithms that promptly generate the magnitude-shape (MS) plot, which conveys both the functional magnitude and shape outlyingness of time series data. In addition, by using an MS plot in conjunction with an FDA version of principal component analysis, we enhance the analyst's ability to investigate the visually-identified outliers. We illustrate the effectiveness of our approach with two use scenarios using real-world datasets. The resulting tool is evaluated by industry experts using real-world streaming datasets.
|
6
|
Abstract
We present a comprehensive, detailed review of time-series data analysis, with emphasis on deep time-series clustering (DTSC), and a case study in the context of movement behavior clustering utilizing the deep clustering method. Specifically, we modified the DCAE architectures to suit time-series data at the time of our prior deep clustering work. Lately, several works have been carried out on deep clustering of time-series data. We also review these works, identify the state of the art, and present an outlook on this important field of DTSC from five important perspectives.
|
7
|
|
8
|
Jo J, L'Yi S, Lee B, Seo J. ProReveal: Progressive Visual Analytics With Safeguards. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3109-3122. [PMID: 31880556 DOI: 10.1109/tvcg.2019.2962404] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
We present a new visual exploration concept, Progressive Visual Analytics with Safeguards, that helps people manage the uncertainty arising from progressive data exploration. Despite its potential benefits, intermediate knowledge from progressive analytics can be incorrect due to various machine and human factors, such as a sampling bias or misinterpretation of uncertainty. To alleviate this problem, we introduce PVA-Guards, safeguards people can leave on uncertain intermediate knowledge that needs to be verified, and derive seven PVA-Guards based on previous visualization task taxonomies. PVA-Guards provide a means of ensuring the correctness of the conclusion and understanding the reason when intermediate knowledge becomes invalid. We also present ProReveal, a proof-of-concept system designed and developed to integrate the seven safeguards into progressive data exploration. Finally, we report a user study with 14 participants, which shows people voluntarily employed PVA-Guards to safeguard their findings and ProReveal's PVA-Guard view provides an overview of uncertain intermediate knowledge. We believe our new concept can also offer better consistency in progressive data exploration, alleviating people's heterogeneous interpretation of uncertainty.
|
9
|
Fujiwara T, Sakamoto N, Nonaka J, Yamamoto K, Ma KL. A Visual Analytics Framework for Reviewing Multivariate Time-Series Data with Dimensionality Reduction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1601-1611. [PMID: 33026990 DOI: 10.1109/tvcg.2020.3028889] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/11/2023]
Abstract
Data-driven problem solving in many real-world applications involves analysis of time-dependent multivariate data, for which dimensionality reduction (DR) methods are often used to uncover the intrinsic structure and features of the data. However, DR is usually applied to a subset of data that is either single-time-point multivariate or univariate time-series, resulting in the need to manually examine and correlate the DR results out of different data subsets. When the number of dimensions is large either in terms of the number of time points or attributes, this manual task becomes too tedious and infeasible. In this paper, we present MulTiDR, a new DR framework that enables processing of time-dependent multivariate data as a whole to provide a comprehensive overview of the data. With the framework, we employ DR in two steps. When treating the instances, time points, and attributes of the data as a 3D array, the first DR step reduces the three axes of the array to two, and the second DR step visualizes the data in a lower-dimensional space. In addition, by coupling with a contrastive learning method and interactive visualizations, our framework enhances analysts' ability to interpret DR results. We demonstrate the effectiveness of our framework with four case studies using real-world datasets.
|
10
|
Zhang Y, Yu C, Wang R, Liu X. Visual dimension analysis based on dimension subdivision. J Vis (Tokyo) 2020. [DOI: 10.1007/s12650-020-00694-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
11
|
Jo J, Seo J, Fekete JD. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1347-1360. [PMID: 30222575 DOI: 10.1109/tvcg.2018.2869149] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
We present PANENE, a progressive algorithm for approximate nearest neighbor indexing and querying. Although the use of k-nearest neighbor (KNN) libraries is common in many data analysis methods, most KNN algorithms can only be queried when the whole dataset has been indexed, i.e., they are not online. Even the few online implementations are not progressive in the sense that the time to index incoming data is not bounded and cannot satisfy the latency requirements of progressive systems. This long latency has significantly limited the use of many machine learning methods, such as t-SNE, in interactive visual analytics. PANENE is a novel algorithm for Progressive Approximate k-NEarest NEighbors, enabling fast KNN queries while continuously indexing new batches of data. Following the progressive computation paradigm, PANENE operations can be bounded in time, allowing analysts to access running results within an interactive latency. PANENE can also incrementally build and maintain a cache data structure, a KNN lookup table, to enable constant-time lookups for KNN queries. Finally, we present three progressive applications of PANENE, such as regression, density estimation, and responsive t-SNE, opening up new opportunities to use complex algorithms in interactive systems.
|
12
|
Li JK, Ma KL. P5: Portable Progressive Parallel Processing Pipelines for Interactive Data Analysis and Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1151-1160. [PMID: 31442985 DOI: 10.1109/tvcg.2019.2934537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
We present P5, a web-based visualization toolkit that combines declarative visualization grammar and GPU computing for progressive data analysis and visualization. To interactively analyze and explore big data, progressive analytics and visualization methods have recently emerged. Progressive visualizations of incrementally refining results have the advantages of allowing users to steer the analysis process and make early decisions. P5 leverages declarative grammar for specifying visualization designs and exploits GPU computing to accelerate progressive data processing and rendering. The declarative specifications can be modified during progressive processing to create different visualizations for analyzing the intermediate results. To enable user interactions for progressive data analysis, P5 utilizes the GPU to automatically aggregate and index data based on declarative interaction specifications to facilitate effective interactive visualization. We demonstrate the effectiveness and usefulness of P5 through a variety of example applications and several performance benchmark tests.
|
13
|
Fujiwara T, Chou JK, Xu P, Ren L, Ma KL. An Incremental Dimensionality Reduction Method for Visualizing Streaming Multidimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:418-428. [PMID: 31449024 DOI: 10.1109/tvcg.2019.2934433] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Dimensionality reduction (DR) methods are commonly used for analyzing and visualizing multidimensional data. However, when data is a live streaming feed, conventional DR methods cannot be directly used because of their computational complexity and inability to preserve the projected data positions at previous time points. In addition, the problem becomes even more challenging when the dynamic data records have a varying number of dimensions as often found in real-world applications. This paper presents an incremental DR solution. We enhance an existing incremental PCA method in several ways to ensure its usability for visualizing streaming multidimensional data. First, we use geometric transformation and animation methods to help preserve a viewer's mental map when visualizing the incremental results. Second, to handle data dimension variants, we use an optimization method to estimate the projected data positions, and also convey the resulting uncertainty in the visualization. We demonstrate the effectiveness of our design with two case studies using real-world datasets.
|
14
|
Victorelli EZ, Dos Reis JC, Santos AAS, Schiozer DJ. A Design Process Integrating Human-Data Interaction Guidelines and Semio-Participatory Design. ENTERP INF SYST-UK 2020. [DOI: 10.1007/978-3-030-40783-4_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
|
15
|
Abstract
Progressive visualization offers a great deal of promise for big data visualization; however, current progressive visualization systems do not allow for continuous interaction. What if users want to see more confident results on a subset of the visualization? This can happen when users are in exploratory analysis mode but want to ask some directed questions of the data as well. In a progressive visualization system, the online aggregation algorithm determines the database sampling rate and resulting convergence rate, not the user. In this paper, we extend a recent method in online aggregation, called Wander Join, that is optimized for queries that join tables, one of the most computationally expensive operations. This extension leverages importance sampling to enable user-driven sampling when data joins are in the query. We applied user interaction techniques that allow the user to view and adjust the convergence rate, providing more transparency and control over the online aggregation process. By leveraging importance sampling, our extension of Wander Join also allows for stratified sampling of groups when there is data distribution skew. We also improve the convergence rate of filtering queries, but with additional overhead costs not needed in the original Wander Join algorithm.
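The convergence behavior that online aggregation exposes to the user can be sketched as a running estimate with a confidence interval. This is a minimal illustration assuming uniform random samples from a single table and a normal approximation; it is not Wander Join and does not include the importance-sampling extension described above. The function name `online_mean_with_ci` and the parameter `z` are hypothetical.

```python
import math
import random


def online_mean_with_ci(stream, z=1.96):
    """Yield (n, running_mean, ci_half_width) after each sampled value.

    Uses Welford's update for mean and variance; the half-width is the
    normal-approximation interval z * s / sqrt(n).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            half_width = z * math.sqrt(m2 / (n - 1) / n)
        else:
            half_width = float("inf")  # no variance estimate from one sample
        yield n, mean, half_width


rng = random.Random(42)
samples = [rng.gauss(10.0, 2.0) for _ in range(2000)]
results = list(online_mean_with_ci(samples))
```

The shrinking half-width is the kind of quantity a progressive interface could display so that users can judge how far an aggregate has converged before acting on it.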
|
16
|
Krokos E, Cheng HC, Chang J, Nebesh B, Paul CL, Whitley K, Varshney A. Enhancing Deep Learning with Visual Interactions. ACM T INTERACT INTEL 2019. [DOI: 10.1145/3150977] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Abstract
Deep learning has emerged as a powerful tool for feature-driven labeling of datasets. However, for it to be effective, it requires a large and finely labeled training dataset. Precisely labeling a large training dataset is expensive, time-consuming, and error prone. In this article, we present a visually driven deep-learning approach that starts with a coarsely labeled training dataset and iteratively refines the labeling through intuitive interactions that leverage the latent structures of the dataset. Our approach can be used to (a) alleviate the burden of intensive manual labeling that captures the fine nuances in a high-dimensional dataset by simple visual interactions, (b) replace a complicated (and therefore difficult to design) labeling algorithm by a simpler (but coarse) labeling algorithm supplemented by user interaction to refine the labeling, or (c) use low-dimensional features (such as the RGB colors) for coarse labeling and turn to higher-dimensional latent structures that are progressively revealed by deep learning, for fine labeling. We validate our approach through use cases on three high-dimensional datasets and a user study.
|
17
|
|
18
|
|
19
|
Li M, Bao Z, Sellis T, Yan S, Zhang R. HomeSeeker: A visual analytics system of real estate data. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2018.02.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
|
20
|
Hollt T, Pezzotti N, van Unen V, Koning F, Lelieveldt BPF, Vilanova A. CyteGuide: Visual Guidance for Hierarchical Single-Cell Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:739-748. [PMID: 28866537 DOI: 10.1109/tvcg.2017.2744318] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/07/2023]
Abstract
Single-cell analysis through mass cytometry has become an increasingly important tool for immunologists to study the immune system in health and disease. Mass cytometry creates a high-dimensional description vector for single cells by time-of-flight measurement. Recently, t-Distributed Stochastic Neighborhood Embedding (t-SNE) has emerged as one of the state-of-the-art techniques for the visualization and exploration of single-cell data. Ever increasing amounts of data lead to the adoption of Hierarchical Stochastic Neighborhood Embedding (HSNE), enabling the hierarchical representation of the data. Here, the hierarchy is explored selectively by the analyst, who can request more and more detail in areas of interest. Such hierarchies are usually explored by visualizing disconnected plots of selections in different levels of the hierarchy. This poses problems for navigation, by imposing a high cognitive load on the analyst. In this work, we present an interactive summary-visualization to tackle this problem. CyteGuide guides the analyst through the exploration of hierarchically represented single-cell data, and provides a complete overview of the current state of the analysis. We conducted a two-phase user study with domain experts that use HSNE for data exploration. We first studied their problems with their current workflow using HSNE and the requirements to ease this workflow in a field study. These requirements have been the basis for our visual design. In the second phase, we verified our proposed solution in a user evaluation.
|
21
|
Pezzotti N, Hollt T, Van Gemert J, Lelieveldt BPF, Eisemann E, Vilanova A. DeepEyes: Progressive Visual Analytics for Designing Deep Neural Networks. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:98-108. [PMID: 28866543 DOI: 10.1109/tvcg.2017.2744358] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Indexed: 06/07/2023]
Abstract
Deep neural networks are now rivaling human accuracy in several pattern recognition problems. Compared to traditional classifiers, where features are handcrafted, neural networks learn increasingly complex features directly from the data. Instead of handcrafting the features, it is now the network architecture that is manually engineered. The network architecture parameters such as the number of layers or the number of filters per layer and their interconnections are essential for good performance. Even though basic design guidelines exist, designing a neural network is an iterative trial-and-error process that takes days or even weeks to perform due to the large datasets used for training. In this paper, we present DeepEyes, a Progressive Visual Analytics system that supports the design of neural networks during training. We present novel visualizations, supporting the identification of layers that learned a stable set of patterns and, therefore, are of interest for a detailed analysis. The system facilitates the identification of problems, such as superfluous filters or layers, and information that is not being captured by the network. We demonstrate the effectiveness of our system through multiple use cases, showing how a trained network can be compressed, reshaped and adapted to different problems.
|
22
|
What you see is what you can change: Human-centered machine learning by interactive visualization. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.105] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Indexed: 11/24/2022]
|
23
|
|