1
|
Hografer M, Schulz HJ. Tailorable Sampling for Progressive Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:4809-4824. [PMID: 37204960 DOI: 10.1109/tvcg.2023.3278084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/21/2023]
Abstract
Progressive visual analytics (PVA) allows analysts to maintain their flow during otherwise long-running computations by producing early, incomplete results that refine over time, for example, by running the computation over smaller partitions of the data. These partitions are created using sampling, whose goal it isto draw samples of the dataset such that the progressive visualization becomes as useful as possible as soon as possible. What makes the visualization useful depends on the analysis task and, accordingly, some task-specific sampling methods have been proposed for PVA to address this need. However, as analysts see more and more of their data during the progression, the analysis task at hand often changes, which means that analysts need to restart the computation to switch the sampling method, causing them to lose their analysis flow. This poses a clear limitation to the proposed benefits of PVA. Hence, we propose a pipeline for PVA-sampling that allows tailoring the data partitioning to analysis scenarios by switching out modules in a way that does not require restarting the analysis. To that end, we characterize the problem of PVA-sampling, formalize the pipeline in terms of data structures, discuss on-the-fly tailoring, and present additional examples demonstrating its usefulness.
Collapse
|
2
|
Patil A, Richer G, Jermaine C, Moritz D, Fekete JD. Studying Early Decision Making with Progressive Bar Charts. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:407-417. [PMID: 36166544 DOI: 10.1109/tvcg.2022.3209426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
We conduct a user study to quantify and compare user performance for a value comparison task using four bar chart designs, where the bars show the mean values of data loaded progressively and updated every second (progressive bar charts). Progressive visualization divides different stages of the visualization pipeline-data loading, processing, and visualization-into iterative animated steps to limit the latency when loading large amounts of data. An animated visualization appearing quickly, unfolding, and getting more accurate with time, enables users to make early decisions. However, intermediate mean estimates are computed only on partial data and may not have time to converge to the true means, potentially misleading users and resulting in incorrect decisions. To address this issue, we propose two new designs visualizing the history of values in progressive bar charts, in addition to the use of confidence intervals. We comparatively study four progressive bar chart designs: with/without confidence intervals, and using near-history representation with/without confidence intervals, on three realistic data distributions. We evaluate user performance based on the percentage of correct answers (accuracy), response time, and user confidence. Our results show that, overall, users can make early and accurate decisions with 92% accuracy using only 18% of the data, regardless of the design. We find that our proposed bar chart design with only near-history is comparable to bar charts with only confidence intervals in performance, and the qualitative feedback we received indicates a preference for designs with history.
Collapse
|
3
|
Li J, Sun Y, Lei Z, Chen S, Andrienko G, Andrienko N, Chen W. A hybrid prediction and search approach for flexible and efficient exploration of big data. J Vis (Tokyo) 2022. [DOI: 10.1007/s12650-022-00887-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/10/2022]
|
4
|
Procopio M, Mosca A, Scheidegger C, Wu E, Chang R. Impact of Cognitive Biases on Progressive Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:3093-3112. [PMID: 33434132 DOI: 10.1109/tvcg.2021.3051013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Progressive visualization is fast becoming a technique in the visualization community to help users interact with large amounts of data. With progressive visualization, users can examine intermediate results of complex or long running computations, without waiting for the computation to complete. While this has shown to be beneficial to users, recent research has identified potential risks. For example, users may misjudge the uncertainty in the intermediate results and draw incorrect conclusions or see patterns that are not present in the final results. In this article, we conduct a comprehensive set of studies to quantify the advantages and limitations of progressive visualization. Based on a recent report by Micallef et al., we examine four types of cognitive biases that can occur with progressive visualization: uncertainty bias, illusion bias, control bias, and anchoring bias. The results of the studies suggest a cautious but promising use of progressive visualization - while there can be significant savings in task completion time, accuracy can be negatively affected in certain conditions. These findings confirm earlier reports of the benefits and drawbacks of progressive visualization and that continued research into mitigating the effects of cognitive biases is necessary.
Collapse
|
5
|
Hogräfer M, Angelini M, Santucci G, Schulz HJ. Steering-by-Example for Progressive Visual Analytics. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3531229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Progressive visual analytics allows users to interact with early, partial results of long-running computations on large datasets. In this context, computational steering is often brought up as a means to prioritize the progressive computation. This is meant to focus computational resources on data subspaces of interest, so as to ensure their computation is completed before all others. Yet, current approaches to select a region of the view space and then to prioritize its corresponding data subspace either require a 1-to-1 mapping between view and data space, or they need to establish and maintain computationally costly index structures to trace complex mappings between view and data space. We present steering-by-example, a novel interactive steering approach for progressive visual analytics, which allows prioritizing data subspaces for the progression by generating a relaxed query from a set of selected data items. Our approach works independently of the particular visualization technique and without additional index structures. First benchmark results show that steering-by-example considerably improves Precision and Recall for prioritizing unprocessed data for a selected view region, clearly outperforming random uniform sampling.
Collapse
|
6
|
Chen X, Zhang J, Fu CW, Fekete JD, Wang Y. Pyramid-based Scatterplots Sampling for Progressive and Streaming Data Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:593-603. [PMID: 34587089 DOI: 10.1109/tvcg.2021.3114880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
We present a pyramid-based scatterplot sampling technique to avoid overplotting and enable progressive and streaming visualization of large data. Our technique is based on a multiresolution pyramid-based decomposition of the underlying density map and makes use of the density values in the pyramid to guide the sampling at each scale for preserving the relative data densities and outliers. We show that our technique is competitive in quality with state-of-the-art methods and runs faster by about an order of magnitude. Also, we have adapted it to deliver progressive and streaming data visualization by processing the data in chunks and updating the scatterplot areas with visible changes in the density map. A quantitative evaluation shows that our approach generates stable and faithful progressive samples that are comparable to the state-of-the-art method in preserving relative densities and superior to it in keeping outliers and stability when switching frames. We present two case studies that demonstrate the effectiveness of our approach for exploring large data.
Collapse
|
7
|
Jo J, LrYi S, Lee B, Seo J. ProReveal: Progressive Visual Analytics With Safeguards. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3109-3122. [PMID: 31880556 DOI: 10.1109/tvcg.2019.2962404] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We present a new visual exploration concept-Progressive Visual Analytics with Safeguards-that helps people manage the uncertainty arising from progressive data exploration. Despite its potential benefits, intermediate knowledge from progressive analytics can be incorrect due to various machine and human factors, such as a sampling bias or misinterpretation of uncertainty. To alleviate this problem, we introduce PVA-Guards, safeguards people can leave on uncertain intermediate knowledge that needs to be verified, and derive seven PVA-Guards based on previous visualization task taxonomies. PVA-Guards provide a means of ensuring the correctness of the conclusion and understanding the reason when intermediate knowledge becomes invalid. We also present ProReveal, a proof-of-concept system designed and developed to integrate the seven safeguards into progressive data exploration. Finally, we report a user study with 14 participants, which shows people voluntarily employed PVA-Guards to safeguard their findings and ProReveal's PVA-Guard view provides an overview of uncertain intermediate knowledge. We believe our new concept can also offer better consistency in progressive data exploration, alleviating people's heterogeneous interpretation of uncertainty.
Collapse
|
8
|
Vidal J, Guillou P, Tierny J. A Progressive Approach to Scalar Field Topology. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:2833-2850. [PMID: 33606631 DOI: 10.1109/tvcg.2021.3060500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
This article introduces progressive algorithms for the topological analysis of scalar data. Our approach is based on a hierarchical representation of the input data and the fast identification of topologically invariant vertices, which are vertices that have no impact on the topological description of the data and for which we show that no computation is required as they are introduced in the hierarchy. This enables the definition of efficient coarse-to-fine topological algorithms, which leverage fast update mechanisms for ordinary vertices and avoid computation for the topologically invariant ones. We demonstrate our approach with two examples of topological algorithms (critical point extraction and persistence diagram computation), which generate interpretable outputs upon interruption requests and which progressively refine them otherwise. Experiments on real-life datasets illustrate that our progressive strategy, in addition to the continuous visual feedback it provides, even improves run time performance with regard to non-progressive algorithms and we describe further accelerations with shared-memory parallelism. We illustrate the utility of our approach in batch-mode and interactive setups, where it respectively enables the control of the execution time of complete topological pipelines as well as previews of the topological features found in a dataset, with progressive updates delivered within interactive times.
Collapse
|
9
|
Battle L, Scheidegger C. A Structured Review of Data Management Technology for Interactive Visualization and Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1128-1138. [PMID: 33031039 DOI: 10.1109/tvcg.2020.3028891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
In the last two decades, interactive visualization and analysis have become a central tool in data-driven decision making. Concurrently to the contributions in data visualization, research in data management has produced technology that directly benefits interactive analysis. Here, we contribute a systematic review of 30 years of work in this adjacent field, and highlight techniques and principles we believe to be underappreciated in visualization work. We structure our review along two axes. First, we use task taxonomies from the visualization literature to structure the space of interactions in usual systems. Second, we created a categorization of data management work that strikes a balance between specificity and generality. Concretely, we contribute a characterization of 131 research papers along these two axes. We find that five notions in data management venues fit interactive visualization systems well: materialized views, approximate query processing, user modeling and query prediction, muiti-query optimization, lineage techniques, and indexing techniques. In addition, we find a preponderance of work in materialized views and approximate query processing, most targeting a limited subset of the interaction tasks in the taxonomy we used. This suggests natural avenues of future research both in visualization and data management. Our categorization both changes how we visualization researchers design and build our systems, and highlights where future work is necessary.
Collapse
|
10
|
Bikakis N, Maroulis S, Papastefanatos G, Vassiliadis P. In-situ visual exploration over big raw data. INFORM SYST 2021. [DOI: 10.1016/j.is.2020.101616] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
11
|
Jo J, Seo J, Fekete JD. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1347-1360. [PMID: 30222575 DOI: 10.1109/tvcg.2018.2869149] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
We present PANENE, a progressive algorithm for approximate nearest neighbor indexing and querying. Although the use of k-nearest neighbor (KNN) libraries is common in many data analysis methods, most KNN algorithms can only be queried when the whole dataset has been indexed, i.e., they are not online. Even the few online implementations are not progressive in the sense that the time to index incoming data is not bounded and cannot satisfy the latency requirements of progressive systems. This long latency has significantly limited the use of many machine learning methods, such as t-SNE, in interactive visual analytics. PANENE is a novel algorithm for Progressive Approximate k-NEarest NEighbors, enabling fast KNN queries while continuously indexing new batches of data. Following the progressive computation paradigm, PANENE operations can be bounded in time, allowing analysts to access running results within an interactive latency. PANENE can also incrementally build and maintain a cache data structure, a KNN lookup table, to enable constant-time lookups for KNN queries. Finally, we present three progressive applications of PANENE, such as regression, density estimation, and responsive t-SNE, opening up new opportunities to use complex algorithms in interactive systems.
Collapse
|
12
|
Mei H, Chen W, Wei Y, Hu Y, Zhou S, Lin B, Zhao Y, Xia J. RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1161-1171. [PMID: 31443022 DOI: 10.1109/tvcg.2019.2934800] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries are implicitly executed through such a process. Datasets are constantly extremely large; thus, the response time should be accelerated by calculating predefined data cubes. However, the queries are limited to the predefined binning schema of preprocessed data cubes. Such limitation hinders analysts' flexible adjustment of visual specifications to investigate the implicit patterns in the data effectively. Particularly, RSATree enables arbitrary queries and flexible binning strategies by leveraging three schemes, namely, an R-tree-based space partitioning scheme to catch the data distribution, a locality-sensitive hashing technique to achieve locality-preserving random access to data items, and a summed area table scheme to support interactive query of aggregated values with a linear computational complexity. This study presents and implements a web-based visual query system that supports visual specification, query, and exploration of large-scale tabular data with user-adjustable granularities. We demonstrate the efficiency and utility of our approach by performing various experiments on real-world datasets and analyzing time and space complexity.
Collapse
|
13
|
Li JK, Ma KL. P5: Portable Progressive Parallel Processing Pipelines for Interactive Data Analysis and Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1151-1160. [PMID: 31442985 DOI: 10.1109/tvcg.2019.2934537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We present P5, a web-based visualization toolkit that combines declarative visualization grammar and GPU computing for progressive data analysis and visualization. To interactively analyze and explore big data, progressive analytics and visualization methods have recently emerged. Progressive visualizations of incrementally refining results have the advantages of allowing users to steer the analysis process and make early decisions. P5 leverages declarative grammar for specifying visualization designs and exploits GPU computing to accelerate progressive data processing and rendering. The declarative specifications can be modified during progressive processing to create different visualizations for analyzing the intermediate results. To enable user interactions for progressive data analysis, P5 utilizes the GPU to automatically aggregate and index data based on declarative interaction specifications to facilitate effective interactive visualization. We demonstrate the effectiveness and usefulness of P5 through a variety of example applications and several performance benchmark tests.
Collapse
|
14
|
Battle L, Crouser RJ, Nakeshimana A, Montoly A, Chang R, Stonebraker M. The Role of Latency and Task Complexity in Predicting Visual Search Behavior. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1246-1255. [PMID: 31442990 DOI: 10.1109/tvcg.2019.2934556] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Latency in a visualization system is widely believed to affect user behavior in measurable ways, such as requiring the user to wait for the visualization system to respond, leading to interruption of the analytic flow. While this effect is frequently observed and widely accepted, precisely how latency affects different analysis scenarios is less well understood. In this paper, we examine the role of latency in the context of visual search, an essential task in data foraging and exploration using visualization. We conduct a series of studies on Amazon Mechanical Turk and find that under certain conditions, latency is a statistically significant predictor of visual search behavior, which is consistent with previous studies. However, our results also suggest that task type, task complexity, and other factors can modulate the effect of latency, in some cases rendering latency statistically insignificant in predicting user behavior. This suggests a more nuanced view of the role of latency than previously reported. Building on these results and the findings of prior studies, we propose design guidelines for measuring and interpreting the effects of latency when evaluating performance on visual search tasks.
Collapse
|
15
|
Behrisch M, Streeb D, Stoffel F, Seebacher D, Matejek B, Weber SH, Mittelstadt S, Pfister H, Keim D. Commercial Visual Analytics Systems-Advances in the Big Data Analytics Field. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:3011-3031. [PMID: 30059307 DOI: 10.1109/tvcg.2018.2859973] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Five years after the first state-of-the-art report on Commercial Visual Analytics Systems we present a reevaluation of the Big Data Analytics field. We build on the success of the 2012 survey, which was influential even beyond the boundaries of the InfoVis and Visual Analytics (VA) community. While the field has matured significantly since the original survey, we find that innovation and research-driven development are increasingly sacrificed to satisfy a wide range of user groups. We evaluate new product versions on established evaluation criteria, such as available features, performance, and usability, to extend on and assure comparability with the previous survey. We also investigate previously unavailable products to paint a more complete picture of the commercial VA landscape. Furthermore, we introduce novel measures, like suitability for specific user groups and the ability to handle complex data types, and undertake a new case study to highlight innovative features. We explore the achievements in the commercial sector in addressing VA challenges and propose novel developments that should be on systems' roadmaps in the coming years.
Collapse
|
16
|
Servan-Schreiber S, Riondato M, Zgraggen E. ProSecCo: progressive sequence mining with convergence guarantees. Knowl Inf Syst 2019. [DOI: 10.1007/s10115-019-01393-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
17
|
Exploring Visual Analytics to Measure Reliability for IoT Oriented Pollution Detection Software Perspectives. INTERNATIONAL JOURNAL OF DISTRIBUTED SYSTEMS AND TECHNOLOGIES 2019. [DOI: 10.4018/ijdst.2019040101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
The measurement of the reliability of such IoT based application requires an embedded analysis. The parameters are the number of imprecise or faulty measures as well as the identification of core modules. This article investigates that how far visual introspection can assist in troubleshooting of IoT-based software bugs. This specific requirement improvises a new idea, where the shape of the plots with actual data can indicate the cause of the error and further they can be patched if the software repairing strategies are implemented adjudging the visual analytics. It is quite indifferent to analyze faults for existing applications as a variation of topological and practicing parameters which takes substantial numbers of iterations and observations. Categorically, the present use-case establishes the fact to analyze and infer concerning the shape of the visual plots derived from embedded modules.
Collapse
|
18
|
Abstract
Progressive visualization offers a great deal of promise for big data visualization; however, current progressive visualization systems do not allow for continuous interaction. What if users want to see more confident results on a subset of the visualization? This can happen when users are in exploratory analysis mode but want to ask some directed questions of the data as well. In a progressive visualization system, the online aggregation algorithm determines the database sampling rate and resulting convergence rate, not the user. In this paper, we extend a recent method in online aggregation, called Wander Join, that is optimized for queries that join tables, one of the most computationally expensive operations. This extension leverages importance sampling to enable user-driven sampling when data joins are in the query. We applied user interaction techniques that allow the user to view and adjust the convergence rate, providing more transparency and control over the online aggregation process. By leveraging importance sampling, our extension of Wander Join also allows for stratified sampling of groups when there is data distribution skew. We also improve the convergence rate of filtering queries, but with additional overhead costs not needed in the original Wander Join algorithm.
Collapse
|
19
|
Liu R, Chen S, Ji G, Zhao B, Li Q, Su M. Interactive stratigraphic structure visualization for seismic data. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2018.07.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
20
|
|