1
Bernard J, Barth CM, Cuba E, Meier A, Peiris Y, Shneiderman B. IVESA - Visual Analysis of Time-Stamped Event Sequences. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:2235-2256. [PMID: 38587948 DOI: 10.1109/tvcg.2024.3382760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
Time-stamped event sequences (TSEQs) are time-oriented data without value information, shifting the focus of users to the exploration of temporal event occurrences. TSEQs exist in application domains, such as sleeping behavior, earthquake aftershocks, and stock market crashes. Domain experts face four challenges, for which they could use interactive and visual data analysis methods. First, TSEQs can be large with respect to both the number of sequences and events, often leading to millions of events. Second, domain experts need validated metrics and features to identify interesting patterns. Third, after identifying interesting patterns, domain experts contextualize the patterns to foster sensemaking. Finally, domain experts seek to reduce data complexity by data simplification and machine learning support. We present IVESA, a visual analytics approach for TSEQs. It supports the analysis of TSEQs at the granularities of sequences and events, supported with metrics and feature analysis tools. IVESA has multiple linked views that support overview, sort+filter, comparison, details-on-demand, and metadata relation-seeking tasks, as well as data simplification through feature analysis, interactive clustering, filtering, and motif detection and simplification. We evaluated IVESA with three case studies and a user study with six domain experts working with six different datasets and applications. Results demonstrate the usability and generalizability of IVESA across applications and cases that had up to 1,000,000 events.
2
Rave H, Molchanov V, Linsen L. De-Cluttering Scatterplots With Integral Images. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:2114-2126. [PMID: 38526894 DOI: 10.1109/tvcg.2024.3381453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2024]
Abstract
Scatterplots provide a visual representation of bivariate data (or 2D embeddings of multivariate data) that allows for effective analyses of data dependencies, clusters, trends, and outliers. Unfortunately, classical scatterplots suffer from scalability issues, since growing data sizes eventually lead to overplotting and visual clutter on a screen with a fixed resolution, which hinders the data analysis process. We propose an algorithm that compensates for irregular sample distributions by a smooth transformation of the scatterplot's visual domain. Our algorithm evaluates the scatterplot's density distribution to compute a regularization mapping based on integral images of the rasterized density function. The mapping preserves the samples' neighborhood relations. Few regularization iterations suffice to achieve a nearly uniform sample distribution that efficiently uses the available screen space. We further propose approaches to visually convey the transformation that was applied to the scatterplot and compare them in a user study. We present a novel parallel algorithm for fast GPU-based integral-image computation, which allows for integrating our de-cluttering approach into interactive visual data analysis systems.
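The integral image mentioned in this abstract is a summed-area table. As a minimal sketch, assuming a small hand-made density grid and the hypothetical helper names `integral_image` and `box_sum`, the following shows how such a table is built and how it answers rectangular density queries in constant time. It is a sequential illustration of the data structure only, not the authors' GPU algorithm or their regularization mapping.

```python
def integral_image(grid):
    """Build a summed-area table with one extra leading row and column of zeros.

    table[i][j] holds the sum of all grid cells in rows < i and columns < j.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        row_sum = 0  # running sum of the current row up to column j
        for j in range(cols):
            row_sum += grid[i][j]
            table[i + 1][j + 1] = table[i][j + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of the cells in rows [top, bottom) and columns [left, right)."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


# Hypothetical rasterized density: sample counts per screen cell.
density = [
    [1, 0, 2],
    [0, 3, 1],
    [4, 0, 0],
]
sat = integral_image(density)
total = box_sum(sat, 0, 0, 3, 3)       # all cells
lower_left = box_sum(sat, 1, 0, 3, 2)  # rows 1-2, columns 0-1
```

Each query touches four table entries regardless of the rectangle size, which is why density evaluation over many regions stays cheap once the table exists.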
3
Konomi S, Gao L, Mushi D, Ren B. DCLA: Towards Distributed Cooperative Learning Analytics for Developing Communities. LECTURE NOTES IN COMPUTER SCIENCE 2025:94-106. [DOI: 10.1007/978-3-031-76815-6_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2025]
4
Wang AZ, Borland D, Gotz D. Beyond Correlation: Incorporating Counterfactual Guidance to Better Support Exploratory Visual Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:776-786. [PMID: 39255136 DOI: 10.1109/tvcg.2024.3456369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
Providing effective guidance for users has long been an important and challenging task for efficient exploratory visual analytics, especially when selecting variables for visualization in high-dimensional datasets. Correlation is the most widely applied metric for guidance in statistical and analytical tools, however a reliance on correlation may lead users towards false positives when interpreting causal relations in the data. In this work, inspired by prior insights on the benefits of counterfactual visualization in supporting visual causal inference, we propose a novel, simple, and efficient counterfactual guidance method to enhance causal inference performance in guided exploratory analytics based on insights and concerns gathered from expert interviews. Our technique aims to capitalize on the benefits of counterfactual approaches while reducing their complexity for users. We integrated counterfactual guidance into an exploratory visual analytics system, and using a synthetically generated ground-truth causal dataset, conducted a comparative user study and evaluated to what extent counterfactual guidance can help lead users to more precise visual causal inferences. The results suggest that counterfactual guidance improved visual causal inference performance, and also led to different exploratory behaviors compared to correlation-based guidance. Based on these findings, we offer future directions and challenges for incorporating counterfactual guidance to better support exploratory visual analytics.
5
Wang AZ, Borland D, Peck TC, Wang W, Gotz D. Causal Priors and Their Influence on Judgements of Causality in Visualized Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2025; 31:765-775. [PMID: 39255145 DOI: 10.1109/tvcg.2024.3456381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2024]
Abstract
"Correlation does not imply causation" is a famous mantra in statistical and visual analysis. However, consumers of visualizations often draw causal conclusions when only correlations between variables are shown. In this paper, we investigate factors that contribute to causal relationships users perceive in visualizations. We collected a corpus of concept pairs from variables in widely used datasets and created visualizations that depict varying correlative associations using three typical statistical chart types. We conducted two MTurk studies on (1) preconceived notions on causal relations without charts, and (2) perceived causal relations with charts, for each concept pair. Our results indicate that people make assumptions about causal relationships between pairs of concepts even without seeing any visualized data. Moreover, our results suggest that these assumptions constitute causal priors that, in combination with visualized association, impact how data visualizations are interpreted. The results also suggest that causal priors may lead to over- or under-estimation in perceived causal relations in different circumstances, and that those priors can also impact users' confidence in their causal assessments. In addition, our results align with prior work, indicating that chart type may also affect causal inference. Using data from the studies, we develop a model to capture the interaction between causal priors and visualized associations as they combine to impact a user's perceived causal relations. In addition to reporting the study results and analyses, we provide an open dataset of causal priors for 56 specific concept pairs that can serve as a potential benchmark for future studies. We also suggest remaining challenges and heuristic-based guidelines to help designers improve visualization design choices to better support visual causal inference.
6
Hogräfer M, Schulz HJ. Tailorable Sampling for Progressive Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:4809-4824. [PMID: 37204960 DOI: 10.1109/tvcg.2023.3278084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
Progressive visual analytics (PVA) allows analysts to maintain their flow during otherwise long-running computations by producing early, incomplete results that refine over time, for example, by running the computation over smaller partitions of the data. These partitions are created using sampling, whose goal it is to draw samples of the dataset such that the progressive visualization becomes as useful as possible as soon as possible. What makes the visualization useful depends on the analysis task and, accordingly, some task-specific sampling methods have been proposed for PVA to address this need. However, as analysts see more and more of their data during the progression, the analysis task at hand often changes, which means that analysts need to restart the computation to switch the sampling method, causing them to lose their analysis flow. This poses a clear limitation to the proposed benefits of PVA. Hence, we propose a pipeline for PVA-sampling that allows tailoring the data partitioning to analysis scenarios by switching out modules in a way that does not require restarting the analysis. To that end, we characterize the problem of PVA-sampling, formalize the pipeline in terms of data structures, discuss on-the-fly tailoring, and present additional examples demonstrating its usefulness.
7
Domova V, Vrotsou K. A Model for Types and Levels of Automation in Visual Analytics: A Survey, a Taxonomy, and Examples. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:3550-3568. [PMID: 35358047 DOI: 10.1109/tvcg.2022.3163765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The continuous growth in availability and access to data presents a major challenge to the human analyst. As the manual analysis of large and complex datasets is nowadays practically impossible, the need for assisting tools that can automate the analysis process while keeping the human analyst in the loop is imperative. A large and growing body of literature recognizes the crucial role of automation in Visual Analytics and suggests that automation is among the most important constituents for effective Visual Analytics systems. Today, however, there is no appropriate taxonomy nor terminology for assessing the extent of automation in a Visual Analytics system. In this article, we aim to address this gap by introducing a model of levels of automation tailored for the Visual Analytics domain. The consistent terminology of the proposed taxonomy could provide a ground for users/readers/reviewers to describe and compare automation in Visual Analytics systems. Our taxonomy is grounded on a combination of several existing and well-established taxonomies of levels of automation in the human-machine interaction domain and relevant models within the visual analytics field. To exemplify the proposed taxonomy, we selected a set of existing systems from the event-sequence analytics domain and mapped the automation of their visual analytics process stages against the automation levels in our taxonomy.
8
Lee JC, Lee BJ, Park C, Song H, Ock CY, Sung H, Woo S, Youn Y, Jung K, Jung JH, Ahn J, Kim B, Kim J, Seo J, Hwang JH. Efficacy improvement in searching MEDLINE database using a novel PubMed visual analytic system: EEEvis. PLoS One 2023; 18:e0281422. [PMID: 36758038 PMCID: PMC9910730 DOI: 10.1371/journal.pone.0281422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 01/23/2023] [Indexed: 02/10/2023]
Abstract
PubMed is the most extensively used database and search engine in the biomedical and healthcare fields. However, users could experience several difficulties in acquiring their target papers facing massive numbers of search results, especially in their unfamiliar fields. Therefore, we developed a novel user interface for PubMed and conducted three steps of study: step A, a preliminary user survey with 76 medical experts regarding the current usability for the biomedical literature search task at PubMed; step B is implementing EEEvis, a novel interactive visual analytic system for the search task; step C, a randomized user study comparing PubMed and EEEvis. First, we conducted a Google survey of 76 medical experts regarding the unmet needs of PubMed and the user requirements for a novel search interface. According to the data of preliminary Google survey, we implemented a novel interactive visual analytic system for biomedical literature search. This EEEvis provides enhanced literature data analysis functions including (1) an overview of the bibliographic features including publication date, citation count, and impact factors, (2) an overview of the co-authorship network, and (3) interactive sorting, filtering, and highlighting. In the randomized user study of 24 medical experts, the search speed of EEEvis was not inferior to PubMed in the time to reach the first article (median difference 3 sec, 95% CI -2.1 to 8.5, P = 0.535) nor in the search completion time (median difference 8 sec, 95% CI -4.7 to 19.1, P = 0.771). However, 22 participants (91.7%) responded that they are willing to use EEEvis as their first choice for a biomedical literature search task, and 21 participants (87.5%) answered the bibliographic sorting and filtering functionalities of EEEvis as a major advantage. EEEvis could be a supplementary interface for PubMed that can enhance the user experience in the search for biomedical literature.
Affiliation(s)
- Jong-Chan Lee
  - Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
  - College of Medicine, Seoul National University, Seoul, Korea
- Brian J. Lee
  - Department of Computer Science & Engineering, Seoul National University, Seoul, Korea
- Changhee Park
  - Department of Internal Medicine, Seoul National University Hospital, Seoul, Korea
- Hyunjoo Song
  - School of Computer Science & Engineering, Soongsil University, Seoul, Korea
- Hyojae Sung
  - College of Medicine, Seoul National University, Seoul, Korea
- Sungjin Woo
  - College of Medicine, Seoul National University, Seoul, Korea
- Yuna Youn
  - Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
- Kwangrok Jung
  - Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
- Jae Hyup Jung
  - Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
- Jinwoo Ahn
  - Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
- Bomi Kim
  - Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
- Jaihwan Kim
  - Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
  - College of Medicine, Seoul National University, Seoul, Korea
- Jinwook Seo
  - Department of Computer Science & Engineering, Seoul National University, Seoul, Korea
  - * E-mail: (J-HH); (JS)
- Jin-Hyeok Hwang
  - Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam, Korea
  - College of Medicine, Seoul National University, Seoul, Korea
  - * E-mail: (J-HH); (JS)
9
Jentner W, Lindholz G, Hauptmann H, El-Assady M, Ma KL, Keim D. Visual Analytics of Co-Occurrences to Discover Subspaces in Structured Data. ACM T INTERACT INTEL 2023. [DOI: 10.1145/3579031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2023]
Abstract
We present an approach that shows all relevant subspaces of categorical data condensed in a single picture. We model the categorical values of the attributes as co-occurrences with data partitions generated from structured data using pattern mining. We show that these co-occurrences are a-priori allowing us to greatly reduce the search space effectively generating the condensed picture where conventional approaches filter out several subspaces as these are deemed insignificant. The task of identifying interesting subspaces is common but difficult due to exponential search spaces and the curse of dimensionality. One application of such a task might be identifying a cohort of patients defined by attributes such as gender, age, and diabetes type that share a common patient history, which is modeled as event sequences. Filtering the data by these attributes is common but cumbersome and often does not allow a comparison of subspaces. We contribute a powerful multi-dimensional pattern exploration approach (MDPE-approach) agnostic to the structured data type that models multiple attributes and their characteristics as co-occurrences, allowing the user to identify and compare thousands of subspaces of interest in a single picture. In our MDPE-approach, we introduce two methods to dramatically reduce the search space, outputting only the boundaries of the search space in the form of two tables. We implement the MDPE-approach in an interactive visual interface (MDPE-vis) that provides a scalable, pixel-based visualization design allowing the identification, comparison, and sense-making of subspaces in structured data. Our case studies using a gold-standard dataset and external domain experts confirm our approach’s and implementation’s applicability. A third use case sheds light on the scalability of our approach and a user study with 15 participants underlines its usefulness and power.
Affiliation(s)
- Kwan-Liu Ma
  - University of California-Davis, United States of America
10
Zhou Z, Wang W, Guo M, Wang Y, Gotz D. A Design Space for Surfacing Content Recommendations in Visual Analytic Platforms. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:84-94. [PMID: 36194706 DOI: 10.1109/tvcg.2022.3209445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Recommendation algorithms have been leveraged in various ways within visualization systems to assist users as they perform a range of information tasks. One common focus for these techniques has been the recommendation of content, rather than visual form, as a means to assist users in the identification of information that is relevant to their task context. A wide variety of techniques have been proposed to address this general problem, with a range of design choices in how these solutions surface relevant information to users. This paper reviews the state-of-the-art in how visualization systems surface recommended content to users during users' visual analysis; introduces a four-dimensional design space for visual content recommendation based on a characterization of prior work; and discusses key observations regarding common patterns and future research opportunities.
11
Patil A, Richer G, Jermaine C, Moritz D, Fekete JD. Studying Early Decision Making with Progressive Bar Charts. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:407-417. [PMID: 36166544 DOI: 10.1109/tvcg.2022.3209426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
We conduct a user study to quantify and compare user performance for a value comparison task using four bar chart designs, where the bars show the mean values of data loaded progressively and updated every second (progressive bar charts). Progressive visualization divides different stages of the visualization pipeline (data loading, processing, and visualization) into iterative animated steps to limit the latency when loading large amounts of data. An animated visualization appearing quickly, unfolding, and getting more accurate with time, enables users to make early decisions. However, intermediate mean estimates are computed only on partial data and may not have time to converge to the true means, potentially misleading users and resulting in incorrect decisions. To address this issue, we propose two new designs visualizing the history of values in progressive bar charts, in addition to the use of confidence intervals. We comparatively study four progressive bar chart designs: with/without confidence intervals, and using near-history representation with/without confidence intervals, on three realistic data distributions. We evaluate user performance based on the percentage of correct answers (accuracy), response time, and user confidence. Our results show that, overall, users can make early and accurate decisions with 92% accuracy using only 18% of the data, regardless of the design. We find that our proposed bar chart design with only near-history is comparable to bar charts with only confidence intervals in performance, and the qualitative feedback we received indicates a preference for designs with history.
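The intermediate mean estimates and confidence intervals described in this abstract can be illustrated with a running computation over data chunks. The sketch below is an assumption-laden example rather than the study's implementation: the class name `ProgressiveMean`, the chunk values, and the use of Welford's online update with a normal-approximation 95% interval (z = 1.96) are all choices made here for illustration.

```python
import math


class ProgressiveMean:
    """Running mean and variance over chunks (Welford's online update)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, chunk):
        for x in chunk:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    def interval(self, z=1.96):
        """Normal-approximation confidence interval for the mean so far."""
        if self.count < 2:
            return (float("-inf"), float("inf"))
        variance = self.m2 / (self.count - 1)  # sample variance
        half_width = z * math.sqrt(variance / self.count)
        return (self.mean - half_width, self.mean + half_width)


# Hypothetical chunks standing in for progressively loaded data.
estimate = ProgressiveMean()
history = []  # one (count, mean, low, high) record per update step
for chunk in ([2.0, 4.0], [6.0, 8.0], [5.0, 5.0]):
    estimate.update(chunk)
    low, high = estimate.interval()
    history.append((estimate.count, estimate.mean, low, high))
```

The `history` list keeps every intermediate estimate, which is the kind of record a bar chart with a near-history representation would need to draw; the interval is only an approximation for such small sample counts.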
12
Zhou J, Wang X, Wong JK, Wang H, Wang Z, Yang X, Yan X, Feng H, Qu H, Ying H, Chen W. DPVisCreator: Incorporating Pattern Constraints to Privacy-preserving Visualizations via Differential Privacy. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:809-819. [PMID: 36166552 DOI: 10.1109/tvcg.2022.3209391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Data privacy is an essential issue in publishing data visualizations. However, it is challenging to represent multiple data patterns in privacy-preserving visualizations. The prior approaches target specific chart types or perform an anonymization model uniformly without considering the importance of data patterns in visualizations. In this paper, we propose a visual analytics approach that facilitates data custodians to generate multiple private charts while maintaining user-preferred patterns. To this end, we introduce pattern constraints to model users' preferences over data patterns in the dataset and incorporate them into the proposed Bayesian network-based Differential Privacy (DP) model PriVis. A prototype system, DPVisCreator, is developed to assist data custodians in implementing our approach. The effectiveness of our approach is demonstrated with quantitative evaluation of pattern utility under the different levels of privacy protection, case studies, and semi-structured expert interviews.
13
Deng Z, Weng D, Liu S, Tian Y, Xu M, Wu Y. A survey of urban visual analytics: Advances and future directions. COMPUTATIONAL VISUAL MEDIA 2022; 9:3-39. [PMID: 36277276 PMCID: PMC9579670 DOI: 10.1007/s41095-022-0275-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/10/2021] [Accepted: 02/08/2022] [Indexed: 06/16/2023]
Abstract
Developing effective visual analytics systems demands care in characterization of domain problems and integration of visualization techniques and computational models. Urban visual analytics has already achieved remarkable success in tackling urban problems and providing fundamental services for smart cities. To promote further academic research and assist the development of industrial urban analytics systems, we comprehensively review urban visual analytics studies from four perspectives. In particular, we identify 8 urban domains and 22 types of popular visualization, analyze 7 types of computational method, and categorize existing systems into 4 types based on their integration of visualization techniques and computational models. We conclude with potential research directions and opportunities.
Affiliation(s)
- Zikun Deng
  - State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058 China
- Di Weng
  - Microsoft Research Asia, Beijing, 100080 China
- Shuhan Liu
  - State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058 China
- Yuan Tian
  - State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058 China
- Mingliang Xu
  - School of Information Engineering, Zhengzhou University, Zhengzhou, China
  - Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou, 450001 China
- Yingcai Wu
  - State Key Lab of CAD & CG, Zhejiang University, Hangzhou, 310058 China
14
Procopio M, Mosca A, Scheidegger C, Wu E, Chang R. Impact of Cognitive Biases on Progressive Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:3093-3112. [PMID: 33434132 DOI: 10.1109/tvcg.2021.3051013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
Progressive visualization is fast becoming a technique in the visualization community to help users interact with large amounts of data. With progressive visualization, users can examine intermediate results of complex or long running computations, without waiting for the computation to complete. While this has shown to be beneficial to users, recent research has identified potential risks. For example, users may misjudge the uncertainty in the intermediate results and draw incorrect conclusions or see patterns that are not present in the final results. In this article, we conduct a comprehensive set of studies to quantify the advantages and limitations of progressive visualization. Based on a recent report by Micallef et al., we examine four types of cognitive biases that can occur with progressive visualization: uncertainty bias, illusion bias, control bias, and anchoring bias. The results of the studies suggest a cautious but promising use of progressive visualization - while there can be significant savings in task completion time, accuracy can be negatively affected in certain conditions. These findings confirm earlier reports of the benefits and drawbacks of progressive visualization and that continued research into mitigating the effects of cognitive biases is necessary.
15
Musleh M, Chatzimparmpas A, Jusufi I. Visual analysis of blow molding machine multivariate time series data. J Vis (Tokyo) 2022; 25:1329-1342. [PMID: 35845181 PMCID: PMC9273703 DOI: 10.1007/s12650-022-00857-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 04/25/2022] [Accepted: 06/01/2022] [Indexed: 12/02/2022]
Abstract
The recent development in the data analytics field provides a boost in production for modern industries. Small-sized factories intend to take full advantage of the data collected by sensors used in their machinery. The ultimate goal is to minimize cost and maximize quality, resulting in an increase in profit. In collaboration with domain experts, we implemented a data visualization tool to enable decision-makers in a plastic factory to improve their production process. The tool is an interactive dashboard with multiple coordinated views supporting the exploration from both local and global perspectives. In summary, we investigate three different aspects: methods for preprocessing multivariate time series data, clustering approaches for the already refined data, and visualization techniques that aid domain experts in gaining insights into the different stages of the production process. Here we present our ongoing results grounded in a human-centered development process. We adopt a formative evaluation approach to continuously upgrade our dashboard design that eventually meets partners' requirements and follows the best practices within the field. We also conducted a case study with a domain expert to validate the potential application of the tool in the real-life context. Finally, we assessed the usability and usefulness of the tool with a two-layer summative evaluation that showed encouraging results.
Affiliation(s)
- Maath Musleh
  - Institute of Visual Computing and Human-Centered Technology, TU Wien, 1040 Vienna, Austria
- Angelos Chatzimparmpas
  - Department of Computer Science and Media Technology, Linnaeus University, Växjö, 351 95 Sweden
- Ilir Jusufi
  - Department of Computer Science and Media Technology, Linnaeus University, Växjö, 351 95 Sweden
16
Hogräfer M, Angelini M, Santucci G, Schulz HJ. Steering-by-Example for Progressive Visual Analytics. ACM T INTEL SYST TEC 2022. [DOI: 10.1145/3531229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Progressive visual analytics allows users to interact with early, partial results of long-running computations on large datasets. In this context, computational steering is often brought up as a means to prioritize the progressive computation. This is meant to focus computational resources on data subspaces of interest, so as to ensure their computation is completed before all others. Yet, current approaches to select a region of the view space and then to prioritize its corresponding data subspace either require a 1-to-1 mapping between view and data space, or they need to establish and maintain computationally costly index structures to trace complex mappings between view and data space. We present steering-by-example, a novel interactive steering approach for progressive visual analytics, which allows prioritizing data subspaces for the progression by generating a relaxed query from a set of selected data items. Our approach works independently of the particular visualization technique and without additional index structures. First benchmark results show that steering-by-example considerably improves Precision and Recall for prioritizing unprocessed data for a selected view region, clearly outperforming random uniform sampling.
17
Chatzimparmpas A, Martins RM, Kucher K, Kerren A. FeatureEnVi: Visual Analytics for Feature Engineering Using Stepwise Selection and Semi-Automatic Extraction Approaches. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:1773-1791. [PMID: 34990365 DOI: 10.1109/tvcg.2022.3141040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
The machine learning (ML) life cycle involves a series of iterative steps, from the effective gathering and preparation of the data (including complex feature engineering processes) to the presentation and improvement of results, with various algorithms to choose from in every step. Feature engineering in particular can be very beneficial for ML, leading to numerous improvements such as boosting the predictive results, decreasing computational times, reducing excessive noise, and increasing the transparency behind the decisions taken during the training. Despite that, while several visual analytics tools exist to monitor and control the different stages of the ML life cycle (especially those related to data and algorithms), feature engineering support remains inadequate. In this paper, we present FeatureEnVi, a visual analytics system specifically designed to assist with the feature engineering process. Our proposed system helps users to choose the most important feature, to transform the original features into powerful alternatives, and to experiment with different feature generation combinations. Additionally, data space slicing allows users to explore the impact of features on both local and global scales. FeatureEnVi utilizes multiple automatic feature selection techniques; furthermore, it visually guides users with statistical evidence about the influence of each feature (or subsets of features). The final outcome is the extraction of heavily engineered features, evaluated by multiple validation metrics. The usefulness and applicability of FeatureEnVi are demonstrated with two use cases and a case study. We also report feedback from interviews with two ML experts and a visualization researcher who assessed the effectiveness of our system.
|
18
|
van de Ruit M, Billeter M, Eisemann E. An Efficient Dual-Hierarchy t-SNE Minimization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:614-622. [PMID: 34587052 DOI: 10.1109/tvcg.2021.3114817] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
t-distributed Stochastic Neighbour Embedding (t-SNE) has become a standard for exploratory data analysis, as it is capable of revealing clusters even in complex data while requiring minimal user input. While its run-time complexity limited it to small datasets in the past, recent efforts improved upon the expensive similarity computations and the previously quadratic minimization. Nevertheless, t-SNE still has high runtime and memory costs when operating on millions of points. We present a novel method for executing the t-SNE minimization. While our method overall retains a linear runtime complexity, we obtain a significant performance increase in the most expensive part of the minimization. We achieve a significant improvement without a noticeable decrease in accuracy even when targeting a 3D embedding. Our method constructs a pair of spatial hierarchies over the embedding, which are simultaneously traversed to approximate many N-body interactions at once. We demonstrate an efficient GPGPU implementation and evaluate its performance against state-of-the-art methods on a variety of datasets.
|
19
|
Chen X, Zhang J, Fu CW, Fekete JD, Wang Y. Pyramid-based Scatterplots Sampling for Progressive and Streaming Data Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:593-603. [PMID: 34587089 DOI: 10.1109/tvcg.2021.3114880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
We present a pyramid-based scatterplot sampling technique to avoid overplotting and enable progressive and streaming visualization of large data. Our technique is based on a multiresolution pyramid-based decomposition of the underlying density map and makes use of the density values in the pyramid to guide the sampling at each scale for preserving the relative data densities and outliers. We show that our technique is competitive in quality with state-of-the-art methods and runs faster by about an order of magnitude. Also, we have adapted it to deliver progressive and streaming data visualization by processing the data in chunks and updating the scatterplot areas with visible changes in the density map. A quantitative evaluation shows that our approach generates stable and faithful progressive samples that are comparable to the state-of-the-art method in preserving relative densities and superior to it in keeping outliers and stability when switching frames. We present two case studies that demonstrate the effectiveness of our approach for exploring large data.
|
20
|
Evaluating a Taxonomy of Textual Uncertainty for Collaborative Visualisation in the Digital Humanities. INFORMATION 2021. [DOI: 10.3390/info12110436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
The capture, modelling and visualisation of uncertainty has become a hot topic in many areas of science, such as the digital humanities (DH). Fuelled by critical voices among the DH community, DH scholars are becoming more aware of the intrinsic advantages that incorporating the notion of uncertainty into their workflows may bring. Additionally, the increasing availability of ubiquitous, web-based technologies has given rise to many collaborative tools that aim to support DH scholars in performing remote work alongside distant peers from other parts of the world. In this context, this paper describes two user studies seeking to evaluate a taxonomy of textual uncertainty aimed at enabling remote collaborations on digital humanities (DH) research objects in a digital medium. Our study focuses on the task of free annotation of uncertainty in texts in two different scenarios, seeking to establish the requirements of the underlying data and uncertainty models that would be needed to implement a hypothetical collaborative annotation system (CAS) that uses information visualisation and visual analytics techniques to leverage the cognitive effort implied by these tasks. To identify user needs and other requirements, we held two user-driven design experiences with DH experts and lay users, focusing on the annotation of uncertainty in historical recipes and literary texts. The lessons learned from these experiments are gathered in a series of insights and observations on how these different user groups collaborated to adapt an uncertainty taxonomy to solve the proposed exercises. Furthermore, we extract a series of recommendations and future lines of work that we share with the community in an attempt to establish a common agenda of DH research that focuses on collaboration around the idea of uncertainty.
|
21
|
Kwon BC, Anand V, Severson KA, Ghosh S, Sun Z, Frohnert BI, Lundgren M, Ng K. DPVis: Visual Analytics With Hidden Markov Models for Disease Progression Pathways. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3685-3700. [PMID: 32275600 DOI: 10.1109/tvcg.2020.2985689] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/11/2023]
Abstract
Clinical researchers use disease progression models to understand patient status and characterize progression patterns from longitudinal health records. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and their variants are a class of models that both discover these states and make inferences of health states for patients. Despite the advantages of using the algorithms for discovering interesting patterns, it still remains challenging for medical experts to interpret model outputs, understand complex modeling parameters, and clinically make sense of the patterns. To tackle these problems, we conducted a design study with clinical scientists, statisticians, and visualization experts, with the goal of investigating disease progression pathways of chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis, which seamlessly integrates model parameters and outcomes of HMMs into interpretable and interactive visualizations. In this article, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and building, analyzing, and comparing clinically relevant patient subgroups.
|
22
|
Jo J, L'Yi S, Lee B, Seo J. ProReveal: Progressive Visual Analytics With Safeguards. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3109-3122. [PMID: 31880556 DOI: 10.1109/tvcg.2019.2962404] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
We present a new visual exploration concept, Progressive Visual Analytics with Safeguards, that helps people manage the uncertainty arising from progressive data exploration. Despite its potential benefits, intermediate knowledge from progressive analytics can be incorrect due to various machine and human factors, such as a sampling bias or misinterpretation of uncertainty. To alleviate this problem, we introduce PVA-Guards, safeguards people can leave on uncertain intermediate knowledge that needs to be verified, and derive seven PVA-Guards based on previous visualization task taxonomies. PVA-Guards provide a means of ensuring the correctness of the conclusion and understanding the reason when intermediate knowledge becomes invalid. We also present ProReveal, a proof-of-concept system designed and developed to integrate the seven safeguards into progressive data exploration. Finally, we report a user study with 14 participants, which shows that people voluntarily employed PVA-Guards to safeguard their findings and that ProReveal's PVA-Guard view provides an overview of uncertain intermediate knowledge. We believe our new concept can also offer better consistency in progressive data exploration, alleviating people's heterogeneous interpretation of uncertainty.
|
23
|
Tovanich N, Heulot N, Fekete JD, Isenberg P. Visualization of Blockchain Data: A Systematic Review. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3135-3152. [PMID: 31899429 DOI: 10.1109/tvcg.2019.2963018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
We present a systematic review of visual analytics tools used for the analysis of blockchain-related data. The blockchain concept has recently received considerable attention and spurred applications in a variety of domains. We systematically and quantitatively assessed 76 analytics tools that have been proposed in research as well as online by professionals and blockchain enthusiasts. Our classification of these tools distinguishes (1) target blockchains, (2) blockchain data, (3) target audiences, (4) task domains, and (5) visualization types. Furthermore, we look at which aspects of blockchain data have already been explored and point out areas that deserve more investigation in the future.
|
24
|
Lamqaddam H, Vande Moere A, Vanden Abeele V, Brosens K, Verbert K. Introducing Layers of Meaning (LoM): A Framework to Reduce Semantic Distance of Visualization In Humanistic Research. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1084-1094. [PMID: 33048729 DOI: 10.1109/tvcg.2020.3030426] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Information visualization (infovis) is a powerful tool for exploring rich datasets. Within humanistic research, rich qualitative data and domain culture make traditional infovis approaches appear reductive and disconnected, leading to low adoption. In this paper, we use a multi-step approach to scrutinize the relationship between infovis and the humanities and suggest new directions for it. We first look into infovis from the humanistic perspective by exploring the humanistic literature around infovis. We validate and expand those findings through a co-design workshop with humanist and infovis experts. Then, we translate our findings into guidelines for designers and conduct a design critique exercise to explore their effect on the perception of humanist researchers. Based on these steps, we introduce Layers of Meaning, a framework to reduce the semantic distance between humanist researchers and visualizations of their research material, by grounding infovis tools in time and space, physicality, terminology, nuance, and provenance.
|
25
|
Chatzimparmpas A, Martins RM, Kucher K, Kerren A. StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1547-1557. [PMID: 33048687 DOI: 10.1109/tvcg.2020.3030352] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
In machine learning (ML), ensemble methods (such as bagging, boosting, and stacking) are widely established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it may be a highly effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process. This challenge stems from the enormous space of available solutions, with different sets of data instances and features that could be used for training, several algorithms to choose from, and instantiations of these algorithms using diverse parameters (i.e., models) that perform differently according to various metrics. In this work, we present a knowledge generation model, which supports ensemble learning with the use of visualization, and a visual analytics system for stacked generalization. Our system, StackGenVis, assists users in dynamically adapting performance metrics, managing data instances, selecting the most important features for a given data set, choosing a set of top-performant and diverse algorithms, and measuring the predictive performance. In consequence, our proposed tool helps users to decide between distinct models and to reduce the complexity of the resulting stack by removing overpromising and underperforming models. The applicability and effectiveness of StackGenVis are demonstrated with two use cases: a real-world healthcare data set and a collection of data related to sentiment/stance detection in texts. Finally, the tool has been evaluated through interviews with three ML experts.
|
26
|
Somarakis A, Van Unen V, Koning F, Lelieveldt B, Hollt T. ImaCytE: Visual Exploration of Cellular Micro-Environments for Imaging Mass Cytometry Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:98-110. [PMID: 31369380 DOI: 10.1109/tvcg.2019.2931299] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Indexed: 06/10/2023]
Abstract
Tissue functionality is determined by the characteristics of tissue-resident cells and their interactions within their microenvironment. Imaging Mass Cytometry offers the opportunity to distinguish cell types with high precision and link them to their spatial location in intact tissues at sub-cellular resolution. This technology produces large amounts of spatially resolved high-dimensional data, which constitutes a serious challenge for data analysis. We present an interactive visual analysis workflow for the end-to-end analysis of Imaging Mass Cytometry data that was developed in close collaboration with domain expert partners. We implemented the presented workflow in an interactive visual analysis tool, ImaCytE. Our workflow is designed to allow the user to discriminate cell types according to their protein expression profiles and analyze their cellular microenvironments, aiding in the formulation or verification of hypotheses on tissue architecture and function. Finally, we show the effectiveness of our workflow and ImaCytE through a case study performed by a collaborating specialist.
|
27
|
Ivson P, Moreira A, Queiroz F, Santos W, Celes W. A Systematic Review of Visualization in Building Information Modeling. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:3109-3127. [PMID: 30932840 DOI: 10.1109/tvcg.2019.2907583] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/09/2023]
Abstract
Building Information Modeling (BIM) employs data-rich 3D CAD models for large-scale facility design, construction, and operation. These complex datasets contain a large amount and variety of information, ranging from design specifications to real-time sensor data. They are used by architects and engineers for various analyses and simulations throughout a facility's life cycle. Many techniques from different visualization fields could be used to analyze these data. However, the BIM domain still remains largely unexplored by the visualization community. The goal of this article is to encourage visualization researchers to increase their involvement with BIM. To this end, we present the results of a systematic review of visualization in current BIM practice. We use a novel taxonomy to identify main application areas and analyze commonly employed techniques. From this domain characterization, we highlight future research opportunities brought forth by the unique features of BIM, for instance, exploring the synergies between scientific and information visualization to integrate spatial and non-spatial data. We hope this article raises awareness of the interesting new challenges the BIM domain brings to the visualization community.
|
28
|
Jo J, Seo J, Fekete JD. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1347-1360. [PMID: 30222575 DOI: 10.1109/tvcg.2018.2869149] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
We present PANENE, a progressive algorithm for approximate nearest neighbor indexing and querying. Although the use of k-nearest neighbor (KNN) libraries is common in many data analysis methods, most KNN algorithms can only be queried when the whole dataset has been indexed, i.e., they are not online. Even the few online implementations are not progressive in the sense that the time to index incoming data is not bounded and cannot satisfy the latency requirements of progressive systems. This long latency has significantly limited the use of many machine learning methods, such as t-SNE, in interactive visual analytics. PANENE is a novel algorithm for Progressive Approximate k-NEarest NEighbors, enabling fast KNN queries while continuously indexing new batches of data. Following the progressive computation paradigm, PANENE operations can be bounded in time, allowing analysts to access running results within an interactive latency. PANENE can also incrementally build and maintain a cache data structure, a KNN lookup table, to enable constant-time lookups for KNN queries. Finally, we present three progressive applications of PANENE, such as regression, density estimation, and responsive t-SNE, opening up new opportunities to use complex algorithms in interactive systems.
|
29
|
Liu C, Wu C, Shao H, Yuan X. SmartCube: An Adaptive Data Management Architecture for the Real-Time Visualization of Spatiotemporal Datasets. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:790-799. [PMID: 31442982 DOI: 10.1109/tvcg.2019.2934434] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
Interactive visualization and exploration of large spatiotemporal data sets is difficult without carefully designed data pre-processing and management tools. We propose a novel architecture for spatiotemporal data management. The architecture can dynamically update itself based on user queries. Datasets are stored in a tree-like structure to support memory sharing among cuboids in a logical structure of data cubes. An update mechanism is designed to create or remove cuboids in this structure according to the analysis of the user queries, taking the memory size limit into account. The data structure is dynamically optimized according to different user queries. During a query process, user queries are recorded to predict the performance gain of a new cuboid. The creation or deletion of a cuboid is determined by this performance gain. Experimental results show that our prototype system delivers good performance on user queries over different spatiotemporal datasets, using little memory while achieving performance comparable to other state-of-the-art algorithms.
|
30
|
Pezzotti N, Thijssen J, Mordvintsev A, Hollt T, Van Lew B, Lelieveldt BPF, Eisemann E, Vilanova A. GPGPU Linear Complexity t-SNE Optimization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1172-1181. [PMID: 31449023 DOI: 10.1109/tvcg.2019.2934307] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 06/10/2023]
Abstract
In recent years the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has become one of the most used and insightful techniques for exploratory data analysis of high-dimensional data. It reveals clusters of high-dimensional data points at different scales while only requiring minimal tuning of its parameters. However, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of t-SNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the t-SNE embedding for large datasets. In this work, we present a novel approach to the minimization of the t-SNE objective function that heavily relies on graphics hardware and has linear computational complexity. Our technique decreases the computational cost of running t-SNE on datasets by orders of magnitude and retains or improves on the accuracy of past approximated techniques. We propose to approximate the repulsive forces between data points by splatting kernel textures for each data point. This approximation allows us to reformulate the t-SNE minimization problem as a series of tensor operations that can be efficiently executed on the graphics card. An efficient implementation of our technique is integrated and available for use in the widely used Google TensorFlow.js, and an open-source C++ library.
|
31
|
Li JK, Ma KL. P5: Portable Progressive Parallel Processing Pipelines for Interactive Data Analysis and Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1151-1160. [PMID: 31442985 DOI: 10.1109/tvcg.2019.2934537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
We present P5, a web-based visualization toolkit that combines declarative visualization grammar and GPU computing for progressive data analysis and visualization. To interactively analyze and explore big data, progressive analytics and visualization methods have recently emerged. Progressive visualizations of incrementally refining results have the advantages of allowing users to steer the analysis process and make early decisions. P5 leverages declarative grammar for specifying visualization designs and exploits GPU computing to accelerate progressive data processing and rendering. The declarative specifications can be modified during progressive processing to create different visualizations for analyzing the intermediate results. To enable user interactions for progressive data analysis, P5 utilizes the GPU to automatically aggregate and index data based on declarative interaction specifications to facilitate effective interactive visualization. We demonstrate the effectiveness and usefulness of P5 through a variety of example applications and several performance benchmark tests.
|
32
|
Ma Y, Xie T, Li J, Maciejewski R. Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1075-1085. [PMID: 31478859 DOI: 10.1109/tvcg.2019.2934631] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/10/2023]
Abstract
Machine learning models are currently being deployed in a variety of real-world applications where model predictions are used to make decisions about healthcare, bank loans, and numerous other critical tasks. As the deployment of artificial intelligence technologies becomes ubiquitous, it is unsurprising that adversaries have begun developing methods to manipulate machine learning models to their advantage. While the visual analytics community has developed methods for opening the black box of machine learning models, little work has focused on helping the user understand their model vulnerabilities in the context of adversarial attacks. In this paper, we present a visual analytics framework for explaining and exploring model vulnerabilities to adversarial attacks. Our framework employs a multi-faceted visualization scheme designed to support the analysis of data poisoning attacks from the perspective of models, data instances, features, and local structures. We demonstrate our framework through two case studies on binary classifiers and illustrate model vulnerabilities with respect to varying attack strategies.
|
33
|
Fujiwara T, Chou JK, Xu P, Ren L, Ma KL. An Incremental Dimensionality Reduction Method for Visualizing Streaming Multidimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:418-428. [PMID: 31449024 DOI: 10.1109/tvcg.2019.2934433] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Dimensionality reduction (DR) methods are commonly used for analyzing and visualizing multidimensional data. However, when data is a live streaming feed, conventional DR methods cannot be directly used because of their computational complexity and inability to preserve the projected data positions at previous time points. In addition, the problem becomes even more challenging when the dynamic data records have a varying number of dimensions as often found in real-world applications. This paper presents an incremental DR solution. We enhance an existing incremental PCA method in several ways to ensure its usability for visualizing streaming multidimensional data. First, we use geometric transformation and animation methods to help preserve a viewer's mental map when visualizing the incremental results. Second, to handle data dimension variants, we use an optimization method to estimate the projected data positions, and also convey the resulting uncertainty in the visualization. We demonstrate the effectiveness of our design with two case studies using real-world datasets.
|
34
|
Gehrmann S, Strobelt H, Kruger R, Pfister H, Rush AM. Visual Interaction with Deep Learning Models through Collaborative Semantic Inference. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:884-894. [PMID: 31425116 DOI: 10.1109/tvcg.2019.2934595] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 06/10/2023]
Abstract
Automation of tasks can have critical consequences when humans lose agency over decision processes. Deep learning models are particularly susceptible since current black-box approaches lack explainable reasoning. We argue that both the visual interface and model structure of deep learning systems need to take into account interaction design. We propose a framework of collaborative semantic inference (CSI) for the co-design of interactions and models to enable visual collaboration between humans and algorithms. The approach exposes the intermediate reasoning process of models which allows semantic interactions with the visual metaphors of a problem, which means that a user can both understand and control parts of the model reasoning process. We demonstrate the feasibility of CSI with a co-designed case study of a document summarization system.
|
35
|
Behrisch M, Streeb D, Stoffel F, Seebacher D, Matejek B, Weber SH, Mittelstadt S, Pfister H, Keim D. Commercial Visual Analytics Systems-Advances in the Big Data Analytics Field. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:3011-3031. [PMID: 30059307 DOI: 10.1109/tvcg.2018.2859973] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Five years after the first state-of-the-art report on Commercial Visual Analytics Systems we present a reevaluation of the Big Data Analytics field. We build on the success of the 2012 survey, which was influential even beyond the boundaries of the InfoVis and Visual Analytics (VA) community. While the field has matured significantly since the original survey, we find that innovation and research-driven development are increasingly sacrificed to satisfy a wide range of user groups. We evaluate new product versions on established evaluation criteria, such as available features, performance, and usability, to extend on and assure comparability with the previous survey. We also investigate previously unavailable products to paint a more complete picture of the commercial VA landscape. Furthermore, we introduce novel measures, like suitability for specific user groups and the ability to handle complex data types, and undertake a new case study to highlight innovative features. We explore the achievements in the commercial sector in addressing VA challenges and propose novel developments that should be on systems' roadmaps in the coming years.
|
36
|
Abstract
As visualization becomes widespread in a broad range of cross-disciplinary academic domains, such as the digital humanities (DH), critical voices have been raised on the perils of neglecting the uncertain character of data in the visualization design process. Visualizations that, purposely or not, obscure or remove uncertainty in its different forms from the scholars’ vision may negatively affect the manner in which humanities scholars regard computational methods as useful tools in their daily work. In this paper, we address the issue of uncertainty representation in the context of the humanities from a theoretical perspective, in an attempt to provide the foundations of a framework that allows for the construction of ecological interface designs which are able to expose the computational power of the algorithms at play while, at the same time, respecting the particularities and needs of humanistic research. To this end, we review past uncertainty taxonomies in other domains typically related to the humanities and visualization, such as cartography and GIScience. From this review, we select an uncertainty taxonomy related to the humanities that we link to recent research in visualization for the DH. Finally, we bring a novel analytics method developed by other authors (Progressive Visual Analytics) into question, which we argue can be a good candidate to resolve the aforementioned difficulties in DH practice.
|
37
|
Abstract
Progressive visualization offers a great deal of promise for big data visualization; however, current progressive visualization systems do not allow for continuous interaction. What if users want to see more confident results on a subset of the visualization? This can happen when users are in exploratory analysis mode but want to ask some directed questions of the data as well. In a progressive visualization system, the online aggregation algorithm determines the database sampling rate and resulting convergence rate, not the user. In this paper, we extend a recent method in online aggregation, called Wander Join, that is optimized for queries that join tables, one of the most computationally expensive operations. This extension leverages importance sampling to enable user-driven sampling when data joins are in the query. We applied user interaction techniques that allow the user to view and adjust the convergence rate, providing more transparency and control over the online aggregation process. By leveraging importance sampling, our extension of Wander Join also allows for stratified sampling of groups when there is data distribution skew. We also improve the convergence rate of filtering queries, but with additional overhead costs not needed in the original Wander Join algorithm.
|
38
|
Borland D, Wang W, Gotz D. Contextual Visualization. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2018; 38:17-23. [PMID: 30668452 DOI: 10.1109/mcg.2018.2874782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Unseen information can lead to various "threats to validity" when analyzing complex datasets using visual tools, resulting in potentially biased findings. We enumerate sources of unseen information and argue that a new focus on contextual visualization methods is needed to inform users of these threats and to mitigate their effects.
|
39
|
Camisetty A, Chandurkar C, Sun M, Koop D. Enhancing Web-based Analytics Applications through Provenance. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:131-141. [PMID: 30346289 DOI: 10.1109/tvcg.2018.2865039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/08/2023]
Abstract
Visual analytics systems continue to integrate new technologies and leverage modern environments for exploration and collaboration, making tools and techniques available to a wide audience through web browsers. Many of these systems have been developed with rich interactions, offering users the opportunity to examine details and explore hypotheses that have not been directly encoded by a designer. Understanding is enhanced when users can replay and revisit the steps in the sensemaking process, and in collaborative settings, it is especially important to be able to review not only the current state but also what decisions were made along the way. Unfortunately, many web-based systems lack the ability to capture such reasoning, and the path to a result is transient, forgotten when a user moves to a new view. This paper explores the requirements to augment existing client-side web applications with support for capturing, reviewing, sharing, and reusing steps in the reasoning process. Furthermore, it considers situations where decisions are made with streaming data, and the insights gained from revisiting those choices when more data is available. It presents a proof of concept, the Shareable Interactive Manipulation Provenance framework (SIMProv.js), that addresses these requirements in a modern, client-side JavaScript library, and describes how it can be integrated with existing frameworks.
|
40
|
Liu R, Chen S, Ji G, Zhao B, Li Q, Su M. Interactive stratigraphic structure visualization for seismic data. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2018. [DOI: 10.1016/j.jvlc.2018.07.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]
|
41
|
|
42
|
Duran D, Hermosilla P, Ropinski T, Kozlikova B, Vinacua A, Vazquez PP. Visualization of Large Molecular Trajectories. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:987-996. [PMID: 30207955 DOI: 10.1109/tvcg.2018.2864851] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/08/2023]
Abstract
The analysis of protein-ligand interactions is a time-intensive task. Researchers have to analyze multiple physico-chemical properties of the protein at once and combine them to derive conclusions about the protein-ligand interplay. Typically, several charts are inspected, and 3D animations can be played side-by-side to obtain a deeper understanding of the data. With the advances in simulation techniques, larger and larger datasets are available, with up to hundreds of thousands of steps. Unfortunately, such large trajectories are very difficult to investigate with traditional approaches. Therefore, the need for special tools that facilitate inspection of these large trajectories becomes substantial. In this paper, we present a novel system for visual exploration of very large trajectories in an interactive and user-friendly way. Several visualization motifs are automatically derived from the data to give the user information about interactions between protein and ligand. Our system offers specialized widgets to ease and accelerate data inspection and navigation to interesting parts of the simulation. The system is also suitable for simulations where multiple ligands are involved. We have tested the usefulness of our tool on a set of datasets obtained from protein engineers, and we describe the expert feedback.
|
43
|
Law PM, Liu Z, Malik S, Basole RC. MAQUI: Interweaving Queries and Pattern Mining for Recursive Event Sequence Exploration. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:396-406. [PMID: 30136954 DOI: 10.1109/tvcg.2018.2864886] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/08/2023]
Abstract
Exploring event sequences by defining queries alone or by using mining algorithms alone is often not sufficient to support analysis. Analysts often interweave querying and mining in a recursive manner during event sequence analysis: sequences extracted as query results are used for mining patterns, patterns generated are incorporated into a new query for segmenting the sequences, and the resulting segments are mined or queried again. To support flexible analysis, we propose a framework that describes the process of interwoven querying and mining. Based on this framework, we developed MAQUI, a Mining And Querying User Interface that enables recursive event sequence exploration. To understand the efficacy of MAQUI, we conducted two case studies with domain experts. The findings suggest that the capability of interweaving querying and mining helps the participants articulate their questions and gain novel insights from their data.
|
44
|
Badam SK, Mathisen A, Radle R, Klokmose CN, Elmqvist N. Vistrates: A Component Model for Ubiquitous Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:586-596. [PMID: 30136988 DOI: 10.1109/tvcg.2018.2865144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Visualization tools are often specialized for specific tasks, which turns the user's analytical workflow into a fragmented process performed across many tools. In this paper, we present a component model design for data visualization to promote modular designs of visualization tools that enhance their analytical scope. Rather than fragmenting tasks across tools, the component model supports unification, where components (the building blocks of this model) can be assembled to support a wide range of tasks. Furthermore, the model also provides additional key properties, such as support for collaboration, sharing across multiple devices, and adaptive usage depending on expertise, from creating visualizations using dropdown menus, through instantiating components, to actually modifying components or creating entirely new ones from scratch using JavaScript or Python source code. To realize our model, we introduce VISTRATES, a literate computing platform for developing, assembling, and sharing visualization components. From a visualization perspective, Vistrates features cross-cutting components for visual representations, interaction, collaboration, and device responsiveness maintained in a component repository. From a development perspective, Vistrates offers a collaborative programming environment where novices and experts alike can compose component pipelines for specific analytical activities. Finally, we present several Vistrates use cases that span the full range of the classic "anytime" and "anywhere" motto for ubiquitous analysis: from mobile and on-the-go usage, through office settings, to collaborative smart environments covering a variety of tasks and devices.
|
45
|
Guo S, Jin Z, Gotz D, Du F, Zha H, Cao N. Visual Progression Analysis of Event Sequence Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:417-426. [PMID: 30136953 DOI: 10.1109/tvcg.2018.2864885] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 06/08/2023]
Abstract
Event sequence data is common to a broad range of application domains, from security to health care to scholarly communication. This form of data captures information about the progression of events for an individual entity (e.g., a computer network device; a patient; an author) in the form of a series of time-stamped observations. Moreover, each event is associated with an event type (e.g., a computer login attempt, or a hospital discharge). Analyses of event sequence data have been shown to help reveal important temporal patterns, such as clinical paths resulting in improved outcomes, or an understanding of common career trajectories for scholars. Moreover, recent research has demonstrated a variety of techniques designed to overcome methodological challenges such as large volumes of data and high dimensionality. However, the effective identification and analysis of latent stages of progression, which can allow for variation within different but similarly evolving event sequences, remain a significant challenge with important real-world motivations. In this paper, we propose an unsupervised stage analysis algorithm to identify semantically meaningful progression stages as well as the critical events which help define those stages. The algorithm follows three key steps: (1) event representation estimation, (2) event sequence warping and alignment, and (3) sequence segmentation. We also present a novel visualization system, ET2, which interactively illustrates the results of the stage analysis algorithm to help reveal evolution patterns across stages. Finally, we report three forms of evaluation for ET2: (1) case studies with two real-world datasets, (2) interviews with domain expert users, and (3) a performance evaluation on the progression analysis algorithm and the visualization design.
|
46
|
Liu D, Xu P, Ren L. TPFlow: Progressive Partition and Multidimensional Pattern Extraction for Large-Scale Spatio-Temporal Data Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:1-11. [PMID: 30136965 DOI: 10.1109/tvcg.2018.2865018] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 06/08/2023]
Abstract
Consider a multi-dimensional spatio-temporal (ST) dataset where each entry is a numerical measure defined by the corresponding temporal, spatial and other domain-specific dimensions. A typical approach to explore such data utilizes interactive visualizations with multiple coordinated views. Each view displays the aggregated measures along one or two dimensions. By brushing on the views, analysts can obtain detailed information. However, this approach often cannot provide sufficient guidance for analysts to identify patterns hidden within subsets of data. Without a priori hypotheses, analysts need to manually select and iterate through different slices to search for patterns, which can be a tedious and lengthy process. In this work, we model multidimensional ST data as tensors and propose a novel piecewise rank-one tensor decomposition algorithm which supports automatically slicing the data into homogeneous partitions and extracting the latent patterns in each partition for comparison and visual summarization. The algorithm optimizes a quantitative measure of how faithfully the extracted patterns visually represent the original data. Based on the algorithm, we further propose a visual analytics framework that supports a top-down, progressive partitioning workflow for level-of-detail multidimensional ST data exploration. We demonstrate the general applicability and effectiveness of our technique on three datasets from different application domains: regional sales trend analysis, customer traffic analysis in department stores, and taxi trip analysis with origin-destination (OD) data. We further interview domain experts to verify the usability of the prototype.
|
47
|
Dudley JJ, Kristensson PO. A Review of User Interface Design for Interactive Machine Learning. ACM T INTERACT INTEL 2018. [DOI: 10.1145/3185517] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 01/05/2023]
Abstract
Interactive Machine Learning (IML) seeks to complement human perception and intelligence by tightly integrating these strengths with the computational power and speed of computers. The interactive process is designed to involve input from the user but does not require the background knowledge or experience that might be necessary to work with more traditional machine learning techniques. Under the IML process, non-experts can apply their domain knowledge and insight over otherwise unwieldy datasets to find patterns of interest or develop complex data-driven applications. This process is co-adaptive in nature and relies on careful management of the interaction between human and machine. User interface design is fundamental to the success of this approach, yet there is a lack of consolidated principles on how such an interface should be implemented. This article presents a detailed review and characterisation of Interactive Machine Learning from an interactive systems perspective. We propose and describe a structural and behavioural model of a generalised IML system and identify solution principles for building effective interfaces for IML. Where possible, these emergent solution principles are contextualised by reference to the broader human-computer interaction literature. Finally, we identify strands of user interface research key to unlocking more efficient and productive non-expert interactive machine learning applications.
|
48
|
|
49
|
Vrotsou K, Nordman A. Exploratory Visual Sequence Mining Based on Pattern-Growth. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:2597-2610. [PMID: 29994660 DOI: 10.1109/tvcg.2018.2848247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Sequential pattern mining finds applications in numerous diverse fields. Due to the problem's combinatorial nature, two main challenges arise. First, existing algorithms output large numbers of patterns, many of which are uninteresting from a user's perspective. Second, as datasets grow, mining large numbers of patterns gets computationally expensive. There is, thus, a need for mining approaches that make it possible to focus the pattern search towards directions of interest. This work tackles this problem by combining interactive visualization with sequential pattern mining in order to create a "transparent box" execution model. We propose a novel approach to interactive visual sequence mining that allows the user to guide the execution of a pattern-growth algorithm at suitable points through a powerful visual interface. Our approach (1) introduces the possibility of using local constraints during the mining process, (2) allows stepwise visualization of patterns being mined, and (3) enables the user to steer the mining algorithm towards directions of interest. The use of local constraints significantly improves users' capability to progressively refine the search space without the need to restart computations. We exemplify our approach using two event sequence datasets: one composed of web page visits and another composed of individuals' activity sequences.
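To make the pattern-growth idea concrete, the sketch below is a simplified PrefixSpan-style miner for sequences of single items. It is an illustration under stated assumptions, not the algorithm from this paper: the function name `mine_patterns` and the `keep` predicate are hypothetical, with `keep` standing in for a user-supplied constraint that prunes a branch of the search before it is grown further.

```python
from collections import defaultdict


def mine_patterns(sequences, min_support, keep=lambda pattern: True):
    """Return {pattern (tuple of items): support} for frequent subsequences.

    sequences   : list of lists of sortable items (for example, strings)
    min_support : minimum number of sequences that must contain a pattern
    keep        : hypothetical steering hook; returning False for a pattern
                  discards it and every extension of it
    """
    results = {}

    def grow(prefix, projected):
        # projected holds one (sequence index, start position) pair for each
        # sequence that contains the current prefix.
        support = defaultdict(set)  # item -> indices of sequences containing it
        first_pos = {}              # (item, sequence index) -> first position
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if (item, sid) not in first_pos:
                    first_pos[(item, sid)] = pos
                    support[item].add(sid)

        for item in sorted(support):
            sids = support[item]
            if len(sids) < min_support:
                continue  # infrequent extension: stop growing this branch
            pattern = prefix + (item,)
            if not keep(pattern):
                continue  # pruned by the constraint, nothing is recomputed
            results[pattern] = len(sids)
            # Project each supporting sequence to the suffix after the item.
            grow(pattern, [(sid, first_pos[(item, sid)] + 1) for sid in sorted(sids)])

    grow((), [(i, 0) for i in range(len(sequences))])
    return results
```

For example, `mine_patterns([["a", "b", "c"], ["a", "c"], ["b", "c"]], 2, keep=lambda p: p[0] == "a")` explores only patterns that start with `"a"`, so the branches for `"b"` and `"c"` are never expanded.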
|
50
|
Ceravolo P, Azzini A, Angelini M, Catarci T, Cudré-Mauroux P, Damiani E, Mazak A, Van Keulen M, Jarrar M, Santucci G, Sattler KU, Scannapieco M, Wimmer M, Wrembel R, Zaraket F. Big Data Semantics. JOURNAL ON DATA SEMANTICS 2018. [DOI: 10.1007/s13740-018-0086-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/31/2022]
|