1. Chen M, Liu Y, Wall E. Unmasking Dunning-Kruger Effect in Visual Reasoning & Judgment. IEEE Transactions on Visualization and Computer Graphics 2025; 31:743-753. [PMID: 39288064] [DOI: 10.1109/tvcg.2024.3456326]
Abstract
The Dunning-Kruger Effect (DKE) is a metacognitive phenomenon where low-skilled individuals tend to overestimate their competence while high-skilled individuals tend to underestimate theirs. This effect has been observed in a number of domains including humor, grammar, and logic. In this paper, we explore whether and how DKE manifests in visual reasoning and judgment tasks. Across two online user studies involving (1) a sliding puzzle game and (2) a scatterplot-based categorization task, we demonstrate that individuals are susceptible to DKE in visual reasoning and judgment tasks: those who performed best underestimated their performance, while bottom performers overestimated theirs. In addition, we contribute novel analyses that correlate susceptibility to DKE with personality traits and user interactions. Our findings pave the way for novel modes of bias detection via interaction patterns and establish promising directions towards interventions tailored to an individual's personality traits. All materials and analyses are in the supplemental materials: https://github.com/CAV-Lab/DKE_supplemental.git.
2. Narechania A, Odak K, El-Assady M, Endert A. ProvenanceWidgets: A Library of UI Control Elements to Track and Dynamically Overlay Analytic Provenance. IEEE Transactions on Visualization and Computer Graphics 2025; 31:1235-1245. [PMID: 39250388] [DOI: 10.1109/tvcg.2024.3456144]
Abstract
We present ProvenanceWidgets, a JavaScript library of UI control elements such as radio buttons, checkboxes, and dropdowns to track and dynamically overlay a user's analytic provenance. These in situ overlays not only save screen space but also minimize the amount of time and effort needed to access the same information from elsewhere in the UI. In this paper, we discuss how we design modular UI control elements to track how often and how recently a user interacts with them and design visual overlays showing an aggregated summary as well as a detailed temporal history. We demonstrate the capability of ProvenanceWidgets by recreating three prior widget libraries: (1) Scented Widgets, (2) Phosphor objects, and (3) Dynamic Query Widgets. We also evaluate its expressiveness and conduct case studies with visualization developers to assess its effectiveness. We find that ProvenanceWidgets enables developers to implement custom provenance-tracking applications effectively. ProvenanceWidgets is available as open-source software at https://github.com/ProvenanceWidgets to help application developers build custom provenance-based systems.
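The core bookkeeping behind such overlays, tracking how often and how recently each control is used, can be sketched generically. ProvenanceWidgets itself is a JavaScript library; the Python class below is a hypothetical illustration of the idea, not its actual API:

```python
import time

class TrackedWidget:
    """Minimal model of a UI control that records its own usage history."""

    def __init__(self, name):
        self.name = name
        self.interactions = []  # timestamp of every interaction

    def interact(self):
        self.interactions.append(time.time())

    def summary(self):
        # Aggregated summary for an in situ overlay: how often and
        # how recently the control was used.
        return {
            "count": len(self.interactions),
            "last_used": self.interactions[-1] if self.interactions else None,
        }

dropdown = TrackedWidget("country-filter")
dropdown.interact()
dropdown.interact()
```

A real library would additionally render this summary as a visual overlay on the control itself; the sketch only captures the underlying provenance record.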
3. Walch A, Szabo A, Steinlechner H, Ortner T, Gröller E, Schmidt J. BEMTrace: Visualization-Driven Approach for Deriving Building Energy Models from BIM. IEEE Transactions on Visualization and Computer Graphics 2025; 31:240-250. [PMID: 39312422] [DOI: 10.1109/tvcg.2024.3456315]
Abstract
Building Information Modeling (BIM) describes a central data pool covering the entire life cycle of a construction project. Similarly, Building Energy Modeling (BEM) describes the process of using a 3D representation of a building as a basis for thermal simulations to assess the building's energy performance. This paper explores the intersection of BIM and BEM, focusing on the challenges and methodologies in converting BIM data into BEM representations for energy performance analysis. BEMTrace integrates 3D data wrangling techniques with visualization methodologies to enhance the accuracy and traceability of the BIM-to-BEM conversion process. Through parsing, error detection, and algorithmic correction of BIM data, our methods generate valid BEM models suitable for energy simulation. Visualization techniques provide transparent insights into the conversion process, aiding error identification, validation, and user comprehension. We introduce context-adaptive selections to facilitate user interaction and to show that the BEMTrace workflow helps users understand complex 3D data wrangling processes.
4. Han C, Isaacs KE. A Deixis-Centered Approach for Documenting Remote Synchronous Communication Around Data Visualizations. IEEE Transactions on Visualization and Computer Graphics 2025; 31:930-940. [PMID: 39255160] [DOI: 10.1109/tvcg.2024.3456351]
Abstract
Referential gestures, or as termed in linguistics, deixis, are an essential part of communication around data visualizations. Despite their importance, such gestures are often overlooked when documenting data analysis meetings. Transcripts, for instance, fail to capture gestures, and video recordings may not adequately capture or emphasize them. We introduce a novel method for documenting collaborative data meetings that treats deixis as a first-class citizen. Our proposed framework captures cursor-based gestural data along with audio and converts them into interactive documents. The framework leverages a large language model to identify word correspondences with gestures. These identified references are used to create context-based annotations in the resulting interactive document. We assess the effectiveness of our proposed method through a user study, finding that participants preferred our automated interactive documentation over recordings, transcripts, and manual note-taking. Furthermore, we derive a preliminary taxonomy of cursor-based deictic gestures from participant actions during the study. This taxonomy offers further opportunities for better utilizing cursor-based deixis in collaborative data analysis scenarios.
5. Burns A, Lee C, On T, Xiong C, Peck E, Mahyar N. From Invisible to Visible: Impacts of Metadata in Communicative Data Visualization. IEEE Transactions on Visualization and Computer Graphics 2024; 30:3427-3443. [PMID: 37015379] [DOI: 10.1109/tvcg.2022.3231716]
Abstract
Leaving the context of visualizations invisible can have negative impacts on understanding and transparency. While common wisdom suggests that recontextualizing visualizations with metadata (e.g., disclosing the data source or instructions for decoding the visualizations' encoding) may counter these effects, the impact remains largely unknown. To fill this gap, we conducted two experiments. In Experiment 1, we explored how chart type, topic, and user goal impacted which categories of metadata participants deemed most relevant. We presented 64 participants with four real-world visualizations. For each visualization, participants were given four goals and selected the type of metadata they most wanted from a set of 18 types. Our results indicated that participants were most interested in metadata which explained the visualization's encoding for goals related to understanding and metadata about the source of the data for assessing trustworthiness. In Experiment 2, we explored how these two types of metadata impact transparency, trustworthiness and persuasiveness, information relevance, and understanding. We asked 144 participants to explain the main message of two pairs of visualizations (one with metadata and one without); rate them on scales of transparency and relevance; and then predict the likelihood that they were selected for a presentation to policymakers. Our results suggested that visualizations with metadata were perceived as more thorough than those without metadata, but similarly relevant, accurate, clear, and complete. Additionally, we found that metadata did not impact the accuracy of the information extracted from visualizations, but may have influenced which information participants remembered as important or interesting.
6. Köhler CA, Ulianych D, Grün S, Decker S, Denker M. Facilitating the Sharing of Electrophysiology Data Analysis Results Through In-Depth Provenance Capture. eNeuro 2024; 11:ENEURO.0476-23.2024. [PMID: 38777610] [PMCID: PMC11181106] [DOI: 10.1523/eneuro.0476-23.2024] Open access.
Abstract
Scientific research demands reproducibility and transparency, particularly in data-intensive fields like electrophysiology. Electrophysiology data are typically analyzed using scripts that generate output files, including figures. Handling these results poses several challenges due to the complexity and iterative nature of the analysis process. These challenges stem from the difficulty of discerning the analysis steps, parameters, and data flow from the results, which makes knowledge transfer and findability difficult in collaborative settings. Provenance information tracks data lineage and the processes applied to it, and provenance capture during the execution of an analysis script can address those challenges. We present Alpaca (Automated Lightweight Provenance Capture), a tool that captures fine-grained provenance information with minimal user intervention when running data analysis pipelines implemented in Python scripts. Alpaca records inputs, outputs, and function parameters and structures the information according to the W3C PROV standard. We demonstrate the tool using a realistic use case involving multichannel local field potential recordings from a neurophysiological experiment, highlighting how the tool makes result details known in a standardized manner in order to address the challenges of the analysis process. Ultimately, using Alpaca will help to represent results according to the FAIR principles, which will improve research reproducibility and facilitate sharing the results of data analyses.
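The capture strategy described, recording inputs, outputs, and function parameters as an analysis script runs, can be illustrated with a minimal sketch. The decorator name and record structure below are hypothetical and far simpler than Alpaca's actual W3C PROV-based implementation:

```python
import functools

PROVENANCE = []  # ordered log of records, one per captured function call

def capture(func):
    """Record a function's name, inputs, parameters, and output each
    time it runs (a simplified stand-in for provenance capture)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        PROVENANCE.append({
            "activity": func.__name__,  # the processing step applied
            "inputs": args,             # data the step consumed
            "parameters": kwargs,       # settings that shaped the step
            "output": result,           # data the step produced
        })
        return result
    return wrapper

@capture
def amplitude_filter(signal, low=1.0, high=100.0):
    # Placeholder for a real analysis step in a pipeline.
    return [s for s in signal if low <= s <= high]

filtered = amplitude_filter([0.5, 5.0, 250.0], low=1.0, high=100.0)
# PROVENANCE now links the output back to its input data and parameters.
```

Serializing such records to the W3C PROV data model is what makes them interoperable across tools; this sketch stops at the in-memory log.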
Affiliation(s)
- Cristiano A Köhler: Institute for Advanced Simulation (IAS-6) and JARA Institute Brain Structure-Function Relationships (INM-10), Jülich Research Centre, 52428 Jülich, Germany; Theoretical Systems Neurobiology, RWTH Aachen University, 52062 Aachen, Germany
- Danylo Ulianych: Institute for Advanced Simulation (IAS-6) and JARA Institute Brain Structure-Function Relationships (INM-10), Jülich Research Centre, 52428 Jülich, Germany
- Sonja Grün: Institute for Advanced Simulation (IAS-6) and JARA Institute Brain Structure-Function Relationships (INM-10), Jülich Research Centre, 52428 Jülich, Germany; Theoretical Systems Neurobiology, RWTH Aachen University, 52062 Aachen, Germany
- Stefan Decker: Chair of Databases and Information Systems, RWTH Aachen University, 52074 Aachen, Germany; Fraunhofer Institute for Applied Information Technology (FIT), 53757 Sankt Augustin, Germany
- Michael Denker: Institute for Advanced Simulation (IAS-6) and JARA Institute Brain Structure-Function Relationships (INM-10), Jülich Research Centre, 52428 Jülich, Germany
7. Lavikka K, Oikkonen J, Li Y, Muranen T, Micoli G, Marchi G, Lahtinen A, Huhtinen K, Lehtonen R, Hietanen S, Hynninen J, Virtanen A, Hautaniemi S. Deciphering cancer genomes with GenomeSpy: a grammar-based visualization toolkit. Gigascience 2024; 13:giae040. [PMID: 39101783] [PMCID: PMC11299109] [DOI: 10.1093/gigascience/giae040] Open access.
Abstract
BACKGROUND: Visualization is an indispensable facet of genomic data analysis. Despite the abundance of specialized visualization tools, there remains a distinct need for tailored solutions. However, their implementation typically requires extensive programming expertise from bioinformaticians and software developers, especially when building interactive applications. Toolkits based on visualization grammars offer a more accessible, declarative way to author new visualizations. Yet, current grammar-based solutions fall short in adequately supporting the interactive analysis of large datasets with extensive sample collections, a pivotal task often encountered in cancer research.

FINDINGS: We present GenomeSpy, a grammar-based toolkit for authoring tailored, interactive visualizations for genomic data analysis. By using combinatorial building blocks and a declarative language, users can implement new visualization designs easily and embed them in web pages or end-user-oriented applications. A distinctive element of GenomeSpy's architecture is its effective use of the graphics processing unit in all rendering, enabling a high frame rate and smoothly animated interactions, such as navigation within a genome. We demonstrate the utility of GenomeSpy by characterizing the genomic landscape of 753 ovarian cancer samples from patients in the DECIDER clinical trial. Our results expand the understanding of the genomic architecture in ovarian cancer, particularly the diversity of chromosomal instability.

CONCLUSIONS: GenomeSpy is a visualization toolkit applicable to a wide range of tasks pertinent to genome analysis. It offers high flexibility and exceptional performance in interactive analysis. The toolkit is open source with an MIT license, implemented in JavaScript, and available at https://genomespy.app/.
Affiliation(s)
- Kari Lavikka: Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Jaana Oikkonen: Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Yilin Li: Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Taru Muranen: Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Giulia Micoli: Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Giovanni Marchi: Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Alexandra Lahtinen: Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Kaisa Huhtinen: Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland; Cancer Research Unit, Institute of Biomedicine and FICAN West Cancer Centre, University of Turku, 20521 Turku, Finland
- Rainer Lehtonen: Applied Tumor Genomics Research Program, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
- Sakari Hietanen: Department of Obstetrics and Gynecology, University of Turku and Turku University Hospital, 20521 Turku, Finland
- Johanna Hynninen: Department of Obstetrics and Gynecology, University of Turku and Turku University Hospital, 20521 Turku, Finland
- Anni Virtanen: Department of Pathology, University of Helsinki and HUS Diagnostic Center, Helsinki University Hospital, 00260 Helsinki, Finland
- Sampsa Hautaniemi: Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland
8. Xiong K, Fu S, Ding G, Luo Z, Yu R, Chen W, Bao H, Wu Y. Visualizing the Scripts of Data Wrangling With Somnus. IEEE Transactions on Visualization and Computer Graphics 2023; 29:2950-2964. [PMID: 35077364] [DOI: 10.1109/tvcg.2022.3144975]
Abstract
Data workers use various scripting languages for data transformation, such as SAS, R, and Python. However, understanding intricate code pieces requires advanced programming skills, which hinders data workers from grasping the idea of data transformation with ease. Program visualization is beneficial for debugging and education and has the potential to illustrate transformations intuitively and interactively. In this article, we explore visualization design for demonstrating the semantics of code pieces in the context of data transformation. First, to depict individual data transformations, we structure a design space by two primary dimensions, i.e., key parameters to encode and possible visual channels to be mapped. Then, we derive a collection of 23 glyphs that visualize the semantics of transformations. Next, we design a pipeline, named Somnus, that provides an overview of the creation and evolution of data tables using a provenance graph. At the same time, it allows detailed investigation of individual transformations. User feedback on Somnus is positive: our study participants achieved better accuracy in less time using Somnus, and preferred it over carefully crafted textual descriptions. Further, we provide two example applications to demonstrate the utility and versatility of Somnus.
9. van den Elzen S, Andrienko G, Andrienko N, Fisher BD, Martins RM, Peltonen J, Telea AC, Verleysen M, Rhyne TM. The Flow of Trust: A Visualization Framework to Externalize, Explore, and Explain Trust in ML Applications. IEEE Computer Graphics and Applications 2023; 43:78-88. [PMID: 37030833] [DOI: 10.1109/mcg.2023.3237286]
Abstract
We present a conceptual framework for the development of visual interactive techniques to formalize and externalize trust in machine learning (ML) workflows. Currently, trust in ML applications is an implicit process that takes place in the user's mind. As such, there is no method of feedback or communication of trust that can be acted upon. Our framework will be instrumental in developing interactive visualization approaches that will help users to efficiently and effectively build and communicate trust in ways that fit each of the ML process stages. We formulate several research questions and directions that include: 1) a typology/taxonomy of trust objects, trust issues, and possible reasons for (mis)trust; 2) formalisms to represent trust in machine-readable form; 3) means by which users can express their state of trust by interacting with a computer system (e.g., text, drawing, marking); 4) ways in which a system can facilitate users' expression and communication of the state of trust; and 5) creation of visual interactive techniques for representation and exploration of trust over all stages of an ML pipeline.
10. Block JE, Esmaeili S, Ragan ED, Goodall JR, Richardson GD. The Influence of Visual Provenance Representations on Strategies in a Collaborative Hand-off Data Analysis Scenario. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1113-1123. [PMID: 36155463] [DOI: 10.1109/tvcg.2022.3209495]
Abstract
Data analysis tasks rarely occur in isolation. Especially in intelligence analysis scenarios, where different experts contribute knowledge to a shared understanding, members must communicate how insights develop to establish common ground among collaborators. The use of provenance to communicate analytic sensemaking carries promise by describing the interactions and summarizing the steps taken to reach insights. Yet, no universal guidelines exist for communicating provenance in different settings. Our work focuses on the presentation of provenance information and the resulting conclusions reached and strategies used by new analysts. In an open-ended, 30-minute, textual exploration scenario, we qualitatively compare how adding different types of provenance information (specifically data coverage and interaction history) affects analysts' confidence in conclusions developed, propensity to repeat work, filtering of data, identification of relevant information, and typical investigation strategies. We see that data coverage (i.e., what was interacted with) provides provenance information without limiting individual investigation freedom. On the other hand, while interaction history (i.e., when something was interacted with) does not significantly encourage more mimicry, it does take more time to comfortably understand, as represented by less confident conclusions and less relevant information-gathering behaviors. Our results contribute empirical data towards understanding how provenance summarizations can influence analysis behaviors.
11. Ha S, Monadjemi S, Garnett R, Ottley A. A Unified Comparison of User Modeling Techniques for Predicting Data Interaction and Detecting Exploration Bias. IEEE Transactions on Visualization and Computer Graphics 2023; 29:483-492. [PMID: 36155457] [DOI: 10.1109/tvcg.2022.3209476]
Abstract
The visual analytics community has proposed several user modeling algorithms to capture and analyze users' interaction behavior in order to assist users in data exploration and insight generation. For example, some can detect exploration biases while others can predict data points that the user will interact with before that interaction occurs. Researchers believe this collection of algorithms can help create more intelligent visual analytics tools. However, the community lacks a rigorous evaluation and comparison of these existing techniques. As a result, there is limited guidance on which method to use and when. Our paper seeks to fill this gap by comparing and ranking eight user modeling algorithms based on their performance on a diverse set of four user study datasets. We analyze exploration bias detection, data interaction prediction, and algorithmic complexity, among other measures. Based on our findings, we highlight open challenges and new directions for analyzing user interactions and visualization provenance.
12. Lin S, Xu Z, Sheng Y, Chen L, Chen J. AT-NeuroEAE: A Joint Extraction Model of Events With Attributes for Research Sharing-Oriented Neuroimaging Provenance Construction. Front Neurosci 2022; 15:739535. [PMID: 35321479] [PMCID: PMC8936590] [DOI: 10.3389/fnins.2021.739535] Open access.
Abstract
Provenance is a research focus of neuroimaging resource sharing. Much work has been done to construct high-quality neuroimaging provenance in a standardized and convenient way. However, besides existing process-based provenance extraction methods, open research sharing in computational neuroscience still needs a way to extract provenance information from rapidly growing published resources. This paper proposes a literature mining-based approach for research sharing-oriented neuroimaging provenance construction. A set of neuroimaging events with attributes is defined to model the whole neuroimaging research process, and a joint extraction model based on deep adversarial learning, called AT-NeuroEAE, is proposed to realize event extraction in a few-shot learning scenario. Finally, a set of experiments was performed on a real data set from the journal PLOS ONE. Experimental results show that the proposed method provides a practical approach to quickly collect research information for neuroimaging provenance construction oriented to open research sharing.
Affiliation(s)
- Shaofu Lin: Faculty of Information Technology, Beijing University of Technology, Beijing, China; Beijing Institute of Smart City, Beijing University of Technology, Beijing, China
- Zhe Xu: Faculty of Information Technology, Beijing University of Technology, Beijing, China
- Ying Sheng: Faculty of Information Technology, Beijing University of Technology, Beijing, China
- Lihong Chen: Faculty of Information Technology, Beijing University of Technology, Beijing, China; Engineering Research Center of Digital Community, Beijing University of Technology, Beijing, China
- Jianhui Chen (corresponding author): Faculty of Information Technology, Beijing University of Technology, Beijing, China; Beijing Key Laboratory of Magnetic Resonance Imaging (MRI) and Brain Informatics, Beijing University of Technology, Beijing, China; Beijing International Collaboration Base on Brain Informatics and Wisdom Services, Beijing University of Technology, Beijing, China
13. Visionary: a framework for analysis and visualization of provenance data. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-021-01645-6]
14. Fujiwara T, Wei X, Zhao J, Ma KL. Interactive Dimensionality Reduction for Comparative Analysis. IEEE Transactions on Visualization and Computer Graphics 2022; 28:758-768. [PMID: 34591765] [DOI: 10.1109/tvcg.2021.3114807]
Abstract
Finding the similarities and differences between groups of datasets is a fundamental analysis task. For high-dimensional data, dimensionality reduction (DR) methods are often used to find the characteristics of each group. However, existing DR methods provide limited capability and flexibility for such comparative analysis as each method is designed only for a narrow analysis target, such as identifying factors that most differentiate groups. This paper presents an interactive DR framework where we integrate our new DR method, called ULCA (unified linear comparative analysis), with an interactive visual interface. ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks. To provide flexibility for comparative analysis, we develop an optimization algorithm that enables analysts to interactively refine ULCA results. Additionally, the interactive visualization interface facilitates interpretation and refinement of the ULCA results. We evaluate ULCA and the optimization algorithm to show their efficiency as well as present multiple case studies using real-world datasets to demonstrate the usefulness of this framework.
15. Narechania A, Coscia A, Wall E, Endert A. Lumos: Increasing Awareness of Analytic Behavior during Visual Data Analysis. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1009-1018. [PMID: 34587059] [DOI: 10.1109/tvcg.2021.3114827]
Abstract
Visual data analysis tools provide people with the agency and flexibility to explore data using a variety of interactive functionalities. However, this flexibility may introduce potential consequences in situations where users unknowingly overemphasize or underemphasize specific subsets of the data or attribute space they are analyzing. For example, users may overemphasize specific attributes and/or their values (e.g., Gender is always encoded on the X axis), underemphasize others (e.g., Religion is never encoded), ignore a subset of the data (e.g., older people are filtered out), etc. In response, we present Lumos, a visual data analysis tool that captures and shows the interaction history with data to increase awareness of such analytic behaviors. Using in-situ (at the place of interaction) and ex-situ (in an external view) visualization techniques, Lumos provides real-time feedback to users for them to reflect on their activities. For example, Lumos highlights datapoints that have been previously examined in the same visualization (in-situ) and also overlays them on the underlying data distribution (i.e., baseline distribution) in a separate visualization (ex-situ). Through a user study with 24 participants, we investigate how Lumos helps users' data exploration and decision-making processes. We found that Lumos increases users' awareness of visual data analysis practices in real-time, promoting reflection upon and acknowledgement of their intentions and potentially influencing subsequent interactions.
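The ex-situ comparison described, overlaying which datapoints a user has examined on the underlying baseline distribution, can be sketched in a few lines. The dataset, attribute values, and function below are hypothetical illustrations of the concept, not Lumos's implementation:

```python
from collections import Counter

# Hypothetical dataset and interaction log.
data = ["young", "young", "middle", "older", "older", "older"]
examined = ["young", "young", "middle"]  # datapoints the user interacted with

def coverage_by_group(data, examined):
    """Compare the examined datapoints against the baseline distribution
    to reveal over- or under-emphasized subsets of the data."""
    baseline = Counter(data)
    seen = Counter(examined)
    return {group: seen[group] / total for group, total in baseline.items()}

coverage = coverage_by_group(data, examined)
# A coverage of 0.0 for "older" flags that older people were
# effectively filtered out of the exploration.
```

In a tool like Lumos, such per-group coverage would drive the visual overlay (e.g., highlighting the unexamined portion of a distribution) rather than being reported numerically.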
16. Segura V, Barbosa SDJ. BONNIE: Building Online Narratives from Noteworthy Interaction Events. ACM T INTERACT INTEL 2021. [DOI: 10.1145/3423048]
Abstract
Nowadays, we have access to data of unprecedented volume, high dimensionality, and complexity. To extract novel insights from such complex and dynamic data, we need effective and efficient strategies. One such strategy is to combine data analysis and visualization techniques, which are the essence of visual analytics applications. After the knowledge discovery process, a major challenge is to filter the essential information that has led to a discovery and to communicate the findings to other people, explaining the decisions they may have made based on the data. We propose to record and use the trace left by the exploratory data analysis, in the form of user interaction history, to aid this process. With the trace, users can choose the desired interaction steps and create a narrative, sharing the acquired knowledge with readers. To achieve our goal, we have developed the BONNIE (Building Online Narratives from Noteworthy Interaction Events) framework. BONNIE comprises a log model to register the interaction events, auxiliary code to help developers instrument their own code, and an environment to view users' own interaction history and build narratives. This article presents our proposal for communicating discoveries in visual analytics applications, the BONNIE framework, and the studies we conducted to evaluate our solution. After two user studies (the first focused on history visualization and the second on narrative creation), our solution has shown promise, with mostly positive feedback and results from a Technology Acceptance Model (TAM) questionnaire.
17. Corvo A, Caballero HSG, Westenberg MA, van Driel MA, van Wijk JJ. Visual Analytics for Hypothesis-Driven Exploration in Computational Pathology. IEEE Transactions on Visualization and Computer Graphics 2021; 27:3851-3866. [PMID: 32340951] [DOI: 10.1109/tvcg.2020.2990336]
Abstract
Recent advances in computational and algorithmic power are evolving the field of medical imaging rapidly. In cancer research, many new directions are sought to characterize patients with additional imaging features derived from radiology and pathology images. The emerging field of Computational Pathology targets the high-throughput extraction and analysis of the spatial distribution of cells from digital histopathology images. The associated morphological and architectural features allow researchers to quantify and characterize new imaging biomarkers for cancer diagnosis, prognosis, and treatment decisions. However, as the image feature space grows, exploration and analysis become more difficult and less effective. There is a need for dedicated interfaces for interactive data manipulation and visual analysis of computational pathology and clinical data. For this purpose, we present IIComPath, a visual analytics approach that enables clinical researchers to formulate hypotheses and create computational pathology pipelines involving cohort construction, spatial analysis of image-derived features, and cohort analysis. We demonstrate our approach through use cases that investigate the prognostic value of current diagnostic features and new computational pathology biomarkers.
18
He C, Micallef L, He L, Peddinti G, Aittokallio T, Jacucci G. Characterizing the Quality of Insight by Interactions: A Case Study. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:3410-3424. [PMID: 32142444 DOI: 10.1109/tvcg.2020.2977634] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Understanding the quality of insight has become increasingly important with the trend of allowing users to post comments during visual exploration, yet approaches for qualifying insight are rare. This article presents a case study to investigate the possibility of characterizing the quality of insight via the interactions performed. To do this, we devised the interactions of a visualization tool, MediSyn, for insight generation. MediSyn supports five types of interactions: selecting, connecting, elaborating, exploring, and sharing. We evaluated MediSyn with 14 participants by allowing them to freely explore the data and generate insights. We then extracted seven interaction patterns from their interaction logs and correlated the patterns to four aspects of insight quality. The results show the possibility of qualifying insights via interactions. Among other findings, exploration actions can lead to unexpected insights, and the drill-down pattern tends to increase the domain value of insights. A qualitative analysis shows that using domain knowledge to guide exploration can positively affect the domain value of derived insights. We discuss the study's implications, lessons learned, and future research opportunities.
19
Kasica S, Berret C, Munzner T. Table Scraps: An Actionable Framework for Multi-Table Data Wrangling From An Artifact Study of Computational Journalism. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:957-966. [PMID: 33074823 DOI: 10.1109/tvcg.2020.3030462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
For the many journalists who use data and computation to report the news, data wrangling is an integral part of their work. Despite an abundance of literature on data wrangling in the context of enterprise data analysis, little is known about the specific operations, processes, and pain points journalists encounter while performing this tedious, time-consuming task. To better understand the needs of this user group, we conduct a technical observation study of 50 public repositories of data and analysis code authored by 33 professional journalists at 26 news organizations. We develop two detailed and cross-cutting taxonomies of data wrangling in computational journalism, for actions and for processes. We observe the extensive use of multiple tables, a notable gap in previous wrangling analyses. We develop a concise, actionable framework for general multi-table data wrangling that includes wrangling operations documented in our taxonomy that are without clear parallels in other work. This framework, the first to incorporate tables as first-class objects, will support future interactive wrangling tools for both computational journalism and general-purpose use. We assess the generative and descriptive power of our framework through discussion of its relationship to our set of taxonomies.
20
Chatzimparmpas A, Martins RM, Kucher K, Kerren A. StackGenVis: Alignment of Data, Algorithms, and Models for Stacking Ensemble Learning Using Performance Metrics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1547-1557. [PMID: 33048687 DOI: 10.1109/tvcg.2020.3030352] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
In machine learning (ML), ensemble methods such as bagging, boosting, and stacking are widely established approaches that regularly achieve top-notch predictive performance. Stacking (also called "stacked generalization") is an ensemble method that combines heterogeneous base models, arranged in at least one layer, and then employs another metamodel to summarize the predictions of those models. Although it can be a highly effective approach for increasing the predictive performance of ML, generating a stack of models from scratch can be a cumbersome trial-and-error process. This challenge stems from the enormous space of available solutions, with different sets of data instances and features that could be used for training, several algorithms to choose from, and instantiations of these algorithms using diverse parameters (i.e., models) that perform differently according to various metrics. In this work, we present a knowledge generation model, which supports ensemble learning with the use of visualization, and a visual analytics system for stacked generalization. Our system, StackGenVis, assists users in dynamically adapting performance metrics, managing data instances, selecting the most important features for a given data set, choosing a set of top-performing and diverse algorithms, and measuring predictive performance. In consequence, our proposed tool helps users to decide between distinct models and to reduce the complexity of the resulting stack by removing overpromising and underperforming models. The applicability and effectiveness of StackGenVis are demonstrated with two use cases: a real-world healthcare data set and a collection of data related to sentiment/stance detection in texts. Finally, the tool has been evaluated through interviews with three ML experts.
21
Liu J, Dwyer T, Tack G, Gratzl S, Marriott K. Supporting the Problem-Solving Loop: Designing Highly Interactive Optimisation Systems. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1764-1774. [PMID: 33112748 DOI: 10.1109/tvcg.2020.3030364] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/11/2023]
Abstract
Efficient optimisation algorithms have become important tools for finding high-quality solutions to hard, real-world problems such as production scheduling, timetabling, or vehicle routing. These algorithms are typically "black boxes" that work on mathematical models of the problem to solve. However, many problems are difficult to fully specify, and require a "human in the loop" who collaborates with the algorithm by refining the model and guiding the search to produce acceptable solutions. Recently, the Problem-Solving Loop was introduced as a high-level model of such interactive optimisation. Here, we present and evaluate nine recommendations for the design of interactive visualisation tools supporting the Problem-Solving Loop. They range from the choice of visual representation for solutions and constraints to the use of a solution gallery to support exploration of alternate solutions. We first examined the applicability of the recommendations by investigating how well they had been supported in previous interactive optimisation tools. We then evaluated the recommendations in the context of the vehicle routing problem with time windows (VRPTW). To do so we built a sophisticated interactive visual system for solving VRPTW that was informed by the recommendations. Ten participants then used this system to solve a variety of routing problems. We report on participant comments and interaction patterns with the tool. These showed the tool was regarded as highly usable and the results generally supported the usefulness of the underlying recommendations.
22
Zakopcanova K, Rehacek M, Batrna J, Plakinger D, Stoppel S, Kozlikova B. Visilant: Visual Support for the Exploration and Analytical Process Tracking in Criminal Investigations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:881-890. [PMID: 33048690 DOI: 10.1109/tvcg.2020.3030356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The daily routine of criminal investigators consists of a thorough analysis of highly complex and heterogeneous data of crime cases. Such data can consist of case descriptions, testimonies, criminal networks, spatial and temporal information, and virtually any other data that is relevant for the case. Criminal investigators work under heavy time pressure to analyze the data for relationships, propose and verify several hypotheses, and derive conclusions, while the data can be incomplete or inconsistent and is changed and updated throughout the investigation as new findings are added to the case. Based on a four-year intense collaboration with criminalists, we present a conceptual design for a visual tool supporting the investigation workflow, and Visilant, a web-based tool for the exploration and analysis of criminal data guided by the proposed design. Visilant aims to support in particular the exploratory part of the investigation pipeline, from case overview, through exploration and hypothesis generation, to case presentation. Visilant tracks the reasoning process, and as the data changes, it informs investigators which hypotheses are affected by the change and should be revised. The tool was evaluated by senior criminology experts in two sessions, and their feedback is summarized in the paper. Additional supplementary material contains the technical details and an exemplary case study.
23
Monadjemi S, Garnett R, Ottley A. Competing Models: Inferring Exploration Patterns and Information Relevance via Bayesian Model Selection. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:412-421. [PMID: 33052859 DOI: 10.1109/tvcg.2020.3030430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Analyzing interaction data provides an opportunity to learn about users, uncover their underlying goals, and create intelligent visualization systems. The first step toward intelligent response in visualizations is to enable computers to infer user goals and strategies by observing their interactions with a system. Researchers have proposed multiple techniques to model users; however, these frameworks often depend on the visualization design, interaction space, and dataset. Due to these dependencies, many techniques do not provide a general algorithmic solution to user exploration modeling. In this paper, we construct a series of models based on the dataset and pose user exploration modeling as a Bayesian model selection problem, where we maintain a belief over numerous competing models that could explain user interactions. Each of these competing models represents an exploration strategy the user could adopt during a session. The goal of our technique is to make high-level and in-depth inferences about the user by observing their low-level interactions. Although our proposed idea is applicable to various probabilistic model spaces, we demonstrate a specific instance of encoding exploration patterns as competing models to infer information relevance. We validate our technique's ability to infer exploration bias, predict future interactions, and summarize an analytic session using user study datasets. Our results indicate that, depending on the application, our method outperforms established baselines for bias detection and future interaction prediction. Finally, we discuss future research directions based on our proposed modeling paradigm and suggest how practitioners can use this method to build intelligent visualization systems that understand users' goals and adapt to improve the exploration process.
24
Abstract
This article attempts to bridge the gap between widely discussed ethical principles of Human-centered AI (HCAI) and practical steps for effective governance. Since HCAI systems are developed and implemented in multiple organizational structures, I propose 15 recommendations at three levels of governance: team, organization, and industry. The recommendations are intended to increase the reliability, safety, and trustworthiness of HCAI systems: (1) reliable systems based on sound software engineering practices, (2) safety culture through business management strategies, and (3) trustworthy certification by independent oversight. Software engineering practices within teams include audit trails to enable analysis of failures, software engineering workflows, verification and validation testing, bias testing to enhance fairness, and explainable user interfaces. The safety culture within organizations comes from management strategies that include leadership commitment to safety, hiring and training oriented to safety, extensive reporting of failures and near misses, internal review boards for problems and future plans, and alignment with industry standard practices. The trustworthiness certification comes from industry-wide efforts that include government interventions and regulation, accounting firms conducting external audits, insurance companies compensating for failures, non-governmental and civil society organizations advancing design principles, and professional organizations and research institutes developing standards, policies, and novel ideas. The larger goal of effective governance is to limit the dangers and increase the benefits of HCAI to individuals, organizations, and society.
25
Hota A, Huang J. Embedding Meta Information into Visualizations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:3189-3203. [PMID: 31095486 DOI: 10.1109/tvcg.2019.2916098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
In this work, we study how to co-locate meta information with visualizations by directly embedding the information into the visualizations themselves. This allows visualizations to carry their own provenance and authorship information for reproducibility. We call these self-describing visualizations: reproducible, authenticatable, and documentable. Self-describing visualizations can be used to extend existing visualization provenance systems. Herein, we start with a survey of the existing digital image watermarking literature. We search for and classify watermarking algorithms that can support scientific visualizations. Using our payload-resilience testing framework, we evaluate and recommend algorithms supporting various use cases in the payload-resiliency space, and present guidelines for optimizing visualizations to improve payload capacities and embedding robustness. We demonstrate the efficacy of self-describing visualizations with two sample application implementations: (1) adding an embedding filter as a part of the standard rendering pipeline, and (2) creating a web reader to automatically and reliably extract provenance information from scientific publications for review and dissemination.
26
Chen J, Zhang G, Chiou W, Laidlaw DH, Auchus AP. Measuring the Effects of Scalar and Spherical Colormaps on Ensembles of DMRI Tubes. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:2818-2833. [PMID: 30763242 DOI: 10.1109/tvcg.2019.2898438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
We report results of an empirical study on the color encoding of ensemble scalar values and orientations for visualizing diffusion magnetic resonance imaging (DMRI) tubes. The experiment tested six scalar colormaps for average fractional anisotropy (FA) tasks (grayscale, blackbody, diverging, isoluminant-rainbow, extended-blackbody, and coolwarm) and four three-dimensional (3D) spherical colormaps for tract-tracing tasks (uniform gray, absolute, eigenmaps, and Boy's surface embedding). We found that extended-blackbody, coolwarm, and blackbody remain the best three approaches for identifying ensemble averages in 3D. The isoluminant-rainbow colormap led to the same ensemble-mean accuracy as the other colormaps; however, more than 50 percent of the answers consistently overestimated the ensemble average, independent of the mean values. The number of hues, not luminance, influences ensemble estimates of mean values. For ensemble orientation-tracing tasks, we found that both Boy's surface embedding (greatest spatial resolution and contrast) and the absolute colormap (lowest spatial resolution and contrast) led to more accurate answers than the eigenmaps scheme (medium resolution and contrast), an uncanny-valley-like phenomenon of visualization design in terms of accuracy. The absolute colormap, broadly used in brain science, is a good default spherical colormap. We conclude from our study that human visual processing of a chunk of colors differs from that of single colors.
27
Zhu X, Nacenta MA, Akgun O, Nightingale P. How People Visually Represent Discrete Constraint Problems. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:2603-2619. [PMID: 30676965 DOI: 10.1109/tvcg.2019.2895085] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Problems such as timetabling or personnel allocation can be modeled and solved using discrete constraint programming languages. However, while existing constraint solving software solves such problems quickly in many cases, these systems involve specialized languages that require significant time and effort to learn and apply. These languages are typically text-based and often difficult to interpret and understand quickly, especially for people without engineering or mathematics backgrounds. Visualization could provide an alternative way to model and understand such problems. Although many visual programming languages exist for procedural languages, visual encoding of problem specifications has not received much attention. Future problem visualization languages could represent problem elements and their constraints unambiguously, but without unnecessary cognitive burdens for those needing to translate their problem's mental representation into diagrams. As a first step towards such languages, we conducted a study that catalogs how people represent constraint problems graphically. We studied three groups with different expertise: non-computer scientists, computer scientists, and constraint programmers, and analyzed their marks on paper (e.g., arrows), gestures (e.g., pointing), and the mappings to problem concepts (e.g., containers, sets). We provide foundations to guide future tool designs, allowing people to effectively grasp, model, and solve problems through visual representations.
28
Henry WC, Peterson GL. SensorRE: Provenance support for software reverse engineers. Comput Secur 2020. [DOI: 10.1016/j.cose.2020.101865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
29
Walch A, Schwarzler M, Luksch C, Eisemann E, Gschwandtner T. LightGuider: Guiding Interactive Lighting Design using Suggestions, Provenance, and Quality Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:569-578. [PMID: 31443004 DOI: 10.1109/tvcg.2019.2934658] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/10/2023]
Abstract
LightGuider is a novel guidance-based approach to interactive lighting design, which typically consists of interleaved 3D modeling operations and light transport simulations. Rather than having designers use a trial-and-error approach to match their illumination constraints and aesthetic goals, LightGuider supports the process by simulating potential next modeling steps that can deliver the most significant improvements. LightGuider takes predefined quality criteria and the current focus of the designer into account to visualize suggestions for lighting-design improvements via a specialized provenance tree. This provenance tree integrates snapshot visualizations of how well a design meets the given quality criteria weighted by the designer's preferences. This integration facilitates the analysis of quality improvements over the course of a modeling workflow as well as the comparison of alternative design solutions. We evaluate our approach with three lighting designers to illustrate its usefulness.
30
Krueger R, Beyer J, Jang WD, Kim NW, Sokolov A, Sorger PK, Pfister H. Facetto: Combining Unsupervised and Supervised Learning for Hierarchical Phenotype Analysis in Multi-Channel Image Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:227-237. [PMID: 31514138 PMCID: PMC7045445 DOI: 10.1109/tvcg.2019.2934547] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 05/20/2023]
Abstract
Facetto is a scalable visual analytics application that is used to discover single-cell phenotypes in high-dimensional multi-channel microscopy images of human tumors and tissues. Such images represent the cutting edge of digital histology and promise to revolutionize how diseases such as cancer are studied, diagnosed, and treated. Highly multiplexed tissue images are complex, comprising 10⁹ or more pixels, 60-plus channels, and millions of individual cells. This makes manual analysis challenging and error-prone. Existing automated approaches are also inadequate, in large part, because they are unable to effectively exploit the deep knowledge of human tissue biology available to anatomic pathologists. To overcome these challenges, Facetto enables a semi-automated analysis of cell types and states. It integrates unsupervised and supervised learning into the image and feature exploration process and offers tools for analytical provenance. Experts can cluster the data to discover new types of cancer and immune cells and use clustering results to train a convolutional neural network that classifies new cells accordingly. Likewise, the output of classifiers can be clustered to discover aggregate patterns and phenotype subsets. We also introduce a new hierarchical approach to keep track of analysis steps and data subsets created by users; this assists in the identification of cell types. Users can build phenotype trees and interact with the resulting hierarchical structures of both high-dimensional feature and image spaces. We report on use cases in which domain scientists explore various large-scale fluorescence imaging datasets. We demonstrate how Facetto assists users in steering the clustering and classification process, inspecting analysis results, and gaining new scientific insights into cancer biology.
31
Madanagopal K, Ragan ED, Benjamin P. Analytic Provenance in Practice: The Role of Provenance in Real-World Visualization and Data Analysis Environments. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2019; 39:30-45. [PMID: 31395538 DOI: 10.1109/mcg.2019.2933419] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
Practical data analysis scenarios involve more than just the interpretation of data through visual and algorithmic analysis. Many real-world analysis environments involve multiple types of experts and analysts working together to solve problems and make decisions, adding organizational and social requirements to the mix. We aim to provide new knowledge about the role of provenance for practical problems in a variety of analysis scenarios central to national security. We present findings from interviews with data analysts from domains such as intelligence analysis, cyber-security, and geospatial intelligence. In addition to covering multiple analysis domains, our study also considers practical workplace implications related to organizational roles and the level of analyst experience. The results demonstrate how needs for provenance differ across roles in the analysis effort (e.g., data analysts, task managers, data analyst trainers, and quality control analysts). By considering the core challenges reported, along with an analysis of provenance-support techniques in existing research and systems, we contribute new insights about needs and opportunities for improving provenance-support methods.
32
Bors C, Wenskovitch J, Dowling M, Attfield S, Battle L, Endert A, Kulyk O, Laramee RS. A Provenance Task Abstraction Framework. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2019; 39:46-60. [PMID: 31603814 DOI: 10.1109/mcg.2019.2945720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Visual analytics tools integrate provenance recording to externalize analytic processes or user insights. Provenance can be captured on varying levels of detail, and in turn activities can be characterized from different granularities. However, current approaches do not support inferring activities that can only be characterized across multiple levels of provenance. We propose a task abstraction framework that consists of a three stage approach, composed of 1) initializing a provenance task hierarchy, 2) parsing the provenance hierarchy by using an abstraction mapping mechanism, and 3) leveraging the task hierarchy in an analytical tool. Furthermore, we identify implications to accommodate iterative refinement, context, variability, and uncertainty during all stages of the framework. We describe a use case which exemplifies our abstraction framework, demonstrating how context can influence the provenance hierarchy to support analysis. The article concludes with an agenda, raising and discussing challenges that need to be considered for successfully implementing such a framework.
33
Bors C, Gschwandtner T, Miksch S. Capturing and Visualizing Provenance From Data Wrangling. IEEE COMPUTER GRAPHICS AND APPLICATIONS 2019; 39:61-75. [PMID: 31581076 DOI: 10.1109/mcg.2019.2941856] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Data quality management and assessment play a vital role in ensuring trust in data and its fitness for use in subsequent analysis. The transformation history recorded by a data wrangling system is often insufficient for determining the usability of a dataset, as it lacks information about how changes affected the data. Capturing workflow provenance along the wrangling process and combining it with descriptive data provenance can enable users to comprehend how these changes affected the dataset, and whether they benefited data quality. We present DQProv Explorer, a system that captures and visualizes provenance from data wrangling operations. It features three visualization components, allowing the user to explore (1) the provenance graph of operations and the data stream, (2) the development of quality over time for a sequence of wrangling operations applied to the dataset, and (3) the distribution of issues across the entirety of the dataset to determine error patterns.
34
Camisetty A, Chandurkar C, Sun M, Koop D. Enhancing Web-based Analytics Applications through Provenance. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:131-141. [PMID: 30346289 DOI: 10.1109/tvcg.2018.2865039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/08/2023]
Abstract
Visual analytics systems continue to integrate new technologies and leverage modern environments for exploration and collaboration, making tools and techniques available to a wide audience through web browsers. Many of these systems have been developed with rich interactions, offering users the opportunity to examine details and explore hypotheses that have not been directly encoded by a designer. Understanding is enhanced when users can replay and revisit the steps in the sensemaking process, and in collaborative settings, it is especially important to be able to review not only the current state but also what decisions were made along the way. Unfortunately, many web-based systems lack the ability to capture such reasoning, and the path to a result is transient, forgotten when a user moves to a new view. This paper explores the requirements to augment existing client-side web applications with support for capturing, reviewing, sharing, and reusing steps in the reasoning process. Furthermore, it considers situations where decisions are made with streaming data, and the insights gained from revisiting those choices when more data is available. It presents a proof of concept, the Shareable Interactive Manipulation Provenance framework (SIMProv.js), that addresses these requirements in a modern, client-side JavaScript library, and describes how it can be integrated with existing frameworks.
35
McCurdy N, Gerdes J, Meyer M. A Framework for Externalizing Implicit Error Using Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:925-935. [PMID: 30176597 DOI: 10.1109/tvcg.2018.2864913] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/08/2023]
Abstract
This paper presents a framework for externalizing and analyzing expert knowledge about discrepancies in data through the use of visualization. Grounded in an 18-month design study with global health experts, the framework formalizes the notion of data discrepancies as implicit error, both in global health data and more broadly. We use the term implicit error to describe measurement error that is inherent to and pervasive throughout a dataset, but that isn't explicitly accounted for or defined. Instead, implicit error exists in the minds of experts, is mainly qualitative, and is accounted for subjectively during expert interpretation of the data. Externalizing knowledge surrounding implicit error can assist in synchronizing, validating, and enhancing interpretation, and can inform error analysis and mitigation. The framework consists of a description of implicit error components that are important for downstream analysis, along with a process model for externalizing and analyzing implicit error using visualization. As a second contribution, we provide a rich, reflective, and verifiable description of our research process as an exemplar summary toward the ongoing inquiry into ways of increasing the validity and transferability of design study research.
|
36
|
Stitz H, Gratzl S, Piringer H, Zichner T, Streit M. KnowledgePearls: Provenance-Based Visualization Retrieval. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:120-130. [PMID: 30136970 DOI: 10.1109/tvcg.2018.2865024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Storing analytical provenance generates a knowledge base with a large potential for recalling previous results and guiding users in future analyses. However, without extensive manual creation of meta information and annotations by the users, search and retrieval of analysis states can become tedious. We present KnowledgePearls, a solution for efficient retrieval of analysis states that are structured as provenance graphs containing automatically recorded user interactions and visualizations. As a core component, we describe a visual interface for querying and exploring analysis states based on their similarity to a partial definition of a requested analysis state. Depending on the use case, this definition may be provided explicitly by the user by formulating a search query or inferred from given reference states. We explain our approach using the example of efficient retrieval of demographic analyses by Hans Rosling and discuss our implementation for a fast look-up of previous states. Our approach is independent of the underlying visualization framework. We discuss the applicability for visualizations which are based on the declarative grammar Vega and we use a Vega-based implementation of Gapminder as guiding example. We additionally present a biomedical case study to illustrate how KnowledgePearls facilitates the exploration process by recalling states from earlier analyses.
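The core retrieval idea, ranking recorded analysis states by similarity to a partial state definition, can be illustrated with a toy measure. The Jaccard-style matching and the Gapminder-flavored states below are assumptions for illustration, not the paper's actual similarity function or data model.

```python
def state_similarity(query, state):
    """Jaccard similarity between a partial query state and a
    recorded analysis state, both given as attribute dictionaries."""
    q, s = set(query.items()), set(state.items())
    if not q:
        return 0.0
    return len(q & s) / len(q | s)

def retrieve(query, history, top_k=3):
    """Return the recorded provenance states most similar to the query."""
    ranked = sorted(history, key=lambda st: state_similarity(query, st),
                    reverse=True)
    return ranked[:top_k]

# Invented provenance states from a Gapminder-style exploration
history = [
    {"chart": "scatter", "x": "income", "y": "life_expectancy", "year": 2007},
    {"chart": "bar", "x": "continent", "y": "population"},
    {"chart": "scatter", "x": "income", "y": "fertility", "year": 1997},
]
best = retrieve({"chart": "scatter", "x": "income"}, history, top_k=1)[0]
```

As in KnowledgePearls, the query can be typed explicitly or inferred from a reference state; either way it arrives here as a partial attribute dictionary.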
|
37
|
Gramazio CC, Huang J, Laidlaw DH. An Analysis of Automated Visual Analysis Classification: Interactive Visualization Task Inference of Cancer Genomics Domain Experts. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:2270-2283. [PMID: 28783637 DOI: 10.1109/tvcg.2017.2734659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
We show how mouse interaction log classification can help visualization toolsmiths understand how their tools are used "in the wild" through an evaluation of MAGI, a cancer genomics visualization tool. Our primary contribution is an evaluation of twelve visual analysis task classifiers, which compares predictions to task inferences made by pairs of genomics and visualization experts. Our evaluation uses common classifiers that are accessible to most visualization evaluators: k-nearest neighbors, linear support vector machines, and random forests. By comparing classifier predictions to visual analysis task inferences made by experts, we show that simple automated task classification can have up to 73 percent accuracy and can separate meaningful logs from "junk" logs with up to 91 percent accuracy. Our second contribution is an exploration of common MAGI interaction trends using classification predictions, which expands current knowledge about ecological cancer genomics visualization tasks. Our third contribution is a discussion of how automated task classification can inform iterative tool design. These contributions suggest that mouse interaction log analysis is a viable method for (1) evaluating task requirements of client-side-focused tools, (2) allowing researchers to study experts on larger scales than is typically possible with in-lab observation, and (3) highlighting potential tool evaluation bias.
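The simplest of the classifiers evaluated, k-nearest neighbors, reduces to a few lines once each interaction log is summarized as a feature vector. The features and task labels below are invented for illustration; they are not the study's actual MAGI feature set.

```python
from collections import Counter

def knn_classify(query, train, k=3):
    """Classify an interaction-log feature vector against labeled
    logs: Euclidean distance, majority vote among the k nearest."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Each log summarized as (clicks, hovers, pans); labels stand in for
# the expert-inferred tasks (all values invented).
train = [
    ((12, 40, 2), "browse"),
    ((10, 35, 1), "browse"),
    ((2, 5, 30), "navigate"),
    ((1, 4, 28), "navigate"),
    ((0, 1, 0), "junk"),
]
task = knn_classify((11, 38, 1), train, k=3)  # classify a new log
```

The same train/predict interface covers the study's other two classifier families (linear SVMs, random forests) when a library implementation is substituted.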
|
38
|
Abstract
INTRODUCTION This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Future medicine will be predictive, preventive, personalized, participatory and digital. Data and knowledge at comprehensive depth and breadth need to be available for research and at the point of care as a basis for targeted diagnosis and therapy. Data integration and data sharing will be essential to achieve these goals. For this purpose, the consortium Data Integration for Future Medicine (DIFUTURE) will establish Data Integration Centers (DICs) at university medical centers. OBJECTIVES The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians by innovative views on integrated data as well as by decision support components for personalized treatments. The aim of our use cases is to show that this accelerates innovation, improves health care processes and results in tangible benefits for our patients. To realize our vision, numerous challenges have to be addressed. The objective of this article is to describe our concepts and solutions on the technical and the organizational level with a specific focus on data integration and sharing. GOVERNANCE AND POLICIES Data sharing implies significant security and privacy challenges. Therefore, state-of-the-art data protection, modern IT security concepts and patient trust play a central role in our approach. We have established governance structures and policies safeguarding data use and sharing by technical and organizational measures providing the highest levels of data protection. One of our central policies is that adequate methods of data sharing for each use case and project will be selected based on rigorous risk and threat analyses. Interdisciplinary groups have been established in order to manage change.
ARCHITECTURAL FRAMEWORK AND METHODOLOGY The DIFUTURE Data Integration Centers will implement a three-step approach to integrating, harmonizing and sharing structured, unstructured and omics data as well as images from clinical and research environments. First, data is imported and technically harmonized using common data and interface standards (including various IHE profiles, DICOM and HL7 FHIR). Second, data is preprocessed, transformed, harmonized and enriched within a staging and working environment. Third, data is imported into common analytics platforms and data models (including i2b2 and tranSMART) and made accessible in a form compliant with the interoperability requirements defined on the national level. Secure data access and sharing will be implemented with innovative combinations of privacy-enhancing technologies (safe data, safe settings, safe outputs) and methods of distributed computing. USE CASES From the perspective of health care and medical research, our approach is disease-oriented and use-case driven, i.e. following the needs of physicians and researchers and aiming at measurable benefits for our patients. We will work on early diagnosis, tailored therapies and therapy decision tools with focuses on neurology, oncology and further disease entities. Our early use cases will serve as blueprints for the following ones, verifying that the infrastructure developed by DIFUTURE is able to support a variety of application scenarios. DISCUSSION Our own previous work, the use of internationally successful open source systems and a state-of-the-art software architecture are cornerstones of our approach. In the conceptual phase of the initiative, we have already prototypically implemented and tested the most important components of our architecture.
Affiliation(s)
- Fabian Prasser: Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany. Correspondence to: Dr. Fabian Prasser, Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Ismaninger Straße 22, 81675 Munich, Germany
- Oliver Kohlbacher: Department of Computer Science, Center for Bioinformatics and Quantitative Biology Center, Eberhard-Karls-Universität Tübingen, Tübingen, Germany; Max Planck Institute for Developmental Biology, Tübingen, Germany
- Ulrich Mansmann: Institute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University Munich, Munich, Germany
- Bernhard Bauer: Department of Computer Science, University of Augsburg, Augsburg, Germany
- Klaus A. Kuhn: Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
|
39
|
Targeted and contextual redescription set exploration. Mach Learn 2018. [DOI: 10.1007/s10994-018-5738-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
40
|
Edge D, Riche NH, Larson J, White C. Beyond Tasks: An Activity Typology for Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:267-277. [PMID: 28866577 DOI: 10.1109/tvcg.2017.2745180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
As Visual Analytics (VA) research grows and diversifies to encompass new systems, techniques, and use contexts, gaining a holistic view of analytic practices is becoming ever more challenging. However, such a view is essential for researchers and practitioners seeking to develop systems for broad audiences that span multiple domains. In this paper, we interpret VA research through the lens of Activity Theory (AT)-a framework for modelling human activities that has been influential in the field of Human-Computer Interaction. We first provide an overview of Activity Theory, showing its potential for thinking beyond tasks, representations, and interactions to the broader systems of activity in which interactive tools are embedded and used. Next, we describe how Activity Theory can be used as an organizing framework in the construction of activity typologies, building and expanding upon the tradition of abstract task taxonomies in the field of Information Visualization. We then apply the resulting process to create an activity typology for Visual Analytics, synthesizing a wide range of systems and activity concepts from the literature. Finally, we use this typology as the foundation of an activity-centered design process, highlighting both tensions and opportunities in the design space of VA systems.
|
41
|
Zhao J, Glueck M, Isenberg P, Chevalier F, Khan A. Supporting Handoff in Asynchronous Collaborative Sensemaking Using Knowledge-Transfer Graphs. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:340-350. [PMID: 28866583 DOI: 10.1109/tvcg.2017.2745279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
During asynchronous collaborative analysis, handoff of partial findings is challenging because externalizations produced by analysts may not adequately communicate their investigative process. To address this challenge, we developed techniques to automatically capture and help encode tacit aspects of the investigative process based on an analyst's interactions, and streamline explicit authoring of handoff annotations. We designed our techniques to mediate awareness of analysis coverage, support explicit communication of progress and uncertainty with annotation, and implicit communication through playback of investigation histories. To evaluate our techniques, we developed an interactive visual analysis system, KTGraph, that supports an asynchronous investigative document analysis task. We conducted a two-phase user study to characterize a set of handoff strategies and to compare investigative performance with and without our techniques. The results suggest that our techniques promote the use of more effective handoff strategies, help increase an awareness of prior investigative process and insights, as well as improve final investigative outcomes.
|
42
|
Wall E, Das S, Chawla R, Kalidindi B, Brown ET, Endert A. Podium: Ranking Data Using Mixed-Initiative Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:288-297. [PMID: 28866565 DOI: 10.1109/tvcg.2017.2745078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
People often rank and order data points as a vital part of making decisions. Multi-attribute ranking systems are a common tool used to make these data-driven decisions. Such systems often take the form of a table-based visualization in which users assign weights to the attributes representing the quantifiable importance of each attribute to a decision, which the system then uses to compute a ranking of the data. However, these systems assume that users are able to quantify their conceptual understanding of how important particular attributes are to a decision. This is not always easy or even possible for users to do. Rather, people often have a more holistic understanding of the data. They form opinions that data point A is better than data point B but do not necessarily know which attributes are important. To address these challenges, we present a visual analytic application to help people rank multi-variate data points. We developed a prototype system, Podium, that allows users to drag rows in the table to rank order data points based on their perception of the relative value of the data. Podium then infers a weighting model using Ranking SVM that satisfies the user's data preferences as closely as possible. Whereas past systems help users understand the relationships between data points based on changes to attribute weights, our approach helps users to understand the attributes that might inform their understanding of the data. We present two usage scenarios to describe some of the potential uses of our proposed technique: (1) understanding which attributes contribute to a user's subjective preferences for data, and (2) deconstructing attributes of importance for existing rankings. Our proposed approach makes powerful machine learning techniques more usable to those who may not have expertise in these areas.
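The weight-inference step described above can be sketched with a pairwise perceptron, a simple stand-in for the Ranking SVM that Podium actually uses; the car data and attribute names below are invented for illustration.

```python
def infer_weights(points, user_order, epochs=100, lr=0.1):
    """Given rows (attribute tuples) and the user's drag-and-drop
    ranking (best first), learn attribute weights with a pairwise
    perceptron: whenever a preferred row fails to outscore a less
    preferred one, nudge the weights toward their difference."""
    w = [0.0] * len(points[0])
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    for _ in range(epochs):
        for i, a in enumerate(user_order):
            for b in user_order[i + 1:]:
                if score(points[a]) <= score(points[b]):
                    for j, (pa, pb) in enumerate(zip(points[a], points[b])):
                        w[j] += lr * (pa - pb)
    return w

# Three cars described by (mpg, horsepower); the user drags the
# economical car to the top and the muscle car to the bottom.
points = [(20, 300), (35, 120), (30, 150)]
user_order = [1, 2, 0]  # row indices, most preferred first
w = infer_weights(points, user_order)
scores = [sum(wi * xi for wi, xi in zip(w, p)) for p in points]
ranked = sorted(range(len(points)), key=lambda i: -scores[i])
```

The learned weights then serve the explanatory purpose the abstract describes: here the negative weight on horsepower reveals which attribute the user's holistic preference implicitly penalized.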
|
43
|
Turkay C, Slingsby A, Lahtinen K, Butt S, Dykes J. Supporting theoretically-grounded model building in the social sciences through interactive visualisation. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.11.087] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
44
|
Zhang G, Kochunov P, Hong E, Kelly S, Whelan C, Jahanshad N, Thompson P, Chen J. ENIGMA-Viewer: interactive visualization strategies for conveying effect sizes in meta-analysis. BMC Bioinformatics 2017; 18:253. [PMID: 28617224 PMCID: PMC5471941 DOI: 10.1186/s12859-017-1634-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND Global scale brain research collaborations such as the ENIGMA (Enhancing Neuro Imaging Genetics through Meta Analysis) consortium are beginning to collect data in large quantities and to conduct meta-analyses using uniform protocols. It becomes strategically important that the results can be communicated among brain scientists effectively. Traditional graphs and charts fail to convey the complex shapes of brain structures, which are essential to the understanding of the result statistics from the analyses. These problems could be addressed using interactive visualization strategies that can link those statistics with brain structures in order to provide a better interface to understand brain research results. RESULTS We present ENIGMA-Viewer, an interactive web-based visualization tool for brain scientists to compare statistics such as effect sizes from meta-analysis results on standardized ROIs (regions-of-interest) across multiple studies. The tool incorporates visualization design principles such as focus+context and visual data fusion to enable users to better understand the statistics on brain structures. To demonstrate the usability of the tool, three examples using recent research data are discussed via case studies. CONCLUSIONS ENIGMA-Viewer supports presentations and communications of brain research results through effective visualization designs. By linking visualizations of both statistics and structures, users can gain more insights into the presented data that are otherwise difficult to obtain. ENIGMA-Viewer is an open-source tool; the source code and sample data are publicly accessible through the NITRC website ( http://www.nitrc.org/projects/enigmaviewer_20 ). The tool can also be directly accessed online ( http://enigma-viewer.org ).
Affiliation(s)
- Guohao Zhang: Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, 21250 MD USA
- Peter Kochunov: Maryland Psychiatric Research Center, University of Maryland, Baltimore, 55 Wade Ave, Baltimore, 21228 MD USA
- Elliot Hong: Maryland Psychiatric Research Center, University of Maryland, Baltimore, 55 Wade Ave, Baltimore, 21228 MD USA
- Sinead Kelly: Department of Psychiatry, Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Ave, Boston, 02215 MA USA; Psychiatry Neuroimaging Laboratory, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis St, Boston, 02115 MA USA
- Christopher Whelan: Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, 1975 Zonal Ave, Los Angeles, 90033 LA USA; Department of Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, 123 St Stephen’s Green, Dublin 2, Ireland
- Neda Jahanshad: Keck School of Medicine, University of Southern California, 1975 Zonal Ave, Los Angeles, 90033 LA USA
- Paul Thompson: Keck School of Medicine, University of Southern California, 1975 Zonal Ave, Los Angeles, 90033 LA USA
- Jian Chen: Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, 21250 MD USA
|
45
|
Dabek F, Caban JJ. A Grammar-based Approach for Modeling User Interactions and Generating Suggestions During the Data Exploration Process. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:41-50. [PMID: 27514057 DOI: 10.1109/tvcg.2016.2598471] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Despite the recent popularity of visual analytics focusing on big data, little is known about how to support users that use visualization techniques to explore multi-dimensional datasets and accomplish specific tasks. Our lack of models that can assist end-users during the data exploration process has made it challenging to learn from the user's interactive and analytical process. The ability to model how a user interacts with a specific visualization technique and what difficulties they face are paramount in supporting individuals with discovering new patterns within their complex datasets. This paper introduces the notion of visualization systems understanding and modeling user interactions with the intent of guiding a user through a task thereby enhancing visual data exploration. The challenges faced and the necessary future steps to take are discussed; and to provide a working example, a grammar-based model is presented that can learn from user interactions, determine the common patterns among a number of subjects using a K-Reversible algorithm, build a set of rules, and apply those rules in the form of suggestions to new users with the goal of guiding them along their visual analytic process. A formal evaluation study with 300 subjects was performed showing that our grammar-based model is effective at capturing the interactive process followed by users and that further research in this area has the potential to positively impact how users interact with a visualization system.
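The rule-learning and suggestion loop described above can be illustrated with a first-order Markov model, a much simpler stand-in for the paper's K-Reversible grammar induction; the action vocabulary and sessions below are invented.

```python
from collections import Counter, defaultdict

def learn_rules(sessions):
    """Build next-action rules from recorded interaction sequences:
    count, for each action, which action users performed next."""
    rules = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            rules[cur][nxt] += 1
    return rules

def suggest(rules, last_action):
    """Suggest the action most often observed after last_action."""
    if last_action not in rules:
        return None  # no rule learned for this action
    return rules[last_action].most_common(1)[0][0]

# Invented exploration sessions from three hypothetical users
sessions = [
    ["load", "filter", "zoom", "select"],
    ["load", "filter", "select"],
    ["load", "zoom", "select"],
]
rules = learn_rules(sessions)
```

A grammar-based model generalizes this by merging states of the induced automaton, which lets it capture longer-range patterns than the single-step counts used here.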
|
46
|
Gratzl S, Lex A, Gehlenborg N, Cosgrove N, Streit M. From Visual Exploration to Storytelling and Back Again. COMPUTER GRAPHICS FORUM : JOURNAL OF THE EUROPEAN ASSOCIATION FOR COMPUTER GRAPHICS 2016; 35:491-500. [PMID: 27942091 PMCID: PMC5145274 DOI: 10.1111/cgf.12925] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
The primary goal of visual data exploration tools is to enable the discovery of new insights. To justify and reproduce insights, the discovery process needs to be documented and communicated. A common approach to documenting and presenting findings is to capture visualizations as images or videos. Images, however, are insufficient for telling the story of a visual discovery, as they lack full provenance information and context. Videos are difficult to produce and edit, particularly due to the non-linear nature of the exploratory process. Most importantly, however, neither approach provides the opportunity to return to any point in the exploration in order to review the state of the visualization in detail or to conduct additional analyses. In this paper we present CLUE (Capture, Label, Understand, Explain), a model that tightly integrates data exploration and presentation of discoveries. Based on provenance data captured during the exploration process, users can extract key steps, add annotations, and author "Vistories", visual stories based on the history of the exploration. These Vistories can be shared for others to view, but also to retrace and extend the original analysis. We discuss how the CLUE approach can be integrated into visualization tools and provide a prototype implementation. Finally, we demonstrate the general applicability of the model in two usage scenarios: a Gapminder-inspired visualization to explore public health data and an example from molecular biology that illustrates how Vistories could be used in scientific journals. (see Figure 1 for visual abstract).
Affiliation(s)
- S Gratzl: Johannes Kepler University Linz, Austria
- A Lex: University of Utah, United States of America
- N Gehlenborg: Harvard Medical School, United States of America
- N Cosgrove: Johannes Kepler University Linz, Austria
- M Streit: Johannes Kepler University Linz, Austria
|
47
|
Stitz H, Luger S, Streit M, Gehlenborg N. AVOCADO: Visualization of Workflow-Derived Data Provenance for Reproducible Biomedical Research. COMPUTER GRAPHICS FORUM : JOURNAL OF THE EUROPEAN ASSOCIATION FOR COMPUTER GRAPHICS 2016; 35:481-490. [PMID: 29973745 PMCID: PMC6027754 DOI: 10.1111/cgf.12924] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
A major challenge in data-driven biomedical research lies in the collection and representation of data provenance information to ensure that findings are reproducible. In order to communicate and reproduce multi-step analysis workflows executed on datasets that contain data for dozens or hundreds of samples, it is crucial to be able to visualize the provenance graph at different levels of aggregation. Most existing approaches are based on node-link diagrams, which do not scale to the complexity of typical data provenance graphs. In our proposed approach, we reduce the complexity of the graph using hierarchical and motif-based aggregation. Based on user action and graph attributes, a modular degree-of-interest (DoI) function is applied to expand parts of the graph that are relevant to the user. This interest-driven adaptive approach to provenance visualization allows users to review and communicate complex multi-step analyses, which can be based on hundreds of files that are processed by numerous workflows. We have integrated our approach into an analysis platform that captures extensive data provenance information, and demonstrate its effectiveness by means of a biomedical usage scenario.
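The interest-driven expansion can be sketched with a toy DoI function; the linear blend, the inverse-distance closeness term, and the node names below are assumptions for illustration, not the paper's actual modular formulation.

```python
def degree_of_interest(interest, dist_to_focus, alpha=0.7):
    """Toy DoI: blend a node's intrinsic interest (attribute-based,
    in [0, 1]) with its closeness to the node the user last
    interacted with, measured as inverse graph distance."""
    return alpha * interest + (1 - alpha) / (1 + dist_to_focus)

def expand(nodes, threshold=0.5):
    """Show in full only nodes whose DoI exceeds the threshold;
    the rest remain in their aggregated (collapsed) form."""
    return [name for name, (interest, dist) in nodes.items()
            if degree_of_interest(interest, dist) > threshold]

# (intrinsic interest, graph distance to the user's focus) per
# provenance node; all values invented
nodes = {
    "raw_reads": (0.2, 3),
    "alignment_workflow": (0.9, 0),
    "sample_42": (0.4, 1),
}
visible = expand(nodes)
```

Because the function is modular, swapping in different interest or closeness terms changes which parts of a large provenance graph unfold without touching the aggregation machinery.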
Affiliation(s)
- H Stitz: Johannes Kepler University Linz, Austria
- S Luger: Johannes Kepler University Linz, Austria
- M Streit: Johannes Kepler University Linz, Austria
- N Gehlenborg: Harvard Medical School, United States of America
|
48
|
Comparative Study of Evaluating the Trustworthiness of Data Based on Data Provenance. JOURNAL OF INFORMATION PROCESSING SYSTEMS 2016. [DOI: 10.3745/jips.04.0024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/03/2022]
|