1. Sedlakova J, Stanikić M, Gille F, Bernard J, Horn AB, Wolf M, Haag C, Floris J, Morgenshtern G, Schneider G, Zumbrunn Wojczyńska A, Mouton Dorey C, Ettlin DA, Gero D, Friemel T, Lu Z, Papadopoulos K, Schläpfer S, Wang N, von Wyl V. Refining Established Practices for Research Question Definition to Foster Interdisciplinary Research Skills in a Digital Age: Consensus Study With Nominal Group Technique. JMIR Medical Education 2025; 11:e56369. [PMID: 39847774] [PMCID: PMC11803332] [DOI: 10.2196/56369]
Abstract
BACKGROUND The increased use of digital data in health research demands interdisciplinary collaborations to address its methodological complexities and challenges. This often entails merging the linear deductive approach of health research with the explorative iterative approach of data science. However, there is a lack of structured teaching courses and guidance on how to effectively and constructively bridge different disciplines and research approaches. OBJECTIVE This study aimed to provide a set of tools and recommendations designed to facilitate interdisciplinary education and collaboration. Target groups are lecturers who can use these tools to design interdisciplinary courses, supervisors who guide PhD and master's students in their interdisciplinary projects, and principal investigators who design and organize workshops to initiate and guide interdisciplinary projects. METHODS Our study was conducted in 3 steps: (1) developing a common terminology, (2) identifying established workflows for research question formulation, and (3) examining adaptations of existing study workflows combining methods from health research and data science. We also formulated recommendations for a pragmatic implementation of our findings. We conducted a literature search and organized 3 interdisciplinary expert workshops with researchers at the University of Zurich. For the workshops and the subsequent manuscript writing process, we adopted a consensus study methodology. RESULTS We developed a set of tools to facilitate interdisciplinary education and collaboration. These tools focused on 2 key dimensions, (1) content and curriculum and (2) methods and teaching style, and can be applied in various educational and research settings. We developed a glossary to establish a shared understanding of common terminologies and concepts.
We delineated the established study workflow for research question formulation, emphasizing the "what" and the "how," while summarizing the necessary tools to facilitate the process. We propose 3 clusters of contextual and methodological adaptations to this workflow to better integrate data science practices: (1) acknowledging real-life constraints and limitations in research scope; (2) allowing more iterative, data-driven approaches to research question formulation; and (3) strengthening research quality through reproducibility principles and adherence to the findable, accessible, interoperable, and reusable (FAIR) data principles. CONCLUSIONS Research question formulation remains a relevant and useful research step in projects using digital data. We recommend initiating new interdisciplinary collaborations by establishing terminologies as well as using the concepts of research tasks to foster a shared understanding. Our tools and recommendations can support academic educators in training health professionals and researchers for interdisciplinary digital health projects.
Affiliation(s)
- Jana Sedlakova
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Mina Stanikić
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Felix Gille
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Jürgen Bernard
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Informatics, Faculty of Business, Economics and Informatics, University of Zurich, Zurich, Switzerland
- Andrea B Horn
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Center for Gerontology, University of Zurich, Zurich, Switzerland
  - Department of Psychology, Faculty of Arts and Social Sciences, University of Zurich, Zurich, Switzerland
- Markus Wolf
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Psychology, Faculty of Arts and Social Sciences, University of Zurich, Zurich, Switzerland
- Christina Haag
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Joel Floris
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Evolutionary Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Gabriela Morgenshtern
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Informatics, Faculty of Business, Economics and Informatics, University of Zurich, Zurich, Switzerland
- Gerold Schneider
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Computational Linguistics, Faculty of Business, Economics and Informatics, University of Zurich, Zurich, Switzerland
- Aleksandra Zumbrunn Wojczyńska
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Center of Dental Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Corine Mouton Dorey
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Biomedical Ethics and History of Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Dominik Alois Ettlin
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Center of Dental Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Daniel Gero
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Surgery and Transplantation, University Hospital of Zurich, Zurich, Switzerland
- Thomas Friemel
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Department of Communication and Media Research, Faculty of Arts and Social Sciences, University of Zurich, Zurich, Switzerland
- Ziyuan Lu
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Evolutionary Medicine, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Kimon Papadopoulos
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
- Sonja Schläpfer
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute for Complementary and Integrative Medicine, University Hospital of Zurich, Zurich, Switzerland
- Ning Wang
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
- Viktor von Wyl
  - Digital Society Initiative, University of Zurich, Zurich, Switzerland
  - Institute of Implementation Science in Healthcare, Faculty of Medicine, University of Zurich, Zurich, Switzerland
  - Epidemiology, Biostatistics and Prevention Institute, Faculty of Medicine, University of Zurich, Zurich, Switzerland
2. Walch A, Szabo A, Steinlechner H, Ortner T, Groller E, Schmidt J. BEMTrace: Visualization-Driven Approach for Deriving Building Energy Models from BIM. IEEE Transactions on Visualization and Computer Graphics 2025; 31:240-250. [PMID: 39312422] [DOI: 10.1109/tvcg.2024.3456315]
Abstract
Building Information Modeling (BIM) describes a central data pool covering the entire life cycle of a construction project. Similarly, Building Energy Modeling (BEM) describes the process of using a 3D representation of a building as a basis for thermal simulations to assess the building's energy performance. This paper explores the intersection of BIM and BEM, focusing on the challenges and methodologies in converting BIM data into BEM representations for energy performance analysis. BEMTrace integrates 3D data wrangling techniques with visualization methodologies to enhance the accuracy and traceability of the BIM-to-BEM conversion process. Through parsing, error detection, and algorithmic correction of BIM data, our methods generate valid BEM models suitable for energy simulation. Visualization techniques provide transparent insights into the conversion process, aiding error identification, validation, and user comprehension. We introduce context-adaptive selections to facilitate user interaction and to show that the BEMTrace workflow helps users understand complex 3D data wrangling processes.
3. Zhang Z, Yang F, Cheng R, Ma Y. ParetoTracker: Understanding Population Dynamics in Multi-Objective Evolutionary Algorithms Through Visual Analytics. IEEE Transactions on Visualization and Computer Graphics 2025; 31:820-830. [PMID: 39255166] [DOI: 10.1109/tvcg.2024.3456142]
Abstract
Multi-objective evolutionary algorithms (MOEAs) have emerged as powerful tools for solving complex optimization problems characterized by multiple, often conflicting, objectives. While advancements have been made in computational efficiency as well as diversity and convergence of solutions, a critical challenge persists: the internal evolutionary mechanisms are opaque to human users. Drawing upon the successes of explainable AI in explaining complex algorithms and models, we argue that the need to understand the underlying evolutionary operators and population dynamics within MOEAs aligns well with a visual analytics paradigm. This paper introduces ParetoTracker, a visual analytics framework designed to support the comprehension and inspection of population dynamics in the evolutionary processes of MOEAs. Informed by preliminary literature review and expert interviews, the framework establishes a multi-level analysis scheme, which caters to user engagement and exploration ranging from examining overall trends in performance metrics to conducting fine-grained inspections of evolutionary operations. In contrast to conventional practices that require manual plotting of solutions for each generation, ParetoTracker facilitates the examination of temporal trends and dynamics across consecutive generations in an integrated visual interface. The effectiveness of the framework is demonstrated through case studies and expert interviews focused on widely adopted benchmark optimization problems.
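The population dynamics ParetoTracker visualizes revolve around non-dominated (Pareto-optimal) solutions within each generation. As a minimal illustration of that underlying concept only (not the paper's system, and with made-up objective values), a sketch for a minimization problem:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in population if not any(dominates(q, p) for q in population)]

# One generation's objective values; (3.0, 3.0) is dominated by (2.0, 2.0).
gen0 = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = pareto_front(gen0)
print(front)  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

Re-running `pareto_front` on consecutive generations and diffing the results is, in essence, the temporal trend the tool renders in its integrated interface.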
4. Domova V, Vrotsou K. A Model for Types and Levels of Automation in Visual Analytics: A Survey, a Taxonomy, and Examples. IEEE Transactions on Visualization and Computer Graphics 2023; 29:3550-3568. [PMID: 35358047] [DOI: 10.1109/tvcg.2022.3163765]
Abstract
The continuous growth in availability and access to data presents a major challenge to the human analyst. As the manual analysis of large and complex datasets is nowadays practically impossible, the need for assisting tools that can automate the analysis process while keeping the human analyst in the loop is imperative. A large and growing body of literature recognizes the crucial role of automation in Visual Analytics and suggests that automation is among the most important constituents for effective Visual Analytics systems. Today, however, there is no appropriate taxonomy or terminology for assessing the extent of automation in a Visual Analytics system. In this article, we aim to address this gap by introducing a model of levels of automation tailored for the Visual Analytics domain. The consistent terminology of the proposed taxonomy could provide a ground for users/readers/reviewers to describe and compare automation in Visual Analytics systems. Our taxonomy is grounded in a combination of several existing and well-established taxonomies of levels of automation in the human-machine interaction domain and relevant models within the visual analytics field. To exemplify the proposed taxonomy, we selected a set of existing systems from the event-sequence analytics domain and mapped the automation of their visual analytics process stages against the automation levels in our taxonomy.
5. Afzal S, Ghani S, Hittawe MM, Rashid SF, Knio OM, Hadwiger M, Hoteit I. Visualization and Visual Analytics Approaches for Image and Video Datasets: A Survey. ACM Transactions on Interactive Intelligent Systems 2023. [DOI: 10.1145/3576935]
Abstract
Image and video data analysis has become an increasingly important research area with applications in different domains such as security surveillance, healthcare, augmented and virtual reality, video and image editing, activity analysis and recognition, synthetic content generation, distance education, telepresence, remote sensing, sports analytics, art, non-photorealistic rendering, search engines, and social media. Recent advances in Artificial Intelligence (AI) and particularly deep learning have sparked new research challenges and led to significant advancements, especially in image and video analysis. These advancements have also resulted in significant research and development in other areas such as visualization and visual analytics, and have created new opportunities for future lines of research. In this survey paper, we present the current state of the art at the intersection of visualization and visual analytics, and image and video data analysis. We categorize the visualization papers included in our survey based on different taxonomies used in visualization and visual analytics research. We review these papers in terms of task requirements, tools, datasets, and application areas. We also discuss insights based on our survey results, trends and patterns, the current focus of visualization research, and opportunities for future research.
Affiliation(s)
- Shehzad Afzal
  - King Abdullah University of Science & Technology, Saudi Arabia
- Sohaib Ghani
  - King Abdullah University of Science & Technology, Saudi Arabia
- Omar M Knio
  - King Abdullah University of Science & Technology, Saudi Arabia
- Markus Hadwiger
  - King Abdullah University of Science & Technology, Saudi Arabia
- Ibrahim Hoteit
  - King Abdullah University of Science & Technology, Saudi Arabia
6. Jin S, Lee H, Park C, Chu H, Tae Y, Choo J, Ko S. A Visual Analytics System for Improving Attention-based Traffic Forecasting Models. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1102-1112. [PMID: 36155438] [DOI: 10.1109/tvcg.2022.3209462]
Abstract
With deep learning (DL) outperforming conventional methods for different tasks, much effort has been devoted to utilizing DL in various domains. Researchers and developers in the traffic domain have also designed and improved DL models for forecasting tasks such as estimation of traffic speed and time of arrival. However, there exist many challenges in analyzing DL models due to the black-box property of DL models and complexity of traffic data (i.e., spatio-temporal dependencies). Collaborating with domain experts, we design a visual analytics system, AttnAnalyzer, that enables users to explore how DL models make predictions by allowing effective spatio-temporal dependency analysis. The system incorporates dynamic time warping (DTW) and Granger causality tests for computational spatio-temporal dependency analysis while providing map, table, line chart, and pixel views to assist users in performing dependency and model behavior analysis. For the evaluation, we present three case studies showing how AttnAnalyzer can effectively explore model behaviors and improve model performance in two different road networks. We also provide domain expert feedback.
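Of the two dependency measures the abstract names, dynamic time warping is the simpler: it scores how similar two traffic-speed series are even when one lags the other. A rough, illustrative implementation of classic DTW (not the authors' code; the synthetic series are invented for the example):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D series,
    computed with the standard O(n*m) cost-matrix recurrence."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# A time-shifted copy of a speed pattern should score far closer under DTW
# than an unrelated noise series.
t = np.linspace(0, 4 * np.pi, 50)
speed_a = np.sin(t)
speed_b = np.sin(t - 0.5)  # same pattern, shifted in time
noise = np.random.default_rng(0).normal(size=50)
print(dtw_distance(speed_a, speed_b) < dtw_distance(speed_a, noise))
```

The Granger causality tests mentioned alongside DTW are available off the shelf, for instance in `statsmodels.tsa.stattools.grangercausalitytests`.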
7. Procopio M, Mosca A, Scheidegger C, Wu E, Chang R. Impact of Cognitive Biases on Progressive Visualization. IEEE Transactions on Visualization and Computer Graphics 2022; 28:3093-3112. [PMID: 33434132] [DOI: 10.1109/tvcg.2021.3051013]
Abstract
Progressive visualization is fast becoming a technique in the visualization community to help users interact with large amounts of data. With progressive visualization, users can examine intermediate results of complex or long-running computations, without waiting for the computation to complete. While this has been shown to be beneficial to users, recent research has identified potential risks. For example, users may misjudge the uncertainty in the intermediate results and draw incorrect conclusions or see patterns that are not present in the final results. In this article, we conduct a comprehensive set of studies to quantify the advantages and limitations of progressive visualization. Based on a recent report by Micallef et al., we examine four types of cognitive biases that can occur with progressive visualization: uncertainty bias, illusion bias, control bias, and anchoring bias. The results of the studies suggest a cautious but promising use of progressive visualization: while there can be significant savings in task completion time, accuracy can be negatively affected in certain conditions. These findings confirm earlier reports of the benefits and drawbacks of progressive visualization and that continued research into mitigating the effects of cognitive biases is necessary.
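The intermediate results at issue can be as simple as a running estimate whose confidence interval tightens as chunks are processed; the biases arise when users over-trust the early, wide-interval states. A toy sketch of such a progressive estimator (illustrative only; chunk size and data are invented):

```python
import math
import random

def progressive_mean(stream, chunk_size=1000):
    """Yield (estimate, 95% CI half-width) after each processed chunk,
    using running sums so earlier data is never revisited."""
    n, total, total_sq = 0, 0.0, 0.0
    chunk = []
    for x in stream:
        chunk.append(x)
        if len(chunk) == chunk_size:
            for v in chunk:
                n += 1
                total += v
                total_sq += v * v
            mean = total / n
            var = max(total_sq / n - mean * mean, 0.0)
            yield mean, 1.96 * math.sqrt(var / n)  # normal-approx 95% CI
            chunk = []

random.seed(42)
data = [random.gauss(10.0, 2.0) for _ in range(10_000)]
widths = [hw for _, hw in progressive_mean(data)]
print(widths[0] > widths[-1])  # the interval shrinks as chunks arrive
```

An early reader sees the first, widest interval; the studies above quantify how often conclusions drawn at that point fail to hold for the final result.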
8. Hogräfer M, Angelini M, Santucci G, Schulz HJ. Steering-by-Example for Progressive Visual Analytics. ACM Transactions on Intelligent Systems and Technology 2022. [DOI: 10.1145/3531229]
Abstract
Progressive visual analytics allows users to interact with early, partial results of long-running computations on large datasets. In this context, computational steering is often brought up as a means to prioritize the progressive computation. This is meant to focus computational resources on data subspaces of interest, so as to ensure their computation is completed before all others. Yet, current approaches to select a region of the view space and then to prioritize its corresponding data subspace either require a 1-to-1 mapping between view and data space, or they need to establish and maintain computationally costly index structures to trace complex mappings between view and data space. We present steering-by-example, a novel interactive steering approach for progressive visual analytics, which allows prioritizing data subspaces for the progression by generating a relaxed query from a set of selected data items. Our approach works independently of the particular visualization technique and without additional index structures. First benchmark results show that steering-by-example considerably improves Precision and Recall for prioritizing unprocessed data for a selected view region, clearly outperforming random uniform sampling.
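The core idea, deriving a relaxed range query from a few selected example items and using it to reorder the processing queue, can be sketched roughly as follows (a conceptual sketch, not the paper's implementation; attribute names, slack factor, and data are invented):

```python
def relaxed_query(selected, slack=0.1):
    """Build per-attribute [min, max] ranges from the selected example
    items, widened by a relative slack so near-misses also match."""
    ranges = {}
    for k in selected[0]:
        vals = [item[k] for item in selected]
        lo, hi = min(vals), max(vals)
        pad = (hi - lo) * slack
        ranges[k] = (lo - pad, hi + pad)
    return ranges

def prioritize(unprocessed, ranges):
    """Move items matching the relaxed query to the head of the queue;
    sorted() is stable, so relative order within each group is kept."""
    def matches(item):
        return all(lo <= item[k] <= hi for k, (lo, hi) in ranges.items())
    return sorted(unprocessed, key=lambda item: not matches(item))

selected = [{"x": 2.0, "y": 5.0}, {"x": 3.0, "y": 6.0}]
queue = [{"x": 9.0, "y": 1.0}, {"x": 2.5, "y": 5.5}, {"x": 8.0, "y": 0.5}]
print(prioritize(queue, relaxed_query(selected)))
```

Because the query is expressed over data attributes rather than screen coordinates, no view-to-data index structure is needed, which is the point the abstract emphasizes.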
9. van de Ruit M, Billeter M, Eisemann E. An Efficient Dual-Hierarchy t-SNE Minimization. IEEE Transactions on Visualization and Computer Graphics 2022; 28:614-622. [PMID: 34587052] [DOI: 10.1109/tvcg.2021.3114817]
Abstract
t-distributed Stochastic Neighbour Embedding (t-SNE) has become a standard for exploratory data analysis, as it is capable of revealing clusters even in complex data while requiring minimal user input. While its run-time complexity limited it to small datasets in the past, recent efforts improved upon the expensive similarity computations and the previously quadratic minimization. Nevertheless, t-SNE still has high runtime and memory costs when operating on millions of points. We present a novel method for executing the t-SNE minimization. While our method overall retains a linear runtime complexity, we obtain a significant performance increase in the most expensive part of the minimization. We achieve a significant improvement without a noticeable decrease in accuracy even when targeting a 3D embedding. Our method constructs a pair of spatial hierarchies over the embedding, which are simultaneously traversed to approximate many N-body interactions at once. We demonstrate an efficient GPGPU implementation and evaluate its performance against state-of-the-art methods on a variety of datasets.
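For context, the baseline being accelerated is standard (Barnes-Hut) t-SNE, which is available off the shelf, for example in scikit-learn. This shows the reference implementation class only, not the authors' dual-hierarchy GPGPU method; the sample size is arbitrary:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Embed 300 handwritten-digit images (64 dimensions) into 2D with the
# standard Barnes-Hut t-SNE minimization.
X = load_digits().data[:300]
emb = TSNE(n_components=2, perplexity=30.0,
           init="pca", random_state=0).fit_transform(X)
print(emb.shape)  # (300, 2)
```

At this scale the off-the-shelf solver is fine; the dual-hierarchy approach targets the millions-of-points regime where the N-body repulsion term dominates the runtime.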
10. Oppermann M, Munzner T. VizSnippets: Compressing Visualization Bundles Into Representative Previews for Browsing Visualization Collections. IEEE Transactions on Visualization and Computer Graphics 2022; 28:747-757. [PMID: 34596545] [DOI: 10.1109/tvcg.2021.3114841]
Abstract
Visualization collections, accessed by platforms such as Tableau Online or Power BI, are used by millions of people to share and access diverse analytical knowledge in the form of interactive visualization bundles. Result snippets, compact previews of these bundles, are presented to users to help them identify relevant content when browsing collections. Our engagement with Tableau product teams and review of existing snippet designs on five platforms showed us that current practices fail to help people judge the relevance of bundles because they include only the title and one image. Users frequently need to undertake the time-consuming endeavour of opening a bundle within its visualization system to examine its many views and dashboards. In response, we contribute the first systematic approach to visualization snippet design. We propose a framework for snippet design that addresses eight key challenges that we identify. We present a computational pipeline to compress the visual and textual content of bundles into representative previews that is adaptive to a provided pixel budget and provides high information density with multiple images and carefully chosen keywords. We also reflect on the method of visual inspection through random sampling to gain confidence in model and parameter choices.
11.
12. Evaluating a Taxonomy of Textual Uncertainty for Collaborative Visualisation in the Digital Humanities. Information 2021. [DOI: 10.3390/info12110436]
Abstract
The capture, modelling and visualisation of uncertainty has become a hot topic in many areas of science, such as the digital humanities (DH). Fuelled by critical voices among the DH community, DH scholars are becoming more aware of the intrinsic advantages that incorporating the notion of uncertainty into their workflows may bring. Additionally, the increasing availability of ubiquitous, web-based technologies has given rise to many collaborative tools that aim to support DH scholars in performing remote work alongside distant peers from other parts of the world. In this context, this paper describes two user studies seeking to evaluate a taxonomy of textual uncertainty aimed at enabling remote collaborations on digital humanities (DH) research objects in a digital medium. Our study focuses on the task of free annotation of uncertainty in texts in two different scenarios, seeking to establish the requirements of the underlying data and uncertainty models that would be needed to implement a hypothetical collaborative annotation system (CAS) that uses information visualisation and visual analytics techniques to leverage the cognitive effort implied by these tasks. To identify user needs and other requirements, we held two user-driven design experiences with DH experts and lay users, focusing on the annotation of uncertainty in historical recipes and literary texts. The lessons learned from these experiments are gathered in a series of insights and observations on how these different user groups collaborated to adapt an uncertainty taxonomy to solve the proposed exercises. Furthermore, we extract a series of recommendations and future lines of work that we share with the community in an attempt to establish a common agenda of DH research that focuses on collaboration around the idea of uncertainty.
13. Kim H, Drake B, Endert A, Park H. ArchiText: Interactive Hierarchical Topic Modeling. IEEE Transactions on Visualization and Computer Graphics 2021; 27:3644-3655. [PMID: 32191890] [DOI: 10.1109/tvcg.2020.2981456]
Abstract
Human-in-the-loop topic modeling allows users to explore and steer the process to produce better quality topics that align with their needs. When integrated into visual analytic systems, many existing automated topic modeling algorithms are given interactive parameters to allow users to tune or adjust them. However, this has limitations when the algorithms cannot be easily adapted to changes, and it is difficult to realize interactivity closely supported by underlying algorithms. Instead, we emphasize the concept of tight integration, which advocates for the need to co-develop interactive algorithms and interactive visual analytic systems in parallel to allow flexibility and scalability. In this article, we describe design goals for efficiently and effectively executing the concept of tight integration among computation, visualization, and interaction for hierarchical topic modeling of text data. We propose computational base operations for interactive tasks to achieve the design goals. To instantiate our concept, we present ArchiText, a prototype system for interactive hierarchical topic modeling, which offers fast, flexible, and algorithmically valid analysis via tight integration. Utilizing interactive hierarchical topic modeling, our technique lets users generate, explore, and flexibly steer hierarchical topics to discover more informed topics and their document memberships.
14. "That's (not) the output I expected!" On the role of end user expectations in creating explanations of AI systems. Artificial Intelligence 2021. [DOI: 10.1016/j.artint.2021.103507]
15. Zhang Z, Citardi D, Wang D, Genc Y, Shan J, Fan X. Patients' perceptions of using artificial intelligence (AI)-based technology to comprehend radiology imaging data. Health Informatics Journal 2021; 27:14604582211011215. [PMID: 33913359] [DOI: 10.1177/14604582211011215]
Abstract
Results of radiology imaging studies are not typically comprehensible to patients. With the advances in artificial intelligence (AI) technology in recent years, it is expected that AI technology can aid patients' understanding of radiology imaging data. The aim of this study is to understand patients' perceptions and acceptance of using AI technology to interpret their radiology reports. We conducted semi-structured interviews with 13 participants to elicit reflections pertaining to the use of AI technology in radiology report interpretation. A thematic analysis approach was employed to analyze the interview data. Participants have a generally positive attitude toward using AI-based systems to comprehend their radiology reports. AI is perceived to be particularly useful in seeking actionable information, confirming the doctor's opinions, and preparing for the consultation. However, we also found various concerns related to the use of AI in this context, such as cyber-security, accuracy, and lack of empathy. Our results highlight the necessity of providing AI explanations to promote people's trust and acceptance of AI. Designers of patient-centered AI systems should employ user-centered design approaches to address patients' concerns. Such systems should also be designed to promote trust and deliver concerning health results in an empathetic manner to optimize the user experience.
16. Jo J, L'Yi S, Lee B, Seo J. ProReveal: Progressive Visual Analytics With Safeguards. IEEE Transactions on Visualization and Computer Graphics 2021; 27:3109-3122. [PMID: 31880556] [DOI: 10.1109/tvcg.2019.2962404]
Abstract
We present a new visual exploration concept-Progressive Visual Analytics with Safeguards-that helps people manage the uncertainty arising from progressive data exploration. Despite its potential benefits, intermediate knowledge from progressive analytics can be incorrect due to various machine and human factors, such as a sampling bias or misinterpretation of uncertainty. To alleviate this problem, we introduce PVA-Guards, safeguards people can leave on uncertain intermediate knowledge that needs to be verified, and derive seven PVA-Guards based on previous visualization task taxonomies. PVA-Guards provide a means of ensuring the correctness of the conclusion and understanding the reason when intermediate knowledge becomes invalid. We also present ProReveal, a proof-of-concept system designed and developed to integrate the seven safeguards into progressive data exploration. Finally, we report a user study with 14 participants, which shows people voluntarily employed PVA-Guards to safeguard their findings and ProReveal's PVA-Guard view provides an overview of uncertain intermediate knowledge. We believe our new concept can also offer better consistency in progressive data exploration, alleviating people's heterogeneous interpretation of uncertainty.
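In spirit, a safeguard is a small predicate attached to an intermediate finding and re-checked on every progressive update. A minimal sketch of one "value stays in range" style guard, invented here to convey the flavor only (the paper defines seven richer PVA-Guards derived from task taxonomies):

```python
def value_guard(lower, upper):
    """Return a check that flags when a progressive estimate leaves the
    range the analyst committed to for an intermediate finding."""
    def check(estimate):
        return lower <= estimate <= upper
    return check

# Analyst pins the finding "the mean is about 10" as a guarded claim.
guard = value_guard(9.5, 10.5)
estimates = [10.2, 10.1, 9.9, 11.3]   # estimates over successive updates
verdicts = [guard(e) for e in estimates]
print(verdicts)  # the last update invalidates the guarded finding
```

When a guard fires, the system can surface which update invalidated the claim, which is the "understanding the reason" benefit the abstract describes.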
17. Kamal A, Dhakal P, Javaid AY, Devabhaktuni VK, Kaur D, Zaientz J, Marinier R. Recent advances and challenges in uncertainty visualization: a survey. Journal of Visualization 2021. [DOI: 10.1007/s12650-021-00755-1]
|
18
|
Palenik J, Spengler T, Hauser H. IsoTrotter: Visually Guided Empirical Modelling of Atmospheric Convection. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:775-784. [PMID: 33079665 DOI: 10.1109/tvcg.2020.3030389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Empirical models, fitted to data from observations, are often used in natural sciences to describe physical behaviour and support discoveries. However, with more complex models, the regression of parameters quickly becomes insufficient, requiring a visual parameter space analysis to understand and optimize the models. In this work, we present a design study for building a model describing atmospheric convection. We present a mixed-initiative approach to visually guided modelling, integrating an interactive visual parameter space analysis with partial automatic parameter optimization. Our approach includes a new, semi-automatic technique called IsoTrotting, where we optimize the procedure by navigating along isocontours of the model. We evaluate the model with unique observational data of atmospheric convection based on flight trajectories of paragliders.
|
19
|
Liu J, Dwyer T, Tack G, Gratzl S, Marriott K. Supporting the Problem-Solving Loop: Designing Highly Interactive Optimisation Systems. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1764-1774. [PMID: 33112748 DOI: 10.1109/tvcg.2020.3030364] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Efficient optimisation algorithms have become important tools for finding high-quality solutions to hard, real-world problems such as production scheduling, timetabling, or vehicle routing. These algorithms are typically "black boxes" that work on mathematical models of the problem to solve. However, many problems are difficult to fully specify, and require a "human in the loop" who collaborates with the algorithm by refining the model and guiding the search to produce acceptable solutions. Recently, the Problem-Solving Loop was introduced as a high-level model of such interactive optimisation. Here, we present and evaluate nine recommendations for the design of interactive visualisation tools supporting the Problem-Solving Loop. They range from the choice of visual representation for solutions and constraints to the use of a solution gallery to support exploration of alternate solutions. We first examined the applicability of the recommendations by investigating how well they had been supported in previous interactive optimisation tools. We then evaluated the recommendations in the context of the vehicle routing problem with time windows (VRPTW). To do so we built a sophisticated interactive visual system for solving VRPTW that was informed by the recommendations. Ten participants then used this system to solve a variety of routing problems. We report on participant comments and interaction patterns with the tool. These showed the tool was regarded as highly usable and the results generally supported the usefulness of the underlying recommendations.
|
20
|
Ma Y, Fan A, He J, Nelakurthi AR, Maciejewski R. A Visual Analytics Framework for Explaining and Diagnosing Transfer Learning Processes. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:1385-1395. [PMID: 33035164 DOI: 10.1109/tvcg.2020.3028888] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Many statistical learning models hold an assumption that the training data and the future unlabeled data are drawn from the same distribution. However, this assumption is difficult to fulfill in real-world scenarios and creates barriers in reusing existing labels from similar application domains. Transfer Learning is intended to relax this assumption by modeling relationships between domains, and is often applied in deep learning applications to reduce the demand for labeled data and training time. Despite recent advances in exploring deep learning models with visual analytics tools, little work has explored the issue of explaining and diagnosing the knowledge transfer process between deep learning models. In this paper, we present a visual analytics framework for the multi-level exploration of the transfer learning processes when training deep neural networks. Our framework establishes a multi-aspect design to explain how the learned knowledge from the existing model is transferred into the new learning task when training deep neural networks. Based on a comprehensive requirement and task analysis, we employ descriptive visualization with performance measures and detailed inspections of model behaviors from the statistical, instance, feature, and model structure levels. We demonstrate our framework through two case studies on image classification by fine-tuning AlexNets to illustrate how analysts can utilize our framework.
|
21
|
Somarakis A, Van Unen V, Koning F, Lelieveldt B, Hollt T. ImaCytE: Visual Exploration of Cellular Micro-Environments for Imaging Mass Cytometry Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2021; 27:98-110. [PMID: 31369380 DOI: 10.1109/tvcg.2019.2931299] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Tissue functionality is determined by the characteristics of tissue-resident cells and their interactions within their microenvironment. Imaging Mass Cytometry offers the opportunity to distinguish cell types with high precision and link them to their spatial location in intact tissues at sub-cellular resolution. This technology produces large amounts of spatially-resolved high-dimensional data, which constitutes a serious challenge for the data analysis. We present an interactive visual analysis workflow for the end-to-end analysis of Imaging Mass Cytometry data that was developed in close collaboration with domain expert partners. We implemented the presented workflow in an interactive visual analysis tool, ImaCytE. Our workflow is designed to allow the user to discriminate cell types according to their protein expression profiles and analyze their cellular microenvironments, aiding in the formulation or verification of hypotheses on tissue architecture and function. Finally, we show the effectiveness of our workflow and ImaCytE through a case study performed by a collaborating specialist.
|
22
|
Jo J, Seo J, Fekete JD. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1347-1360. [PMID: 30222575 DOI: 10.1109/tvcg.2018.2869149] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
We present PANENE, a progressive algorithm for approximate nearest neighbor indexing and querying. Although the use of k-nearest neighbor (KNN) libraries is common in many data analysis methods, most KNN algorithms can only be queried when the whole dataset has been indexed, i.e., they are not online. Even the few online implementations are not progressive in the sense that the time to index incoming data is not bounded and cannot satisfy the latency requirements of progressive systems. This long latency has significantly limited the use of many machine learning methods, such as t-SNE, in interactive visual analytics. PANENE is a novel algorithm for Progressive Approximate k-NEarest NEighbors, enabling fast KNN queries while continuously indexing new batches of data. Following the progressive computation paradigm, PANENE operations can be bounded in time, allowing analysts to access running results within an interactive latency. PANENE can also incrementally build and maintain a cache data structure, a KNN lookup table, to enable constant-time lookups for KNN queries. Finally, we present three progressive applications of PANENE: regression, density estimation, and responsive t-SNE, opening up new opportunities to use complex algorithms in interactive systems.
|
23
|
Li Q, Wu Z, Yi L, Seann K, Qu H, Ma X. WeSeer: Visual Analysis for Better Information Cascade Prediction of WeChat Articles. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1399-1412. [PMID: 30176600 DOI: 10.1109/tvcg.2018.2867776] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Social media, such as Facebook and WeChat, empowers millions of users to create, consume, and disseminate online information on an unprecedented scale. The abundant information on social media intensifies the competition of WeChat Public Official Articles (i.e., posts) for gaining user attention due to the zero-sum nature of attention. Therefore, only a small portion of information tends to become extremely popular while the rest remains unnoticed or quickly disappears. Such a typical "long-tail" phenomenon is very common in social media. Thus, recent years have witnessed a growing interest in predicting the future trend in the popularity of social media posts and understanding the factors that influence the popularity of the posts. Nevertheless, existing predictive models either rely on cumbersome feature engineering or sophisticated parameter tuning, which are difficult to understand and improve. In this paper, we study and enhance a point process-based model by incorporating visual reasoning to support communication between the users and the predictive model for a better prediction result. The proposed system supports users to uncover the working mechanism behind the model and improve the prediction accuracy accordingly based on the insights gained. We use real-world WeChat articles to demonstrate the effectiveness of the system and verify the improved model on a large collection of WeChat articles. We also elicit and summarize the feedback from WeChat domain experts.
|
24
|
Hou P, Jolliet O, Zhu J, Xu M. Estimate ecotoxicity characterization factors for chemicals in life cycle assessment using machine learning models. ENVIRONMENT INTERNATIONAL 2020; 135:105393. [PMID: 31862642 DOI: 10.1016/j.envint.2019.105393] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/29/2019] [Revised: 12/03/2019] [Accepted: 12/03/2019] [Indexed: 06/10/2023]
Abstract
In life cycle assessment, characterization factors are used to convert the amount of the chemicals and other pollutants generated in a product's life cycle to the standard unit of an impact category, such as ecotoxicity. However, as a widely used impact assessment method, USEtox (version 2.11) only has ecotoxicity characterization factors for a small portion of chemicals due to the lack of laboratory experiment data. Here we develop machine learning models to estimate ecotoxicity hazardous concentrations 50% (HC50) in USEtox to calculate characterization factors for chemicals based on their physical-chemical properties in EPA's CompTox Chemical Dashboard and the classification of their mode of action. The model is validated by ten randomly selected test sets that are not used for training. The results show that the random forest model has the best predictive performance. The average root mean squared error of the estimated HC50 on the test sets is 0.761. The average coefficient of determination (R2) on the test set is 0.630, meaning 63% of the variability of HC50 in USEtox can be explained by the predicted HC50 from the random forest model. Our model outperforms a traditional quantitative structure-activity relationship (QSAR) model (ECOSAR) and linear regression models. We also provide estimates of missing ecotoxicity characterization factors for 552 chemicals in USEtox using the validated random forest model.
Affiliation(s)
- Ping Hou
- School for Environment and Sustainability, University of Michigan, Ann Arbor, MI, USA; Michigan Institute for Computational Discovery & Engineering, University of Michigan, Ann Arbor, MI, USA
- Olivier Jolliet
- Environmental Health Sciences, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Ji Zhu
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ming Xu
- School for Environment and Sustainability, University of Michigan, Ann Arbor, MI, USA; Department of Civil and Environmental Engineering, University of Michigan, Ann Arbor, MI, USA.
|
25
|
Li JK, Ma KL. P5: Portable Progressive Parallel Processing Pipelines for Interactive Data Analysis and Visualization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1151-1160. [PMID: 31442985 DOI: 10.1109/tvcg.2019.2934537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
We present P5, a web-based visualization toolkit that combines declarative visualization grammar and GPU computing for progressive data analysis and visualization. To interactively analyze and explore big data, progressive analytics and visualization methods have recently emerged. Progressive visualizations of incrementally refining results have the advantages of allowing users to steer the analysis process and make early decisions. P5 leverages declarative grammar for specifying visualization designs and exploits GPU computing to accelerate progressive data processing and rendering. The declarative specifications can be modified during progressive processing to create different visualizations for analyzing the intermediate results. To enable user interactions for progressive data analysis, P5 utilizes the GPU to automatically aggregate and index data based on declarative interaction specifications to facilitate effective interactive visualization. We demonstrate the effectiveness and usefulness of P5 through a variety of example applications and several performance benchmark tests.
|
26
|
Ma Y, Xie T, Li J, Maciejewski R. Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:1075-1085. [PMID: 31478859 DOI: 10.1109/tvcg.2019.2934631] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Machine learning models are currently being deployed in a variety of real-world applications where model predictions are used to make decisions about healthcare, bank loans, and numerous other critical tasks. As the deployment of artificial intelligence technologies becomes ubiquitous, it is unsurprising that adversaries have begun developing methods to manipulate machine learning models to their advantage. While the visual analytics community has developed methods for opening the black box of machine learning models, little work has focused on helping the user understand their model vulnerabilities in the context of adversarial attacks. In this paper, we present a visual analytics framework for explaining and exploring model vulnerabilities to adversarial attacks. Our framework employs a multi-faceted visualization scheme designed to support the analysis of data poisoning attacks from the perspective of models, data instances, features, and local structures. We demonstrate our framework through two case studies on binary classifiers and illustrate model vulnerabilities with respect to varying attack strategies.
|
27
|
Fujiwara T, Chou JK, Xu P, Ren L, Ma KL. An Incremental Dimensionality Reduction Method for Visualizing Streaming Multidimensional Data. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:418-428. [PMID: 31449024 DOI: 10.1109/tvcg.2019.2934433] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Dimensionality reduction (DR) methods are commonly used for analyzing and visualizing multidimensional data. However, when data is a live streaming feed, conventional DR methods cannot be directly used because of their computational complexity and inability to preserve the projected data positions at previous time points. In addition, the problem becomes even more challenging when the dynamic data records have a varying number of dimensions as often found in real-world applications. This paper presents an incremental DR solution. We enhance an existing incremental PCA method in several ways to ensure its usability for visualizing streaming multidimensional data. First, we use geometric transformation and animation methods to help preserve a viewer's mental map when visualizing the incremental results. Second, to handle data dimension variants, we use an optimization method to estimate the projected data positions, and also convey the resulting uncertainty in the visualization. We demonstrate the effectiveness of our design with two case studies using real-world datasets.
|
28
|
Abstract
As visualization becomes widespread in a broad range of cross-disciplinary academic domains, such as the digital humanities (DH), critical voices have been raised on the perils of neglecting the uncertain character of data in the visualization design process. Visualizations that, purposely or not, obscure or remove uncertainty in its different forms from the scholars’ vision may negatively affect the manner in which humanities scholars regard computational methods as useful tools in their daily work. In this paper, we address the issue of uncertainty representation in the context of the humanities from a theoretical perspective, in an attempt to provide the foundations of a framework that allows for the construction of ecological interface designs which are able to expose the computational power of the algorithms at play while, at the same time, respecting the particularities and needs of humanistic research. To this end, we review past uncertainty taxonomies in other domains typically related to the humanities and visualization, such as cartography and GIScience. From this review, we select an uncertainty taxonomy related to the humanities that we link to recent research in visualization for the DH. Finally, we bring a novel analytics method developed by other authors (Progressive Visual Analytics) into question, which we argue can be a good candidate to resolve the aforementioned difficulties in DH practice.
|
29
|
Abstract
Progressive visualization offers a great deal of promise for big data visualization; however, current progressive visualization systems do not allow for continuous interaction. What if users want to see more confident results on a subset of the visualization? This can happen when users are in exploratory analysis mode but want to ask some directed questions of the data as well. In a progressive visualization system, the online aggregation algorithm determines the database sampling rate and resulting convergence rate, not the user. In this paper, we extend a recent method in online aggregation, called Wander Join, that is optimized for queries that join tables, one of the most computationally expensive operations. This extension leverages importance sampling to enable user-driven sampling when data joins are in the query. We applied user interaction techniques that allow the user to view and adjust the convergence rate, providing more transparency and control over the online aggregation process. By leveraging importance sampling, our extension of Wander Join also allows for stratified sampling of groups when there is data distribution skew. We also improve the convergence rate of filtering queries, but with additional overhead costs not needed in the original Wander Join algorithm.
|
30
|
Visual Analysis Scenarios for Understanding Evolutionary Computational Techniques’ Behavior. INFORMATION 2019. [DOI: 10.3390/info10030088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Machine learning algorithms are used in many applications nowadays. Sometimes, we need to describe how the created decision models produce their output, and this may not be an easy task. Information visualization (InfoVis) techniques (e.g., TreeMap, parallel coordinates, etc.) can be used for creating scenarios that visually describe the behavior of those models. Thus, InfoVis scenarios were used to analyze the evolutionary process of a tool named AutoClustering, which generates density-based clustering algorithms automatically for a given dataset using the EDA (estimation-of-distribution algorithm) evolutionary technique. Some scenarios were about fitness and population evolution (clustering algorithms) over time, algorithm parameters, the occurrence of the individual, and others. The analysis of those scenarios could lead to the development of better parameters for the AutoClustering tool and algorithms and thus have a direct impact on the processing time and quality of the generated algorithms.
|
31
|
Martinez-Carrasco AL, Juarez JM, Campos M, Morales A, Palacios F, Lopez-Rodriguez L. Interpretable Patient Subgrouping Using Trace-Based Clustering. Artif Intell Med 2019. [DOI: 10.1007/978-3-030-21642-9_33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
32
|
Zhang J, Wang Y, Molino P, Li L, Ebert DS. Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:364-373. [PMID: 30130197 DOI: 10.1109/tvcg.2018.2864499] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Interpretation and diagnosis of machine learning models have gained renewed interest in recent years with breakthroughs in new approaches. We present Manifold, a framework that utilizes visual analysis techniques to support interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner. Conventional techniques usually focus on visualizing the internal logic of a specific model type (i.e., deep neural networks), lacking the ability to extend to a more complex scenario where different model types are integrated. To this end, Manifold is designed as a generic framework that does not rely on or access the internal logic of the model and solely observes the input (i.e., instances or features) and the output (i.e., the predicted result and probability distribution). We describe the workflow of Manifold as an iterative process consisting of three major phases that are commonly involved in the model development and diagnosis process: inspection (hypothesis), explanation (reasoning), and refinement (verification). The visual components supporting these tasks include a scatterplot-based visual summary that overviews the models' outcome and a customizable tabular view that reveals feature discrimination. We demonstrate current applications of the framework on the classification and regression tasks and discuss other potential machine learning use scenarios where Manifold can be applied.
|
33
|
Rauber PE, Falcão AX, Telea AC. Projections as visual aids for classification system design. INFORMATION VISUALIZATION 2018; 17:282-305. [PMID: 30263012 PMCID: PMC6131729 DOI: 10.1177/1473871617713337] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Dimensionality reduction is a compelling alternative for high-dimensional data visualization. This method provides insight into high-dimensional feature spaces by mapping relationships between observations (high-dimensional vectors) to low (two or three) dimensional spaces. These low-dimensional representations support tasks such as outlier and group detection based on direct visualization. Supervised learning, a subfield of machine learning, is also concerned with observations. A key task in supervised learning consists of assigning class labels to observations based on generalization from previous experience. Effective development of such classification systems depends on many choices, including feature descriptors, learning algorithms, and hyperparameters. These choices are not trivial, and there is no simple recipe to improve classification systems that perform poorly. In this context, we first propose the use of visual representations based on dimensionality reduction (projections) for predictive feedback on classification efficacy. Second, we propose a projection-based visual analytics methodology, and supportive tooling, that can be used to improve classification systems through feature selection. We evaluate our proposal through experiments involving four datasets and three representative learning algorithms.
Collapse
Affiliation(s)
- Paulo E Rauber
- Department of Mathematics and Computing Science, University of Groningen, Groningen, The Netherlands
- University of Campinas, Campinas, Brazil
- Alexandru C Telea
- Department of Mathematics and Computing Science, University of Groningen, Groningen, The Netherlands
|
34
|
El-Assady M, Sperrle F, Deussen O, Keim D, Collins C. Visual Analytics for Topic Model Optimization based on User-Steerable Speculative Execution. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:374-384. [PMID: 30235133 DOI: 10.1109/tvcg.2018.2864769] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
To effectively assess the potential consequences of human interventions in model-driven analytics systems, we establish the concept of speculative execution as a visual analytics paradigm for creating user-steerable preview mechanisms. This paper presents an explainable, mixed-initiative topic modeling framework that integrates speculative execution into the algorithmic decision-making process. Our approach visualizes the model-space of our novel incremental hierarchical topic modeling algorithm, unveiling its inner workings. We support the active incorporation of the user's domain knowledge in every step through explicit model manipulation interactions. In addition, users can initialize the model with expected topic seeds, the backbone priors. For a more targeted optimization, the modeling process automatically triggers a speculative execution of various optimization strategies, and requests feedback whenever the measured model quality deteriorates. Users compare the proposed optimizations to the current model state and preview their effect on the next model iterations, before applying one of them. This supervised human-in-the-loop process targets maximum improvement for minimum feedback and has proven to be effective in three independent studies that confirm topic model quality improvements.
|
35
|
Affiliation(s)
- Amelia McNamara
- Statistical and Data Sciences, Smith College, Northampton, MA
|
36
|
|
37
|
Vrotsou K, Nordman A. Exploratory Visual Sequence Mining Based on Pattern-Growth. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 25:2597-2610. [PMID: 29994660 DOI: 10.1109/tvcg.2018.2848247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Sequential pattern mining finds applications in numerous diverging fields. Due to the problem's combinatorial nature, two main challenges arise. First, existing algorithms output large numbers of patterns, many of which are uninteresting from a user's perspective. Second, as datasets grow, mining large numbers of patterns gets computationally expensive. There is, thus, a need for mining approaches that make it possible to focus the pattern search towards directions of interest. This work tackles this problem by combining interactive visualization with sequential pattern mining in order to create a "transparent box" execution model. We propose a novel approach to interactive visual sequence mining that allows the user to guide the execution of a pattern-growth algorithm at suitable points through a powerful visual interface. Our approach (1) introduces the possibility of using local constraints during the mining process, (2) allows stepwise visualization of patterns being mined, and (3) enables the user to steer the mining algorithm towards directions of interest. The use of local constraints significantly improves users' capability to progressively refine the search space without the need to restart computations. We exemplify our approach using two event sequence datasets: one composed of web page visits and another composed of individuals' activity sequences.
|
38
|
Muhlbacher T, Linhardt L, Moller T, Piringer H. TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:174-183. [PMID: 28866575 DOI: 10.1109/tvcg.2017.2745158] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without statistical background a confident and efficient identification of suitable decision trees.
|
39
|
Hollt T, Pezzotti N, van Unen V, Koning F, Lelieveldt BPF, Vilanova A. CyteGuide: Visual Guidance for Hierarchical Single-Cell Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:739-748. [PMID: 28866537 DOI: 10.1109/tvcg.2017.2744318] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Single-cell analysis through mass cytometry has become an increasingly important tool for immunologists to study the immune system in health and disease. Mass cytometry creates a high-dimensional description vector for single cells by time-of-flight measurement. Recently, t-Distributed Stochastic Neighbor Embedding (t-SNE) has emerged as one of the state-of-the-art techniques for the visualization and exploration of single-cell data. Ever increasing amounts of data lead to the adoption of Hierarchical Stochastic Neighbor Embedding (HSNE), enabling the hierarchical representation of the data. Here, the hierarchy is explored selectively by the analyst, who can request more and more detail in areas of interest. Such hierarchies are usually explored by visualizing disconnected plots of selections in different levels of the hierarchy. This poses problems for navigation, by imposing a high cognitive load on the analyst. In this work, we present an interactive summary-visualization to tackle this problem. CyteGuide guides the analyst through the exploration of hierarchically represented single-cell data, and provides a complete overview of the current state of the analysis. We conducted a two-phase user study with domain experts who use HSNE for data exploration. In a field study, we first examined the problems with their current HSNE workflow and the requirements for easing it. These requirements formed the basis for our visual design. In the second phase, we verified our proposed solution in a user evaluation.
|
40
|
Batch A, Elmqvist N. The Interactive Visualization Gap in Initial Exploratory Data Analysis. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:278-287. [PMID: 28866512 DOI: 10.1109/tvcg.2017.2743990] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Data scientists and other analytic professionals often use interactive visualization in the dissemination phase at the end of a workflow during which findings are communicated to a wider audience. Visualization scientists, however, hold that interactive representation of data can also be used during exploratory analysis itself. Since the use of interactive visualization is optional rather than mandatory, this leaves a "visualization gap" during initial exploratory analysis that is the onus of visualization researchers to fill. In this paper, we explore areas where visualization would be beneficial in applied research by conducting a design study using a novel variation on contextual inquiry conducted with professional data analysts. Based on these interviews and experiments, we propose a set of interactive initial exploratory visualization guidelines which we believe will promote adoption by this type of user.
|
41
|
Gleicher M. Considerations for Visualizing Comparison. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:413-423. [PMID: 28866530 DOI: 10.1109/tvcg.2017.2744199] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Supporting comparison is a common and diverse challenge in visualization. Such support is difficult to design because solutions must address both the specifics of their scenario as well as the general issues of comparison. This paper aids designers by providing a strategy for considering those general issues. It presents four considerations that abstract comparison. These considerations identify issues and categorize solutions in a domain independent manner. The first considers how the common elements of comparison-a target set of items that are related and an action the user wants to perform on that relationship-are present in an analysis problem. The second considers why these elements lead to challenges because of their scale, in number of items, complexity of items, or complexity of relationship. The third considers what strategies address the identified scaling challenges, grouping solutions into three broad categories. The fourth considers which visual designs map to these strategies to provide solutions for a comparison analysis problem. In sequence, these considerations provide a process for developers to consider support for comparison in the design of visualization tools. Case studies show how these considerations can help in the design and evaluation of visualization solutions for comparison problems.
|
42
|
Pezzotti N, Höllt T, van Gemert J, Lelieveldt BPF, Eisemann E, Vilanova A. DeepEyes: Progressive Visual Analytics for Designing Deep Neural Networks. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2018; 24:98-108. [PMID: 28866543 DOI: 10.1109/tvcg.2017.2744358] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Deep neural networks are now rivaling human accuracy in several pattern recognition problems. Compared to traditional classifiers, where features are handcrafted, neural networks learn increasingly complex features directly from the data. Instead of handcrafting the features, it is now the network architecture that is manually engineered. The network architecture parameters such as the number of layers or the number of filters per layer and their interconnections are essential for good performance. Even though basic design guidelines exist, designing a neural network is an iterative trial-and-error process that takes days or even weeks to perform due to the large datasets used for training. In this paper, we present DeepEyes, a Progressive Visual Analytics system that supports the design of neural networks during training. We present novel visualizations, supporting the identification of layers that learned a stable set of patterns and, therefore, are of interest for a detailed analysis. The system facilitates the identification of problems, such as superfluous filters or layers, and information that is not being captured by the network. We demonstrate the effectiveness of our system through multiple use cases, showing how a trained network can be compressed, reshaped and adapted to different problems.
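One of the problems the abstract mentions, spotting superfluous filters, can be illustrated with a simple heuristic. This is a sketch of the general idea rather than the DeepEyes system itself; the variance threshold and activation data are made up for illustration.

```python
import numpy as np

def low_variance_filters(activations, threshold=1e-3):
    """Flag filters whose activations barely vary across inputs.

    activations: array of shape (n_samples, n_filters).
    Returns the indices of filters with activation variance below
    threshold -- candidates for pruning (they carry little signal).
    """
    variances = activations.var(axis=0)
    return np.flatnonzero(variances < threshold)

rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 8))   # 8 hypothetical filters
acts[:, 3] = 0.5                   # a "dead" filter: constant output
acts[:, 6] *= 1e-3                 # a nearly silent filter

print(low_variance_filters(acts))  # → [3 6]
```

DeepEyes surfaces this kind of information visually and during training, so the practitioner can compress or reshape the network instead of waiting for a full run to finish.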
|
43
|
Pattern discovery: A progressive visual analytic design to support categorical data analysis. JOURNAL OF VISUAL LANGUAGES AND COMPUTING 2017. [DOI: 10.1016/j.jvlc.2017.05.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]
|
44
|
What you see is what you can change: Human-centered machine learning by interactive visualization. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.105] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
45
|
Federico P, Heimerl F, Koch S, Miksch S. A Survey on Visual Approaches for Analyzing Scientific Literature and Patents. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:2179-2198. [PMID: 27654646 DOI: 10.1109/tvcg.2016.2610422] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
The increasingly large number of available writings describing technical and scientific progress calls for advanced analytic tools for their efficient analysis. This is true for many application scenarios in science and industry and for different types of writings, comprising patents and scientific articles. Despite important differences between patents and scientific articles, both have a variety of common characteristics that lead to similar search and analysis tasks. However, the analysis and visualization of these documents is not a trivial task due to the complexity of the documents as well as the large number of possible relations between their multivariate attributes. In this survey, we review interactive analysis and visualization approaches of patents and scientific articles, ranging from exploration tools to sophisticated mining methods. In a bottom-up approach, we categorize them according to two aspects: (a) data type (text, citations, authors, metadata, and combinations thereof), and (b) task (finding and comparing single entities, seeking elementary relations, finding complex patterns, and in particular temporal patterns, and investigating connections between multiple behaviours). Finally, we identify challenges and research directions in this area that ask for future investigations.
|
46
|
von Landesberger T, Fellner DW, Ruddle RA. Visualization System Requirements for Data Processing Pipeline Design and Optimization. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:2028-2041. [PMID: 28113376 DOI: 10.1109/tvcg.2016.2603178] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
The rising quantity and complexity of data creates a need to design and optimize data processing pipelines-the set of data processing steps, parameters and algorithms that perform operations on the data. Visualization can support this process but, although there are many examples of systems for visual parameter analysis, there remains a need to systematically assess users' requirements and match those requirements to exemplar visualization methods. This article presents a new characterization of the requirements for pipeline design and optimization. This characterization is based on both a review of the literature and first-hand assessment of eight application case studies. We also match these requirements with exemplar functionality provided by existing visualization tools. Thus, we provide end-users and visualization developers with a way of identifying functionality that addresses data processing problems in an application. We also identify seven future challenges for visualization research that are not met by the capabilities of today's systems.
|
47
|
Pezzotti N, Lelieveldt BPF, van der Maaten L, Höllt T, Eisemann E, Vilanova A. Approximated and User Steerable tSNE for Progressive Visual Analytics. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:1739-1752. [PMID: 28113434 DOI: 10.1109/tvcg.2016.2570755] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Progressive Visual Analytics aims at improving the interactivity in existing analytics techniques by means of visualization as well as interaction with intermediate results. One key method for data analysis is dimensionality reduction, for example, to produce 2D embeddings that can be visualized and analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a well-suited technique for the visualization of high-dimensional data. tSNE can create meaningful intermediate results but suffers from a slow initialization that constrains its application in Progressive Visual Analytics. We introduce a controllable tSNE approximation (A-tSNE), which trades off speed and accuracy, to enable interactive data exploration. We offer real-time visualization techniques, including a density-based solution and a Magic Lens to inspect the degree of approximation. With this feedback, the user can decide on local refinements and steer the approximation level during the analysis. We demonstrate our technique with several datasets, in a real-world research scenario and for the real-time analysis of high-dimensional streams to illustrate its effectiveness for interactive data analysis.
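For context on what A-tSNE accelerates, the baseline embedding step looks like the following. This sketch uses the standard scikit-learn tSNE on synthetic data, not the paper's A-tSNE approximation; the cluster layout and parameters are made up for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two well-separated clusters in 50 dimensions (synthetic stand-in
# for high-dimensional data such as single-cell measurements).
X = np.vstack([rng.normal(0, 1, (25, 50)),
               rng.normal(8, 1, (25, 50))])

# Standard tSNE produces a 2D embedding for plotting; its slow
# optimization on large data is exactly what A-tSNE trades accuracy
# against, so intermediate results can be shown and steered.
embedding = TSNE(n_components=2, perplexity=5, init="pca",
                 random_state=0).fit_transform(X)
print(embedding.shape)  # (50, 2)
```

The progressive variant replaces this one blocking call with a stream of refinable approximations that the analyst can inspect and locally refine mid-computation.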
|
48
|
|
49
|
Liu S, Maljovec D, Wang B, Bremer PT, Pascucci V. Visualizing High-Dimensional Data: Advances in the Past Decade. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:1249-1268. [PMID: 28113321 DOI: 10.1109/tvcg.2016.2640960] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
Massive simulations and arrays of sensing devices, in combination with increasing computing resources, have generated large, complex, high-dimensional datasets used to study phenomena across numerous fields of study. Visualization plays an important role in exploring such datasets. We provide a comprehensive survey of advances in high-dimensional data visualization that focuses on the past decade. We aim at providing guidance for data practitioners to navigate through a modular view of the recent advances, inspiring the creation of new visualizations along the enriched visualization pipeline, and identifying future opportunities for visualization research.
|
50
|
Dinkla K, Strobelt H, Genest B, Reiling S, Borowsky M, Pfister H. Screenit: Visual Analysis of Cellular Screens. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2017; 23:591-600. [PMID: 27875174 DOI: 10.1109/tvcg.2016.2598587] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
High-throughput and high-content screening enables large scale, cost-effective experiments in which cell cultures are exposed to a wide spectrum of drugs. The resulting multivariate data sets have a large but shallow hierarchical structure. The deepest level of this structure describes cells in terms of numeric features that are derived from image data. The subsequent level describes enveloping cell cultures in terms of imposed experiment conditions (exposure to drugs). We present Screenit, a visual analysis approach designed in close collaboration with screening experts. Screenit enables the navigation and analysis of multivariate data at multiple hierarchy levels and at multiple levels of detail. Screenit integrates the interactive modeling of cell physical states (phenotypes) and the effects of drugs on cell cultures (hits). In addition, quality control is enabled via the detection of anomalies that indicate low-quality data, while providing an interface that is designed to match workflows of screening experts. We demonstrate analyses for a real-world data set, CellMorph, with 6 million cells across 20,000 cell cultures.
|