1. Ye Y, Xiao S, Zeng X, Zeng W. ModalChorus: Visual Probing and Alignment of Multi-Modal Embeddings via Modal Fusion Map. IEEE Transactions on Visualization and Computer Graphics 2025; 31:294-304. [PMID: 39250410] [DOI: 10.1109/tvcg.2024.3456387]
Abstract
Multi-modal embeddings, such as CLIP embeddings (the most widely used text-image embeddings), form the foundation of vision-language models. However, these embeddings are vulnerable to subtle misalignment of cross-modal features, which degrades model performance and generalization. To address this problem, we design ModalChorus, an interactive system for visual probing and alignment of multi-modal embeddings. ModalChorus primarily offers a two-stage process: 1) embedding probing with Modal Fusion Map (MFM), a novel parametric dimensionality reduction method that integrates both metric and nonmetric objectives to enhance modality fusion; and 2) embedding alignment that allows users to interactively articulate intentions for both point-set and set-set alignments. Quantitative and qualitative comparisons of CLIP embeddings with existing dimensionality reduction (e.g., t-SNE and MDS) and data fusion (e.g., data context map) methods demonstrate the advantages of MFM in showcasing cross-modal features over common vision-language datasets. Case studies reveal that ModalChorus facilitates intuitive discovery of misalignment and efficient re-alignment in scenarios ranging from zero-shot classification to cross-modal retrieval and generation.
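To make the "metric plus nonmetric objectives" idea concrete, here is a minimal sketch (not the MFM method itself) of a parametric 2-D projector trained with a combined loss: a stress term that preserves pairwise distances and an ordinal triplet term that preserves distance orderings. Layer sizes, the loss weighting, and the triplet sampling scheme are assumptions for illustration only.

```python
# Illustrative sketch only: a parametric projector with a combined
# metric (stress) + nonmetric (ordinal triplet) objective.
import torch
import torch.nn as nn

class ParametricProjector(nn.Module):
    def __init__(self, dim_in, dim_hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, dim_hidden), nn.ReLU(),
            nn.Linear(dim_hidden, 2),
        )

    def forward(self, x):
        return self.net(x)

def combined_loss(z, x, margin=0.1, alpha=0.5):
    """Metric term: preserve pairwise distances (stress).
    Nonmetric term: preserve distance *orderings* via random triplets."""
    d_hi = torch.cdist(x, x)   # distances in the embedding space
    d_lo = torch.cdist(z, z)   # distances in the 2-D projection
    metric = ((d_hi - d_lo) ** 2).mean()

    n = x.shape[0]
    a, p, q = (torch.randint(0, n, (256,)) for _ in range(3))
    # If anchor a is closer to p than to q in high-D, keep that order in 2-D.
    closer = (d_hi[a, p] < d_hi[a, q]).float()
    violation = torch.relu(d_lo[a, p] - d_lo[a, q] + margin)
    nonmetric = (closer * violation).mean()
    return alpha * metric + (1 - alpha) * nonmetric

# Usage with placeholder data standing in for concatenated image/text embeddings:
emb = torch.randn(512, 768)
model = ParametricProjector(768)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = combined_loss(model(emb), emb)
    loss.backward()
    opt.step()
```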

2. Prasad V, van Sloun RJG, Vilanova A, Pezzotti N. ProactiV: Studying Deep Learning Model Behavior Under Input Transformations. IEEE Transactions on Visualization and Computer Graphics 2024; 30:5651-5665. [PMID: 37535493] [DOI: 10.1109/tvcg.2023.3301722]
Abstract
Deep learning (DL) models have shown performance benefits across many applications, from classification to image-to-image translation. However, low interpretability often leads to unexpected model behavior once deployed in the real world, usually because the training data domain does not reflect the deployment data domain. Identifying a model's breaking points under input conditions and domain shifts, i.e., input transformations, is essential for improving models. Although visual analytics (VA) has shown promise in studying the behavior of model outputs under continually varying inputs, existing methods mainly focus on per-class or instance-level analysis. We aim to generalize beyond classification to tasks where classes do not exist and to provide a global view of model behavior under co-occurring input transformations. We present ProactiV, a DL model-agnostic VA method that helps model developers proactively study output behavior under input transformations to identify and verify breaking points. ProactiV relies on a proposed input optimization method to determine the changes to a given transformed input needed to achieve the desired output. The data from this optimization process enables the study of global and local model behavior under input transformations at scale. Additionally, the optimization method provides insights into the input characteristics that result in desired outputs and helps recognize model biases. We highlight how ProactiV effectively supports studying model behavior with example classification and image-to-image translation tasks.
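As a rough illustration of the input-optimization idea (not the paper's exact formulation), the sketch below optimizes a change `delta` to a transformed input so that a differentiable model recovers a desired reference output; the loss choice, step count, and learning rate are placeholder assumptions.

```python
# Minimal sketch of gradient-based input optimization under stated assumptions.
import torch

def optimize_input(model, x_transformed, y_desired, steps=200, lr=0.05):
    """Find the change `delta` to a transformed input that recovers the
    desired output; the norm of `delta` hints at how close the input is
    to a breaking point."""
    delta = torch.zeros_like(x_transformed, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        y_hat = model(x_transformed + delta)
        loss = torch.nn.functional.mse_loss(y_hat, y_desired)
        loss.backward()
        opt.step()
    return delta.detach(), loss.item()
```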

3. Xie L, Ouyang Y, Chen L, Wu Z, Li Q. Towards Better Modeling With Missing Data: A Contrastive Learning-Based Visual Analytics Perspective. IEEE Transactions on Visualization and Computer Graphics 2024; 30:5129-5146. [PMID: 37310838] [DOI: 10.1109/tvcg.2023.3285210]
Abstract
Missing data can pose a challenge for machine learning (ML) modeling. Current approaches, categorized into feature imputation and label prediction, primarily focus on handling missing data to enhance ML performance. Because they rely on the observed data to estimate missing values, they suffer from three main shortcomings in imputation: the need for different imputation methods for different missing-data mechanisms, heavy dependence on assumptions about the data distribution, and the potential introduction of bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, in which the ML model learns the similarity between an incomplete sample and its complete counterpart and the dissimilarity from other samples. Our approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study addresses the challenges of ML modeling in the presence of missing data with a practical solution that achieves both high predictive accuracy and model interpretability.
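The sketch below illustrates the general pattern described here, not the paper's exact loss: an InfoNCE-style objective in which a sample with masked (missing) features and its complete counterpart form the positive pair, while other samples in the batch act as negatives. The masking scheme, temperature, and encoder interface are assumptions.

```python
# Illustrative contrastive objective for incomplete/complete sample pairs.
import torch
import torch.nn.functional as F

def masked_view(x, missing_rate=0.3):
    """Simulate missingness by zeroing out a random subset of features."""
    mask = (torch.rand_like(x) > missing_rate).float()
    return x * mask

def contrastive_loss(encoder, x, temperature=0.1):
    z_full = F.normalize(encoder(x), dim=1)               # complete samples
    z_miss = F.normalize(encoder(masked_view(x)), dim=1)  # incomplete views
    logits = z_miss @ z_full.t() / temperature            # batch similarity matrix
    targets = torch.arange(x.shape[0])                    # positives on the diagonal
    return F.cross_entropy(logits, targets)
```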

4. Li Y, Wang J, Aboagye P, Yeh CCM, Zheng Y, Wang L, Zhang W, Ma KL. Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning. IEEE Transactions on Visualization and Computer Graphics 2024; 30:2875-2887. [PMID: 38625780] [PMCID: PMC11412260] [DOI: 10.1109/tvcg.2024.3388514]
Abstract
Recent advancements in pre-trained language-image models have ushered in a new era of visual comprehension. Leveraging the power of these models, this article tackles two issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of data biases within them; (2) the evaluation of image captions and steering of their generation process. On the one hand, by visually examining the captions generated from language-image models for an image dataset, we gain deeper insights into the visual contents, unearthing data biases that may be entrenched within the dataset. On the other hand, by depicting the association between visual features and textual captions, we expose the weaknesses of pre-trained language-image models in their captioning capability and propose an interactive interface to steer caption generation. The two parts have been coalesced into a coordinated visual analytics system, fostering the mutual enrichment of visual and textual contents. We validate the effectiveness of the system with domain practitioners through concrete case studies with large-scale image datasets.
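One simple way to think about the image-caption association such a system can depict is a cosine-similarity matrix between image and caption embeddings from a language-image model; the sketch below uses random arrays as stand-ins for real embeddings, so the encoder and dimensionality are assumptions rather than the paper's pipeline.

```python
# Sketch: image-caption association via cosine similarity of embeddings.
import numpy as np

rng = np.random.default_rng(0)
image_emb = rng.normal(size=(1000, 512))    # one row per image (placeholder)
caption_emb = rng.normal(size=(1000, 512))  # one row per generated caption (placeholder)

def cosine_matrix(a, b):
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

assoc = cosine_matrix(image_emb, caption_emb)
# Low diagonal values flag images whose own caption fits them poorly,
# a natural starting point for steering caption generation.
weak = np.argsort(np.diag(assoc))[:20]
```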

5. Routray SK. Visualization and Visual Analytics in Autonomous Driving. IEEE Computer Graphics and Applications 2024; 44:43-53. [PMID: 38526907] [DOI: 10.1109/mcg.2024.3381450]
Abstract
Autonomous driving is no longer a topic of science fiction; autonomous driving technologies have advanced to the point of being reliable. Effectively harnessing the information they produce is essential for enhancing the safety, reliability, and efficiency of autonomous vehicles. In this article, we explore the pivotal role of visualization and visual analytics (VA) techniques in autonomous driving. By employing sophisticated data visualization methods, VA researchers and practitioners transform intricate datasets into intuitive visual representations, providing valuable insights for decision-making processes. This article delves into various visualization approaches, including spatial-temporal mapping, interactive dashboards, and machine learning-driven analytics, tailored specifically for autonomous driving scenarios. Furthermore, it investigates the integration of real-time sensor data, sensor coordination with VA, and machine learning algorithms to create comprehensive visualizations. This research advocates for the pivotal role of visualization and VA in shaping the future of autonomous driving systems, fostering innovation, and ensuring the safe integration of self-driving vehicles.

6. Yang B, Tang H, Wang X, Liang X, Qin H, Hu H. Data-Driven Insights Into Urban Intersections: Visual Analytics of High-Value Scene. IEEE Computer Graphics and Applications 2024; 44:30-42. [PMID: 38648158] [DOI: 10.1109/mcg.2024.3392417]
Abstract
In this article, we propose TraVis, an interactive system that allows users to explore and analyze complex traffic trajectory data at urban intersections. Trajectory data contain a large amount of spatio-temporal information, and while previous studies have mainly focused on the macroscopic aspects of traffic flow, TraVis employs visualization methods to investigate and analyze microscopic traffic events (i.e., high-value scenes in trajectory data). TraVis features a novel view design and provides multiple interaction modalities to offer users intuitive insights into high-value scenes. With the system, users can better understand urban intersection traffic information, identify different types of high-value scenes, and explore the reasons behind their occurrence. Through two case studies, we illustrate how to use the system and validate its effectiveness.

7. Wang Q, L'Yi S, Gehlenborg N. DRAVA: Aligning Human Concepts with Machine Learning Latent Dimensions for the Visual Exploration of Small Multiples. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI) 2023; 2023:833. [PMID: 38074525] [PMCID: PMC10707479] [DOI: 10.1145/3544548.3581127]
Abstract
Latent vectors extracted by machine learning (ML) are widely used in data exploration (e.g., t-SNE) but suffer from a lack of interpretability. While previous studies employed disentangled representation learning (DRL) to enable more interpretable exploration, they often overlooked the potential mismatches between the concepts of humans and the semantic dimensions learned by DRL. To address this issue, we propose Drava, a visual analytics system that supports users in 1) relating the concepts of humans with the semantic dimensions of DRL and identifying mismatches, 2) providing feedback to minimize the mismatches, and 3) obtaining data insights from concept-driven exploration. Drava provides a set of visualizations and interactions based on visual piles to help users understand and refine concepts and conduct concept-driven exploration. Meanwhile, Drava employs a concept adaptor model to fine-tune the semantic dimensions of DRL based on user refinement. The usefulness of Drava is demonstrated through application scenarios and experimental validation.
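To illustrate the concept-adaptor idea in the simplest terms, the sketch below fits a linear probe that maps latent dimensions to a user-defined concept score and is refit whenever the user corrects item labels. This is a hand-rolled stand-in under stated assumptions, not Drava's concept adaptor model.

```python
# Minimal "concept adaptor" sketch: a linear probe over latent dimensions.
import numpy as np
from sklearn.linear_model import LogisticRegression

class ConceptAdaptor:
    def __init__(self):
        self.probe = LogisticRegression(max_iter=1000)

    def fit(self, latents, user_labels):
        # latents: (n_items, n_latent_dims); user_labels: 0/1 concept judgments
        self.probe.fit(latents, user_labels)

    def concept_score(self, latents):
        return self.probe.predict_proba(latents)[:, 1]

    def most_aligned_dimension(self):
        # Which latent dimension currently drives the concept the most?
        return int(np.argmax(np.abs(self.probe.coef_[0])))
```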
Affiliation(s)
- Sehi L'Yi: Harvard Medical School, Boston, MA, USA

8. Small-Object Detection based on YOLOv5 in Autonomous Driving Systems. Pattern Recognition Letters 2023. [DOI: 10.1016/j.patrec.2023.03.009]

9. Espadoto M, Appleby G, Suh A, Cashman D, Li M, Scheidegger C, Anderson EW, Chang R, Telea AC. UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1559-1572. [PMID: 34748493] [DOI: 10.1109/tvcg.2021.3125576]
Abstract
Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of multi-dimensional spaces on a 2D screen. Although many such methods exist, comparatively little work has been done on generalizable methods of inverse projection: the process of mapping the projected points, or more generally the projection space, back to the original high-dimensional space. In this article, we present NNInv, a deep learning technique with the ability to approximate the inverse of any projection or mapping. NNInv learns to reconstruct high-dimensional data from any arbitrary point in a 2D projection space, giving users the ability to interact with the learned high-dimensional representation in a visual analytics system. We provide an analysis of the parameter space of NNInv and offer guidance in selecting these parameters. We further validate the effectiveness of NNInv through a series of quantitative and qualitative analyses. We then demonstrate the method's utility by applying it to three visualization tasks: interactive instance interpolation, classifier agreement, and gradient visualization.
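The core training setup of an inverse projection can be sketched as follows: a small network maps 2-D projected coordinates back to the original feature space using (projection, original) pairs. The layer sizes, optimizer, and training budget here are illustrative assumptions, not NNInv's published configuration.

```python
# Sketch of the inverse-projection idea: learn 2-D -> high-D from paired data.
import torch
import torch.nn as nn

class InverseProjector(nn.Module):
    def __init__(self, dim_out, dim_hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, dim_hidden), nn.ReLU(),
            nn.Linear(dim_hidden, dim_hidden), nn.ReLU(),
            nn.Linear(dim_hidden, dim_out),
        )

    def forward(self, p):
        return self.net(p)

def train_inverse(points_2d, points_hd, epochs=200, lr=1e-3):
    model = InverseProjector(points_hd.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(points_2d), points_hd)
        loss.backward()
        opt.step()
    return model  # model(any 2-D point) approximates a high-dimensional sample
```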

10. Hoque MN, He W, Shekar AK, Gou L, Ren L. Visual Concept Programming: A Visual Analytics Approach to Injecting Human Intelligence at Scale. IEEE Transactions on Visualization and Computer Graphics 2023; 29:74-83. [PMID: 36166533] [DOI: 10.1109/tvcg.2022.3209466]
Abstract
Data-centric AI has emerged as a new research area that systematically engineers data to land AI models in real-world applications. As a core method for data-centric AI, data programming helps experts inject domain knowledge into data and label data at scale using carefully designed labeling functions (e.g., heuristic rules, logistics). Although data programming has shown great success in the NLP domain, it is challenging to program image data because of a) the difficulty of describing images with a visual vocabulary in the absence of human annotations and b) the lack of efficient tools for data programming of images. We present Visual Concept Programming, a first-of-its-kind visual analytics approach that uses visual concepts to program image data at scale while requiring little human effort. Our approach is built upon three unique components. It first uses a self-supervised learning approach to learn visual representations at the pixel level and extract a dictionary of visual concepts from images without any human annotations. The visual concepts serve as building blocks of labeling functions through which experts inject their domain knowledge. We then design interactive visualizations to explore and understand visual concepts and to compose labeling functions with concepts without writing code. Finally, with the composed labeling functions, users can label the image data at scale and use the labeled data to refine the pixel-wise visual representation and concept quality. We evaluate the learned pixel-wise visual representation on the downstream task of semantic segmentation to show the effectiveness and usefulness of our approach. In addition, we demonstrate how our approach tackles real-world problems of image retrieval for autonomous driving.
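For readers unfamiliar with data programming, the sketch below shows what concept-based labeling functions look like when combined with a simple majority vote. The concept names and label scheme are hypothetical, and real pipelines typically replace the vote with a learned label model.

```python
# Illustrative concept-based labeling functions with a naive majority vote.
ABSTAIN, ROAD_HAZARD, CLEAR = -1, 1, 0

def lf_debris(concepts):            # concepts: set of detected visual concepts
    return ROAD_HAZARD if "debris" in concepts else ABSTAIN

def lf_empty_lane(concepts):
    return CLEAR if "lane_marking" in concepts and "object" not in concepts else ABSTAIN

LABELING_FUNCTIONS = [lf_debris, lf_empty_lane]

def weak_label(concepts):
    votes = [lf(concepts) for lf in LABELING_FUNCTIONS if lf(concepts) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)   # majority vote over non-abstains

print(weak_label({"debris"}))  # -> 1 (ROAD_HAZARD)
```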

11. Wang J, Zhang W, Yang H, Yeh CCM, Wang L. Visual Analytics for RNN-Based Deep Reinforcement Learning. IEEE Transactions on Visualization and Computer Graphics 2022; 28:4141-4155. [PMID: 33929961] [DOI: 10.1109/tvcg.2021.3076749]
Abstract
Deep reinforcement learning (DRL) targets to train an autonomous agent to interact with a pre-defined environment and strives to achieve specific goals through deep neural networks (DNN). Recurrent neural network (RNN) based DRL has demonstrated superior performance, as RNNs can effectively capture the temporal evolution of the environment and respond with proper agent actions. However, apart from the outstanding performance, little is known about how RNNs understand the environment internally and what has been memorized over time. Revealing these details is extremely important for deep learning experts to understand and improve DRLs, which in contrast, is also challenging due to the complicated data transformations inside these models. In this article, we propose Deep Reinforcement Learning Interactive Visual Explorer (DRLIVE), a visual analytics system to effectively explore, interpret, and diagnose RNN-based DRLs. Having focused on DRL agents trained for different Atari games, DRLIVE accomplishes three tasks: game episode exploration, RNN hidden/cell state examination, and interactive model perturbation. Using the system, one can flexibly explore a DRL agent through interactive visualizations, discover interpretable RNN cells by prioritizing RNN hidden/cell states with a set of metrics, and further diagnose the DRL model by interactively perturbing its inputs. Through concrete studies with multiple deep learning experts, we validated the efficacy of DRLIVE.
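One possible prioritization metric, given here as an assumption rather than DRLIVE's own metric set, is to rank each hidden dimension by the absolute correlation of its trajectory with the agent's action value over an episode.

```python
# Sketch: rank RNN hidden dimensions by correlation with the agent's action value.
import numpy as np

def rank_hidden_dims(hidden_states, action_values):
    """hidden_states: (timesteps, hidden_dim); action_values: (timesteps,)."""
    scores = []
    for d in range(hidden_states.shape[1]):
        c = np.corrcoef(hidden_states[:, d], action_values)[0, 1]
        scores.append(0.0 if np.isnan(c) else abs(c))
    return np.argsort(scores)[::-1]     # most action-correlated dimensions first

# Usage with placeholder data:
h = np.random.randn(500, 256)
q = np.random.randn(500)
top_cells = rank_hidden_dims(h, q)[:10]
```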

12. Deng Z, Weng D, Liu S, Tian Y, Xu M, Wu Y. A survey of urban visual analytics: Advances and future directions. Computational Visual Media 2022; 9:3-39. [PMID: 36277276] [PMCID: PMC9579670] [DOI: 10.1007/s41095-022-0275-7]
Abstract
Developing effective visual analytics systems demands care in characterizing domain problems and integrating visualization techniques and computational models. Urban visual analytics has already achieved remarkable success in tackling urban problems and providing fundamental services for smart cities. To promote further academic research and assist the development of industrial urban analytics systems, we comprehensively review urban visual analytics studies from four perspectives. In particular, we identify 8 urban domains and 22 types of popular visualization, analyze 7 types of computational methods, and categorize existing systems into 4 types based on their integration of visualization techniques and computational models. We conclude with potential research directions and opportunities.
Affiliation(s)
- Zikun Deng: State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
- Di Weng: Microsoft Research Asia, Beijing 100080, China
- Shuhan Liu: State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
- Yuan Tian: State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China
- Mingliang Xu: School of Information Engineering, Zhengzhou University, Zhengzhou, China; Henan Institute of Advanced Technology, Zhengzhou University, Zhengzhou 450001, China
- Yingcai Wu: State Key Lab of CAD & CG, Zhejiang University, Hangzhou 310058, China

13. Chen C, Wu J, Wang X, Xiang S, Zhang SH, Tang Q, Liu S. Towards Better Caption Supervision for Object Detection. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1941-1954. [PMID: 34962870] [DOI: 10.1109/tvcg.2021.3138933]
Abstract
As training high-performance object detectors requires expensive bounding-box annotations, recent methods resort to freely available image captions. However, detectors trained with caption supervision perform poorly because captions are usually noisy and cannot provide precise location information. To tackle this issue, we present a visual analysis method that tightly integrates caption supervision with object detection so that the two mutually enhance each other. In particular, object labels are first extracted from captions and used to train the detectors. Then, the objects detected from images are fed back into caption supervision for further improvement. To effectively loop users into the object detection process, a node-link-based set visualization supported by a multi-type relational co-clustering algorithm is developed to explain the relationships between the extracted labels and the images with detected objects. The co-clustering algorithm clusters labels and images simultaneously by utilizing both their representations and their relationships. Quantitative evaluations and a case study demonstrate the efficiency and effectiveness of the developed method in improving the performance of object detectors.
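The label-extraction step can be pictured with the simple sketch below: match caption words against a detector's category vocabulary with naive plural stripping. The category list is hypothetical, and real pipelines would use proper linguistic parsing rather than string matching.

```python
# Minimal sketch of extracting weak object labels from a caption.
CATEGORIES = {"person", "car", "dog", "bicycle", "traffic light"}

def labels_from_caption(caption):
    words = caption.lower().replace(",", " ").replace(".", " ").split()
    singular = {w[:-1] if w.endswith("s") else w for w in words}
    found = {c for c in CATEGORIES if " " not in c and c in singular}
    # handle multi-word categories by substring match
    found |= {c for c in CATEGORIES if " " in c and c in caption.lower()}
    return found

print(labels_from_caption("Two dogs chase a car near the traffic light."))
# -> {'dog', 'car', 'traffic light'} (set order may vary)
```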

14. Hou Y, Wang C, Wang J, Xue X, Zhang XL, Zhu J, Wang D, Chen S. Visual Evaluation for Autonomous Driving. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1030-1039. [PMID: 34723804] [DOI: 10.1109/tvcg.2021.3114777]
Abstract
Autonomous driving technologies often use state-of-the-art artificial intelligence algorithms to understand the relationship between the vehicle and the external environment, to predict changes in the environment, and then to plan and control the behaviors of the vehicle accordingly. The complexity of such technologies makes it challenging to evaluate the performance of autonomous driving systems and to find ways to improve them. Current approaches to evaluating such systems largely use a single score to indicate overall performance, but domain experts have difficulty understanding how individual components or algorithms contribute to that score. To address this problem, we collaborate with domain experts in autonomous driving algorithms and propose a visual evaluation method for autonomous driving. Our method considers the data generated in all components during the whole autonomous driving process, including perception results, planning routes, prediction of obstacles, various controlling parameters, and evaluation of comfort. We develop a visual analytics workflow that integrates an evaluation mathematical model with adjustable parameters, supports evaluation from the level of overall performance down to detailed measures of individual components, and shows both evaluation scores and their contributing factors. Our implemented visual analytics system provides an overview evaluation score and animates how the scores change over time. Experts can interactively explore a specific component at different time periods and identify related factors. With our method, domain experts not only learn about the performance of an autonomous driving system but also identify and access the problematic parts of each component. Our visual evaluation system can be applied to autonomous driving simulation systems and used for various evaluation cases. Results from applying the system to several simulation cases, together with feedback from the involved domain experts, confirm the usefulness and efficiency of our method in helping people gain in-depth insight into autonomous driving systems.

15. Jamonnak S, Zhao Y, Huang X, Amiruzzaman M. Geo-Context Aware Study of Vision-Based Autonomous Driving Models and Spatial Video Data. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1019-1029. [PMID: 34596546] [DOI: 10.1109/tvcg.2021.3114853]
Abstract
Vision-based deep learning (DL) methods have made great progress in learning autonomous driving models from large-scale crowd-sourced video datasets. They are trained to predict instantaneous driving behaviors from video data captured by on-vehicle cameras. In this paper, we develop a geo-context aware visualization system for the study of Autonomous Driving Model (ADM) predictions together with large-scale ADM video data. The visual study is seamlessly integrated with the geographical environment by combining DL model performance with geospatial visualization techniques. Model performance measures can be studied together with a set of geospatial attributes over map views. Users can also discover and compare prediction behaviors of multiple DL models in both city-wide and street-level analysis, together with road images and video contents. Therefore, the system provides a new visual exploration platform for DL model designers in autonomous driving. Use cases and domain expert evaluation show the utility and effectiveness of the visualization system.

16. He W, Zou L, Shekar AK, Gou L, Ren L. Where Can We Help? A Visual Analytics Approach to Diagnosing and Improving Semantic Segmentation of Movable Objects. IEEE Transactions on Visualization and Computer Graphics 2022; 28:1040-1050. [PMID: 34587077] [DOI: 10.1109/tvcg.2021.3114855]
Abstract
Semantic segmentation is a critical component in autonomous driving and has to be thoroughly evaluated due to safety concerns. Deep neural network (DNN) based semantic segmentation models are widely used in autonomous driving. However, it is challenging to evaluate DNN-based models due to their black-box-like nature, and it is even more difficult to assess model performance for crucial objects, such as lost cargo and pedestrians, in autonomous driving applications. In this work, we propose VASS, a Visual Analytics approach to diagnosing and improving the accuracy and robustness of Semantic Segmentation models, especially for critical objects moving in various driving scenes. The key component of our approach is context-aware spatial representation learning, which extracts important spatial information about objects, such as position, size, and aspect ratio, with respect to given scene contexts. We first use this spatial representation to create visual summarizations for analyzing model performance. We then use it to guide the generation of adversarial examples that evaluate models' spatial robustness and yield actionable insights. We demonstrate the effectiveness of VASS via two case studies of lost cargo detection and pedestrian detection in autonomous driving. For both cases, we show quantitative evaluation of the improvement in model performance with actionable insights obtained from VASS.
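To make the kind of spatial information involved concrete, the sketch below computes hand-crafted descriptors (normalized position, size, aspect ratio) directly from a binary object mask. VASS learns a context-aware representation; this is only an illustrative stand-in.

```python
# Sketch: simple spatial descriptors of one object from its binary mask.
import numpy as np

def spatial_descriptor(mask):
    """mask: (H, W) boolean array marking one object's pixels."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    h, w = mask.shape
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    box_h, box_w = y1 - y0 + 1, x1 - x0 + 1
    return {
        "center_x": (x0 + x1) / 2 / w,        # normalized horizontal position
        "center_y": (y0 + y1) / 2 / h,        # normalized vertical position
        "size": ys.size / (h * w),            # fraction of the scene covered
        "aspect_ratio": box_w / box_h,        # bounding-box aspect ratio
    }
```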