1. Ye Y, Xiao S, Zeng X, Zeng W. ModalChorus: Visual Probing and Alignment of Multi-Modal Embeddings via Modal Fusion Map. IEEE Transactions on Visualization and Computer Graphics 2025; 31:294-304. [PMID: 39250410] [DOI: 10.1109/tvcg.2024.3456387]
Abstract
Multi-modal embeddings, such as the widely used CLIP text-image embeddings, form the foundation of vision-language models. However, these embeddings are vulnerable to subtle misalignment of cross-modal features, resulting in decreased model performance and diminished generalization. To address this problem, we design ModalChorus, an interactive system for visual probing and alignment of multi-modal embeddings. ModalChorus primarily offers a two-stage process: 1) embedding probing with Modal Fusion Map (MFM), a novel parametric dimensionality reduction method that integrates both metric and nonmetric objectives to enhance modality fusion; and 2) embedding alignment that allows users to interactively articulate intentions for both point-set and set-set alignments. Quantitative and qualitative comparisons of CLIP embeddings with existing dimensionality reduction (e.g., t-SNE and MDS) and data fusion (e.g., data context map) methods demonstrate the advantages of MFM in showcasing cross-modal features over common vision-language datasets. Case studies reveal that ModalChorus can facilitate intuitive discovery of misalignment and efficient re-alignment in scenarios ranging from zero-shot classification to cross-modal retrieval and generation.
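As a rough, self-contained illustration of the kind of embedding probing the abstract describes (a joint 2-D projection plus cross-modal similarity checks, i.e., the t-SNE-style baseline rather than the Modal Fusion Map itself), the sketch below uses synthetic paired embeddings; all data, dimensions, and thresholds are hypothetical.

```python
# Illustrative only: probing paired "image" and "text" embeddings with t-SNE
# and cosine similarity, in the spirit of the baselines the paper compares
# against (not MFM itself). Embeddings here are synthetic stand-ins.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
n_pairs, dim = 200, 512                      # e.g., CLIP ViT-B/32 output size
image_emb = rng.normal(size=(n_pairs, dim))
text_emb = image_emb + 0.8 * rng.normal(size=(n_pairs, dim))  # noisy "paired" texts

# Joint 2-D projection of both modalities for visual inspection.
joint = np.vstack([image_emb, text_emb])
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(joint)
image_xy, text_xy = xy[:n_pairs], xy[n_pairs:]

# Flag potentially misaligned pairs: a caption that matches some other image
# clearly better than its own image.
sim = cosine_similarity(text_emb, image_emb)
own = np.diag(sim)
best = sim.max(axis=1)
misaligned = np.where(best - own > 0.05)[0]
print(f"{len(misaligned)} of {n_pairs} pairs look misaligned")
```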
2. Li Y, Wang J, Aboagye P, Yeh CCM, Zheng Y, Wang L, Zhang W, Ma KL. Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning. IEEE Transactions on Visualization and Computer Graphics 2024; 30:2875-2887. [PMID: 38625780] [PMCID: PMC11412260] [DOI: 10.1109/tvcg.2024.3388514]
Abstract
Recent advancements in pre-trained language-image models have ushered in a new era of visual comprehension. Leveraging the power of these models, this article tackles two issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of data biases within them; (2) the evaluation of image captions and steering of their generation process. On the one hand, by visually examining the captions generated from language-image models for an image dataset, we gain deeper insights into the visual contents, unearthing data biases that may be entrenched within the dataset. On the other hand, by depicting the association between visual features and textual captions, we expose the weaknesses of pre-trained language-image models in their captioning capability and propose an interactive interface to steer caption generation. The two parts have been coalesced into a coordinated visual analytics system, fostering the mutual enrichment of visual and textual contents. We validate the effectiveness of the system with domain practitioners through concrete case studies with large-scale image datasets.
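For readers who want a concrete starting point, a minimal sketch of batch caption generation with an off-the-shelf language-image model follows; the specific model (BLIP via the Hugging Face pipeline) and the photos/ folder are assumptions, not the article's actual captioning setup.

```python
# Minimal sketch: generate captions for a folder of images with an
# off-the-shelf captioner; the paper's model and dataset may differ.
from pathlib import Path
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

captions = {}
for path in sorted(Path("photos").glob("*.jpg")):      # hypothetical folder
    result = captioner(str(path))                       # [{'generated_text': '...'}]
    captions[path.name] = result[0]["generated_text"]

# Captions can then be clustered or projected alongside the images to
# surface dataset biases, as the paper's exploration view does.
for name, text in captions.items():
    print(f"{name}: {text}")
```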
3. Li R, Cui W, Song T, Xie X, Ding R, Wang Y, Zhang H, Zhou H, Wu Y. Causality-Based Visual Analysis of Questionnaire Responses. IEEE Transactions on Visualization and Computer Graphics 2024; 30:638-648. [PMID: 37903040] [DOI: 10.1109/tvcg.2023.3327376]
Abstract
As the final stage of questionnaire analysis, causal reasoning is the key to turning responses into valuable insights and actionable items for decision-makers. During questionnaire analysis, classical statistical methods (e.g., Differences-in-Differences) have been widely exploited to evaluate causality between questions. However, due to the huge search space and complex causal structure in data, causal reasoning is still extremely challenging and time-consuming, and is often conducted in a trial-and-error manner. Meanwhile, existing visual methods for causal reasoning struggle to combine scalability with expert knowledge and can hardly be used in the questionnaire scenario. In this work, we present a systematic solution to help analysts effectively and efficiently explore questionnaire data and derive causality. Based on an association mining algorithm, we mine question combinations with potential inner causality and help analysts interactively explore the causal sub-graph of each question combination. Furthermore, leveraging the requirements collected from experts, we build a visualization tool and conduct a comparative study with a state-of-the-art system to demonstrate the usability and efficiency of our system.
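A minimal sketch of the association-mining step described above, assuming Apriori over one-hot encoded responses via mlxtend; the question/answer columns and thresholds are hypothetical, and the paper's actual algorithm may differ.

```python
# Rough sketch: find question-answer combinations that frequently co-occur,
# as candidate pairs for causal inspection. Columns and thresholds are invented.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded responses: each column is "question=answer".
responses = pd.DataFrame({
    "q1=satisfied":  [1, 1, 0, 1, 0, 1, 1, 0],
    "q2=recommends": [1, 1, 0, 1, 0, 1, 0, 0],
    "q3=new_user":   [0, 1, 1, 0, 1, 0, 1, 1],
}).astype(bool)

frequent = apriori(responses, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)

# High-confidence, high-lift rules become candidate edges for the causal sub-graph.
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```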
4. Rashid U, Saddal M, Khan AR, Manzoor S, Ahmad N. I-Cubid: a nonlinear cubic graph-based approach to visualize and in-depth browse Flickr image results. PeerJ Computer Science 2023; 9:e1476. [PMID: 37705650] [PMCID: PMC10496001] [DOI: 10.7717/peerj-cs.1476]
Abstract
Existing image search engines let web users explore images in grids. This traditional interaction is linear and lookup-based: scanning web search results proceeds horizontally and vertically and cannot support in-depth browsing. This research emphasizes the significance of a multidimensional exploration scheme over traditional grid layouts in visually exploring web image search results. It examines the implications of visualization and related in-depth browsing via a multidimensional cubic graph representation over a search engine result page (SERP), and uncovers usability issues in the traditional grid and 3-dimensional web image search space. We provide multidimensional cubic visualization and nonlinear in-depth browsing of web image search results. The proposed approach employs textual annotations and descriptions to represent results in cubic graphs that further support in-depth browsing via a search user interface (SUI) design. It allows nonlinear navigation in web image search results and enables exploration, browsing, visualization, previewing/viewing, and accessing images in a nonlinear, interactive, and usable way. Usability tests and detailed statistical significance analysis confirm the efficacy of the cubic presentation over grid layouts. The investigation reveals improvements in overall user satisfaction, screen design, information and terminology, and system capability in exploring web image search results.
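As a toy illustration of linking image results through shared textual annotations so they can be browsed nonlinearly (graph hops rather than row-by-row scanning), the sketch below uses networkx; it is not the paper's cubic-graph construction, and the result set is invented.

```python
# Toy sketch: connect image results that share annotation terms, then browse
# "in depth" by following edges from the currently viewed image.
import networkx as nx

# Hypothetical search results with annotation terms.
results = {
    "img_01": {"beach", "sunset", "sea"},
    "img_02": {"sunset", "mountain"},
    "img_03": {"sea", "boat"},
    "img_04": {"mountain", "snow"},
}

G = nx.Graph()
G.add_nodes_from(results)
for a in results:
    for b in results:
        if a < b and results[a] & results[b]:        # share at least one term
            G.add_edge(a, b, terms=results[a] & results[b])

# Nonlinear navigation: from the current image, jump to related images.
current = "img_01"
print("related to", current, "->", list(G.neighbors(current)))
```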
Affiliation(s)
- Umer Rashid: Department of Computer Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Maha Saddal: Department of Computer Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Abdur Rehman Khan: Department of Computer Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Sadia Manzoor: Department of Computer Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Naveed Ahmad: College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
5. Song Y, Tang F, Dong W, Huang F, Lee TY, Xu C. Balance-Aware Grid Collage for Small Image Collections. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1330-1344. [PMID: 34529567] [DOI: 10.1109/tvcg.2021.3113031]
Abstract
Grid collages (GClg) of small image collections are popular and useful in many applications, such as personal album management, online photo posting, and graphic design. In this article, we focus on how visual effects influence individual preferences through various arrangements of multiple images in such scenarios. A novel balance-aware metric is proposed to bridge the gap between multi-image joint presentation and visual pleasure, incorporating findings from psychology into the field of grid collage. To capture user preference, a bonus mechanism related to a user-specified special location in the grid and uniqueness values of the subimages is integrated into the metric. An end-to-end reinforcement learning mechanism empowers the model without tedious manual annotations. Experiments demonstrate that our metric can evaluate GClg visual balance in line with human subjective perception, and the model can generate visually pleasant GClg results comparable to manual designs.
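A highly simplified stand-in for a balance-aware score is sketched below: weight each grid cell by its saliency mass and measure how far the weighted centroid drifts from the grid center. The paper's metric and its reinforcement-learning optimization are considerably richer; the saliency values here are invented.

```python
# Simplified "visual balance" score for a grid collage: 1.0 means the
# saliency centroid sits exactly at the grid centre. Illustration only.
import numpy as np

def balance_score(saliency_per_cell: np.ndarray) -> float:
    """saliency_per_cell: (rows, cols) array of nonnegative saliency mass."""
    rows, cols = saliency_per_cell.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    total = saliency_per_cell.sum()
    cy = (ys * saliency_per_cell).sum() / total
    cx = (xs * saliency_per_cell).sum() / total
    centre = np.array([(rows - 1) / 2, (cols - 1) / 2])
    max_dist = np.linalg.norm(centre) or 1.0        # guard for 1x1 grids
    return 1.0 - np.linalg.norm(np.array([cy, cx]) - centre) / max_dist

grid = np.array([[0.9, 0.1, 0.8],
                 [0.2, 0.5, 0.2],
                 [0.8, 0.1, 0.9]])
print(f"balance score: {balance_score(grid):.2f}")
```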
6. Afzal S, Ghani S, Hittawe MM, Rashid SF, Knio OM, Hadwiger M, Hoteit I. Visualization and Visual Analytics Approaches for Image and Video Datasets: A Survey. ACM Transactions on Interactive Intelligent Systems 2023. [DOI: 10.1145/3576935]
Abstract
Image and video data analysis has become an increasingly important research area with applications in different domains such as security surveillance, healthcare, augmented and virtual reality, video and image editing, activity analysis and recognition, synthetic content generation, distance education, telepresence, remote sensing, sports analytics, art, non-photorealistic rendering, search engines, and social media. Recent advances in Artificial Intelligence (AI) and particularly deep learning have sparked new research challenges and led to significant advancements, especially in image and video analysis. These advancements have also resulted in significant research and development in other areas such as visualization and visual analytics, and have created new opportunities for future lines of research. In this survey paper, we present the current state of the art at the intersection of visualization and visual analytics, and image and video data analysis. We categorize the visualization papers included in our survey based on different taxonomies used in visualization and visual analytics research. We review these papers in terms of task requirements, tools, datasets, and application areas. We also discuss insights based on our survey results, trends and patterns, the current focus of visualization research, and opportunities for future research.
Affiliation(s)
- Shehzad Afzal: King Abdullah University of Science & Technology, Saudi Arabia
- Sohaib Ghani: King Abdullah University of Science & Technology, Saudi Arabia
- Omar M Knio: King Abdullah University of Science & Technology, Saudi Arabia
- Markus Hadwiger: King Abdullah University of Science & Technology, Saudi Arabia
- Ibrahim Hoteit: King Abdullah University of Science & Technology, Saudi Arabia
7. Bertucci D, Hamid MM, Anand Y, Ruangrotsakun A, Tabatabai D, Perez M, Kahng M. DendroMap: Visual Exploration of Large-Scale Image Datasets for Machine Learning with Treemaps. IEEE Transactions on Visualization and Computer Graphics 2023; 29:320-330. [PMID: 36166545] [DOI: 10.1109/tvcg.2022.3209425]
Abstract
In this paper, we present DendroMap, a novel approach to interactively exploring large-scale image datasets for machine learning (ML). ML practitioners often explore image datasets by generating a grid of images or projecting high-dimensional representations of images into 2-D using dimensionality reduction techniques (e.g., t-SNE). However, neither approach effectively scales to large datasets because images are ineffectively organized and interactions are insufficiently supported. To address these challenges, we develop DendroMap by adapting Treemaps, a well-known visualization technique. DendroMap effectively organizes images by extracting hierarchical cluster structures from high-dimensional representations of images. It enables users to make sense of the overall distributions of datasets and interactively zoom into specific areas of interest at multiple levels of abstraction. Our case studies with widely used image datasets for deep learning demonstrate that users can discover insights about datasets and trained models by examining the diversity of images, identifying underperforming subgroups, and analyzing classification errors. We conducted a user study that evaluates the effectiveness of DendroMap in grouping and searching tasks by comparing it with a gridified version of t-SNE and found that participants preferred DendroMap. DendroMap is available at https://div-lab.github.io/dendromap/.
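A minimal sketch of the core organizing step, assuming agglomerative (Ward) clustering over image embeddings and cutting the hierarchy at increasing depths to emulate DendroMap's zoomable levels of abstraction; the embeddings are synthetic and the treemap layout itself is omitted.

```python
# Sketch: build a hierarchy over image embeddings and cut it at coarse-to-fine
# levels, mimicking zoomable levels of abstraction. Embeddings are synthetic.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(500, 128))      # stand-in for CNN/CLIP features

Z = linkage(embeddings, method="ward")        # hierarchical clustering

for n_clusters in (4, 16, 64):                # coarse -> fine "zoom" levels
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    sizes = np.bincount(labels)[1:]
    print(f"{n_clusters:>3} clusters, largest sizes: {sorted(sizes, reverse=True)[:5]}")
```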
8. Tang T, Wu Y, Wu Y, Yu L, Li Y. VideoModerator: A Risk-aware Framework for Multimodal Video Moderation in E-Commerce. IEEE Transactions on Visualization and Computer Graphics 2022; 28:846-856. [PMID: 34587029] [DOI: 10.1109/tvcg.2021.3114781]
Abstract
Video moderation, which refers to removing deviant or explicit content from e-commerce livestreams, has become essential as such livestreams grow prevalent owing to their social and engaging features. However, this task is tedious and time-consuming due to the difficulties of watching and reviewing multimodal video content, including video frames and audio clips. To ensure effective video moderation, we propose VideoModerator, a risk-aware framework that seamlessly integrates human knowledge with machine insights. This framework incorporates a set of advanced machine learning models to extract risk-aware features from multimodal video content and discover potentially deviant videos. Moreover, this framework introduces an interactive visualization interface with three views, namely, a video view, a frame view, and an audio view. In the video view, we adopt a segmented timeline and highlight high-risk periods that may contain deviant information. In the frame view, we present a novel visual summarization method that combines risk-aware features and video context to enable quick video navigation. In the audio view, we employ a storyline-based design to provide a multi-faceted overview that can be used to explore audio content. Furthermore, we report the usage of VideoModerator through a case scenario and conduct experiments and a controlled user study to validate its effectiveness.
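A toy illustration of turning per-frame and per-audio risk scores into a segmented timeline with highlighted high-risk periods follows; the scores are random stand-ins for the paper's learned multimodal risk features, and the fusion rule and threshold are assumptions.

```python
# Toy sketch: fuse per-frame and per-audio risk scores and report timeline
# segments whose pooled risk exceeds a threshold. Scores are synthetic.
import numpy as np

rng = np.random.default_rng(1)
frame_risk = rng.random(600)                  # one score per second of video
audio_risk = rng.random(600)                  # aligned per-second audio scores

fused = np.maximum(frame_risk, audio_risk)    # conservative: either modality can flag

segment_len = 30                              # 30-second timeline segments
segments = fused[: len(fused) // segment_len * segment_len].reshape(-1, segment_len)
segment_risk = segments.max(axis=1)

threshold = 0.98
for i, risk in enumerate(segment_risk):
    if risk > threshold:
        print(f"segment {i}: {i*segment_len}s-{(i+1)*segment_len}s, peak risk {risk:.2f}")
```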
9. Bhattacharya S, Shah VB, Kumar K, Biswas U. A Real-time Interactive Visualizer for Large Classroom. ACM Transactions on Interactive Intelligent Systems 2021. [DOI: 10.1145/3418529]
Abstract
To improve the teaching and learning experience in a classroom environment, it is crucial for a teacher to have a fair idea of which students need help during a lecture. However, teachers of large classes usually face difficulties in identifying the students who are in a critical state. Current methods for classroom visualization are limited in showing both the status and location of a large number of students in a limited display area. Additionally, comprehension of the states adds cognitive load on the teacher working in a time-constrained classroom environment. In this article, we propose a two-level visualizer for large classrooms to address these challenges. In the first level, the visualizer generates a colored matrix representation of the classroom. The colored matrix is a quantitative illustration of the status of the class in terms of student clusters. We use three colors, red, yellow, and green, indicating the most critical, less critical, and normal clusters on the screen, respectively. Tapping or clicking on the first level reveals detailed information for a cluster as the second level. We conducted extensive studies of our visualizer in a simulated classroom with 12 tasks and 27 teacher participants. The results show that the visualizer is efficient and usable.
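A minimal sketch of the first-level view, assuming students are clustered into three groups from a single hypothetical engagement score and the seating grid is colored red/yellow/green; the real system draws on richer student-state signals.

```python
# Sketch: cluster students into three groups and colour a seating-grid matrix.
# The "engagement" scores are invented; the real student states are richer.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
rows, cols = 6, 10                                   # 60-seat classroom
engagement = rng.random((rows, cols))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(engagement.reshape(-1, 1))
# Order clusters so 0 = most critical (lowest engagement), 2 = normal.
order = np.argsort(km.cluster_centers_.ravel())
state = np.zeros_like(km.labels_)
for rank, cluster in enumerate(order):
    state[km.labels_ == cluster] = rank

plt.imshow(state.reshape(rows, cols),
           cmap=ListedColormap(["red", "gold", "green"]), vmin=0, vmax=2)
plt.title("Classroom status (red = most critical)")
plt.show()
```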
Affiliation(s)
- Krishna Kumar: Indian Institute of Technology Guwahati, Guwahati, India
- Ujjwal Biswas: Indian Institute of Technology Guwahati, Guwahati, India
10. Zahalka J, Worring M, Van Wijk JJ. II-20: Intelligent and Pragmatic Analytic Categorization of Image Collections. IEEE Transactions on Visualization and Computer Graphics 2021; 27:422-431. [PMID: 33074815] [DOI: 10.1109/tvcg.2020.3030383]
Abstract
In this paper, we introduce II-20 (Image Insight 2020), a multimedia analytics approach for analytic categorization of image collections. Advanced visualizations for image collections exist, but they need tight integration with a machine model to support the task of analytic categorization. Directly employing computer vision and interactive learning techniques gravitates towards search. Analytic categorization, however, is not machine classification (the difference between the two is called the pragmatic gap): a human adds/redefines/deletes categories of relevance on the fly to build insight, whereas the machine classifier is rigid and non-adaptive. Analytic categorization that truly brings the user to insight requires a flexible machine model that allows dynamic sliding on the exploration-search axis, as well as semantic interactions: a human thinks about image data mostly in semantic terms. II-20 brings three major contributions to multimedia analytics on image collections and towards closing the pragmatic gap. Firstly, a new machine model that closely follows the user's interactions and dynamically models her categories of relevance. II-20's machine model, in addition to matching and exceeding the state of the art's ability to produce relevant suggestions, allows the user to dynamically slide on the exploration-search axis without any additional input from her side. Secondly, the dynamic, one-image-at-a-time Tetris metaphor that synergizes with the model: it allows a well-trained model to analyze the collection by itself with minimal interaction from the user and complements the classic grid metaphor. Thirdly, the fast-forward interaction, which allows the user to harness the model to quickly expand ("fast-forward") the categories of relevance and extends the multimedia analytics semantic interaction dictionary. Automated experiments show that II-20's machine model outperforms the existing state of the art and also demonstrate the Tetris metaphor's analytic quality. User studies further confirm that II-20 is an intuitive, efficient, and effective multimedia analytics tool.
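A crude stand-in for an interactively updated relevance model is sketched below: a linear classifier refreshed with partial_fit as the analyst labels items into categories created on the fly. II-20's actual model and suggestion logic are more sophisticated; features, categories, and labels here are simulated.

```python
# Sketch: incrementally update a relevance classifier as the analyst labels
# images into user-defined categories, then rank items for the next round.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(3)
features = rng.normal(size=(1000, 64))                 # stand-in image features

categories = ["wildlife", "buildings", "irrelevant"]   # hypothetical user categories
classes = np.arange(len(categories))
model = SGDClassifier(loss="log_loss")

# Simulated interaction loop: the user labels one image per round.
for step in range(50):
    idx = rng.integers(len(features))
    label = rng.integers(len(categories))              # in reality, the analyst's judgement
    model.partial_fit(features[idx:idx + 1], [label], classes=classes)

# Suggest items most confidently assigned to "wildlife" for the next round.
scores = model.decision_function(features)[:, categories.index("wildlife")]
print("top suggestions:", np.argsort(scores)[-5:][::-1])
```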
11. Chotisarn N, Lu J, Ma L, Xu J, Meng L, Lin B, Xu Y, Luo X, Chen W. Bubble storytelling with automated animation: a Brexit hashtag activism case study. Journal of Visualization 2020; 24:101-115. [PMID: 32904885] [PMCID: PMC7459253] [DOI: 10.1007/s12650-020-00690-7]
Abstract
Hashtag data are common and easy to acquire; thus, they are widely used in studies and visual data storytelling. For example, a recent story by China Central Television Europe depicts Brexit as a hashtag movement displayed on an animated bubble chart. However, creating such a story is usually laborious and tedious, because narrators have to switch between different tools and discuss with different collaborators. To reduce this burden, we develop a prototype system that helps explore the bubbles' movement by automatically inserting animations connected to the video creators' storytelling and viewers' interaction with those videos. We demonstrate the usability of our method through both use cases and a semi-structured user study.
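A bare-bones animated bubble chart in the spirit of the hashtag story is sketched below with matplotlib; the data are synthetic, and the prototype's automatic animation and annotation logic are not reproduced.

```python
# Sketch: bubble size follows a hashtag's weekly volume over time.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

rng = np.random.default_rng(0)
n_tags, n_weeks = 8, 52
volume = np.abs(np.cumsum(rng.normal(size=(n_weeks, n_tags)), axis=0)) + 1
x, y = rng.random(n_tags), rng.random(n_tags)        # fixed bubble positions

fig, ax = plt.subplots()
scat = ax.scatter(x, y, s=volume[0] * 50, alpha=0.6)
ax.set(xticks=[], yticks=[])

def update(week):
    scat.set_sizes(volume[week] * 50)
    ax.set_title(f"week {week + 1}")
    return scat,

anim = FuncAnimation(fig, update, frames=n_weeks, interval=200)
plt.show()
```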
Affiliation(s)
- Junhua Lu: State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China
- Libinzi Ma: State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China
- Jingli Xu: State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China
- Linhao Meng: State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China
- Bingru Lin: State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China
- Ying Xu: State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China
- Xiaonan Luo: Guilin University of Electronic Technology, Guilin, China
- Wei Chen: State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China
12. Zeng W, Dong A, Chen X, Cheng ZL. VIStory: interactive storyboard for exploring visual information in scientific publications. Journal of Visualization 2020; 24:69-84. [PMID: 32837222] [PMCID: PMC7429144] [DOI: 10.1007/s12650-020-00688-1]
Abstract
Many visual analytics approaches have been developed for examining scientific publications, which comprise rich data such as authors and citations. These studies provide unprecedented insights for a variety of applications, e.g., literature review and collaboration analysis. However, visual information (e.g., figures), which is widely employed for storytelling and method description, is often neglected. We present VIStory, an interactive storyboard for exploring visual information in scientific publications. We harvest a new dataset comprising a large corpus of figures, using an automatic figure extraction method. Each figure carries various attributes, such as dominant color and width/height ratio, together with faceted metadata of the publication, including venues, authors, and keywords. To depict this information, we develop an intuitive interface consisting of three components: (1) the Faceted View enables efficient queries by publication metadata, benefiting from a nested table structure; (2) the Storyboard View arranges paper rings, a well-designed glyph for depicting figure attributes, in a themeriver layout to reveal temporal trends; and (3) the Endgame View presents a highlighted figure together with the publication metadata. We illustrate the applicability of VIStory with case studies on two datasets, i.e., 10-year IEEE VIS publications and publications by a research team at the CVPR, ICCV, and ECCV conferences. Quantitative and qualitative results from a formal user study demonstrate the efficiency of VIStory in exploring visual information in scientific publications.
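A small sketch of computing two of the figure attributes the paper rings encode, dominant color and width/height ratio, is given below; figure.png is a hypothetical path, and the figure-extraction pipeline is not shown.

```python
# Sketch: compute aspect ratio and dominant colour for one extracted figure.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = Image.open("figure.png").convert("RGB")        # hypothetical extracted figure
width, height = img.size
aspect_ratio = width / height

pixels = np.asarray(img.resize((64, 64))).reshape(-1, 3)   # downsample for speed
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pixels)
dominant = km.cluster_centers_[np.bincount(km.labels_).argmax()].astype(int)

print(f"aspect ratio: {aspect_ratio:.2f}, dominant colour (RGB): {tuple(dominant)}")
```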
Affiliation(s)
- Wei Zeng: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Ao Dong: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
- Xi Chen: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China
- Zhang-Lin Cheng: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
13.
14. Content semantic image analysis and storage method based on intelligent computing of machine learning annotation. Neural Computing and Applications 2020. [DOI: 10.1007/s00521-020-04739-4]