1
Tian Y, Lai S, Cheng Z, Yu T. AI Painting Effect Evaluation of Artistic Improvement with Cross-Entropy and Attention. Entropy (Basel, Switzerland) 2025; 27:348. [PMID: 40282583] [PMCID: PMC12026324] [DOI: 10.3390/e27040348]
Abstract
With the rapid development of AI technology, AI painting tools are increasingly used in art creation. However, the quality of the works that different users produce with these tools varies widely, so identifying the factors that determine how much a user's artistic creation improves after adopting AI painting tools is of practical concern. To address this problem, this paper proposes a new Multi-Classification Attention Support Vector Machine (MCASVM) with a cross-entropy loss function. By identifying and predicting the creativity level of ordinary users after they use AI painting tools, the model compares and analyzes the factors behind strong versus weak gains in artistic creativity. The main contribution of this paper is the Art Creation Ability Assessment Dataset (ACAAD), built from real data collection, which provides data support for subsequent assessments. MCASVM handles the multi-class problem in this dataset directly by combining multiple SVMs. A probabilistic calibration network adjusts the model output so that its predicted probabilities are closer to the true class-membership probabilities, and DBAM strengthens the model's feature fusion by explicitly attending to important channel and spatial features, enabling the model to more accurately recognize and differentiate changes in users' creative abilities before and after using AI painting tools. The experimental results show that AI painting tools can enhance the artistic creativity of ordinary users, and that the most central influencing factors are interest level and social support.
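As a rough illustration of the general recipe named in the abstract (multiple binary SVMs for multi-class prediction, with outputs calibrated toward true class probabilities), here is a minimal scikit-learn sketch. The features, labels, and names are hypothetical placeholders; the paper's actual MCASVM additionally uses a cross-entropy loss and the DBAM attention module, which are not reproduced here.

```python
# Minimal sketch: one-vs-rest SVMs with calibrated output probabilities.
# All data here is synthetic; feature semantics are hypothetical.
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))          # e.g., per-user survey/usage features
y = rng.integers(0, 3, size=300)       # three creativity levels (hypothetical)

# One binary SVM per class; calibration maps decision scores to probabilities
# that better match true class membership, as the abstract describes.
base = OneVsRestClassifier(SVC(kernel="rbf"))
model = CalibratedClassifierCV(base, method="sigmoid", cv=3)
model.fit(X, y)
print(model.predict_proba(X[:2]))      # calibrated class probabilities
```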
Affiliation(s)
- Yihuan Tian, Culture Design Lab, Graduate School of Techno Design, Kookmin University, Seoul 02707, Republic of Korea
- Shiwen Lai, Department of Global Convergence, Kangwon National University, Chuncheon-si 24341, Republic of Korea
- Zuling Cheng, Department of Global Convergence, Kangwon National University, Chuncheon-si 24341, Republic of Korea
- Tao Yu, Department of Smart Experience Design, Kookmin University, Seoul 02707, Republic of Korea
2
Qiu X, Shao S, Wang H, Tan X. Bio-K-Transformer: A pre-trained transformer-based sequence-to-sequence model for adverse drug reactions prediction. Computer Methods and Programs in Biomedicine 2025; 260:108524. [PMID: 39667145] [DOI: 10.1016/j.cmpb.2024.108524]
Abstract
BACKGROUND AND OBJECTIVE: Adverse drug reactions (ADRs) pose a serious threat to patient health, potentially resulting in severe consequences, including mortality. Accurate prediction of ADRs before a drug reaches the market is crucial for early prevention. Traditional ADR detection, which relies on clinical trials and voluntary reporting, has inherent limitations: clinical trials struggle to capture rare and long-term reactions due to scale and time constraints, while voluntary reporting tends to neglect mild and common reactions. Consequently, marketed drugs may carry unknown risks, creating an increasing demand for more accurate pre-market ADR prediction. This study aims to develop a more accurate prediction model for ADRs prior to drug market release. METHODS: We frame ADR prediction as a sequence-to-sequence problem and propose the Bio-K-Transformer, which integrates the transformer model with pre-trained models (i.e., Bio_ClinicalBERT and K-BERT) to forecast potential ADRs. We enhance the attention mechanism of the Transformer encoder, adjust the embedding layers to model diverse relationships between drugs and adverse reactions, and employ a masking technique to handle the target data. Experimental findings demonstrate a notable improvement in predicting potential adverse reactions, achieving a predictive accuracy of 90.08%. This significantly exceeds current state-of-the-art baseline models, and even the fine-tuned Llama-3.1-8B and Llama3-Aloe-8B-Alpha models, while remaining cost-effective. The results highlight the model's efficacy in identifying potential adverse reactions with high precision, sensitivity, and specificity. CONCLUSION: The Bio-K-Transformer significantly enhances the prediction of ADRs, offering a cost-effective method with strong potential for improving pre-market safety evaluations of pharmaceuticals.
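A minimal PyTorch sketch of the sequence-to-sequence framing with a causal mask on the target sequence follows. Dimensions, vocabulary, and tokenization are assumptions for illustration; the actual Bio-K-Transformer builds on Bio_ClinicalBERT/K-BERT embeddings and a modified encoder attention, which this sketch does not attempt to reproduce.

```python
# Sketch: seq2seq transformer with a causal target mask (dummy tokens).
import torch
import torch.nn as nn

vocab, d_model = 1000, 128
embed = nn.Embedding(vocab, d_model)
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)
head = nn.Linear(d_model, vocab)

src = torch.randint(0, vocab, (8, 32))   # drug-description tokens (dummy)
tgt = torch.randint(0, vocab, (8, 16))   # ADR-sequence tokens (dummy)

# Causal mask: each target position attends only to earlier positions.
tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
out = model(embed(src), embed(tgt), tgt_mask=tgt_mask)
logits = head(out)                        # (batch, tgt_len, vocab)
print(logits.shape)
```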
Affiliation(s)
- Xihe Qiu, School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Siyue Shao, School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Haoyu Wang, School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Xiaoyu Tan, INF Technology (Shanghai) Co., Ltd., Shanghai, China
3
Bendeck A, Stasko J. An Empirical Evaluation of the GPT-4 Multimodal Language Model on Visualization Literacy Tasks. IEEE Transactions on Visualization and Computer Graphics 2025; 31:1105-1115. [PMID: 39255141] [DOI: 10.1109/tvcg.2024.3456155]
Abstract
Large Language Models (LLMs) like GPT-4 that support multimodal input (i.e., prompts containing images in addition to text) have immense potential to advance visualization research. However, many questions remain about the visual capabilities of such models, including how well they can read and interpret visually represented data. In our work, we address this question by evaluating the GPT-4 multimodal LLM using a suite of task sets meant to assess the model's visualization literacy. The task sets are based on existing work in the visualization community addressing both automated chart question answering and human visualization literacy across multiple settings. Our assessment finds that GPT-4 can perform tasks such as recognizing trends and extreme values, and it also demonstrates some understanding of visualization design best practices. By contrast, GPT-4 struggles with simple value retrieval when not provided with the original dataset, lacks the ability to reliably distinguish between colors in charts, and occasionally suffers from hallucination and inconsistency. We conclude by reflecting on the model's strengths and weaknesses as well as the potential utility of models like GPT-4 for future visualization research. We also release all code, stimuli, and results for the task sets at the following link: https://doi.org/10.17605/OSF.IO/F39J6.
4
Li R, Hong W, Wu R, Wang Y, Wu X, Shi Z, Xu Y, Han Z, Lv C. Enhancing Wheat Spike Counting and Disease Detection Using a Probability Density Attention Mechanism in Deep Learning Models for Precision Agriculture. Plants (Basel, Switzerland) 2024; 13:3462. [PMID: 39771160] [PMCID: PMC11676397] [DOI: 10.3390/plants13243462]
Abstract
This study aims to improve the precision of wheat spike counting and disease detection, exploring the application of deep learning in the agricultural sector. Addressing the shortcomings of traditional detection methods, we propose an advanced feature extraction strategy and a model based on a probability density attention mechanism, designed to handle feature extraction more effectively in complex backgrounds and dense regions. Through comparative experiments with various advanced models, we comprehensively evaluate the performance of our model. In the disease detection task, the model performs excellently, achieving a precision of 0.93, a recall of 0.89, an accuracy of 0.91, and an mAP of 0.90. By introducing a density loss function, we effectively improve detection accuracy in high-density regions. In the wheat spike counting task, the model likewise performs strongly, with a precision of 0.91, a recall of 0.88, an accuracy of 0.90, and an mAP of 0.90, further validating its effectiveness. This paper also reports ablation experiments on different loss functions. The results provide a new method for wheat spike counting and disease detection and demonstrate the value of deep learning in precision agriculture. By combining the probability density attention mechanism and the density loss function, the proposed model significantly improves detection accuracy and efficiency, offering an important reference for future research.
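The abstract does not give the form of the density loss, but a common density-based counting loss regresses a predicted density map against a Gaussian-smoothed ground-truth point map, so dense regions are penalized proportionally. The sketch below shows that standard construction under assumed kernel width and map sizes, not the paper's exact formulation.

```python
# Sketch: density-map target and a pixel-wise MSE "density loss".
import numpy as np
from scipy.ndimage import gaussian_filter

def density_target(points, shape, sigma=4.0):
    """Turn (row, col) spike annotations into a smooth density map."""
    m = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        m[r, c] += 1.0
    return gaussian_filter(m, sigma)      # integral ~= object count

def density_loss(pred, target):
    return np.mean((pred - target) ** 2)  # pixel-wise MSE

target = density_target([(10, 12), (40, 44), (41, 46)], (64, 64))
pred = target + np.random.default_rng(0).normal(0, 0.01, target.shape)
print(density_loss(pred, target), target.sum())   # count ~= 3
```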
Affiliation(s)
- Chunli Lv, China Agricultural University, Beijing 100083, China
5
Coscia A, Endert A. KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts. IEEE Transactions on Visualization and Computer Graphics 2024; 30:6520-6532. [PMID: 38145514] [DOI: 10.1109/tvcg.2023.3346713]
Abstract
Recent growth in the popularity of large language models has led to their increased usage for summarizing, predicting, and generating text, making it vital to help researchers and engineers understand how and why they work. We present KnowledgeVIS, a human-in-the-loop visual analytics system for interpreting language models using fill-in-the-blank sentences as prompts. By comparing predictions between sentences, KnowledgeVIS reveals learned associations that intuitively connect what language models learn during training to natural language tasks downstream, helping users create and test multiple prompt variations, analyze predicted words using a novel semantic clustering technique, and discover insights using interactive visualizations. Collectively, these visualizations help users identify the likelihood and uniqueness of individual predictions, compare sets of predictions between prompts, and summarize patterns and relationships between predictions across all prompts. We demonstrate the capabilities of KnowledgeVIS with feedback from six NLP experts as well as three use cases: (1) probing biomedical knowledge in two domain-adapted models; and (2) evaluating harmful identity stereotypes and (3) discovering facts and relationships in three general-purpose models.
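The fill-in-the-blank probing that KnowledgeVIS builds on can be reproduced in a few lines with the Hugging Face fill-mask pipeline; the prompts below are illustrative, and the system's clustering and coordinated visualizations sit on top of predictions like these.

```python
# Sketch: compare masked-token predictions between two prompt variants.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
prompts = [
    "The nurse said [MASK] would be late.",
    "The doctor said [MASK] would be late.",
]
for p in prompts:
    top = unmasker(p, top_k=3)
    print(p, "->", [(r["token_str"], round(r["score"], 3)) for r in top])
```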
6
Xie L, Ouyang Y, Chen L, Wu Z, Li Q. Towards Better Modeling With Missing Data: A Contrastive Learning-Based Visual Analytics Perspective. IEEE Transactions on Visualization and Computer Graphics 2024; 30:5129-5146. [PMID: 37310838] [DOI: 10.1109/tvcg.2023.3285210]
Abstract
Missing data can pose a challenge for machine learning (ML) modeling. To address this, current approaches, categorized into feature imputation and label prediction, primarily focus on handling missing data to enhance ML performance. These approaches rely on the observed data to estimate the missing values and therefore suffer three main shortcomings in imputation: the need for different imputation methods under various missing-data mechanisms, heavy dependence on assumptions about the data distribution, and the potential introduction of bias. This study proposes a contrastive learning (CL) framework to model observed data with missing values, in which the ML model learns the similarity between an incomplete sample and its complete counterpart and its dissimilarity to other samples. Our proposed approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study offers a valuable contribution to addressing the challenges of ML modeling in the presence of missing data by providing a practical solution that achieves both high predictive accuracy and model interpretability.
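A minimal sketch of the core contrastive idea, assuming an InfoNCE-style objective: pull the embedding of a masked (incomplete) sample toward its complete counterpart and push it away from other samples. The encoder, masking rate, and dimensions are hypothetical, and CIVis's interactive pair selection is not modeled.

```python
# Sketch: InfoNCE loss between incomplete samples and complete counterparts.
import torch
import torch.nn.functional as F

def info_nce(z_incomplete, z_complete, tau=0.1):
    z1 = F.normalize(z_incomplete, dim=1)
    z2 = F.normalize(z_complete, dim=1)
    logits = z1 @ z2.t() / tau            # (N, N) similarity matrix
    labels = torch.arange(z1.size(0))     # positives on the diagonal
    return F.cross_entropy(logits, labels)

enc = torch.nn.Linear(10, 32)             # stand-in encoder
x = torch.randn(16, 10)
mask = (torch.rand_like(x) > 0.2).float() # simulate ~20% missing features
loss = info_nce(enc(x * mask), enc(x))    # zero-filled "missing" values
print(loss.item())
```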
7
Liu S, Chen R, Ye M, Luo J, Yang D, Dai M. EcoDetect-YOLO: A Lightweight, High-Generalization Methodology for Real-Time Detection of Domestic Waste Exposure in Intricate Environmental Landscapes. Sensors (Basel, Switzerland) 2024; 24:4666. [PMID: 39066064] [PMCID: PMC11280945] [DOI: 10.3390/s24144666]
Abstract
In response to the challenges of accurately identifying and localizing garbage in intricate urban street environments, this paper proposes EcoDetect-YOLO, a garbage exposure detection algorithm based on the YOLOv5s framework, trained on an intricate-environment waste exposure detection dataset constructed in this study. Initially, a convolutional block attention module (CBAM) is integrated between the second (P2) and third (P3) levels of the feature pyramid network to optimize the extraction of relevant garbage features while mitigating background noise. Subsequently, a P2 small-target detection head enhances the model's efficacy in identifying small garbage targets. Lastly, a bidirectional feature pyramid network (BiFPN) is introduced to strengthen the model's capability for deep feature fusion. Experimental results demonstrate EcoDetect-YOLO's adaptability to urban environments and its superior small-target detection capabilities, effectively recognizing nine types of garbage, such as paper and plastic trash. Compared to the baseline YOLOv5s model, EcoDetect-YOLO achieved a 4.7% increase in mAP0.5, reaching 58.1%, with a compact model size of 15.7 MB and an FPS of 39.36. Notably, even in the presence of strong noise, the model maintained an mAP0.5 exceeding 50%, underscoring its robustness. In summary, EcoDetect-YOLO offers high precision, efficiency, and compactness, rendering it suitable for deployment on mobile devices for real-time detection and management of urban garbage exposure, thereby advancing urban automation governance and digital economic development.
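CBAM itself is a published, well-known module: channel attention from pooled descriptors followed by a 7x7 spatial attention map. The compact PyTorch sketch below uses common CBAM defaults; where EcoDetect-YOLO inserts it (between P2 and P3) is as stated in the abstract, and the feature sizes here are arbitrary.

```python
# Sketch: a standard CBAM block (channel attention, then spatial attention).
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, ch, r=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(),
                                 nn.Linear(ch // r, ch))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                 # avg-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))                  # max-pooled descriptor
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel attention
        s = torch.cat([x.mean(1, keepdim=True),
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))          # spatial attention

print(CBAM(64)(torch.randn(1, 64, 80, 80)).shape)
```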
Affiliation(s)
- Shenlin Liu, School of Mathematics and Computer, Guangdong Ocean University, Zhanjiang 524008, China
- Ruihan Chen, School of Mathematics and Computer, Guangdong Ocean University, Zhanjiang 524008, China; Artificial Intelligence Research Institute, International (Macau) Institute of Academic Research, Macau 999078, China
- Minhua Ye, College of Ocean Engineering and Energy, Guangdong Ocean University, Zhanjiang 524088, China
- Jiawei Luo, School of Mathematics and Computer, Guangdong Ocean University, Zhanjiang 524008, China
- Derong Yang, School of Mathematics and Computer, Guangdong Ocean University, Zhanjiang 524008, China
- Ming Dai, School of Mathematics and Computer, Guangdong Ocean University, Zhanjiang 524008, China
8
Kumar S, Sumers TR, Yamakoshi T, Goldstein A, Hasson U, Norman KA, Griffiths TL, Hawkins RD, Nastase SA. Shared functional specialization in transformer-based language models and the human brain. Nat Commun 2024; 15:5523. [PMID: 38951520] [PMCID: PMC11217339] [DOI: 10.1038/s41467-024-49173-5]
Abstract
When processing language, the brain is thought to deploy specialized computations to construct meaning from complex linguistic structures. Recently, artificial neural networks based on the Transformer architecture have revolutionized the field of natural language processing. Transformers integrate contextual information across words via structured circuit computations. Prior work has focused on the internal representations ("embeddings") generated by these circuits. In this paper, we instead analyze the circuit computations directly: we deconstruct these computations into the functionally specialized "transformations" that integrate contextual information across words. Using functional MRI data acquired while participants listened to naturalistic stories, we first verify that the transformations account for considerable variance in brain activity across the cortical language network. We then demonstrate that the emergent computations performed by individual, functionally specialized "attention heads" differentially predict brain activity in specific cortical regions. These heads fall along gradients corresponding to different layers and context lengths in a low-dimensional cortical space.
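The head-wise prediction of brain activity described here follows the standard encoding-model recipe: regress voxel responses onto per-head features and score held-out predictions. The sketch below uses synthetic data and plain ridge regression as an assumed stand-in for the study's regularized regression pipeline.

```python
# Sketch: a simple encoding model scoring per-voxel prediction accuracy.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))     # per-timepoint features from one head
W = rng.normal(size=(64, 20))
Y = X @ W + rng.normal(size=(500, 20))   # 20 "voxels" (synthetic)

Xtr, Xte, Ytr, Yte = train_test_split(X, Y, test_size=0.2, random_state=0)
enc = Ridge(alpha=10.0).fit(Xtr, Ytr)
pred = enc.predict(Xte)
# Per-voxel correlation between predicted and actual activity.
r = [np.corrcoef(pred[:, v], Yte[:, v])[0, 1] for v in range(Y.shape[1])]
print(round(float(np.mean(r)), 3))
```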
Affiliation(s)
- Sreejan Kumar, Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
- Theodore R Sumers, Department of Computer Science, Princeton University, Princeton, NJ 08540, USA
- Takateru Yamakoshi, Faculty of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan
- Ariel Goldstein, Department of Cognitive and Brain Sciences and Business School, Hebrew University, Jerusalem 9190401, Israel
- Uri Hasson, Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Kenneth A Norman, Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Thomas L Griffiths, Departments of Computer Science and Psychology, Princeton University, Princeton, NJ 08540, USA
- Robert D Hawkins, Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08540, USA
- Samuel A Nastase, Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08540, USA
9
Li Y, Wang J, Aboagye P, Yeh CCM, Zheng Y, Wang L, Zhang W, Ma KL. Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning. IEEE Transactions on Visualization and Computer Graphics 2024; 30:2875-2887. [PMID: 38625780] [PMCID: PMC11412260] [DOI: 10.1109/tvcg.2024.3388514]
Abstract
Recent advancements in pre-trained language-image models have ushered in a new era of visual comprehension. Leveraging the power of these models, this article tackles two issues within the realm of visual analytics: (1) the efficient exploration of large-scale image datasets and identification of data biases within them; (2) the evaluation of image captions and steering of their generation process. On the one hand, by visually examining the captions generated from language-image models for an image dataset, we gain deeper insights into the visual contents, unearthing data biases that may be entrenched within the dataset. On the other hand, by depicting the association between visual features and textual captions, we expose the weaknesses of pre-trained language-image models in their captioning capability and propose an interactive interface to steer caption generation. The two parts have been coalesced into a coordinated visual analytics system, fostering the mutual enrichment of visual and textual contents. We validate the effectiveness of the system with domain practitioners through concrete case studies with large-scale image datasets.
10
Delaforge A, Aze J, Bringay S, Mollevi C, Sallaberry A, Servajean M. EBBE-Text: Explaining Neural Networks by Exploring Text Classification Decision Boundaries. IEEE Transactions on Visualization and Computer Graphics 2023; 29:4154-4171. [PMID: 35724275] [DOI: 10.1109/tvcg.2022.3184247]
Abstract
While neural networks (NNs) have been successfully applied to many NLP tasks, the way they function is often difficult to interpret. In this article, we focus on binary text classification via NNs and propose a new tool that increases their interpretability by visualizing the decision boundary and the distances of data elements to this boundary. Our approach uses two innovative views: (1) an overview of the text representation space and (2) a local view allowing data exploration around the decision boundary in various localities of this representation space. These views are integrated into a visual platform, EBBE-Text, which also contains state-of-the-art visualizations of NN representation spaces and several kinds of information obtained from the classification process. The views are linked through numerous interactive functionalities that enable easy exploration of texts and classification results across the complementary views. A user study shows the effectiveness of the visual encoding, and a case study illustrates the benefits of using our tool to analyze the classifications obtained with several recent NNs and two datasets.
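One common way to estimate a sample's distance to a neural classifier's decision boundary, which this kind of tool visualizes, is the first-order approximation |f(x)| / ||grad f(x)||. The autograd sketch below illustrates that approximation on a toy network; it is an assumed stand-in, not EBBE-Text's actual distance computation.

```python
# Sketch: first-order estimate of distance to a binary decision boundary.
import torch

net = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 1))    # logit output

x = torch.randn(1, 16, requires_grad=True)           # a text embedding (dummy)
logit = net(x)
grad, = torch.autograd.grad(logit.sum(), x)
distance = logit.abs().item() / grad.norm().item()   # ~0 means near boundary
print(round(distance, 4))
```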
11
Singh S, Kumar M, Kumar A, Verma BK, Shitharth S. Pneumonia detection with QCSA network on chest X-ray. Sci Rep 2023; 13:9025. [PMID: 37270553] [DOI: 10.1038/s41598-023-35922-x]
Abstract
Worldwide, pneumonia is the leading cause of infant mortality. Experienced radiologists use chest X-rays to diagnose pneumonia and other respiratory diseases, but the complexity of the diagnostic procedure often leads radiologists to disagree. Early diagnosis is the only feasible strategy for mitigating the disease's impact on the patient, and computer-aided diagnosis improves diagnostic accuracy. Recent studies have established that quaternion neural networks classify and predict better than real-valued neural networks, especially when dealing with multi-dimensional or multi-channel input. The attention mechanism is inspired by the human brain's visual and cognitive ability to focus on some portions of an image and ignore the rest; it maximizes the use of the image's relevant aspects, thereby boosting classification accuracy. In the current work, we propose a QCSA network (Quaternion Channel-Spatial Attention Network) that combines spatial and channel attention mechanisms with a quaternion residual network to classify chest X-ray images for pneumonia detection. Using a Kaggle X-ray dataset, the suggested architecture achieved 94.53% accuracy and 0.89 AUC. We also show that integrating the attention mechanism into the QCNN improves performance. Our results indicate that our approach to detecting pneumonia is promising.
Affiliation(s)
- Manoj Kumar, JSS Academy of Technical Education, Noida, India
- Abhay Kumar, National Institute of Technology Patna, Patna, India
- S Shitharth, Kebri Dehar University, Kebri Dehar, Ethiopia
12
Li Y, Wang J, Dai X, Wang L, Yeh CCM, Zheng Y, Zhang W, Ma KL. How Does Attention Work in Vision Transformers? A Visual Analytics Attempt. IEEE Transactions on Visualization and Computer Graphics 2023; 29:2888-2900. [PMID: 37027263] [PMCID: PMC10290521] [DOI: 10.1109/tvcg.2023.3261935]
Abstract
Vision transformer (ViT) expands the success of transformer models from sequential data to images. The model decomposes an image into many smaller patches and arranges them into a sequence. Multi-head self-attention is then applied to the sequence to learn the attention between patches. Despite many successful interpretations of transformers on sequential data, little effort has been devoted to the interpretation of ViTs, and many questions remain unanswered. For example, among the numerous attention heads, which ones are more important? How strongly do individual patches attend to their spatial neighbors in different heads? What attention patterns have individual heads learned? In this work, we answer these questions through a visual analytics approach. Specifically, we first identify which heads are more important in ViTs by introducing multiple pruning-based metrics. Then, we profile the spatial distribution of attention strengths between patches inside individual heads, as well as the trend of attention strengths across attention layers. Third, using an autoencoder-based learning solution, we summarize all possible attention patterns that individual heads could learn. By examining the attention strengths and patterns of the important heads, we explain why they are important. Through concrete case studies with experienced deep learning experts on multiple ViTs, we validate the effectiveness of our solution, which deepens the understanding of ViTs in terms of head importance, head attention strength, and head attention pattern.
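A minimal sketch of one of the profiling questions above: how strongly each head attends to spatial neighbors on the patch grid. The attention tensor here is random for self-containment; in practice it would be hooked out of a real ViT layer, and the locality metric is an illustrative choice rather than the paper's exact one.

```python
# Sketch: per-head attention locality on a 14x14 patch grid (ViT-B/16 style).
import torch

H, G = 6, 14                       # heads, grid side length
N = G * G
attn = torch.softmax(torch.randn(H, N, N), dim=-1)   # stand-in attention maps

coords = torch.stack(torch.meshgrid(torch.arange(G), torch.arange(G),
                                    indexing="ij"), -1).reshape(N, 2).float()
dist = torch.cdist(coords, coords)           # patch-grid distances (N, N)
neighbor = (dist <= 1.5).float()             # 8-connected neighborhood mask

# Fraction of each head's attention mass that stays on spatial neighbors.
locality = (attn * neighbor).sum(-1).mean(-1)
print(locality)                              # one locality score per head
```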
13
Jin S, Lee H, Park C, Chu H, Tae Y, Choo J, Ko S. A Visual Analytics System for Improving Attention-based Traffic Forecasting Models. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1102-1112. [PMID: 36155438] [DOI: 10.1109/tvcg.2022.3209462]
Abstract
With deep learning (DL) outperforming conventional methods on different tasks, much effort has been devoted to utilizing DL in various domains. Researchers and developers in the traffic domain have also designed and improved DL models for forecasting tasks such as estimating traffic speed and time of arrival. However, analyzing DL models remains challenging due to their black-box nature and the complexity of traffic data (i.e., spatio-temporal dependencies). Collaborating with domain experts, we design a visual analytics system, AttnAnalyzer, that enables users to explore how DL models make predictions through effective spatio-temporal dependency analysis. The system incorporates dynamic time warping (DTW) and Granger causality tests for computational spatio-temporal dependency analysis, and provides map, table, line-chart, and pixel views to help users perform dependency and model behavior analysis. For the evaluation, we present three case studies showing how AttnAnalyzer effectively reveals model behaviors and improves model performance in two different road networks, and we report domain expert feedback.
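Of the two dependency measures named, dynamic time warping has a compact dynamic-programming implementation, sketched below for two traffic-speed-like series; Granger causality testing is available, for example, as grangercausalitytests in statsmodels. The series here are synthetic.

```python
# Sketch: classic O(nm) dynamic time warping between two 1-D series.
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 60)
print(round(dtw(np.sin(t), np.sin(t - 0.5)), 3))   # small: similar but shifted
```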
14
Yuan J, Liu M, Tian F, Liu S. Visual Analysis of Neural Architecture Spaces for Summarizing Design Principles. IEEE Transactions on Visualization and Computer Graphics 2023; 29:288-298. [PMID: 36191103] [DOI: 10.1109/tvcg.2022.3209404]
Abstract
Recent advances in artificial intelligence largely benefit from better neural network architectures. These architectures are a product of a costly process of trial-and-error. To ease this process, we develop ArchExplorer, a visual analysis method for understanding a neural architecture space and summarizing design principles. The key idea behind our method is to make the architecture space explainable by exploiting structural distances between architectures. We formulate the pairwise distance calculation as solving an all-pairs shortest path problem. To improve efficiency, we decompose this problem into a set of single-source shortest path problems. The time complexity is reduced from O(kn²N) to O(knN). Architectures are hierarchically clustered according to the distances between them. A circle-packing-based architecture visualization has been developed to convey both the global relationships between clusters and local neighborhoods of the architectures in each cluster. Two case studies and a post-analysis are presented to demonstrate the effectiveness of ArchExplorer in summarizing design principles and selecting better-performing architectures.
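The decomposition described (all-pairs shortest paths as one single-source solve per source) can be sketched directly; below, Dijkstra's algorithm with a binary heap is run from every source of a small illustrative graph. The graph and its weights are placeholders, not an actual architecture space.

```python
# Sketch: all-pairs shortest paths via repeated single-source Dijkstra.
import heapq

def dijkstra(adj, src):
    dist = {v: float("inf") for v in adj}
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                      # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

adj = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 1.5)], "c": []}
all_pairs = {s: dijkstra(adj, s) for s in adj}   # one SSSP run per source
print(all_pairs["a"]["c"])                       # 2.5, via b
```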
15
Strobelt H, Webson A, Sanh V, Hoover B, Beyer J, Pfister H, Rush AM. Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. IEEE Transactions on Visualization and Computer Graphics 2023; 29:1146-1156. [PMID: 36191099] [DOI: 10.1109/tvcg.2022.3209479]
Abstract
State-of-the-art neural language models can now be used to solve ad-hoc language tasks through zero-shot prompting, without the need for supervised training. This approach has gained popularity in recent years, and researchers have demonstrated prompts that achieve strong accuracy on specific NLP tasks. However, finding a prompt for a new task requires experimentation: different prompt templates with different wording choices lead to significant accuracy differences. We present PromptIDE, a tool that allows users to experiment with prompt variations, visualize prompt performance, and iteratively optimize prompts. We developed a workflow in which users first focus on model feedback using small data before moving on to a large-data regime that allows empirical grounding of promising prompts using quantitative measures of the task. The tool then allows easy deployment of the newly created ad-hoc models. We demonstrate the utility of PromptIDE (demo: http://prompt.vizhub.ai) and our workflow using several real-world use cases.
16
Deng X, Zhang J, Liu R, Liu K. Classifying ASD based on time-series fMRI using spatial-temporal transformer. Comput Biol Med 2022; 151:106320. [PMID: 36442277] [DOI: 10.1016/j.compbiomed.2022.106320]
Abstract
As the prevalence of autism spectrum disorder (ASD) increases globally, more and more patients need timely diagnosis and treatment to alleviate their suffering. However, current diagnosis of ASD still relies on subjective, symptom-based criteria assessed through clinical observation, which is time-consuming and costly. In recent years, functional magnetic resonance imaging (fMRI) neuroimaging techniques have emerged to facilitate the identification of potential biomarkers for diagnosing ASD. In this study, we developed a deep learning framework named the spatial-temporal Transformer (ST-Transformer) to distinguish ASD subjects from typical controls based on fMRI data. Specifically, a linear spatial-temporal multi-headed attention unit is proposed to obtain spatial and temporal representations of the fMRI data. Moreover, a Gaussian GAN-based data balancing method is introduced to address the data imbalance in real-world ASD datasets for ASD subtype diagnosis. Our proposed ST-Transformer is evaluated on a large cohort of subjects from two independent datasets (ABIDE I and ABIDE II) and achieves robust accuracies of 71.0% and 70.6%, respectively. Compared with state-of-the-art methods, our results demonstrate competitive performance in ASD diagnosis.
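A hedged sketch of the core idea named in the abstract: applying multi-head self-attention separately along the temporal and spatial (ROI) axes of an fMRI matrix. Dimensions are arbitrary, and the paper's linear attention variant and Gaussian-GAN balancing are not reproduced here.

```python
# Sketch: separate temporal and spatial self-attention over fMRI time series.
import torch
import torch.nn as nn

R, T, d = 90, 120, 64                  # ROIs, time points, model width
x = torch.randn(1, T, R)               # one subject's time series (dummy)

to_d_time = nn.Linear(R, d)            # embed each time point over ROIs
to_d_space = nn.Linear(T, d)           # embed each ROI over time
temporal = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
spatial = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

q_t = to_d_time(x)                     # (1, T, d)
h_t, _ = temporal(q_t, q_t, q_t)       # attention across time
q_s = to_d_space(x.transpose(1, 2))    # (1, R, d)
h_s, _ = spatial(q_s, q_s, q_s)        # attention across ROIs

features = torch.cat([h_t.mean(1), h_s.mean(1)], dim=-1)
print(features.shape)                  # pooled joint representation (1, 2d)
```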
Affiliation(s)
- Xin Deng, The Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Jiahao Zhang, The Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
- Rui Liu, Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Ke Liu, The Key Laboratory of Data Engineering and Visual Computing, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
17
Sun G, Wu H, Zhu L, Xu C, Liang H, Xu B, Liang R. VSumVis: Interactive Visual Understanding and Diagnosis of Video Summarization Model. ACM Transactions on Intelligent Systems and Technology 2021. [DOI: 10.1145/3458928]
Abstract
With the rapid development of the mobile Internet, the popularity of video capture devices has brought a surge in multimedia video resources. Using machine learning methods combined with well-designed features, we can automatically obtain video summaries that ease video storage and retrieval. However, there is always a gap between the summaries produced by a model and those annotated by users. Helping users understand this difference, providing insights for improving the model, and enhancing trust in the model remain challenging in the current study. To address these challenges, we propose VSumVis, a visual analysis system with multi-feature examination and multi-level exploration designed under a user-centered methodology, which helps users explore and analyze video content as well as the intrinsic relationships in our video summarization model. The system contains multiple coordinated views, i.e., a video view, projection view, detail view, and sequential-frames view. A multi-level analysis process that integrates video events and frames is presented with cluster and node visualizations. Temporal patterns in the difference between the manual annotation score and the saliency score produced by our model are further investigated and distinguished in the sequential-frames view. Moreover, we propose a set of rich user interactions that enable in-depth, multi-faceted analysis of the features in our video summarization model. We conduct case studies and interviews with domain experts to provide anecdotal evidence of the effectiveness of our approach, and quantitative feedback from a user study confirms the usefulness of the visual system for exploring the video summarization model.
Affiliation(s)
- Guodao Sun, Zhejiang University of Technology, Hangzhou, China
- Hao Wu, Zhejiang University of Technology, Hangzhou, China
- Lin Zhu, Zhejiang University of Technology, Hangzhou, China
- Chaoqing Xu, Zhejiang University of Technology, Hangzhou, China
- Haoran Liang, Zhejiang University of Technology, Hangzhou, China
- Binwei Xu, Zhejiang University of Technology, Hangzhou, China