1. Boggust A, Sivaraman V, Assogba Y, Ren D, Moritz D, Hohman F. Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments. IEEE Transactions on Visualization and Computer Graphics 2025; 31:809-819. PMID: 39255121. DOI: 10.1109/tvcg.2024.3456371.
Abstract
To deploy machine learning models on-device, practitioners use compression algorithms to shrink and speed up models while maintaining their high-quality output. A critical aspect of compression in practice is model comparison, including tracking many compression experiments, identifying subtle changes in model behavior, and negotiating complex accuracy-efficiency trade-offs. However, existing compression tools poorly support comparison, leading to tedious and, sometimes, incomplete analyses spread across disjoint tools. To support real-world comparative workflows, we develop an interactive visual system called Compress and Compare. Within a single interface, Compress and Compare surfaces promising compression strategies by visualizing provenance relationships between compressed models and reveals compression-induced behavior changes by comparing models' predictions, weights, and activations. We demonstrate how Compress and Compare supports common compression analysis tasks through two case studies, debugging failed compression on generative language models and identifying compression artifacts in image classification models. We further evaluate Compress and Compare in a user study with eight compression experts, illustrating its potential to provide structure to compression workflows, help practitioners build intuition about compression, and encourage thorough analysis of compression's effect on model behavior. Through these evaluations, we identify compression-specific challenges that future visual analytics tools should consider and Compress and Compare visualizations that may generalize to broader model comparison tasks.
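Compress and Compare itself is an interactive GUI, but the core comparisons it surfaces can be sketched in code. The following PyTorch snippet is a minimal, illustrative sketch (not the authors' implementation): it dynamically quantizes a toy classifier, then measures prediction disagreement and per-layer weight drift between the original and compressed models, the kind of behavior and weight comparison the abstract describes. The toy architecture, random data, and quantization choice are assumptions for illustration only.

```python
# Illustrative sketch only -- not the Compress and Compare tool itself.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A toy classifier standing in for a real model under compression.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Dynamic int8 quantization of the linear layers (one common compression step).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(512, 64)  # stand-in evaluation batch
with torch.no_grad():
    preds_fp32 = model(x).argmax(dim=1)
    preds_int8 = quantized(x).argmax(dim=1)

# Behavior comparison: how often do the two models disagree?
disagreement = (preds_fp32 != preds_int8).float().mean().item()
print(f"prediction disagreement: {disagreement:.2%}")

# Weight comparison: relative change introduced by quantization, per layer.
quantized_modules = dict(quantized.named_modules())
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        w = module.weight
        q = quantized_modules[name].weight()  # quantized weight tensor
        rel_err = ((w - q.dequantize()).norm() / w.norm()).item()
        print(f"layer {name}: relative weight change {rel_err:.4f}")
```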
2. Zhang K, Wang Y, Shi S, Wang Q, Wang C, Liu S. Improved YOLOv5 algorithm combined with depth camera and embedded system for blind indoor visual assistance. Sci Rep 2024; 14:23000. PMID: 39362920. PMCID: PMC11452198. DOI: 10.1038/s41598-024-74416-2.
Abstract
To assist the visually impaired in their daily lives and address the poor portability, high hardware cost, and environmental susceptibility of existing indoor object-finding aids, an improved YOLOv5 algorithm was proposed. It was combined with a RealSense D435i depth camera and a voice system, with a Raspberry Pi 4B as the computing core, to realize an indoor object-finding device for the visually impaired. The algorithm uses GhostNet in place of the YOLOv5s backbone network to reduce the model's parameters and computation, incorporates a coordinate attention mechanism, and replaces the YOLOv5 neck network with a bidirectional feature pyramid network to enhance feature extraction. Compared with the YOLOv5 model, the model size was reduced by 42.4%, the number of parameters was reduced by 47.9%, and the recall rate increased by 1.2% at the same precision. The improved YOLOv5 algorithm was applied to an indoor object-finding device for the visually impaired: the object to search for is input by voice, the RealSense D435i acquires RGB and depth images to detect and range the object, and the specific distance to the target object is announced by voice to help the visually impaired user find it.
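The abstract combines object detection with depth ranging; the sketch below (not the authors' code) shows one plausible way to turn a detector bounding box and an aligned depth frame into a distance estimate, by taking the median depth over the central region of the box. The depth scale, box margins, and synthetic data are assumptions for illustration.

```python
# Illustrative sketch: estimating the distance to a detected object from a
# detector bounding box and an aligned depth frame (e.g., a RealSense D435i).
import numpy as np

def object_distance(depth_frame: np.ndarray, bbox, depth_scale: float = 0.001) -> float:
    """Median depth (in metres) inside the central region of a detection box.

    depth_frame: HxW array of raw depth values (e.g., uint16 from the camera).
    bbox:        (x1, y1, x2, y2) pixel coordinates from the detector.
    depth_scale: metres per raw depth unit (assumed; read from the device in practice).
    """
    x1, y1, x2, y2 = bbox
    # Use only the central half of the box to avoid background pixels at the edges.
    cx1, cx2 = x1 + (x2 - x1) // 4, x2 - (x2 - x1) // 4
    cy1, cy2 = y1 + (y2 - y1) // 4, y2 - (y2 - y1) // 4
    patch = depth_frame[cy1:cy2, cx1:cx2].astype(np.float64)
    valid = patch[patch > 0]  # zero means "no depth reading"
    if valid.size == 0:
        return float("nan")
    return float(np.median(valid)) * depth_scale

# Example with synthetic data: an object roughly 1.2 m away.
depth = np.full((480, 640), 3000, dtype=np.uint16)   # background ~3 m
depth[200:300, 250:350] = 1200                       # object region ~1.2 m
print(f"estimated distance: {object_distance(depth, (240, 190, 360, 310)):.2f} m")
```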
Affiliation(s)
- Kaikai Zhang
- School of Computer Science and Technology, Huaibei Normal University, 235000, Huaibei, China
- Yanyan Wang
- School of Computer Science and Technology, Huaibei Normal University, 235000, Huaibei, China
- Shengzhe Shi
- School of Computer Science and Technology, Huaibei Normal University, 235000, Huaibei, China
- Qingqing Wang
- School of Computer Science and Technology, Huaibei Normal University, 235000, Huaibei, China
- Chun Wang
- School of Computer Science and Technology, Huaibei Normal University, 235000, Huaibei, China
- Sheng Liu
- School of Computer Science and Technology, Huaibei Normal University, 235000, Huaibei, China
- Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior, 235000, Huaibei, China
3. Li G, Wang J, Wang Y, Shan G, Zhao Y. An In-Situ Visual Analytics Framework for Deep Neural Networks. IEEE Transactions on Visualization and Computer Graphics 2024; 30:6770-6786. PMID: 38051629. DOI: 10.1109/tvcg.2023.3339585.
Abstract
The past decade has witnessed the superior power of deep neural networks (DNNs) in applications across various domains. However, training a high-quality DNN remains a non-trivial task due to its massive number of parameters. Visualization has shown great potential in addressing this situation, as evidenced by numerous recent works that aid in DNN training and interpretation. These works commonly employ a strategy of logging training-related data and conducting post-hoc analysis; based on the results of offline analysis, the model can be further trained or fine-tuned. This strategy, however, does not cope with the increasing complexity of DNNs, because (1) the time-series data collected over training are usually too large to be stored in full; (2) the huge I/O overhead significantly impacts training efficiency; and (3) post-hoc analysis does not allow rapid human intervention (e.g., stopping training under improper hyper-parameter settings to save computational resources). To address these challenges, we propose an in-situ visualization and analysis framework for the training of DNNs. Specifically, we employ feature extraction algorithms to reduce the size of training-related data in situ and use the reduced data for real-time visual analytics. The state of model training is disclosed to model designers in real time, enabling human intervention on demand to steer the training. Through concrete case studies, we demonstrate how our in-situ framework helps deep learning experts optimize DNNs and improve their analysis efficiency.
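The framework's key move is reducing training data in situ instead of logging full tensors. The snippet below is a minimal sketch under stated assumptions (the toy model, training loop, and chosen per-layer statistics are illustrative, not the paper's actual reduction algorithms): at each step it collapses every parameter tensor into a few scalars, which is the kind of compact summary that could be streamed to a real-time front end.

```python
# Minimal sketch of in-situ reduction: log a few per-layer statistics per step
# instead of full weight tensors for post-hoc analysis.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def reduce_in_situ(m: nn.Module) -> dict:
    """Collapse each parameter tensor to a handful of scalars."""
    stats = {}
    for name, p in m.named_parameters():
        stats[name] = {
            "mean": p.detach().mean().item(),
            "std": p.detach().std().item(),
            "grad_norm": p.grad.norm().item() if p.grad is not None else 0.0,
        }
    return stats

for step in range(5):
    x = torch.randn(16, 32)
    y = torch.randint(0, 2, (16,))
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    summary = reduce_in_situ(model)  # tiny compared with the full tensors
    # In a real in-situ setup this summary would be pushed to the visual
    # analytics client so a designer could intervene while training runs.
    print(step, round(loss.item(), 3), summary["0.weight"]["grad_norm"])
```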
4. Hu K, Gao J, Mao F, Song X, Cheng L, Feng Z, Song M. Disassembling Convolutional Segmentation Network. Int J Comput Vis 2023. DOI: 10.1007/s11263-023-01776-z.
5. Wang J, Zhang W, Yang H, Yeh CCM, Wang L. Visual Analytics for RNN-Based Deep Reinforcement Learning. IEEE Transactions on Visualization and Computer Graphics 2022; 28:4141-4155. PMID: 33929961. DOI: 10.1109/tvcg.2021.3076749.
Abstract
Deep reinforcement learning (DRL) aims to train an autonomous agent that interacts with a pre-defined environment and strives to achieve specific goals through deep neural networks (DNNs). Recurrent neural network (RNN) based DRL has demonstrated superior performance, as RNNs can effectively capture the temporal evolution of the environment and respond with proper agent actions. However, apart from the outstanding performance, little is known about how RNNs understand the environment internally and what has been memorized over time. Revealing these details is extremely important for deep learning experts to understand and improve DRL models, yet doing so is challenging due to the complicated data transformations inside these models. In this article, we propose the Deep Reinforcement Learning Interactive Visual Explorer (DRLIVE), a visual analytics system to effectively explore, interpret, and diagnose RNN-based DRL. Focusing on DRL agents trained for different Atari games, DRLIVE accomplishes three tasks: game episode exploration, RNN hidden/cell state examination, and interactive model perturbation. Using the system, one can flexibly explore a DRL agent through interactive visualizations, discover interpretable RNN cells by prioritizing hidden/cell states with a set of metrics, and further diagnose the DRL model by interactively perturbing its inputs. Through concrete studies with multiple deep learning experts, we validated the efficacy of DRLIVE.
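DRLIVE prioritizes RNN hidden/cell states with a set of metrics; the abstract does not spell out a formula, so the sketch below uses one assumed metric, the absolute correlation between a hidden unit's trace and the agent's chosen action, to rank units over a recorded episode. The data are synthetic stand-ins, and the metric is not necessarily the one DRLIVE uses.

```python
# Hedged sketch: rank RNN hidden units by how strongly their recorded traces
# correlate with a chosen agent behaviour over an episode.
import numpy as np

rng = np.random.default_rng(0)
T, H = 500, 64                        # episode length, hidden size
hidden = rng.normal(size=(T, H))      # recorded hidden states h_t (stand-in)
actions = rng.integers(0, 4, size=T)  # recorded discrete actions a_t (stand-in)

target = (actions == 2).astype(float)  # does the agent pick action 2 at step t?
# Plant a unit that actually tracks the behaviour so the ranking finds it.
hidden[:, 17] = target + 0.1 * rng.normal(size=T)

def rank_hidden_units(hidden: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return hidden-unit indices sorted by |corr(unit trace, target)|."""
    h = (hidden - hidden.mean(0)) / (hidden.std(0) + 1e-8)
    t = (target - target.mean()) / (target.std() + 1e-8)
    corr = np.abs(h.T @ t) / len(t)
    return np.argsort(corr)[::-1]

print("most behaviour-aligned units:", rank_hidden_units(hidden, target)[:5])
```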
6. YOLO-GD: A Deep Learning-Based Object Detection Algorithm for Empty-Dish Recycling Robots. Machines 2022. DOI: 10.3390/machines10050294.
Abstract
Due to the workforce shortage caused by a declining birth rate and an aging population, robotics is one solution for replacing human labor and addressing this urgent problem. This paper introduces a deep learning-based object detection algorithm for empty-dish recycling robots, which automatically recycle dishes in restaurants, canteens, and similar settings. In detail, a lightweight object detection model, YOLO-GD (GhostNet and Depthwise convolution), is proposed for detecting tableware in images, such as cups, chopsticks, bowls, and towels, and an image-processing-based catch-point calculation is designed to extract the catch-point coordinates of the different types of dishes. The coordinates are used to recycle the target dishes by controlling the robot arm. A Jetson Nano serves as the robot's computing module, and the YOLO-GD model is quantized with TensorRT to improve performance. The experimental results demonstrate that the YOLO-GD model is only one-fifth the size of the state-of-the-art model YOLOv4, while its mAP reaches 97.38%, 3.41% higher than that of YOLOv4. After quantization, YOLO-GD reduces the inference time per image from 207.92 ms to 32.75 ms, with an mAP of 97.42%, slightly higher than the unquantized model. Through the proposed image-processing method, the catch points of various types of dishes are effectively extracted. The empty-dish recycling functions are thus realized, paving the way for further development toward practical use.
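The catch-point calculation is described only as image-processing based. As a hedged illustration (not the authors' exact procedure), the sketch below takes the centroid of the foreground pixels inside a detection box as the catch point, falling back to the box center when nothing is segmented; the mask, box, and threshold are assumptions.

```python
# Illustrative catch-point extraction: centroid of foreground pixels inside a
# detection box, as a stand-in for the paper's image-processing method.
import numpy as np

def catch_point(mask: np.ndarray, bbox) -> tuple[float, float]:
    """Centroid (x, y) of foreground pixels inside bbox = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    roi = mask[y1:y2, x1:x2]
    ys, xs = np.nonzero(roi)
    if xs.size == 0:  # nothing segmented: fall back to the box centre
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    return (x1 + xs.mean(), y1 + ys.mean())

# Synthetic example: a round "bowl" near the middle of a 480x640 frame.
mask = np.zeros((480, 640), dtype=np.uint8)
yy, xx = np.ogrid[:480, :640]
mask[(yy - 240) ** 2 + (xx - 320) ** 2 < 60 ** 2] = 1
print("catch point (x, y):", catch_point(mask, (250, 170, 390, 310)))
```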
7. Park H, Das N, Duggal R, Wright AP, Shaikh O, Hohman F, Polo Chau DH. NeuroCartography: Scalable Automatic Visual Summarization of Concepts in Deep Neural Networks. IEEE Transactions on Visualization and Computer Graphics 2022; 28:813-823. PMID: 34587079. DOI: 10.1109/tvcg.2021.3114858.
Abstract
Existing research on making sense of deep neural networks often focuses on neuron-level interpretation, which may not adequately capture the bigger picture of how concepts are collectively encoded by multiple neurons. We present NeuroCartography, an interactive system that scalably summarizes and visualizes concepts learned by neural networks. It automatically discovers and groups neurons that detect the same concepts, and describes how such neuron groups interact to form higher-level concepts and the subsequent predictions. NeuroCartography introduces two scalable summarization techniques: (1) neuron clustering groups neurons based on the semantic similarity of the concepts they detect (e.g., neurons detecting "dog faces" of different breeds are grouped together); and (2) neuron embedding encodes the associations between related concepts based on how often they co-occur (e.g., neurons detecting "dog face" and "dog tail" are placed closer in the embedding space). Key to our scalable techniques is the ability to efficiently compute the relationships among all neuron pairs, in time linear in the number of neurons rather than quadratic. NeuroCartography scales to large data, such as the ImageNet dataset with 1.2M images. The system's tightly coordinated views integrate the scalable techniques to visualize the concepts and their relationships, projecting concept associations onto a 2D space in the Neuron Projection View and summarizing neuron clusters and their relationships in the Graph View. Through a large-scale human evaluation, we demonstrate that our technique discovers neuron groups that represent coherent, human-meaningful concepts. Through usage scenarios, we describe how our approach enables interesting and surprising discoveries, such as concept cascades of related and isolated concepts. The NeuroCartography visualization runs in modern browsers and is open-sourced.
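The neuron-grouping idea can be conveyed with a toy example. The sketch below groups neurons whose top-activating image sets overlap strongly (Jaccard similarity above an assumed threshold). Unlike NeuroCartography's linear-time techniques, this naive version compares neuron pairs directly and is meant only to illustrate the concept, not to reproduce the paper's algorithm.

```python
# Toy sketch: group neurons that fire on largely the same images, as a
# simplified stand-in for concept-level neuron grouping.
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_images, k = 8, 1000, 20
acts = rng.normal(size=(n_neurons, n_images))
acts[1] = acts[0] + 0.05 * rng.normal(size=n_images)  # neuron 1 mimics neuron 0

# Each neuron is summarised by the set of its k top-activating images.
top_sets = [set(np.argsort(a)[-k:]) for a in acts]

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)

# Greedy grouping: two neurons share a group if their top-image sets overlap.
groups: list[list[int]] = []
for i in range(n_neurons):
    for g in groups:
        if jaccard(top_sets[i], top_sets[g[0]]) > 0.5:
            g.append(i)
            break
    else:
        groups.append([i])

print("neuron groups:", groups)  # neurons 0 and 1 should land together
```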
8. CNERVis: a visual diagnosis tool for Chinese named entity recognition. J Vis (Tokyo) 2021. DOI: 10.1007/s12650-021-00799-3.
9. Yue R, Li G, Lu X, Li S, Shan G. EddyVis: A visual system to analyze eddies. J Vis (Tokyo) 2021. DOI: 10.1007/s12650-021-00798-4.
10. Lu X, Li G, Yang B, Liu J, Shan G. StreamFlow: a visual analysis system for 2D streamlines based on workflow mining technique. J Vis (Tokyo) 2021. DOI: 10.1007/s12650-021-00795-7.
11. Cheng S, Li X, Shan G, Niu B, Wang Y, Luo M. ACMViz: a visual analytics approach to understand DRL-based autonomous control model. J Vis (Tokyo) 2021. DOI: 10.1007/s12650-021-00793-9.