1. Tang X, Ye S, Shi Y, Hu T, Peng Q, You X. Filter Pruning Based on Information Capacity and Independence. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:8401-8413. PMID: 39231052. DOI: 10.1109/tnnls.2024.3415068.
Abstract
Filter pruning has gained widespread adoption for compressing and speeding up convolutional neural networks (CNNs). However, existing approaches are still far from practical applications due to biased filter selection and heavy computation cost. This article introduces a new filter pruning method that selects filters in an interpretable, multiperspective, and lightweight manner. Specifically, we evaluate the contributions of filters from both individual and overall perspectives. For the amount of information contained in each filter, a new metric called information capacity is proposed. Inspired by information theory, we utilize the interpretable entropy to measure the information capacity and develop a feature-guided approximation process. For correlations among filters, another metric called information independence is designed. Since both metrics are evaluated in a simple but effective way, we can identify and prune the least important filters at a lower computation cost. We conduct comprehensive experiments on benchmark datasets employing various widely used CNN architectures to evaluate the performance of our method. For instance, on ILSVRC-2012, our method outperforms state-of-the-art methods by reducing floating-point operations (FLOPs) by 77.4% and parameters by 69.3% for ResNet-50 with only a minor accuracy decrease of 2.64%.
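A minimal numpy sketch of how such a two-part score could be assembled is given below; the histogram-entropy estimate, the correlation-based independence term, and the equal weighting are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np

def filter_scores(fmaps, alpha=0.5, bins=32):
    """Score the filters of one layer from their feature maps.

    fmaps: array of shape (num_filters, H, W), one activation map per filter.
    Returns one score per filter, combining an entropy-based "capacity"
    term with a correlation-based "independence" term (both illustrative).
    """
    n = fmaps.shape[0]
    flat = fmaps.reshape(n, -1)

    # Information capacity: entropy of each filter's activation histogram.
    capacity = np.empty(n)
    for i, f in enumerate(flat):
        hist, _ = np.histogram(f, bins=bins)
        p = hist / max(hist.sum(), 1)
        p = p[p > 0]
        capacity[i] = -(p * np.log(p)).sum()

    # Information independence: 1 - mean absolute correlation with the others.
    corr = np.corrcoef(flat)
    np.fill_diagonal(corr, 0.0)
    independence = 1.0 - np.abs(corr).sum(axis=1) / (n - 1)

    return alpha * capacity + (1.0 - alpha) * independence

# Filters with the lowest combined scores are pruning candidates.
scores = filter_scores(np.random.rand(16, 8, 8))
prune_idx = np.argsort(scores)[:4]
```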
2. Sui Y, Yin M, Gong Y, Yuan B. Co-Exploring Structured Sparsification and Low-Rank Tensor Decomposition for Compact DNNs. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:6642-6654. PMID: 38935471. DOI: 10.1109/tnnls.2024.3408099.
Abstract
Sparsification and low-rank decomposition are two important techniques to compress deep neural network (DNN) models. To date, these two popular yet distinct approaches are typically used separately, while their efficient integration for better compression performance is little explored, especially for structured sparsification and decomposition. In this article, we perform a systematic co-exploration of structured sparsification and decomposition toward compact DNN models. We first investigate and analyze several important design factors for joint structured sparsification and decomposition, including the operational sequence, the decomposition format, and the optimization procedure. Based on the observations from our analysis, we then propose CEPD, a unified DNN compression framework that can co-explore the benefits of structured sparsification and tensor decomposition in an efficient way. Empirical experiments demonstrate the promising performance of our proposed solution. Notably, on the CIFAR-10 dataset, CEPD brings 0.72% and 0.45% accuracy increases over the baseline ResNet-56 and MobileNetV2 models, respectively, while the computational costs are reduced by 43.0% and 44.2%, respectively. On the ImageNet dataset, our approach enables 0.10% and 1.39% accuracy increases over the baseline ResNet-18 and ResNet-50 models with 59.4% and 54.6% fewer parameters, respectively.
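The abstract treats the operational sequence as a studied design factor; the sketch below assumes one particular order, structured channel sparsification followed by an SVD-based decomposition, purely for illustration.

```python
import numpy as np

def sparsify_then_decompose(W, keep_channels, rank):
    """Illustrative pipeline: structured (output-channel) sparsification
    followed by a low-rank (SVD) decomposition of the remaining weights.

    W: conv weight of shape (out_ch, in_ch, k, k).
    """
    out_ch = W.shape[0]
    # Structured sparsification: keep output channels with the largest L1 norm.
    norms = np.abs(W).reshape(out_ch, -1).sum(axis=1)
    keep = np.argsort(norms)[-keep_channels:]
    W_s = W[keep]

    # Low-rank decomposition: unfold to a matrix and truncate its SVD.
    mat = W_s.reshape(keep_channels, -1)
    U, S, Vt = np.linalg.svd(mat, full_matrices=False)
    A = U[:, :rank] * S[:rank]          # (keep_channels, rank)
    B = Vt[:rank]                       # (rank, in_ch * k * k)
    return keep, A, B                   # W_s is approximated by A @ B

keep, A, B = sparsify_then_decompose(np.random.randn(64, 32, 3, 3), 32, 8)
```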
3. Guo K, Lin Z, Chen C, Xing X, Liu F, Xu X. Compact Model Training by Low-Rank Projection With Energy Transfer. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:6708-6722. PMID: 38843062. DOI: 10.1109/tnnls.2024.3400928.
Abstract
Low-rankness plays an important role in traditional machine learning but is not so popular in deep learning. Most previous low-rank network compression methods compress networks by approximating pretrained models and retraining. However, the optimal solution in the Euclidean space may be quite different from the one under a low-rank constraint, so a well-pretrained model is not a good initialization for a low-rank constrained model. Thus, the performance of a low-rank compressed network degrades significantly. Compared with other network compression methods such as pruning, low-rank methods have attracted less attention in recent years. In this article, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. We propose to alternately perform stochastic gradient descent training and projection of each weight matrix onto the corresponding low-rank manifold. Compared to retraining on the compact model, this enables full utilization of model capacity, since the solution space is relaxed back to the Euclidean space after projection. The matrix energy (the sum of squares of singular values) lost through projection is compensated by energy transfer: we uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the trend of gradient vanishing caused by projection. In modern networks, a batch normalization (BN) layer can be merged into the previous convolution layer for inference, thereby influencing the optimal low-rank approximation (LRA) of the previous layer. We propose BN rectification to cut off its effect on the optimal LRA, which further improves performance. Comprehensive experiments on CIFAR-10 and ImageNet show that our method is superior to other low-rank compression methods and also outperforms recent state-of-the-art pruning methods. For object detection and semantic segmentation, our method still achieves good compression results. In addition, we combine LRPET with quantization and hashing methods and achieve even better compression than either method alone. We further apply it to Transformer-based models to demonstrate its transferability. Our code is available at https://github.com/BZQLin/LRPET.
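A small numpy sketch of the projection-plus-energy-transfer step described in the abstract; the uniform rescaling rule used here is one plausible reading of "uniformly transfer the energy" and may differ from the paper's exact formula.

```python
import numpy as np

def lowrank_project_with_energy_transfer(W, rank):
    """Project a weight matrix onto a rank-`rank` manifold and compensate the
    energy (sum of squared singular values) lost by truncation by rescaling
    the kept singular values with a common factor (illustrative)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    total_energy = np.sum(S ** 2)
    kept = S[:rank]
    scale = np.sqrt(total_energy / np.sum(kept ** 2))
    return (U[:, :rank] * (kept * scale)) @ Vt[:rank]

# In LRPET-style training this projection would alternate with SGD steps.
W = np.random.randn(256, 128)
W_proj = lowrank_project_with_energy_transfer(W, rank=32)
assert np.isclose(np.sum(np.linalg.svd(W_proj, compute_uv=False) ** 2),
                  np.sum(np.linalg.svd(W, compute_uv=False) ** 2))
```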
4. Pham VT, Zniyed Y, Nguyen TP. Enhanced Network Compression Through Tensor Decompositions and Pruning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4358-4370. PMID: 38457323. DOI: 10.1109/tnnls.2024.3370294.
Abstract
Network compression techniques that combine tensor decompositions and pruning have shown promise in leveraging the advantages of both strategies. In this work, we propose enhanced Network cOmpRession through TensOr decompositions and pruNing (NORTON), a novel method for network compression. NORTON introduces the concept of filter decomposition, enabling a more detailed decomposition of the network while preserving the weights' multidimensional properties. Our method incorporates a novel structured pruning approach that effectively integrates the decomposed model. Through extensive experiments on various architectures, benchmark datasets, and representative vision tasks, we demonstrate the usefulness of our method. NORTON achieves superior results compared with state-of-the-art (SOTA) techniques in terms of both complexity and accuracy. Our code is available for research purposes.
5. Fracastoro G, Fosson SM, Migliorati A, Calafiore GC. Playing the Lottery With Concave Regularizers for Sparse Trainable Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4575-4585. PMID: 38478446. DOI: 10.1109/tnnls.2024.3373609.
Abstract
The design of sparse neural networks, i.e., of networks with a reduced number of parameters, has been attracting increasing research attention in the last few years. The use of sparse models may significantly reduce the computational and storage footprint in the inference phase. In this context, the lottery ticket hypothesis (LTH) constitutes a breakthrough result that addresses the performance not only of the inference phase but also of the training phase. It states that it is possible to extract effective sparse subnetworks, called winning tickets, that can be trained in isolation. The development of effective methods to play the lottery, i.e., to find winning tickets, is still an open problem. In this article, we propose a novel class of methods to play the lottery. The key point is the use of concave regularization to promote the sparsity of a relaxed binary mask that represents the network topology. We theoretically analyze the effectiveness of the proposed method in the convex framework. Then, we present extensive numerical tests on various datasets and architectures, which show that the proposed method can improve the performance of state-of-the-art algorithms.
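A hedged PyTorch sketch of the core idea, a concave (log-type) sparsity penalty on a relaxed binary mask that gates the weights; the specific penalty shape, the sigmoid relaxation, and the loss weighting are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Module):
    """Linear layer whose weights are gated by a relaxed (continuous) binary mask."""
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.01)
        self.mask_logits = nn.Parameter(torch.zeros(out_f, in_f))

    def forward(self, x):
        m = torch.sigmoid(self.mask_logits)   # relaxed mask in (0, 1)
        return F.linear(x, self.weight * m)

def concave_sparsity_penalty(mask_logits, eps=1e-2):
    """Concave (log-type) penalty on the relaxed mask: zero at m = 0 and
    increasing, it pushes small mask entries toward exactly zero more
    strongly than an L1 term would."""
    m = torch.sigmoid(mask_logits)
    return torch.log1p(m / eps).sum()

layer = MaskedLinear(64, 32)
x = torch.randn(8, 64)
task_loss = layer(x).pow(2).mean()            # placeholder for the real task loss
loss = task_loss + 1e-3 * concave_sparsity_penalty(layer.mask_logits)
loss.backward()
```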
6. Zhang X, Xie W, Li Y, Lei J, Jiang K, Fang L, Du Q. Block-Wise Partner Learning for Model Compression. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:17582-17595. PMID: 37656638. DOI: 10.1109/tnnls.2023.3306512.
Abstract
Despite the great potential of convolutional neural networks (CNNs) in various tasks, their resource-hungry nature greatly hinders wide deployment in cost-sensitive and low-powered scenarios, especially applications in remote sensing. Existing model pruning approaches, implemented by a "subtraction" operation, impose a performance ceiling on the slimmed model. Self-knowledge distillation (Self-KD) resorts to auxiliary networks that are only active in the training phase for performance improvement. However, the knowledge is holistic and crude, and the learning-based knowledge transfer is indirect and lossy. Here, we propose a novel model-compression method, termed block-wise partner learning (BPL), which comprises "extension" and "fusion" operations and liberates the compressed model from the constraints of the baseline. Different from Self-KD, the proposed BPL creates a partner for each block for performance enhancement during training. For the model to absorb more diverse information, a diversity loss (DL) is designed to evaluate the difference between the original block and the partner. Moreover, the partner is fused equivalently instead of being discarded. After training, we can simply adopt the fused compressed model, which contains the enhancement information of the partners but has fewer parameters and lower inference cost. As validated on the UC Merced land-use, NWPU-RESISC45, and RSD46-WHU datasets, BPL demonstrates superiority over other compared model-compression approaches. For example, it attains a substantial floating-point operations (FLOPs) reduction of 73.97% with only a 0.24 loss in accuracy (ACC.) for ResNet-50 on the UC Merced land-use dataset. The code is available at https://github.com/zhangxin-xd/BPL.
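The form of the diversity loss is not given in the abstract; the PyTorch sketch below uses a cosine-similarity penalty between the block's and the partner's feature maps as one plausible stand-in.

```python
import torch
import torch.nn.functional as F

def diversity_loss(block_feat, partner_feat):
    """Penalize high cosine similarity between the original block's and the
    partner's flattened feature maps, so that minimizing it pushes the two
    branches toward more diverse representations (illustrative form)."""
    b = block_feat.flatten(1)
    p = partner_feat.flatten(1)
    return F.cosine_similarity(b, p, dim=1).mean()

# In training, this term would be added (with a weight) to the task loss so
# the partner absorbs information the original block does not capture.
block_out = torch.randn(4, 64, 8, 8)
partner_out = torch.randn(4, 64, 8, 8)
dl = diversity_loss(block_out, partner_out)
```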
7. Liu Y, Fan K, Zhou W. FPWT: Filter pruning via wavelet transform for CNNs. Neural Networks 2024; 179:106577. PMID: 39098265. DOI: 10.1016/j.neunet.2024.106577.
Abstract
The enormous data and computational resources required by convolutional neural networks (CNNs) hinder their practical application on mobile devices. To address this restrictive problem, filter pruning has become one of the most practical approaches. At present, most pruning methods are developed and applied in the spatial domain, which ignores the potential interconnections in the model structure and the decentralized distribution of image energy in the spatial domain. Frequency-domain transforms can remove the correlation between image pixels and concentrate the image energy distribution, which enables lossy image compression. In this study, we find that such frequency-domain transforms are also applicable to the feature maps of CNNs. This paper proposes filter pruning via wavelet transform (FPWT), which combines the frequency-domain information of the wavelet transform (WT) with the output feature maps to expose the correlation between feature maps more clearly and to concentrate the energy distribution in the frequency domain. Moreover, the importance score of each feature map is calculated from the cosine similarity and the energy-weighted coefficients of the high- and low-frequency components, and each filter is pruned according to its importance score. Experiments on two image classification datasets validate the effectiveness of FPWT. For ResNet-110 on CIFAR-10, FPWT reduces FLOPs and parameters by more than 60.0% with a 0.53% accuracy improvement. For ResNet-50 on ImageNet, FPWT reduces FLOPs by 53.8% and removes 49.7% of the parameters with only a 0.97% loss of top-1 accuracy.
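A rough numpy sketch of a wavelet-domain importance score in the spirit of FPWT; the single-level Haar transform, the use of the layer-mean map as a similarity reference, and the low-/high-frequency energy weights are all assumptions, not the paper's exact definitions.

```python
import numpy as np

def haar2d(x):
    """Single-level 2-D Haar transform (LL, LH, HL, HH) of an even-sized map."""
    a = (x[0::2] + x[1::2]) / 2.0            # row averages
    d = (x[0::2] - x[1::2]) / 2.0            # row details
    ll, lh = (a[:, 0::2] + a[:, 1::2]) / 2.0, (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl, hh = (d[:, 0::2] + d[:, 1::2]) / 2.0, (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def fpwt_style_score(fmap, layer_mean, w_low=0.7, w_high=0.3):
    """Illustrative score: dissimilarity to the layer's mean map in the wavelet
    domain, weighted by low-/high-frequency components. A map that closely
    resembles the layer mean is assumed to carry little extra information."""
    ll, lh, hl, hh = haar2d(fmap)
    mll, mlh, mhl, mhh = haar2d(layer_mean)
    low, m_low = ll.ravel(), mll.ravel()
    high = np.concatenate([lh.ravel(), hl.ravel(), hh.ravel()])
    m_high = np.concatenate([mlh.ravel(), mhl.ravel(), mhh.ravel()])
    cos = lambda a, b: np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return w_low * (1 - cos(low, m_low)) + w_high * (1 - cos(high, m_high))

fmaps = np.random.rand(16, 8, 8)                       # one layer's output maps
scores = [fpwt_style_score(f, fmaps.mean(axis=0)) for f in fmaps]
```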
Affiliation(s)
- Yajun Liu: School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
- Kefeng Fan: China Electronics Standardization Institute, Beijing 100007, China
- Wenju Zhou: School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
8. Guo J, Xu D, Ouyang W. Multidimensional Pruning and Its Extension: A Unified Framework for Model Compression. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13056-13070. PMID: 37220047. DOI: 10.1109/tnnls.2023.3266435.
Abstract
Observing that existing model compression approaches only focus on reducing the redundancy in convolutional neural networks (CNNs) along one particular dimension (e.g., the channel, spatial, or temporal dimension), in this work we propose a multidimensional pruning (MDP) framework that can compress both 2-D and 3-D CNNs along multiple dimensions in an end-to-end fashion. Specifically, MDP simultaneously reduces channels and the redundancy along additional dimensions. Which additional dimensions are considered depends on the input data: the spatial dimension for 2-D CNNs taking images as input, and the spatial and temporal dimensions for 3-D CNNs taking videos as input. We further extend our MDP framework to the MDP-Point approach for compressing point cloud neural networks (PCNNs) whose inputs are irregular point clouds (e.g., PointNet). In this case, the additional dimension is the point dimension (i.e., the number of points). Comprehensive experiments on six benchmark datasets demonstrate the effectiveness of our MDP framework and its extended version MDP-Point for compressing CNNs and PCNNs, respectively.
9. Qian Y, He Z, Wang Y, Wang B, Ling X, Gu Z, Wang H, Zeng S, Swaileh W. Hierarchical Threshold Pruning Based on Uniform Response Criterion. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:10869-10881. PMID: 37071515. DOI: 10.1109/tnnls.2023.3244994.
Abstract
Convolutional neural networks (CNNs) have been successfully applied to various fields. However, the overparameterization of CNNs demands more memory and training time, making them unsuitable for some resource-constrained devices. To address this issue, filter pruning has been proposed as one of the most efficient solutions. In this article, we propose a feature-discrimination-based filter importance criterion, the uniform response criterion (URC), as a key component of filter pruning. It converts the maximum activation responses into probabilities and then measures the importance of a filter through the distribution of these probabilities over classes. However, applying URC directly to global threshold pruning raises two problems: some layers may be pruned completely under a global setting, and global threshold pruning neglects that filters in different layers have different importance. To address these issues, we propose hierarchical threshold pruning (HTP) with URC, which restricts each pruning step to a relatively redundant layer rather than comparing filter importance across all layers, thereby preventing important filters from being pruned. The effectiveness of our method rests on three techniques: 1) measuring filter importance by URC; 2) normalizing filter scores; and 3) pruning only within relatively redundant layers. Extensive experiments on CIFAR-10/100 and ImageNet show that our method achieves state-of-the-art performance on multiple benchmarks.
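An illustrative numpy sketch of a URC-style score for a single filter; treating low class-wise entropy as "class-selective, hence important" is an assumption about the criterion's direction, and the softmax conversion to probabilities is likewise illustrative.

```python
import numpy as np

def urc_style_score(max_responses, labels, num_classes):
    """Score one filter from its maximum activation responses.

    max_responses: (N,) maximum activation of this filter for each sample.
    labels:        (N,) class label of each sample.
    The mean response per class is converted into a probability distribution
    over classes; its entropy serves as the score (lower entropy is read as
    "more class-selective", an assumption for illustration).
    """
    per_class = np.array([max_responses[labels == c].mean() if np.any(labels == c) else 0.0
                          for c in range(num_classes)])
    p = np.exp(per_class - per_class.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

responses = np.random.rand(1000)
labels = np.random.randint(0, 10, size=1000)
score = urc_style_score(responses, labels, num_classes=10)
```

Hierarchical threshold pruning would then normalize such scores within each layer and prune only inside layers flagged as relatively redundant, rather than applying a single global threshold.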
10. Yang C, An Z, Cai L, Xu Y. Knowledge Distillation Using Hierarchical Self-Supervision Augmented Distribution. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:2094-2108. PMID: 35820013. DOI: 10.1109/tnnls.2022.3186807.
Abstract
Knowledge distillation (KD) is an effective framework that aims to transfer meaningful information from a large teacher to a smaller student. Generally, KD involves how to define and transfer knowledge. Previous KD methods often focus on mining various forms of knowledge, for example, feature maps and refined information. However, the knowledge is derived from the primary supervised task and is thus highly task-specific. Motivated by the recent success of self-supervised representation learning, we propose an auxiliary self-supervision augmented task to guide networks to learn more meaningful features. Therefore, we can derive soft self-supervision augmented distributions as richer dark knowledge from this task for KD. Unlike previous knowledge, this distribution encodes joint knowledge from supervised and self-supervised feature learning. Beyond knowledge exploration, we propose to append several auxiliary branches at various hidden layers to fully take advantage of hierarchical feature maps. Each auxiliary branch is guided to learn the self-supervision augmented task and to distill this distribution from teacher to student. Overall, we call our KD method hierarchical self-supervision augmented KD (HSSAKD). Experiments on standard image classification show that both offline and online HSSAKD achieve state-of-the-art performance in the field of KD. Transfer experiments on object detection further verify that HSSAKD can guide the network to learn better features. The code is available at https://github.com/winycg/HSAKD.
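A compact PyTorch sketch of distilling a self-supervision augmented distribution; the rotation-based auxiliary task (4 transforms) and the temperature value are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn.functional as F

def augmented_distribution_kd_loss(student_logits, teacher_logits, T=4.0):
    """KL distillation over a self-supervision augmented distribution.

    Both logit tensors have shape (N, num_classes * num_transforms), i.e. a
    joint prediction over the original classes and an auxiliary transform
    (e.g. one of 4 rotations). Temperature-scaled softmax as in standard KD.
    """
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

student = torch.randn(8, 10 * 4)   # 10 classes x 4 rotations
teacher = torch.randn(8, 10 * 4)
loss = augmented_distribution_kd_loss(student, teacher)
```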
11. Tao C, Lin R, Chen Q, Zhang Z, Luo P, Wong N. FAT: Frequency-Aware Transformation for Bridging Full-Precision and Low-Precision Deep Representations. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:2640-2654. PMID: 35867358. DOI: 10.1109/tnnls.2022.3190607.
Abstract
Learning low-bitwidth convolutional neural networks (CNNs) is challenging because performance may drop significantly after quantization. Prior works often quantize the network weights by carefully tuning hyperparameters such as nonuniform stepsize and layerwise bitwidths, which is complicated because the full- and low-precision representations have large discrepancies. This work presents a novel quantization pipeline, named frequency-aware transformation (FAT), that offers important benefits: 1) instead of designing complicated quantizers, FAT learns to transform network weights in the frequency domain to remove redundant information before quantization, making them amenable to training in low bitwidth with simple quantizers; 2) FAT readily embeds CNNs in low bitwidths using standard quantizers without tedious hyperparameter tuning, and theoretical analyses show that FAT minimizes the quantization errors in both uniform and nonuniform quantization; and 3) FAT can be easily plugged into various CNN architectures. Using FAT with a simple uniform/logarithmic quantizer achieves state-of-the-art performance at different bitwidths on various model architectures. Consequently, FAT provides a novel frequency-based perspective for model quantization.
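FAT learns its frequency-domain transformation; the numpy sketch below substitutes a fixed orthonormal DCT purely to illustrate the pipeline of transforming weights, applying a plain uniform quantizer in the frequency domain, and transforming back.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)
    return C

def frequency_domain_quantize(W, bits=4):
    """Illustrative frequency-aware quantization of a weight matrix: row-wise
    DCT, plain uniform quantization of the coefficients, inverse DCT. The
    fixed DCT stands in for FAT's learned transformation."""
    C = dct_matrix(W.shape[1])
    F_w = W @ C.T                                  # row-wise DCT
    lo, hi = F_w.min(), F_w.max()
    step = (hi - lo) / (2 ** bits - 1)
    F_q = np.round((F_w - lo) / step) * step + lo  # uniform quantizer
    return F_q @ C                                 # back to the weight domain

W = np.random.randn(64, 64)
W_q = frequency_domain_quantize(W, bits=4)
```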
12. FPFS: Filter-level pruning via distance weight measuring filter similarity. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.09.049.
13. ReLP: Reinforcement Learning Pruning Method Based on Prior Knowledge. Neural Processing Letters 2022. DOI: 10.1007/s11063-022-11058-3.
14. Yin W, Dong G, Zhao Y, Li R. Coresets based asynchronous network slimming. Applied Intelligence 2022. DOI: 10.1007/s10489-022-04092-0.
Abstract
Pruning is effective to reduce the parameters of neural networks and accelerate inference, facilitating deep learning in resource-limited scenarios. This paper proposes an asynchronous pruning method for multi-branch networks, building on our previous work on channel coreset construction, to achieve module-level pruning. Firstly, this paper accelerates coreset-based pruning by batch sampling, with the sampling probability determined by our designed importance function. Secondly, this paper gives asynchronous pruning solutions with an in-place distillation of feature maps for deployment on multi-branch networks such as ResNet and SqueezeNet. Thirdly, this paper provides an extension to neuron pruning by grouping weights as channels. In tests on the sensitivity of different layers to channel pruning, our method outperforms comparison schemes on object detection networks, indicating the advantage of data-independent channel selection in maintaining precision. As shown in tests of the asynchronous pruning solutions on multi-branch classification networks, our method further decreases FLOPs with a small accuracy decline on ResNet and obtains a small accuracy gain on SqueezeNet. In tests on neuron pruning, our method achieves accuracy comparable to existing coreset-based pruning methods via two precision-recovery solutions.
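A small numpy sketch of importance-weighted batch sampling of channels in the coreset spirit; the L1-norm importance function stands in for the paper's designed importance function and is an assumption.

```python
import numpy as np

def sample_channels(W, keep, rng=None):
    """Coreset-style channel selection by sampling (illustrative).

    W: conv weight of shape (out_ch, in_ch, k, k).
    Channels are sampled without replacement with probability proportional
    to an importance value; L1 norm is used here as a stand-in importance.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    out_ch = W.shape[0]
    importance = np.abs(W).reshape(out_ch, -1).sum(axis=1)
    prob = importance / importance.sum()
    return rng.choice(out_ch, size=keep, replace=False, p=prob)

kept = sample_channels(np.random.randn(64, 32, 3, 3), keep=32)
```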
15. Zhao M, Tong X, Wu W, Wang Z, Zhou B, Huang X. A Novel Deep-Learning Model Compression Based on Filter-Stripe Group Pruning and Its IoT Application. Sensors (Basel) 2022; 22:5623. PMID: 35957176. PMCID: PMC9371170. DOI: 10.3390/s22155623.
Abstract
Nowadays, there is a tradeoff between the compression ratio of a deep-learning model and its accuracy. In this paper, a strategy is proposed that refines pruning granularity and weight quantization at the level of neural network filters. Firstly, the filters in the neural network are refined into filter strips. Then, an evaluation of the filter strips is used to estimate the partial importance of each filter, the unimportant strips are cut off, and the remaining strips are reorganized. Finally, the recombined network is retrained and quantized to further reduce its computational cost. The results show that the method can significantly reduce the computational effort of the neural network and compress the number of parameters in the model. On ResNet-56, the method reduces the number of parameters to 1/4 and the amount of computation to 1/5, with a model accuracy loss of only 0.01. On VGG-16, the number of parameters is reduced to 1/14 and the amount of computation to 1/3, with an accuracy loss of 0.5%.
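An illustrative numpy sketch of stripe-level scoring, where each spatial position of a filter is treated as a strip and scored by its L1 norm; the norm-based score and the keep ratio are assumptions, not the paper's exact evaluation.

```python
import numpy as np

def stripe_importance(W):
    """Score filter strips of a conv layer.

    W: conv weight of shape (out_ch, in_ch, k, k). Each of the k*k spatial
    positions of a filter is treated as a strip (a vector over input
    channels) and scored by its L1 norm. Returns scores of shape (out_ch, k*k).
    """
    out_ch, in_ch, k, _ = W.shape
    strips = W.transpose(0, 2, 3, 1).reshape(out_ch, k * k, in_ch)
    return np.abs(strips).sum(axis=2)

W = np.random.randn(64, 32, 3, 3)
scores = stripe_importance(W)                 # (64, 9)
mask = scores > np.quantile(scores, 0.4)      # keep the top 60% of strips
```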
Affiliation(s)
- Ming Zhao: School of Computer Science, Yangtze University, Jingzhou 434023, China
- Xindi Tong: Department of Mathematics and Information Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
- Weixian Wu: School of Computer Science, Yangtze University, Jingzhou 434023, China
- Zhen Wang: School of Computer Science, Yangtze University, Jingzhou 434023, China
- Bingxue Zhou: School of Computer Science, Yangtze University, Jingzhou 434023, China
- Xiaodan Huang: School of Computer Science, Yangtze University, Jingzhou 434023, China
16. A Novel Fusion Pruning Algorithm Based on Information Entropy Stratification and IoT Application. Electronics 2022. DOI: 10.3390/electronics11081212.
Abstract
To further reduce the size of the neural network model and enable the network to be deployed on mobile devices, a novel fusion pruning algorithm based on information entropy stratification is proposed in this paper. Firstly, the method finds similar filters and removes redundant parts via Affinity Propagation clustering; secondly, it further prunes the channels using information entropy stratification and the batch normalization (BN) layer scaling factors; finally, it restores accuracy by fine-tuning, achieving a smaller network model without losing accuracy. Experiments are conducted on the VGG-16 and ResNet-56 networks using the CIFAR-10 dataset. On VGG-16, compared with the original model, the proposed algorithm reduces the parameters by 90.69% and the computation to 24.46% of the original. On ResNet-56, we achieve a 63.82% FLOPs reduction by removing 63.53% of the parameters. The memory occupation and computation speed of the new model are better than those of the baseline model while maintaining high accuracy. Compared with similar algorithms, it has clear advantages in computational speed and model size. The pruned model is also deployed on the Internet of Things (IoT) as a target detection system. Experiments show that the proposed model detects targets accurately with low inference time and memory usage, taking only 252.84 ms on embedded devices and thus matching the limited resources of IoT.
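The first stage of the pipeline, clustering similar filters with Affinity Propagation and keeping each cluster's exemplar, can be sketched with scikit-learn as below; treating non-exemplar members as redundant is an assumption, and the later entropy-stratification and BN-scaling stages are not shown.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def prune_similar_filters(W):
    """Cluster flattened filters with Affinity Propagation and keep only the
    exemplar of each cluster, treating other members as redundant.

    W: conv weight of shape (out_ch, in_ch, k, k).
    Returns the indices of the exemplar filters to keep.
    """
    flat = W.reshape(W.shape[0], -1)
    ap = AffinityPropagation(random_state=0).fit(flat)
    return np.unique(ap.cluster_centers_indices_)

keep = prune_similar_filters(np.random.randn(64, 16, 3, 3))
```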
17. Shao L, Zuo H, Zhang J, Xu Z, Yao J, Wang Z, Li H. Filter Pruning via Measuring Feature Map Information. Sensors (Basel) 2021; 21:6601. PMID: 34640921. PMCID: PMC8512244. DOI: 10.3390/s21196601.
Abstract
Neural network pruning, an important method to reduce the computational complexity of deep models, can be well applied to devices with limited resources. However, most current methods focus on some kind of information about the filter itself to prune the network, rarely exploring the relationship between the feature maps and the filters. In this paper, two novel pruning methods are proposed. First, a new pruning method is proposed that reflects the importance of filters by exploring the information in the feature maps. Based on the premise that a feature map containing more information is more important, the information entropy of the feature maps is used to measure that information and thereby evaluate the importance of each filter in the current layer. Further, normalization is used to enable cross-layer comparison. As a result, the network structure is efficiently pruned while its performance is well preserved. Second, we propose a parallel pruning method that combines the above method with the slimming pruning method, which yields better results in terms of computational cost. Our methods perform better in terms of accuracy, parameters, and FLOPs compared with most advanced methods. On ImageNet, ResNet-50 achieves 72.02% top-1 accuracy with merely 11.41M parameters and 1.12B FLOPs. On CIFAR-10, DenseNet-40 obtains 94.04% accuracy with only 0.38M parameters and 110.72M FLOPs, and our parallel pruning method reduces the parameters and FLOPs to just 0.37M and 100.12M, respectively, with little loss of accuracy.
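A minimal numpy sketch of entropy-based feature-map scoring with per-layer normalization for cross-layer comparison; the histogram binning and min-max normalization are illustrative assumptions.

```python
import numpy as np

def feature_map_entropy(fmap, bins=32):
    """Entropy of one feature map's activation histogram (illustrative)."""
    hist, _ = np.histogram(fmap, bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def normalized_layer_scores(fmaps):
    """Score every filter of a layer by its feature-map entropy, then
    normalize within the layer so that scores from different layers can be
    compared on a common scale."""
    scores = np.array([feature_map_entropy(f) for f in fmaps])
    rng = scores.max() - scores.min()
    return (scores - scores.min()) / (rng + 1e-12)

layer_scores = normalized_layer_scores(np.random.rand(16, 8, 8))
low_info_filters = np.argsort(layer_scores)[:4]   # candidates for pruning
```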
Affiliation(s)
- Linsong Shao: Key Laboratory of Optical Engineering, Chinese Academy of Sciences, Chengdu 610200, China; Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Haorui Zuo: Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China
- Jianlin Zhang: Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China
- Zhiyong Xu: Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China
- Jinzhen Yao: Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China
- Zhixing Wang: Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China
- Hong Li: Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610200, China