1. Lu Y, Gong M, Feng K, Liu J, Guan Z, Li H. SAAF: Self-Adaptive Attention Factor-Based Taylor-Pruning on Convolutional Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:8540-8553. [PMID: 39213269] [DOI: 10.1109/tnnls.2024.3439435]
Abstract
Pruning techniques have drawn attention as a way to reduce the computational resource consumption of convolutional neural networks (CNNs). In particular, the Taylor-based method simplifies the evaluation of each filter's importance to the product of the gradient and the value of the output features, which outperforms other methods in reductions of parameters and floating-point operations (FLOPs). However, the Taylor-based method sacrifices too much accuracy when the overall pruning rate is relatively large compared with other pruning algorithms. In this article, we propose a self-adaptive attention factor (SAAF) to improve the performance of the slimmed model when conventional Taylor-based pruning is used at higher pruning rates. Specifically, SAAF is calculated by leveraging the remaining ratio of filters at the early pruning stage of the Taylor-based method, and some pruned filters are then recovered, in terms of SAAF, to improve the accuracy of the slimmed model. SAAF thus protects filters from being over-pruned, eliminating the degradation of Taylor-based pruning at large pruning rates, while still compressing models appreciably across various datasets. We test the efficiency of SAAF on VGG-16 and ResNet-50 with CIFAR-10, Tiny-ImageNet, ImageNet-1000, and remote sensing images. Our method clearly outperforms the traditional Taylor-based method in accuracy, with only tiny sacrifices in the reduction of parameters and FLOPs, which is better than other pruning methods.
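
As context for the criterion SAAF builds on, here is a minimal sketch of first-order Taylor filter scoring, assuming a PyTorch setting; the function name is illustrative, and SAAF's recovery rule itself is not reproduced:

```python
import torch

def taylor_filter_importance(activation: torch.Tensor,
                             gradient: torch.Tensor) -> torch.Tensor:
    """First-order Taylor importance per filter: the magnitude of the
    product of an output feature map and its gradient, averaged over
    batch and spatial dimensions. Low-scoring filters are pruned."""
    # activation, gradient: (batch, channels, height, width), as captured
    # by forward/backward hooks on a convolution layer.
    return (activation * gradient).abs().mean(dim=(0, 2, 3))
```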

2. Guo K, Lin Z, Chen C, Xing X, Liu F, Xu X. Compact Model Training by Low-Rank Projection With Energy Transfer. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:6708-6722. [PMID: 38843062] [DOI: 10.1109/tnnls.2024.3400928]
Abstract
Low-rankness plays an important role in traditional machine learning but is not so popular in deep learning. Most previous low-rank network compression methods compress networks by approximating pretrained models and retraining. However, the optimal solution in Euclidean space may be quite different from the one under a low-rank constraint, so a well-pretrained model is not a good initialization for a low-rank constrained model, and the performance of a low-rank compressed network degrades significantly. Compared with other network compression methods such as pruning, low-rank methods have attracted less attention in recent years. In this article, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. We propose to alternately perform stochastic gradient descent training and projection of each weight matrix onto the corresponding low-rank manifold. Compared to retraining on the compact model, this enables full utilization of model capacity, since the solution space is relaxed back to Euclidean space after projection. The matrix energy (the sum of squares of singular values) lost in projection is compensated by energy transfer: we uniformly transfer the energy of the pruned singular values to the remaining ones. We theoretically show that energy transfer eases the gradient vanishing caused by projection. In modern networks, a batch normalization (BN) layer can be merged into the previous convolution layer for inference, thereby influencing the optimal low-rank approximation (LRA) of the previous layer; we propose BN rectification to cut off this effect on the optimal LRA, which further improves performance. Comprehensive experiments on CIFAR-10 and ImageNet show that our method is superior to other low-rank compression methods and also outperforms recent state-of-the-art pruning methods. For object detection and semantic segmentation, our method still achieves good compression results. In addition, combining LRPET with quantization and hashing methods achieves even better compression than either method alone. We further apply it to Transformer-based models to demonstrate its transferability. Our code is available at https://github.com/BZQLin/LRPET.
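
A minimal sketch of the projection-with-energy-transfer step as the abstract describes it, assuming PyTorch and reading the uniform transfer as a proportional rescaling of the kept singular values; all names are illustrative:

```python
import torch

def project_with_energy_transfer(W: torch.Tensor, rank: int) -> torch.Tensor:
    """Project a weight matrix onto the rank-`rank` manifold via truncated
    SVD, then rescale the kept singular values so the matrix energy (the
    sum of squared singular values) matches the pre-projection energy."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    total = (S ** 2).sum()
    kept = (S[:rank] ** 2).sum()
    S_new = S[:rank] * torch.sqrt(total / kept)   # transfer pruned energy
    return U[:, :rank] @ torch.diag(S_new) @ Vh[:rank, :]

# Training would alternate an ordinary SGD step with this projection, e.g.:
#   optimizer.step()
#   with torch.no_grad():
#       layer.weight.copy_(project_with_energy_transfer(layer.weight, rank))
```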

3. Pham VT, Zniyed Y, Nguyen TP. Enhanced Network Compression Through Tensor Decompositions and Pruning. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4358-4370. [PMID: 38457323] [DOI: 10.1109/tnnls.2024.3370294]
Abstract
Network compression techniques that combine tensor decompositions and pruning have shown promise in leveraging the advantages of both strategies. In this work, we propose enhanced Network cOmpRession through TensOr decompositions and pruNing (NORTON), a novel method for network compression. NORTON introduces the concept of filter decomposition, enabling a more detailed decomposition of the network while preserving the weights' multidimensional properties. Our method incorporates a novel structured pruning approach that effectively integrates the decomposed model. Through extensive experiments on various architectures, benchmark datasets, and representative vision tasks, we demonstrate the usefulness of our method. NORTON achieves superior results compared to state-of-the-art (SOTA) techniques in terms of complexity and accuracy. Our code is also available for research purposes.
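
NORTON's filter decomposition is a finer-grained tensor scheme; as a generic stand-in, this sketch factorizes a convolution with a truncated SVD into a spatial convolution followed by a 1 x 1 convolution (a common low-rank decomposition, not the paper's exact method):

```python
import torch
import torch.nn as nn

def svd_decompose_conv(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    """Factorize a convolution into a k x k conv with `rank` output channels
    followed by a 1 x 1 conv, using truncated SVD of the flattened weight.
    A generic low-rank stand-in for filter decomposition."""
    C_out, C_in, kh, kw = conv.weight.shape
    W = conv.weight.detach().reshape(C_out, -1)        # (C_out, C_in*kh*kw)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    spatial = nn.Conv2d(C_in, rank, (kh, kw), stride=conv.stride,
                        padding=conv.padding, bias=False)
    pointwise = nn.Conv2d(rank, C_out, 1, bias=conv.bias is not None)
    spatial.weight.data = Vh[:rank].reshape(rank, C_in, kh, kw)
    pointwise.weight.data = (U[:, :rank] * S[:rank]).reshape(C_out, rank, 1, 1)
    if conv.bias is not None:
        pointwise.bias.data = conv.bias.detach().clone()
    return nn.Sequential(spatial, pointwise)
```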

4. Fracastoro G, Fosson SM, Migliorati A, Calafiore GC. Playing the Lottery With Concave Regularizers for Sparse Trainable Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4575-4585. [PMID: 38478446] [DOI: 10.1109/tnnls.2024.3373609]
Abstract
The design of sparse neural networks, i.e., of networks with a reduced number of parameters, has been attracting increasing research attention in the last few years. The use of sparse models may significantly reduce the computational and storage footprint of the inference phase. In this context, the lottery ticket hypothesis (LTH) constitutes a breakthrough result that addresses the performance not only of the inference phase but also of the training phase. It states that it is possible to extract effective sparse subnetworks, called winning tickets, that can be trained in isolation. The development of effective methods to play the lottery, i.e., to find winning tickets, is still an open problem. In this article, we propose a novel class of methods to play the lottery. The key point is the use of concave regularization to promote the sparsity of a relaxed binary mask, which represents the network topology. We theoretically analyze the effectiveness of the proposed method in the convex framework. We then present extensive numerical tests on various datasets and architectures, which show that the proposed method can improve the performance of state-of-the-art algorithms.
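
A minimal sketch of the core ingredient, assuming a log-sum penalty as the concave regularizer on the relaxed binary mask (the paper treats a class of such regularizers; this choice and the names are illustrative):

```python
import torch

def concave_sparsity_penalty(mask: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    """Log-sum concave regularizer sum_i log(1 + m_i / eps) on a relaxed
    binary mask with entries in [0, 1]. Unlike the L1 norm, its marginal
    cost shrinks as an entry grows, so it drives entries toward exactly
    zero and sparsifies the encoded topology."""
    return torch.log1p(mask.clamp(0.0, 1.0) / eps).sum()

# Illustrative use in a training step (task_loss, model, lam assumed):
#   loss = task_loss(model(x, mask), y) + lam * concave_sparsity_penalty(mask)
#   loss.backward()
```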

5. Liu Y, Fan K, Zhou W. FPWT: Filter pruning via wavelet transform for CNNs. Neural Netw 2024; 179:106577. [PMID: 39098265] [DOI: 10.1016/j.neunet.2024.106577]
Abstract
The enormous data and computational resources required by convolutional neural networks (CNNs) hinder their practical application on mobile devices. To address this restriction, filter pruning has become one of the practical approaches. Most existing pruning methods are developed and practiced in the spatial domain, which ignores the potential interconnections in the model structure and the decentralized distribution of image energy in the spatial domain. Frequency-domain transforms can remove the correlation between image pixels and concentrate the image energy distribution, which enables lossy compression of images. In this study, we find that frequency-domain transforms are also applicable to the feature maps of CNNs. This paper proposes filter pruning via wavelet transform (FPWT), which combines the frequency-domain information of the wavelet transform (WT) with the output feature map to expose the correlation between feature maps more clearly and to concentrate the energy distribution in the frequency domain. Moreover, the importance score of each feature map is calculated from the cosine similarity and the energy-weighted coefficients of the high- and low-frequency components, and filters are pruned based on their importance scores. Experiments on two image classification datasets validate the effectiveness of FPWT. For ResNet-110 on CIFAR-10, FPWT reduces FLOPs and parameters by more than 60.0% with a 0.53% accuracy improvement. For ResNet-50 on ImageNet, FPWT reduces FLOPs by 53.8% and removes 49.7% of parameters with only a 0.97% loss of Top-1 accuracy.
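
The abstract does not give the exact scoring formula; this sketch shows only the energy-weighting idea on a single feature map, assuming the PyWavelets library and illustrative sub-band weights:

```python
import numpy as np
import pywt

def wavelet_energy_score(feature_map: np.ndarray, w_low: float = 0.7,
                         w_high: float = 0.3) -> float:
    """Energy-weighted wavelet score of one output feature map (H x W).
    A one-level 2-D Haar DWT splits the map into a low-frequency band (cA)
    and three high-frequency bands (cH, cV, cD); the score weights the
    band energies. w_low / w_high are illustrative, not the paper's values."""
    cA, (cH, cV, cD) = pywt.dwt2(feature_map, "haar")
    low = float(np.sum(cA ** 2))
    high = float(np.sum(cH ** 2) + np.sum(cV ** 2) + np.sum(cD ** 2))
    return w_low * low + w_high * high
```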
Affiliation(s)
- Yajun Liu: School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
- Kefeng Fan: China Electronics Standardization Institute, Beijing 100007, China
- Wenju Zhou: School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China

6. Chen Z, Xiang J, Lu Y, Xuan Q, Wang Z, Chen G, Yang X. RGP: Neural Network Pruning Through Regular Graph With Edges Swapping. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:14671-14683. [PMID: 37310824] [DOI: 10.1109/tnnls.2023.3280899]
Abstract
Deep learning technology has found promising application in lightweight model design, for which pruning is an effective means of achieving a large reduction in both model parameters and floating-point operations (FLOPs). Existing neural network pruning methods mostly start from the importance of model parameters and design parameter evaluation metrics to perform parameter pruning iteratively. These methods are not studied from the perspective of network model topology, so they may be effective but not efficient, and they require completely different pruning for different datasets. In this article, we study the graph structure of the neural network and propose a regular graph pruning (RGP) method to perform one-shot neural network pruning. Specifically, we first generate a regular graph and set its node-degree values to meet the preset pruning ratio. Then, we reduce the average shortest path length (ASPL) of the graph by swapping edges to obtain the optimal edge distribution. Finally, we map the obtained graph to a neural network structure to realize pruning. Our experiments demonstrate that the ASPL of the graph is negatively correlated with the classification accuracy of the neural network and that RGP has a strong precision retention capability with high parameter reduction (more than 90%) and FLOPs reduction (more than 90%) (the code for quick use and reproduction is available at https://github.com/Holidays1999/Neural-Network-Pruning-through-its-RegularGraph-Structure).
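
A minimal sketch of the graph-side step, assuming NetworkX; the greedy accept/reject loop is one illustrative way to reduce ASPL through degree-preserving edge swaps, not necessarily the paper's exact procedure:

```python
import networkx as nx

def low_aspl_regular_graph(n: int, degree: int, iters: int = 500,
                           seed: int = 0) -> nx.Graph:
    """Generate a random regular graph and greedily apply degree-preserving
    double edge swaps, keeping a swap only when it lowers the average
    shortest path length (ASPL) and keeps the graph connected."""
    G = nx.random_regular_graph(degree, n, seed=seed)
    best = nx.average_shortest_path_length(G)
    for i in range(iters):
        H = G.copy()
        try:
            nx.double_edge_swap(H, nswap=1, max_tries=100, seed=seed + i)
        except nx.NetworkXException:
            continue  # no valid swap found this round
        if nx.is_connected(H):
            aspl = nx.average_shortest_path_length(H)
            if aspl < best:  # lower ASPL correlates with higher accuracy
                G, best = H, aspl
    return G
```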

7. He Y, Xiao L. Structured Pruning for Deep Convolutional Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:2900-2919. [PMID: 38015707] [DOI: 10.1109/tpami.2023.3334614]
Abstract
The remarkable performance of deep Convolutional neural networks (CNNs) is generally attributed to their deeper and wider architectures, which can come with significant computational costs. Pruning neural networks has thus gained interest since it effectively lowers storage and computational costs. In contrast to weight pruning, which results in unstructured models, structured pruning provides the benefit of realistic acceleration by producing models that are friendly to hardware implementation. The special requirements of structured pruning have led to the discovery of numerous new challenges and the development of innovative solutions. This article surveys the recent progress towards structured pruning of deep CNNs. We summarize and compare the state-of-the-art structured pruning techniques with respect to filter ranking methods, regularization methods, dynamic execution, neural architecture search, the lottery ticket hypothesis, and the applications of pruning. While discussing structured pruning algorithms, we briefly introduce the unstructured pruning counterpart to emphasize their differences. Furthermore, we provide insights into potential research opportunities in the field of structured pruning. A curated list of neural network pruning papers can be found at: https://github.com/he-y/Awesome-Pruning. A dedicated website offering a more interactive comparison of structured pruning methods can be found at: https://huggingface.co/spaces/he-yang/Structured-Pruning-Survey.
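
To make the unstructured/structured distinction concrete, a small sketch using PyTorch's built-in pruning utilities (the pruning amounts are illustrative and not tied to any surveyed method):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

unstructured = nn.Conv2d(64, 128, kernel_size=3)
structured = nn.Conv2d(64, 128, kernel_size=3)

# Unstructured (weight) pruning: zero 50% of individual weights by L1
# magnitude. Tensor shapes are unchanged, so real speedups need sparse
# kernels or dedicated hardware.
prune.l1_unstructured(unstructured, name="weight", amount=0.5)

# Structured pruning: zero 50% of whole filters (dim=0) ranked by L2 norm.
# Entire output channels vanish, which maps to realistic acceleration on
# ordinary hardware.
prune.ln_structured(structured, name="weight", amount=0.5, n=2, dim=0)

# Both modules now hold `weight_orig` plus a `weight_mask`;
# prune.remove(module, "weight") would make the pruning permanent.
```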

8. Tao C, Lin R, Chen Q, Zhang Z, Luo P, Wong N. FAT: Frequency-Aware Transformation for Bridging Full-Precision and Low-Precision Deep Representations. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:2640-2654. [PMID: 35867358] [DOI: 10.1109/tnnls.2022.3190607]
Abstract
Learning low-bitwidth convolutional neural networks (CNNs) is challenging because performance may drop significantly after quantization. Prior art often quantizes the network weights by carefully tuning hyperparameters such as nonuniform step size and layerwise bitwidths, which is complicated since the full- and low-precision representations have large discrepancies. This work presents a novel quantization pipeline, named frequency-aware transformation (FAT), with important benefits: 1) instead of designing complicated quantizers, FAT learns to transform network weights in the frequency domain to remove redundant information before quantization, making them amenable to training in low bitwidth with simple quantizers; 2) FAT readily embeds CNNs in low bitwidths using standard quantizers without tedious hyperparameter tuning, and theoretical analyses show that FAT minimizes the quantization errors in both uniform and nonuniform quantization; and 3) FAT can be easily plugged into various CNN architectures. Using FAT with a simple uniform/logarithmic quantizer achieves state-of-the-art performance across bitwidths on various model architectures. Consequently, FAT provides a novel frequency-based perspective on model quantization.
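
FAT learns its frequency-domain transform end to end; the sketch below only conveys the pipeline's shape with a fixed FFT-magnitude filter in front of a plain uniform quantizer (thresholds and names are illustrative):

```python
import torch

def uniform_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Plain uniform quantizer over the tensor's dynamic range."""
    qmax = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / qmax
    return torch.round((w - lo) / scale) * scale + lo

def frequency_filtered_quantize(w: torch.Tensor, keep: float = 0.75,
                                bits: int = 4) -> torch.Tensor:
    """Suppress the lowest-magnitude FFT components of a weight tensor
    before quantizing it uniformly. FAT *learns* its transform; this
    fixed threshold only illustrates transform-then-quantize."""
    spec = torch.fft.fft(w.flatten())
    mags = spec.abs()
    thresh = torch.quantile(mags, 1.0 - keep)
    spec = spec * (mags >= thresh).to(spec.dtype)  # keep top-`keep` fraction
    w_filtered = torch.fft.ifft(spec).real.reshape(w.shape)
    return uniform_quantize(w_filtered, bits)
```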

9. Sun C, Chen J, Li Y, Wang W, Ma T. Random pruning: channel sparsity by expectation scaling factor. PeerJ Comput Sci 2023; 9:e1564. [PMID: 37705629] [PMCID: PMC10495938] [DOI: 10.7717/peerj-cs.1564]
Abstract
Pruning is an efficient method for deep neural network model compression and acceleration. However, existing pruning strategies, both at the filter level and at the channel level, often introduce a large amount of computation and adopt complex methods for finding sub-networks. We find that there is a linear relationship between the sum of the matrix elements of a channel in convolutional neural networks (CNNs) and the expectation scaling ratio of the image pixel distribution, which reflects the expectation change of the pixel distribution between the feature mapping and the input data. This implies that channels with similar expectation scaling factors (δE) cause similar expectation changes to the input data, thus producing redundant feature mappings. This article therefore proposes a new structured pruning method called EXP. In the proposed method, channels with similar δE are randomly removed in each convolutional layer, so the whole network achieves random sparsity and yields non-redundant and non-unique sub-networks. Experiments on pruning various networks show that EXP can achieve a significant reduction of FLOPs. For example, on the CIFAR-10 dataset, EXP reduces the FLOPs of the ResNet-56 model by 71.9% with a 0.23% loss in Top-1 accuracy. On ILSVRC-2012, it reduces the FLOPs of the ResNet-50 model by 60.0% with a 1.13% loss of Top-1 accuracy. Our code is available at https://github.com/EXP-Pruning/EXP_Pruning (DOI: 10.5281/zenodo.8141065).
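
A minimal sketch of the grouping-and-random-removal idea, assuming PyTorch; binning channels by the per-filter element sum stands in for grouping by δE, and the knobs are illustrative rather than the paper's values:

```python
import random
import torch

def exp_style_survivors(weight: torch.Tensor, num_bins: int = 16,
                        keep_per_bin: int = 2, seed: int = 0) -> list:
    """Group output channels by their expectation scaling factor (delta_E,
    taken here as the sum of each filter's elements) and randomly keep a
    few channels per group; the rest are redundant-channel candidates."""
    rng = random.Random(seed)
    delta_e = weight.sum(dim=(1, 2, 3))              # (C_out,) per-filter sums
    edges = torch.linspace(delta_e.min().item(), delta_e.max().item(),
                           num_bins + 1)
    bins = torch.bucketize(delta_e, edges[1:-1])     # bin index per channel
    survivors = []
    for b in range(num_bins):
        members = (bins == b).nonzero().flatten().tolist()
        rng.shuffle(members)                         # random choice in group
        survivors.extend(members[:keep_per_bin])
    return sorted(survivors)
```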
Affiliation(s)
- Chuanmeng Sun: North University of China, State Key Laboratory of Dynamic Measurement Technology and School of Electrical and Control Engineering, Taiyuan, Shanxi, China
- Jiaxin Chen: North University of China, State Key Laboratory of Dynamic Measurement Technology and School of Electrical and Control Engineering, Taiyuan, Shanxi, China
- Yong Li: Chongqing University, State Key Laboratory of Coal Mine Disaster Dynamics and Control, Chongqing, China
- Wenbo Wang: North University of China, State Key Laboratory of Dynamic Measurement Technology and School of Electrical and Control Engineering, Taiyuan, Shanxi, China
- Tiehua Ma: North University of China, State Key Laboratory of Dynamic Measurement Technology and School of Electrical and Control Engineering, Taiyuan, Shanxi, China

10. FPFS: Filter-level pruning via distance weight measuring filter similarity. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.049]

11. A Novel Fusion Pruning Algorithm Based on Information Entropy Stratification and IoT Application. Electronics 2022. [DOI: 10.3390/electronics11081212]
Abstract
To further reduce the size of neural network models and enable networks to be deployed on mobile devices, a novel fusion pruning algorithm based on information entropy stratification is proposed in this paper. The method first finds similar filters and removes redundant parts by Affinity Propagation clustering, then further prunes the channels using information entropy stratification and the batch normalization (BN) layer scaling factor, and finally restores accuracy by fine-tuning, achieving a reduced model size without losing network accuracy. Experiments are conducted on the VGG-16 and ResNet-56 networks using the CIFAR-10 dataset. On VGG-16, the results show that, compared with the original model, the parameter count of the proposed algorithm is reduced by 90.69% and the computation is reduced to 24.46% of the original. On ResNet-56, we achieve a 63.82% FLOPs reduction by removing 63.53% of the parameters. The memory occupation and computation speed of the new model are better than the baseline model while maintaining high network accuracy. Compared with similar algorithms, the algorithm has obvious advantages in computational speed and model size. The pruned model is also deployed to the Internet of Things (IoT) as a target detection system. Experiments show that the proposed model is able to detect targets accurately with low inference time and memory, taking only 252.84 ms on embedded devices and thus matching the limited resources of IoT.
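
A minimal sketch of the first stage, assuming scikit-learn; keeping one exemplar per Affinity Propagation cluster stands in for the redundant-filter removal, and the later entropy-stratification and BN-scaling stages are not reproduced:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def exemplar_filter_indices(weight: np.ndarray) -> list:
    """Cluster flattened convolution filters with Affinity Propagation and
    keep one exemplar per cluster; the remaining filters are candidates
    for removal as redundant."""
    flat = weight.reshape(weight.shape[0], -1)       # (C_out, C_in*k*k)
    ap = AffinityPropagation(random_state=0).fit(flat)
    return sorted(int(i) for i in ap.cluster_centers_indices_)
```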