1. Tian Y, Xie L, Fang J, Jiao J, Ye Q, Tian Q. Exploring Complicated Search Spaces With Interleaving-Free Sampling. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:7764-7771. PMID: 39024083. DOI: 10.1109/tnnls.2024.3408329.
Abstract
Conventional neural architecture search (NAS) algorithms typically work on search spaces with short-distance node connections. We argue that such designs, though safe and stable, are obstacles to exploring more effective network architectures. In this brief, we explore search algorithms on a complicated search space with long-distance connections and show that existing weight-sharing search algorithms fail due to the presence of interleaved connections (ICs). Based on this observation, we present a simple yet effective algorithm, termed interleaving-free neural architecture search (IF-NAS). We further design a periodic sampling strategy to construct subnetworks during the search procedure so that ICs do not emerge in any of them. In the proposed search space, IF-NAS outperforms both random sampling and previous weight-sharing search algorithms by significant margins. It also generalizes well to cell-based (micro) search spaces. This study emphasizes the importance of macro structure, and we look forward to further efforts in this direction. The code is available at github.com/sunsmarterjie/IFNAS.
2. Yang A, Liu Y, Li C, Ren Q. Deeply Supervised Block-Wise Neural Architecture Search. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:2451-2464. PMID: 38231812. DOI: 10.1109/tnnls.2023.3347542.
Abstract
Neural architecture search (NAS) has shown great promise in automatically designing neural network models. Recently, block-wise NAS has been proposed to alleviate the deep coupling between architectures and weights that exists in weight-sharing NAS, by training the huge weight-sharing supernet block by block. However, existing block-wise NAS methods, which resort to either a supervised distillation or a self-supervised contrastive learning scheme to enable block-wise optimization, incur massive computational cost: the former introduces an external high-capacity teacher model, while the latter involves a supernet-scale momentum model and requires a long training schedule. Considering this, we propose a resource-friendly deeply supervised block-wise NAS (DBNAS) method. In DBNAS, we construct a lightweight deeply-supervised module after each block to enable a simple supervised learning scheme and leverage ground-truth labels to indirectly supervise the optimization of each block progressively. In addition, the deeply-supervised module is specifically designed as a structural and functional condensation of the supernet, which establishes global awareness for progressive block-wise optimization and helps search for promising architectures. Experimental results show that DBNAS takes less than one GPU day to discover promising architectures on ImageNet, with a smaller GPU memory footprint than other block-wise NAS works. The best-performing model in the searched DBNAS family achieves 75.6% top-1 accuracy on ImageNet, which is competitive with state-of-the-art NAS models. Our DBNAS family models also achieve good transfer performance on CIFAR-10/100, as well as on two downstream tasks: object detection and semantic segmentation.
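For orientation, the sketch below illustrates the generic pattern of block-wise deep supervision that the abstract describes: a lightweight auxiliary head after each block receives ground-truth labels, and gradients are kept local to each block. This is a minimal sketch assuming a generic PyTorch setting; the DeepSupHead module, the block_wise_step function, and the detach-based progressive schedule are illustrative simplifications, not the authors' exact DBNAS modules.

```python
# Minimal sketch of block-wise deep supervision (illustrative, not the DBNAS implementation).
import torch
import torch.nn as nn

class DeepSupHead(nn.Module):
    """Lightweight auxiliary head attached after a block to receive ground-truth supervision."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        return self.fc(self.pool(x).flatten(1))

def block_wise_step(blocks, heads, images, labels, criterion=nn.CrossEntropyLoss()):
    """One training step: each block is supervised through its own auxiliary head."""
    losses = []
    feat = images
    for block, head in zip(blocks, heads):
        # Detach the block input so gradients stay local to the current block,
        # mimicking progressive block-wise optimization.
        feat = block(feat.detach())
        losses.append(criterion(head(feat), labels))
    return sum(losses)
```

In a weight-sharing setting, each block would be a supernet stage and the auxiliary heads would be discarded after the search.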
3. Zhang C, Meng G, Fan B, Tian K, Zhang Z, Xiang S, Pan C. Reusable Architecture Growth for Continual Stereo Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:6167-6184. PMID: 38502627. DOI: 10.1109/tpami.2024.3378884.
Abstract
The remarkable performance of recent stereo depth estimation models benefits from the successful use of convolutional neural networks to regress dense disparity. As in most tasks, this requires gathering training data that covers the heterogeneous scenes encountered at deployment time. However, training samples are typically acquired continuously in practical applications, making the capability to learn new scenes continually even more crucial. For this purpose, we propose to perform continual stereo matching, where a model is tasked to 1) continually learn new scenes, 2) overcome forgetting of previously learned scenes, and 3) continuously predict disparities at inference. We achieve this goal by introducing a Reusable Architecture Growth (RAG) framework. RAG leverages task-specific neural unit search and architecture growth to learn new scenes continually in both supervised and self-supervised manners, and it maintains high reusability during growth by reusing previous units while obtaining good performance. Additionally, we present a Scene Router module that adaptively selects the scene-specific architecture path at inference. Comprehensive experiments on numerous datasets show that our framework performs impressively under various weather, road, and city conditions and surpasses state-of-the-art methods in more challenging cross-dataset settings. Further experiments also demonstrate the adaptability of our method to unseen scenes, which can facilitate end-to-end stereo architecture learning and practical deployment.
4. Zhen C, Zhang W, Mo J, Ji M, Zhou H, Zhu J. RASP: Regularization-based Amplitude Saliency Pruning. Neural Netw 2023; 168:1-13. PMID: 37734135. DOI: 10.1016/j.neunet.2023.09.002.
Abstract
Because most existing pruning criteria are data-dependent, data-independent norm criteria play a crucial role in filter pruning and offer promising prospects for deploying deep neural networks on resource-constrained devices. However, norm criteria based on amplitude measurements have long posed challenges in terms of theoretical feasibility: existing methods rely on data-derived information, such as derivatives, to establish reasonable pruning standards, and a quantitative analysis of the "smaller-norm-less-important" notion remains elusive within the norm-criterion context. To address the need for data independence and theoretical feasibility, we conduct a saliency analysis on filters and propose a regularization-based amplitude saliency pruning criterion (RASP). This amplitude saliency not only attains data independence but also establishes usage guidelines for norm criteria. We further investigate the amplitude saliency to address data dependency in model evaluation and inter-class filter selection, and introduce model saliency together with an adaptive parameter group lasso (AGL) regularization approach that is sensitive to different layers. Theoretically, we analyze the feasibility of amplitude saliency and employ quantitative saliency analysis to validate the advantages of our method over previous approaches. Experimentally, on the CIFAR-10 and ImageNet image classification benchmarks, we extensively validate the improved performance of our method compared with previous methods: even when the pruned model has the same or a smaller number of FLOPs, it achieves equivalent or higher accuracy. Notably, in our ImageNet experiment, RASP achieves a 51.9% reduction in FLOPs while maintaining 76.19% accuracy on ResNet-50.
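The sketch below shows the two generic ingredients the abstract refers to: an amplitude (norm-based) saliency score per filter and a group-lasso style regularizer that pushes whole filters toward zero. It is a minimal sketch assuming a PyTorch setting; the exact RASP saliency definition and the adaptive per-layer weighting of AGL are not reproduced, and lam is a hypothetical regularization strength.

```python
# Illustrative norm-based filter saliency and group-lasso penalty (not the exact RASP/AGL formulas).
import torch
import torch.nn as nn

def filter_amplitude_saliency(conv: nn.Conv2d) -> torch.Tensor:
    """L2 amplitude of each output filter; smaller norm is treated as less important."""
    w = conv.weight.detach()                 # shape (out_ch, in_ch, k, k)
    return w.flatten(1).norm(p=2, dim=1)     # shape (out_ch,)

def group_lasso_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """Sum of per-filter L2 norms over all conv layers, driving whole filters toward zero amplitude."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            penalty = penalty + m.weight.flatten(1).norm(p=2, dim=1).sum()
    return lam * penalty

# Usage (hypothetical): loss = criterion(model(x), y) + group_lasso_penalty(model)
```

Filters whose amplitude saliency falls below a threshold after regularized training would then be pruned.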
Affiliation(s)
- Chenghui Zhen: College of Information Science and Engineering, Huaqiao University, Xiamen, 361021, Fujian, China.
- Weiwei Zhang: College of Engineering, Huaqiao University, Quanzhou, 362021, Fujian, China.
- Jian Mo: College of Engineering, Huaqiao University, Quanzhou, 362021, Fujian, China.
- Ming Ji: College of Engineering, Huaqiao University, Quanzhou, 362021, Fujian, China.
- Hongbo Zhou: College of Engineering, Huaqiao University, Quanzhou, 362021, Fujian, China; Intelligent Software Research Center, Institute of Software, Chinese Academy of Sciences, 100190, Beijing, China.
- Jianqing Zhu: College of Engineering, Huaqiao University, Quanzhou, 362021, Fujian, China.
5. Du Y, Zhou X, Huang T, Yang C. A hierarchical evolution of neural architecture search method based on state transition algorithm. Int J Mach Learn Cyb 2023. DOI: 10.1007/s13042-023-01794-w.
6. He J, Ding Y, Zhang M, Li D. Towards efficient network compression via Few-Shot Slimming. Neural Netw 2021; 147:113-125. PMID: 34999388. DOI: 10.1016/j.neunet.2021.12.011.
Abstract
While previous network compression methods achieve great success, most of them rely on abundant training data, which is often unavailable in practice for reasons such as privacy issues, storage constraints, and transmission limitations. A promising way to solve this problem is to perform compression with a small amount of unlabeled data. Proceeding along this way, we propose a novel few-shot network compression framework named Few-Shot Slimming (FSS). FSS follows the student/teacher paradigm and contains two steps: (1) construct the student by inheriting principal feature maps from the teacher; (2) refine the student's feature representation by knowledge distillation with an enhanced mixing data augmentation method called GridMix. Specifically, in the first step, we employ normalized cross-correlation to perform principal feature analysis and then theoretically construct a new indicator to select the most informative feature maps from the teacher for the student. The indicator is based on the variances of feature maps, which efficiently quantifies the information richness of the input feature maps in a feature-agnostic manner. In the second step, we perform knowledge distillation for the student initialized in the first step with a novel grid-based mixing data augmentation technique that greatly extends the limited sample set. In this way, the student refines its feature representation and achieves a better result. Extensive experiments on multiple benchmarks demonstrate the state-of-the-art performance of FSS. For example, using only 0.2% of the full training set as label-free data, FSS yields a 60% FLOPs reduction for DenseNet-40 on CIFAR-10 with a loss of only 0.8% in top-1 accuracy, achieving a result on par with that obtained by conventional full-data methods.
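The sketch below illustrates the two ideas described above in generic form: ranking teacher feature maps by their variance as a proxy for information richness, and a grid-based mixing augmentation that swaps grid cells between two images. It is a minimal sketch assuming a PyTorch setting; the function names, the 4x4 grid, and the swap probability are illustrative assumptions rather than the exact FSS/GridMix implementation.

```python
# Illustrative variance-based channel selection and grid-based mixing (not the exact FSS/GridMix code).
import torch

def select_informative_channels(feats: torch.Tensor, k: int) -> torch.Tensor:
    """feats: (N, C, H, W) teacher feature maps; keep the k channels with the largest variance."""
    per_channel_var = feats.var(dim=(0, 2, 3))        # one variance per channel
    return per_channel_var.topk(k).indices

def grid_mix(x1: torch.Tensor, x2: torch.Tensor, grid: int = 4, p: float = 0.5) -> torch.Tensor:
    """Randomly replace grid cells of x1 with the corresponding cells of x2. x1, x2: (N, C, H, W)."""
    _, _, h, w = x1.shape
    out = x1.clone()
    ch, cw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            if torch.rand(1).item() < p:
                out[:, :, i * ch:(i + 1) * ch, j * cw:(j + 1) * cw] = \
                    x2[:, :, i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
    return out
```

The mixed images would then be fed to both teacher and student during distillation to enlarge the effective few-shot dataset.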
Affiliation(s)
- Junjie He: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310007, China; Zhejiang Provincial Key Laboratory of Information Processing, Communication and Networking, Hangzhou, 310007, China.
- Yinzhang Ding: DAMO Academy, Alibaba Group, Hangzhou, 311121, China.
- Ming Zhang: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310007, China; Zhejiang Provincial Key Laboratory of Information Processing, Communication and Networking, Hangzhou, 310007, China.
- Dongxiao Li: College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, 310007, China; Zhejiang Provincial Key Laboratory of Information Processing, Communication and Networking, Hangzhou, 310007, China.
7. Zhang M, Li H, Pan S, Chang X, Zhou C, Ge Z, Su S. One-Shot Neural Architecture Search: Maximising Diversity to Overcome Catastrophic Forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:2921-2935. PMID: 33147140. DOI: 10.1109/tpami.2020.3035351.
Abstract
One-shot neural architecture search (NAS) has recently become mainstream in the NAS community because it significantly improves computational efficiency through weight sharing. However, the supernet training paradigm in one-shot NAS introduces catastrophic forgetting: each training step can deteriorate the performance of other architectures that share part of their weights with the current architecture. To overcome this problem, we formulate supernet training for one-shot NAS as a constrained continual learning optimization problem, such that learning the current architecture does not degrade the validation accuracy of previous architectures. The key to solving this constrained optimization problem is a novelty search based architecture selection (NSAS) loss function that regularizes supernet training by using a greedy novelty search method to find the most representative subset of architectures. We applied the NSAS loss function to two one-shot NAS baselines and extensively tested them on both a common search space and a NAS benchmark dataset. We further derive three variants of the NSAS loss function: NSAS with a depth constraint (NSAS-C) to improve transferability, and NSAS-G and NSAS-LG to handle situations with a limited number of constraints. The experiments on the common NAS search space demonstrate that NSAS and its variants improve the predictive ability of supernet training in one-shot NAS, with remarkable and efficient performance on the CIFAR-10, CIFAR-100, and ImageNet datasets. The results on the NAS benchmark dataset also confirm the significant improvements these one-shot NAS baselines can make.
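To make the "greedy novelty search for a representative subset" step concrete, the sketch below greedily picks architectures that are maximally dissimilar from those already selected. It is a minimal sketch under assumptions of my own: architectures are encoded as fixed-length tuples, the Hamming-style distance and the farthest-point rule are illustrative, and none of this reproduces the exact NSAS loss.

```python
# Illustrative greedy, diversity-maximizing subset selection (not the exact NSAS formulation).
import random

def novelty(candidate, selected):
    """Average Hamming-style distance of a candidate encoding to the already selected subset."""
    if not selected:
        return float("inf")
    return sum(sum(a != b for a, b in zip(candidate, s)) for s in selected) / len(selected)

def greedy_novelty_subset(architectures, subset_size):
    """Greedily add the most novel architecture; assumes subset_size <= len(architectures)."""
    selected = [random.choice(architectures)]
    while len(selected) < subset_size:
        best = max((a for a in architectures if a not in selected),
                   key=lambda a: novelty(a, selected))
        selected.append(best)
    return selected
```

The resulting subset could then serve as the set of constraint architectures whose performance the supernet update is regularized to preserve.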
8. Zhang X, Chang J, Guo Y, Meng G, Xiang S, Lin Z, Pan C. DATA: Differentiable ArchiTecture Approximation With Distribution Guided Sampling. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:2905-2920. PMID: 32866094. DOI: 10.1109/tpami.2020.3020315.
Abstract
Neural architecture search (NAS) is inherently subject to a gap between the architectures used during searching and validating. To bridge this gap effectively, we develop Differentiable ArchiTecture Approximation (DATA) with an Ensemble Gumbel-Softmax (EGS) estimator and an Architecture Distribution Constraint (ADC) to automatically approximate architectures during searching and validating in a differentiable manner. Technically, the EGS estimator consists of a group of Gumbel-Softmax estimators, which is capable of converting probability vectors to binary codes while passing gradients backward, reducing the estimation bias in a differentiable way. Further, to narrow the distribution gap between sampled architectures and the supernet, the ADC is introduced to reduce the variance of sampling during searching. Benefiting from such modeling, architecture probabilities and network weights in the NAS model can be jointly optimized with standard back-propagation, yielding an end-to-end learning mechanism for searching deep neural architectures in an extended search space. Consequently, in the validation process, a high-performance architecture close to the one learned during searching is readily built. Extensive experiments on various tasks, including image classification, few-shot learning, unsupervised clustering, semantic segmentation, and language modeling, strongly demonstrate that DATA is capable of discovering high-performance architectures while guaranteeing the required efficiency. Code is available at https://github.com/XinbangZhang/DATA-NAS.
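The sketch below shows one way an "ensemble of Gumbel-Softmax estimators" can turn operation logits into a binary code while keeping gradients via the straight-through trick. It is a minimal sketch assuming a PyTorch setting; combining the draws by elementwise maximum and the group size k are assumptions of mine, not the exact EGS estimator of DATA.

```python
# Illustrative ensemble of hard Gumbel-Softmax draws combined into a binary code (not the exact EGS).
import torch
import torch.nn.functional as F

def ensemble_gumbel_softmax(logits: torch.Tensor, k: int = 3, tau: float = 1.0) -> torch.Tensor:
    """logits: (num_ops,) architecture parameters for one edge.
    Returns a binary code with straight-through gradients from each draw."""
    draws = torch.stack([F.gumbel_softmax(logits, tau=tau, hard=True) for _ in range(k)])
    return draws.max(dim=0).values  # union of the k one-hot draws

# Usage (hypothetical): mask = ensemble_gumbel_softmax(edge_logits)
#                       out = sum(m * op(x) for m, op in zip(mask, candidate_ops))
```

Averaging or summing the draws instead of taking the maximum would be an equally plausible combination rule under these assumptions.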