1
Xie X, Gao Y, Zhang Y. An improved Artificial Protozoa Optimizer for CNN architecture optimization. Neural Netw 2025;187:107368. [PMID: 40112636] [DOI: 10.1016/j.neunet.2025.107368]
Abstract
In this paper, we propose a novel neural architecture search (NAS) method called MAPOCNN, which leverages an enhanced version of the Artificial Protozoa Optimizer (APO) to optimize the architecture of convolutional neural networks (CNNs). The APO is known for its rapid convergence, high stability, and minimal parameter involvement. To further improve its performance, we introduce MAPO (Modified Artificial Protozoa Optimizer), which incorporates the phototaxis behavior of protozoa. This addition mitigates the risk of premature convergence, allowing the algorithm to explore a broader range of candidate CNN architectures and ultimately identify better solutions. Through rigorous experimentation on benchmark datasets, including Rectangle and MNIST-random, we demonstrate that MAPOCNN not only converges faster but also performs competitively against other state-of-the-art NAS algorithms. The results highlight the effectiveness of MAPOCNN in efficiently discovering CNN architectures that outperform existing methods in terms of both speed and accuracy. This work presents a promising direction for optimizing deep learning architectures using biologically inspired optimization techniques.
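The phototaxis idea lends itself to a compact illustration. Below is a minimal, hypothetical numpy sketch of adding a phototaxis-style move (attraction toward a "light source", here the best individual, with decaying intensity) to a generic population-based optimizer; the paper's actual APO/MAPO update rules are not reproduced, and the toy objective, step sizes, and decay schedule are all illustrative assumptions.

```python
# Hypothetical sketch: a phototaxis-style term in a population-based optimizer.
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))  # toy objective (minimization)

rng = np.random.default_rng(0)
pop = rng.uniform(-5.0, 5.0, size=(20, 10))   # 20 protozoa, 10 dimensions

for t in range(100):
    fitness = np.array([sphere(p) for p in pop])
    light = pop[fitness.argmin()]             # treat the best individual as the light source
    for i in range(len(pop)):
        # Phototaxis: move toward the light, with decaying intensity so the
        # pull weakens over time and premature convergence is less likely.
        intensity = 1.0 - t / 100.0
        step = intensity * rng.random() * (light - pop[i])
        # Random foraging noise keeps diversity in the population.
        noise = 0.1 * rng.standard_normal(pop.shape[1])
        candidate = pop[i] + step + noise
        if sphere(candidate) < fitness[i]:    # greedy replacement
            pop[i] = candidate

print("best fitness:", min(sphere(p) for p in pop))
```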
Affiliation(s)
- Xiaofeng Xie
- School of Mathematics and Information Science, North Minzu University, Yinchuan 750021, Ningxia, China; Scientific Computing and Intelligent Information Processing Collaborative Innovation Center, Yinchuan 750021, Ningxia, China.
- Yuelin Gao
- School of Mathematics and Information Science, North Minzu University, Yinchuan 750021, Ningxia, China; Scientific Computing and Intelligent Information Processing Collaborative Innovation Center, Yinchuan 750021, Ningxia, China; Ningxia Key Laboratory of Intelligent Information and Big Data Processing, Yinchuan 750021, Ningxia, China.
- Yuming Zhang
- School of Mathematics and Information Science, North Minzu University, Yinchuan 750021, Ningxia, China; Ningxia Key Laboratory of Intelligent Information and Big Data Processing, Yinchuan 750021, Ningxia, China.
2
Huang J, Xue B, Sun Y, Zhang M, Yen GG. Split-Level Evolutionary Neural Architecture Search With Elite Weight Inheritance. IEEE Trans Neural Netw Learn Syst 2024;35:13523-13537. [PMID: 37224355] [DOI: 10.1109/tnnls.2023.3269816]
Abstract
Neural architecture search (NAS) has recently gained extensive interest in the deep learning community because of its great potential for automating the construction of deep models. Among the variety of NAS approaches, evolutionary computation (EC) plays a pivotal role thanks to its gradient-free search ability. However, most current EC-based NAS approaches evolve neural architectures in a purely discrete manner, which makes it difficult to handle the number of filters per layer flexibly, since they typically restrict it to a small fixed set rather than searching over all possible values. Moreover, EC-based NAS methods are often criticized for their inefficient performance evaluation, which usually requires laborious full training of hundreds of generated candidate architectures. To address the inflexibility in searching the number of filters, this work proposes a split-level particle swarm optimization (PSO) approach: each dimension of a particle is subdivided into an integer part encoding the configuration of the corresponding layer and a fractional part encoding the number of filters within a large range. In addition, evaluation time is greatly reduced by a novel elite weight inheritance method based on an online-updated weight pool, and a customized fitness function considering multiple objectives is developed to control the complexity of the searched candidate architectures. The proposed method, termed split-level evolutionary NAS (SLE-NAS), is computationally efficient and outperforms many state-of-the-art peer competitors at much lower complexity across three popular image classification benchmark datasets.
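The split-level encoding can be pictured with a short decoding routine. The sketch below assumes a hypothetical table of layer configurations and an illustrative filter range; the paper's exact search space is not reproduced.

```python
# Sketch of split-level decoding: integer part = layer configuration,
# fractional part = filter count mapped into a continuous range.
LAYER_CONFIGS = ["conv3x3", "conv5x5", "depthwise", "skip"]  # hypothetical choices
F_MIN, F_MAX = 16, 512                                       # hypothetical filter range

def decode_dimension(value: float):
    """Split one particle dimension into (layer config, number of filters)."""
    int_part = int(value)             # selects the layer configuration
    frac_part = value - int_part      # encodes the filter count continuously
    config = LAYER_CONFIGS[int_part % len(LAYER_CONFIGS)]
    n_filters = round(F_MIN + frac_part * (F_MAX - F_MIN))
    return config, n_filters

# Example: one particle encodes a three-layer network.
particle = [1.25, 0.80, 2.10]
print([decode_dimension(v) for v in particle])
# -> [('conv5x5', 140), ('conv3x3', 413), ('depthwise', 66)]
```

Because the fractional part is continuous, the swarm can search every filter count in the range rather than a handful of preset values, which is the flexibility the abstract highlights.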
3
Garciarena U, Santana R, Mendiburu A. Redefining Neural Architecture Search of Heterogeneous Multinetwork Models by Characterizing Variation Operators and Model Components. IEEE Trans Neural Netw Learn Syst 2024;35:10561-10575. [PMID: 37022857] [DOI: 10.1109/tnnls.2023.3242877]
Abstract
With neural architecture search (NAS) methods gaining ground on manually designed deep neural networks (even more rapidly as model sophistication escalates), the research trend is shifting toward arranging different, often increasingly complex NAS search spaces. In this context, algorithms that can efficiently explore these search spaces can offer a significant improvement over currently used methods, which in general randomly select the structural variation operator in the hope of a performance gain. In this article, we investigate the effect of different variation operators in a complex domain: multinetwork heterogeneous neural models. These models have an extensive and complex search space of structures, as they require multiple subnetworks within the general model in order to answer different output types. From that investigation, we extract a set of general guidelines whose application is not limited to this particular type of model and that are useful for determining the direction in which an architecture optimization method can find the largest improvement. To derive the guidelines, we characterize both the variation operators, according to their effect on the complexity and performance of the model, and the models themselves, relying on diverse metrics that estimate the quality of their components.
4
Ren J, Chen Z, Yang Y, Wang Z, Sun M, Sun Q. A New Grey Wolf Optimizer Tuned Extended Generalized Predictive Control for Distillation Process. IEEE Trans Neural Netw Learn Syst 2024;35:5880-5890. [PMID: 37018090] [DOI: 10.1109/tnnls.2023.3262556]
Abstract
The distillation process plays an essential role in the petrochemical industry. However, a high-purity distillation column has complicated dynamic characteristics such as strong coupling and large time delay. To control the distillation column accurately, we propose an extended generalized predictive control (EGPC) method inspired by the principles of the extended state observer and proportional-integral-type generalized predictive control; the proposed EGPC adaptively compensates online for the effects of coupling and model mismatch, and performs well in controlling time-delay systems. The strong coupling of the distillation column demands fast control, while the large time delay requires soft control. To balance these two requirements, a grey wolf optimizer with reverse learning and adaptive leader-number strategies (RAGWO) is proposed to tune the parameters of EGPC; these strategies give RAGWO a better initial population and improve its exploitation and exploration abilities. Benchmark tests indicate that RAGWO outperforms existing optimizers on most of the selected benchmark functions. Extensive simulations show that the proposed method is superior to other methods for controlling the distillation process in terms of fluctuation and response time.
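The "reverse learning" strategy is commonly realized as opposition-based initialization: sample a population, form the mirror image of each candidate within the bounds, and keep the best of both sets. Whether RAGWO uses exactly this form is an assumption; the numpy sketch below, with toy bounds and objective, only illustrates the general mechanism.

```python
# Sketch of opposition-based ("reverse learning") population initialization.
import numpy as np

def opposition_init(obj, lb, ub, n_pop, rng):
    """Sample a population, form its opposite, and keep the n_pop best points."""
    dim = len(lb)
    pop = rng.uniform(lb, ub, size=(n_pop, dim))
    opposite = lb + ub - pop                     # the "reverse" of each candidate
    union = np.vstack([pop, opposite])
    fitness = np.array([obj(x) for x in union])
    return union[np.argsort(fitness)[:n_pop]]    # elitist selection

rng = np.random.default_rng(1)
lb = np.full(5, -10.0)
ub = np.full(5, 10.0)
init_pop = opposition_init(lambda x: np.sum(x ** 2), lb, ub, 30, rng)
print(init_pop.shape)  # (30, 5)
```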
5
Wang D, Zhai L, Fang J, Li Y, Xu Z. psoResNet: An improved PSO-based residual network search algorithm. Neural Netw 2024;172:106104. [PMID: 38219681] [DOI: 10.1016/j.neunet.2024.106104]
Abstract
Neural Architecture Search (NAS) methods are widely employed to address the time-consuming and costly challenges of manually designing deep convolutional neural networks (DCNNs). Nonetheless, prevailing methods still face several pressing obstacles, including limited network architecture designs, excessively long search times, and insufficient utilization of the search space. In light of these concerns, this study proposes an optimization strategy for residual networks that leverages an enhanced particle swarm optimization (PSO) algorithm. First, a low-complexity residual architecture block is employed as the foundational unit for architecture exploration, enabling a more diverse investigation of network architectures while keeping parameter counts low. Second, we employ a depth-initialization strategy to confine the search space to a reasonable range, thereby avoiding unnecessary particle exploration. Lastly, we present a novel approach for computing particle differences and an updated velocity mechanism to better explore new trajectories, which significantly improves utilization of the search space and increases particle diversity. Moreover, we constructed a crime dataset comprising 13 classes to assess the effectiveness of the proposed algorithm. Experimental results demonstrate that our algorithm can design lightweight networks with superior classification performance on both benchmark datasets and the crime dataset.
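One way to picture a PSO update over a discrete architecture encoding is below: the "difference" between two encodings is taken per position, and each position moves toward the personal or global best with some probability. This is an illustrative interpretation only; the paper's exact difference and velocity definitions are not reproduced, and the block ids, probabilities, and encoding length are assumptions.

```python
# Illustrative discrete PSO-style step over a block-coded architecture.
import random

def pso_step(particle, pbest, gbest, w=0.3, c1=0.3, c2=0.3):
    """particle/pbest/gbest: lists of integer block choices, one per position."""
    new = []
    for x, p, g in zip(particle, pbest, gbest):
        r = random.random()
        if r < c2:
            new.append(g)                        # attracted to the global best
        elif r < c1 + c2:
            new.append(p)                        # attracted to the personal best
        elif r < w + c1 + c2:
            new.append(x)                        # inertia: keep the current block
        else:
            new.append(random.randrange(4))      # random block id keeps diversity
    return new

random.seed(0)
print(pso_step([0, 1, 2, 3], [1, 1, 2, 2], [2, 1, 0, 3]))
```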
Affiliation(s)
- Dianwei Wang
- School of Telecommunication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, PR China.
- Leilei Zhai
- School of Telecommunication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, PR China.
- Jie Fang
- School of Telecommunication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, PR China.
- Yuanqing Li
- School of Telecommunication and Information Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, PR China.
- Zhijie Xu
- School of Computing and Engineering, University of Huddersfield, Huddersfield HD1 3DH, UK.
6
Zhang Y, Jiang H, Tian Y, Ma H, Zhang X. Multigranularity Surrogate Modeling for Evolutionary Multiobjective Optimization With Expensive Constraints. IEEE Trans Neural Netw Learn Syst 2024;35:2956-2968. [PMID: 37527320] [DOI: 10.1109/tnnls.2023.3297624]
Abstract
Multiobjective optimization problems (MOPs) with expensive constraints pose stiff challenges to existing surrogate-assisted evolutionary algorithms (SAEAs) under a very limited computational budget, because the number of expensive constraints in an MOP is often large. Existing SAEAs always approximate constraint functions at a single granularity: either the constraint violation (CV, coarse-grained) or each constraint individually (fine-grained). However, the landscape of CV is often too complex to be accurately approximated by a surrogate model. Although modeling each constraint function may be simpler than modeling CV, approximating all the constraint functions independently may result in large cumulative errors and high computational costs. To address this issue, in this article we develop a multigranularity surrogate-modeling framework for evolutionary algorithms (EAs), where the approximation granularity of the constraint surrogates is adaptively determined by the position of the population in the fitness landscape. Moreover, a dedicated model-management strategy is developed to reduce the impact of errors introduced by the constraint surrogates and to prevent the population from becoming trapped in local optima. To evaluate the performance of the proposed framework, an implementation called K-MGSAEA is proposed, and experimental results on a large number of test problems show that the framework is superior to seven state-of-the-art competitors.
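The two granularities can be made concrete with the standard constraint-violation aggregate, CV(x) = sum_i max(0, g_i(x)) for inequality constraints g_i(x) <= 0. The sketch below contrasts the single coarse-grained surrogate target (the scalar CV) with the fine-grained per-constraint targets; the two toy constraints are assumptions, not the paper's test problems.

```python
# Coarse-grained vs fine-grained surrogate targets for expensive constraints.
import numpy as np

def constraints(x):
    """Two toy expensive inequality constraints; g_i(x) <= 0 means feasible."""
    return np.array([x[0] ** 2 + x[1] ** 2 - 1.0,   # stay inside the unit circle
                     0.5 - x[0]])                   # require x0 >= 0.5

def cv(x):
    """Standard constraint-violation aggregate: sum of positive violations."""
    return float(np.sum(np.maximum(0.0, constraints(x))))

x = np.array([0.2, 0.4])
# Coarse-grained: one surrogate is trained on the scalar CV(x).
print("CV target:", cv(x))                          # 0.3
# Fine-grained: one surrogate per constraint, trained on each g_i(x).
print("per-constraint targets:", constraints(x))    # [-0.8, 0.3]
```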
7
Li N, Ma L, Yu G, Xue B, Zhang M, Jin Y. Survey on Evolutionary Deep Learning: Principles, Algorithms, Applications, and Open Issues. ACM Comput Surv 2024;56:1-34. [DOI: 10.1145/3603704]
Abstract
Over recent years, deep learning (DL) has developed rapidly in both industry and academia. However, finding the optimal hyperparameters of a DL model often incurs high computational cost and demands human expertise. To mitigate this issue, evolutionary computation (EC), a powerful heuristic search approach, has shown significant merit in the automated design of DL models, so-called evolutionary deep learning (EDL). This article analyzes EDL from the perspective of automated machine learning (AutoML). Specifically, we first position EDL relative to DL and EC and regard EDL as an optimization problem. Following the DL pipeline, we systematically introduce EDL methods ranging from data preparation and model generation to model deployment under a new taxonomy (i.e., what and how to evolve/optimize), and focus on discussions of solution representation and search paradigms for handling the optimization problem with EC. Finally, key applications, open issues, and potentially promising lines of future research are suggested. This survey reviews recent developments in EDL and offers insightful guidelines for its further development.
Affiliation(s)
- Nan Li
- Northeastern University, China
- Guo Yu
- Nanjing Tech University, China
- Bing Xue
- Victoria University of Wellington, New Zealand
8
Xu Y, Ma Y. Evolutionary neural architecture search combining multi-branch ConvNet and improved transformer. Sci Rep 2023;13:15791. [PMID: 37737271] [PMCID: PMC10516961] [DOI: 10.1038/s41598-023-42931-3]
Abstract
Deep convolutional neural networks (CNNs) have achieved promising performance in deep learning, but manual design has become very difficult due to the increasingly complex topologies of CNNs. Recently, neural architecture search (NAS) methods have been proposed to automatically design network architectures that are superior to handcrafted counterparts. Unfortunately, most current NAS methods suffer either from the high computational complexity of the generated architectures or from limited flexibility in architecture design. To address these issues, this article proposes an evolutionary neural architecture search (ENAS) method based on an improved Transformer and a multi-branch ConvNet. The multi-branch block enriches the feature space and enhances the representational capacity of a network by combining paths with different complexities. Since convolution is inherently a local operation, a simple yet powerful "batch-free normalization Transformer Block" (BFNTBlock) is proposed to leverage both local information and long-range feature dependencies. In particular, mixing batch-free normalization (BFN) and batch normalization (BN) in the BFNTBlock blocks the accumulation of estimation shift attributable to stacked BN layers, which benefits performance. The proposed method achieves remarkable accuracies of 97.24% and 80.06% on CIFAR10 and CIFAR100, respectively, with high computational efficiency, i.e., only 1.46 and 1.53 GPU days. To validate its universality in application scenarios, the proposed algorithm is verified on two real-world applications, the GTSRB and NEU-CLS datasets, and achieves better performance than common methods.
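A hedged PyTorch sketch of the mixed-normalization idea follows. "Batch-free normalization" is approximated here by GroupNorm with one group (per-sample statistics), which is an assumption; the paper's exact BFNTBlock design, dimensions, and placement of norms are not reproduced.

```python
# Sketch: mixing a batch-free norm with BN inside a transformer-style block.
import torch
import torch.nn as nn

class BFNTBlockSketch(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.bfn = nn.GroupNorm(1, dim)              # batch-free: stats per sample
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.bn = nn.BatchNorm1d(dim)                # batch statistics
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):                            # x: (batch, tokens, dim)
        # Batch-free norm before attention avoids stacking BN estimation shift.
        h = self.bfn(x.transpose(1, 2)).transpose(1, 2)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # A single BN before the MLP keeps some batch-level regularization.
        h = self.bn(x.transpose(1, 2)).transpose(1, 2)
        return x + self.mlp(h)

out = BFNTBlockSketch()(torch.randn(2, 49, 64))
print(out.shape)  # torch.Size([2, 49, 64])
```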
Affiliation(s)
- Yang Xu
- College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China.
- Yongjie Ma
- College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China.
9
Abstract
For the goal of automated design of high-performance deep convolutional neural networks (CNNs), neural architecture search (NAS) methodology is becoming increasingly important for both academia and industry. Because performance evaluation requires costly stochastic-gradient-descent training of CNNs, most existing NAS methods are computationally expensive for real-world deployment. To address this issue, we first introduce a new performance estimation metric, named random-weight evaluation (RWE), to quantify the quality of CNNs in a cost-efficient manner. Instead of fully training the entire CNN, RWE trains only its last layer and leaves the remainder with randomly initialized weights, which reduces a single network evaluation to seconds. Second, a complexity metric is adopted for multi-objective NAS to balance model size and performance. Overall, the proposed method obtains a set of efficient models with state-of-the-art performance in two real-world search spaces. The results obtained on the CIFAR-10 dataset are then transferred to the ImageNet dataset to validate the practicality of the proposed algorithm. Moreover, ablation studies on the NAS-Bench-301 dataset reveal the effectiveness of the proposed RWE in estimating performance compared to existing methods.
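RWE as described above is concrete enough to sketch directly: freeze a randomly initialized backbone, train only the final classifier layer, and use the resulting accuracy as a cheap performance estimate. In the minimal sketch below the backbone, data, and training budget are toy stand-ins, not the paper's setup.

```python
# Sketch of random-weight evaluation (RWE): train only the last layer.
import torch
import torch.nn as nn

def rwe_score(backbone, head, loader, epochs=1):
    """Train only `head` on top of a frozen, randomly initialized `backbone`."""
    for p in backbone.parameters():
        p.requires_grad = False              # backbone keeps its random weights
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(head(backbone(x)), y)
            loss.backward()
            opt.step()
    # Score = accuracy of the cheaply trained head (evaluated on the same toy
    # data for brevity; a held-out split would be used in practice).
    with torch.no_grad():
        correct = sum((head(backbone(x)).argmax(1) == y).sum().item() for x, y in loader)
        total = sum(len(y) for _, y in loader)
    return correct / total

# Toy usage with random data standing in for a real dataset.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32, 64), nn.ReLU())
head = nn.Linear(64, 10)
loader = [(torch.randn(16, 32), torch.randint(0, 10, (16,))) for _ in range(4)]
print("RWE score:", rwe_score(backbone, head, loader))
```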