1. Sum J, Leung CS. Regularization Effect of Random Node Fault/Noise on Gradient Descent Learning Algorithm. IEEE Transactions on Neural Networks and Learning Systems 2023;34:2619-2632. PMID: 34487503. DOI: 10.1109/tnnls.2021.3107051.
Abstract
For decades, adding fault/noise during training by gradient descent has been a technique for making a neural network (NN) tolerant to persistent fault/noise or for improving its generalization. In recent years, this technique has been readvocated in deep learning to avoid overfitting. Yet, the objective function of such fault/noise injection learning has been misinterpreted as the desired measure (i.e., the expected mean squared error (MSE) of the training samples) of the NN with the same fault/noise. The aims of this article are 1) to clarify the above misconception and 2) to investigate the actual regularization effect of adding node fault/noise when training by gradient descent. Based on previous works on adding fault/noise during training, we conjecture why the misconception arose. We then show that the learning objective of adding random node fault during gradient descent learning (GDL) for a multilayer perceptron (MLP) is identical to the desired measure of the MLP with the same fault. If additive (resp. multiplicative) node noise is added during GDL for an MLP, the learning objective is not identical to the desired measure of the MLP with such noise. For radial basis function (RBF) networks, it is shown that the learning objective is identical to the corresponding desired measure for all three fault/noise conditions. Empirical evidence is presented to support the theoretical results and, hence, to clarify the misconception: the objective function of fault/noise injection learning might not be interpretable as the desired measure of the NN with the same fault/noise. Afterward, the regularization effect of adding node fault/noise during training is revealed for the case of RBF networks. Notably, it is shown that the regularization effect of adding additive or multiplicative node noise (MNN) during the training of an RBF network is a reduction of network complexity. When dropout regularization is applied to an RBF network, its effect is the same as adding MNN during training.
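To make the injection scheme concrete, here is a minimal sketch (not the authors' code) of one gradient step for an RBF network whose hidden outputs are corrupted by multiplicative node noise during training; the Gaussian noise model, the noise level sigma, and all function names are illustrative assumptions.

```python
import numpy as np

def rbf_hidden(X, centers, width):
    # Gaussian RBF activations: phi_j(x) = exp(-||x - c_j||^2 / (2 * width^2)).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def gd_step_with_mnn(X, y, centers, width, w, lr=0.01, sigma=0.3, rng=None):
    rng = rng or np.random.default_rng()
    phi = rbf_hidden(X, centers, width)
    # Multiplicative node noise: each hidden output is scaled by (1 + noise),
    # freshly sampled at every step, mimicking dropout-like corruption.
    phi_noisy = phi * (1.0 + sigma * rng.standard_normal(phi.shape))
    err = phi_noisy @ w - y
    grad = phi_noisy.T @ err / len(y)  # gradient of the sample MSE w.r.t. w
    return w - lr * grad
```

Replacing the multiplier 1 + noise with a Bernoulli mask gives the dropout variant that the abstract relates to MNN.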
2. Wong HT, Leung CS, Kwong S. Convergence analysis on the deterministic mini-batch learning algorithm for noise resilient radial basis function networks. International Journal of Machine Learning and Cybernetics 2022. DOI: 10.1007/s13042-022-01550-6.
3. Wang X, Wang J, Zhang K, Lin F, Chang Q. Convergence and objective functions of noise-injected multilayer perceptrons with hidden multipliers. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.03.119.
4. Zhang H, Zhang Y, Zhu S, Xu D. Deterministic convergence of complex mini-batch gradient learning algorithm for fully complex-valued neural networks. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.04.114.
5. Sum J, Leung CS, Ho K. A Limitation of Gradient Descent Learning. IEEE Transactions on Neural Networks and Learning Systems 2020;31:2227-2232. PMID: 31398136. DOI: 10.1109/tnnls.2019.2927689.
Abstract
For decades, gradient descent has been applied to develop learning algorithms for training neural networks (NNs). In this brief, a limitation of applying such algorithms to train an NN with persistent weight noise is revealed. Let V(w) be the performance measure of an ideal NN; V(w) is applied to develop the gradient descent learning (GDL) algorithm. With weight noise, the desired performance measure, denoted J(w), is E[V(w̃)|w], where w̃ is the noisy weight vector. When GDL is applied to train an NN with weight noise, the actual learning objective is clearly not V(w) but another scalar function L(w). For decades, there has been a misconception that L(w) = J(w) and, hence, that the model attained by GDL is the desired model. However, we show that this might not hold: 1) with persistent additive weight noise, the model attained is the desired model, as L(w) = J(w); and 2) with persistent multiplicative weight noise, the model attained is unlikely to be the desired model, as L(w) ≠ J(w). Accordingly, the properties of the attained models are analyzed in comparison with the desired models, and the learning curves are sketched. Simulation results on 1) a simple regression problem and 2) MNIST handwritten digit recognition are presented to support our claims.
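One way to see why the additive case aligns while the multiplicative case does not (a sketch under the abstract's definitions, assuming zero-mean noise b independent of w; not the paper's full derivation): with additive noise, differentiation and expectation commute directly, whereas multiplicative noise introduces an extra chain-rule factor.

\[
\begin{aligned}
&\text{Additive: } \tilde{w} = w + b, &&\mathbb{E}\bigl[\nabla_{\tilde{w}} V(\tilde{w}) \mid w\bigr] = \nabla_w\, \mathbb{E}\bigl[V(w+b) \mid w\bigr] = \nabla_w J(w);\\
&\text{Multiplicative: } \tilde{w} = w \odot (1+b), &&\nabla_w J(w) = \mathbb{E}\bigl[(1+b) \odot \nabla_{\tilde{w}} V(\tilde{w}) \mid w\bigr] \neq \mathbb{E}\bigl[\nabla_{\tilde{w}} V(\tilde{w}) \mid w\bigr].
\end{aligned}
\]

The right-hand expectations are what noise-injected gradient descent actually averages over, i.e., the gradient of L(w); only in the additive case does this coincide with the gradient of J(w).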
6. Xiao S, Zhang Y, Zhang B. ℓ1-gain filter design of discrete-time positive neural networks with mixed delays. Neural Networks 2020;122:152-162. PMID: 31683143. DOI: 10.1016/j.neunet.2019.10.004.
Abstract
This paper focuses on filter design with ℓ1-gain disturbance attenuation performance for a class of discrete-time positive neural networks. Discrete and distributed time-varying delays occurring in neuron transmission are taken into account. In particular, the probabilistic distribution of the distributed delays is described by a Bernoulli random process in the system model. First, criteria for the positiveness and the unique equilibrium of discrete-time neural networks are presented. Second, through a linear Lyapunov method, sufficient conditions for globally asymptotic stability with ℓ1-gain disturbance attenuation performance of positive neural networks are proposed. Third, using these results, criteria for ℓ1-gain stability of the established filtering error system are presented, based on which a linear programming (LP) approach is put forward to design the desired positive filter. Finally, two application examples, a water distribution network and a genetic regulatory network, are given to demonstrate the effectiveness and applicability of the derived results.
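For reference, the ℓ1-gain disturbance attenuation condition for positive systems is typically stated as follows (a standard form of the definition, not quoted from the paper): under zero initial conditions, the filtering error e must satisfy, for a prescribed level γ > 0 and every disturbance d ∈ ℓ1,

\[
\sum_{k=0}^{\infty} \lVert e(k) \rVert_1 \;\le\; \gamma \sum_{k=0}^{\infty} \lVert d(k) \rVert_1 .
\]

The 1-norm is the natural gain measure here because the states of a positive system are componentwise nonnegative, which is also what makes LP-based filter synthesis possible.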
Affiliation(s)
- Shunyuan Xiao, Yijun Zhang, Baoyong Zhang: School of Automation, Nanjing University of Science and Technology, Nanjing 210094, PR China.
7. Wang J, Chang Q, Chang Q, Liu Y, Pal NR. Weight Noise Injection-Based MLPs With Group Lasso Penalty: Asymptotic Convergence and Application to Node Pruning. IEEE Transactions on Cybernetics 2019;49:4346-4364. PMID: 30530381. DOI: 10.1109/tcyb.2018.2864142.
Abstract
The application and theoretical analysis of fault-tolerant learning are important for neural networks. Our objective here is to realize fault-tolerant sparse multilayer perceptron (MLP) networks. The stochastic gradient descent method is employed to perform online learning for MLPs. For weight noise injection-based network models, a common strategy is to add a weight decay regularizer while constructing the objective function for learning. However, this ℓ2-norm penalty does not generate sparse optimal solutions. In this paper, a group lasso penalty term is used as the regularizer, where a group is defined as the set of weights connecting a node to the nodes in the preceding layer. The group lasso penalty enables us to prune redundant hidden nodes. Due to its nondifferentiability at the origin, a smooth approximation of the group lasso penalty is developed. Then, a rigorous proof of the asymptotic convergence of the learning algorithm is provided. Finally, simulations are performed to verify the sparseness of the network and the theoretical results.
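A common way to smooth the group lasso penalty (a sketch of the general technique; the paper's exact smoother, the constant eps, and the grouping-by-columns convention are assumptions here) is to replace each group norm ||w_g|| with sqrt(||w_g||² + eps²), which is differentiable everywhere, including the origin:

```python
import numpy as np

def smoothed_group_lasso(W, lam=1e-3, eps=1e-4):
    # One group per hidden node: the column of weights feeding into it.
    # sqrt(||w_g||^2 + eps^2) smooths the kink of ||w_g|| at the origin.
    group_norms = np.sqrt((W ** 2).sum(axis=0) + eps ** 2)
    penalty = lam * group_norms.sum()
    grad = lam * W / group_norms[None, :]  # d(penalty)/dW, well defined everywhere
    return penalty, grad
```

Adding grad to the backpropagated error gradient drives whole columns toward zero, and a column whose norm collapses marks a hidden node that can be pruned.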
8. Sum J, Leung CS. Learning Algorithm for Boltzmann Machines With Additive Weight and Bias Noise. IEEE Transactions on Neural Networks and Learning Systems 2019;30:3200-3204. PMID: 30668482. DOI: 10.1109/tnnls.2018.2889072.
Abstract
This brief presents analytical results on the effect of additive weight/bias noise on a Boltzmann machine (BM) in which the unit output is in {-1, 1} instead of {0, 1}. With such noise, the state distribution is found to be yet another Boltzmann distribution, but with an elevated temperature factor. Thus, the desired gradient ascent learning algorithm is derived and the corresponding learning procedure is developed. This learning procedure is then compared with the procedure applied to train a BM with noise, and the two procedures are found to be identical. Therefore, the learning algorithm for noise-free BMs is suitable for implementation as an online learning algorithm for an analog circuit-implemented BM, even if the variances of the additive weight noise and bias noise are unknown.
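To make the "elevated temperature" statement concrete (an illustrative form consistent with the abstract; the exact mapping from noise variance to temperature is not reproduced here): for units s ∈ {-1, 1}ⁿ with energy E(s), the noise-free stationary distribution is Boltzmann, and the abstract's result says additive weight/bias noise preserves this form while raising the temperature,

\[
p(s) = \frac{\exp\!\bigl(-E(s)/T\bigr)}{\sum_{s'} \exp\!\bigl(-E(s')/T\bigr)},
\qquad
E(s) = -\tfrac{1}{2}\, s^{\top} W s - b^{\top} s,
\qquad
T \;\longrightarrow\; T' > T \ \text{under additive noise.}
\]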
9. Wang J, Xu C, Yang X, Zurada JM. A Novel Pruning Algorithm for Smoothing Feedforward Neural Networks Based on Group Lasso Method. IEEE Transactions on Neural Networks and Learning Systems 2018;29:2012-2024. PMID: 28961129. DOI: 10.1109/tnnls.2017.2748585.
Abstract
In this paper, we propose four new variants of the backpropagation algorithm to improve the generalization ability of feedforward neural networks. The basic idea of these methods stems from the Group Lasso concept, which addresses the variable selection problem at the group level. There are two main drawbacks when the Group Lasso penalty is employed directly during network training: numerical oscillations and the theoretical difficulty of computing the gradient at the origin. To overcome these obstacles, smoothing functions are introduced to approximate the Group Lasso penalty. Numerical experiments on classification and regression problems demonstrate that the proposed algorithms outperform three classical penalization methods, Weight Decay, Weight Elimination, and Approximate Smoother, in both generalization and pruning efficiency. In addition, detailed simulations on a specific data set compare the method with other common pruning strategies and verify its advantages. The pruning ability of the proposed strategy is investigated in detail on a relatively large data set, MNIST, for various smoothing approximation cases.
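Once training with a smoothed group penalty has driven some groups toward zero, pruning itself reduces to thresholding the group norms; a minimal sketch for one hidden layer (the tolerance value and the weight-matrix layout are assumptions for illustration):

```python
import numpy as np

def prune_hidden_nodes(W_in, W_out, tol=1e-3):
    # Keep hidden nodes whose incoming-weight group norm exceeds tol;
    # the matching columns of W_in and rows of W_out are dropped together.
    keep = np.linalg.norm(W_in, axis=0) > tol
    return W_in[:, keep], W_out[keep, :]
```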
11. Wang J, Cai Q, Chang Q, Zurada JM. Convergence analyses on sparse feedforward neural networks via group lasso regularization. Information Sciences 2017. DOI: 10.1016/j.ins.2016.11.020.
12. Yeung DS, Li JC, Ng WWY, Chan PPK. MLPNN Training via a Multiobjective Optimization of Training Error and Stochastic Sensitivity. IEEE Transactions on Neural Networks and Learning Systems 2016;27:978-992. PMID: 26054075. DOI: 10.1109/tnnls.2015.2431251.
Abstract
The training of a multilayer perceptron neural network (MLPNN) concerns the selection of its architecture and connection weights via the minimization of both the training error and a penalty term. Different penalty terms have been proposed to control the smoothness of the MLPNN for better generalization capability. However, controlling smoothness using, for instance, the norm of the weights or the Vapnik-Chervonenkis dimension cannot distinguish between individual MLPNNs with the same number of free parameters or the same norm. In this paper, to enhance generalization capability, we propose a stochastic sensitivity measure (ST-SM) that realizes a new penalty term for MLPNN training. The ST-SM is the expectation of the squared output difference between the training samples and unseen samples located within their Q-neighborhoods for a given MLPNN. It provides a direct measurement of the MLPNN's output fluctuations, i.e., its smoothness. We adopt a two-phase Pareto-based multiobjective training algorithm that minimizes the training error and the ST-SM as the two objectives. Experiments on 20 UCI data sets show that MLPNNs trained by the proposed algorithm yield better accuracies on testing data than several recent and classical MLPNN training methods.
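The ST-SM lends itself to a straightforward Monte Carlo estimate: sample perturbations inside each training point's Q-neighborhood and average the squared output change. A sketch under assumed choices (uniform perturbations of radius q and a generic predict function; the paper's neighborhood definition may differ):

```python
import numpy as np

def stochastic_sensitivity(predict, X, q=0.1, n_samples=20, rng=None):
    # Estimate E[(f(x + dx) - f(x))^2] with dx uniform in [-q, q]^d.
    rng = rng or np.random.default_rng()
    base = predict(X)
    total = 0.0
    for _ in range(n_samples):
        dx = rng.uniform(-q, q, size=X.shape)
        total += ((predict(X + dx) - base) ** 2).mean()
    return total / n_samples
```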
13. Zhang H, Zhang Y, Xu D, Liu X. Deterministic convergence of chaos injection-based gradient method for training feedforward neural networks. Cognitive Neurodynamics 2015;9:331-340. PMID: 25972981. DOI: 10.1007/s11571-014-9323-z.
Abstract
It has been shown that, by adding a chaotic sequence to the weight update during the training of neural networks, the chaos injection-based gradient method (CIBGM) is superior to the standard backpropagation algorithm. This paper presents a theoretical convergence analysis of CIBGM for training feedforward neural networks. We consider both batch learning and online learning. Under mild conditions, we prove weak convergence, i.e., the training error tends to a constant and the gradient of the error function tends to zero. Moreover, strong convergence of CIBGM is obtained under an extra condition. The theoretical results are substantiated by a simulation example.
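A minimal sketch of one chaos-injected weight update in the spirit of CIBGM (the logistic-map generator, the decaying injection strength alpha/t, and the centering are illustrative assumptions; the paper's exact scheme may differ):

```python
import numpy as np

def logistic_map(x):
    # Classic chaotic sequence generator on (0, 1).
    return 4.0 * x * (1.0 - x)

def chaos_injected_step(w, grad, x_chaos, t, lr=0.05, alpha=0.01):
    # Gradient step plus a small chaotic perturbation whose strength
    # decays with the iteration count t, which is what permits convergence.
    x_chaos = logistic_map(x_chaos)
    w_new = w - lr * grad + (alpha / t) * (x_chaos - 0.5) * np.ones_like(w)
    return w_new, x_chaos
```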
Affiliation(s)
- Huisheng Zhang: Department of Mathematics, Dalian Maritime University, Dalian 116026, PR China; Research Center of Information and Control, Dalian University of Technology, Dalian 116024, PR China.
- Ying Zhang: Department of Mathematics, Dalian Maritime University, Dalian 116026, PR China.
- Dongpo Xu: College of Science, Harbin Engineering University, Harbin 150001, PR China.
- Xiaodong Liu: Research Center of Information and Control, Dalian University of Technology, Dalian 116024, PR China.
14. Zhang H, Tang Y, Liu X. Batch gradient training method with smoothing ℓ0 regularization for feedforward neural networks. Neural Computing and Applications 2014. DOI: 10.1007/s00521-014-1730-x.