2. Wang J, Chang Q, Chang Q, Liu Y, Pal NR. Weight Noise Injection-Based MLPs With Group Lasso Penalty: Asymptotic Convergence and Application to Node Pruning. IEEE Transactions on Cybernetics 2019;49:4346-4364. [PMID: 30530381] [DOI: 10.1109/tcyb.2018.2864142]
Abstract
The application and theoretical analysis of fault tolerant learning are very important for neural networks. Our objective here is to realize fault tolerant sparse multilayer perceptron (MLP) networks. The stochastic gradient descent method has been employed to perform online learning for MLPs. For weight noise injection-based network models, it is a common strategy to add a weight decay regularizer while constructing the objective function for learning. However, this l2-norm penalty does not generate sparse optimal solutions. In this paper, a group lasso penalty term is used as the regularizer, where a group is defined as the set of weights connecting a node to the nodes in the preceding layer. The group lasso penalty enables us to prune redundant hidden nodes. Because it is nondifferentiable at the origin, a smooth approximation of the group lasso penalty is developed. A rigorous proof of the asymptotic convergence of the learning algorithm is then provided. Finally, simulations are performed to verify the sparseness of the network and the theoretical results.
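As a concrete illustration of the penalty described above, the sketch below computes a smoothed group lasso term and its gradient for one MLP layer; the column-wise grouping and the smoothing constant eps are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def smoothed_group_lasso(W, eps=1e-4):
    """Smoothed group lasso penalty for one MLP layer.

    W: (n_in, n_hidden) weight matrix; each column is the group of
    weights feeding one hidden node from the preceding layer.
    Replaces the nondifferentiable ||w_g||_2 with
    sqrt(||w_g||_2^2 + eps^2), which is smooth at the origin.
    """
    group_norms = np.sqrt(np.sum(W**2, axis=0) + eps**2)
    penalty = np.sum(group_norms)
    # Gradient of the smoothed penalty w.r.t. W, added to the SGD step.
    grad = W / group_norms  # broadcasts over columns (one norm per node)
    return penalty, grad
```

Columns whose smoothed norm shrinks toward eps flag hidden nodes that can be pruned after training.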
3. Xiao Y, Feng RB, Leung CS, Sum J. Objective Function and Learning Algorithm for the General Node Fault Situation. IEEE Transactions on Neural Networks and Learning Systems 2016;27:863-874. [PMID: 26990391] [DOI: 10.1109/tnnls.2015.2427331]
Abstract
Fault tolerance is an interesting property of artificial neural networks. However, the existing fault models describe only limited node fault situations, such as stuck-at-zero and stuck-at-one; there is no general model that covers a large class of node fault situations. This paper studies the performance of faulty radial basis function (RBF) networks under the general node fault situation. We first propose a general node fault model that describes a large class of node faults, including stuck-at-zero, stuck-at-one, and stuck-at levels drawn from an arbitrary distribution. We then derive an expression for the performance of faulty RBF networks, from which an objective function is identified. With this objective function, a training algorithm for the general node fault situation is developed. Finally, a mean prediction error (MPE) formula that estimates the test set error of faulty networks is derived, and its application to the selection of the basis width is elucidated. Simulation experiments demonstrate the effectiveness of the proposed method.
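A minimal simulation of the general node fault model is sketched below, assuming a Gaussian RBF layer and a uniform stuck-at distribution purely for illustration (stuck-at-zero and stuck-at-one are special cases).

```python
import numpy as np

rng = np.random.default_rng(0)

def faulty_rbf_output(x, centers, widths, weights, p_fault=0.1,
                      stuck_sampler=lambda rng, n: rng.uniform(-1, 1, n)):
    """RBF network output under a general stuck-at node fault model.

    Each hidden node independently fails with probability p_fault;
    a failed node's output is replaced by a stuck-at level drawn
    from an arbitrary distribution (here uniform on [-1, 1]).
    """
    h = np.exp(-np.sum((x - centers)**2, axis=1) / (2 * widths**2))
    faulty = rng.random(h.shape[0]) < p_fault
    h[faulty] = stuck_sampler(rng, faulty.sum())  # inject stuck-at levels
    return weights @ h
```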
4. Han Z, Feng RB, Wan WY, Leung CS. Online training and its convergence for faulty networks with multiplicative weight noise. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.12.049]
5. Xiao Y, Feng R, Leung CS, Sum PF. Online Training for Open Faulty RBF Networks. Neural Processing Letters 2014. [DOI: 10.1007/s11063-014-9363-8]
6. Feng R, Xiao Y, Leung CS, Tsang PWM, Sum J. An Improved Fault-Tolerant Objective Function and Learning Algorithm for Training the Radial Basis Function Neural Network. Cognitive Computation 2013. [DOI: 10.1007/s12559-013-9236-x]
7. Sum J, Leung CS, Ho K. Convergence analyses on on-line weight noise injection-based training algorithms for MLPs. IEEE Transactions on Neural Networks and Learning Systems 2012;23:1827-1840. [PMID: 24808076] [DOI: 10.1109/tnnls.2012.2210243]
Abstract
Injecting weight noise during training is a simple technique that has been proposed for almost two decades. However, little is known about its convergence behavior. This paper studies the convergence of two weight noise injection-based training algorithms: multiplicative weight noise injection with weight decay, and additive weight noise injection with weight decay. We consider their application to multilayer perceptrons with either linear or sigmoid output nodes. Let w(t) be the weight vector, let V(w) be the corresponding objective function of the training algorithm, let α > 0 be the weight decay constant, and let μ(t) be the step size. We show that if μ(t) → 0, then with probability one E[||w(t)||₂²] is bounded and lim_{t→∞} ||w(t)||₂ exists. Based on these two properties, we show that if μ(t) → 0, Σ_t μ(t) = ∞, and Σ_t μ(t)² < ∞, then with probability one these algorithms converge. Moreover, w(t) converges with probability one to a point at which ∇_w V(w) = 0.
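The step-size conditions above are the familiar Robbins-Monro conditions; for instance, μ(t) = μ₀/t satisfies all three. A minimal sketch of the two training algorithms follows, with the gradient oracle, noise level, and schedule as illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_with_weight_noise(grad_fn, w, alpha=1e-3, sigma=0.05,
                            multiplicative=True, n_steps=10_000, mu0=0.1):
    """Online SGD with weight noise injection and weight decay.

    At step t the gradient is evaluated at a noise-perturbed copy of
    the weights, and a weight decay term alpha*w is added.  The step
    size mu(t) = mu0/t satisfies mu(t) -> 0, sum mu(t) = inf, and
    sum mu(t)^2 < inf, the conditions used in the convergence result.
    """
    for t in range(1, n_steps + 1):
        noise = sigma * rng.standard_normal(w.shape)
        w_noisy = w * (1.0 + noise) if multiplicative else w + noise
        mu = mu0 / t
        w = w - mu * (grad_fn(w_noisy) + alpha * w)
    return w
```

Here grad_fn stands for the per-sample gradient of the MLP training error and is left abstract.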
8. Leung CS, Sum PF, Liu Y. Optimization of tuning parameters for open node fault regularizer. Neurocomputing 2012. [DOI: 10.1016/j.neucom.2012.03.010]
9. Leung CS, Sum JPF. RBF networks under the concurrent fault situation. IEEE Transactions on Neural Networks and Learning Systems 2012;23:1148-1155. [PMID: 24807140] [DOI: 10.1109/tnnls.2012.2196054]
Abstract
Fault tolerance is an interesting topic in neural networks. However, many existing results on this topic address only a single fault source, whereas a trained network may be affected by multiple fault sources. This brief studies the performance of faulty radial basis function (RBF) networks that suffer from multiplicative weight noise and open weight fault concurrently. We derive a mean prediction error (MPE) formula to estimate the generalization ability of faulty networks. The MPE formula provides a way to understand the generalization ability of faulty networks without using a test set or generating a large number of potential faulty networks. Based on the MPE result, we propose methods to optimize the regularization parameter as well as the RBF width.
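The paper's MPE formula is analytic; since its derivation is not reproduced in the abstract, the sketch below instead estimates the same quantity by Monte Carlo over sampled faulty networks, which is precisely the expensive procedure the formula is designed to avoid. All parameter names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_prediction_error(H, y, w, sigma_b=0.05, p_open=0.05, n_trials=1000):
    """Monte Carlo estimate of the error of concurrently faulty RBF nets.

    H: (N, M) hidden node output matrix, y: (N,) targets, w: (M,)
    trained output weights.  Each trial applies multiplicative weight
    noise (variance sigma_b^2) and open weight fault (each weight
    disconnected with probability p_open) at the same time.
    """
    errs = []
    for _ in range(n_trials):
        noise = 1.0 + sigma_b * rng.standard_normal(w.shape)
        mask = rng.random(w.shape) >= p_open  # surviving connections
        w_faulty = w * noise * mask
        errs.append(np.mean((H @ w_faulty - y)**2))
    return np.mean(errs)
```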
10. Leung ACS, Xiao Y, Xu Y, Wong KW. Decouple implementation of weight decay for recursive least square. Neural Computing and Applications 2012. [DOI: 10.1007/s00521-012-0832-6]
11. Sum JPF, Leung CS, Ho KIJ. On-line node fault injection training algorithm for MLP networks: objective function and convergence analysis. IEEE Transactions on Neural Networks and Learning Systems 2012;23:211-222. [PMID: 24808501] [DOI: 10.1109/tnnls.2011.2178477]
Abstract
Improving the fault tolerance of a neural network has been studied for more than two decades, and various training algorithms have since been proposed. The on-line node fault injection-based algorithm is one of them: hidden nodes randomly output zeros during training. While the idea is simple, theoretical analyses of this algorithm are far from complete. This paper presents its objective function and a convergence proof. We consider three cases for multilayer perceptrons (MLPs): (1) MLPs with a single linear output node; (2) MLPs with multiple linear output nodes; and (3) MLPs with a single sigmoid output node. For the convergence proof, we show that the algorithm converges with probability one. For the objective function, we show that the objective functions of cases (1) and (2) have the same form: both consist of a mean square error term, a regularizer term, and a weight decay term. For case (3), the objective function is slightly different from that of cases (1) and (2). With the objective functions derived, we can compare the similarities and differences among various algorithms and various cases.
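Mechanically, the fault injection resembles dropout: each hidden node's output is zeroed with some probability during training. A minimal sketch for case (1), a single linear output node, with the fault rate as an assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

def forward_with_node_faults(x, W1, b1, W2, b2, p_fault=0.1):
    """One forward pass of an MLP with on-line node fault injection.

    During training, each hidden node's output is independently set
    to zero with probability p_fault (the stuck-at-zero fault of the
    abstract); the output node here is linear, i.e., case (1).
    """
    h = np.tanh(W1 @ x + b1)
    h *= rng.random(h.shape) >= p_fault  # randomly silence hidden nodes
    return W2 @ h + b2
```

Training on many such faulty forward passes is what gives rise to the regularizer and weight decay terms identified in the derived objective functions.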
12. Ho K, Leung CS, Sum J. Objective functions of online weight noise injection training algorithms for MLPs. IEEE Transactions on Neural Networks 2011;22:317-323. [PMID: 21189237] [DOI: 10.1109/tnn.2010.2095881]
Abstract
Injecting weight noise during training has been a simple strategy for improving the fault tolerance of multilayer perceptrons (MLPs) for almost two decades, and several online training algorithms have been proposed in this regard. However, there are some misconceptions about the objective functions being minimized by these algorithms. Some existing results wrongly take the prediction error of a trained MLP affected by weight noise to be the objective function of the corresponding weight noise injection algorithm. In this brief, we clarify these misconceptions. Two weight noise injection scenarios are considered: one based on additive weight noise injection and the other based on multiplicative weight noise injection. To avoid the misconceptions, we use their mean updating equations to analyze the objective functions. For injecting additive weight noise during training, we show that the true objective function is identical to the prediction error of a faulty MLP whose weights are affected by additive weight noise; it consists of the conventional mean square error and a smoothing regularizer. For injecting multiplicative weight noise during training, we show that the objective function is different from the prediction error of a faulty MLP whose weights are affected by multiplicative weight noise. With these results, some existing misconceptions regarding MLP training with weight noise injection can be resolved.
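For the additive case, the claimed form (mean square error plus a smoothing regularizer) can be written schematically as follows under a standard small-noise expansion; the exact regularizer derived in the brief may differ in its constants and higher-order terms.

```latex
% Schematic objective for additive weight noise injection with
% noise variance \sigma_b^2; f(x_k, w) is the MLP output.
V(w) \;\approx\;
\underbrace{\frac{1}{N}\sum_{k=1}^{N}\bigl(y_k - f(x_k, w)\bigr)^2}_{\text{mean square error}}
\;+\;
\underbrace{\sigma_b^{2}\,\frac{1}{N}\sum_{k=1}^{N}\bigl\|\nabla_w f(x_k, w)\bigr\|_2^{2}}_{\text{smoothing regularizer}}
```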
Affiliation(s)
- Kevin Ho
- Department of Computer Science and Communication Engineering, Providence University, Taichung 43301, Taiwan.
13. Leung CS, Wang HJ, Sum J. On the selection of weight decay parameter for faulty networks. IEEE Transactions on Neural Networks 2010;21:1232-1244. [PMID: 20682468] [DOI: 10.1109/tnn.2010.2049580]
Abstract
The weight-decay technique is an effective approach to handle overfitting and weight fault. For fault-free networks, without an appropriate value of the decay parameter, the trained network is either overfitted or underfitted. However, many existing results on the selection of the decay parameter consider fault-free networks only. It is well known that the weight-decay method can also suppress the effect of weight fault, but for the faulty case, using a test set to select the decay parameter is not practical because there is a huge number of possible faulty networks for a trained network. This paper develops two mean prediction error (MPE) formulae for predicting the performance of faulty radial basis function (RBF) networks. Two fault models, multiplicative weight noise and open weight fault, are considered. Our MPE formulae involve the training error and the trained weights only; we do not need to generate a large number of faulty networks and measure their test errors. The MPE formulae allow us to select appropriate values of the decay parameter for faulty networks. Our experiments show that, although there are small differences between the true test errors (from the test set) and the MPE values, the MPE formulae can accurately locate the value of the decay parameter that minimizes the true test error of faulty networks.
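The selection procedure can be sketched as a sweep over candidate decay parameters, training the RBF output weights by regularized least squares and scoring each candidate with an MPE estimate. Here mpe_fn is a hypothetical stand-in for the paper's closed-form formulae, which use only the training error and the trained weights.

```python
import numpy as np

def select_decay_parameter(H, y, lambdas, mpe_fn):
    """Pick the weight decay parameter for a faulty RBF network.

    H: (N, M) hidden node output matrix, y: (N,) targets.  For each
    candidate lambda, train by regularized least squares, then score
    with an MPE estimate instead of a test set or sampled faults.
    """
    best = None
    for lmbda in lambdas:
        M = H.shape[1]
        w = np.linalg.solve(H.T @ H + lmbda * np.eye(M), H.T @ y)
        train_err = np.mean((H @ w - y)**2)
        score = mpe_fn(lmbda, w, train_err)  # hypothetical MPE oracle
        if best is None or score < best[0]:
            best = (score, lmbda, w)
    return best[1], best[2]
```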
Affiliation(s)
- Chi Sing Leung
- Department of Electronic Engineering, City University of Hong Kong, Kowloon 852, Hong Kong.
14. Ho KIJ, Leung CS, Sum J. Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks. IEEE Transactions on Neural Networks 2010;21:938-947. [PMID: 20388593] [DOI: 10.1109/tnn.2010.2046179]
Abstract
In the last two decades, many online fault/noise injection algorithms have been developed to attain a fault tolerant neural network. However, little theoretical work on their convergence and objective functions has been reported. This paper studies six common fault/noise-injection-based online learning algorithms for radial basis function (RBF) networks, namely 1) injecting additive input noise, 2) injecting additive/multiplicative weight noise, 3) injecting multiplicative node noise, 4) injecting multiweight fault (random disconnection of weights), 5) injecting multinode fault during training, and 6) weight decay with injecting multinode fault. Based on the Gladyshev theorem, we show that these six online algorithms converge almost surely. Moreover, the true objective functions being minimized are derived. For injecting additive input noise during training, the objective function is identical to that of the Tikhonov regularizer approach. For injecting additive/multiplicative weight noise during training, the objective function is the simple mean square training error; thus, injecting additive/multiplicative weight noise during training cannot improve the fault tolerance of an RBF network. As with injecting additive input noise, the objective functions of the other fault/noise-injection-based online algorithms contain a mean square error term and a specialized regularization term.
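As an example of algorithm 1), the sketch below trains an RBF output layer online while perturbing each presented input with additive Gaussian noise; the width, noise level, and step-size schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def rbf_online_input_noise(X, y, centers, width, sigma_x=0.1,
                           mu0=0.05, n_epochs=20):
    """Online RBF training with additive input noise injection.

    Each sample is perturbed by Gaussian input noise before the
    gradient step; per the paper, the resulting objective matches
    the Tikhonov-regularized mean square error.
    """
    w = np.zeros(centers.shape[0])
    t = 0
    for _ in range(n_epochs):
        for x, target in zip(X, y):
            t += 1
            x_noisy = x + sigma_x * rng.standard_normal(x.shape)
            h = np.exp(-np.sum((x_noisy - centers)**2, axis=1)
                       / (2 * width**2))
            w += (mu0 / t) * (target - w @ h) * h  # LMS-style update
    return w
```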
Affiliation(s)
- Kevin I-J Ho
- Department of Computer Science and Communication Engineering, Providence University, Sha-Lu 433, Taiwan.
15. Sum J, Leung CS, Ho K. On Objective Function, Regularizer, and Prediction Error of a Learning Algorithm for Dealing With Multiplicative Weight Noise. IEEE Transactions on Neural Networks 2009;20:124-138. [DOI: 10.1109/tnn.2008.2005596]