1. Sum J, Leung CS. Regularization Effect of Random Node Fault/Noise on Gradient Descent Learning Algorithm. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:2619-2632. PMID: 34487503. DOI: 10.1109/tnnls.2021.3107051.
Abstract
For decades, adding fault/noise during training by gradient descent has been a technique for making a neural network (NN) tolerant to persistent fault/noise or for improving its generalization. In recent years, this technique has been readvocated in deep learning to avoid overfitting. Yet, the objective function of such fault/noise injection learning has been misinterpreted as the desired measure (i.e., the expected mean squared error (MSE) of the training samples) of the NN with the same fault/noise. The aims of this article are: 1) to clarify this misconception and 2) to investigate the actual regularization effect of adding node fault/noise when training by gradient descent. Based on previous works on adding fault/noise during training, we speculate on the reason why the misconception appears. It is then shown that the learning objective of adding random node fault during gradient descent learning (GDL) for a multilayer perceptron (MLP) is identical to the desired measure of the MLP with the same fault. If additive (resp. multiplicative) node noise is added during GDL for an MLP, the learning objective is not identical to the desired measure of the MLP with such noise. For radial basis function (RBF) networks, it is shown that the learning objective is identical to the corresponding desired measure for all three fault/noise conditions. Empirical evidence is presented to support the theoretical results and hence to clarify the misconception: the objective function of fault/noise injection learning might not be interpretable as the desired measure of the NN with the same fault/noise. Afterward, the regularization effect of adding node fault/noise during training is revealed for the case of RBF networks. Notably, it is shown that the regularization effect of adding additive or multiplicative node noise (MNN) during the training of an RBF network is to reduce network complexity. When dropout regularization is applied to RBF networks, its effect is the same as adding MNN during training.
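To make the node noise injection setting concrete, the following minimal sketch injects multiplicative node noise into the hidden-layer outputs of an RBF network during gradient descent learning. The toy data, centers, width, noise level, and learning rate are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data (hypothetical example).
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(np.pi * x) + 0.05 * rng.standard_normal(x.shape)

# Fixed Gaussian RBF centers and width; only the output weights are trained.
centers = np.linspace(-1.0, 1.0, 10)
width = 0.2

def hidden(xb):
    """Gaussian hidden-node outputs, shape (n_samples, n_centers)."""
    return np.exp(-((xb[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

w = np.zeros(len(centers))
lr, sigma_b = 0.05, 0.3   # sigma_b: std of the multiplicative node noise

for epoch in range(200):
    for i in rng.permutation(len(x)):
        h = hidden(x[i:i + 1])[0]
        # Multiplicative node noise: each hidden output is scaled by (1 + noise).
        h_noisy = h * (1.0 + sigma_b * rng.standard_normal(h.shape))
        err = y[i] - h_noisy @ w
        w += lr * err * h_noisy   # gradient step through the noisy network
```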
2. Lai X, Cao J, Lin Z. An Accelerated Maximally Split ADMM for a Class of Generalized Ridge Regression. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:958-972. PMID: 34437070. DOI: 10.1109/tnnls.2021.3104840.
Abstract
Ridge regression (RR) has been commonly used in machine learning, but it faces computational challenges in big data applications. To meet these challenges, this article develops a highly parallel new algorithm, an accelerated maximally split alternating direction method of multipliers (A-MS-ADMM), for a class of generalized RR (GRR) that allows different regularization factors for different regression coefficients. Linear convergence of the new algorithm, along with its convergence ratio, is established. Optimal parameters of the algorithm for the GRR with a particular set of regularization factors are derived, and a selection scheme of the algorithm parameters for the GRR with general regularization factors is also discussed. The new algorithm is then applied to the training of single-layer feedforward neural networks. Experiments on real-world benchmark datasets for regression and classification, together with comparisons against existing methods, demonstrate the fast convergence, low computational complexity, and high parallelism of the new algorithm.
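As a point of reference for the problem that A-MS-ADMM targets, the sketch below solves generalized ridge regression directly in closed form, with one regularization factor per coefficient. It is only a baseline for the GRR objective under assumed toy data, not the paper's parallel ADMM algorithm.

```python
import numpy as np

def generalized_ridge(X, y, lambdas):
    """Closed-form solution of min_w ||Xw - y||^2 + sum_j lambdas[j] * w[j]^2.

    `lambdas` holds one regularization factor per coefficient, which is the
    generalized-ridge setting; this direct solve is a reference baseline only.
    """
    return np.linalg.solve(X.T @ X + np.diag(lambdas), X.T @ y)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.standard_normal(100)
lambdas = np.array([0.1, 0.1, 10.0, 0.1, 0.1])   # heavier shrinkage on w[2]
print(generalized_ridge(X, y, lambdas))
```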
3. Wang J, Chang Q, Chang Q, Liu Y, Pal NR. Weight Noise Injection-Based MLPs With Group Lasso Penalty: Asymptotic Convergence and Application to Node Pruning. IEEE Transactions on Cybernetics 2019; 49:4346-4364. PMID: 30530381. DOI: 10.1109/tcyb.2018.2864142.
Abstract
The application and theoretical analysis of fault-tolerant learning are very important for neural networks. Our objective here is to realize fault-tolerant sparse multilayer perceptron (MLP) networks. The stochastic gradient descent method has been employed to perform online learning for MLPs. For weight noise injection-based network models, it is a common strategy to add a weight decay regularizer while constructing the objective function for learning. However, this l2-norm penalty does not generate sparse optimal solutions. In this paper, a group lasso penalty term is used as the regularizer, where a group is defined as the set of weights connecting a node to the nodes in the preceding layer. The group lasso penalty enables us to prune redundant hidden nodes. Because it is nondifferentiable at the origin, a smooth approximation of the group lasso penalty is developed. A rigorous proof of the asymptotic convergence of the learning algorithm is then provided. Finally, simulations are performed to verify the sparseness of the network and the theoretical results.
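A minimal sketch of one possible smooth approximation of the group lasso penalty, replacing each group norm by sqrt(||w_g||^2 + eps), is given below. The eps-smoothing and the row-wise grouping of a weight matrix are assumptions for illustration and may differ from the paper's exact construction.

```python
import numpy as np

def smoothed_group_lasso(W, eps=1e-4):
    """Smoothed group lasso penalty for a weight matrix W of shape
    (n_hidden, n_inputs), where each row (the fan-in of one hidden node)
    forms a group.

    The usual penalty sum_g ||w_g||_2 is not differentiable at w_g = 0, so
    each norm is replaced by sqrt(||w_g||^2 + eps), which is smooth everywhere.
    """
    group_norms = np.sqrt(np.sum(W ** 2, axis=1) + eps)
    penalty = group_norms.sum()
    grad = W / group_norms[:, None]          # d(penalty) / dW
    return penalty, grad

W = np.random.default_rng(2).standard_normal((6, 4))
p, g = smoothed_group_lasso(W)
print(p, g.shape)
```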
4. Feng RB, Han ZF, Wan WY, Leung CS. Properties and learning algorithms for faulty RBF networks with coexistence of weight and node failures. Neurocomputing 2017. DOI: 10.1016/j.neucom.2016.11.003.
5. Müller AT, Kaymaz AC, Gabernet G, Posselt G, Wessler S, Hiss JA, Schneider G. Sparse Neural Network Models of Antimicrobial Peptide-Activity Relationships. Mol Inform 2016; 35:606-614. DOI: 10.1002/minf.201600029.
Affiliation(s)
- Alex T. Müller, Aral C. Kaymaz, Gisela Gabernet, Jan A. Hiss, Gisbert Schneider: Swiss Federal Institute of Technology (ETH), Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Gernot Posselt, Silja Wessler: Department of Molecular Biology, Division of Microbiology, Paris Lodron University of Salzburg, Billrothstr. 11, A-5020 Salzburg, Austria
6. Han Z, Feng RB, Wan WY, Leung CS. Online training and its convergence for faulty networks with multiplicative weight noise. Neurocomputing 2015. DOI: 10.1016/j.neucom.2014.12.049.
7. Xiao Y, Feng R, Leung CS, Sum PF. Online Training for Open Faulty RBF Networks. Neural Process Lett 2014. DOI: 10.1007/s11063-014-9363-8.
8. Sum J, Leung CS, Ho K. Convergence analyses on on-line weight noise injection-based training algorithms for MLPs. IEEE Transactions on Neural Networks and Learning Systems 2012; 23:1827-1840. PMID: 24808076. DOI: 10.1109/tnnls.2012.2210243.
Abstract
Injecting weight noise during training is a simple technique that has been proposed for almost two decades. However, little is known about its convergence behavior. This paper studies the convergence of two weight noise injection-based training algorithms: multiplicative weight noise injection with weight decay and additive weight noise injection with weight decay. We consider their application to multilayer perceptrons with either linear or sigmoid output nodes. Let w(t) be the weight vector, V(w) the corresponding objective function of the training algorithm, α > 0 the weight decay constant, and μ(t) the step size. We show that if μ(t) → 0, then with probability one E[||w(t)||_2^2] is bounded and lim_{t→∞} ||w(t)||_2 exists. Based on these two properties, we show that if μ(t) → 0, Σ_t μ(t) = ∞, and Σ_t μ(t)^2 < ∞, then with probability one these algorithms converge. Moreover, w(t) converges with probability one to a point where ∇_w V(w) = 0.
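The sketch below illustrates the setting: an online weight-noise-injection update with weight decay, driven by a step size μ(t) = c/(t+1) that satisfies the stated conditions μ(t) → 0, Σ μ(t) = ∞, Σ μ(t)^2 < ∞. The linear model, noise level, and decay constant are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def step_size(t, c=0.5):
    """mu(t) = c/(t+1): mu(t) -> 0, sum mu(t) = inf, sum mu(t)^2 < inf,
    i.e., a schedule satisfying the stated convergence conditions."""
    return c / (t + 1.0)

# Toy linear 'network' y = w.x trained with additive weight noise + weight decay.
# (Illustrative only; the paper analyses MLPs with linear or sigmoid outputs.)
w = np.zeros(3)
alpha, sigma_w = 0.01, 0.1        # weight decay constant and weight-noise std
w_true = np.array([1.0, -1.0, 2.0])

for t in range(5000):
    x = rng.standard_normal(3)
    y = w_true @ x
    w_noisy = w + sigma_w * rng.standard_normal(3)       # additive weight noise
    err = y - w_noisy @ x
    w += step_size(t) * (err * x - alpha * w)            # noisy gradient + decay
```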
9. Wu Y, Wang H, Zhang B, Du KL. Using Radial Basis Function Networks for Function Approximation and Classification. ISRN Applied Mathematics 2012. DOI: 10.5402/2012/324194.
Abstract
The radial basis function (RBF) network has its foundation in conventional approximation theory and has the capability of universal approximation. The RBF network is a popular alternative to the well-known multilayer perceptron (MLP), since it has a simpler structure and a much faster training process. In this paper, we give a comprehensive survey of the RBF network and its learning. Many aspects associated with the RBF network, such as network structure, universal approximation capability, radial basis functions, RBF network learning, structure optimization, normalized RBF networks, application to dynamic system modeling, and nonlinear complex-valued signal processing, are described. We also compare the features and capabilities of the two models.
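As a minimal illustration of one of the surveyed training strategies, the sketch below builds an RBF network with fixed Gaussian centers and fits only the linear output weights by least squares; all data and hyperparameters are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data and a small RBF network with fixed Gaussian centers; only the
# linear output weights are fitted (centers could also come from clustering).
x = np.linspace(0.0, 1.0, 120)
y = np.cos(2 * np.pi * x) + 0.05 * rng.standard_normal(x.shape)

centers = np.linspace(0.0, 1.0, 12)
width = 0.08

Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)     # least-squares output weights
y_hat = Phi @ w
print("training MSE:", np.mean((y - y_hat) ** 2))
```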
Affiliation(s)
- Yue Wu, Hui Wang, Biaobiao Zhang, K.-L. Du: Enjoyor Laboratories, Enjoyor Inc., Hangzhou 310030, China
- K.-L. Du: Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada H3G 1M8
10. Leung ACS, Xiao Y, Xu Y, Wong KW. Decouple implementation of weight decay for recursive least square. Neural Comput Appl 2012. DOI: 10.1007/s00521-012-0832-6.
11. Sum JPF, Leung CS, Ho KIJ. On-line node fault injection training algorithm for MLP networks: objective function and convergence analysis. IEEE Transactions on Neural Networks and Learning Systems 2012; 23:211-222. PMID: 24808501. DOI: 10.1109/tnnls.2011.2178477.
Abstract
Improving the fault tolerance of a neural network has been studied for more than two decades, and various training algorithms have been proposed to this end. The online node fault injection-based algorithm, in which hidden nodes randomly output zeros during training, is one of them. While the idea is simple, theoretical analyses of this algorithm are far from complete. This paper presents its objective function and the convergence proof. We consider three cases for multilayer perceptrons (MLPs): (1) MLPs with a single linear output node; (2) MLPs with multiple linear output nodes; and (3) MLPs with a single sigmoid output node. For the convergence proof, we show that the algorithm converges with probability one. For the objective function, we show that the objective functions of cases (1) and (2) are of the same form: both consist of a mean square error term, a regularizer term, and a weight decay term. For case (3), the objective function is slightly different from that of cases (1) and (2). With the objective functions derived, we can compare the similarities and differences among various algorithms and cases.
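A minimal sketch of the online node fault injection idea for case (1), an MLP with a single linear output node, is given below: at each step, every hidden node outputs zero with probability p and the gradient step is taken through the faulty network. Network sizes, the target function, and p are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# One-hidden-layer MLP with a single linear output (case (1) above).
n_in, n_hid = 2, 8
W1 = 0.5 * rng.standard_normal((n_hid, n_in))
b1 = np.zeros(n_hid)
w2 = 0.5 * rng.standard_normal(n_hid)
lr, p = 0.05, 0.2        # p: probability a hidden node is faulty (outputs zero)

def sample(rng):
    x = rng.uniform(-1, 1, n_in)
    return x, np.sin(x).sum()         # toy target function

for t in range(20000):
    x, y = sample(rng)
    h = np.tanh(W1 @ x + b1)
    mask = (rng.random(n_hid) > p).astype(float)   # node fault injection
    h_f = mask * h
    err = y - w2 @ h_f
    # Online gradient step through the faulty network.
    grad_h = err * w2 * mask * (1.0 - h ** 2)
    w2 += lr * err * h_f
    W1 += lr * np.outer(grad_h, x)
    b1 += lr * grad_h
```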
12. Hoang Xuan Huan, Dang Thi Thu Hien, Huynh Huu Tue. Efficient Algorithm for Training Interpolation RBF Networks With Equally Spaced Nodes. IEEE Transactions on Neural Networks 2011; 22:982-988. DOI: 10.1109/tnn.2011.2120619.
14. Ho K, Leung CS, Sum J. Objective functions of online weight noise injection training algorithms for MLPs. IEEE Transactions on Neural Networks 2010; 22:317-323. PMID: 21189237. DOI: 10.1109/tnn.2010.2095881.
Abstract
Injecting weight noise during training has been a simple strategy to improve the fault tolerance of multilayer perceptrons (MLPs) for almost two decades, and several online training algorithms have been proposed in this regard. However, there are some misconceptions about the objective functions being minimized by these algorithms. Some existing results incorrectly treat the prediction error of a trained MLP affected by weight noise as equivalent to the objective function of a weight noise injection algorithm. In this brief, we clarify these misconceptions. Two weight noise injection scenarios are considered: one based on additive weight noise injection and the other based on multiplicative weight noise injection. To avoid the misconceptions, we use their mean updating equations to analyze the objective functions. For injecting additive weight noise during training, we show that the true objective function is identical to the prediction error of a faulty MLP whose weights are affected by additive weight noise; it consists of the conventional mean square error and a smoothing regularizer. For injecting multiplicative weight noise during training, we show that the objective function is different from the prediction error of a faulty MLP whose weights are affected by multiplicative weight noise. With our results, some existing misconceptions regarding MLP training with weight noise injection can now be resolved.
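The sketch below only illustrates the two injection scenarios discussed above: for a toy linear model, one training step is taken with the weights perturbed either additively or multiplicatively before the error and gradient are computed. The helper perturb() and all values are hypothetical, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(6)

def perturb(w, mode, sigma, rng):
    """Weight vector as seen by one noisy training step.

    additive:        w_noisy = w + e,        e ~ N(0, sigma^2 I)
    multiplicative:  w_noisy = w * (1 + e),  each weight scaled by its own noise
    (Hypothetical helper used only to contrast the two injection scenarios.)
    """
    e = sigma * rng.standard_normal(w.shape)
    return w + e if mode == "additive" else w * (1.0 + e)

# One online step of weight-noise-injection training for a linear model y = w.x:
w = np.array([0.5, -1.5, 2.0])
x, y = rng.standard_normal(3), 1.0
for mode in ("additive", "multiplicative"):
    w_noisy = perturb(w, mode, 0.1, rng)
    err = y - w_noisy @ x
    w_new = w + 0.01 * err * x        # gradient evaluated at the noisy weights
    print(mode, w_new)
```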
Affiliation(s)
- Kevin Ho: Department of Computer Science and Communication Engineering, Providence University, Taichung 43301, Taiwan
15. Ho KIJ, Leung CS, Sum J. Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks. IEEE Transactions on Neural Networks 2010; 21:938-947. PMID: 20388593. DOI: 10.1109/tnn.2010.2046179.
Abstract
In the last two decades, many online fault/noise injection algorithms have been developed to attain a fault-tolerant neural network. However, little theoretical work related to their convergence and objective functions has been reported. This paper studies six common fault/noise-injection-based online learning algorithms for radial basis function (RBF) networks, namely 1) injecting additive input noise, 2) injecting additive/multiplicative weight noise, 3) injecting multiplicative node noise, 4) injecting multiweight fault (random disconnection of weights), 5) injecting multinode fault during training, and 6) weight decay with injecting multinode fault. Based on the Gladyshev theorem, we show that the convergence of these six online algorithms is almost sure. Moreover, their true objective functions being minimized are derived. For injecting additive input noise during training, the objective function is identical to that of the Tikhonov regularizer approach. For injecting additive/multiplicative weight noise during training, the objective function is the simple mean square training error; thus, injecting additive/multiplicative weight noise during training cannot improve the fault tolerance of an RBF network. Similar to injecting additive input noise, the objective functions of the other fault/noise-injection-based online algorithms contain a mean square error term and a specialized regularization term.
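For concreteness, the sketch below implements the first algorithm in the list, online RBF training with additive input noise (whose objective is shown to match the Tikhonov regularizer approach); the data, centers, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Online RBF training with additive input noise (algorithm 1 in the list above).
x = np.linspace(-1, 1, 150)
y = x ** 2 + 0.05 * rng.standard_normal(x.shape)

centers = np.linspace(-1, 1, 8)
width = 0.25
phi = lambda v: np.exp(-((v - centers) ** 2) / (2 * width ** 2))

w = np.zeros(len(centers))
lr, sigma_x = 0.05, 0.1

for epoch in range(100):
    for i in rng.permutation(len(x)):
        x_noisy = x[i] + sigma_x * rng.standard_normal()   # additive input noise
        h = phi(x_noisy)
        err = y[i] - h @ w
        w += lr * err * h
```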
Affiliation(s)
- Kevin I-J Ho: Department of Computer Science and Communication Engineering, Providence University, Sha-Lu 433, Taiwan
16. Ho TY, Leung CS, Lam PM, Wong TT. Efficient relighting of RBF-based illumination adjustable images. IEEE Transactions on Neural Networks 2009; 20:1987-1993. PMID: 19822473. DOI: 10.1109/tnn.2009.2032765.
Abstract
An illumination adjustable image (IAI) contains a large number of prerecorded images under various light directions, and a scene can be relit under complicated lighting conditions from the IAI. Using the radial basis function (RBF) approach to represent an IAI is proven to be more efficient than using the spherical harmonic approach. However, representing high-frequency lighting effects requires many RBFs, so the relighting speed can be very slow. This brief investigates a partial reconstruction scheme for relighting an IAI based on the locality of RBFs. Compared with the conventional RBF and spherical harmonics (SH) approaches, the proposed scheme achieves a much faster relighting speed at a similar distortion performance.
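A minimal sketch of the locality idea, under assumed centers and weights, is given below: when relighting for a query light direction, only the RBFs whose centers fall within a cutoff radius are evaluated (partial reconstruction) rather than the full sum.

```python
import numpy as np

rng = np.random.default_rng(8)

# Sketch of relighting one pixel from an RBF representation of an illumination
# adjustable image: sum only the basis functions whose centers are close to the
# query light direction. Centers, weights, and the cutoff are assumptions.
n_rbf = 500
centers = rng.standard_normal((n_rbf, 3))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)   # unit light directions
weights = rng.standard_normal(n_rbf)
width = 0.3

def relight(light_dir, cutoff=3.0):
    d = np.linalg.norm(centers - light_dir, axis=1)
    near = d < cutoff * width          # partial reconstruction: skip far RBFs
    return weights[near] @ np.exp(-d[near] ** 2 / (2 * width ** 2))

q = np.array([0.0, 0.0, 1.0])
print(relight(q), relight(q, cutoff=np.inf))   # partial vs full reconstruction
```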
Affiliation(s)
- Tze-Yiu Ho: Department of Electronic Engineering, The City University of Hong Kong, Kowloon, Hong Kong