1. Pawlak WA, Howard N. Neuromorphic algorithms for brain implants: a review. Front Neurosci 2025;19:1570104. [PMID: 40292025] [PMCID: PMC12021827] [DOI: 10.3389/fnins.2025.1570104]
Abstract
Neuromorphic computing technologies are poised to change modern computing, yet most work so far has emphasized hardware development. This review focuses on the latest progress in algorithmic advances specifically for potential use in brain implants. We discuss current algorithms and emerging neurocomputational models that, when implemented on neuromorphic hardware, could match or surpass traditional methods in efficiency. Our aim is to inspire the creation and deployment of models that not only enhance computational performance for implants but also serve broader fields such as medical diagnostics and robotics, informing the next generation of neural implants.
2. Sum J, Leung CS. Regularization Effect of Random Node Fault/Noise on Gradient Descent Learning Algorithm. IEEE Trans Neural Netw Learn Syst 2023;34:2619-2632. [PMID: 34487503] [DOI: 10.1109/tnnls.2021.3107051]
Abstract
For decades, adding fault/noise during gradient descent training has been a technique for obtaining a neural network (NN) that tolerates persistent fault/noise or that generalizes better. In recent years, this technique has been re-advocated in deep learning to avoid overfitting. Yet the objective function of such fault/noise injection learning has been misinterpreted as the desired measure (i.e., the expected mean squared error (MSE) of the training samples) of the NN with the same fault/noise. The aims of this article are: 1) to clarify this misconception and 2) to investigate the actual regularization effect of adding node fault/noise when training by gradient descent. Based on previous works on adding fault/noise during training, we discuss why the misconception appears. It is then shown that the learning objective of adding random node fault during gradient descent learning (GDL) for a multilayer perceptron (MLP) is identical to the desired measure of the MLP with the same fault. If additive (resp. multiplicative) node noise is added during GDL for an MLP, the learning objective is not identical to the desired measure of the MLP with such noise. For radial basis function (RBF) networks, it is shown that the learning objective is identical to the corresponding desired measure for all three fault/noise conditions. Empirical evidence is presented to support the theoretical results and, hence, to clarify the misconception; that is, the objective function of fault/noise injection learning might not be interpretable as the desired measure of the NN with the same fault/noise. Afterward, the regularization effect of adding node fault/noise during training is revealed for the case of RBF networks. Notably, it is shown that the regularization effect of adding additive or multiplicative node noise (MNN) during RBF training is a reduction of network complexity. When dropout regularization is applied to RBF networks, its effect is the same as adding MNN during training.
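To make the procedure analyzed above concrete, the sketch below injects multiplicative node noise into the hidden layer of a small MLP at every gradient-descent step. It is a minimal illustration, not the authors' implementation; the architecture, noise level, and toy data are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data (illustrative only).
    X = rng.normal(size=(200, 4))
    y = np.sin(X.sum(axis=1, keepdims=True))

    # Small MLP: 4 -> 16 -> 1 with tanh hidden units.
    W1 = rng.normal(scale=0.5, size=(4, 16)); b1 = np.zeros(16)
    W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
    lr, sigma = 0.05, 0.2     # step size and node-noise level (assumed values)

    for step in range(2000):
        h = np.tanh(X @ W1 + b1)                        # clean hidden activations
        noise = 1.0 + sigma * rng.normal(size=h.shape)
        h_noisy = h * noise                             # multiplicative node noise
        out = h_noisy @ W2 + b2
        err = (out - y) / len(X)                        # gradient of 0.5 * MSE w.r.t. the output

        # Backpropagate through the noisy forward pass.
        gW2 = h_noisy.T @ err
        gb2 = err.sum(axis=0)
        dh = (err @ W2.T) * noise * (1.0 - h**2)
        gW1 = X.T @ dh
        gb1 = dh.sum(axis=0)

        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    mse = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)
    print("fault-free training MSE after noisy training:", round(float(mse), 4))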
3. Wu C, Yang X, Yu H, Peng R, Takeuchi I, Chen Y, Li M. Harnessing optoelectronic noises in a photonic generative network. Sci Adv 2022;8:eabm2956. [PMID: 35061531] [PMCID: PMC8782447] [DOI: 10.1126/sciadv.abm2956]
Abstract
Integrated optoelectronics is emerging as a promising platform for neural network accelerators, affording efficient in-memory computing and high-bandwidth interconnectivity. The inherent optoelectronic noises, however, make photonic systems error-prone in practice. It is thus imperative to devise strategies to mitigate and, if possible, harness noise in photonic computing systems. Here, we demonstrate a photonic generative network as part of a generative adversarial network (GAN). This network is implemented with a photonic core consisting of an array of programmable phase-change memory cells that performs four-element vector-vector dot products. The GAN can generate a handwritten number ("7") in experiments and all 10 digits in simulation. We realize an optical random number generator, apply noise-aware training by injecting additional noise, and demonstrate the network's resilience to hardware nonidealities. Our results suggest the resilience and potential of more complex photonic generative networks based on large-scale, realistic photonic hardware.
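The noise-aware training mentioned above can be illustrated with a hedged sketch in which additive noise is injected into every analog dot product during training, so the learned weights stay usable when the same noise is present at inference. The noise model, its scale, and the toy task are assumptions for illustration, not the paper's experimental setup.

    import numpy as np

    rng = np.random.default_rng(1)

    def noisy_matvec(W, x, sigma):
        # Emulate an analog dot-product engine with additive readout noise (assumed Gaussian).
        return x @ W + sigma * rng.normal(size=W.shape[1])

    # Toy two-class problem on 4-element inputs (illustrative).
    X = rng.normal(size=(400, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

    W = rng.normal(scale=0.1, size=(4, 1))
    sigma_train, lr = 0.1, 0.1

    for epoch in range(300):
        for xi, yi in zip(X, y):
            z = noisy_matvec(W, xi, sigma_train)       # noise-aware forward pass
            p = 1.0 / (1.0 + np.exp(-z))               # sigmoid output
            W -= lr * np.outer(xi, p - yi)             # logistic-regression gradient step

    # Evaluate with the same hardware-like noise present at inference time.
    preds = np.array([noisy_matvec(W, xi, sigma_train)[0] > 0 for xi in X])
    print("accuracy under optoelectronic-style noise:", round(float(np.mean(preds == (y > 0.5))), 3))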
Affiliation(s)
- Changming Wu: Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, USA
- Xiaoxuan Yang: Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA
- Heshan Yu: Department of Materials Science and Engineering, University of Maryland, College Park, MD 20742, USA
- Ruoming Peng: Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, USA
- Ichiro Takeuchi: Department of Materials Science and Engineering, University of Maryland, College Park, MD 20742, USA
- Yiran Chen: Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA
- Mo Li: Department of Electrical and Computer Engineering and Department of Physics, University of Washington, Seattle, WA 98195, USA
4. Wang X, Wang J, Zhang K, Lin F, Chang Q. Convergence and objective functions of noise-injected multilayer perceptrons with hidden multipliers. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.03.119]
5. An adaptive threshold neuron for recurrent spiking neural networks with nanodevice hardware implementation. Nat Commun 2021;12:4234. [PMID: 34244491] [PMCID: PMC8270926] [DOI: 10.1038/s41467-021-24427-8]
Abstract
We propose a Double EXponential Adaptive Threshold (DEXAT) neuron model that improves the performance of neuromorphic Recurrent Spiking Neural Networks (RSNNs) by providing faster convergence, higher accuracy, and a flexible long short-term memory. We present a hardware-efficient methodology to realize DEXAT neurons using tightly coupled circuit-device interactions and experimentally demonstrate the DEXAT neuron block using oxide-based non-filamentary resistive switching devices. Using experimentally extracted parameters, we simulate a full RSNN that achieves a classification accuracy of 96.1% on the SMNIST dataset and 91% on the Google Speech Commands (GSC) dataset. We also demonstrate full end-to-end real-time inference for speech recognition using DEXAT neurons built from real fabricated resistive memory circuits. Finally, we investigate the impact of nanodevice variability and endurance, illustrating the robustness of DEXAT-based RSNNs. Recurrent spiking neural networks have garnered interest due to their energy efficiency; however, they suffer from lower accuracy compared to conventional neural networks. Here, the authors present an alternative neuron model and its efficient hardware implementation, demonstrating high classification accuracy across a range of datasets.
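As a concrete (and hedged) reading of the neuron model, the sketch below implements a leaky integrate-and-fire unit whose firing threshold jumps after each spike and relaxes back through two exponential components with different time constants, in the spirit of the double-exponential adaptive threshold described above. All parameter values are illustrative assumptions, not the paper's fitted device parameters.

    import numpy as np

    # Simulation and neuron parameters (illustrative assumptions).
    dt = 1.0                        # time step (ms)
    tau_m = 20.0                    # membrane time constant (ms)
    tau_a1, tau_a2 = 30.0, 300.0    # fast and slow threshold-adaptation time constants (ms)
    beta1, beta2 = 0.5, 1.5         # per-spike jumps of the two threshold components
    b0 = 1.0                        # baseline threshold

    rng = np.random.default_rng(2)
    T = 500
    current = 0.12 + 0.05 * rng.normal(size=T)    # noisy input current

    v, a1, a2 = 0.0, 0.0, 0.0
    spikes = []

    for t in range(T):
        v += dt / tau_m * (-v) + current[t]
        threshold = b0 + a1 + a2                  # double-exponential adaptive threshold
        if v >= threshold:
            spikes.append(t)
            v = 0.0                               # reset membrane potential
            a1 += beta1                           # fast component jumps ...
            a2 += beta2                           # ... and so does the slow component
        a1 *= np.exp(-dt / tau_a1)                # each component decays with its own time constant
        a2 *= np.exp(-dt / tau_a2)

    print(len(spikes), "spikes; first few at t =", spikes[:5])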
6. Sebastian A, Le Gallo M, Khaddam-Aljameh R, Eleftheriou E. Memory devices and applications for in-memory computing. Nat Nanotechnol 2020;15:529-544. [PMID: 32231270] [DOI: 10.1038/s41565-020-0655-z]
Abstract
Traditional von Neumann computing systems involve separate processing and memory units. However, data movement is costly in terms of time and energy, and this problem is aggravated by the recent explosive growth in highly data-centric applications related to artificial intelligence. This calls for a radical departure from traditional systems, and one such non-von Neumann computational approach is in-memory computing, in which certain computational tasks are performed in place in the memory itself by exploiting the physical attributes of the memory devices. Both charge-based and resistance-based memory devices are being explored for in-memory computing. In this Review, we provide a broad overview of the key computational primitives enabled by these memory devices as well as their applications spanning scientific computing, signal processing, optimization, machine learning, deep learning, and stochastic computing.
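The central primitive behind the in-memory computing surveyed above is an analog matrix-vector multiplication on a crossbar of memory devices, where Ohm's law performs the multiplications and current summation performs the additions. The sketch below is a hedged software model of that idea; the conductance mapping, noise levels, and sizes are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(3)

    def crossbar_matvec(G, v, write_noise=0.05, read_noise=0.01):
        # Idealized crossbar: output currents are v @ G, with assumed Gaussian
        # programming error on each device and additive noise on the read-out.
        G_actual = G * (1.0 + write_noise * rng.normal(size=G.shape))
        currents = v @ G_actual
        return currents + read_noise * rng.normal(size=currents.shape)

    W = rng.normal(size=(8, 4))      # signed weights of one layer
    v = rng.normal(size=8)           # input encoded as voltages

    # Signed weights are mapped onto two non-negative conductance arrays (G+ and G-).
    G_pos, G_neg = np.maximum(W, 0.0), np.maximum(-W, 0.0)
    analog = crossbar_matvec(G_pos, v) - crossbar_matvec(G_neg, v)

    print("digital result:", np.round(v @ W, 3))
    print("analog result :", np.round(analog, 3))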
7. Sum J, Leung CS, Ho K. A Limitation of Gradient Descent Learning. IEEE Trans Neural Netw Learn Syst 2020;31:2227-2232. [PMID: 31398136] [DOI: 10.1109/tnnls.2019.2927689]
Abstract
Over decades, gradient descent has been applied to develop learning algorithms to train a neural network (NN). In this brief, a limitation of applying such an algorithm to train an NN with persistent weight noise is revealed. Let V(w) be the performance measure of an ideal NN; V(w) is applied to develop the gradient descent learning (GDL) algorithm. With weight noise, the desired performance measure, denoted J(w), is E[V(w̃) | w], where w̃ is the noisy weight vector. When GDL is applied to train an NN with weight noise, the actual learning objective is clearly not V(w) but another scalar function L(w). For decades, there has been a misconception that L(w) = J(w) and, hence, that the actual model attained by the GDL is the desired model. However, we show that it might not be: 1) with persistent additive weight noise, the actual model attained is the desired model, as L(w) = J(w); and 2) with persistent multiplicative weight noise, the actual model attained is unlikely to be the desired model, as L(w) ≠ J(w). Accordingly, the properties of the models attained, as compared with the desired models, are analyzed and the learning curves are sketched. Simulation results on 1) a simple regression problem and 2) MNIST handwritten digit recognition are presented to support our claims.
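The distinction drawn above can be reproduced in a hedged toy experiment: gradient descent is run while each update sees a freshly corrupted copy of the weights, and the attained solution is compared with the minimizer of the desired measure J(w), which has a closed form for this linear-regression case. The task, noise level, and step-size choices are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(4)

    # Toy linear-regression task (illustrative).
    X = rng.normal(size=(500, 5))
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = X @ w_true + 0.05 * rng.normal(size=500)
    sigma = 0.5                                        # persistent weight-noise level (assumed)

    def train_with_weight_noise(mode, lr=0.01, steps=50000):
        # Gradient descent in which every step evaluates the gradient at noisy weights.
        w = np.zeros(5)
        for _ in range(steps):
            if mode == "additive":
                w_noisy = w + sigma * rng.normal(size=5)
            else:  # multiplicative
                w_noisy = w * (1.0 + sigma * rng.normal(size=5))
            grad = X.T @ (X @ w_noisy - y) / len(X)    # gradient of 0.5*MSE at the noisy weights
            w -= lr * grad
        return w

    # Desired models: minimizers of J(w) = E[V(w_noisy) | w] for this linear case.
    XtX, Xty = X.T @ X, X.T @ y
    w_des_add = np.linalg.solve(XtX, Xty)                                     # additive noise: least squares
    w_des_mul = np.linalg.solve(XtX + sigma**2 * np.diag(np.diag(XtX)), Xty)  # multiplicative: shrunken solution

    # The attained model should sit near the desired one for additive noise only.
    print("additive      :", round(float(np.linalg.norm(train_with_weight_noise("additive") - w_des_add)), 3))
    print("multiplicative:", round(float(np.linalg.norm(train_with_weight_noise("multiplicative") - w_des_mul)), 3))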
8. Joshi V, Le Gallo M, Haefeli S, Boybat I, Nandakumar SR, Piveteau C, Dazzi M, Rajendran B, Sebastian A, Eleftheriou E. Accurate deep neural network inference using computational phase-change memory. Nat Commun 2020;11:2473. [PMID: 32424184] [PMCID: PMC7235046] [DOI: 10.1038/s41467-020-16108-9]
Abstract
In-memory computing using resistive memory devices is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. However, due to device variability and noise, the network needs to be trained in a specific way so that transferring the digitally trained weights to the analog resistive memory devices will not result in significant loss of accuracy. Here, we introduce a methodology to train ResNet-type convolutional neural networks that results in no appreciable accuracy loss when transferring weights to phase-change memory (PCM) devices. We also propose a compensation technique that exploits the batch normalization parameters to improve the accuracy retention over time. We achieve a classification accuracy of 93.7% on CIFAR-10 and a top-1 accuracy of 71.6% on ImageNet benchmarks after mapping the trained weights to PCM. Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above 93.5% retained over a one-day period, where each of the 361,722 synaptic weights is programmed on just two PCM devices organized in a differential configuration.
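One of the non-idealities behind the accuracy-retention result above is conductance drift. The hedged sketch below shows a simple global compensation strategy in that spirit: a layer's response to a fixed calibration input is recorded right after programming, and analog outputs are later rescaled by the ratio of the original to the current calibration response. The drift model and calibration scheme are illustrative assumptions, not the authors' exact procedure.

    import numpy as np

    rng = np.random.default_rng(5)

    def drifted(G, t, t0=1.0, nu=0.05):
        # Assumed PCM-like drift law: G(t) = G(t0) * (t / t0) ** (-nu).
        return G * (t / t0) ** (-nu)

    W = rng.normal(size=(16, 10))
    G_pos, G_neg = np.maximum(W, 0.0), np.maximum(-W, 0.0)   # differential mapping onto two devices

    x_cal = np.ones(16)                                      # fixed calibration input
    ref0 = np.sum(np.abs(x_cal @ (G_pos - G_neg)))           # calibration response after programming

    x = rng.normal(size=16)                                  # an actual input
    for t in (1.0, 3600.0, 86400.0):                         # seconds after programming
        Gp, Gn = drifted(G_pos, t), drifted(G_neg, t)
        raw = x @ (Gp - Gn)
        scale = ref0 / np.sum(np.abs(x_cal @ (Gp - Gn)))     # global drift-compensation factor
        err_raw = np.linalg.norm(raw - x @ W)
        err_cmp = np.linalg.norm(scale * raw - x @ W)
        # With a single drift exponent the correction is exact; real devices show a spread.
        print(f"t = {t:>8.0f} s   error raw = {err_raw:.4f}   compensated = {err_cmp:.4f}")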
Affiliation(s)
- Vinay Joshi: IBM Research - Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland; King's College London, Strand, London, WC2R 2LS, UK
- Manuel Le Gallo: IBM Research - Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland
- Simon Haefeli: IBM Research - Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland; ETH Zurich, Rämistrasse 101, 8092, Zurich, Switzerland
- Irem Boybat: IBM Research - Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland; Ecole Polytechnique Federale de Lausanne (EPFL), 1015, Lausanne, Switzerland
- S R Nandakumar: IBM Research - Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland
- Christophe Piveteau: IBM Research - Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland; ETH Zurich, Rämistrasse 101, 8092, Zurich, Switzerland
- Martino Dazzi: IBM Research - Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland; ETH Zurich, Rämistrasse 101, 8092, Zurich, Switzerland
- Abu Sebastian: IBM Research - Zurich, Säumerstrasse 4, 8803, Rüschlikon, Switzerland
9. Xie X, Zhang H, Wang J, Chang Q, Wang J, Pal NR. Learning Optimized Structure of Neural Networks by Hidden Node Pruning With L1 Regularization. IEEE Trans Cybern 2020;50:1333-1346. [PMID: 31765323] [DOI: 10.1109/tcyb.2019.2950105]
Abstract
We propose three different methods to determine the optimal number of hidden nodes of a multilayer perceptron network based on L1 regularization. The first two methods, respectively, use a set of multiplier functions and multipliers for the hidden-layer nodes and implement the L1 regularization on those, while the third method, equipped with the same multipliers, uses a smoothing approximation of the L1 regularization. Each of these methods begins with a given number of hidden nodes; the network is then trained to obtain an optimal architecture, discarding redundant hidden nodes using the multiplier functions or multipliers. A simple and generic method, namely, the matrix-based convergence proving method (MCPM), is introduced to prove the weak and strong convergence of the presented smoothing algorithms. The performance of the three pruning methods has been tested on 11 different classification datasets. The results demonstrate the efficient pruning abilities and competitive generalization of the proposed methods. The theoretical results are also validated by the simulations.
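A hedged sketch of the multiplier idea described above: each hidden node's output is scaled by a trainable multiplier, a smoothed L1 penalty pushes unneeded multipliers toward zero, and nodes whose multipliers end up near zero are discarded. The architecture, penalty strength, smoothing constant, and threshold are illustrative assumptions, not the paper's settings.

    import numpy as np

    rng = np.random.default_rng(6)

    # Toy data where only the first two inputs matter (illustrative).
    X = rng.normal(size=(300, 6))
    y = np.tanh(X[:, :2].sum(axis=1, keepdims=True))

    H = 20                                        # deliberately oversized hidden layer
    W1 = rng.normal(scale=0.5, size=(6, H)); b1 = np.zeros(H)
    c = np.ones(H)                                # per-node multipliers
    W2 = rng.normal(scale=0.5, size=(H, 1))
    lam, eps, lr = 5e-3, 1e-4, 0.05               # L1 strength, smoothing constant, step size

    def smooth_l1_grad(c):
        # Gradient of the smoothed L1 penalty sum_i sqrt(c_i**2 + eps).
        return c / np.sqrt(c**2 + eps)

    for step in range(4000):
        h = np.tanh(X @ W1 + b1)
        out = (h * c) @ W2
        err = (out - y) / len(X)
        gW2 = (h * c).T @ err
        gc = ((err @ W2.T) * h).sum(axis=0) + lam * smooth_l1_grad(c)
        dh = (err @ W2.T) * c * (1 - h**2)
        gW1, gb1 = X.T @ dh, dh.sum(axis=0)
        W2 -= lr * gW2; c -= lr * gc; W1 -= lr * gW1; b1 -= lr * gb1

    keep = np.abs(c) > 0.05                       # prune nodes with near-zero multipliers
    print(f"kept {int(keep.sum())} of {H} hidden nodes after L1-based pruning")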
10.
11. Wang J, Chang Q, Chang Q, Liu Y, Pal NR. Weight Noise Injection-Based MLPs With Group Lasso Penalty: Asymptotic Convergence and Application to Node Pruning. IEEE Trans Cybern 2019;49:4346-4364. [PMID: 30530381] [DOI: 10.1109/tcyb.2018.2864142]
Abstract
The application and theoretical analysis of fault-tolerant learning are very important for neural networks. Our objective here is to realize fault-tolerant sparse multilayer perceptron (MLP) networks. The stochastic gradient descent method has been employed to perform online learning for MLPs. For weight noise injection-based network models, it is a common strategy to add a weight decay regularizer while constructing the objective function for learning. However, this l2-norm penalty does not generate sparse optimal solutions. In this paper, a group lasso penalty term is used as a regularizer, where a group is defined by the set of weights connected to a node from the nodes in the preceding layer. The group lasso penalty enables us to prune redundant hidden nodes. Due to its nondifferentiability at the origin, a smooth approximation of the group lasso penalty is developed. Then, a rigorous proof of the asymptotic convergence of the learning algorithm is provided. Finally, some simulations have been performed to verify the sparseness of the network and the theoretical results.
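To accompany the abstract above, here is a hedged sketch of the two ingredients it combines: additive weight noise is injected at every stochastic gradient update, and a smoothed group-lasso term penalizes each hidden node's column of incoming weights so that redundant nodes can be pruned. The data, noise level, and penalty strength are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(7)

    X = rng.normal(size=(400, 5))
    y = np.sin(X[:, 0:1]) + 0.5 * np.cos(X[:, 1:2])    # only two inputs matter (illustrative)

    H = 15
    W1 = rng.normal(scale=0.5, size=(5, H)); b1 = np.zeros(H)
    W2 = rng.normal(scale=0.5, size=(H, 1))
    lam, eps, sigma, lr = 1e-3, 1e-4, 0.1, 0.02        # penalty, smoothing, weight-noise std, step size

    def smoothed_group_lasso_grad(W):
        # d/dW of sum_j sqrt(||W[:, j]||^2 + eps); one group per hidden node j.
        norms = np.sqrt((W**2).sum(axis=0) + eps)
        return W / norms

    for epoch in range(200):
        for i in rng.permutation(len(X)):
            xi, yi = X[i:i+1], y[i:i+1]
            # Inject additive weight noise for this update only.
            W1n = W1 + sigma * rng.normal(size=W1.shape)
            W2n = W2 + sigma * rng.normal(size=W2.shape)
            h = np.tanh(xi @ W1n + b1)
            err = h @ W2n - yi
            dh = (err @ W2n.T) * (1 - h**2)
            W2 -= lr * (h.T @ err)
            W1 -= lr * (xi.T @ dh + lam * smoothed_group_lasso_grad(W1))
            b1 -= lr * dh.ravel()

    group_norms = np.sqrt((W1**2).sum(axis=0))
    print("hidden nodes with near-zero incoming weight groups:", int((group_norms < 0.05).sum()), "of", H)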
12. Sum J, Leung CS. Learning Algorithm for Boltzmann Machines With Additive Weight and Bias Noise. IEEE Trans Neural Netw Learn Syst 2019;30:3200-3204. [PMID: 30668482] [DOI: 10.1109/tnnls.2018.2889072]
Abstract
This brief presents analytical results on the effect of additive weight/bias noise on a Boltzmann machine (BM), in which the unit output is in {-1, 1} instead of {0, 1}. With such noise, it is found that the state distribution is yet another Boltzmann distribution but the temperature factor is elevated. Thus, the desired gradient ascent learning algorithm is derived, and the corresponding learning procedure is developed. This learning procedure is compared with the learning procedure applied to train a BM with noise. It is found that these two procedures are identical. Therefore, the learning algorithm for noise-free BMs is suitable for implementing as an online learning algorithm for an analog circuit-implemented BM, even if the variances of the additive weight noise and bias noise are unknown.
13. Abandah GA, Graves A, Al-Shagoor B, Arabiyat A, Jamour F, Al-Taee M. Automatic diacritization of Arabic text using recurrent neural networks. Int J Doc Anal Recog 2015. [DOI: 10.1007/s10032-015-0242-2]
14.
Abstract
Dropout is a recently introduced algorithm for training neural networks by randomly dropping units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analysis of the ensemble averaging properties of dropout in linear networks, which is useful for understanding the non-linear case. The ensemble averaging properties of dropout in non-linear logistic networks result from three fundamental equations: (1) the approximation of the expectations of logistic functions by normalized geometric means, for which bounds and estimates are derived; (2) the algebraic equality between normalized geometric means of logistic functions and the logistic of the means, which mathematically characterizes logistic functions; and (3) the linearity of the means with respect to sums, as well as products of independent variables. The results are also extended to other classes of transfer functions, including rectified linear functions. Approximation errors tend to cancel each other and do not accumulate. Dropout can also be connected to stochastic neurons and used to predict firing rates, and to backpropagation by viewing the backward propagation as ensemble averaging in a dropout linear network. Moreover, the convergence properties of dropout can be understood in terms of stochastic gradient descent. Finally, for the regularization properties of dropout, the expectation of the dropout gradient is the gradient of the corresponding approximation ensemble, regularized by an adaptive weight decay term with a propensity for self-consistent variance minimization and sparse representations.
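A hedged sketch of the Bernoulli-gating view analyzed above: during training each hidden unit is kept with probability p by an independent Bernoulli variable, and at test time the deterministic network approximates the dropout ensemble average by scaling hidden activations by p. The network, data, and keep probability are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(8)

    X = rng.normal(size=(256, 10))
    y = (X[:, 0] - X[:, 1] > 0).astype(float).reshape(-1, 1)

    W1 = rng.normal(scale=0.3, size=(10, 32)); b1 = np.zeros(32)
    W2 = rng.normal(scale=0.3, size=(32, 1));  b2 = np.zeros(1)
    p, lr = 0.5, 0.1                              # keep probability and step size

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(3000):
        h = np.maximum(X @ W1 + b1, 0.0)          # ReLU hidden layer
        mask = rng.random(h.shape) < p            # Bernoulli gating variables
        h_drop = h * mask
        out = sigmoid(h_drop @ W2 + b2)
        err = (out - y) / len(X)                  # cross-entropy gradient w.r.t. the logit
        gW2, gb2 = h_drop.T @ err, err.sum(axis=0)
        dh = (err @ W2.T) * mask * (h > 0)
        gW1, gb1 = X.T @ dh, dh.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

    # Ensemble approximation at test time: scale activations by the keep probability p.
    h = np.maximum(X @ W1 + b1, 0.0)
    pred = sigmoid((h * p) @ W2 + b2) > 0.5
    print("training accuracy of the averaged network:", round(float((pred == (y > 0.5)).mean()), 3))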
Affiliation(s)
- Pierre Baldi: Department of Computer Science, University of California, Irvine, Irvine, CA 92697-3435
- Peter Sadowski: Department of Computer Science, University of California, Irvine, Irvine, CA 92697-3435
15. Hsieh HY, Tang KT. Hardware friendly probabilistic spiking neural network with long-term and short-term plasticity. IEEE Trans Neural Netw Learn Syst 2013;24:2063-2074. [PMID: 24805223] [DOI: 10.1109/tnnls.2013.2271644]
Abstract
This paper proposes a probabilistic spiking neural network (PSNN) with a unimodal weight distribution, possessing long- and short-term plasticity. The proposed algorithm is derived from both arithmetic gradient descent calculation and bioinspired algorithms. The algorithm is benchmarked on the Iris and Wisconsin breast cancer (WBC) datasets. The network features fast convergence and high accuracy. In the experiments, the PSNN took no more than 40 epochs to converge. The average testing accuracy for the Iris and WBC data is 96.7% and 97.2%, respectively. To test the usefulness of the PSNN for real-world applications, the PSNN was also tested with odor data collected by our self-developed electronic nose (e-nose). Compared with the algorithm (K-nearest neighbor) that has the highest classification accuracy in the e-nose for the same odor data, the classification accuracy of the PSNN is only 1.3% lower, while the memory requirement can be reduced by at least 40%. All the experiments suggest that the PSNN is hardware friendly. First, it requires only nine-bit weight resolution for training and testing. Second, the PSNN can learn complex datasets with a small number of neurons, which in turn reduces the cost of VLSI implementation. In addition, the algorithm is insensitive to synaptic noise and to the parameter variation induced by VLSI fabrication. Therefore, the algorithm can be implemented in either software or hardware, making it suitable for wider application.
16.
17. An analog multilayer perceptron neural network for a portable electronic nose. Sensors 2012;13:193-207. [PMID: 23262482] [PMCID: PMC3574673] [DOI: 10.3390/s130100193]
Abstract
This study examines an analog circuit implementing a multilayer perceptron neural network (MLPNN). It proposes a low-power, small-area analog MLP circuit to serve as the classifier in an E-nose, so that the E-nose can be relatively small, power-efficient, and portable. The analog MLP circuit had only four input neurons, four hidden neurons, and one output neuron. The circuit was designed and fabricated using a 0.18 μm standard CMOS process with a 1.8 V supply. The power consumption was 0.553 mW, and the area was approximately 1.36 × 1.36 mm². The chip measurements showed that this MLPNN successfully identified the fruit odors of bananas, lemons, and lychees with 91.7% accuracy.
18. Sum J, Leung CS, Ho K. Convergence analyses on on-line weight noise injection-based training algorithms for MLPs. IEEE Trans Neural Netw Learn Syst 2012;23:1827-1840. [PMID: 24808076] [DOI: 10.1109/tnnls.2012.2210243]
Abstract
Injecting weight noise during training is a simple technique that has been proposed for almost two decades. However, little is known about its convergence behavior. This paper studies the convergence of two weight noise injection-based training algorithms: multiplicative weight noise injection with weight decay and additive weight noise injection with weight decay. We consider their application to multilayer perceptrons with either linear or sigmoid output nodes. Let w(t) be the weight vector, let V(w) be the corresponding objective function of the training algorithm, let α > 0 be the weight decay constant, and let μ(t) be the step size. We show that if μ(t) → 0, then with probability one E[||w(t)||²] is bounded and lim_{t→∞} ||w(t)|| exists. Based on these two properties, we show that if μ(t) → 0, Σ_t μ(t) = ∞, and Σ_t μ(t)² < ∞, then with probability one these algorithms converge. Moreover, w(t) converges with probability one to a point where ∇_w V(w) = 0.
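The step-size conditions above are the classical Robbins-Monro requirements. A hedged sketch of an online loop that satisfies them with μ(t) = μ0 / (1 + t/τ), applied to a single linear output trained with multiplicative weight noise and weight decay; the model, constants, and data are illustrative assumptions, not the paper's setting.

    import numpy as np

    rng = np.random.default_rng(9)

    X = rng.normal(size=(1000, 3))
    w_true = np.array([2.0, -1.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=1000)

    w = np.zeros(3)
    alpha, sigma = 1e-3, 0.2       # weight-decay constant and weight-noise level (assumed)
    mu0, tau = 0.1, 200.0          # mu(t) = mu0 / (1 + t/tau): sum mu(t) diverges, sum mu(t)^2 converges

    for t in range(50000):
        mu = mu0 / (1.0 + t / tau)                           # decreasing step size
        i = rng.integers(len(X))
        w_noisy = w * (1.0 + sigma * rng.normal(size=3))     # multiplicative weight noise
        grad = (X[i] @ w_noisy - y[i]) * X[i] + alpha * w    # online gradient plus weight decay
        w -= mu * grad

    print("learned weights:", np.round(w, 3), " true weights:", w_true)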
19. Jim KC, Giles CL, Horne BG. An analysis of noise in recurrent neural networks: convergence and generalization. IEEE Trans Neural Netw 1996;7:1424-1438. [PMID: 18263536] [DOI: 10.1109/72.548170]
Abstract
This work concerns the effect of noise on the performance of recurrent neural networks. We introduce and analyze various methods of injecting synaptic noise into dynamically driven recurrent nets during training. Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance. We analyze the effects of various noise parameters and predict that the best overall performance can be achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function which can be viewed as an anticipatory agent to aid convergence. This term appears to find promising regions of weight space in the beginning stages of training, when the training error is large, and should improve convergence on error surfaces with local minima. The first-order term is a regularization term that can improve generalization. Specifically, it can encourage internal representations where the state nodes operate in the saturated regions of the sigmoid discriminant function. While this effect can improve performance on automata inference problems with binary inputs and target outputs, it is unclear what effect it will have on other types of problems. To substantiate these predictions, we present simulations on learning the dual parity grammar from temporal strings for all noise models, and on learning a randomly generated six-state grammar using the predicted best noise model.
Affiliation(s)
- K C Jim: NEC Res. Inst., Princeton, NJ
20. Leung CS, Sum PF, Liu Y. Optimization of tuning parameters for open node fault regularizer. Neurocomputing 2012. [DOI: 10.1016/j.neucom.2012.03.010]
21. Schück A, Labruyère R, Vallery H, Riener R, Duschau-Wicke A. Feasibility and effects of patient-cooperative robot-aided gait training applied in a 4-week pilot trial. J Neuroeng Rehabil 2012;9:31. [PMID: 22650320] [PMCID: PMC3533836] [DOI: 10.1186/1743-0003-9-31]
Abstract
BACKGROUND: Functional training is becoming the state-of-the-art therapy approach for rehabilitation of individuals after stroke and spinal cord injury. Robot-aided treadmill training reduces personnel effort, especially when treating severely affected patients. Improving rehabilitation robots towards more patient-cooperative behavior may further increase the effects of robot-aided training. This pilot study aims at investigating the feasibility of applying patient-cooperative robot-aided gait rehabilitation to stroke and incomplete spinal cord injury during a therapy period of four weeks. Short-term effects within one training session as well as the effects of the training on walking function are evaluated.
METHODS: Two individuals with chronic incomplete spinal cord injury and two with chronic stroke trained with the Lokomat gait rehabilitation robot, which was operated in a new, patient-cooperative mode, for a period of four weeks with four training sessions of 45 min per week. At baseline, after two and after four weeks, walking function was assessed with the ten meter walking test. Additionally, muscle activity of the major leg muscles, heart rate and the Borg scale were measured under different walking conditions, including a non-cooperative position control mode, to investigate the short-term effects of patient-cooperative versus non-cooperative robot-aided gait training.
RESULTS: Patient-cooperative robot-aided gait training was tolerated well by all subjects and performed without difficulties. The subjects trained more actively and with more physiological muscle activity than in a non-cooperative position-control mode. One subject showed a significant and relevant increase of gait speed after the therapy; the three remaining subjects did not show significant changes.
CONCLUSIONS: Patient-cooperative robot-aided gait training is feasible in clinical practice and overcomes the main points of criticism against robot-aided gait training: it enables patients to train in an active, variable and more natural way. The limited number of subjects in this pilot trial does not permit valid conclusions on the effect of patient-cooperative robot-aided gait training on walking function. A large, possibly multi-center randomized controlled clinical trial is required to shed more light on this question.
Affiliation(s)
- Alex Schück: Spinal Cord Injury Center, University Hospital Balgrist, University of Zurich, Zurich, Switzerland
22. Soudry D, Meir R. Conductance-based neuron models and the slow dynamics of excitability. Front Comput Neurosci 2012;6:4. [PMID: 22355288] [PMCID: PMC3280430] [DOI: 10.3389/fncom.2012.00004]
Abstract
In recent experiments, synaptically isolated neurons from rat cortical culture were stimulated with periodic extracellular fixed-amplitude current pulses for extended durations of days. The neuron's response depended on its own history, as well as on the history of the input, and was classified into several modes. Interestingly, in one of the modes the neuron behaved intermittently, exhibiting irregular firing patterns changing in a complex and variable manner over the entire range of experimental timescales, from seconds to days. With the aim of developing a minimal biophysical explanation for these results, we propose a general scheme that, given a few assumptions (mainly a timescale separation in kinetics), closely describes the response of deterministic conductance-based neuron models under pulse stimulation, using a discrete-time piecewise linear mapping which is amenable to detailed mathematical analysis. Using this method we reproduce the basic modes exhibited by the neuron experimentally, as well as the mean response in each mode. Specifically, we derive precise closed-form input-output expressions for the transient timescale and firing rates, which are expressed in terms of experimentally measurable variables and conform with the experimental results. However, the mathematical analysis shows that the resulting firing patterns in these deterministic models are always regular and repeatable (i.e., no chaos), in contrast to the irregular and variable behavior displayed by the neuron in certain regimes. This fact, and the sensitive near-threshold dynamics of the model, indicate that intrinsic ion channel noise has a significant impact on the neuronal response and may help reproduce the experimentally observed variability, as we also demonstrate numerically. In a companion paper, we extend our analysis to stochastic conductance-based models and show how these can be used to reproduce the details of the observed irregular and variable neuronal response.
Affiliation(s)
- Daniel Soudry: Department of Electrical Engineering, The Laboratory for Network Biology Research, Technion, Haifa, Israel
23. Sum JPF, Leung CS, Ho KIJ. On-line node fault injection training algorithm for MLP networks: objective function and convergence analysis. IEEE Trans Neural Netw Learn Syst 2012;23:211-222. [PMID: 24808501] [DOI: 10.1109/tnnls.2011.2178477]
Abstract
Improving the fault tolerance of a neural network has been studied for more than two decades, and various training algorithms have subsequently been proposed. The online node fault injection-based algorithm is one of them, in which hidden nodes randomly output zeros during training. While the idea is simple, theoretical analyses of this algorithm are far from complete. This paper presents its objective function and the convergence proof. We consider three cases for multilayer perceptrons (MLPs): (1) MLPs with a single linear output node; (2) MLPs with multiple linear output nodes; and (3) MLPs with a single sigmoid output node. For the convergence proof, we show that the algorithm converges with probability one. For the objective function, we show that the corresponding objective functions of cases (1) and (2) are of the same form: they both consist of a mean square error term, a regularizer term, and a weight decay term. For case (3), the objective function is slightly different from that of cases (1) and (2). With the objective functions derived, we can compare the similarities and differences among the various algorithms and cases.
24.
25. Pajarinen J, Peltonen J, Uusitalo MA. Fault tolerant machine learning for nanoscale cognitive radio. Neurocomputing 2011. [DOI: 10.1016/j.neucom.2010.10.007]
26. Ho K, Leung CS, Sum J. Objective functions of online weight noise injection training algorithms for MLPs. IEEE Trans Neural Netw 2010;22:317-323. [PMID: 21189237] [DOI: 10.1109/tnn.2010.2095881]
Abstract
Injecting weight noise during training has been a simple strategy to improve the fault tolerance of multilayer perceptrons (MLPs) for almost two decades, and several online training algorithms have been proposed in this regard. However, there are some misconceptions about the objective functions being minimized by these algorithms. Some existing results misinterpret the prediction error of a trained MLP affected by weight noise as being equivalent to the objective function of a weight noise injection algorithm. In this brief, we would like to clarify these misconceptions. Two weight noise injection scenarios are considered: one based on additive weight noise injection and the other based on multiplicative weight noise injection. To avoid the misconceptions, we use their mean updating equations to analyze the objective functions. For injecting additive weight noise during training, we show that the true objective function is identical to the prediction error of a faulty MLP whose weights are affected by additive weight noise. It consists of the conventional mean square error and a smoothing regularizer. For injecting multiplicative weight noise during training, we show that the objective function is different from the prediction error of a faulty MLP whose weights are affected by multiplicative weight noise. With our results, some existing misconceptions regarding MLP training with weight noise injection can now be resolved.
Affiliation(s)
- Kevin Ho: Department of Computer Science and Communication Engineering, Providence University, Taichung 43301, Taiwan
27. Leung CS, Wang HJ, Sum J. On the selection of weight decay parameter for faulty networks. IEEE Trans Neural Netw 2010;21:1232-1244. [PMID: 20682468] [DOI: 10.1109/tnn.2010.2049580]
Abstract
The weight-decay technique is an effective approach to handle overfitting and weight fault. For fault-free networks, without an appropriate value of the decay parameter, the trained network is either overfitted or underfitted. However, many existing results on the selection of the decay parameter focus on fault-free networks only. It is well known that the weight-decay method can also suppress the effect of weight fault. For the faulty case, using a test set to select the decay parameter is not practical because there is a huge number of possible faulty networks for a trained network. This paper develops two mean prediction error (MPE) formulae for predicting the performance of faulty radial basis function (RBF) networks. Two fault models, multiplicative weight noise and open weight fault, are considered. Our MPE formulae involve the training error and the trained weights only. Besides, in our method, we do not need to generate a huge number of faulty networks to measure the test error for the fault situation. The MPE formulae allow us to select appropriate values of the decay parameter for faulty networks. Our experiments showed that, although there are small differences between the true test errors (from the test set) and the MPE values, the MPE formulae can accurately locate the appropriate value of the decay parameter for minimizing the true test error of faulty networks.
Affiliation(s)
- Chi Sing Leung: Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
28. Kernel Width Optimization for Faulty RBF Neural Networks with Multi-node Open Fault. Neural Process Lett 2010. [DOI: 10.1007/s11063-010-9145-x]
29. Ho KIJ, Leung CS, Sum J. Convergence and objective functions of some fault/noise-injection-based online learning algorithms for RBF networks. IEEE Trans Neural Netw 2010;21:938-947. [PMID: 20388593] [DOI: 10.1109/tnn.2010.2046179]
Abstract
In the last two decades, many online fault/noise injection algorithms have been developed to attain a fault-tolerant neural network. However, not many theoretical works related to their convergence and objective functions have been reported. This paper studies six common fault/noise-injection-based online learning algorithms for radial basis function (RBF) networks, namely 1) injecting additive input noise, 2) injecting additive/multiplicative weight noise, 3) injecting multiplicative node noise, 4) injecting multiweight fault (random disconnection of weights), 5) injecting multinode fault during training, and 6) weight decay with injecting multinode fault. Based on the Gladyshev theorem, we show that the convergence of these six online algorithms is almost sure. Moreover, their true objective functions being minimized are derived. For injecting additive input noise during training, the objective function is identical to that of the Tikhonov regularizer approach. For injecting additive/multiplicative weight noise during training, the objective function is the simple mean square training error; thus, injecting additive/multiplicative weight noise during training cannot improve the fault tolerance of an RBF network. Similar to injecting additive input noise, the objective functions of the other fault/noise-injection-based online algorithms contain a mean square error term and a specialized regularization term.
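A hedged sketch of one of the six schemes listed above, multinode fault injection with weight decay for an RBF network: at every online update a random subset of hidden nodes is forced to output zero, and the output weights are adapted with a small decay term. Centres, widths, fault rate, and data are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(10)

    # Toy 1-D regression task (illustrative).
    X = rng.uniform(-3, 3, size=400)
    y = np.sinc(X)

    M = 20                                     # number of RBF hidden nodes
    centres = np.linspace(-3, 3, M)
    width = 0.5
    w = np.zeros(M)                            # output weights
    p_fault, alpha, lr = 0.2, 1e-4, 0.05       # node-fault rate, weight decay, step size

    def phi(x):
        return np.exp(-(x - centres) ** 2 / (2 * width ** 2))

    for epoch in range(100):
        for i in rng.permutation(len(X)):
            h = phi(X[i])
            healthy = rng.random(M) >= p_fault            # 0 marks an open (faulty) node
            err = (h * healthy) @ w - y[i]
            w -= lr * (err * h * healthy + alpha * w)     # gradient step with weight decay

    # Compare fault-free operation against operation with random multinode faults.
    Hmat = np.array([phi(x) for x in X])
    mse_clean = np.mean((Hmat @ w - y) ** 2)
    mse_fault = np.mean(((Hmat * (rng.random(Hmat.shape) >= p_fault)) @ w - y) ** 2)
    print(f"MSE fault-free: {mse_clean:.4f}   MSE with {p_fault:.0%} node faults: {mse_fault:.4f}")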
Affiliation(s)
- Kevin I-J Ho: Department of Computer Science and Communication Engineering, Providence University, Sha-Lu 433, Taiwan
30. Islam M, Sattar M, Amin M, Yao X, Murase K. A New Constructive Algorithm for Architectural and Functional Adaptation of Artificial Neural Networks. IEEE Trans Syst Man Cybern B Cybern 2009;39:1590-1605. [DOI: 10.1109/tsmcb.2009.2021849]
31.
32. Leung CS, Sum JPF. A fault-tolerant regularizer for RBF networks. IEEE Trans Neural Netw 2008;19:493-507. [PMID: 18334367] [DOI: 10.1109/tnn.2007.912320]
Abstract
In classical training methods for node open fault, we need to consider many potential faulty networks. When the multinode fault situation is considered, the space of potential faulty networks is very large. Hence, the objective function and the corresponding learning algorithm would be computationally complicated. This paper uses the Kullback-Leibler divergence to define an objective function for improving the fault tolerance of radial basis function (RBF) networks. With the assumption that there is a Gaussian distributed noise term in the output data, a regularizer in the objective function is identified. Finally, the corresponding learning algorithm is developed. In our approach, the objective function and the learning algorithm are computationally simple. Compared with some conventional approaches, including weight-decay-based regularizers, our approach has a better fault-tolerant ability. Besides, our empirical study shows that our approach can improve the generalization ability of a fault-free RBF network.
Affiliation(s)
- Chi-Sing Leung: Department of Electronic Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong
33. Basalyga G, Salinas E. When response variability increases neural network robustness to synaptic noise. Neural Comput 2006;18:1349-1379. [PMID: 16764507] [DOI: 10.1162/neco.2006.18.6.1349]
Abstract
Cortical sensory neurons are known to be highly variable, in the sense that responses evoked by identical stimuli often change dramatically from trial to trial. The origin of this variability is uncertain, but it is usually interpreted as detrimental noise that reduces the computational accuracy of neural circuits. Here we investigate the possibility that such response variability might in fact be beneficial, because it may partially compensate for a decrease in accuracy due to stochastic changes in the synaptic strengths of a network. We study the interplay between two kinds of noise, response (or neuronal) noise and synaptic noise, by analyzing their joint influence on the accuracy of neural networks trained to perform various tasks. We find an interesting, generic interaction: when fluctuations in the synaptic connections are proportional to their strengths (multiplicative noise), a certain amount of response noise in the input neurons can significantly improve network performance, compared to the same network without response noise. Performance is enhanced because response noise and multiplicative synaptic noise are in some ways equivalent. So if the algorithm used to find the optimal synaptic weights can take into account the variability of the model neurons, it can also take into account the variability of the synapses. Thus, the connection patterns generated with response noise are typically more resistant to synaptic degradation than those obtained without response noise. As a consequence of this interplay, if multiplicative synaptic noise is present, it is better to have response noise in the network than not to have it. These results are demonstrated analytically for the most basic network consisting of two input neurons and one output neuron performing a simple classification task, but computer simulations show that the phenomenon persists in a wide range of architectures, including recurrent (attractor) networks and sensorimotor networks that perform coordinate transformations. The results suggest that response variability could play an important dynamic role in networks that continuously learn.
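A hedged numerical sketch of the interplay described above for the simplest linear read-out case: the same weights are obtained once from clean inputs and once from inputs carrying response noise, and both solutions are then scored under multiplicative synaptic noise. The construction and noise levels are illustrative assumptions, not the paper's analytical setup.

    import numpy as np

    rng = np.random.default_rng(11)

    # A few input neurons driving one linear output neuron (illustrative).
    N, D = 500, 4
    X = rng.normal(size=(N, D))
    w_target = np.array([3.0, -2.0, 1.5, 0.0])
    y = X @ w_target + 0.1 * rng.normal(size=N)

    sigma = 0.5   # response-noise and synaptic-noise level (assumed equal here)

    # (a) Weights optimized on clean inputs.
    w_clean, *_ = np.linalg.lstsq(X, y, rcond=None)

    # (b) Weights optimized with response noise added to the inputs.
    Xn = X + sigma * rng.normal(size=X.shape)
    w_resp, *_ = np.linalg.lstsq(Xn, y, rcond=None)

    def error_under_synaptic_noise(w, trials=2000):
        # Average MSE when every weight is corrupted multiplicatively at read-out.
        errs = []
        for _ in range(trials):
            w_noisy = w * (1.0 + sigma * rng.normal(size=D))
            errs.append(np.mean((X @ w_noisy - y) ** 2))
        return float(np.mean(errs))

    print("MSE under synaptic noise, no response noise during training  :",
          round(error_under_synaptic_noise(w_clean), 3))
    print("MSE under synaptic noise, with response noise during training:",
          round(error_under_synaptic_noise(w_resp), 3))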
Affiliation(s)
- Gleb Basalyga: Department of Neurobiology and Anatomy, Wake Forest University School of Medicine, Winston-Salem, NC 27157-1010, USA
34. Tchernev EB, Mulvaney RG, Phatak DS. Investigating the Fault Tolerance of Neural Networks. Neural Comput 2005;17:1646-1664. [PMID: 15901410] [DOI: 10.1162/0899766053723096]
Abstract
Particular levels of partial fault tolerance (PFT) in feedforward artificial neural networks of a given size can be obtained by redundancy (replicating a smaller normally trained network), by design (training specifically to increase PFT), and by a combination of the two (replicating a smaller PFT-trained network). This letter investigates the method of achieving the highest PFT per network size (total number of units and connections) for classification problems. It concludes that for nontoy problems, there exists a normally trained network of optimal size that produces the smallest fully fault-tolerant network when replicated. In addition, it shows that for particular network sizes, the best level of PFT is achieved by training a network of that size for fault tolerance. The results and discussion demonstrate how the outcome depends on the levels of saturation of the network nodes when classifying data points. With simple training tasks, where the complexity of the problem and the size of the network are well within the ability of the training method, the hidden-layer nodes operate close to their saturation points, and classification is clean. Under such circumstances, replicating the smallest normally trained correct network yields the highest PFT for any given network size. For hard training tasks (difficult classification problems or network sizes close to the minimum), normal training obtains networks that do not operate close to their saturation points, and outputs are not as close to their targets. In this case, training a larger network for fault tolerance yields better PFT than replicating a smaller, normally trained network. However, since fault-tolerant training on its own produces networks that operate closer to their linear areas than normal training, replicating normally trained networks ultimately leads to better PFT than replicating fault-tolerant networks of the same initial size.
Affiliation(s)
- Elko B Tchernev: Computer Science and Electrical Engineering Department, University of Maryland Baltimore County, Baltimore, MD 21250, USA
35. Chandra P, Singh Y. Feedforward sigmoidal networks--equicontinuity and fault-tolerance properties. IEEE Trans Neural Netw 2004;15:1350-1366. [PMID: 15565765] [DOI: 10.1109/tnn.2004.831198]
Abstract
Sigmoidal feedforward artificial neural networks (FFANNs) have been established to be universal approximators of continuous functions. The universal approximation results are summarized to identify the function sets represented by sigmoidal FFANNs with the universal approximation property. The equicontinuity properties of the identified sets are analyzed. The equicontinuity property is related to the fault tolerance of sigmoidal FFANNs. The generally used arbitrary-weight sigmoidal FFANNs are shown to be nonequicontinuous sets. A class of bounded-weight sigmoidal FFANNs is established to be equicontinuous. The fault-tolerance behavior of the networks is analyzed and error bounds for the induced errors are established.
Affiliation(s)
- Pravin Chandra: School of Information Technology, GGS Indraprastha University, Delhi-110006, India
36.
37.
Abstract
We show that minimizing the expected error of a feedforward network over a distribution of weights results in an approximation that tends to be independent of network size as the number of hidden units grows. This minimization can be easily performed, and the complexity of the resulting function implemented by the network is regulated by the variance of the weight distribution. For a fixed variance, there is a number of hidden units above which either the implemented function does not change or the change is slight and tends to zero as the size of the network grows. In sum, the control of the complexity depends only on the variance, not the architecture, provided the network is large enough.
Affiliation(s)
- V Ruiz De Angulo: Institut de Robòtica i Informàtica Industrial (CSIC-UPC), 08034-Barcelona, Spain
38.
Abstract
The history and some of the methods of analogue neural VLSI are described. The strengths of analogue techniques are described, along with residual problems to be solved. The nature of hardware-friendly and hardware-appropriate algorithms is reviewed and suggestions are offered as to where analogue neural VLSI's future lies.
Affiliation(s)
- A Murray: Dept. of Electronics and Electrical Engineering, University of Edinburgh, UK
39. Bernier JL, Ortega J, Ros E, Rojas I, Prieto A. A quantitative study of fault tolerance, noise immunity, and generalization ability of MLPs. Neural Comput 2000;12:2941-2964. [PMID: 11112261] [DOI: 10.1162/089976600300014782]
Abstract
An analysis of the influence of weight and input perturbations in a multilayer perceptron (MLP) is made in this article. Quantitative measurements of fault tolerance, noise immunity, and generalization ability are provided. From the expressions obtained, it is possible to justify some previously reported conjectures and experimentally obtained results (e.g., the influence of weight magnitudes, the relation between training with noise and generalization ability, and the relation between fault tolerance and generalization ability). The measurements introduced here are explicitly related to the mean squared error degradation in the presence of perturbations, thus constituting a selection criterion between different alternative weight configurations. Moreover, they allow us to predict the degradation of the learning performance of an MLP when its weights or inputs deviate from their nominal values; thus, the behavior of a physical implementation can be evaluated, according to its accuracy, before the weights are mapped onto it.
Affiliation(s)
- JL Bernier: Departamento de Arquitectura y Tecnologia de Computadores, Universidad de Granada, Spain
40. Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations. Neurocomputing 2000. [DOI: 10.1016/s0925-2312(99)00150-2]
41. Conti M, Orcioni S, Turchetti C. Training neural networks to be insensitive to weight random variations. Neural Netw 2000;13:125-132. [PMID: 10935464] [DOI: 10.1016/s0893-6080(99)00101-x]
Abstract
Neural network weights are subject to errors caused by technological tolerances when implemented in digital or analog hardware. Since these random variations are unavoidable and unpredictable, they can seriously affect the expected performances. This work proposes a learning algorithm that takes weight tolerances into account and guarantees a low sensitivity to them. Some experimental results show the validity of the suggested approach.
Affiliation(s)
- M Conti: Department of Electronics, University of Ancona, Italy
42.
43.
Abstract
This article introduces the concept of optimally distributed computation in feedforward neural networks via regularization of weight saliency. By constraining the relative importance of the parameters, computation can be distributed thinly and evenly throughout the network. We propose that this will have beneficial effects on fault-tolerance performance and generalization ability in large network architectures. These theoretical predictions are verified by simulation experiments on two problems: one artificial and the other a real-world task. In summary, this article presents regularization terms for distributing neural computation optimally.
Affiliation(s)
- P J Edwards: Department of Electrical Engineering, Edinburgh University, UK
44.
45. Neural Network Classification and Prior Class Probabilities. Lecture Notes in Computer Science 1998. [DOI: 10.1007/3-540-49430-8_15]
46. Edwards P, Murray A. Fault tolerance via weight noise in analog VLSI implementations of MLPs - a case study with EPSILON. IEEE Trans Circuits Syst II 1998. [DOI: 10.1109/82.718593]
47.
Abstract
In recent years, the efforts of analogue neural-hardware designers have shifted from generic analogue neurocomputers to "niche" markets in sensor fusion and robotics, and we explain why this is so. We describe the main differences between digital and analogue computation, and consider the advantages of pure analogue and pulsed methods of design. We then investigate some important issues in the analogue design of neural machines, namely weight storage (volatile and non-volatile), on-chip learning, and arithmetic accuracy and its relationship to noise. Finally, we outline those areas in which analogue techniques are likely to prove most useful, and speculate as to their likely long-term utility.
Affiliation(s)
- A F Murray: Department of Electrical Engineering, University of Edinburgh, Scotland, UK
48. An G. The Effects of Adding Noise During Backpropagation Training on a Generalization Performance. Neural Comput 1996. [DOI: 10.1162/neco.1996.8.3.643]
Abstract
We study the effects of adding noise to the inputs, outputs, weight connections, and weight changes of multilayer feedforward neural networks during backpropagation training. We rigorously derive and analyze the objective functions that are minimized by the noise-affected training processes. We show that input noise and weight noise encourage the neural-network output to be a smooth function of the input or its weights, respectively. In the weak-noise limit, noise added to the output of the neural networks only changes the objective function by a constant. Hence, it cannot improve generalization. Input noise introduces penalty terms in the objective function that are related to, but distinct from, those found in the regularization approaches. Simulations have been performed on a regression and a classification problem to further substantiate our analysis. Input noise is found to be effective in improving the generalization performance for both problems. However, weight noise is found to be effective in improving the generalization performance only for the classification problem. Other forms of noise have practically no effect on generalization.
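A hedged sketch in the spirit of the comparison above: the same small network is trained without noise, with Gaussian input noise, and with Gaussian weight noise, and each variant is scored on held-out data. Architecture, noise levels, and data are illustrative assumptions, so the outcome is only indicative of the qualitative effects discussed.

    import numpy as np

    rng = np.random.default_rng(12)

    # Small noisy regression problem with a train/test split (illustrative).
    X = rng.uniform(-1, 1, size=(120, 1))
    y = np.sin(3 * X) + 0.1 * rng.normal(size=X.shape)
    Xtr, ytr, Xte, yte = X[:60], y[:60], X[60:], y[60:]

    def train(noise_kind, sigma=0.1, hidden=30, lr=0.05, steps=5000):
        W1 = rng.normal(scale=0.5, size=(1, hidden)); b1 = np.zeros(hidden)
        W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
        for _ in range(steps):
            Xb = Xtr + (sigma * rng.normal(size=Xtr.shape) if noise_kind == "input" else 0.0)
            W1u = W1 + (sigma * rng.normal(size=W1.shape) if noise_kind == "weight" else 0.0)
            W2u = W2 + (sigma * rng.normal(size=W2.shape) if noise_kind == "weight" else 0.0)
            h = np.tanh(Xb @ W1u + b1)
            err = (h @ W2u + b2 - ytr) / len(Xtr)
            dh = (err @ W2u.T) * (1 - h**2)
            W2 -= lr * (h.T @ err); b2 -= lr * err.sum(axis=0)
            W1 -= lr * (Xb.T @ dh); b1 -= lr * dh.sum(axis=0)
        return W1, b1, W2, b2

    for kind in ("none", "input", "weight"):
        W1, b1, W2, b2 = train(kind)
        test_mse = np.mean((np.tanh(Xte @ W1 + b1) @ W2 + b2 - yte) ** 2)
        print(f"{kind:>6} noise injection: test MSE = {test_mse:.4f}")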
Affiliation(s)
- Guozhong An: Shell Research, P.O. Box 60, 2280 AB Rijswijk, The Netherlands
49. Murray A, Edwards P. Synaptic weight noise during multilayer perceptron training: fault tolerance and training improvements. IEEE Trans Neural Netw 1993;4:722-725. [DOI: 10.1109/72.238328]