1
Siddique A, Vai MI, Pun SH. A low cost neuromorphic learning engine based on a high performance supervised SNN learning algorithm. Sci Rep 2023; 13:6280. PMID: 37072443; PMCID: PMC10113267; DOI: 10.1038/s41598-023-32120-7.
Abstract
Spiking neural networks (SNNs) are more energy- and resource-efficient than artificial neural networks (ANNs). However, supervised SNN learning is a challenging task due to the non-differentiability of spikes and the computation of complex terms. Moreover, the design of SNN learning engines is difficult given limited hardware resources and tight energy constraints. In this article, a novel hardware-efficient SNN back-propagation scheme that offers fast convergence is proposed. The learning scheme does not require any complex operation such as error normalization or weight-threshold balancing, and can achieve an accuracy of around 97.5% on the MNIST dataset using only 158,800 synapses. The multiplier-less inference engine trained using the proposed hard sigmoid SNN training (HaSiST) scheme can operate at a frequency of 135 MHz, consumes only 1.03 slice registers and 2.8 slice look-up tables per synapse, and can infer about 0.03[Formula: see text] features per second, equivalent to 9.44 giga synaptic operations per second (GSOPS). The article also presents a high-speed, cost-efficient SNN training engine that consumes only 2.63 slice registers and 37.84 slice look-up tables per synapse and can operate at a maximum computational frequency of around 50 MHz on a Virtex 6 FPGA.
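The abstract does not spell out the HaSiST formulation, but the key idea it names, a hard sigmoid standing in for the non-differentiable spike, can be illustrated with a minimal sketch. The clamp bounds, slope, and threshold below are assumptions for illustration only, not the paper's actual parameters:

```python
import numpy as np

def hard_sigmoid(x):
    # Piecewise-linear stand-in for the sigmoid: only a shift, a scale,
    # and a clamp, so it is cheap in hardware (no exponentials).
    return np.clip(0.25 * x + 0.5, 0.0, 1.0)

def surrogate_spike_grad(v, threshold=1.0):
    # Surrogate gradient of the spike w.r.t. membrane potential v:
    # the hard sigmoid's constant slope 0.25 inside its linear region,
    # and exactly zero outside it.
    x = v - threshold
    return np.where(np.abs(x) < 2.0, 0.25, 0.0)
```

During backpropagation the surrogate gradient replaces the undefined derivative of the spike step function, which is what makes gradient-based SNN training possible at all.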
Affiliation(s)
- Ali Siddique
- Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Taipa, 999078, Macau.
- Mang I Vai
- Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Taipa, 999078, Macau
- Sio Hang Pun
- Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Taipa, 999078, Macau
2
Siddique A, Iqbal MA, Aleem M, Lin JCW. A high-performance, hardware-based deep learning system for disease diagnosis. PeerJ Comput Sci 2022; 8:e1034. PMID: 36091996; PMCID: PMC9454880; DOI: 10.7717/peerj-cs.1034.
Abstract
Modern deep learning schemes have shown human-level performance in the area of medical science. However, the implementation of deep learning algorithms on dedicated hardware remains a challenging task because modern algorithms and neuronal activation functions are generally not hardware-friendly and require a lot of resources. Recently, researchers have come up with hardware-friendly activation functions that can yield high throughput and high accuracy at the same time. In this context, we propose a hardware-based neural network that can predict the presence of cancer in humans with 98.23% accuracy. This is done by making use of cost-efficient, highly accurate activation functions, Sqish and LogSQNL. Due to its inherently parallel components, the system can classify a given sample in just one clock cycle, i.e., 15.75 nanoseconds. Though this system is dedicated to cancer diagnosis, it can predict the presence of many other diseases, such as those of the heart. This is because the system is reconfigurable and can be programmed to classify any sample into one of two classes. The proposed hardware system requires about 983 slice registers, 2,655 slice look-up tables, and only 1.1 kilobits of on-chip memory. The system can predict about 63.5 million cancer samples per second and can perform about 20 giga-operations per second. The proposed system is about 5-16 times cheaper and at least four times faster than other dedicated hardware systems that use neural networks for classification tasks.
Affiliation(s)
- Ali Siddique
- National University of Computer and Emerging Sciences, Lahore Campus, Pakistan
- University of Macau, Taipa, Macau
- Muhammad Aleem
- National University of Computer and Emerging Sciences, Islamabad, Pakistan
3
Approximate Computing Circuits for Embedded Tactile Data Processing. Electronics 2022. DOI: 10.3390/electronics11020190.
Abstract
In this paper, we demonstrate the feasibility and efficiency of approximate computing techniques (ACTs) in the embedded Support Vector Machine (SVM) tensorial kernel circuit implementation in tactile sensing systems. Improving the performance of the embedded SVM in terms of power, area, and delay can be achieved by implementing approximate multipliers in the Singular Value Decomposition (SVD), the main computational bottleneck of the tensorial kernel approach; since digital multipliers are used extensively in the SVD implementation, we aim to optimize the multiplier circuit. We present an approximate SVD circuit based on the Approximate Baugh-Wooley (Approx-BW) multiplier, which achieves an energy consumption reduction of up to 16% at the cost of a Mean Relative Error (MRE) increase of less than 5%. We assess the impact of the approximate SVD on classification accuracy, showing that it increases the error rate (Err) by one to eight percent. In addition, we propose a hybrid evaluation approach that implements three different approximate SVD circuits, each with a different number of approximated Least Significant Bits (LSBs). The results show that energy consumption is reduced by more than five percent with the same accuracy loss.
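The error-versus-energy trade-off of approximating least significant bits can be modeled with a few lines of code. This is only a behavioral sketch that zeroes k LSBs of each operand before an exact product; the actual Approx-BW circuit simplifies partial products at the gate level, so this captures the error trend, not the hardware logic:

```python
def approx_multiply(a, b, k=4):
    # Behavioral model of an LSB-approximated multiplier: discard the
    # k least significant bits of each unsigned operand, then multiply
    # exactly. Larger k means cheaper hardware but larger error.
    mask = ~((1 << k) - 1)
    return (a & mask) * (b & mask)

def relative_error(a, b, k):
    # Relative error of the approximation vs. the exact product.
    exact = a * b
    return abs(exact - approx_multiply(a, b, k)) / exact
```

Sweeping k over the operand width reproduces the paper's qualitative finding: the error grows gradually with the number of approximated LSBs, letting a designer pick the cheapest circuit that stays within the accuracy budget.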
4
Wang X. Establishment of an Internet-Based Epidemiological Survey Data Collection Customized System Model. Front Public Health 2021; 9:761031. PMID: 34722454; PMCID: PMC8553986; DOI: 10.3389/fpubh.2021.761031.
Abstract
Epidemiology occupies a very important position in preventive medicine. Its essence is to summarize etiology and epidemic patterns by studying the distribution and possible effects of diseases, so as to inform scientific epidemic prevention measures. The purpose of this article is to help relevant personnel collect, organize, and analyze epidemiological survey data by establishing a data collection system model that improves work efficiency. Focusing on novel coronavirus pneumonia (COVID-19), this article investigates the current state of epidemiological survey data collection, analyzes problems in the existing business process, and on this basis develops a dedicated system model for epidemiological survey data collection. In the experimental data, the optimized correction evaluation index increased from 8.384 to 9.067, indicating that combining data mining algorithms with backpropagation algorithms can improve the system's ability to process information. Professional information disclosure platforms can have a positive effect on the prevention and treatment of epidemics. The Internet-based customized epidemiological survey system model established in this article integrates various epidemiological data so that people can correctly understand the spread of epidemics, promoting the development of preventive medicine.
Affiliation(s)
- Xusheng Wang
- Faculty of Printing, Packaging Engineering and Digital Media Technology, Xi'an University of Technology, Xi'an, China
5
Huang H, Yang J, Rong HJ, Du S. A generic FPGA-based hardware architecture for recursive least mean p-power extreme learning machine. Neurocomputing 2021. DOI: 10.1016/j.neucom.2021.05.069.
6
Carminati M, Scandurra G. Impact and trends in embedding field programmable gate arrays and microcontrollers in scientific instrumentation. Rev Sci Instrum 2021; 92:091501. PMID: 34598486; DOI: 10.1063/5.0050999.
Abstract
Microcontrollers and field-programmable gate arrays have been leveraged extensively in scientific instrumentation for decades. Recent advancements in the performance of these programmable digital devices, with hundreds of I/O pins, up to millions of logic cells, >10 Gb/s connectivity, and hundreds of MHz multiple clocks, have been accelerating this trend, extending the range of functions. The diversification of devices, from very low-cost 8-bit microcontrollers up to 32-bit ARM-based ones and systems-on-chip combining programmable logic with processors, makes them ubiquitous in modern electronic systems, addressing diverse challenges from ultra-low-power operation, with sub-µA quiescent current in sleep mode for portable and Internet of Things applications, to high-performance computing, such as in machine vision. In this Review, the main motivations (compactness, re-configurability, parallelization, low latency for sub-ns timing, and real-time control), the possible approaches to the adoption of embedded devices, and the achievable performances are discussed. Relevant examples of applications in opto-electronics, physics experiments, impedance, vibration, and temperature sensing from the recent literature are also reviewed. From this bird's-eye view, key paradigms emerge, such as the blurring of boundaries between digital platforms and the pervasiveness of machine learning algorithms, significantly fostered by the possibility of running them in embedded devices for distributing intelligence in the environment.
Affiliation(s)
- M Carminati
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milano 20133, Italy
- G Scandurra
- Dipartimento di Ingegneria, Università degli Studi di Messina, Messina 98166, Italy
7
Sanchez-Iborra R. LPWAN and Embedded Machine Learning as Enablers for the Next Generation of Wearable Devices. Sensors 2021; 21:5218. PMID: 34372455; PMCID: PMC8347601; DOI: 10.3390/s21155218.
Abstract
The penetration of wearable devices into our daily lives is unstoppable. Although they are very popular, so far these devices provide a limited range of services, mostly focused on monitoring tasks such as fitness, activity, or health tracking. Moreover, given their hardware and power constraints, wearable units depend on a master device, e.g., a smartphone, to make decisions or send the collected data to the cloud. However, a new wave of both communication and artificial intelligence (AI)-based technologies fuels the evolution of wearables to an upper level. Concretely, these are the low-power wide-area network (LPWAN) and tiny machine learning (TinyML) technologies. This paper reviews and discusses these solutions and explores the major implications and challenges of this technological transformation. Finally, the results of an experimental study are presented, analyzing (i) the long-range connectivity gained by a wearable device in a university campus scenario, thanks to the integration of LPWAN communications, and (ii) how complex the intelligence embedded in this wearable unit can be. This study shows the interesting characteristics brought by these state-of-the-art paradigms, concluding that a wide variety of novel services and applications will be supported by the next generation of wearables.
Affiliation(s)
- Ramon Sanchez-Iborra
- Department of Engineering and Applied Techniques, University Center of Defense at General Air Force Academy, Santiago de la Ribera, 30729 Murcia, Spain
8
Knoblauch A. Power Function Error Initialization Can Improve Convergence of Backpropagation Learning in Neural Networks for Classification. Neural Comput 2021; 33:2193-2225. PMID: 34310673; DOI: 10.1162/neco_a_01407.
Abstract
Supervised learning corresponds to minimizing a loss or cost function expressing the differences between model predictions y_n and the target values t_n given by the training data. In neural networks, this means backpropagating error signals through the transposed weight matrices from the output layer toward the input layer. For this, error signals in the output layer are typically initialized by the difference y_n - t_n, which is optimal for several commonly used loss functions like cross-entropy or sum of squared errors. Here I evaluate a more general error initialization method using power functions |y_n - t_n|^q for q > 0, corresponding to a new family of loss functions that generalize cross-entropy. Surprisingly, experiments on various learning tasks reveal that a proper choice of q can significantly improve the speed and convergence of backpropagation learning, in particular in deep and recurrent neural networks. The results suggest two main reasons for the observed improvements. First, compared to cross-entropy, the new loss functions provide better fits to the distribution of error signals in the output layer and therefore maximize the model's likelihood more efficiently. Second, the new error initialization procedure may often provide a better gradient-to-loss ratio over a broad range of neural output activity, thereby avoiding flat loss landscapes with vanishing gradients.
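The power-function initialization above is simple to state in code. A minimal sketch follows; the sign factor is an assumption added so that q = 1 recovers the standard y_n - t_n initialization, since the abstract gives only the magnitude |y_n - t_n|^q:

```python
import numpy as np

def power_error_init(y, t, q):
    # Output-layer error signal |y - t|^q with the sign of (y - t).
    # q = 1 reproduces the usual difference y - t that is optimal for
    # cross-entropy and sum-of-squared-errors losses; other q values
    # correspond to the generalized loss family the paper evaluates.
    d = y - t
    return np.sign(d) * np.abs(d) ** q
```

With q < 1 small errors are amplified relative to the linear case and with q > 1 they are suppressed, which is how the choice of q reshapes the gradient-to-loss ratio discussed in the abstract.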
9
Abstract
Intelligent systems with Artificial Neural Networks (ANNs) embedded in hardware are in growing demand for real-time applications in fields such as the Internet of Things (IoT) and Machine-to-Machine (M2M) communication. However, applying ANNs in this type of system poses a significant challenge due to the high computational power required to process their basic operations. This paper presents an implementation strategy for a Multilayer Perceptron (MLP)-type neural network on a microcontroller (a low-cost, low-power platform). A modular, matrix-based MLP with the full classification process was implemented on the microcontroller, as was backpropagation training. Testing and validation were performed through Hardware-In-the-Loop (HIL) evaluation of the Mean Squared Error (MSE) of the training process, the classification results, and the processing time of each implementation module. The results revealed a linear relationship between the hyperparameter values and the processing time required for classification, and the processing time meets the requirements of many applications in the fields mentioned above. These findings show that this implementation strategy and platform can be applied successfully in real-time applications that require the capabilities of ANNs.
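The matrix-based MLP formulation that the paper implements on a microcontroller reduces each layer to one matrix-vector product plus a bias and an activation. A minimal sketch of that forward pass follows, shown in Python for clarity; the sigmoid activation and layer shapes are illustrative assumptions, and a real microcontroller port would typically use fixed-point arithmetic in C:

```python
import numpy as np

def mlp_forward(x, weights, biases):
    # Forward pass of a matrix-based MLP: for each layer, compute
    # W @ x + b, then apply a sigmoid nonlinearity. The loop length
    # (number of layers) and layer widths are the hyperparameters
    # whose values the paper relates linearly to processing time.
    for W, b in zip(weights, biases):
        x = 1.0 / (1.0 + np.exp(-(W @ x + b)))
    return x
```

Because the cost of each layer is one matrix-vector product, total classification time grows linearly with the number of weights, consistent with the linear hyperparameter-to-time relationship the abstract reports.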
10
Improving learning and generalization capabilities of the C-Mantec constructive neural network algorithm. Neural Comput Appl 2020. DOI: 10.1007/s00521-019-04388-2.
11
Digital hardware realization of a novel adaptive ink drop spread operator and its application in modeling and classification and on-chip training. Int J Mach Learn Cybern 2018. DOI: 10.1007/s13042-018-0890-x.
12
Cheng J, Wu J, Leng C, Wang Y, Hu Q. Quantized CNN: A Unified Approach to Accelerate and Compress Convolutional Networks. IEEE Trans Neural Netw Learn Syst 2018; 29:4730-4743. PMID: 29990226; DOI: 10.1109/tnnls.2017.2774288.
Abstract
We are witnessing an explosive development and widespread application of deep neural networks (DNNs) in various fields. However, DNN models, especially a convolutional neural network (CNN), usually involve massive parameters and are computationally expensive, making them extremely dependent on high-performance hardware. This prohibits their further extensions, e.g., applications on mobile devices. In this paper, we present a quantized CNN, a unified approach to accelerate and compress convolutional networks. Guided by minimizing the approximation error of individual layer's response, both fully connected and convolutional layers are carefully quantized. The inference computation can be effectively carried out on the quantized network, with much lower memory and storage consumption. Quantitative evaluation on two publicly available benchmarks demonstrates the promising performance of our approach: with comparable classification accuracy, it achieves 4x to 6x acceleration and 15x to 20x compression. With our method, accurate image classification can even be directly carried out on mobile devices within 1 s.
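The memory savings behind such compression can be illustrated with a much simpler scheme than the paper's. The sketch below uses plain uniform symmetric quantization (int8 codes plus one float scale per layer); the actual Q-CNN method instead learns codebooks that minimize each layer's response error, so this is only a baseline for comparison:

```python
import numpy as np

def quantize_layer(W, n_bits=8):
    # Uniform symmetric quantization: map float weights to signed
    # integer codes plus a single float scale. At 8 bits this shrinks
    # storage roughly 4x versus float32.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max() / qmax
    codes = np.round(W / scale).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    # Reconstruct approximate float weights for inference.
    return codes.astype(np.float32) * scale
```

The reconstruction error is bounded by half the quantization step, which is why moderate bit widths often preserve classification accuracy while cutting both memory traffic and arithmetic cost.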
13
Mostafa H, Pedroni B, Sheik S, Cauwenberghs G. Hardware-Efficient On-line Learning through Pipelined Truncated-Error Backpropagation in Binary-State Networks. Front Neurosci 2017; 11:496. PMID: 28932180; PMCID: PMC5592276; DOI: 10.3389/fnins.2017.00496.
Abstract
Artificial neural networks (ANNs) trained using backpropagation are powerful learning architectures that have achieved state-of-the-art performance in various benchmarks. Significant effort has been devoted to developing custom silicon devices to accelerate inference in ANNs. Accelerating the training phase, however, has attracted relatively little attention. In this paper, we describe a hardware-efficient on-line learning technique for feedforward multi-layer ANNs that is based on pipelined backpropagation. Learning is performed in parallel with inference in the forward pass, removing the need for an explicit backward pass and requiring no extra weight lookup. By using binary state variables in the feedforward network and ternary errors in truncated-error backpropagation, the need for any multiplications in the forward and backward passes is removed, and memory requirements for the pipelining are drastically reduced. Further reduction in addition operations owing to the sparsity in the forward neural and backpropagating error signal paths contributes to highly efficient hardware implementation. For proof-of-concept validation, we demonstrate on-line learning of MNIST handwritten digit classification on a Spartan 6 FPGA interfacing with an external 1 Gb DDR2 DRAM, which shows only small degradation in test error performance compared to an equivalently sized binary ANN trained off-line using standard backpropagation and exact errors. Our results highlight an attractive synergy between pipelined backpropagation and binary-state networks in substantially reducing computation and memory requirements, making pipelined on-line learning practical in deep networks.
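The multiplication-free property the abstract describes follows directly from the value ranges involved, and can be sketched in a few lines. The truncation threshold and learning rate below are illustrative assumptions, and the numpy outer product merely models what hardware would realize as pure add/subtract logic:

```python
import numpy as np

def ternarize(err, theta=0.05):
    # Truncated-error backpropagation: errors below the threshold are
    # dropped (sparsity), the rest keep only their sign, giving a
    # ternary error signal in {-1, 0, +1}.
    return np.where(np.abs(err) < theta, 0, np.sign(err)).astype(np.int8)

def weight_update(W, pre_binary, err_ternary, lr=0.01):
    # With binary presynaptic states in {0, 1} and ternary errors,
    # every entry of the outer product is -1, 0, or +1, so each weight
    # update is just an add or subtract of the learning rate; no
    # hardware multiplier is needed.
    return W + lr * np.outer(pre_binary, err_ternary)
```

The zeros produced by truncation are what yields the further reduction in addition operations the abstract mentions: updates for sub-threshold errors are skipped entirely.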
Affiliation(s)
- Hesham Mostafa
- Institute for Neural Computation, University of California San Diego, San Diego, CA, United States
- Bruno Pedroni
- Department of Bioengineering, University of California San Diego, San Diego, CA, United States
- Sadique Sheik
- BioCircuits Institute, University of California San Diego, San Diego, CA, United States
- Gert Cauwenberghs
- Institute for Neural Computation, Department of Bioengineering, and BioCircuits Institute, University of California San Diego, San Diego, CA, United States
14
Ortega-Zamorano F, Jerez JM, Juárez GE, Franco L. FPGA Implementation of Neurocomputational Models: Comparison Between Standard Back-Propagation and C-Mantec Constructive Algorithm. Neural Process Lett 2017. DOI: 10.1007/s11063-017-9655-x.
15
Rubio JDJ. Least square neural network model of the crude oil blending process. Neural Netw 2016; 78:88-96. DOI: 10.1016/j.neunet.2016.02.006.