1
Yoon R, Oh S, Cho S, Min KS. Memristor-CMOS Hybrid Circuits Implementing Event-Driven Neural Networks for Dynamic Vision Sensor Camera. MICROMACHINES 2024; 15:426. [PMID: 38675238 PMCID: PMC11052483 DOI: 10.3390/mi15040426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/28/2024] [Accepted: 03/20/2024] [Indexed: 04/28/2024]
Abstract
For processing streaming events from a Dynamic Vision Sensor (DVS) camera, two types of neural networks can be considered. One is the spiking neural network, whose simple spike-based computation is suitable for low power consumption, but whose discontinuous spikes can complicate training in hardware. The other is the digital Complementary Metal-Oxide-Semiconductor (CMOS)-based neural network, which can be trained directly using the normal backpropagation algorithm. However, its hardware and energy overhead can be significantly large, because all streaming events must be accumulated and converted into histogram data, which requires a large amount of memory such as SRAM. In this paper, to combine the spike-based operation with the normal backpropagation algorithm, memristor-CMOS hybrid circuits are proposed for implementing event-driven neural networks in hardware. The proposed hybrid circuits are composed of input neurons, synaptic crossbars, hidden/output neurons, and a neural network controller. First, the input neurons preprocess the DVS camera's events: the events are converted to histogram data using very simple memristor-based latches in the input neurons. After preprocessing, the converted histogram data are delivered to an ANN implemented using synaptic memristor crossbars. The memristor crossbars can perform low-power Multiply-Accumulate (MAC) calculations according to the memristor's current-voltage relationship. The hidden and output neurons convert the crossbar's column currents to output voltages according to the Rectified Linear Unit (ReLU) activation function. The neural network controller adjusts the MAC calculation frequency according to the workload of the event computation. Moreover, the controller can disable the MAC calculation clock automatically to minimize unnecessary power consumption.
The proposed hybrid circuits have been verified by circuit simulation for several event-based datasets such as POKER-DVS and MNIST-DVS. The circuit simulation results indicate that the performance of the proposed neural network is degraded by as little as 0.5% while saving as much as 79% in power consumption for POKER-DVS. For the MNIST-DVS dataset, the recognition rate of the proposed scheme is lower by 0.75% compared to the conventional one. Despite this small loss, the power consumption of the proposed scheme can be reduced by as much as 75%.
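The data path described in this abstract (events accumulated into a histogram, then a crossbar MAC followed by ReLU) can be sketched in software as follows. This is only an illustrative model, not the authors' circuit: the function names, the per-pixel counting scheme, and the signed weight values are assumptions made for the example.

```python
def events_to_histogram(events, width, height):
    """Accumulate (x, y) events into a flat per-pixel count vector.

    Assumption: a plain counter per pixel stands in for the
    memristor-based latches mentioned in the abstract.
    """
    hist = [0] * (width * height)
    for x, y in events:
        hist[y * width + x] += 1
    return hist


def crossbar_mac_relu(inputs, weights):
    """Model each column output as ReLU(sum_i input_i * weight_ij).

    Assumption: signed weights stand in for whatever conductance
    encoding the real crossbar uses.
    """
    n_cols = len(weights[0])
    outputs = []
    for j in range(n_cols):
        current = sum(v * row[j] for v, row in zip(inputs, weights))
        outputs.append(max(0.0, current))
    return outputs


# Four hypothetical events on a 2 x 2 sensor.
events = [(0, 0), (1, 0), (1, 0), (1, 1)]
hist = events_to_histogram(events, width=2, height=2)

# Hypothetical 4-input, 2-column weight matrix.
weights = [[0.5, -1.0], [1.0, -0.5], [0.0, 0.2], [-0.5, 0.1]]
outputs = crossbar_mac_relu(hist, weights)
print(hist, outputs)
```

In this toy case the second column sums to a negative value and is clipped to zero by the ReLU, while the first column passes through unchanged.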
Affiliation(s)
- Kyeong-Sik Min
- School of Electrical Engineering, Kookmin University, Seoul 02707, Republic of Korea; (R.Y.)
2
Le M, Truong SN. Research on the Impact of Data Density on Memristor Crossbar Architectures in Neuromorphic Pattern Recognition. MICROMACHINES 2023; 14:1990. [PMID: 38004846 PMCID: PMC10672814 DOI: 10.3390/mi14111990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 09/22/2023] [Accepted: 10/25/2023] [Indexed: 11/26/2023]
Abstract
Binary memristor crossbars have great potential for use in brain-inspired neuromorphic computing. The complementary crossbar array has been proposed to perform the Exclusive-NOR function for neuromorphic pattern recognition. The single crossbar obtained by shortening the Exclusive-NOR function has more advantages in terms of power consumption, area occupancy, and fault tolerance. In this paper, we present the impact of data density on the single memristor crossbar architecture for neuromorphic image recognition. The impact of data density on the single memristor architecture is mathematically derived from the reduced formula of the Exclusive-NOR function, and then verified via circuit simulation. The complementary and single crossbar architectures are tested by using ten 32 × 32 images with different data densities of 0.25, 0.5, and 0.75. The simulation results show that the data density of images has a negative effect on the single memristor crossbar architecture while not affecting the complementary memristor crossbar architecture. The maximum output column current produced by the single memristor crossbar array decreases as data density decreases, while the complementary memristor crossbar array architecture provides stable maximum output column currents. When recognizing images with data density as low as 0.25, the maximum output column currents of the single memristor crossbar architecture are reduced four-fold compared with the maximum currents from the complementary memristor crossbar architecture. This reduction causes the Winner-take-all circuit to work incorrectly and reduces the recognition rate of the single memristor crossbar architecture. These simulation results show that the single memristor crossbar architecture has more advantages compared with the complementary crossbar architecture when the images do not have widely differing densities and none of the images has a very low density.
This work also indicates that the single crossbar architecture must be improved by adding a constant term to deal with images that have low data densities. These are valuable case studies for achieving the advantages of the single memristor crossbar architecture in neuromorphic computing applications.
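The four-fold reduction at a data density of 0.25 can be reproduced with a small numerical sketch. The abstract does not state the reduced formula, so the code below assumes that the single crossbar keeps only the AND term of the Exclusive-NOR sum; the function names and the test pattern layout are likewise illustrative.

```python
def xnor_score(x, w):
    """Complementary crossbar model: count positions where the bits agree."""
    return sum(1 for a, b in zip(x, w) if a == b)


def single_score(x, w):
    """Single crossbar model (assumed reduced form): count positions
    where both the input bit and the stored bit are 1."""
    return sum(a & b for a, b in zip(x, w))


def pattern(n, density):
    """Binary pattern of length n with the given fraction of 1s."""
    ones = int(n * density)
    return [1] * ones + [0] * (n - ones)


n = 32 * 32  # one 32 x 32 image, flattened
scores = {}
for density in (0.25, 0.5, 0.75):
    p = pattern(n, density)
    # Best-case match: the input equals the stored pattern.
    scores[density] = (xnor_score(p, p), single_score(p, p))

for density, (comp, single) in scores.items():
    print(density, comp, single, comp / single)
```

Under this assumption the complementary score for a perfect match is always 1024, while the single-crossbar score scales with the density (256 at density 0.25), which is consistent with the four-fold gap reported above.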
Affiliation(s)
- Son Ngoc Truong
- Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology and Education, Ho Chi Minh City 70000, Vietnam
3
Oh S, An J, Cho S, Yoon R, Min KS. Memristor Crossbar Circuits Implementing Equilibrium Propagation for On-Device Learning. MICROMACHINES 2023; 14:1367. [PMID: 37512678 PMCID: PMC10384638 DOI: 10.3390/mi14071367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 05/22/2023] [Accepted: 07/01/2023] [Indexed: 07/30/2023]
Abstract
Equilibrium propagation (EP) has been proposed recently as a new neural network training algorithm based on a local learning concept, where only local information is used to calculate the weight update of the neural network. Despite the advantages of local learning, numerical iteration for solving the EP dynamic equations makes the EP algorithm less practical for realizing edge intelligence hardware. Some analog circuits have been suggested to solve the EP dynamic equations physically, not numerically, using the original EP algorithm. However, there are still a few problems in terms of circuit implementation: for example, the need for storing the free-phase solution and the lack of essential peripheral circuits for calculating and updating synaptic weights. Therefore, in this paper, a new analog circuit technique is proposed to realize the EP algorithm in practical and implementable hardware. This work has two major contributions in achieving this objective. First, the free-phase and nudge-phase solutions are calculated by the proposed analog circuits simultaneously, not at different times. With this process, analog voltage memories or digital memories with converting circuits between digital and analog domains for storing the free-phase solution temporarily can be eliminated in the proposed EP circuit. Second, a simple EP learning rule relying on a fixed amount of conductance change per programming pulse is newly proposed and implemented in peripheral circuits. The modified EP learning rule can make the weight update circuit practical and implementable without requiring the use of a complicated program verification scheme. The proposed memristor conductance update circuit is simulated and verified for training synaptic weights on memristor crossbars. The simulation results showed that the proposed EP circuit could be used for realizing on-device learning in edge intelligence hardware.
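A learning rule with a fixed conductance change per programming pulse can be sketched as a sign-based version of the usual EP update, which compares the product of pre- and post-synaptic activities in the nudge phase with the same product in the free phase. The sketch below is an assumption about how such a rule could look in software; it is not the authors' circuit, and the function names, step size, and conductance limits are hypothetical.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def ep_fixed_step_update(g, free_pre, free_post, nudge_pre, nudge_post,
                         step, g_min=0.0, g_max=1.0):
    """Move each conductance by one fixed step in the direction of
    (nudge-phase product - free-phase product), clamped to [g_min, g_max].
    """
    updated = []
    for i, row in enumerate(g):
        new_row = []
        for j, g_ij in enumerate(row):
            diff = nudge_pre[i] * nudge_post[j] - free_pre[i] * free_post[j]
            g_new = g_ij + step * sign(diff)
            new_row.append(min(g_max, max(g_min, g_new)))
        updated.append(new_row)
    return updated


# Hypothetical 2 x 2 synapse array with clamped inputs (same in both phases).
g = [[0.5, 0.5], [0.5, 0.5]]
free_pre, free_post = [1.0, 0.0], [0.2, 0.8]
nudge_pre, nudge_post = [1.0, 0.0], [0.4, 0.6]

g_next = ep_fixed_step_update(g, free_pre, free_post,
                              nudge_pre, nudge_post, step=0.125)
print(g_next)
```

Because only the sign of the difference is used, each synapse receives at most one programming pulse of fixed size per update, which is the property that avoids a program-verify loop in this simplified picture.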
Affiliation(s)
- Seokjin Oh
- School of Electrical Engineering, Kookmin University, Seoul 02707, Republic of Korea
- Jiyong An
- School of Electrical Engineering, Kookmin University, Seoul 02707, Republic of Korea
- Seungmyeong Cho
- School of Electrical Engineering, Kookmin University, Seoul 02707, Republic of Korea
- Rina Yoon
- School of Electrical Engineering, Kookmin University, Seoul 02707, Republic of Korea
- Kyeong-Sik Min
- School of Electrical Engineering, Kookmin University, Seoul 02707, Republic of Korea
4
Oh S, An J, Min KS. Area-Efficient Mapping of Convolutional Neural Networks to Memristor Crossbars Using Sub-Image Partitioning. MICROMACHINES 2023; 14:309. [PMID: 36838009 PMCID: PMC9959389 DOI: 10.3390/mi14020309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 01/21/2023] [Accepted: 01/22/2023] [Indexed: 06/18/2023]
Abstract
Memristor crossbars can be very useful for realizing edge-intelligence hardware, because neural networks implemented by memristor crossbars can save significantly more computing energy and layout area than conventional CMOS (complementary metal-oxide-semiconductor) digital circuits. One of the important operations used in neural networks is convolution. To perform convolution with memristor crossbars, the full image should be partitioned into several sub-images. By doing so, each sub-image convolution can be mapped to small unit crossbars, whose size should be defined as 128 × 128 or 256 × 256 to avoid the line resistance problem caused by large crossbars. In this paper, various convolution schemes with 3D, 2D, and 1D kernels are analyzed and compared in terms of the neural network's performance and overlapping overhead. The neural network simulation indicates that the 2D + 1D kernels can perform the sub-image convolution using a much smaller number of unit crossbars, with less rate loss, than the 3D kernels. When the CIFAR-10 dataset is tested, the mapping of sub-image convolution of 2D + 1D kernels to crossbars shows that the number of unit crossbars can be reduced by almost 90% and 95% for 128 × 128 and 256 × 256 crossbars, respectively, compared with the 3D kernels. Meanwhile, the rate loss of the 2D + 1D kernels can be less than 2%. To further improve the neural network's performance, the 2D + 1D kernels can be combined with 3D kernels in one neural network. When the normalized ratio of 2D + 1D layers is around 0.5, the neural network's performance shows very little rate loss compared to when the normalized ratio of 2D + 1D layers is zero. However, the number of unit crossbars for the normalized ratio = 0.5 can be reduced by half compared with that for the normalized ratio = 0.
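A rough sense of why 2D + 1D kernels need fewer crossbar cells can be had by counting weights per layer. The sketch below assumes that "2D + 1D" means a per-channel 2D spatial kernel followed by a 1D kernel across channels (as in a depthwise-separable convolution); the abstract does not spell this out, and the layer dimensions are hypothetical. It counts weights only, not unit crossbars or sub-image overlap, so it does not reproduce the paper's 90% and 95% figures.

```python
def weights_3d(k, c_in, c_out):
    """Weights in a standard convolution layer with k x k x c_in kernels."""
    return k * k * c_in * c_out


def weights_2d_plus_1d(k, c_in, c_out):
    """Weights when one k x k 2D kernel per input channel is followed by
    c_out 1D kernels of length c_in (assumed interpretation)."""
    return k * k * c_in + c_in * c_out


# Hypothetical layer: 3 x 3 kernels, 64 input and 64 output channels.
k, c_in, c_out = 3, 64, 64
full = weights_3d(k, c_in, c_out)
separable = weights_2d_plus_1d(k, c_in, c_out)
saving = 1 - separable / full

print(full, separable, round(saving, 3))
```

For this example layer the weight count drops by about 87%, which is in the same range as the crossbar reduction quoted in the abstract, though the two quantities are not the same thing.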
5
An J, Oh S, Nguyen TV, Min KS. Synapse-Neuron-Aware Training Scheme of Defect-Tolerant Neural Networks with Defective Memristor Crossbars. MICROMACHINES 2022; 13:mi13020273. [PMID: 35208396 PMCID: PMC8878212 DOI: 10.3390/mi13020273] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/21/2022] [Revised: 02/04/2022] [Accepted: 02/06/2022] [Indexed: 11/21/2022]
Abstract
To overcome the limitations of CMOS digital systems, emerging computing circuits such as memristor crossbars have been investigated as potential candidates for significantly increasing the speed and energy efficiency of next-generation computing systems, which are required for implementing future AI hardware. Unfortunately, manufacturing yield remains a serious challenge in adopting memristor-based computing systems due to the limitations of immature fabrication technology. To compensate for the malfunction of neural networks caused by fabrication-related defects, a new crossbar training scheme combining the synapse-aware and neuron-aware schemes is proposed in this paper, for optimizing the defect map size and the neural network’s performance simultaneously. In the proposed scheme, the memristor crossbar’s columns are divided into three groups: severely defective, moderately defective, and normal columns. Each group is trained according to the trade-off between the neural network’s performance and the hardware overhead of defect-tolerant training. As a result of this group-based training method, the new scheme improves the network’s performance beyond both the synapse-aware and the neuron-aware schemes while minimizing its hardware burden. For example, when testing a defect percentage of 10% with the MNIST dataset, the proposed scheme outperforms the synapse-aware and the neuron-aware schemes by 3.8% and 3.4% when the number of crossbar columns trained for synapse defects is 10 and 138 among 310, respectively, while maintaining a smaller memory size than the synapse-aware scheme. When the number of trained columns is 138, the normalized memory size of the synapse-neuron-aware scheme can be smaller by 3.1% than that of the synapse-aware scheme.
6
Nguyen TV, An J, Min KS. Memristor-CMOS Hybrid Neuron Circuit with Nonideal-Effect Correction Related to Parasitic Resistance for Binary-Memristor-Crossbar Neural Networks. MICROMACHINES 2021; 12:mi12070791. [PMID: 34357201 PMCID: PMC8304214 DOI: 10.3390/mi12070791] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/04/2021] [Revised: 06/27/2021] [Accepted: 06/28/2021] [Indexed: 11/23/2022]
Abstract
Voltages and currents in a memristor crossbar can be significantly affected by nonideal effects such as parasitic source, line, and neuron resistance. These parasitic-resistance-related nonideal effects can degrade the performance of a neural network realized with the nonideal memristor crossbar. To avoid this performance degradation, adaptive training methods were proposed previously. However, the complicated training algorithm could add a heavy computational burden to the neural network hardware. In particular, the hardware and algorithmic burden can be more serious for edge intelligence applications such as Internet of Things (IoT) sensors. In this paper, a memristor-CMOS hybrid neuron circuit is proposed for compensating the parasitic-resistance-related nonideal effects during the inference phase rather than the training phase, so that the complicated adaptive training is not needed. Moreover, unlike the previous linear correction method performed by external hardware, the proposed correction circuit can be included in the memristor crossbar to minimize the power and hardware overheads for compensating the nonideal effects. The proposed correction circuit has been verified to be able to restore the degraded source and output voltages in the nonideal crossbar. For the source voltage, the average percentage error of the uncompensated crossbar is as large as 36.7%. If the correction circuit is used, the percentage error in the source voltage can be reduced from 36.7% to 7.5%. For the output voltage, the average percentage error of the uncompensated crossbar is as large as 65.2%. The correction circuit can improve the percentage error in the output voltage from 65.2% to 8.6%. With the correction circuit, the percentage error can be reduced to almost ~1/7. The nonideal memristor crossbar with the correction circuit has been tested on the MNIST and CIFAR-10 datasets in this paper.
For MNIST, the uncompensated and compensated crossbars achieve recognition rates of 90.4% and 95.1%, respectively, compared to 95.5% for the ideal crossbar. For CIFAR-10, the nonideal crossbars without and with the nonideal-effect correction show rates of 85.3% and 88.1%, respectively, compared to the ideal crossbar, which achieves a rate as high as 88.9%.
Affiliation(s)
- Tien Van Nguyen
- School of Electrical Engineering, Kookmin University, Seoul 02707, Korea
- Jiyong An
- School of Electrical Engineering, Kookmin University, Seoul 02707, Korea
- Kyeong-Sik Min
- School of Electrical Engineering, Kookmin University, Seoul 02707, Korea
7
Exploiting defective RRAM array as synapses of HTM spatial pooler with boost-factor adjustment scheme for defect-tolerant neuromorphic systems. Sci Rep 2020; 10:11703. [PMID: 32678139 PMCID: PMC7367284 DOI: 10.1038/s41598-020-68547-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/06/2020] [Accepted: 06/19/2020] [Indexed: 11/08/2022]
Abstract
A crossbar array architecture employing resistive switching memory (RRAM) as a synaptic element accelerates vector-matrix multiplication in a parallel fashion, enabling energy-efficient pattern recognition. To implement the function of the synapse in the RRAM, multilevel resistance states are required. More importantly, a large on/off ratio of the RRAM should be obtained to ensure a reasonable margin between states, taking into account the inevitable variability caused by the inherent switching mechanism. The on/off ratio can be adjusted in two ways, by modulating either the compliance current or the voltage pulses. The latter technique is not only more suitable for practical systems, but can also achieve multiple states in the low current range. However, when a high negative voltage is applied to enlarge the on/off ratio, a breakdown of the RRAM can occur unexpectedly. This stuck-at-short fault of the RRAM adversely affects the recognition process, which is based on reading and judging each column current produced by the multiplication of the input voltage and the resistance of the RRAM in the array, and thus degrades the accuracy. To address this challenge, we introduce a boost-factor adjustment technique as a fault-tolerant scheme based on simple circuitry that eliminates the additional process of identifying the specific locations of the failed RRAMs in the array. Spectre circuit simulation is performed to verify the effect of the scheme on the Modified National Institute of Standards and Technology (MNIST) dataset using convolutional neural networks in non-ideal crossbar arrays, where experimentally observed imperfect RRAMs are configured. Our results show that the recognition accuracy can be maintained at a level similar to the ideal case because the disturbance from the failed RRAMs is suppressed by the scheme.
8
Hybrid Circuit of Memristor and Complementary Metal-Oxide-Semiconductor for Defect-Tolerant Spatial Pooling with Boost-Factor Adjustment. MATERIALS 2019; 12:ma12132122. [PMID: 31266255 PMCID: PMC6651624 DOI: 10.3390/ma12132122] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/17/2019] [Revised: 06/27/2019] [Accepted: 06/29/2019] [Indexed: 11/16/2022]
Abstract
Hierarchical Temporal Memory (HTM) is known as a software framework for modeling the brain's neocortical operation. However, mimicking the brain's neocortical operation in hardware rather than software is more desirable, because hardware can not only describe the neocortical operation, but can also exploit the brain's architectural advantages. To develop a hybrid circuit of memristor and Complementary Metal-Oxide-Semiconductor (CMOS) for realizing HTM's spatial pooler (SP) in hardware, memristor defects such as stuck-at faults and variations should be considered. To solve the defect problem, we first show that the boost-factor adjustment can make HTM's SP defect-tolerant, because the false activation of defective columns is suppressed. Second, we propose a memristor-CMOS hybrid circuit with the boost-factor adjustment to realize this defect-tolerant SP in hardware. The proposed circuit does not rely on the conventional defect-aware mapping scheme, which cannot avoid the false activation of defective columns. For the Modified subset of National Institute of Standards and Technology (MNIST) vectors, the boost-factor adjusted crossbar with defects = 10% shows a rate loss of only ~0.6%, compared to the ideal crossbar with defects = 0%. In contrast, the defect-aware mapping without the boost-factor adjustment demonstrates a significant rate loss of ~21.0%. The energy overhead of the boost-factor adjustment is only ~0.05% of the programming energy of the memristor synapse crossbar.
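The way a boost factor can suppress a falsely active defective column can be illustrated with a small sketch. The abstract does not give the adjustment rule, so the code assumes an exponential boost that shrinks for columns whose activity exceeds a target level; the overlap values, duty cycles, and parameter names are all hypothetical.

```python
import math


def boosted_winners(overlaps, boosts, k):
    """Return the indices of the k columns with the largest boosted overlap."""
    scores = [o * b for o, b in zip(overlaps, boosts)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def adjust_boosts(duty_cycles, target, strength):
    """Assumed rule: boost = exp(-strength * (duty_cycle - target)),
    so over-active columns are attenuated and idle ones amplified."""
    return [math.exp(-strength * (d - target)) for d in duty_cycles]


# Column 0 models a defective column whose overlap is inflated by
# stuck-at faults, so it has been winning on every input.
overlaps = [9, 6, 5, 2]
duty_cycles = [1.0, 0.3, 0.2, 0.0]

before = boosted_winners(overlaps, [1.0] * 4, k=1)
boosts = adjust_boosts(duty_cycles, target=0.25, strength=1.0)
after = boosted_winners(overlaps, boosts, k=1)

print(before, after)
```

Without boosting, the defective column wins; after the adjustment its boosted overlap falls below that of column 1, so the false activation is suppressed without needing to know which cells are defective.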
9
Partial-Gated Memristor Crossbar for Fast and Power-Efficient Defect-Tolerant Training. MICROMACHINES 2019; 10:mi10040245. [PMID: 31013938 PMCID: PMC6523436 DOI: 10.3390/mi10040245] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/28/2019] [Revised: 04/09/2019] [Accepted: 04/12/2019] [Indexed: 11/17/2022]
Abstract
A real memristor crossbar has defects, which should be considered during retraining after the pre-training of the crossbar. For retraining the crossbar with defects, memristors should be updated with the weights that are calculated by the back-propagation algorithm. Unfortunately, programming the memristors takes a very long time and consumes a large amount of power, because of the incremental behavior of the memristor’s program-verify scheme for fine-tuning the memristor’s conductance. To reduce the programming time and power, a partial gating scheme is proposed here to realize partial training, where only the neurons that are more responsible for the recognition error are trained. By retraining this part rather than the entire crossbar, the programming time and power of the memristor crossbar can be significantly reduced. The proposed scheme has been verified by CADENCE circuit simulation with a real memristor’s Verilog-A model. When compared to retraining the entire crossbar, the loss of recognition rate of the partial gating scheme has been estimated to be as small as 2.5% and 2.9% for the MNIST and CIFAR-10 datasets, respectively. Meanwhile, the programming time and power can be reduced by 86% and 89.5%, respectively, compared with 100% retraining.
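The idea of retraining only the neurons most responsible for the error can be sketched as a masked weight update. The abstract does not say how responsibility is measured, so the code assumes it is the magnitude of a per-neuron error signal; the function names, the selected fraction, and all numeric values are hypothetical.

```python
def select_neurons(errors, fraction):
    """Pick the given fraction of neurons with the largest absolute error
    (assumed selection criterion)."""
    k = max(1, round(len(errors) * fraction))
    ranked = sorted(range(len(errors)),
                    key=lambda j: abs(errors[j]), reverse=True)
    return sorted(ranked[:k])


def partial_update(weights, grads, selected, lr):
    """Apply a gradient step only to the columns of the selected neurons;
    all other columns are left untouched (gated off)."""
    chosen = set(selected)
    return [
        [w - lr * g if j in chosen else w
         for j, (w, g) in enumerate(zip(w_row, g_row))]
        for w_row, g_row in zip(weights, grads)
    ]


# Hypothetical layer with 2 inputs and 4 neurons (one column per neuron).
errors = [0.1, 0.9, 0.05, 0.6]
weights = [[1.0] * 4 for _ in range(2)]
grads = [[0.5] * 4 for _ in range(2)]

selected = select_neurons(errors, fraction=0.5)
new_weights = partial_update(weights, grads, selected, lr=0.5)

print(selected, new_weights)
```

Only the columns of neurons 1 and 3 are reprogrammed here, so half of the memristor columns would need no programming pulses in this toy case.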