1. Tan G, Wan Z, Wang Y, Cao Y, Zha ZJ. Tackling Event-Based Lip-Reading by Exploring Multigrained Spatiotemporal Clues. IEEE Transactions on Neural Networks and Learning Systems 2025;36:8279-8291. [PMID: 39288038] [DOI: 10.1109/tnnls.2024.3440495]
Abstract
Automatic lip-reading (ALR) is the task of recognizing words based on visual information obtained from the speaker's lip movements. In this study, we introduce event cameras, a novel type of sensing device, for ALR. Event cameras offer both technical and application advantages over conventional cameras for ALR due to their higher temporal resolution, less redundant visual information, and lower power consumption. To recognize words from the event data, we propose a novel multigrained spatiotemporal feature-learning framework capable of perceiving fine-grained spatiotemporal features from microsecond time-resolved event data. Specifically, we first convert the event data into event frames of multiple temporal resolutions to avoid losing too much visual information at the event representation stage. These frames are then fed into a multibranch subnetwork, where the branch operating on low-rate frames perceives spatially complete but temporally coarse features, while the branch operating on high-rate frames perceives spatially coarse but temporally fine features. Fine-grained spatial and temporal features can thus be learned simultaneously by integrating the features perceived by the different branches. Furthermore, to model the temporal relationships in the event stream, we design a temporal aggregation subnetwork to aggregate the features perceived by the multibranch subnetwork. In addition, we collect two event-based lip-reading datasets (DVS-Lip and DVS-LRW100) for the study of this task. Experimental results demonstrate the superiority of the proposed model over state-of-the-art event-based action recognition models and video-based lip-reading models.
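As a concrete illustration of the multi-temporal-resolution conversion described above, here is a minimal NumPy sketch; the function name, bin counts, and event-array layout are assumptions for illustration, not the authors' released code.

```python
import numpy as np

def events_to_frame_stacks(events, sensor_hw, duration_us, n_slow=4, n_fast=16):
    """Bin one event stream into two frame stacks of different temporal resolution.

    events: (N, 4) array with columns (t_us, x, y, polarity in {0, 1}).
    Returns (slow, fast): (n_slow, 2, H, W) and (n_fast, 2, H, W) count frames.
    """
    H, W = sensor_hw

    def binify(n_bins):
        frames = np.zeros((n_bins, 2, H, W), dtype=np.float32)
        b = np.clip((events[:, 0] * n_bins / duration_us).astype(int), 0, n_bins - 1)
        np.add.at(frames, (b, events[:, 3].astype(int),
                           events[:, 2].astype(int), events[:, 1].astype(int)), 1.0)
        return frames

    # Low-rate branch: few bins, spatially dense but temporally coarse frames.
    # High-rate branch: many bins, temporally fine but spatially sparse frames.
    return binify(n_slow), binify(n_fast)
```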
2. Zhang S, Zha F, Wang X, Li M, Guo W, Wang P, Li X, Sun L. High-efficiency sparse convolution operator for event-based cameras. Front Neurorobot 2025;19:1537673. [PMID: 40144017] [PMCID: PMC11936924] [DOI: 10.3389/fnbot.2025.1537673]
Abstract
Event-based cameras are bio-inspired vision sensors that mimic the sparse and asynchronous activation of the animal retina, offering advantages such as low latency and low computational load in various robotic applications. However, despite this inherent sparsity, most existing visual processing algorithms are optimized for conventional frame cameras and the dense images they capture, resulting in computational redundancy and high latency when applied to event-based cameras. To address this gap, we propose a sparse convolution operator tailored for event-based cameras. By selectively skipping invalid sub-convolutions and efficiently reorganizing valid computations, our operator reduces computational workload by nearly 90% and achieves almost 2x acceleration in processing speed, while maintaining the same accuracy as dense convolution operators. This innovation unlocks the potential of event-based cameras in applications such as autonomous navigation, real-time object tracking, and industrial inspection, enabling low-latency and high-efficiency perception in resource-constrained robotic systems.
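The core idea, skipping computation at inactive pixels, can be sketched as follows; this is a naive scatter-style sparse convolution for illustration (names and interfaces assumed), not the optimized operator from the paper.

```python
import numpy as np

def sparse_conv2d(dense_in, kernel):
    """Deep-learning-style (cross-correlation) convolution computed by
    scattering each active input pixel into the output, skipping the
    all-zero sites that dominate event frames."""
    H, W = dense_in.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=dense_in.dtype)
    ys, xs = np.nonzero(dense_in)                    # active sites only
    for y, x, v in zip(ys, xs, dense_in[ys, xs]):
        # Each active input contributes v * kernel patch to nearby outputs.
        y0, x0 = max(0, y - kh + 1), max(0, x - kw + 1)
        y1, x1 = min(out.shape[0], y + 1), min(out.shape[1], x + 1)
        for oy in range(y0, y1):
            for ox in range(x0, x1):
                out[oy, ox] += v * kernel[y - oy, x - ox]
    return out
```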
Affiliation(s)
- Sen Zhang: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Fusheng Zha: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China; Lanzhou University of Technology, Lanzhou, China
- Xiangji Wang: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Mantian Li: Institute of Intelligent Manufacturing Technology, Shenzhen Polytechnic University, Shenzhen, China
- Wei Guo: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Pengfei Wang: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Xiaolin Li: Institute of Intelligent Manufacturing Technology, Shenzhen Polytechnic University, Shenzhen, China
- Lining Sun: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
3. Zhang B, Suo J, Dai Q. Event-Enhanced Snapshot Compressive Videography at 10K FPS. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025;47:1266-1278. [PMID: 39527439] [DOI: 10.1109/tpami.2024.3496788]
Abstract
Video snapshot compressive imaging (SCI) encodes a dynamic scene compactly into a single snapshot and reconstructs its high-speed frame sequence afterward, greatly reducing the required data footprint and transmission bandwidth while enabling high-speed imaging with a low-frame-rate intensity camera. In implementation, high-speed dynamics are encoded via temporally varying patterns, and only frames at the corresponding temporal intervals can be reconstructed, while the dynamics occurring between consecutive frames are lost. To unlock the potential of conventional snapshot compressive videography, we propose a novel hybrid "intensity + event" imaging scheme that incorporates an event camera into a video SCI setup. Our proposed system consists of a dual-path optical setup that records the coded intensity measurement and the intermediate event signals simultaneously; it is compact and photon-efficient, collecting the half of the photons discarded in conventional video SCI. Correspondingly, we develop a dual-branch Transformer that exploits the reciprocal relationship between the two data modalities to decode dense video frames. Extensive experiments on both simulated and real-captured data demonstrate our superiority over state-of-the-art video SCI and video frame interpolation (VFI) methods. Benefiting from the new hybrid design, which leverages both the intrinsic redundancy in videos and the unique features of event cameras, we achieve high-quality videography at 0.1 ms time intervals with a low-cost CMOS image sensor working at 24 FPS.
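The sensing stage of video SCI can be written in a few lines; below is a minimal sketch of the forward model only (array shapes and names are assumptions), which the paper's hybrid system augments with event signals.

```python
import numpy as np

def sci_snapshot(frames, masks):
    """Forward model of video snapshot compressive imaging: a sequence of
    high-speed frames is modulated by per-frame binary masks and summed into
    a single coded measurement (sketch of the sensing stage only).
    frames, masks: (T, H, W) arrays."""
    return (frames * masks).sum(axis=0)

# Reconstruction then inverts this many-to-one mapping; the paper's hybrid
# system adds event signals to recover dynamics between the coded frames.
```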
4. Zhang A, Shi J, Wu J, Zhou Y, Yu W. Low Latency and Sparse Computing Spiking Neural Networks With Self-Driven Adaptive Threshold Plasticity. IEEE Transactions on Neural Networks and Learning Systems 2024;35:17177-17188. [PMID: 37581976] [DOI: 10.1109/tnnls.2023.3300514]
Abstract
Spiking neural networks (SNNs) have captivated attention worldwide owing to their compelling advantages in low power consumption, high biological plausibility, and strong robustness. However, the intrinsic latency of SNNs during inference poses a significant challenge, impeding their further development and application. This latency arises because spiking neurons must collect electrical stimuli and generate spikes only when their membrane potential exceeds a firing threshold. Since the firing threshold plays a crucial role in SNN performance, this article proposes a self-driven adaptive threshold plasticity (SATP) mechanism, wherein neurons autonomously adjust their firing thresholds based on their individual state information using unsupervised learning rules, with the adjustment triggered by their own firing events. SATP is based on the principle of maximizing the information contained in the output spike rate distribution of each neuron. This article derives the mathematical expression of SATP and provides extensive experimental results, demonstrating that SATP effectively reduces SNN inference latency and further reduces computation density while improving accuracy, yielding SNN models with low latency, sparse computation, and high accuracy.
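To make the firing-triggered threshold adjustment concrete, here is a minimal sketch of a leaky integrate-and-fire neuron with a self-driven threshold; the specific update constants are illustrative assumptions, whereas the paper derives its update from an information-maximization objective.

```python
import numpy as np

def lif_adaptive_threshold(input_current, theta0=1.0, tau_m=0.9,
                           theta_plus=0.05, theta_decay=0.999):
    """Leaky integrate-and-fire neuron whose threshold rises on each of its
    own firing events and slowly relaxes otherwise (illustrative rule only)."""
    v, theta = 0.0, theta0
    spikes = []
    for i in input_current:
        v = tau_m * v + i
        if v >= theta:
            spikes.append(1)
            v = 0.0                    # reset membrane potential
            theta += theta_plus        # self-driven: firing raises the threshold
        else:
            spikes.append(0)
            theta = theta0 + (theta - theta0) * theta_decay  # relax toward theta0
    return np.array(spikes), theta
```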
5. Tang S, Zhao Y, Lv H, Sun M, Feng Y, Zhang Z. Adaptive Optimization and Dynamic Representation Method for Asynchronous Data Based on Regional Correlation Degree. Sensors (Basel) 2024;24:7430. [PMID: 39685963] [DOI: 10.3390/s24237430]
Abstract
Event cameras, as bio-inspired visual sensors, offer significant advantages in their high dynamic range and high temporal resolution for visual tasks. These capabilities enable efficient and reliable motion estimation even in the most complex scenes. However, these advantages come with certain trade-offs. For instance, current event-based vision sensors have low spatial resolution, and the process of event representation can result in varying degrees of data redundancy and incompleteness. Additionally, due to the inherent characteristics of event stream data, they cannot be utilized directly; pre-processing steps such as slicing and frame compression are required. Currently, various pre-processing algorithms exist for slicing and compressing event streams. However, these methods fall short when dealing with multiple subjects moving at different and varying speeds within the event stream, potentially exacerbating the inherent deficiencies of the event information flow. To address this longstanding issue, we propose a novel and efficient Asynchronous Spike Dynamic Metric and Slicing algorithm (ASDMS). ASDMS adaptively segments the event stream into fragments of varying lengths based on the spatiotemporal structure and polarity attributes of the events. Moreover, we introduce a new Adaptive Spatiotemporal Subject Surface Compensation algorithm (ASSSC). ASSSC compensates for missing motion information in the event stream and removes redundant information, thereby achieving better performance and effectiveness in event stream segmentation compared to existing event representation algorithms. Additionally, after compressing the processed results into frame images, the imaging quality is significantly improved. Finally, we propose a new evaluation metric, the Actual Performance Efficiency Discrepancy (APED), which combines actual distortion rate and event information entropy to quantify and compare the effectiveness of our method against other existing event representation methods. The final experimental results demonstrate that our event representation method outperforms existing approaches and addresses the shortcomings of current methods in handling event streams with multiple entities moving at varying speeds simultaneously.
Affiliation(s)
- Sichao Tang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yuchen Zhao: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Hengyi Lv: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Ming Sun: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yang Feng: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Zeshu Zhang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
6. Zhang T, Wang Q, Xu B. Self-Lateral Propagation Elevates Synaptic Modifications in Spiking Neural Networks for the Efficient Spatial and Temporal Classification. IEEE Transactions on Neural Networks and Learning Systems 2024;35:15359-15371. [PMID: 37389999] [DOI: 10.1109/tnnls.2023.3286458]
Abstract
The brain's secret to efficient and intelligent computation lies in the neuronal encoding, functional circuits, and plasticity principles of natural neural networks. However, many plasticity principles have not been fully incorporated into artificial or spiking neural networks (SNNs). Here, we report that incorporating a novel feature of synaptic plasticity found in natural networks, whereby synaptic modifications self-propagate to nearby synapses, named self-lateral propagation (SLP), can further improve the accuracy of SNNs in three benchmark spatial and temporal classification tasks. SLP contains lateral presynaptic (SLP-pre) and lateral postsynaptic (SLP-post) propagation, describing the spread of synaptic modifications among output synapses made by axon collaterals or among converging synapses on the postsynaptic neuron, respectively. SLP is biologically plausible and leads to coordinated synaptic modification within layers that endows higher efficiency with little loss of accuracy. Furthermore, the experimental results show the impressive role of SLP in sharpening the normal distribution of synaptic weights and broadening the more uniform distribution of misclassified samples, both of which are considered essential for understanding the learning convergence and generalization of neural networks.
7. Grimaldi A, Boutin V, Ieng SH, Benosman R, Perrinet LU. A robust event-driven approach to always-on object recognition. Neural Netw 2024;178:106415. [PMID: 38852508] [DOI: 10.1016/j.neunet.2024.106415]
Abstract
We propose a neuromimetic architecture capable of always-on pattern recognition, i.e., at any time during processing. To achieve this, we have extended an existing event-based algorithm (Lagorce et al., 2017), which introduced novel spatio-temporal features as a Hierarchy Of Time-Surfaces (HOTS). Built from asynchronous events captured by a neuromorphic camera, these time surfaces make it possible to encode the local dynamics of a visual scene and to create an efficient event-based pattern recognition architecture. Inspired by neuroscience, we have extended this method to improve its performance. First, we add a homeostatic gain control on the activity of neurons to improve the learning of spatio-temporal patterns (Grimaldi et al., 2021). We also provide a new mathematical formalism that allows an analogy to be drawn between the HOTS algorithm and Spiking Neural Networks (SNNs). Following this analogy, we transform the offline pattern categorization method into an online and event-driven layer. This classifier uses the spiking output of the network to define new time surfaces, and we then perform online classification with a neuromimetic implementation of a multinomial logistic regression. These improvements not only consistently increase the performance of the network but also bring this event-driven pattern recognition algorithm fully online. The results have been validated on different datasets: Poker-DVS (Serrano-Gotarredona and Linares-Barranco, 2015), N-MNIST (Orchard, Jayawant et al., 2015), and DVS Gesture (Amir et al., 2017). This demonstrates the efficiency of this bio-realistic SNN for ultra-fast object recognition through an event-by-event categorization process.
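The time-surface primitive at the heart of HOTS is compact enough to sketch directly; the following NumPy snippet is an illustrative implementation (array layout and the decay constant are assumptions), not the authors' code.

```python
import numpy as np

def time_surface(events, t_now, sensor_hw, tau=50e3):
    """Exponentially decayed map of the most recent event time at each pixel,
    the core primitive behind HOTS-style descriptors.
    events: (N, 3) array of (t_us, x, y), sorted by time."""
    H, W = sensor_hw
    last_t = np.full((H, W), -np.inf)
    for t, x, y in events:
        last_t[int(y), int(x)] = t          # keep only the latest time per pixel
    return np.exp((last_t - t_now) / tau)   # recent events map to values near 1
```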
Affiliation(s)
- Antoine Grimaldi: Aix-Marseille Université, Institut de Neurosciences de la Timone, CNRS, Marseille, France
- Victor Boutin: Carney Institute for Brain Science, Brown University, Providence, RI, United States; Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, Toulouse, France
- Sio-Hoi Ieng: Institut de la Vision, Sorbonne Université, CNRS, Paris, France
- Ryad Benosman: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Laurent U Perrinet: Aix-Marseille Université, Institut de Neurosciences de la Timone, CNRS, Marseille, France
8. Zhou X, Bei C. Backlight and dim space object detection based on a novel event camera. PeerJ Comput Sci 2024;10:e2192. [PMID: 39145218] [PMCID: PMC11323122] [DOI: 10.7717/peerj-cs.2192]
Abstract
Background: For space object detection tasks, conventional optical cameras face various application challenges, including backlight issues and dim light conditions. As a novel optical camera, the event camera offers high temporal resolution and high dynamic range owing to its asynchronous output, which provides a new solution to the above challenges. However, this asynchronous output makes event cameras incompatible with conventional object detection methods designed for frame images.
Methods: An asynchronous convolutional memory network (ACMNet) for processing event camera data is proposed to address backlight and dim space object detection. The key idea of ACMNet is to first characterize the asynchronous event stream with the Event Spike Tensor (EST) voxel grid through an exponential kernel function, then extract spatial features using a feed-forward feature extraction network, aggregate temporal features using a proposed convolutional spatiotemporal memory module (ConvLSTM), and finally perform end-to-end object detection on continuous event streams.
Results: Comparison experiments between ACMNet and classical object detection methods are carried out on Event_DVS_space7, a large-scale synthetic space event dataset based on event cameras. The results show that ACMNet outperforms the others, improving mAP by 12.7% while maintaining processing speed. Moreover, event cameras retain good performance in backlight and dim-light conditions where conventional optical cameras fail. This research offers a novel possibility for detection under intricate lighting and motion conditions, emphasizing the benefits of event cameras for space object detection.
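An EST-style voxel grid with an exponential temporal kernel, as named in the abstract, can be sketched as follows; bin count, kernel sharpness, and array layout are assumptions for illustration.

```python
import numpy as np

def est_voxel_grid(events, sensor_hw, n_bins=9, alpha=1.0):
    """Event Spike Tensor-style voxel grid: each event deposits an
    exponentially weighted vote into nearby temporal bins.
    events: (N, 4) array of (t, x, y, polarity in {0, 1})."""
    H, W = sensor_hw
    t = events[:, 0]
    x, y, p = (events[:, 1].astype(int), events[:, 2].astype(int),
               events[:, 3].astype(int))
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (n_bins - 1)
    grid = np.zeros((n_bins, 2, H, W), dtype=np.float32)
    for b in range(n_bins):
        w = np.exp(-alpha * np.abs(t_norm - b))  # exponential temporal kernel
        np.add.at(grid, (b, p, y, x), w)
    return grid
```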
Affiliation(s)
- Xiaoli Zhou: Graduate School, The Second Research Academy of CASIC, Beijing, China; CASIC Space Engineering Development Co., Ltd., Beijing, China
- Chao Bei: CASIC Space Engineering Development Co., Ltd., Beijing, China
9. Zhang B, Han Y, Suo J, Dai Q. An event-oriented diffusion-refinement method for sparse events completion. Sci Rep 2024;14:6802. [PMID: 38514718] [PMCID: PMC10958031] [DOI: 10.1038/s41598-024-57333-2]
Abstract
Event cameras, or dynamic vision sensors (DVS), record asynchronous responses to brightness changes instead of conventional intensity frames, and feature ultra-high sensitivity at low bandwidth. This new mechanism demonstrates great advantages in challenging scenarios with fast motion and large dynamic range. However, the recorded events may be highly sparse due to either limited hardware bandwidth or extreme photon starvation in harsh environments. To unlock the full potential of event cameras, we propose an event sequence completion approach conforming to the unique characteristics of event data in both the processing stage and the output form. Specifically, we treat event streams as 3D event clouds in the spatiotemporal domain, develop a diffusion-based generative model to generate dense clouds in a coarse-to-fine manner, and successfully recover exact timestamps to maintain the temporal resolution of the raw data. To validate the effectiveness of our method comprehensively, we perform extensive experiments on three widely used public datasets with different spatial resolutions, and additionally collect a novel event dataset covering diverse scenarios with highly dynamic motions and harsh illumination. Besides generating high-quality dense events, our method can benefit downstream applications such as object classification and intensity frame reconstruction.
Affiliation(s)
- Bo Zhang: Department of Automation, Tsinghua University, Beijing 100084, China
- Yuqi Han: Department of Automation, Tsinghua University, Beijing 100084, China
- Jinli Suo: Department of Automation, Tsinghua University, Beijing 100084, China; Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing 100084, China; Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Qionghai Dai: Department of Automation, Tsinghua University, Beijing 100084, China; Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing 100084, China
10. Yu Q, Gao J, Wei J, Li J, Tan KC, Huang T. Improving Multispike Learning With Plastic Synaptic Delays. IEEE Transactions on Neural Networks and Learning Systems 2023;34:10254-10265. [PMID: 35442893] [DOI: 10.1109/tnnls.2022.3165527]
Abstract
Emulating the spike-based processing of the brain, spiking neural networks (SNNs) are a promising candidate for a new generation of artificial neural networks that aim to match the brain's efficient cognition. Due to the complex dynamics and nonlinearity of SNNs, designing efficient learning algorithms has remained a major difficulty and attracts great research attention. Most existing algorithms focus on the adjustment of synaptic weights. However, other components, such as synaptic delays, are found to be adaptive and important in modulating neural behavior. How plasticity in different components could cooperate to improve the learning of SNNs remains an interesting question. Advancing our previous multispike learning, we propose a new joint weight-delay plasticity rule, named TDP-DL, in this article. Plastic delays are integrated into the learning framework, and as a result, the performance of multispike learning is significantly improved. Simulation results highlight the effectiveness and efficiency of our TDP-DL rule compared to baselines. Moreover, we reveal the underlying principle of how synaptic weights and delays cooperate with each other through a synthetic task of interval selectivity, and show that plastic delays can enhance the selectivity and flexibility of neurons by shifting information across time. Due to this capability, useful information distributed far apart in the time domain can be effectively integrated for better accuracy, as highlighted in our generalization tasks of image, speech, and event-based object recognition. Our work is thus valuable for improving the performance of spike-based neuromorphic computing.
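The forward effect of a plastic delay, shifting a presynaptic spike train later in time before it is weighted and integrated, can be sketched in a few lines; the interface below is an assumption for illustration and omits the learning rule itself.

```python
import numpy as np

def delayed_psp(spike_trains, weights, delays, T):
    """Summed drive to one neuron from weighted, delay-shifted spike trains.
    In a TDP-DL-style rule both `weights` and the integer `delays` would be
    learned; here we only show the forward effect of the delays.
    spike_trains: (n_syn, T) binary array."""
    drive = np.zeros(T)
    for train, w, d in zip(spike_trains, weights, delays):
        shifted = np.zeros(T)
        if d < T:
            shifted[d:] = train[:T - d]   # a delay moves information later in time
        drive += w * shifted
    return drive
```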
11. Dorzhigulov A, Saxena V. Spiking CMOS-NVM mixed-signal neuromorphic ConvNet with circuit- and training-optimized temporal subsampling. Front Neurosci 2023;17:1177592. [PMID: 37534034] [PMCID: PMC10390782] [DOI: 10.3389/fnins.2023.1177592]
Abstract
We increasingly rely on deep learning algorithms to process colossal amounts of unstructured visual data. Commonly, these algorithms are deployed as software models on digital hardware, predominantly in data centers. The intrinsically high energy consumption of cloud-based deployment of deep neural networks (DNNs) has inspired researchers to look for alternatives, resulting in high interest in Spiking Neural Networks (SNNs) and dedicated mixed-signal neuromorphic hardware. As a result, there is an emerging challenge to transfer DNN architecture functionality to energy-efficient spiking non-volatile memory (NVM)-based hardware with minimal loss in the accuracy of visual data processing. Convolutional Neural Networks (CNNs) are the staple DNN choice for visual data processing. However, the lack of analog-friendly spiking implementations and alternatives for some core CNN functions, such as MaxPool, hinders the conversion of CNNs into the spike domain, thus hampering neuromorphic hardware development. To address this gap, we propose MaxPool with temporal multiplexing for Spiking CNNs (SCNNs), which is amenable to implementation in mixed-signal circuits. We leverage the temporal dynamics of the internal membrane potential of Integrate & Fire neurons to enable MaxPool decision-making in the spiking domain. The proposed MaxPool models are implemented and tested within an SCNN architecture using a modified version of the aihwkit framework, a PyTorch-based toolkit for modeling and simulating hardware-based neural networks. The proposed spiking MaxPool scheme can decide even before the complete spatiotemporal input is applied, thus selectively trading off latency against accuracy. By allocating just 10% of the spatiotemporal input window to the pooling decision, the proposed spiking MaxPool achieves up to 61.74% accuracy on the CIFAR10 classification task with 2-bit weight resolution (chosen to reflect foundry-integrated ReRAM limitations) after training with backpropagation, only about 1% below the 62.78% accuracy obtained with the full spatiotemporal window. In addition, we propose the realization of one of the proposed spiking MaxPool techniques in an NVM crossbar array along with periphery circuits designed in a 130 nm CMOS technology. Energy-efficiency estimates show competitive performance compared to recent neuromorphic chip designs.
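The early-deciding pooling idea can be illustrated in software; the sketch below integrates leaky membrane potentials over only the first fraction of the window and forwards the winning input (all constants and the interface are assumptions, and the mixed-signal circuit realization is of course different).

```python
import numpy as np

def spiking_maxpool(spike_windows, decision_fraction=0.1, tau=0.95):
    """Early-deciding MaxPool over one pooling group of spike trains: leaky
    membrane potentials integrate each input, and after a fraction of the
    window the input with the largest potential wins and is passed through.
    spike_windows: (n_inputs, T) binary array, one row per pooled input."""
    n_in, T = spike_windows.shape
    t_decide = max(1, int(decision_fraction * T))
    v = np.zeros(n_in)
    for t in range(t_decide):              # integrate only the early window
        v = tau * v + spike_windows[:, t]
    winner = int(np.argmax(v))             # decide before the full input arrives
    return spike_windows[winner]           # forward the winner's full train
```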
12. Yi Z, Lian J, Liu Q, Zhu H, Liang D, Liu J. Learning Rules in Spiking Neural Networks: A Survey. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.02.026]
13. Baldwin RW, Liu R, Almatrafi M, Asari V, Hirakawa K. Time-Ordered Recent Event (TORE) Volumes for Event Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:2519-2532. [PMID: 35503820] [DOI: 10.1109/tpami.2022.3172212]
Abstract
Event cameras are an exciting new sensor modality enabling high-speed imaging with extremely low latency and wide dynamic range. Unfortunately, most machine learning architectures are not designed to directly handle sparse data like that generated by event cameras. Many state-of-the-art algorithms for event cameras rely on interpolated event representations, which obscure crucial timing information, increase the data volume, and limit overall network performance. This paper details an event representation called Time-Ordered Recent Event (TORE) volumes. TORE volumes are designed to compactly store raw spike timing information with minimal information loss. This bio-inspired design is memory efficient, computationally fast, avoids time-blocking (i.e., fixed and predefined frame rates), and contains "local memory" from past data. The design is evaluated on a wide range of challenging tasks (e.g., event denoising, image reconstruction, classification, and human pose estimation) and is shown to dramatically improve state-of-the-art performance. TORE volumes are an easy-to-implement replacement for any algorithm currently utilizing event representations.
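A minimal sketch of the idea, keeping the K most recent timestamps per pixel and polarity and storing their log-scaled staleness, is shown below; the clipping constants and array layout follow the general description above and are otherwise assumptions.

```python
import numpy as np

def tore_volume(events, t_now, sensor_hw, K=4, t_min=150.0, t_max=5e6):
    """Sketch of a Time-Ordered Recent Event volume: for each pixel and
    polarity, keep the K most recent event times and store log staleness.
    events: (N, 4) time-sorted array of (t_us, x, y, polarity in {0, 1})."""
    H, W = sensor_hw
    vol = np.full((2, K, H, W), np.log(t_max), dtype=np.float32)
    recent = {}                                    # (p, y, x) -> timestamps
    for t, x, y, p in events:
        recent.setdefault((int(p), int(y), int(x)), []).append(t)
    for (p, y, x), ts in recent.items():
        for k, t in enumerate(reversed(ts[-K:])):  # newest event first
            dt = np.clip(t_now - t, t_min, t_max)
            vol[p, k, y, x] = np.log(dt)
    return vol
```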
14. Wu J, Chua Y, Zhang M, Li G, Li H, Tan KC. A Tandem Learning Rule for Effective Training and Rapid Inference of Deep Spiking Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2023;34:446-460. [PMID: 34288879] [DOI: 10.1109/tnnls.2021.3095724]
Abstract
Spiking neural networks (SNNs) represent the most prominent biologically inspired computing model for neuromorphic computing (NC) architectures. However, due to the nondifferentiable nature of spiking neuronal functions, the standard error backpropagation algorithm is not directly applicable to SNNs. In this work, we propose a tandem learning framework that consists of an SNN and an artificial neural network (ANN) coupled through weight sharing. The ANN is an auxiliary structure that facilitates the error backpropagation for the training of the SNN at the spike-train level. To this end, we consider the spike count as the discrete neural representation in the SNN and design an ANN neuronal activation function that can effectively approximate the spike count of the coupled SNN. The proposed tandem learning rule demonstrates competitive pattern recognition and regression capabilities on both the conventional frame- and event-based vision datasets, with at least an order of magnitude reduced inference time and total synaptic operations over other state-of-the-art SNN implementations. Therefore, the proposed tandem learning rule offers a novel solution to training efficient, low latency, and high-accuracy deep SNNs with low computing resources.
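One way to picture the weight-sharing coupling is the ANN-side activation approximating the coupled spiking neuron's spike count; the function below is an illustrative rate-based approximation (its exact form in the paper differs).

```python
import numpy as np

def ann_spike_count_activation(z, theta=1.0, T=10):
    """ANN-side activation for a tandem setup: a rate approximation of the
    spike count an integrate-and-fire neuron with threshold `theta` would
    emit over T time-steps given constant input current z (sketch only)."""
    return np.clip(np.floor(np.maximum(z, 0.0) * T / theta), 0, T)
```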
15. Zhang S, Wang W, Li H, Zhang S. EVtracker: An Event-Driven Spatiotemporal Method for Dynamic Object Tracking. Sensors (Basel) 2022;22:6090. [PMID: 36015851] [PMCID: PMC9414578] [DOI: 10.3390/s22166090]
Abstract
An event camera is a novel bio-inspired sensor that effectively compensates for the shortcomings of current frame cameras, which include high latency, low dynamic range, motion blur, etc. Rather than capturing images at a fixed frame rate, an event camera produces an asynchronous signal by measuring the brightness change of each pixel. Consequently, an appropriate algorithm framework that can handle the unique data types of event-based vision is required. In this paper, we propose a dynamic object tracking framework using an event camera to achieve long-term stable tracking of event objects. One of the key novel features of our approach is to adopt an adaptive strategy that adjusts the spatiotemporal domain of event data. To achieve this, we reconstruct event images from high-speed asynchronous streaming data via online learning. Additionally, we apply the Siamese network to extract features from event data. In contrast to earlier models that only extract hand-crafted features, our method provides powerful feature description and a more flexible reconstruction strategy for event data. We assess our algorithm in three challenging scenarios: 6-DoF (six degrees of freedom), translation, and rotation. Unlike fixed cameras in traditional object tracking tasks, all three tracking scenarios involve the simultaneous violent rotation and shaking of both the camera and objects. Results from extensive experiments suggest that our proposed approach achieves superior accuracy and robustness compared to other state-of-the-art methods. Without reducing time efficiency, our novel method exhibits a 30% increase in accuracy over other recent models. Furthermore, results indicate that event cameras are capable of robust object tracking, which is a task that conventional cameras cannot adequately perform, especially for super-fast motion tracking and challenging lighting situations.
Affiliation(s)
- Wenmin Wang: School of Computer Science and Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau
16. Dong J, Jiang R, Xiao R, Yan R, Tang H. Event stream learning using spatio-temporal event surface. Neural Netw 2022;154:543-559. [DOI: 10.1016/j.neunet.2022.07.010]
17. Yang Y, Ren J, Duan F. The Spiking Rates Inspired Encoder and Decoder for Spiking Neural Networks: An Illustration of Hand Gesture Recognition. Cognit Comput 2022. [DOI: 10.1007/s12559-022-10027-1]
18. Annamalai L, Ramanathan V, Thakur CS. Event-LSTM: An Unsupervised and Asynchronous Learning-Based Representation for Event-Based Data. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3151426]
19. Xie B, Deng Y, Shao Z, Liu H, Li Y. VMV-GCN: Volumetric Multi-View Based Graph CNN for Event Stream Classification. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3140819]
20. Bartolozzi C, Indiveri G, Donati E. Embodied neuromorphic intelligence. Nat Commun 2022;13:1024.
Abstract
The design of robots that interact autonomously with the environment and exhibit complex behaviours is an open challenge that can benefit from understanding what makes living beings fit to act in the world. Neuromorphic engineering studies neural computational principles to develop technologies that can provide a computing substrate for building compact and low-power processing systems. We discuss why endowing robots with neuromorphic technologies - from perception to motor control - represents a promising approach for the creation of robots that can seamlessly integrate into society. We present initial attempts in this direction, highlight open challenges, and propose actions required to overcome current limitations.
Affiliation(s)
- Chiara Bartolozzi: Event-Driven Perception for Robotics, Istituto Italiano di Tecnologia, via San Quirico 19D, 16163 Genova, Italy
- Giacomo Indiveri: Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstr. 190, 8057 Zurich, Switzerland
- Elisa Donati: Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstr. 190, 8057 Zurich, Switzerland
21. Milde MB, Afshar S, Xu Y, Marcireau A, Joubert D, Ramesh B, Bethi Y, Ralph NO, El Arja S, Dennler N, van Schaik A, Cohen G. Neuromorphic Engineering Needs Closed-Loop Benchmarks. Front Neurosci 2022;16:813555. [PMID: 35237122] [PMCID: PMC8884247] [DOI: 10.3389/fnins.2022.813555]
Abstract
Neuromorphic engineering aims to build (autonomous) systems by mimicking biological systems. It is motivated by the observation that biological organisms, from algae to primates, excel in sensing their environment and reacting promptly to their perils and opportunities. Furthermore, they do so more resiliently than our most advanced machines, at a fraction of the power consumption. It follows that the performance of neuromorphic systems should be evaluated in terms of real-time operation, power consumption, and resiliency to real-world perturbations and noise, using task-relevant evaluation metrics. Yet, following in the footsteps of conventional machine learning, most neuromorphic benchmarks rely on recorded datasets that foster sensing accuracy as the primary measure of performance. Sensing accuracy is but an arbitrary proxy for the actual goal of the system: making a good decision in a timely manner. Moreover, static datasets hinder our ability to study and compare the closed-loop sensing and control strategies that are central to survival for biological organisms. This article makes the case for a renewed focus on closed-loop benchmarks involving real-world tasks. Such benchmarks will be crucial in developing and progressing neuromorphic intelligence. The shift toward dynamic, real-world benchmarking tasks should usher in richer, more resilient, and more robust artificially intelligent systems in the future.
22. Gallego G, Delbruck T, Orchard G, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison AJ, Conradt J, Daniilidis K, Scaramuzza D. Event-Based Vision: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022;44:154-180. [PMID: 32750812] [DOI: 10.1109/tpami.2020.3008413]
Abstract
Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
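The working principle summarized above can be captured by an idealized simulation: a pixel emits an event each time its log-intensity drifts by a contrast threshold from a per-pixel reference. The sketch below is this textbook model (threshold value and interfaces assumed), ignoring sensor noise and refractory effects.

```python
import numpy as np

def events_from_frames(frames, timestamps, contrast=0.2, eps=1e-6):
    """Idealized event-camera model: emit (t, x, y, polarity) whenever a
    pixel's log-intensity moves by `contrast` from its reference level.
    frames: (T, H, W) array of intensity images."""
    ref = np.log(frames[0] + eps)
    events = []
    for img, t in zip(frames[1:], timestamps[1:]):
        logI = np.log(img + eps)
        while True:                       # a pixel may fire several events
            diff = logI - ref
            ys, xs = np.nonzero(np.abs(diff) >= contrast)
            if len(ys) == 0:
                break
            pol = (diff[ys, xs] > 0).astype(int)
            events += [(t, x, y, p) for x, y, p in zip(xs, ys, pol)]
            ref[ys, xs] += np.sign(diff[ys, xs]) * contrast  # step reference
    return np.array(events)
```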
23. Kim Y, Panda P. Revisiting Batch Normalization for Training Low-Latency Deep Spiking Neural Networks From Scratch. Front Neurosci 2021;15:773954. [PMID: 34955725] [PMCID: PMC8695433] [DOI: 10.3389/fnins.2021.773954]
Abstract
Spiking Neural Networks (SNNs) have recently emerged as an alternative to deep learning owing to sparse, asynchronous, and binary event (or spike) driven processing, which can yield huge energy-efficiency benefits on neuromorphic hardware. However, SNNs convey temporally varying spike activations through time, which is likely to induce large variation in forward activations and backward gradients, resulting in unstable training. To address this training issue in SNNs, we revisit Batch Normalization (BN) and propose a temporal Batch Normalization Through Time (BNTT) technique. Different from previous BN techniques for SNNs, we find that varying the BN parameters at every time-step allows the model to better learn the time-varying input distribution. Specifically, our proposed BNTT decouples the parameters in a BNTT layer along the time axis to capture the temporal dynamics of spikes. We demonstrate BNTT on CIFAR-10, CIFAR-100, Tiny-ImageNet, the event-driven DVS-CIFAR10 dataset, and Sequential MNIST, and show near state-of-the-art performance. We conduct comprehensive analysis of the temporal characteristics of BNTT and showcase interesting benefits toward robustness against random and adversarial noise. Further, by monitoring the learnt parameters of BNTT, we find that we can perform temporal early exit: inference latency can be reduced by ~5-20 time-steps from the original training latency. The code has been released at https://github.com/Intelligent-Computing-Lab-Yale/BNTT-Batch-Normalization-Through-Time.
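The decoupling of BN parameters along the time axis reduces, in code, to keeping one BatchNorm module per time-step; here is a minimal PyTorch sketch of that idea (class name and interface assumed; the authors' released code is at the linked repository).

```python
import torch.nn as nn

class BNTT(nn.Module):
    """Batch Normalization Through Time: one set of BN statistics and affine
    parameters per time-step, indexed by t inside the SNN simulation loop."""
    def __init__(self, channels, timesteps):
        super().__init__()
        self.bns = nn.ModuleList(nn.BatchNorm2d(channels) for _ in range(timesteps))

    def forward(self, x, t):
        # x: (batch, channels, H, W) activation at time-step t
        return self.bns[t](x)

# Usage inside an SNN loop: out_t = bntt(conv(spikes_t), t) for t in range(T).
```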
Affiliation(s)
- Youngeun Kim: Department of Electrical Engineering, Yale University, New Haven, CT, United States
- Priyadarshini Panda: Department of Electrical Engineering, Yale University, New Haven, CT, United States
24. ESPEE: Event-Based Sensor Pose Estimation Using an Extended Kalman Filter. Sensors (Basel) 2021;21:7840. [PMID: 34883852] [PMCID: PMC8659537] [DOI: 10.3390/s21237840]
Abstract
Event-based vision sensors show great promise for use in embedded applications requiring low-latency passive sensing at a low computational cost. In this paper, we present an event-based algorithm that relies on an Extended Kalman Filter for 6-Degree of Freedom sensor pose estimation. The algorithm updates the sensor pose event-by-event with low latency (worst case of less than 2 μs on an FPGA). Using a single handheld sensor, we test the algorithm on multiple recordings, ranging from a high contrast printed planar scene to a more natural scene consisting of objects viewed from above. The pose is accurately estimated under rapid motions, up to 2.7 m/s. Thereafter, an extension to multiple sensors is described and tested, highlighting the improved performance of such a setup, as well as the integration with an off-the-shelf mapping algorithm to allow point cloud updates with a 3D scene and enhance the potential applications of this visual odometry solution.
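The per-event update at the core of such a filter is the standard EKF measurement step; below is a generic sketch (the measurement function h, its Jacobian, and the noise covariance R are placeholders, and the paper's state parameterization is its own).

```python
import numpy as np

def ekf_update(x, P, z, h, H_jac, R):
    """One generic Extended Kalman Filter measurement update, the operation
    an event-by-event pose estimator applies per incoming event.
    h(x) predicts the measurement; H_jac(x) is its Jacobian at x."""
    y = z - h(x)                         # innovation
    H = H_jac(x)
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```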
26. Kim Y, Panda P. Optimizing Deeper Spiking Neural Networks for Dynamic Vision Sensing. Neural Netw 2021;144:686-698. [PMID: 34662827] [DOI: 10.1016/j.neunet.2021.09.022]
Abstract
Spiking Neural Networks (SNNs) have recently emerged as a new generation of low-power deep neural networks due to sparse, asynchronous, and binary event-driven processing. Most previous deep SNN optimization methods focus on static datasets (e.g., MNIST) from a conventional frame-based camera. On the other hand, optimization techniques for event data from Dynamic Vision Sensor (DVS) cameras are still in their infancy. Most prior SNN techniques handling DVS data are limited to shallow networks and thus show low performance. Generally, we observe that the integrate-and-fire behavior of spiking neurons diminishes spike activity in deeper layers. The sparse spike activity results in a sub-optimal solution during training (i.e., performance degradation). To address this limitation, we propose novel algorithmic and architectural advances to accelerate the training of very deep SNNs on DVS data. Specifically, we propose Spike Activation Lift Training (SALT), which increases spike activity across all layers by optimizing both weights and thresholds in convolutional layers. After applying SALT, we train the weights based on the cross-entropy loss. SALT helps the networks convey ample information across all layers during training and therefore improves performance. Furthermore, we propose a simple and effective architecture, called Switched-BN, which exploits Batch Normalization (BN). Previous methods show that the standard BN is incompatible with the temporal dynamics of SNNs. Therefore, in the Switched-BN architecture, we apply BN to the last layer of an SNN after accumulating all the spikes from the previous layer with a spike voltage accumulator (i.e., converting temporal spike information to a float value). Even though we apply BN in just one layer of the SNN, our results demonstrate a considerable performance gain without any significant computational overhead. Through extensive experiments, we show the effectiveness of SALT and Switched-BN for training very deep SNNs from scratch on various benchmarks including DVS-Cifar10, N-Caltech, DHP19, CIFAR10, and CIFAR100. To the best of our knowledge, this is the first work showing state-of-the-art performance with deep SNNs on DVS data.
Affiliation(s)
- Youngeun Kim: Department of Electrical Engineering, Yale University, New Haven, CT, USA
27. Madhavan A, Daniels MW, Stiles MD. Temporal State Machines: Using Temporal Memory to Stitch Time-based Graph Computations. ACM Journal on Emerging Technologies in Computing Systems 2021;17. [PMID: 36575655] [PMCID: PMC9792072] [DOI: 10.1145/3451214]
Abstract
Race logic, an arrival-time-coded logic family, has demonstrated energy and performance improvements for applications ranging from dynamic programming to machine learning. However, the various ad hoc mappings of algorithms into hardware rely on researcher ingenuity and result in custom architectures that are difficult to systematize. We propose to associate race logic with the mathematical field of tropical algebra, enabling a more methodical approach toward building temporal circuits. This association between the mathematical primitives of tropical algebra and generalized race logic computations guides the design of temporally coded tropical circuits. It also serves as a framework for expressing high-level timing-based algorithms. This abstraction, when combined with temporal memory, allows for the systematic exploration of race logic-based temporal architectures by making it possible to partition feed-forward computations into stages and organize them into a state machine. We leverage analog memristor-based temporal memories to design such a state machine that operates purely on time-coded wavefronts. We implement a version of Dijkstra's algorithm to evaluate this temporal state machine. This demonstration shows the promise of expanding the expressibility of temporal computing to enable it to deliver significant energy and throughput advantages.
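The tropical-algebra view can be made concrete with a (min, +) matrix product, the algebraic operation that race logic's first-arrival (MIN) and delay (+) primitives compute natively; the snippet below is an illustrative software analogue, not hardware.

```python
import numpy as np

def tropical_matmul(A, B):
    """(min, +) matrix product: addition becomes minimum and multiplication
    becomes addition, matching arrival-time-coded (race logic) computation."""
    n, _ = A.shape
    _, m = B.shape
    C = np.full((n, m), np.inf)
    for i in range(n):
        for j in range(m):
            C[i, j] = np.min(A[i, :] + B[:, j])
    return C

# Repeated tropical squaring of a graph's edge-delay matrix relaxes shortest
# paths, mirroring the Dijkstra-style evaluation mentioned in the abstract.
```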
Affiliation(s)
- Advait Madhavan: University of Maryland and National Institute of Standards and Technology
28. Rebecq H, Ranftl R, Koltun V, Scaramuzza D. High Speed and High Dynamic Range Video with an Event Camera. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021;43:1964-1980. [PMID: 31902754] [DOI: 10.1109/tpami.2019.2963386]
Abstract
Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages over conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes the complete visual signal in principle, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our quantitative experiments show that our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality, while comfortably running in real time. We show that the network is able to synthesize high-frame-rate videos of high-speed phenomena (e.g., a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. As an additional contribution, we demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry, and that this strategy consistently outperforms algorithms that were specifically designed for event data. We release the reconstruction code, a pre-trained model, and the datasets to enable further research.
29. Tayarani-Najaran MH, Schmuker M. Event-Based Sensing and Signal Processing in the Visual, Auditory, and Olfactory Domain: A Review. Front Neural Circuits 2021;15:610446. [PMID: 34135736] [PMCID: PMC8203204] [DOI: 10.3389/fncir.2021.610446]
Abstract
The nervous system converts the physical quantities sensed by its primary receptors into trains of events that are then processed in the brain. This unmatched efficiency in information processing has long inspired engineers to seek brain-like approaches to sensing and signal processing. The key principle pursued in neuromorphic sensing is to shed the traditional approach of periodic sampling in favor of an event-driven scheme that mimics sampling as it occurs in the nervous system, where events are preferably emitted upon the change of the sensed stimulus. In this paper we highlight the advantages and challenges of event-based sensing and signal processing in the visual, auditory, and olfactory domains. We also provide a survey of the literature covering neuromorphic sensing and signal processing in all three modalities. Our aim is to facilitate research in event-based sensing and signal processing by providing a comprehensive overview of the research performed previously, as well as by highlighting conceptual advantages, current progress, and future challenges in the field.
Affiliation(s)
- Michael Schmuker: School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, United Kingdom
30. Deng Y, Chen H, Chen H, Li Y. Learning From Images: A Distillation Learning Framework for Event Cameras. IEEE Transactions on Image Processing 2021;30:4919-4931. [PMID: 33961557] [DOI: 10.1109/tip.2021.3077136]
Abstract
Event cameras have recently drawn massive attention in the computer vision community because of their low power consumption and high response speed. These cameras produce sparse and non-uniform spatiotemporal representations of a scene. These characteristics of representations make it difficult for event-based models to extract discriminative cues (such as textures and geometric relationships). Consequently, event-based methods usually perform poorly compared to their conventional image counterparts. Considering that traditional images and event signals share considerable visual information, this paper aims to improve the feature extraction ability of event-based models by using knowledge distilled from the image domain to additionally provide explicit feature-level supervision for the learning of event data. Specifically, we propose a simple yet effective distillation learning framework, including multi-level customized knowledge distillation constraints. Our framework can significantly boost the feature extraction process for event data and is applicable to various downstream tasks. We evaluate our framework on high-level and low-level tasks, i.e., object classification and optical flow prediction. Experimental results show that our framework can effectively improve the performance of event-based models on both tasks by a large margin. Furthermore, we present a 10K dataset (CEP-DVS) for event-based object classification. This dataset consists of samples recorded under random motion trajectories that can better evaluate the motion robustness of the event-based model and is compatible with multi-modality vision tasks.
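Multi-level feature supervision from an image-domain teacher can be sketched as a simple auxiliary loss; the snippet below is a generic distillation objective for illustration (the paper's customized multi-level constraints are more elaborate).

```python
import torch.nn.functional as F

def feature_distillation_loss(student_feats, teacher_feats, task_loss, beta=1.0):
    """Feature-level distillation from an image-domain teacher to an
    event-domain student: align intermediate features at several levels
    and add the downstream task loss (generic sketch)."""
    distill = sum(F.mse_loss(s, t.detach())      # teacher is frozen
                  for s, t in zip(student_feats, teacher_feats))
    return task_loss + beta * distill
```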
31. Iyer LR, Chua Y, Li H. Is Neuromorphic MNIST Neuromorphic? Analyzing the Discriminative Power of Neuromorphic Datasets in the Time Domain. Front Neurosci 2021;15:608567. [PMID: 33841072] [PMCID: PMC8027306] [DOI: 10.3389/fnins.2021.608567]
Abstract
A major characteristic of spiking neural networks (SNNs) over conventional artificial neural networks (ANNs) is their ability to spike, enabling them to use spike timing for coding and efficient computing. In this paper, we assess whether neuromorphic datasets recorded from static images are able to evaluate the ability of SNNs to use spike timing in their calculations. We have analyzed N-MNIST, N-Caltech101, and DvsGesture along these lines, but focus our study on N-MNIST. First, we evaluate whether additional information is encoded in the time domain in a neuromorphic dataset. We show that an ANN trained with backpropagation on frame-based versions of N-MNIST and N-Caltech101 images achieves 99.23% and 78.01% accuracy. These are comparable to the state of the art, showing that an algorithm that works purely on spatial data can classify these datasets. Second, we compare N-MNIST and DvsGesture on two STDP algorithms: RD-STDP, which can classify only spatial data, and STDP-tempotron, which classifies spatiotemporal data. We demonstrate that RD-STDP performs very well on N-MNIST, while STDP-tempotron performs better on DvsGesture. Since DvsGesture has a temporal dimension, it requires STDP-tempotron, while N-MNIST can be adequately classified by an algorithm that works on spatial data alone. This shows that precise spike timings are not important in N-MNIST, and N-MNIST therefore does not highlight the ability of SNNs to classify temporal data. The conclusions of this paper open the question: what dataset can evaluate the ability of SNNs to classify temporal data?
Affiliation(s)
- Laxmi R. Iyer
- Neuromorphic Computing, Institute of Infocomms Research, A*Star, Singapore, Singapore
- Yansong Chua
- Neuromorphic Computing, Institute of Infocomms Research, A*Star, Singapore, Singapore
- Haizhou Li
- Neuromorphic Computing, Institute of Infocomms Research, A*Star, Singapore, Singapore
- Huawei Technologies Co., Ltd., Shenzhen, China
|
32
|
Wang T, Shi C, Zhou X, Lin Y, He J, Gan P, Li P, Wang Y, Liu L, Wu N, Luo G. CompSNN: A lightweight spiking neural network based on spatiotemporally compressive spike features. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.10.100] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
33
|
Liu Z, Huang B, Wu J, Shi G. Lightweight Convolutional SNN for Address Event Representation Signal Recognition. ARTIF INTELL 2021. [DOI: 10.1007/978-3-030-93046-2_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
34
|
Liu Q, Pan G, Ruan H, Xing D, Xu Q, Tang H. Unsupervised AER Object Recognition Based on Multiscale Spatio-Temporal Features and Spiking Neurons. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:5300-5311. [PMID: 32054587 DOI: 10.1109/tnnls.2020.2966058] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/21/2023]
Abstract
This article proposes an unsupervised address event representation (AER) object recognition approach. The proposed approach consists of a novel multiscale spatio-temporal feature (MuST) representation of input AER events and a spiking neural network (SNN) using spike-timing-dependent plasticity (STDP) for object recognition with MuST. MuST extracts the features contained in both the spatial and temporal information of the AER event flow and forms an informative and compact feature spike representation. We show not only how MuST exploits spikes to convey information more effectively, but also how it benefits recognition using the SNN. The recognition process is performed in an unsupervised manner, which does not require specifying the desired status of every single neuron of the SNN, and thus can be flexibly applied to real-world recognition tasks. The experiments are performed on five AER datasets, including a new one named GESTURE-DVS. Extensive experimental results show the effectiveness and advantages of the proposed approach.
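As a rough illustration of the unsupervised learning ingredient, below is a minimal pair-based STDP update; the constants are illustrative assumptions, and the paper's MuST feature pipeline is not reproduced.

```python
# A minimal sketch of pair-based STDP: a synapse is potentiated when the
# presynaptic spike precedes the postsynaptic one, and depressed otherwise,
# with exponential dependence on the timing difference.
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau=20.0, w_min=0.0, w_max=1.0):
    """Update one synaptic weight from the latest pre/post spike times (ms)."""
    dt = t_post - t_pre
    if dt >= 0:   # pre before post: potentiate
        w += a_plus * np.exp(-dt / tau)
    else:         # post before pre: depress
        w -= a_minus * np.exp(dt / tau)
    return float(np.clip(w, w_min, w_max))

w = 0.5
w = stdp_update(w, t_pre=10.0, t_post=15.0)  # causal pairing -> w increases
```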
|
35
|
Ramesh B, Yang H, Orchard G, Le Thi NA, Zhang S, Xiang C. DART: Distribution Aware Retinal Transform for Event-Based Cameras. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:2767-2780. [PMID: 31144625 DOI: 10.1109/tpami.2019.2919301] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
We introduce a generic visual descriptor, termed the distribution aware retinal transform (DART), that encodes structural context using log-polar grids for event cameras. The DART descriptor is applied to four different problems, namely object classification, tracking, detection, and feature matching: (1) the DART features are directly employed as local descriptors in a bag-of-words classification framework, and testing is carried out on four standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS, N-Caltech101); (2) extending the classification system, tracking is demonstrated using two key novelties: (i) statistical bootstrapping is leveraged with online learning to overcome the low-sample problem during the one-shot learning of the tracker, and (ii) cyclical shifts are induced in the log-polar domain of the DART descriptor to achieve robustness to object scale and rotation variations; (3) to solve the long-term object tracking problem, an object detector is designed using the principle of cluster majority voting; the detection scheme is then combined with the tracker to yield a high intersection-over-union score with augmented ground-truth annotations on the publicly available event camera dataset; (4) finally, the event context encoded by DART greatly simplifies the feature correspondence problem, especially for spatio-temporal slices far apart in time, which has not been explicitly tackled in the event-based vision domain.
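The log-polar structure underlying DART can be sketched as an event histogram over log-spaced rings and angular wedges around a center point; the bin counts and normalization below are assumptions, and the distribution-aware component is omitted.

```python
# A minimal sketch of binning events into a log-polar grid: radial distance
# maps to log-spaced rings and angle maps to wedges, yielding a descriptor
# that is robust to scale (ring shift) and rotation (wedge shift).
import numpy as np

def log_polar_descriptor(x, y, cx, cy, n_rings=8, n_wedges=16, r_max=32.0):
    """Histogram event coordinates over log-spaced rings and angular wedges."""
    dx, dy = np.asarray(x) - cx, np.asarray(y) - cy
    r = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)  # in [-pi, pi)
    keep = (r > 0) & (r <= r_max)
    ring = np.floor(n_rings * np.log(r[keep]) / np.log(r_max)).astype(int)
    ring = np.clip(ring, 0, n_rings - 1)
    wedge = ((theta[keep] + np.pi) / (2 * np.pi) * n_wedges).astype(int) % n_wedges
    desc = np.zeros((n_rings, n_wedges))
    np.add.at(desc, (ring, wedge), 1.0)
    return desc.ravel() / max(desc.sum(), 1.0)  # L1-normalized descriptor

rng = np.random.default_rng(1)
d = log_polar_descriptor(rng.integers(0, 64, 500), rng.integers(0, 64, 500),
                         cx=32, cy=32)
```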
|
36
|
Chen R, Li L. Analyzing and Accelerating the Bottlenecks of Training Deep SNNs With Backpropagation. Neural Comput 2020; 32:2557-2600. [PMID: 32946710 DOI: 10.1162/neco_a_01319] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Spiking neural networks (SNNs), with their event-driven manner of transmitting spikes, consume ultra-low power on neuromorphic chips. However, training deep SNNs remains challenging compared to convolutional neural networks (CNNs), and SNN training algorithms have not achieved the same performance as CNNs. In this letter, we aim to understand the intrinsic limitations of SNN training in order to design better algorithms. First, the pros and cons of typical SNN training algorithms are analyzed, and the spatiotemporal backpropagation (STBP) algorithm is found to have potential for training deep SNNs due to its simplicity and fast convergence. The main bottlenecks of the STBP algorithm are then analyzed, and three conditions for training deep SNNs with STBP are derived. By analyzing the connection between CNNs and SNNs, we propose a weight initialization algorithm to satisfy the three conditions. Moreover, we propose an error minimization method and a modified loss function to further improve training performance. Experimental results show that the proposed method achieves 91.53% accuracy on the CIFAR10 dataset, a 1% accuracy increase over the STBP algorithm, and reduces training on the MNIST dataset to 15 epochs (an over 13-fold speed-up compared to the STBP algorithm). The proposed method also decreases classification latency by over 25 times compared to CNN-SNN conversion algorithms. In addition, the proposed method works robustly for very deep SNNs, while the STBP algorithm fails in a 19-layer SNN.
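STBP-style training rests on backpropagating through the non-differentiable spike with a surrogate gradient; the sketch below shows that trick in isolation. The rectangular surrogate and its width are illustrative assumptions, not the paper's initialization or loss modifications.

```python
# A minimal sketch of the surrogate-gradient trick: a hard threshold in the
# forward pass, a smooth (here rectangular) surrogate in the backward pass.
import torch

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v, threshold=1.0):
        ctx.save_for_backward(v)
        ctx.threshold = threshold
        return (v >= threshold).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Rectangular surrogate: pass gradients only near the threshold.
        surrogate = (torch.abs(v - ctx.threshold) < 0.5).float()
        return grad_out * surrogate, None

v = torch.randn(4, requires_grad=True)   # membrane potentials
spikes = SpikeFn.apply(v)
spikes.sum().backward()                  # gradients flow despite the step
```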
Affiliation(s)
- Ruizhi Chen
- State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190, and University of Chinese Academy of Sciences, Beijing, China 100049
- Ling Li
- State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190, and University of Chinese Academy of Sciences, Beijing, China 100049
|
37
|
Bi Y, Chadha A, Abbas A, Bourtsoulatze E, Andreopoulos Y. Graph-based Spatio-Temporal Feature Learning for Neuromorphic Vision Sensing. IEEE TRANSACTIONS ON IMAGE PROCESSING 2020; 29:9084-9098. [PMID: 32941136 DOI: 10.1109/tip.2020.3023597] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a. "spikes") in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, feature representation for NVS is far behind its APS-based counterparts, resulting in lower performance in high-level computer vision tasks. To fully utilize its sparse and asynchronous nature, we propose a compact graph representation for NVS, which allows for end-to-end learning with graph convolution neural networks. We couple this with a novel end-to-end feature learning framework that accommodates both appearance-based and motion-based tasks. The core of our framework comprises a spatial feature learning module, which utilizes residual-graph convolutional neural networks (RG-CNN), for end-to-end learning of appearance-based features directly from graphs. We extend this with our proposed Graph2Grid block and temporal feature learning module for efficiently modelling temporal dependencies over multiple graphs and a long temporal extent. We show how our framework can be configured for object classification, action recognition, and action similarity labeling. Importantly, our approach preserves the spatial and temporal coherence of spike events, while requiring less computation and memory. The experimental validation shows that our proposed framework outperforms all recent methods on standard datasets. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we introduce, evaluate, and make available the American Sign Language letters dataset (ASL-DVS), as well as human action datasets (UCF101-DVS, HMDB51-DVS, and ASLAN-DVS).
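A radius-neighborhood graph over raw events, the kind of compact representation fed to graph convolution networks, can be sketched as below. The radius, time scaling, and node features are assumptions, and the brute-force O(n^2) neighbor search is for clarity only.

```python
# A minimal sketch of turning sparse events into a graph: nodes are events,
# edges connect spatiotemporal neighbors within a radius.
import numpy as np

def events_to_graph(x, y, t, polarity, radius=3.0, time_scale=1e-3):
    """Return node features and an edge list for a radius-neighborhood graph."""
    # Scale time so one unit of time is comparable to one pixel.
    pts = np.stack([x, y, t * time_scale], axis=1).astype(np.float32)
    nodes = np.concatenate([pts, polarity[:, None].astype(np.float32)], axis=1)
    edges = []
    for i in range(len(pts)):  # brute force; a k-d tree would scale better
        d = np.linalg.norm(pts - pts[i], axis=1)
        for j in np.nonzero((d < radius) & (d > 0))[0]:
            edges.append((i, int(j)))
    return nodes, np.array(edges, dtype=np.int64)

rng = np.random.default_rng(2)
n = 200
nodes, edges = events_to_graph(rng.integers(0, 64, n), rng.integers(0, 64, n),
                               np.sort(rng.uniform(0, 1e4, n)),
                               rng.integers(0, 2, n))
```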
|
38
|
Xiao R, Tang H, Ma Y, Yan R, Orchard G. An Event-Driven Categorization Model for AER Image Sensors Using Multispike Encoding and Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:3649-3657. [PMID: 31714243 DOI: 10.1109/tnnls.2019.2945630] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, we present a systematic computational model to explore brain-based computation for object recognition. The model extracts temporal features embedded in address-event representation (AER) data and discriminates different objects by using spiking neural networks (SNNs). We use multispike encoding to extract temporal features contained in the AER data. These temporal patterns are then learned through the tempotron learning rule. The presented model is consistently implemented in a temporal learning framework, where the precise timing of spikes is considered in the feature-encoding and learning process. A noise-reduction method is also proposed, which calculates the correlation of an event with its surrounding spatial neighborhood based on the recently proposed time-surface technique. The model, evaluated on a wide spectrum of datasets (MNIST, N-MNIST, MNIST-DVS, AER Posture, and Poker Card), demonstrates superior recognition performance, especially for events with noise.
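A minimal tempotron-style readout, the learning rule named in the abstract, might look as follows; the kernel time constants and threshold are illustrative assumptions, and the multispike encoding front end is not reproduced.

```python
# A minimal tempotron-style readout: a leaky integrator sums postsynaptic
# potential (PSP) kernels from input spikes and classifies by whether the
# peak potential crosses threshold.
import numpy as np

TAU, TAU_S = 15.0, 3.75  # membrane / synaptic time constants (ms)
T_PEAK = TAU * TAU_S / (TAU - TAU_S) * np.log(TAU / TAU_S)
V0 = 1.0 / (np.exp(-T_PEAK / TAU) - np.exp(-T_PEAK / TAU_S))  # peak -> 1

def kernel(s):
    """Double-exponential PSP kernel, zero for s <= 0."""
    s = np.asarray(s, dtype=float)
    return np.where(s > 0, V0 * (np.exp(-s / TAU) - np.exp(-s / TAU_S)), 0.0)

def potential(weights, spikes, t_grid):
    """Membrane potential; spikes is one array of spike times per synapse."""
    v = np.zeros_like(t_grid)
    for w, times in zip(weights, spikes):
        for ti in times:
            v += w * kernel(t_grid - ti)
    return v

t = np.linspace(0.0, 100.0, 1001)
v = potential(np.array([0.6, -0.2, 0.4]),
              [[10.0, 30.0], [20.0], [15.0, 40.0]], t)
decision = v.max() >= 1.0  # fire / no-fire; errors drive weight updates
```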
|
39
|
Comparing SNNs and RNNs on neuromorphic vision datasets: Similarities and differences. Neural Netw 2020; 132:108-120. [PMID: 32866745 DOI: 10.1016/j.neunet.2020.08.001] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2020] [Revised: 06/13/2020] [Accepted: 08/03/2020] [Indexed: 01/16/2023]
Abstract
Neuromorphic data, recording frameless spike events, have attracted considerable attention for their spatiotemporal information components and event-driven processing fashion. Spiking neural networks (SNNs) represent a family of event-driven models with spatiotemporal dynamics for neuromorphic computing, which are widely benchmarked on neuromorphic data. Interestingly, researchers in the machine learning community can argue that recurrent (artificial) neural networks (RNNs) also have the capability to extract spatiotemporal features, although they are not event-driven. Thus, the question of what will happen if we benchmark these two kinds of models together on neuromorphic data arises but remains open. In this work, we make a systematic study comparing SNNs and RNNs on neuromorphic data, taking vision datasets as a case study. First, we identify the similarities and differences between SNNs and RNNs (including vanilla RNNs and LSTM) from the modeling and learning perspectives. To improve comparability and fairness, we unify the supervised learning algorithm based on backpropagation through time (BPTT), the loss function exploiting the outputs at all timesteps, the network structure with stacked fully connected or convolutional layers, and the hyperparameters during training. In particular, given the mainstream loss function used in RNNs, we modify it, inspired by the rate coding scheme, to approach that of SNNs. Furthermore, we tune the temporal resolution of the datasets to test model robustness and generalization. Finally, a series of comparative experiments is conducted on two types of neuromorphic datasets: DVS-converted (N-MNIST) and DVS-captured (DVS Gesture). Extensive insights regarding recognition accuracy, feature extraction, temporal resolution and contrast, learning generalization, computational complexity, and parameter volume are provided, which are beneficial for model selection on different workloads and even for the invention of novel neural models in the future.
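The rate-coding-inspired loss modification can be sketched as averaging the readout over all timesteps before computing the loss, rather than using only the last step; the shapes below are assumptions.

```python
# A minimal sketch of a rate-coding-style loss: per-timestep logits are
# averaged over the sequence before the cross-entropy, which mirrors how
# rate-coded SNN readouts pool spike counts over time.
import torch
import torch.nn.functional as F

def rate_coded_loss(outputs, target):
    """outputs: (T, batch, classes) per-timestep logits; target: (batch,)."""
    mean_logits = outputs.mean(dim=0)  # average over the T timesteps
    return F.cross_entropy(mean_logits, target)

T, B, C = 10, 4, 5
loss = rate_coded_loss(torch.randn(T, B, C), torch.randint(0, C, (B,)))
```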
|
40
|
Lin S, Xu F, Wang X, Yang W, Yu L. Efficient Spatial-Temporal Normalization of SAE Representation for Event Camera. IEEE Robot Autom Lett 2020. [DOI: 10.1109/lra.2020.2995332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
41
|
Deng Y, Li Y, Chen H. AMAE: Adaptive Motion-Agnostic Encoder for Event-Based Object Classification. IEEE Robot Autom Lett 2020. [DOI: 10.1109/lra.2020.3002480] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
42
|
Kheradpisheh SR, Masquelier T. Temporal Backpropagation for Spiking Neural Networks with One Spike per Neuron. Int J Neural Syst 2020; 30:2050027. [DOI: 10.1142/s0129065720500276] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
We propose a new supervised learning rule for multilayer spiking neural networks (SNNs) that use a form of temporal coding known as rank-order coding. With this coding scheme, all neurons fire exactly one spike per stimulus, but the firing order carries information. In particular, in the readout layer, the first neuron to fire determines the class of the stimulus. We derive a new learning rule for this sort of network, named S4NN, akin to traditional error backpropagation yet based on latencies. We show how approximated error gradients can be computed backward in a feedforward network with any number of layers. This approach reaches state-of-the-art performance for supervised, fully connected multilayer SNNs: test accuracy of 97.4% on the MNIST dataset and 99.2% on the Caltech Face/Motorbike dataset. Yet the neuron model we use, the non-leaky integrate-and-fire neuron, is much simpler than those used in previous works. The source code of the proposed S4NN is publicly available at https://github.com/SRKH/S4NN .
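Rank-order readout reduces to a time-to-first-spike argmin; a minimal sketch, with an assumed intensity-to-latency encoding (not necessarily the one used by S4NN), is given below.

```python
# A minimal sketch of time-to-first-spike (rank-order) coding: each neuron
# fires once, stronger drive means earlier firing, and the earliest readout
# neuron determines the class.
import numpy as np

def ttfs_encode(values, t_max=256):
    """Stronger inputs fire earlier: latency = t_max * (1 - value)."""
    v = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)
    return np.round(t_max * (1.0 - v)).astype(int)

def first_spike_class(output_latencies):
    """The readout neuron with the smallest latency wins."""
    return int(np.argmin(output_latencies))

latencies = ttfs_encode([0.1, 0.9, 0.4])  # neuron 1 is most strongly driven
print(first_spike_class(latencies))       # -> 1
```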
Affiliation(s)
- Saeed Reza Kheradpisheh
- Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
|
43
|
Jiang R, Mou X, Shi S, Zhou Y, Wang Q, Dong M, Chen S. Object tracking on event cameras with offline–online learning. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2020. [DOI: 10.1049/trit.2019.0107] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Affiliation(s)
- Rui Jiang
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075, Singapore
- Xiaozheng Mou
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075, Singapore
- Shunshun Shi
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075, Singapore
- Yueyin Zhou
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075, Singapore
- Qinyi Wang
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075, Singapore
- Meng Dong
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075, Singapore
- Shoushun Chen
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075, Singapore
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
|
44
|
Maro JM, Ieng SH, Benosman R. Event-Based Gesture Recognition With Dynamic Background Suppression Using Smartphone Computational Capabilities. Front Neurosci 2020; 14:275. [PMID: 32327968 PMCID: PMC7160298 DOI: 10.3389/fnins.2020.00275] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2019] [Accepted: 03/10/2020] [Indexed: 11/13/2022] Open
Abstract
In this paper, we introduce a framework for dynamic gesture recognition with background suppression that operates on the output of a moving event-based camera. The system is designed to run in real time using only the computational capabilities of a mobile phone. It introduces a new development around the concept of time-surfaces, and it presents a novel event-based methodology for dynamically removing backgrounds that exploits the high temporal resolution of event-based cameras. To our knowledge, this is the first Android event-based framework for vision-based recognition of dynamic gestures running on a smartphone without off-board processing. We assess performance across several indoor and outdoor scenarios, under both static and dynamic conditions and uncontrolled lighting. We also introduce a new event-based dataset for gesture recognition with static and dynamic backgrounds (made publicly available). The set of gestures was selected following a clinical trial to allow human-machine interaction for the visually impaired and older adults. We finally report comparisons with prior work on event-based gesture recognition, achieving comparable results without advanced classification techniques or power-hungry hardware.
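Time-surfaces, the representation the framework builds on, can be sketched as an exponentially decayed map of per-pixel last-event times; the decay constant and sizes below are illustrative assumptions.

```python
# A minimal sketch of a time-surface: keep the most recent event timestamp
# at each pixel, then evaluate an exponential decay of elapsed time at a
# query instant, so recently active pixels are close to 1.
import numpy as np

def update_last_timestamp(last_t, x, y, t):
    """Track the most recent event time per pixel (in-place)."""
    last_t[y, x] = t
    return last_t

def time_surface(last_t, t_now, tau=50e-3):
    """Exponentially decayed recency map; silent pixels stay at 0."""
    age = t_now - last_t
    surface = np.exp(-age / tau)
    surface[last_t < 0] = 0.0  # pixels that never fired
    return surface

H = W = 32
last_t = np.full((H, W), -1.0)
for x, y, t in [(3, 4, 0.01), (3, 5, 0.02), (10, 10, 0.05)]:
    update_last_timestamp(last_t, x, y, t)
ts = time_surface(last_t, t_now=0.06)
```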
Affiliation(s)
- Sio-Hoi Ieng
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France
- Ryad Benosman
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France
- Departments of Ophthalmology/ECE/BioE, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Computer Science, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, United States
|
45
|
Bergner F, Dean-Leon E, Cheng G. Design and Realization of an Efficient Large-Area Event-Driven E-Skin. SENSORS (BASEL, SWITZERLAND) 2020; 20:E1965. [PMID: 32244511 PMCID: PMC7180917 DOI: 10.3390/s20071965] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/07/2020] [Revised: 03/26/2020] [Accepted: 03/27/2020] [Indexed: 12/20/2022]
Abstract
The sense of touch enables us to safely interact with and control our contacts with our surroundings. Many technical systems and applications could profit from a similar sense. Yet, despite the emergence of e-skin systems covering more extensive areas, large-area realizations of e-skin that effectively boost applications are still rare. Recent advancements have improved the deployability and robustness of e-skin systems, laying the basis for their scalability. However, upscaling e-skin systems introduces yet another challenge: handling a large amount of heterogeneous tactile information with complex spatial relations between sensing points. We targeted this challenge and proposed an event-driven approach for large-area skin systems. While our previous works focused on the implementation and experimental validation of the approach, this work provides the consolidated foundations for realizing, designing, and understanding large-area event-driven e-skin systems for effective applications. This work homogenizes the different perspectives on event-driven systems and assesses the applicability of existing event-driven implementations in large-area skin systems. Additionally, we provide novel guidelines for tuning the novelty threshold of event generators. Overall, this work develops a systematic approach towards realizing a flexible event-driven information handling system on standard computer systems for large-scale e-skin, with detailed descriptions of the effective design of event generators and decoders. All designs and guidelines are validated by outlining their impact on our implementations and by consolidating various experimental results. The resulting system design for e-skin systems is scalable, efficient, flexible, and capable of handling large amounts of information without customized hardware. The system makes complex large-area tactile applications feasible, for instance in robotics.
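A send-on-delta event generator with a novelty threshold, the building block whose tuning the paper discusses, can be sketched as follows; the threshold value and test signal are illustrative assumptions.

```python
# A minimal sketch of a send-on-delta event generator: a sensor sample is
# transmitted only when it deviates from the last transmitted value by more
# than a novelty threshold, so a steady signal produces no traffic.
import numpy as np

class EventGenerator:
    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.last_sent = None

    def step(self, sample):
        """Return the sample if it is novel enough, else None."""
        if self.last_sent is None or abs(sample - self.last_sent) > self.threshold:
            self.last_sent = sample
            return sample
        return None

gen = EventGenerator(threshold=0.05)
signal = (0.5 + 0.04 * np.sin(np.linspace(0, 6.28, 100))
          + np.where(np.arange(100) == 60, 0.3, 0.0))  # step change at i=60
events = [(i, v) for i, v in enumerate(signal) if gen.step(v) is not None]
# Only the initial sample and the large deviation around i=60 emit events;
# the small sinusoidal ripple stays below the novelty threshold.
```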
Affiliation(s)
- Florian Bergner
- Institute for Cognitive Systems (ICS), Technische Universität München, Arcisstraße 21, 80333 München, Germany
|
46
|
Afshar S, Ralph N, Xu Y, Tapson J, van Schaik A, Cohen G. Event-Based Feature Extraction Using Adaptive Selection Thresholds. SENSORS 2020; 20:s20061600. [PMID: 32183052 PMCID: PMC7146588 DOI: 10.3390/s20061600] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/16/2020] [Revised: 03/07/2020] [Accepted: 03/08/2020] [Indexed: 11/25/2022]
Abstract
Unsupervised feature extraction algorithms form one of the most important building blocks in machine learning systems. These algorithms are often adapted to the event-based domain to perform online learning in neuromorphic hardware. However, not being designed for this purpose, such algorithms typically require significant simplification during implementation to meet hardware constraints, creating trade-offs with performance. Furthermore, conventional feature extraction algorithms are not designed to generate the useful intermediary signals that become valuable in the context of neuromorphic hardware limitations. In this work, a novel event-based feature extraction method is proposed that focuses on these issues. The algorithm operates via simple adaptive selection thresholds, which allow a simpler implementation of network homeostasis than previous works at the cost of a small amount of information loss, in the form of missed events that fall outside the selection thresholds. The behavior of the selection thresholds and the output of the network as a whole are shown to provide uniquely useful signals that indicate network weight convergence without the need to access network weights. A novel heuristic method for network size selection is proposed which makes use of noise events and their feature representations. The use of selection thresholds is shown to produce network activation patterns that predict classification accuracy, allowing rapid evaluation and optimization of system parameters without the need to run back-end classifiers. The feature extraction method is tested on both the N-MNIST (Neuromorphic-MNIST) benchmarking dataset and a dataset of airplanes passing through the field of view. Multiple configurations with different classifiers are tested, with the results quantifying the performance gains at each processing stage.
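The adaptive selection-threshold idea can be sketched as nearest-feature matching gated by per-feature thresholds that tighten on matches and relax on misses; the update constants below are assumptions, not the paper's rule.

```python
# A minimal sketch of adaptive selection thresholds: an input patch is
# assigned to its best-matching feature only if the match distance falls
# inside that feature's threshold; thresholds then adapt so each feature
# stays selective (a simple form of homeostasis).
import numpy as np

class AdaptiveThresholdFeatures:
    def __init__(self, n_features, dim, rng, thresh0=5.0):
        self.w = rng.normal(size=(n_features, dim))
        self.thresh = np.full(n_features, thresh0)

    def present(self, patch, lr=0.05, shrink=0.02, grow=0.01):
        d = np.linalg.norm(self.w - patch, axis=1)
        best = int(np.argmin(d))
        if d[best] < self.thresh[best]:
            # Match: move the feature toward the patch, tighten its threshold.
            self.w[best] += lr * (patch - self.w[best])
            self.thresh[best] -= shrink * self.thresh[best]
            return best
        # Miss: relax all thresholds slightly so features stay reachable.
        self.thresh += grow * self.thresh
        return None  # the dropped event is the traded-off information loss

rng = np.random.default_rng(3)
bank = AdaptiveThresholdFeatures(n_features=16, dim=25, rng=rng)
hits = [bank.present(rng.normal(size=25)) for _ in range(200)]
```

The threshold trajectories themselves are the kind of intermediary signal the abstract highlights: they can indicate convergence without inspecting the weights.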
|
47
|
Ramesh B, Ussa A, Della Vedova L, Yang H, Orchard G. Low-Power Dynamic Object Detection and Classification With Freely Moving Event Cameras. Front Neurosci 2020; 14:135. [PMID: 32153357 PMCID: PMC7044237 DOI: 10.3389/fnins.2020.00135] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2019] [Accepted: 02/03/2020] [Indexed: 11/13/2022] Open
Abstract
We present the first purely event-based, energy-efficient approach for dynamic object detection and categorization with a freely moving event camera. Compared to traditional cameras, event-based object recognition systems are considerably behind in terms of accuracy and algorithmic maturity. To this end, this paper presents an event-based feature extraction method devised by accumulating local activity across the image frame and then applying principal component analysis (PCA) to the normalized neighborhood region. Subsequently, we propose a backtracking-free k-d tree mechanism for efficient feature matching that takes advantage of the low dimensionality of the feature representation. Additionally, the proposed k-d tree mechanism allows for feature selection to obtain a lower-dimensional object representation when hardware resources are too limited to implement PCA. Consequently, the proposed system can be realized on a field-programmable gate array (FPGA) device, leading to a high performance-to-resource ratio. The proposed system is tested on real-world event-based datasets for object categorization, showing superior classification performance compared to state-of-the-art algorithms. Additionally, we verified the real-time FPGA performance of the proposed object detection method, trained with limited data as opposed to deep learning methods, under a closed-loop aerial vehicle flight mode. We also compare the proposed object categorization framework to pre-trained convolutional neural networks using transfer learning and highlight the drawbacks of using frame-based sensors under dynamic camera motion. Finally, we provide critical insights into the influence of the feature extraction method and the classification parameters on system performance, which aids in adapting the framework to various low-power (less than a few watts) application scenarios.
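The PCA-descriptor-plus-tree-matching pipeline can be approximated as below with an ordinary k-d tree. Note that SciPy's cKDTree does use backtracking; it merely stands in here for the paper's hardware-friendly backtracking-free variant, and the patch construction is an assumption.

```python
# A minimal sketch of the descriptor-matching idea: PCA compresses local
# event-activity patches to a low-dimensional code, and a k-d tree matches
# codes to stored prototypes.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(4)
patches = rng.normal(size=(500, 49))            # stand-ins for 7x7 patches
mean = patches.mean(axis=0)

# PCA via SVD of the centered patch matrix; keep the top 8 components.
_, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
components = vt[:8]

def encode(p):
    """Project 49-D patches to 8-D PCA codes."""
    return (p - mean) @ components.T

prototypes = encode(patches[:40])                 # stored object features
tree = cKDTree(prototypes)
dist, idx = tree.query(encode(patches[100:110]))  # nearest-prototype match
```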
Affiliation(s)
- Bharath Ramesh
- Life Science Institute, The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Temasek Laboratories, National University of Singapore, Singapore, Singapore
- Andrés Ussa
- Life Science Institute, The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Temasek Laboratories, National University of Singapore, Singapore, Singapore
- Luca Della Vedova
- Temasek Laboratories, National University of Singapore, Singapore, Singapore
- Hong Yang
- Temasek Laboratories, National University of Singapore, Singapore, Singapore
- Garrick Orchard
- Life Science Institute, The N.1 Institute for Health, National University of Singapore, Singapore, Singapore
- Temasek Laboratories, National University of Singapore, Singapore, Singapore
|
48
|
Xu Q, Peng J, Shen J, Tang H, Pan G. Deep CovDenseSNN: A hierarchical event-driven dynamic framework with spiking neurons in noisy environment. Neural Netw 2020; 121:512-519. [DOI: 10.1016/j.neunet.2019.08.034] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2019] [Revised: 08/20/2019] [Accepted: 08/25/2019] [Indexed: 11/29/2022]
|
49
|
Lin J, Yuan JS. A scalable and reconfigurable in-memory architecture for ternary deep spiking neural network with ReRAM based neurons. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.09.082] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
50
|
Roy K, Jaiswal A, Panda P. Towards spike-based machine intelligence with neuromorphic computing. Nature 2019; 575:607-617. [PMID: 31776490 DOI: 10.1038/s41586-019-1677-2] [Citation(s) in RCA: 410] [Impact Index Per Article: 68.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2018] [Accepted: 07/09/2019] [Indexed: 11/08/2022]
Abstract
Guided by brain-like 'spiking' computational frameworks, neuromorphic computing (brain-inspired computing for machine intelligence) promises to realize artificial intelligence while reducing the energy requirements of computing platforms. This interdisciplinary field began with the implementation of silicon circuits for biological neural routines, but has evolved to encompass the hardware implementation of algorithms with spike-based encoding and event-driven representations. Here we provide an overview of the developments in neuromorphic computing for both algorithms and hardware and highlight the fundamentals of learning and hardware frameworks. We discuss the main challenges and the future prospects of neuromorphic computing, with emphasis on algorithm-hardware codesign.
|