1. Tan G, Wan Z, Wang Y, Cao Y, Zha ZJ. Tackling Event-Based Lip-Reading by Exploring Multigrained Spatiotemporal Clues. IEEE Trans Neural Netw Learn Syst 2025;36:8279-8291. [PMID: 39288038] [DOI: 10.1109/tnnls.2024.3440495]
Abstract
Automatic lip-reading (ALR) is the task of recognizing words based on visual information obtained from the speaker's lip movements. In this study, we introduce event cameras, a novel type of sensing device, for ALR. Event cameras offer both technical and application advantages over conventional cameras for ALR due to their higher temporal resolution, less redundant visual information, and lower power consumption. To recognize words from the event data, we propose a novel multigrained spatiotemporal feature learning framework, which is capable of perceiving fine-grained spatiotemporal features from microsecond time-resolved event data. Specifically, we first convert the event data into event frames of multiple temporal resolutions to avoid losing too much visual information at the event representation stage. These frames are then fed into a multibranch subnetwork, where the branch operating on low-rate frames can perceive spatially complete but temporally coarse features, while the branch operating on high-rate frames can perceive spatially coarse but temporally fine features. Thus, fine-grained spatial and temporal features can be simultaneously learned by integrating the features perceived by different branches. Furthermore, to model the temporal relationships in the event stream, we design a temporal aggregation subnetwork to aggregate the features perceived by the multibranch subnetwork. In addition, we collect two event-based lip-reading datasets (DVS-Lip and DVS-LRW100) for the study of the event-based lip-reading task. Experimental results demonstrate the superiority of the proposed model over the state-of-the-art event-based action recognition models and video-based lip-reading models.
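The representation stage described above, converting an event stream into event frames at several temporal resolutions before the multibranch subnetwork, can be illustrated with a short sketch. This is not the authors' code; it is a minimal NumPy illustration assuming events are (timestamp, x, y, polarity) rows, and the frame counts and sensor size below are arbitrary choices.

```python
import numpy as np

def events_to_frames(events, sensor_hw, num_frames):
    """Accumulate polarity-signed event counts into `num_frames` frames
    spanning the duration of the event stream (one frame per time bin)."""
    t, x, y, p = (events[:, 0], events[:, 1].astype(int),
                  events[:, 2].astype(int), events[:, 3])
    frames = np.zeros((num_frames, *sensor_hw), dtype=np.float32)
    # Map each timestamp to a frame index over the stream's duration.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    idx = np.minimum((t_norm * num_frames).astype(int), num_frames - 1)
    # Signed accumulation: ON events add +1, OFF events add -1.
    np.add.at(frames, (idx, y, x), np.where(p > 0, 1.0, -1.0))
    return frames

# Toy event stream: columns are (timestamp_us, x, y, polarity).
rng = np.random.default_rng(0)
events = np.stack([np.sort(rng.uniform(0, 1e6, 5000)),   # timestamps
                   rng.integers(0, 128, 5000),            # x
                   rng.integers(0, 96, 5000),             # y
                   rng.integers(0, 2, 5000)], axis=1)     # polarity

# Two temporal resolutions feeding two hypothetical branches:
low_rate = events_to_frames(events, (96, 128), num_frames=7)    # temporally coarse
high_rate = events_to_frames(events, (96, 128), num_frames=28)  # temporally fine
print(low_rate.shape, high_rate.shape)   # (7, 96, 128) (28, 96, 128)
```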

2. Zhang S, Zha F, Wang X, Li M, Guo W, Wang P, Li X, Sun L. High-efficiency sparse convolution operator for event-based cameras. Front Neurorobot 2025;19:1537673. [PMID: 40144017] [PMCID: PMC11936924] [DOI: 10.3389/fnbot.2025.1537673]
Abstract
Event-based cameras are bio-inspired vision sensors that mimic the sparse and asynchronous activation of the animal retina, offering advantages such as low latency and low computational load in various robotic applications. However, despite their inherent sparsity, most existing visual processing algorithms are optimized for conventional standard cameras and dense images captured from them, resulting in computational redundancy and high latency when applied to event-based cameras. To address this gap, we propose a sparse convolution operator tailored for event-based cameras. By selectively skipping invalid sub-convolutions and efficiently reorganizing valid computations, our operator reduces computational workload by nearly 90% and achieves almost 2× acceleration in processing speed, while maintaining the same accuracy as dense convolution operators. This innovation unlocks the potential of event-based cameras in applications such as autonomous navigation, real-time object tracking, and industrial inspection, enabling low-latency and high-efficiency perception in resource-constrained robotic systems.
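The core idea described above, skipping sub-convolutions on inactive pixels and computing only where events occurred, can be sketched as follows. This is a naive NumPy illustration of sparsity-aware convolution, not the authors' operator; the function names and the roughly 90%-sparse toy input are assumptions for the example.

```python
import numpy as np

def dense_conv2d(img, kernel):
    """Reference dense 'valid' convolution (correlation) for comparison."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1), dtype=img.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def sparse_conv2d(img, kernel):
    """Compute the same result, but scatter contributions only from
    non-zero (active) pixels, skipping all-zero regions entirely."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1), dtype=img.dtype)
    ys, xs = np.nonzero(img)                    # active sites only
    for y, x, v in zip(ys, xs, img[ys, xs]):
        # Each active pixel (y, x) with value v contributes v * kernel[di, dj]
        # to output location (y - di, x - dj).
        for di in range(kh):
            for dj in range(kw):
                oy, ox = y - di, x - dj
                if 0 <= oy < out.shape[0] and 0 <= ox < out.shape[1]:
                    out[oy, ox] += v * kernel[di, dj]
    return out

rng = np.random.default_rng(1)
frame = np.zeros((64, 64), dtype=np.float32)
mask = rng.random((64, 64)) < 0.1               # ~90% of pixels inactive
frame[mask] = rng.random(mask.sum()).astype(np.float32)
k = rng.random((3, 3)).astype(np.float32)

assert np.allclose(dense_conv2d(frame, k), sparse_conv2d(frame, k), atol=1e-5)
```

In a production operator the per-pixel Python loop would of course be replaced by a reorganized batched computation, which is the part the paper addresses; the sketch only shows why skipping inactive sites preserves the dense result.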
Affiliation(s)
- Sen Zhang: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Fusheng Zha: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China; Lanzhou University of Technology, Lanzhou, China
- Xiangji Wang: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Mantian Li: Institute of Intelligent Manufacturing Technology, Shenzhen Polytechnic University, Shenzhen, China
- Wei Guo: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Pengfei Wang: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China
- Xiaolin Li: Institute of Intelligent Manufacturing Technology, Shenzhen Polytechnic University, Shenzhen, China
- Lining Sun: State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin, China

3. Silva DA, Smagulova K, Elsheikh A, Fouda ME, Eltawil AM. A recurrent YOLOv8-based framework for event-based object detection. Front Neurosci 2025;18:1477979. [PMID: 39911408] [PMCID: PMC11794834] [DOI: 10.3389/fnins.2024.1477979]
Abstract
Object detection plays a crucial role in various cutting-edge applications, such as autonomous vehicles and advanced robotics systems, which primarily rely on conventional frame-based RGB sensors. However, these sensors face challenges such as motion blur and poor performance under extreme lighting conditions. Novel event-based cameras, inspired by biological vision systems, offer a promising solution with superior performance in fast-motion and challenging lighting environments while consuming less power. This work explores the integration of event-based cameras with advanced object detection frameworks, introducing Recurrent YOLOv8 (ReYOLOv8), a refined object detection framework that enhances a leading frame-based YOLO detection system with spatiotemporal modeling capabilities by adding recurrency. ReYOLOv8 incorporates a low-latency, memory-efficient method for encoding event data called Volume of Ternary Event Images (VTEI) and introduces a novel data augmentation technique based on Random Polarity Suppression (RPS), optimized for event-based sensors and tailored to leverage the unique attributes of event data. The framework was evaluated on two comprehensive event-based datasets: Prophesee's Generation 1 (GEN1) and Person Detection for Robotics (PEDRo). On the GEN1 dataset, ReYOLOv8 achieved mAP improvements of 5%, 2.8%, and 2.5% across the nano, small, and medium scales, respectively, while reducing trainable parameters by 4.43% on average and maintaining real-time processing times between 9.2 ms and 15.5 ms. For the PEDRo dataset, ReYOLOv8 demonstrated mAP improvements ranging from 9% to 18%, with models reduced in size by factors of 14.5× and 3.8× and an average speed improvement of 1.67×. The results demonstrate the significant potential of bio-inspired event-based vision sensors when combined with advanced object detection frameworks. In particular, the ReYOLOv8 system effectively bridges the gap between biological principles of vision and artificial intelligence, enabling robust and efficient visual processing in dynamic and complex environments. The code is available on GitHub at https://github.com/silvada95/ReYOLOv8.
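As a rough illustration of the kind of encoding the abstract names (a ternary event volume, where each spatio-temporal bin holds only -1, 0, or +1), here is a minimal NumPy sketch. The exact VTEI definition is not spelled out in the abstract, so the binning rule below (the most recent event's polarity wins within a bin) is an assumption for illustration, not the published method.

```python
import numpy as np

def ternary_event_volume(events, sensor_hw, num_bins):
    """Build a (num_bins, H, W) volume with values in {-1, 0, +1}:
    0 where no event fell in the bin, otherwise the polarity of the
    most recent event in that bin (an assumed rule for illustration)."""
    t, x, y, p = (events[:, 0], events[:, 1].astype(int),
                  events[:, 2].astype(int), events[:, 3])
    vol = np.zeros((num_bins, *sensor_hw), dtype=np.int8)
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    b = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    for i in np.argsort(t):                 # assign in time order so the
        vol[b[i], y[i], x[i]] = 1 if p[i] > 0 else -1   # latest event wins
    return vol

rng = np.random.default_rng(2)
n = 2000
events = np.stack([np.sort(rng.uniform(0, 50e3, n)),
                   rng.integers(0, 304, n), rng.integers(0, 240, n),
                   rng.integers(0, 2, n)], axis=1)
vol = ternary_event_volume(events, (240, 304), num_bins=5)
print(vol.shape, np.unique(vol))            # (5, 240, 304) [-1  0  1]
```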
Affiliation(s)
- Diego A. Silva: Communication and Computing Systems Lab, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Kamilya Smagulova: Communication and Computing Systems Lab, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Ahmed Elsheikh: Mathematics and Engineering Physics Department, Faculty of Engineering, Cairo University, Giza, Egypt
- Ahmed M. Eltawil: Communication and Computing Systems Lab, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

4. Gao Y, Lu J, Li S, Li Y, Du S. Hypergraph-Based Multi-View Action Recognition Using Event Cameras. IEEE Trans Pattern Anal Mach Intell 2024;46:6610-6622. [PMID: 38536691] [DOI: 10.1109/tpami.2024.3382117]
Abstract
Action recognition from video data is a cornerstone task with wide-ranging applications. Single-view action recognition faces limitations due to its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advancements in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in multi-view event data exploitation, particularly in challenges like information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, a multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges using rule-based and KNN-based strategies, a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features is established. The vertex attention hypergraph propagation is also introduced for enhanced feature fusion. To prompt research in this area, we present the largest multi-view event-based action dataset, THUMV-EACT-50, comprising 50 actions from 6 viewpoints, which is more than ten times larger than existing datasets. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds the state of the art in frame-based multi-view action recognition.

5. Grimaldi A, Boutin V, Ieng SH, Benosman R, Perrinet LU. A robust event-driven approach to always-on object recognition. Neural Netw 2024;178:106415. [PMID: 38852508] [DOI: 10.1016/j.neunet.2024.106415]
Abstract
We propose a neuromimetic architecture capable of always-on pattern recognition, i.e. at any time during processing. To achieve this, we have extended an existing event-based algorithm (Lagorce et al., 2017), which introduced novel spatio-temporal features as a Hierarchy Of Time-Surfaces (HOTS). Built from asynchronous events captured by a neuromorphic camera, these time surfaces make it possible to encode the local dynamics of a visual scene and to create an efficient event-based pattern recognition architecture. Inspired by neuroscience, we have extended this method to improve its performance. First, we add a homeostatic gain control on the activity of neurons to improve the learning of spatio-temporal patterns (Grimaldi et al., 2021). We also provide a new mathematical formalism that allows an analogy to be drawn between the HOTS algorithm and Spiking Neural Networks (SNN). Following this analogy, we transform the offline pattern categorization method into an online and event-driven layer. This classifier uses the spiking output of the network to define new time surfaces, and the online classification is then performed with a neuromimetic implementation of multinomial logistic regression. These improvements not only consistently increase the performance of the network, but also bring this event-driven pattern recognition algorithm fully online. The results have been validated on different datasets: Poker-DVS (Serrano-Gotarredona and Linares-Barranco, 2015), N-MNIST (Orchard, Jayawant et al., 2015) and DVS Gesture (Amir et al., 2017). This demonstrates the efficiency of this bio-realistic SNN for ultra-fast object recognition through an event-by-event categorization process.
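A time surface of the kind this line of work builds on can be sketched compactly: for each incoming event, read the most recent event time at each neighbouring pixel and apply an exponential decay. The snippet below is a minimal NumPy illustration of that general idea, not the authors' implementation; the decay constant, neighbourhood radius, and the single-polarity memory are arbitrary simplifications.

```python
import numpy as np

def time_surface(last_t, x, y, t, radius=3, tau=50e3):
    """Exponential-decay time surface around event (x, y, t).
    `last_t` holds the most recent event timestamp per pixel (-inf if none)."""
    patch = last_t[y - radius:y + radius + 1, x - radius:x + radius + 1]
    return np.exp((patch - t) / tau)     # values in (0, 1]; 0 where no event yet

H, W = 64, 64
last_t = np.full((H, W), -np.inf)        # per-pixel memory of the last event time
rng = np.random.default_rng(3)

# Feed a toy event stream and compute one time-surface descriptor per event.
ts_features = []
t = 0.0
for _ in range(1000):
    t += rng.exponential(100.0)                        # microseconds between events
    x = int(rng.integers(3, W - 3)); y = int(rng.integers(3, H - 3))
    last_t[y, x] = t                                   # update pixel memory first
    ts_features.append(time_surface(last_t, x, y, t).ravel())

print(np.array(ts_features).shape)       # (1000, 49): one 7x7 descriptor per event
```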
Affiliation(s)
- Antoine Grimaldi: Aix-Marseille Université, Institut de Neurosciences de la Timone, CNRS, Marseille, France
- Victor Boutin: Carney Institute for Brain Science, Brown University, Providence, RI, United States; Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, Toulouse, France
- Sio-Hoi Ieng: Institut de la Vision, Sorbonne Université, CNRS, Paris, France
- Ryad Benosman: Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, United States
- Laurent U Perrinet: Aix-Marseille Université, Institut de Neurosciences de la Timone, CNRS, Marseille, France

6. Wan Z, Tan G, Wang Y, Zhai W, Cao Y, Zha ZJ. Event-Based Optical Flow via Transforming Into Motion-Dependent View. IEEE Trans Image Process 2024;33:5327-5339. [PMID: 39058603] [DOI: 10.1109/tip.2024.3426469]
Abstract
Event cameras respond to temporal dynamics, helping to resolve ambiguities in spatio-temporal changes for optical flow estimation. However, the unique spatio-temporal event distribution challenges the feature extraction, and the direct construction of motion representation through the orthogonal view is less than ideal due to the entanglement of appearance and motion. This paper proposes to transform the orthogonal view into a motion-dependent one for enhancing event-based motion representation and presents a Motion View-based Network (MV-Net) for practical optical flow estimation. Specifically, this motion-dependent view transformation is achieved through the Event View Transformation Module, which captures the relationship between the steepest temporal changes and motion direction, incorporating these temporal cues into the view transformation process for feature gathering. This module includes two phases: extracting the temporal evolution clues by central difference operation in the extraction phase and capturing the motion pattern by evolution-guided deformable convolution in the perception phase. Besides, the MV-Net constructs an eccentric downsampling process to avoid response weakening from the sparsity of events in the downsampling stage. The whole network is trained end-to-end in a self-supervised manner, and the evaluations conducted on four challenging datasets reveal the superior performance of the proposed model compared to state-of-the-art (SOTA) methods.

7. Yu H, Li H, Yang W, Yu L, Xia GS. Detecting Line Segments in Motion-Blurred Images With Events. IEEE Trans Pattern Anal Mach Intell 2024;46:2866-2881. [PMID: 37983154] [DOI: 10.1109/tpami.2023.3334877]
Abstract
Making line segment detectors more reliable under motion blur is one of the most important challenges for practical applications, such as visual SLAM and 3D line mapping. Existing line segment detection methods suffer severe performance degradation in accurately detecting and locating line segments when motion blur occurs. Event data, by contrast, exhibits minimal blur and strong edge awareness at high temporal resolution, characteristics that are complementary to images and potentially beneficial for reliable line segment recognition. To robustly detect line segments under motion blur, we propose to leverage the complementary information of images and events. Specifically, we first design a general frame-event feature fusion network to extract and fuse the detailed image textures and low-latency event edges, which consists of a channel-attention-based shallow fusion module and a self-attention-based dual hourglass module. We then utilize state-of-the-art wireframe parsing networks to detect line segments on the fused feature map. Moreover, due to the lack of line segment detection datasets with pairwise motion-blurred images and events, we contribute two datasets, i.e., the synthetic FE-Wireframe and the realistic FE-Blurframe, for network training and evaluation. Extensive analyses of the component configurations demonstrate the design effectiveness of our fusion network. When compared to the state of the art, the proposed approach achieves the highest detection accuracy while maintaining comparable real-time performance. In addition to being robust to motion blur, our method also exhibits superior performance for line detection under high dynamic range scenes.

8. Schoepe T, Janotte E, Milde MB, Bertrand OJN, Egelhaaf M, Chicca E. Finding the gap: neuromorphic motion-vision in dense environments. Nat Commun 2024;15:817. [PMID: 38280859] [PMCID: PMC10821932] [DOI: 10.1038/s41467-024-45063-y]
Abstract
Animals have evolved mechanisms to travel safely and efficiently within different habitats. On a journey in dense terrains animals avoid collisions and cross narrow passages while controlling an overall course. Multiple hypotheses target how animals solve challenges faced during such travel. Here we show that a single mechanism enables safe and efficient travel. We developed a robot inspired by insects. It has remarkable capabilities to travel in dense terrain, avoiding collisions, crossing gaps and selecting safe passages. These capabilities are accomplished by a neuromorphic network steering the robot toward regions of low apparent motion. Our system leverages knowledge about vision processing and obstacle avoidance in insects. Our results demonstrate how insects might safely travel through diverse habitats. We anticipate our system to be a working hypothesis to study insects' travels in dense terrains. Furthermore, it illustrates that we can design novel hardware systems by understanding the underlying mechanisms driving behaviour.
Affiliation(s)
- Thorben Schoepe: Peter Grünberg Institut 15, Forschungszentrum Jülich, Aachen, Germany; Faculty of Technology and Cognitive Interaction Technology Center of Excellence (CITEC), Bielefeld University, Bielefeld, Germany; Bio-Inspired Circuits and Systems (BICS) Lab, Zernike Institute for Advanced Materials, University of Groningen, Groningen, Netherlands; CogniGron (Groningen Cognitive Systems and Materials Center), University of Groningen, Groningen, Netherlands
- Ella Janotte: Event Driven Perception for Robotics, Italian Institute of Technology, iCub facility, Genoa, Italy
- Moritz B Milde: International Centre for Neuromorphic Systems, MARCS Institute, Western Sydney University, Penrith, Australia
- Martin Egelhaaf: Neurobiology, Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Elisabetta Chicca: Faculty of Technology and Cognitive Interaction Technology Center of Excellence (CITEC), Bielefeld University, Bielefeld, Germany; Bio-Inspired Circuits and Systems (BICS) Lab, Zernike Institute for Advanced Materials, University of Groningen, Groningen, Netherlands; CogniGron (Groningen Cognitive Systems and Materials Center), University of Groningen, Groningen, Netherlands

9. Gao Y, Lu J, Li S, Ma N, Du S, Li Y, Dai Q. Action Recognition and Benchmark Using Event Cameras. IEEE Trans Pattern Anal Mach Intell 2023;45:14081-14097. [PMID: 37527291] [DOI: 10.1109/tpami.2023.3300741]
Abstract
Recent years have witnessed remarkable achievements in video-based action recognition. Apart from traditional frame-based cameras, event cameras are bio-inspired vision sensors that only record pixel-wise brightness changes rather than the brightness value. However, little effort has been made in event-based action recognition, and large-scale public datasets are also nearly unavailable. In this paper, we propose an event-based action recognition framework called EV-ACT. The Learnable Multi-Fused Representation (LMFR) is first proposed to integrate multiple event information in a learnable manner. The LMFR with dual temporal granularity is fed into the event-based slow-fast network for the fusion of appearance and motion features. A spatial-temporal attention mechanism is introduced to further enhance the learning capability of action recognition. To prompt research in this direction, we have collected the largest event-based action recognition benchmark named THUE-ACT-50 and the accompanying THUE-ACT-50-CHL dataset under challenging environments, including a total of over 12,830 recordings from 50 action categories, which is over 4 times the size of the previous largest dataset. Experimental results show that our proposed framework could achieve improvements of over 14.5%, 7.6%, 11.2%, and 7.4% compared to previous works on four benchmarks. We have also deployed our proposed EV-ACT framework on a mobile platform to validate its practicality and efficiency.

10. Schmid D, Jarvers C, Neumann H. Canonical circuit computations for computer vision. Biol Cybern 2023;117:299-329. [PMID: 37306782] [PMCID: PMC10600314] [DOI: 10.1007/s00422-023-00966-9]
Abstract
Advanced computer vision mechanisms have been inspired by neuroscientific findings. However, with the focus on improving benchmark achievements, technical solutions have been shaped by application and engineering constraints. This includes the training of neural networks, which led to the development of feature detectors optimally suited to the application domain. However, the limitations of such approaches motivate the need to identify computational principles, or motifs, in biological vision that can enable further foundational advances in machine vision. We propose to utilize structural and functional principles of neural systems that have been largely overlooked. They potentially provide new inspirations for computer vision mechanisms and models. Recurrent feedforward, lateral, and feedback interactions characterize general principles underlying processing in mammals. We derive a formal specification of core computational motifs that utilize these principles. These are combined to define model mechanisms for visual shape and motion processing. We demonstrate how such a framework can be adopted to run on neuromorphic brain-inspired hardware platforms and can be extended to automatically adapt to environment statistics. We argue that the identified principles and their formalization inspire sophisticated computational mechanisms with improved explanatory scope. These and other elaborated, biologically inspired models can be employed to design computer vision solutions for different tasks, and they can be used to advance neural network architectures of learning.
Affiliation(s)
- Daniel Schmid: Institute for Neural Information Processing, Ulm University, James-Franck-Ring, 89081 Ulm, Germany
- Christian Jarvers: Institute for Neural Information Processing, Ulm University, James-Franck-Ring, 89081 Ulm, Germany
- Heiko Neumann: Institute for Neural Information Processing, Ulm University, James-Franck-Ring, 89081 Ulm, Germany

11. Fu Q. Motion perception based on ON/OFF channels: A survey. Neural Netw 2023;165:1-18. [PMID: 37263088] [DOI: 10.1016/j.neunet.2023.05.031]
Abstract
Motion perception is an essential ability for animals and artificially intelligent systems to interact effectively and safely with surrounding objects and environments. Biological visual systems, which have naturally evolved over hundreds of millions of years, are efficient and robust at motion perception, whereas artificial vision systems remain far from such capability. This paper argues that the gap can be significantly reduced by formulating ON/OFF channels in motion perception models, which encode luminance increment (ON) and decrement (OFF) responses within the receptive field separately. Such a signal-bifurcating structure has been found in the neural systems of many animal species, indicating that early motion information is split and processed in segregated pathways. However, the corresponding biological substrates and the necessity of such channels for artificial vision systems have never been elucidated together, leaving open questions about the uniqueness and advantages of ON/OFF channels when building dynamic vision systems to address real-world challenges. This paper highlights the importance of ON/OFF channels in motion perception by surveying current progress across both neuroscience and computational modelling work, together with applications. Compared with related literature, it provides, for the first time, insights into implementing different selectivity to the directional motion of looming, translating, and small-sized moving targets based on ON/OFF channels, in keeping with the soundness and robustness of biological principles. Finally, existing challenges and future trends of this bio-plausible computational structure for visual perception are discussed in connection with hotspots of machine learning and advanced vision sensors such as event-driven cameras.
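The ON/OFF bifurcation discussed in this survey amounts to half-wave rectifying the luminance-change signal into separate increment and decrement channels before any direction-selective processing. A minimal sketch of that split, with an arbitrary toy stimulus, might look like this (it is not drawn from any specific model covered by the survey):

```python
import numpy as np

def on_off_split(prev_frame, curr_frame):
    """Split the temporal luminance derivative into ON (brightening)
    and OFF (darkening) channels by half-wave rectification."""
    diff = curr_frame.astype(np.float32) - prev_frame.astype(np.float32)
    on_channel = np.maximum(diff, 0.0)    # luminance increments only
    off_channel = np.maximum(-diff, 0.0)  # luminance decrements only
    return on_channel, off_channel

# Toy stimulus: a bright vertical bar shifting one pixel to the right.
H, W = 32, 32
prev_frame = np.zeros((H, W)); prev_frame[:, 10] = 1.0
curr_frame = np.zeros((H, W)); curr_frame[:, 11] = 1.0

on_ch, off_ch = on_off_split(prev_frame, curr_frame)
# The leading edge of the motion appears in ON, the trailing edge in OFF.
print(int(on_ch[:, 11].sum()), int(off_ch[:, 10].sum()))   # 32 32
```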
Affiliation(s)
- Qinbing Fu: Machine Life and Intelligence Research Centre, School of Mathematics and Information Science, Guangzhou University, Guangzhou 510006, China

12. Baldwin RW, Liu R, Almatrafi M, Asari V, Hirakawa K. Time-Ordered Recent Event (TORE) Volumes for Event Cameras. IEEE Trans Pattern Anal Mach Intell 2023;45:2519-2532. [PMID: 35503820] [DOI: 10.1109/tpami.2022.3172212]
Abstract
Event cameras are an exciting, new sensor modality enabling high-speed imaging with extremely low-latency and wide dynamic range. Unfortunately, most machine learning architectures are not designed to directly handle sparse data, like that generated from event cameras. Many state-of-the-art algorithms for event cameras rely on interpolated event representations-obscuring crucial timing information, increasing the data volume, and limiting overall network performance. This paper details an event representation called Time-Ordered Recent Event (TORE) volumes. TORE volumes are designed to compactly store raw spike timing information with minimal information loss. This bio-inspired design is memory efficient, computationally fast, avoids time-blocking (i.e., fixed and predefined frame rates), and contains "local memory" from past data. The design is evaluated on a wide range of challenging tasks (e.g., event denoising, image reconstruction, classification, and human pose estimation) and is shown to dramatically improve state-of-the-art performance. TORE volumes are an easy-to-implement replacement for any algorithm currently utilizing event representations.
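A sketch in the spirit of the representation described above, keeping a small per-pixel memory of the most recent event timestamps instead of interpolating events into frames, is shown below. It assumes K recent timestamps per pixel and per polarity and a log-scaled time-since-event readout; these choices are illustrative and do not reproduce the paper's exact specification.

```python
import numpy as np
from collections import deque

class RecentEventBuffer:
    """Keep the K most recent event timestamps per pixel and polarity,
    in the spirit of TORE volumes (illustrative, not the paper's exact spec)."""
    def __init__(self, height, width, k=4):
        self.k = k
        self.buf = [[[deque(maxlen=k) for _ in range(width)]
                     for _ in range(height)] for _ in range(2)]

    def add(self, t, x, y, p):
        self.buf[p][y][x].append(t)

    def volume(self, t_now, min_dt=150.0, max_dt=5e6):
        """Return a (2*K, H, W) volume of log-scaled times since the K most
        recent events at each pixel; empty slots saturate at log(max_dt)."""
        H = len(self.buf[0]); W = len(self.buf[0][0])
        vol = np.full((2, self.k, H, W), np.log(max_dt), dtype=np.float32)
        for p in range(2):
            for y in range(H):
                for x in range(W):
                    # newest-first: the right end of the deque is the latest event
                    for i, ts in enumerate(reversed(self.buf[p][y][x])):
                        vol[p, i, y, x] = np.log(np.clip(t_now - ts, min_dt, max_dt))
        return vol.reshape(2 * self.k, H, W)

buf = RecentEventBuffer(32, 32, k=4)
rng = np.random.default_rng(4)
for t in np.sort(rng.uniform(0, 1e6, 3000)):
    buf.add(t, int(rng.integers(0, 32)), int(rng.integers(0, 32)), int(rng.integers(0, 2)))
print(buf.volume(t_now=1e6).shape)        # (8, 32, 32)
```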

13. Zhang Y, Lv H, Zhao Y, Feng Y, Liu H, Bi G. Event-Based Optical Flow Estimation with Spatio-Temporal Backpropagation Trained Spiking Neural Network. Micromachines (Basel) 2023;14:203. [PMID: 36677264] [PMCID: PMC9867051] [DOI: 10.3390/mi14010203]
Abstract
The advantages of an event camera, such as low power consumption, large dynamic range, and low data redundancy, enable it to shine in extreme environments where traditional image sensors are not competent, especially in high-speed moving-target capture and extreme lighting conditions. Optical flow reflects the target's movement information, and the target's detailed movement can be obtained from the event camera's optical flow. However, existing neural network methods for event-camera optical flow prediction suffer from extensive computation and high energy consumption in hardware implementations. Spiking neural networks have spatiotemporal coding characteristics, making them compatible with the spatiotemporal data of an event camera. Moreover, their sparse coding characteristic allows them to run with ultra-low power consumption on neuromorphic hardware. However, because of algorithmic and training complexity, spiking neural networks have not previously been applied to optical flow prediction for event cameras. To address this, this paper proposes an end-to-end spiking neural network that predicts optical flow from the discrete spatiotemporal data stream of an event camera. The network is trained with the spatio-temporal backpropagation method in a self-supervised way, which fully exploits the spatiotemporal characteristics of the event camera while improving network performance. Experimental results on the public dataset show that the proposed method matches the best existing methods in optical flow prediction accuracy while consuming 99% less power than the existing algorithm, which is greatly beneficial to hardware implementation and lays the groundwork for future low-power hardware implementations of optical flow prediction for event cameras.
Affiliation(s)
- Yisa Zhang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Hengyi Lv: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yuchen Zhao: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yang Feng: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Hailong Liu: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Guoling Bi: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China

14. Grimaldi A, Gruel A, Besnainou C, Jérémie JN, Martinet J, Perrinet LU. Precise Spiking Motifs in Neurobiological and Neuromorphic Data. Brain Sci 2022;13:68. [PMID: 36672049] [PMCID: PMC9856822] [DOI: 10.3390/brainsci13010068]
Abstract
Why do neurons communicate through spikes? By definition, spikes are all-or-none neural events which occur at continuous times. In other words, spikes are on one side binary, existing or not without further details, and on the other, can occur at any asynchronous time, without the need for a centralized clock. This stands in stark contrast to the analog representation of values and the discretized timing classically used in digital processing and at the base of modern-day neural networks. As neural systems almost systematically use this so-called event-based representation in the living world, a better understanding of this phenomenon remains a fundamental challenge in neurobiology in order to better interpret the profusion of recorded data. With the growing need for intelligent embedded systems, it also emerges as a new computing paradigm to enable the efficient operation of a new class of sensors and event-based computers, called neuromorphic, which could enable significant gains in computation time and energy consumption-a major societal issue in the era of the digital economy and global warming. In this review paper, we provide evidence from biology, theory and engineering that the precise timing of spikes plays a crucial role in our understanding of the efficiency of neural networks.
Affiliation(s)
- Antoine Grimaldi: INT UMR 7289, Aix Marseille Univ, CNRS, 27 Bd Jean Moulin, 13005 Marseille, France
- Amélie Gruel: SPARKS, Côte d’Azur, CNRS, I3S, 2000 Rte des Lucioles, 06900 Sophia-Antipolis, France
- Camille Besnainou: INT UMR 7289, Aix Marseille Univ, CNRS, 27 Bd Jean Moulin, 13005 Marseille, France
- Jean-Nicolas Jérémie: INT UMR 7289, Aix Marseille Univ, CNRS, 27 Bd Jean Moulin, 13005 Marseille, France
- Jean Martinet: SPARKS, Côte d’Azur, CNRS, I3S, 2000 Rte des Lucioles, 06900 Sophia-Antipolis, France
- Laurent U. Perrinet: INT UMR 7289, Aix Marseille Univ, CNRS, 27 Bd Jean Moulin, 13005 Marseille, France

15. Schiopu I, Bilcu RC. Low-Complexity Lossless Coding of Asynchronous Event Sequences for Low-Power Chip Integration. Sensors (Basel) 2022;22:10014. [PMID: 36560383] [PMCID: PMC9783680] [DOI: 10.3390/s222410014]
Abstract
The event sensor provides high temporal resolution and generates large amounts of raw event data. Efficient low-complexity coding solutions are required for integration into low-power event-processing chips with limited memory. In this paper, a novel lossless compression method is proposed for encoding the event data represented as asynchronous event sequences. The proposed method employs only low-complexity coding techniques so that it is suitable for hardware implementation into low-power event-processing chips. A first, novel, contribution consists of a low-complexity coding scheme which uses a decision tree to reduce the representation range of the residual error. The decision tree is formed by using a triplet threshold parameter which divides the input data range into several coding ranges arranged at concentric distances from an initial prediction, so that the residual error of the true value information is represented by using a reduced number of bits. Another novel contribution consists of an improved representation, which divides the input sequence into same-timestamp subsequences, wherein each subsequence collects the same timestamp events in ascending order of the largest dimension of the event spatial information. The proposed same-timestamp representation replaces the event timestamp information with the same-timestamp subsequence length and encodes it together with the event spatial and polarity information into a different bitstream. Another novel contribution is the random access to any time window by using additional header information. The experimental evaluation on a highly variable event density dataset demonstrates that the proposed low-complexity lossless coding method provides an average improvement of 5.49%, 11.45%, and 35.57% compared with the state-of-the-art performance-oriented lossless data compression codecs Bzip2, LZMA, and ZLIB, respectively. To our knowledge, the paper proposes the first low-complexity lossless compression method for encoding asynchronous event sequences that are suitable for hardware implementation into low-power chips.
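The same-timestamp representation described above (grouping all events that share a timestamp, ordering each group along the larger spatial dimension, and storing the group length instead of per-event timestamps) can be illustrated with a small sketch. This is only a schematic of the grouping step under assumed conventions; no entropy coding or decision-tree residual coding from the paper is attempted here.

```python
from itertools import groupby

# Toy asynchronous event sequence, already sorted by timestamp: (t, x, y, polarity).
events = [
    (100, 5, 2, 1), (100, 1, 7, 0), (100, 3, 7, 1),
    (101, 9, 0, 0),
    (103, 6, 4, 1), (103, 2, 1, 0),
]

def same_timestamp_subsequences(events, width=12, height=10):
    """Group events by identical timestamp; within a group, sort along the
    largest spatial dimension (here x, since width >= height). Each group is
    emitted as (count, [(x, y, polarity), ...]) so per-event timestamps can be
    replaced by a single subsequence length."""
    sort_key = (lambda e: e[1]) if width >= height else (lambda e: e[2])
    out = []
    for _, grp in groupby(events, key=lambda e: e[0]):   # requires time-sorted input
        grp = sorted(grp, key=sort_key)
        out.append((len(grp), [(x, y, p) for _, x, y, p in grp]))
    return out

for count, payload in same_timestamp_subsequences(events):
    print(count, payload)
# 3 [(1, 7, 0), (3, 7, 1), (5, 2, 1)]
# 1 [(9, 0, 0)]
# 2 [(2, 1, 0), (6, 4, 1)]
```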

16. Nunes UM, Demiris Y. Robust Event-Based Vision Model Estimation by Dispersion Minimisation. IEEE Trans Pattern Anal Mach Intell 2022;44:9561-9573. [PMID: 34813470] [DOI: 10.1109/tpami.2021.3130049]
Abstract
We propose a novel Dispersion Minimisation framework for event-based vision model estimation, with applications to optical flow and high-speed motion estimation. The framework extends previous event-based motion compensation algorithms by avoiding computing an optimisation score based on an explicit image-based representation, which provides three main benefits: i) The framework can be extended to perform incremental estimation, i.e., on an event-by-event basis. ii) Besides purely visual transformations in 2D, the framework can readily use additional information, e.g., by augmenting the events with depth, to estimate the parameters of motion models in higher dimensional spaces. iii) The optimisation complexity only depends on the number of events. We achieve this by modelling the event alignment according to candidate parameters and minimising the resultant dispersion, which is computed by a family of suitable entropy-based measures. Data whitening is also proposed as a simple and effective pre-processing step to make the framework's accuracy performance more robust, as well as other event-based motion-compensation methods. The framework is evaluated on several challenging motion estimation problems, including 6-DOF transformation, rotational motion, and optical flow estimation, achieving state-of-the-art performance.

17. Duan P, Wang ZW, Shi B, Cossairt O, Huang T, Katsaggelos AK. Guided Event Filtering: Synergy Between Intensity Images and Neuromorphic Events for High Performance Imaging. IEEE Trans Pattern Anal Mach Intell 2022;44:8261-8275. [PMID: 34543190] [DOI: 10.1109/tpami.2021.3113344]
Abstract
Many visual and robotics tasks in real-world scenarios rely on robust handling of high speed motion and high dynamic range (HDR) with effectively high spatial resolution and low noise. Such stringent requirements, however, cannot be directly satisfied by a single imager or imaging modality, but rather by multi-modal sensors with complementary advantages. In this paper, we address high performance imaging by exploring the synergy between traditional frame-based sensors with high spatial resolution and low sensor noise, and emerging event-based sensors with high speed and high dynamic range. We introduce a novel computational framework, termed Guided Event Filtering (GEF), to process these two streams of input data and output a stream of super-resolved yet noise-reduced events. To generate high quality events, GEF first registers the captured noisy events onto the guidance image plane according to our flow model. It then performs joint image filtering that inherits the mutual structure from both inputs. Lastly, GEF re-distributes the filtered event frame in the space-time volume while preserving the statistical characteristics of the original events. When the guidance images under-perform, GEF incorporates an event self-guiding mechanism that resorts to neighbor events for guidance. We demonstrate the benefits of GEF by applying the output high quality events to existing event-based algorithms across diverse application categories, including high speed object tracking, depth estimation, high frame-rate video synthesis, and super resolution/HDR/color image restoration.

18. Wang Y, Yang J, Peng X, Wu P, Gao L, Huang K, Chen J, Kneip L. Visual Odometry with an Event Camera Using Continuous Ray Warping and Volumetric Contrast Maximization. Sensors (Basel) 2022;22:5687. [PMID: 35957244] [PMCID: PMC9370870] [DOI: 10.3390/s22155687]
Abstract
We present a new solution to tracking and mapping with an event camera. The motion of the camera contains both rotation and translation displacements in the plane, and the displacements happen in an arbitrarily structured environment. As a result, the image matching may no longer be represented by a low-dimensional homographic warping, thus complicating an application of the commonly used Image of Warped Events (IWE). We introduce a new solution to this problem by performing contrast maximization in 3D. The 3D location of the rays cast for each event is smoothly varied as a function of a continuous-time motion parametrization, and the optimal parameters are found by maximizing the contrast in a volumetric ray density field. Our method thus performs joint optimization over motion and structure. The practical validity of our approach is supported by an application to AGV motion estimation and 3D reconstruction with a single vehicle-mounted event camera. The method approaches the performance obtained with regular cameras and eventually outperforms in challenging visual conditions.
Affiliation(s)
- Yifu Wang: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Jiaqi Yang: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Xin Peng: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Peng Wu: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Ling Gao: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Kun Huang: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Jiaben Chen: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
- Laurent Kneip: School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China; Shanghai Engineering Research Center of Intelligent Vision and Imaging, ShanghaiTech University, Shanghai 201210, China

19. Gu F, Lee Y, Zhuang Y, Li Y, Liu J, Yu F, Li R, Chen C. MDOE: A Spatiotemporal Event Representation Considering the Magnitude and Density of Events. IEEE Robot Autom Lett 2022. [DOI: 10.1109/lra.2022.3186523]
Affiliation(s)
- Fuqiang Gu: College of Computer Science, Chongqing University, Chongqing, China
- Yong Lee: School of Mechanical and Electronic Engineering, Wuhan University of Technology, Wuhan, China
- Yuan Zhuang: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
- You Li: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University and the Hubei Luojia Laboratory, Wuhan, China
- Jingbin Liu: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
- Fangwen Yu: Department of Precision Instrument, Tsinghua University, Beijing, China
- Ruiyuan Li: College of Computer Science, Chongqing University, Chongqing, China
- Chao Chen: College of Computer Science, Chongqing University, Chongqing, China

20. Peng X, Gao L, Wang Y, Kneip L. Globally-Optimal Contrast Maximisation for Event Cameras. IEEE Trans Pattern Anal Mach Intell 2022;44:3479-3495. [PMID: 33471749] [DOI: 10.1109/tpami.2021.3053243]
Abstract
Event cameras are bio-inspired sensors that perform well in challenging illumination conditions and have high temporal resolution. However, their concept is fundamentally different from traditional frame-based cameras. The pixels of an event camera operate independently and asynchronously. They measure changes of the logarithmic brightness and return them in the highly discretised form of time-stamped events indicating a relative change of a certain quantity since the last event. New models and algorithms are needed to process this kind of measurements. The present work looks at several motion estimation problems with event cameras. The flow of the events is modelled by a general homographic warping in a space-time volume, and the objective is formulated as a maximisation of contrast within the image of warped events. Our core contribution consists of deriving globally optimal solutions to these generally non-convex problems, which removes the dependency on a good initial guess plaguing existing methods. Our methods rely on branch-and-bound optimisation and employ novel and efficient, recursive upper and lower bounds derived for six different contrast estimation functions. The practical validity of our approach is demonstrated by a successful application to three different event camera motion estimation problems.
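The contrast-maximisation objective this paper makes globally optimal can be sketched in a few lines: warp events to a reference time with a candidate motion parameter, accumulate them into an image of warped events (IWE), and score the candidate by the image variance. The sketch below uses a brute-force grid search over a constant 2D flow purely to illustrate the objective; the paper's actual contribution (branch-and-bound with recursive bounds over homographic warps) is not reproduced here, and the synthetic data and grid are assumptions.

```python
import numpy as np

def iwe_variance(events, flow, sensor_hw):
    """Warp events to t=0 with a constant flow (px/s), accumulate the image
    of warped events, and return its variance (the contrast objective)."""
    t, x, y = events[:, 0], events[:, 1], events[:, 2]
    xw = np.round(x - flow[0] * t).astype(int)
    yw = np.round(y - flow[1] * t).astype(int)
    H, W = sensor_hw
    ok = (xw >= 0) & (xw < W) & (yw >= 0) & (yw < H)
    iwe = np.zeros(sensor_hw, dtype=np.float32)
    np.add.at(iwe, (yw[ok], xw[ok]), 1.0)
    return iwe.var()

# Synthetic events: 60 scene points translating at (40, 10) px/s,
# each observed 50 times along its trajectory.
rng = np.random.default_rng(5)
true_flow = np.array([40.0, 10.0])
pts = rng.uniform(20, 100, size=(60, 2))
t = np.tile(np.linspace(0.0, 1.0, 50), len(pts))
xy0 = np.repeat(pts, 50, axis=0)
events = np.column_stack([t, xy0[:, 0] + true_flow[0] * t,
                             xy0[:, 1] + true_flow[1] * t])

# Brute-force search for the flow that maximises IWE contrast.
candidates = [(vx, vy) for vx in range(0, 81, 5) for vy in range(-20, 41, 5)]
best = max(candidates, key=lambda f: iwe_variance(events, f, (160, 160)))
print(best)   # (40, 10): the correctly warped events pile up and maximise contrast
```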

21. Ralph N, Joubert D, Jolley A, Afshar S, Tothill N, van Schaik A, Cohen G. Real-Time Event-Based Unsupervised Feature Consolidation and Tracking for Space Situational Awareness. Front Neurosci 2022;16:821157. [PMID: 35600627] [PMCID: PMC9120364] [DOI: 10.3389/fnins.2022.821157]
Abstract
Earth orbit is a limited natural resource that hosts a vast range of vital space-based systems supporting the international community's national, commercial and defence interests. This resource is rapidly becoming depleted with over-crowding in high-demand orbital slots and a growing presence of space debris. We propose the Fast Iterative Extraction of Salient targets for Tracking Asynchronously (FIESTA) algorithm as a robust, real-time and reactive approach to optical Space Situational Awareness (SSA) using Event-Based Cameras (EBCs) to detect, localize, and track Resident Space Objects (RSOs) accurately and in a timely manner. We address the challenges posed by the asynchronous nature and high temporal resolution of the EBC output accurately, without supervision and with few tunable parameters, using concepts established in the neuromorphic and conventional tracking literature. We show that this algorithm is capable of highly accurate in-frame RSO velocity estimation and average sub-pixel localization in a simulated test environment designed to distinguish the capabilities of the EBC and optical setup from those of the proposed tracking system. This work is a fundamental step toward accurate, end-to-end, real-time optical event-based SSA, and it develops the foundation for robust closed-form tracking evaluated using standardized tracking metrics.
Affiliation(s)
- Nicholas Ralph (corresponding author): International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- Damien Joubert: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- Andrew Jolley: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia; Air and Space Power Development Centre, Royal Australian Air Force, Canberra, ACT, Australia
- Saeed Afshar: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- Nicholas Tothill: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- André van Schaik: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- Gregory Cohen: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia

22. Ozawa T, Sekikawa Y, Saito H. Accuracy and Speed Improvement of Event Camera Motion Estimation Using a Bird’s-Eye View Transformation. Sensors (Basel) 2022;22:773. [PMID: 35161519] [PMCID: PMC8840125] [DOI: 10.3390/s22030773]
Abstract
Event cameras are bio-inspired sensors that have a high dynamic range and temporal resolution. This property enables motion estimation from textures with repeating patterns, which is difficult to achieve with RGB cameras. Therefore, motion estimation of an event camera is expected to be applied to vehicle position estimation. An existing method, called contrast maximization, is one of the methods that can be used for event camera motion estimation by capturing road surfaces. However, contrast maximization tends to fall into a local solution when estimating three-dimensional motion, which makes correct estimation difficult. To solve this problem, we propose a method for motion estimation by optimizing contrast in the bird’s-eye view space. Instead of performing three-dimensional motion estimation, we reduced the dimensionality to two-dimensional motion estimation by transforming the event data to a bird’s-eye view using homography calculated from the event camera position. This transformation mitigates the problem of the loss function becoming non-convex, which occurs in conventional methods. As a quantitative experiment, we created event data by using a car simulator and evaluated our motion estimation method, showing an improvement in accuracy and speed. In addition, we conducted estimation from real event data and evaluated the results qualitatively, showing an improvement in accuracy.
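The preprocessing step described above, warping event coordinates into a bird's-eye view with a homography computed from the camera pose so that planar road motion becomes a two-dimensional estimation problem, can be sketched as follows. The homography matrix below is a made-up example standing in for the one derived from the camera's mounting position, which is not reproduced here.

```python
import numpy as np

def warp_events_homography(events_xy, H):
    """Apply a 3x3 homography to event pixel coordinates of shape (N, 2)."""
    n = events_xy.shape[0]
    homog = np.hstack([events_xy, np.ones((n, 1))])    # to homogeneous coordinates
    warped = homog @ H.T
    return warped[:, :2] / warped[:, 2:3]              # back to Cartesian coordinates

# Example homography (assumed values): mild perspective plus anisotropic scale.
H = np.array([[1.2, 0.1, -20.0],
              [0.0, 1.5, -40.0],
              [0.0, 0.002, 1.0]])

rng = np.random.default_rng(6)
events_xy = np.column_stack([rng.uniform(0, 640, 5000),   # x pixel coordinates
                             rng.uniform(0, 480, 5000)])  # y pixel coordinates
bev_xy = warp_events_homography(events_xy, H)
print(bev_xy.shape, bev_xy.min(axis=0).round(1), bev_xy.max(axis=0).round(1))
```

After this warp, only a 2D motion (e.g., a planar translation of the warped events) needs to be optimised by contrast maximisation, which is the dimensionality reduction the abstract refers to.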
Affiliation(s)
- Takehiro Ozawa (corresponding author): Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan
- Hideo Saito: Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan

23. Akolkar H, Ieng SH, Benosman R. Real-Time High Speed Motion Prediction Using Fast Aperture-Robust Event-Driven Visual Flow. IEEE Trans Pattern Anal Mach Intell 2022;44:361-372. [PMID: 32750822] [DOI: 10.1109/tpami.2020.3010468]
Abstract
Optical flow is a crucial component of the feature space for early visual processing of dynamic scenes, especially in new applications such as self-driving vehicles, drones and autonomous robots. Dynamic vision sensors are well suited for such applications because of their asynchronous, sparse and temporally precise representation of the visual dynamics. Many algorithms proposed for computing visual flow for these sensors suffer from the aperture problem, as the direction of the estimated flow is governed by the curvature of the object rather than the true motion direction. Some methods that do overcome this problem rely on temporal windowing and thus under-utilize the precise temporal nature of the dynamic sensors. In this paper, we propose a novel multi-scale plane-fitting-based visual flow algorithm that is robust to the aperture problem and is also computationally fast and efficient. Our algorithm performs well in many scenarios, ranging from a fixed camera recording simple geometric shapes to real-world settings such as a camera mounted on a moving car, and it can successfully perform event-by-event motion estimation of objects in the scene to allow for predictions of up to 500 ms, i.e., the equivalent of 10 to 25 frames with traditional cameras.
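Plane-fitting flow, which this paper makes multi-scale and aperture-robust, rests on a simple idea: in a small spatiotemporal neighbourhood, the most recent event times t(x, y) lie approximately on a plane t = ax + by + c, and the fitted gradient (a, b) encodes the local velocity. A least-squares sketch of the basic single-scale fit follows; it does not include the aperture-robust multi-scale correction that is the paper's contribution, and the synthetic edge is an assumption for the example.

```python
import numpy as np

def plane_fit_flow(neigh_events):
    """Fit t = a*x + b*y + c to events (t, x, y) by least squares and return
    the local flow (vx, vy) = (a, b) / (a^2 + b^2), i.e. the velocity normal
    to the moving edge implied by the fitted plane."""
    t, x, y = neigh_events[:, 0], neigh_events[:, 1], neigh_events[:, 2]
    A = np.stack([x, y, np.ones_like(x)], axis=1)
    (a, b, c), *_ = np.linalg.lstsq(A, t, rcond=None)
    norm2 = a * a + b * b
    return np.array([a, b]) / norm2 if norm2 > 1e-12 else np.zeros(2)

# Synthetic edge moving at 50 px/s in +x: events occur at x = 50 * t (y arbitrary),
# so the event times form the plane t = x / 50.
rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0.0, 0.2, 200))
x = 50.0 * t + rng.normal(0, 0.05, t.size)      # small spatial jitter
y = rng.uniform(0, 10, t.size)
flow = plane_fit_flow(np.stack([t, x, y], axis=1))
print(flow.round(2))                             # approximately [50., 0.]
```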

24. Gallego G, Delbruck T, Orchard G, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison AJ, Conradt J, Daniilidis K, Scaramuzza D. Event-Based Vision: A Survey. IEEE Trans Pattern Anal Mach Intell 2022;44:154-180. [PMID: 32750812] [DOI: 10.1109/tpami.2020.3008413]
Abstract
Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
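The working principle the survey describes, where each pixel independently fires an event whenever its log-brightness has changed by a contrast threshold since that pixel's last event, can be condensed into a toy simulator. This is an idealised sketch for intuition only, assuming noiseless pixels and an arbitrary threshold; real sensors add refractory periods, noise, and per-pixel threshold mismatch.

```python
import numpy as np

def simulate_events(frames, timestamps, threshold=0.2, eps=1e-3):
    """Idealised event generation: per pixel, emit (t, x, y, +/-1) each time
    log-intensity moves by `threshold` away from the last event's reference level."""
    log_f = np.log(frames.astype(np.float64) + eps)
    ref = log_f[0].copy()                        # per-pixel reference level
    events = []
    for k in range(1, len(frames)):
        diff = log_f[k] - ref
        while True:                              # a large change may emit several events
            ys, xs = np.where(np.abs(diff) >= threshold)
            if ys.size == 0:
                break
            pol = np.sign(diff[ys, xs]).astype(int)
            events += [(timestamps[k], x, y, p) for x, y, p in zip(xs, ys, pol)]
            ref[ys, xs] += pol * threshold       # step reference toward the new level
            diff = log_f[k] - ref
    return events

# Toy sequence: a dark scene, a patch that brightens, then returns to dark.
frames = np.full((3, 8, 8), 0.1)
frames[1, 2:5, 2:5] = 0.4                        # patch brightens at t = 0.01 s
frames[2, 2:5, 2:5] = 0.1                        # patch darkens again at t = 0.02 s
ev = simulate_events(frames, [0.0, 0.01, 0.02])
print(len(ev), ev[0])   # ON events over the patch at t=0.01, OFF events at t=0.02
```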

25. Lin Y, Ding W, Qiang S, Deng L, Li G. ES-ImageNet: A Million Event-Stream Classification Dataset for Spiking Neural Networks. Front Neurosci 2021;15:726582. [PMID: 34899154] [PMCID: PMC8655353] [DOI: 10.3389/fnins.2021.726582]
Abstract
With event-driven algorithms, especially spiking neural networks (SNNs), achieving continuous improvement in neuromorphic vision processing, a more challenging event-stream dataset is urgently needed. However, it is well known that creating an ES dataset is a time-consuming and costly task with neuromorphic cameras like dynamic vision sensors (DVS). In this work, we propose a fast and effective algorithm termed Omnidirectional Discrete Gradient (ODG) to convert the popular computer vision dataset ILSVRC2012 into its event-stream (ES) version, converting about 1,300,000 frame-based images into ES samples across 1,000 categories. In this way, we propose an ES dataset called ES-ImageNet, which is dozens of times larger than other neuromorphic classification datasets currently available and is generated entirely in software. The ODG algorithm applies image motion to generate local value changes with discrete gradient information in different directions, providing a low-cost and high-speed method for converting frame-based images into event streams, along with an Edge-Integral method to reconstruct high-quality images from the event streams. Furthermore, we analyze the statistics of ES-ImageNet in multiple ways, and a performance benchmark of the dataset is also provided using both well-known deep neural network and spiking neural network algorithms. We believe that this work will provide a new large-scale benchmark dataset for SNNs and neuromorphic vision.
Collapse
Affiliation(s)
- Yihan Lin
- Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing, China
| | - Wei Ding
- Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing, China
| | - Shaohua Qiang
- Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing, China
| | - Lei Deng
- Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing, China
| | - Guoqi Li
- Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing, China
| |
Collapse
|
26
|
Kim Y, Panda P. Optimizing Deeper Spiking Neural Networks for Dynamic Vision Sensing. Neural Netw 2021; 144:686-698. [PMID: 34662827 DOI: 10.1016/j.neunet.2021.09.022] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2021] [Revised: 09/22/2021] [Accepted: 09/24/2021] [Indexed: 11/20/2022]
Abstract
Spiking Neural Networks (SNNs) have recently emerged as a new generation of low-power deep neural networks due to sparse, asynchronous, and binary event-driven processing. Most previous deep SNN optimization methods focus on static datasets (e.g., MNIST) from a conventional frame-based camera. On the other hand, optimization techniques for event data from Dynamic Vision Sensor (DVS) cameras are still in their infancy. Most prior SNN techniques handling DVS data are limited to shallow networks and thus show low performance. Generally, we observe that the integrate-and-fire behavior of spiking neurons diminishes spike activity in deeper layers. The sparse spike activity results in a sub-optimal solution during training (i.e., performance degradation). To address this limitation, we propose novel algorithmic and architectural advances to accelerate the training of very deep SNNs on DVS data. Specifically, we propose Spike Activation Lift Training (SALT), which increases spike activity across all layers by optimizing both weights and thresholds in convolutional layers. After applying SALT, we train the weights based on the cross-entropy loss. SALT helps the networks to convey ample information across all layers during training and therefore improves the performance. Furthermore, we propose a simple and effective architecture, called Switched-BN, which exploits Batch Normalization (BN). Previous methods show that the standard BN is incompatible with the temporal dynamics of SNNs. Therefore, in the Switched-BN architecture, we apply BN to the last layer of an SNN after accumulating all the spikes from the previous layer with a spike voltage accumulator (i.e., converting temporal spike information to a float value). Even though we apply BN in just one layer of the SNN, our results demonstrate a considerable performance gain without any significant computational overhead. Through extensive experiments, we show the effectiveness of SALT and Switched-BN for training very deep SNNs from scratch on various benchmarks, including DVS-CIFAR10, N-Caltech, DHP19, CIFAR10, and CIFAR100. To the best of our knowledge, this is the first work showing state-of-the-art performance with deep SNNs on DVS data.
Collapse
Affiliation(s)
- Youngeun Kim
- Department of Electrical Engineering, Yale University, New Haven, CT, USA.
| | | |
Collapse
|
27
|
Dinaux R, Wessendorp N, Dupeyroux J, de Croon GCHE. FAITH: Fast Iterative Half-Plane Focus of Expansion Estimation Using Optic Flow. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3100153] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
28
|
|
29
|
Ge Z, Gao Y, So HKH, Lam EY. Event-based laser speckle correlation for micro motion estimation. OPTICS LETTERS 2021; 46:3885-3888. [PMID: 34388766 DOI: 10.1364/ol.430419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/29/2021] [Accepted: 07/13/2021] [Indexed: 06/13/2023]
Abstract
Micro motion estimation has important applications in various fields such as microfluidic particle detection and biomedical cell imaging. Conventional methods analyze the motion from intensity images captured using frame-based imaging sensors such as the complementary metal-oxide semiconductor (CMOS) and the charge-coupled device (CCD). Recently, event-based sensors have evolved with the special capability to record asynchronous light changes with high dynamic range, high temporal resolution, low latency, and no motion blur. In this Letter, we explore the potential of using the event sensor to estimate the micro motion based on the laser speckle correlation technique.
Collapse
|
30
|
Li R, Shi D, Zhang Y, Li R, Wang M. Asynchronous event feature generation and tracking based on gradient descriptor for event cameras. INT J ADV ROBOT SYST 2021. [DOI: 10.1177/17298814211027028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
Recently, the event camera has become a popular and promising vision sensor in the research of simultaneous localization and mapping and computer vision owing to its advantages: low latency, high dynamic range, and high temporal resolution. As a basic part of the feature-based SLAM system, the feature tracking method using event cameras is still an open question. In this article, we present a novel asynchronous event feature generation and tracking algorithm operating directly on event-streams to fully utilize the natural asynchronism of event cameras. The proposed algorithm consists of an event-corner detection unit, a descriptor construction unit, and an event feature tracking unit. The event-corner detection unit addresses a fast and asynchronous corner detector to extract event-corners from event-streams. For the descriptor construction unit, we propose a novel asynchronous gradient descriptor inspired by the scale-invariant feature transform descriptor, which helps to achieve quantitative measurement of similarity between event feature pairs. The construction of the gradient descriptor can be decomposed into three stages: speed-invariant time surface maintenance and extraction, principal orientation calculation, and descriptor generation. The event feature tracking unit combines the constructed gradient descriptor and an event feature matching method to achieve asynchronous feature tracking. We implement the proposed algorithm in C++ and evaluate it on a public event dataset. The experimental results show that our proposed method achieves improvement in terms of tracking accuracy and real-time performance when compared with the state-of-the-art asynchronous event-corner tracker and with no compromise on the feature tracking lifetime.
Collapse
Affiliation(s)
- Ruoxiang Li
- National University of Defense Technology, Changsha, China
| | - Dianxi Shi
- Artificial Intelligence Research Center (AIRC), National Innovation Institute of Defense Technology (NIIDT), Beijing, China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
| | - Yongjun Zhang
- Artificial Intelligence Research Center (AIRC), National Innovation Institute of Defense Technology (NIIDT), Beijing, China
| | - Ruihao Li
- Artificial Intelligence Research Center (AIRC), National Innovation Institute of Defense Technology (NIIDT), Beijing, China
- Tianjin Artificial Intelligence Innovation Center (TAIIC), Tianjin, China
| | - Mingkun Wang
- National University of Defense Technology, Changsha, China
| |
Collapse
|
31
|
Rebecq H, Ranftl R, Koltun V, Scaramuzza D. High Speed and High Dynamic Range Video with an Event Camera. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:1964-1980. [PMID: 31902754 DOI: 10.1109/tpami.2019.2963386] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our quantitative experiments show that our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality, while comfortably running in real time. We show that the network is able to synthesize high-frame-rate videos of high-speed phenomena (e.g., a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. As an additional contribution, we demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry, and that this strategy consistently outperforms algorithms that were specifically designed for event data. We release the reconstruction code, a pre-trained model, and the datasets to enable further research.
Collapse
|
32
|
Tayarani-Najaran MH, Schmuker M. Event-Based Sensing and Signal Processing in the Visual, Auditory, and Olfactory Domain: A Review. Front Neural Circuits 2021; 15:610446. [PMID: 34135736 PMCID: PMC8203204 DOI: 10.3389/fncir.2021.610446] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Accepted: 04/27/2021] [Indexed: 11/13/2022] Open
Abstract
The nervous system converts the physical quantities sensed by its primary receptors into trains of events that are then processed in the brain. Its unmatched efficiency in information processing has long inspired engineers to seek brain-like approaches to sensing and signal processing. The key principle pursued in neuromorphic sensing is to shed the traditional approach of periodic sampling in favor of an event-driven scheme that mimics sampling as it occurs in the nervous system, where events are preferably emitted upon the change of the sensed stimulus. In this paper we highlight the advantages and challenges of event-based sensing and signal processing in the visual, auditory, and olfactory domains. We also provide a survey of the literature covering neuromorphic sensing and signal processing in all three modalities. Our aim is to facilitate research in event-based sensing and signal processing by providing a comprehensive overview of the research performed previously, as well as highlighting conceptual advantages, current progress, and future challenges in the field.
Collapse
Affiliation(s)
| | - Michael Schmuker
- School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, United Kingdom
| |
Collapse
|
33
|
Parlevliet PP, Kanaev A, Hung CP, Schweiger A, Gregory FD, Benosman R, de Croon GCHE, Gutfreund Y, Lo CC, Moss CF. Autonomous Flying With Neuromorphic Sensing. Front Neurosci 2021; 15:672161. [PMID: 34054420 PMCID: PMC8160287 DOI: 10.3389/fnins.2021.672161] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2021] [Accepted: 04/07/2021] [Indexed: 11/17/2022] Open
Abstract
Autonomous flight for large aircraft appears to be within our reach. However, launching autonomous systems for everyday missions still requires an immense interdisciplinary research effort supported by pointed policies and funding. We believe that concerted endeavors in the fields of neuroscience, mathematics, sensor physics, robotics, and computer science are needed to address the remaining crucial scientific challenges. In this paper, we argue for a bio-inspired approach to solve autonomous flying challenges, outline the frontier of sensing, data processing, and flight control within a neuromorphic paradigm, and chart directions of research needed to achieve operational capabilities comparable to those we observe in nature. One central problem of neuromorphic computing is learning. In biological systems, learning is achieved by adaptive and relativistic information acquisition characterized by near-continuous information retrieval with variable rates and sparsity. This results in both energy and computational resource savings, which is an inspiration for autonomous systems. We consider pertinent features of insect, bat, and bird flight behavior as examples to address various vital aspects of autonomous flight. Insects exhibit sophisticated flight dynamics with comparatively reduced complexity of the brain. They represent excellent objects for the study of navigation and flight control. Bats and birds enable more complex models of attention and point to the importance of active sensing for conducting more complex missions. The implementation of neuromorphic paradigms for autonomous flight will require fundamental changes in both traditional hardware and software. We provide recommendations for sensor hardware and processing algorithm development to enable energy-efficient and computationally effective flight control.
Collapse
Affiliation(s)
| | - Andrey Kanaev
- U.S. Office of Naval Research Global, London, United Kingdom
| | - Chou P. Hung
- United States Army Research Laboratory, Aberdeen Proving Ground, MD, United States
| | | | - Frederick D. Gregory
- U.S. Army Research Laboratory, London, United Kingdom
- Department of Bioengineering, Imperial College London, London, United Kingdom
| | - Ryad Benosman
- Institut de la Vision, INSERM UMRI S 968, Paris, France
- Biomedical Science Tower, University of Pittsburgh, Pittsburgh, PA, United States
- Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Guido C. H. E. de Croon
- Micro Air Vehicle Laboratory, Department of Control and Operations, Faculty of Aerospace Engineering, Delft University of Technology, Delft, Netherlands
| | - Yoram Gutfreund
- The Neuroethological lab, Department of Neurobiology, The Rappaport Institute for Biomedical Research, Technion – Israel Institute of Technology, Haifa, Israel
| | - Chung-Chuan Lo
- Brain Research Center/Institute of Systems Neuroscience, National Tsing Hua University, Hsinchu, Taiwan
| | - Cynthia F. Moss
- Laboratory of Comparative Neural Systems and Behavior, Department of Psychological and Brain Sciences, Neuroscience and Mechanical Engineering, Johns Hopkins University, Baltimore, MD, United States
| |
Collapse
|
34
|
Tschopp F, von Einem C, Cramariuc A, Hug D, Palmer AW, Siegwart R, Chli M, Nieto J. Hough²Map – Iterative Event-Based Hough Transform for High-Speed Railway Mapping. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3061404] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
35
|
Jiang R, Wang Q, Shi S, Mou X, Chen S. Flow‐assisted visual tracking using event cameras. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 2021. [DOI: 10.1049/cit2.12005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Affiliation(s)
- Rui Jiang
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075
| | - Qinyi Wang
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
| | - Shunshun Shi
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075
| | - Xiaozheng Mou
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075
| | - Shoushun Chen
- CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
| |
Collapse
|
36
|
An Asynchronous Real-Time Corner Extraction and Tracking Algorithm for Event Camera. SENSORS 2021; 21:s21041475. [PMID: 33672510 PMCID: PMC7923767 DOI: 10.3390/s21041475] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/24/2021] [Revised: 02/14/2021] [Accepted: 02/16/2021] [Indexed: 11/25/2022]
Abstract
Event cameras have many advantages over conventional frame-based cameras, such as high temporal resolution, low latency, and high dynamic range. However, state-of-the-art event-based algorithms either require too much computation time or have poor accuracy performance. In this paper, we propose an asynchronous real-time corner extraction and tracking algorithm for an event camera. Our primary motivation is to enhance the accuracy of corner detection and tracking while ensuring computational efficiency. Firstly, according to the polarities of the events, a simple yet effective filter is applied to construct two restrictive Surfaces of Active Events (SAEs), named RSAE+ and RSAE−, which can accurately represent high-contrast patterns while filtering out noise and redundant events. Afterwards, a new coarse-to-fine corner extractor is proposed to extract corner events efficiently and accurately. Finally, a space, time, and velocity-direction-constrained data association method is presented to realize corner event tracking, in which a newly arriving corner event is associated with the latest active corner that satisfies the velocity direction constraint in its neighborhood. The experiments are run on a standard event camera dataset, and the experimental results indicate that our method achieves excellent corner detection and tracking performance. Moreover, the proposed method can process more than 4.5 million events per second, showing promising potential in real-time computer vision applications.
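The polarity-split surfaces and the neighbourhood-support filter can be illustrated with a short sketch. This is an assumption-laden simplification of the idea (one timestamp map per polarity, and an event is kept only if a pixel in its 3×3 neighbourhood fired recently), not the paper's RSAE construction or its corner extractor.

```python
"""Sketch of polarity-split surfaces of active events with a support filter."""
import numpy as np

class PolaritySAE:
    def __init__(self, height, width, support_dt=5e-3):
        # One latest-timestamp map per polarity, initialised to "long ago".
        self.sae = {+1: np.full((height, width), -np.inf),
                    -1: np.full((height, width), -np.inf)}
        self.support_dt = support_dt

    def update(self, x, y, t, p):
        """Returns True if the event is kept (supported by a recent neighbour)."""
        s = self.sae[p]
        y0, y1 = max(0, y - 1), min(s.shape[0], y + 2)
        x0, x1 = max(0, x - 1), min(s.shape[1], x + 2)
        supported = np.any(t - s[y0:y1, x0:x1] < self.support_dt)
        s[y, x] = t                     # the surface always records the latest time
        return bool(supported)

sae = PolaritySAE(4, 4)
print(sae.update(1, 1, 0.000, +1))   # False: first event has no support
print(sae.update(2, 1, 0.001, +1))   # True: a neighbour fired 1 ms earlier
```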
Collapse
|
37
|
Nagata J, Sekikawa Y, Aoki Y. Optical Flow Estimation by Matching Time Surface with Event-Based Cameras. SENSORS (BASEL, SWITZERLAND) 2021; 21:1150. [PMID: 33562162 PMCID: PMC7915966 DOI: 10.3390/s21041150] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/09/2021] [Revised: 02/02/2021] [Accepted: 02/03/2021] [Indexed: 11/21/2022]
Abstract
In this work, we propose a novel method of estimating optical flow from event-based cameras by matching the time surface of events. The proposed loss function measures the timestamp consistency between the time surface formed by the latest timestamp of each pixel and one that is slightly shifted in time. This makes it possible to estimate dense optical flow with high accuracy without restoring luminance or using additional sensor information. In the experiments, we show that the gradient is more accurate and the loss landscape more stable than with the variance loss used in the motion compensation approach. In addition, we show that the optical flow can be estimated with high accuracy by optimization with L1 smoothness regularization using publicly available datasets.
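A toy version of the time-surface matching idea can be written in a few lines: build the latest-timestamp surface for two consecutive slices of the stream, shift the earlier surface along a candidate flow, and score the candidate by timestamp consistency. The brute-force candidate search, integer-pixel shifts, and toy data below are illustrative assumptions; the paper optimizes a dense flow field with L1 smoothness regularization rather than searching candidates.

```python
"""Sketch of scoring candidate flows by time-surface consistency."""
import numpy as np

def time_surface(events, shape):
    """Latest timestamp per pixel; 0 where no event arrived."""
    ts = np.zeros(shape)
    for x, y, t in events:
        ts[y, x] = max(ts[y, x], t)
    return ts

def flow_loss(u, v, surf_a, surf_b, dt):
    """Consistency between surface A shifted along (u, v) and surface B."""
    shifted = np.roll(np.roll(surf_a, int(round(v * dt)), axis=0),
                      int(round(u * dt)), axis=1)
    mask = (shifted > 0) & (surf_b > 0)
    if not mask.any():
        return np.inf
    return np.mean(np.abs((surf_b[mask] - dt) - shifted[mask]))

# Toy data: a point moving 1 px per 10 ms along x.
H, W, dt = 8, 8, 0.01
sa = time_surface([(2, 4, 0.005)], (H, W))
sb = time_surface([(3, 4, 0.015)], (H, W))
candidates = [(0, 0), (100, 0), (-100, 0), (0, 100)]  # px/s
best = min(candidates, key=lambda f: flow_loss(f[0], f[1], sa, sb, dt))
print("best candidate flow:", best)
```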
Collapse
Affiliation(s)
- Jun Nagata
- Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan;
| | - Yusuke Sekikawa
- Denso IT Laboratory, 2-15-1, Shibuya, Shibuya-ku, Tokyo 150-0002, Japan;
| | - Yoshimitsu Aoki
- Department of Electronics and Electrical Engineering, Faculty of Science and Technology, Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan;
| |
Collapse
|
38
|
Evaluation of Event-Based Corner Detectors. J Imaging 2021; 7:jimaging7020025. [PMID: 34460624 PMCID: PMC8321277 DOI: 10.3390/jimaging7020025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2020] [Revised: 01/25/2021] [Accepted: 01/27/2021] [Indexed: 12/03/2022] Open
Abstract
Bio-inspired Event-Based (EB) cameras are a promising new technology that outperforms standard frame-based cameras in extremely lit and fast-moving scenes. A number of EB corner detection techniques have already been developed; however, the performance of these EB corner detectors has only been evaluated based on a few author-selected criteria rather than on a unified common basis, as proposed here. Moreover, their experimental conditions are mainly limited to less interesting operational regions of the EB camera (in which frame-based cameras can also operate), and some of the criteria, by definition, could not distinguish whether the detector had any systematic bias. In this paper, we evaluate five of the seven existing EB corner detectors on a public dataset, including extreme illumination conditions that have not been investigated before. Moreover, this evaluation is the first of its kind, not only in analysing such a large number of detectors but also in applying a unified procedure to all of them. Contrary to previous assessments, we employed both the intensity and trajectory information within the public dataset rather than only one of them. We show that a rigorous comparison among EB detectors can be performed without tedious manual labelling and even with challenging acquisition conditions. This study thus proposes the first standard unified EB corner evaluation procedure, which will enable better understanding of the underlying mechanisms of EB cameras and can therefore lead to more efficient EB corner detection techniques.
Collapse
|
39
|
Hadviger A, Marković I, Petrović I. Stereo dense depth tracking based on optical flow using frames and events. Adv Robot 2020. [DOI: 10.1080/01691864.2020.1821770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Affiliation(s)
- Antea Hadviger
- Laboratory for Autonomous Systems and Mobile Robotics (LAMOR), University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia
| | - Ivan Marković
- Laboratory for Autonomous Systems and Mobile Robotics (LAMOR), University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia
| | - Ivan Petrović
- Laboratory for Autonomous Systems and Mobile Robotics (LAMOR), University of Zagreb Faculty of Electrical Engineering and Computing, Zagreb, Croatia
| |
Collapse
|
40
|
Bi Y, Chadha A, Abbas A, Bourtsoulatze E, Andreopoulos Y. Graph-based Spatio-Temporal Feature Learning for Neuromorphic Vision Sensing. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2020; PP:9084-9098. [PMID: 32941136 DOI: 10.1109/tip.2020.3023597] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a. "spikes") in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, feature representation for NVS is far behind its APS-based counterparts, resulting in lower performance in high-level computer vision tasks. To fully utilize its sparse and asynchronous nature, we propose a compact graph representation for NVS, which allows for end-to-end learning with graph convolution neural networks. We couple this with a novel end-to-end feature learning framework that accommodates both appearance-based and motion-based tasks. The core of our framework comprises a spatial feature learning module, which utilizes residual-graph convolutional neural networks (RG-CNN), for end-to-end learning of appearance-based features directly from graphs. We extend this with our proposed Graph2Grid block and temporal feature learning module for efficiently modelling temporal dependencies over multiple graphs and a long temporal extent. We show how our framework can be configured for object classification, action recognition, and action similarity labeling. Importantly, our approach preserves the spatial and temporal coherence of spike events, while requiring less computation and memory. The experimental validation shows that our proposed framework outperforms all recent methods on standard datasets. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we introduce, evaluate, and make available the American Sign Language letters dataset (ASL-DVS), as well as human action datasets (UCF101-DVS, HMDB51-DVS, and ASLAN-DVS).
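The graph construction step can be illustrated independently of the learning framework: each event becomes a node, and edges connect events that lie within a spatiotemporal radius of one another. The radii, time scaling, and O(N²) neighbour search below are illustrative assumptions, not the paper's implementation.

```python
"""Sketch of building a spatiotemporal graph from a set of events."""
import numpy as np

def events_to_graph(events, r_xy=3.0, r_t=0.01, t_scale=1000.0):
    """events: (N, 3) array of (x, y, t). Returns node features and an edge list."""
    nodes = events.copy()
    nodes[:, 2] *= t_scale              # put time on a scale comparable to pixels
    edges = []
    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            if (abs(events[i, 0] - events[j, 0]) <= r_xy and
                abs(events[i, 1] - events[j, 1]) <= r_xy and
                abs(events[i, 2] - events[j, 2]) <= r_t):
                edges.append((i, j))
    return nodes, edges

ev = np.array([[10, 10, 0.001], [11, 10, 0.002], [40, 40, 0.002]], dtype=float)
nodes, edges = events_to_graph(ev)
print(edges)   # [(0, 1)] -- the distant third event stays unconnected
```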
Collapse
|
41
|
Paredes-Valles F, Scheper KYW, de Croon GCHE. Unsupervised Learning of a Hierarchical Spiking Neural Network for Optical Flow Estimation: From Events to Global Motion Perception. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:2051-2064. [PMID: 30843817 DOI: 10.1109/tpami.2019.2903179] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
The combination of spiking neural networks and event-based vision sensors holds the potential of highly efficient and high-bandwidth optical flow estimation. This paper presents the first hierarchical spiking architecture in which motion (direction and speed) selectivity emerges in an unsupervised fashion from the raw stimuli generated with an event-based camera. A novel adaptive neuron model and stable spike-timing-dependent plasticity formulation are at the core of this neural network governing its spike-based processing and learning, respectively. After convergence, the neural architecture exhibits the main properties of biological visual motion systems, namely feature extraction and local and global motion perception. Convolutional layers with input synapses characterized by single and multiple transmission delays are employed for feature and local motion perception, respectively; while global motion selectivity emerges in a final fully-connected layer. The proposed solution is validated using synthetic and real event sequences. Along with this paper, we provide the cuSNN library, a framework that enables GPU-accelerated simulations of large-scale spiking neural networks. Source code and samples are available at https://github.com/tudelft/cuSNN.
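As background for the learning-rule family used in this line of work, the sketch below shows a plain pair-based STDP weight update: causal pre-before-post spike pairs potentiate a synapse, anti-causal pairs depress it, with exponentially decaying magnitude. The parameters are illustrative assumptions, and this is not the paper's adaptive neuron model or its stable STDP formulation.

```python
"""Sketch of a pair-based spike-timing-dependent plasticity (STDP) update."""
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=0.02):
    """Potentiate when the presynaptic spike precedes the postsynaptic one,
    depress otherwise; the magnitude decays with the timing difference."""
    dt = t_post - t_pre
    if dt >= 0:
        w += a_plus * np.exp(-dt / tau)
    else:
        w -= a_minus * np.exp(dt / tau)
    return float(np.clip(w, 0.0, 1.0))   # keep the weight in a bounded range

w = 0.5
w = stdp_update(w, t_pre=0.010, t_post=0.015)   # causal pair -> potentiation
w = stdp_update(w, t_pre=0.030, t_post=0.020)   # anti-causal pair -> depression
print(round(w, 4))
```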
Collapse
|
42
|
Lin S, Xu F, Wang X, Yang W, Yu L. Efficient Spatial-Temporal Normalization of SAE Representation for Event Camera. IEEE Robot Autom Lett 2020. [DOI: 10.1109/lra.2020.2995332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
43
|
Almatrafi M, Baldwin R, Aizawa K, Hirakawa K. Distance Surface for Event-Based Optical Flow. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2020; 42:1547-1556. [PMID: 32305894 DOI: 10.1109/tpami.2020.2986748] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
We propose DistSurf-OF, a novel optical flow method for neuromorphic cameras. Neuromorphic cameras (or event detection cameras) are an emerging sensor modality that makes use of dynamic vision sensors (DVS) to asynchronously report the log-intensity changes (called "events") exceeding a predefined threshold at each pixel. In the absence of an intensity value at each pixel location, we introduce the notion of a "distance surface", the distance transform computed from the detected events, as a proxy for object texture. The distance surface is then used as an input to intensity-based optical flow methods to recover the two-dimensional pixel motion. Real-sensor experiments verify that the proposed DistSurf-OF accurately estimates the angle and speed of each event.
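The distance-surface construction itself is simple enough to sketch: rasterize event locations into a binary map and take its Euclidean distance transform, which can then be fed to a conventional intensity-based flow method. The sketch below uses SciPy's distance transform; the resolution and toy events are illustrative assumptions.

```python
"""Sketch of a distance surface computed from event locations."""
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_surface(events, shape):
    """events: iterable of (x, y); returns per-pixel distance to the nearest event."""
    occupied = np.zeros(shape, dtype=bool)
    for x, y in events:
        occupied[y, x] = True
    # distance_transform_edt measures the distance to the nearest zero element,
    # so invert the occupancy map to measure distance to the nearest event.
    return distance_transform_edt(~occupied)

surf = distance_surface([(2, 2), (5, 2)], (6, 8))
print(np.round(surf, 2))
```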
Collapse
|
44
|
Deng Y, Li Y, Chen H. AMAE: Adaptive Motion-Agnostic Encoder for Event-Based Object Classification. IEEE Robot Autom Lett 2020. [DOI: 10.1109/lra.2020.3002480] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
45
|
D'Angelo G, Janotte E, Schoepe T, O'Keeffe J, Milde MB, Chicca E, Bartolozzi C. Event-Based Eccentric Motion Detection Exploiting Time Difference Encoding. Front Neurosci 2020; 14:451. [PMID: 32457575 PMCID: PMC7227134 DOI: 10.3389/fnins.2020.00451] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2019] [Accepted: 04/14/2020] [Indexed: 11/13/2022] Open
Abstract
Attentional selectivity tends to follow events considered interesting stimuli. Indeed, the motion of visual stimuli present in the environment attracts our attention and allows us to react and interact with our surroundings. Extracting relevant motion information from the environment presents a challenge given the high information content of the visual input. In this work we propose a novel integration between an eccentric down-sampling of the visual field, taking inspiration from the varying size of receptive fields (RFs) in the mammalian retina, and the Spiking Elementary Motion Detector (sEMD) model. We characterize the system functionality with simulated data and real-world data collected with bio-inspired event-driven cameras, successfully implementing motion detection along the four cardinal directions and diagonally.
Collapse
Affiliation(s)
- Giulia D'Angelo
- Event Driven Perception for Robotics, Italian Institute of Technology, iCub Facility, Genoa, Italy
| | - Ella Janotte
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
| | - Thorben Schoepe
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
| | - James O'Keeffe
- Biosciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Moritz B Milde
- International Centre for Neuromorphic Systems, The MARCS Institute, Western Sydney University, Sydney, NSW, Australia
| | - Elisabetta Chicca
- Faculty of Technology and Center of Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
| | - Chiara Bartolozzi
- Event Driven Perception for Robotics, Italian Institute of Technology, iCub Facility, Genoa, Italy
| |
Collapse
|
46
|
Maro JM, Ieng SH, Benosman R. Event-Based Gesture Recognition With Dynamic Background Suppression Using Smartphone Computational Capabilities. Front Neurosci 2020; 14:275. [PMID: 32327968 PMCID: PMC7160298 DOI: 10.3389/fnins.2020.00275] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2019] [Accepted: 03/10/2020] [Indexed: 11/13/2022] Open
Abstract
In this paper, we introduce a framework for dynamic gesture recognition with background suppression operating on the output of a moving event-based camera. The system is developed to operate in real time using only the computational capabilities of a mobile phone. It introduces a new development around the concept of time-surfaces. It also presents a novel event-based methodology to dynamically remove backgrounds that uses the high temporal resolution properties of event-based cameras. To our knowledge, this is the first Android event-based framework for vision-based recognition of dynamic gestures running on a smartphone without off-board processing. We assess the performance by considering several scenarios, both indoors and outdoors, under static and dynamic conditions and uncontrolled lighting. We also introduce a new event-based dataset for gesture recognition with static and dynamic backgrounds (made publicly available). The set of gestures has been selected following a clinical trial to allow human-machine interaction for the visually impaired and older adults. We finally report comparisons with prior work on event-based gesture recognition, achieving comparable results without the use of advanced classification techniques or power-hungry hardware.
Collapse
Affiliation(s)
| | - Sio-Hoi Ieng
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France
| | - Ryad Benosman
- Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France
- CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France
- Departments of Ophthalmology/ECE/BioE, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Computer Science, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, United States
| |
Collapse
|
47
|
Bergner F, Dean-Leon E, Cheng G. Design and Realization of an Efficient Large-Area Event-Driven E-Skin. SENSORS (BASEL, SWITZERLAND) 2020; 20:E1965. [PMID: 32244511 PMCID: PMC7180917 DOI: 10.3390/s20071965] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/07/2020] [Revised: 03/26/2020] [Accepted: 03/27/2020] [Indexed: 12/20/2022]
Abstract
The sense of touch enables us to safely interact with our surroundings and control our contacts with them. Many technical systems and applications could profit from a similar type of sense. Yet, despite the emergence of e-skin systems covering more extensive areas, large-area realizations of e-skin that effectively boost applications are still rare. Recent advancements have improved the deployability and robustness of e-skin systems, laying the basis for their scalability. However, the upscaling of e-skin systems introduces yet another challenge: handling a large amount of heterogeneous tactile information with complex spatial relations between sensing points. We targeted this challenge and proposed an event-driven approach for large-area skin systems. While our previous works focused on the implementation and the experimental validation of the approach, this work now provides the consolidated foundations for realizing, designing, and understanding large-area event-driven e-skin systems for effective applications. This work homogenizes the different perspectives on event-driven systems and assesses the applicability of existing event-driven implementations in large-area skin systems. Additionally, we provide novel guidelines for tuning the novelty threshold of event generators. Overall, this work develops a systematic approach towards realizing a flexible event-driven information handling system on standard computer systems for large-scale e-skin, with detailed descriptions of the effective design of event generators and decoders. All designs and guidelines are validated by outlining their impacts on our implementations and by consolidating various experimental results. The resulting system design for e-skin systems is scalable, efficient, flexible, and capable of handling large amounts of information without customized hardware. The system makes complex large-area tactile applications feasible, for instance in robotics.
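A send-on-delta event generator with a novelty threshold, of the general kind discussed for event-driven e-skin, can be sketched in a few lines: a taxel transmits a new value only when it departs from the last transmitted value by more than the threshold. The threshold value and readings below are illustrative assumptions, not the paper's tuning guidelines.

```python
"""Sketch of a send-on-delta (novelty-threshold) event generator."""

class EventGenerator:
    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.last_sent = None

    def step(self, value):
        """Returns the new value if an event should be emitted, else None."""
        if self.last_sent is None or abs(value - self.last_sent) > self.threshold:
            self.last_sent = value
            return value
        return None

gen = EventGenerator(threshold=0.05)
readings = [0.00, 0.01, 0.02, 0.10, 0.11, 0.30]
events = [(i, v) for i, r in enumerate(readings) if (v := gen.step(r)) is not None]
print(events)   # only the samples that crossed the novelty threshold
```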
Collapse
Affiliation(s)
- Florian Bergner
- Institute for Cognitive Systems (ICS), Technische Universität München, Arcisstraße 21, 80333 München, Germany; (E.D.-L.); (G.C.)
| | | | | |
Collapse
|
48
|
Abstract
Dynamic vision sensor (DVS) is a new type of image sensor with application prospects in the fields of automobiles and robots. Dynamic vision sensors are very different from traditional image sensors in terms of pixel principle and output data. Background activity (BA) in the data affects image quality, but there is currently no unified indicator to evaluate the image quality of event streams. This paper proposes a method to eliminate background activity, together with a procedure and performance indices for evaluating filter performance: noise in real (NIR) and real in noise (RIN). The lower these values, the better the filter. This evaluation method does not require fixed-pattern generation equipment and can also evaluate filter performance using natural images. Comparative experiments with three filters show that the proposed method achieves the best overall performance. The method reduces the bandwidth required for DVS data transmission, reduces the computational cost of target extraction, and opens the possibility of applying DVS in more fields.
Collapse
|
49
|
Afshar S, Ralph N, Xu Y, Tapson J, van Schaik A, Cohen G. Event-Based Feature Extraction Using Adaptive Selection Thresholds. SENSORS 2020; 20:s20061600. [PMID: 32183052 PMCID: PMC7146588 DOI: 10.3390/s20061600] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/16/2020] [Revised: 03/07/2020] [Accepted: 03/08/2020] [Indexed: 11/25/2022]
Abstract
Unsupervised feature extraction algorithms form one of the most important building blocks in machine learning systems. These algorithms are often adapted to the event-based domain to perform online learning in neuromorphic hardware. However, not being designed for this purpose, such algorithms typically require significant simplification during implementation to meet hardware constraints, creating trade-offs with performance. Furthermore, conventional feature extraction algorithms are not designed to generate useful intermediary signals, which are valuable only in the context of neuromorphic hardware limitations. In this work, a novel event-based feature extraction method is proposed that focuses on these issues. The algorithm operates via simple adaptive selection thresholds, which allow a simpler implementation of network homeostasis than previous works by trading off a small amount of information loss in the form of missed events that fall outside the selection thresholds. The behavior of the selection thresholds and the output of the network as a whole are shown to provide uniquely useful signals indicating network weight convergence without the need to access network weights. A novel heuristic method for network size selection is proposed which makes use of noise events and their feature representations. The use of selection thresholds is shown to produce network activation patterns that predict classification accuracy, allowing rapid evaluation and optimization of system parameters without the need to run back-end classifiers. The feature extraction method is tested on both the N-MNIST (Neuromorphic-MNIST) benchmarking dataset and a dataset of airplanes passing through the field of view. Multiple configurations with different classifiers are tested, with the results quantifying the resultant performance gains at each processing stage.
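A minimal sketch of feature learning with adaptive selection thresholds follows, under assumptions of our own (cosine matching, fixed learning rates, random toy inputs): each feature keeps a prototype and its own threshold; a matching feature absorbs the input and raises its threshold, while a missed event relaxes every threshold. This illustrates the homeostasis mechanism only, not the paper's full pipeline.

```python
"""Sketch of event feature learning with adaptive selection thresholds."""
import numpy as np

class AdaptiveThresholdFeatures:
    def __init__(self, n_features, dim, eta_w=0.01, eta_t=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.random((n_features, dim))
        self.w /= np.linalg.norm(self.w, axis=1, keepdims=True)
        self.thr = np.zeros(n_features)          # start fully permissive
        self.eta_w, self.eta_t = eta_w, eta_t

    def update(self, x):
        """x: event context vector (e.g. a flattened time-surface patch)."""
        x = x / (np.linalg.norm(x) + 1e-12)
        sim = self.w @ x
        eligible = np.flatnonzero(sim > self.thr)
        if eligible.size == 0:
            self.thr -= self.eta_t               # missed event: relax all thresholds
            return None
        win = eligible[np.argmax(sim[eligible])]
        self.w[win] += self.eta_w * (x - self.w[win])   # pull prototype toward input
        self.w[win] /= np.linalg.norm(self.w[win])
        self.thr[win] += self.eta_t              # the winner becomes more selective
        return int(win)

feat = AdaptiveThresholdFeatures(n_features=4, dim=9)
rng = np.random.default_rng(1)
for _ in range(200):
    feat.update(rng.random(9))
print(np.round(feat.thr, 3))
```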
Collapse
|
50
|
Chadha A, Andreopoulos Y. Improved Techniques for Adversarial Discriminative Domain Adaptation. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:2622-2637. [PMID: 31714227 DOI: 10.1109/tip.2019.2950768] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Adversarial discriminative domain adaptation (ADDA) is an efficient framework for unsupervised domain adaptation in image classification, where the source and target domains are assumed to have the same classes, but no labels are available for the target domain. While ADDA has already achieved better training efficiency and competitive accuracy on image classification in comparison to other adversarial-based methods, we investigate whether we can improve its performance with a new framework and new loss formulations. Following the framework of semi-supervised GANs, we first extend the discriminator output over the source classes, in order to model the joint distribution over domain and task. We thus leverage the distribution over the source encoder posteriors (which is fixed during adversarial training) and propose maximum mean discrepancy (MMD) and reconstruction-based loss functions for aligning the target encoder distribution to the source domain. We compare and provide a comprehensive analysis of how our framework and loss formulations extend over simple multi-class extensions of ADDA and other discriminative variants of semi-supervised GANs. In addition, we introduce various forms of regularization for stabilizing training, including treating the discriminator as a denoising autoencoder and regularizing the target encoder with source examples to reduce overfitting under a contraction mapping (i.e., when the target per-class distributions are contracting during alignment with the source). Finally, we validate our framework on standard datasets such as MNIST, USPS, SVHN, MNIST-M, and Office-31. We additionally examine how the proposed framework benefits recognition problems based on sensing modalities that lack training data. This is realized by introducing, and evaluating on, a neuromorphic vision sensing (NVS) sign language recognition dataset, where the source domain constitutes emulated neuromorphic spike events converted from conventional pixel-based video and the target domain is experimental (real) spike events from an NVS camera. Our results on all datasets show that our proposal is both simple and efficient, as it matches or outperforms the state of the art in unsupervised domain adaptation, such as DIFA and MCDDA, whilst offering lower complexity than other recent adversarial methods.
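The MMD alignment term mentioned here is straightforward to sketch in isolation: with an RBF kernel, the squared MMD between batches of source and target encoder features is the mean kernel within each batch minus twice the cross-batch mean. The bandwidth and toy feature batches below are illustrative assumptions, not the paper's training setup.

```python
"""Sketch of a squared maximum mean discrepancy (MMD) loss with an RBF kernel."""
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Squared MMD between two batches of encoder features."""
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (64, 16))
tgt_far = rng.normal(2.0, 1.0, (64, 16))
print(mmd2(src, src[::-1]), mmd2(src, tgt_far))   # small vs. large discrepancy
```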
Collapse
|