1. Zhang Z, Chen S, Wang Z, Yang J. PlaneSeg: Building a Plug-In for Boosting Planar Region Segmentation. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:11486-11500. [PMID: 37027268] [DOI: 10.1109/tnnls.2023.3262544]
Abstract
Existing methods in planar region segmentation suffer from vague boundaries and fail to detect small-sized regions. To address these problems, this study presents an end-to-end framework, named PlaneSeg, which can be easily integrated into various plane segmentation models. Specifically, PlaneSeg contains three modules: an edge feature extraction module, a multiscale module, and a resolution-adaptation module. First, the edge feature extraction module produces edge-aware feature maps for finer segmentation boundaries; the learned edge information acts as a constraint that mitigates inaccurate boundaries. Second, the multiscale module combines feature maps from different layers to harvest spatial and semantic information about planar objects, which helps recognize small-sized objects and produces more accurate segmentation results. Third, the resolution-adaptation module fuses the feature maps produced by the two aforementioned modules, adopting pairwise feature fusion to resample dropped pixels and extract more detailed features. Extensive experiments demonstrate that PlaneSeg outperforms other state-of-the-art approaches on three downstream tasks: plane segmentation, 3-D plane reconstruction, and depth prediction. Code is available at https://github.com/nku-zhichengzhang/PlaneSeg.
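The resolution-adaptation step described above amounts to resampling a coarse feature map to a finer map's resolution and fusing the pair. Below is a minimal sketch of that generic pattern in PyTorch; the module name, channel sizes, and layer choices are illustrative assumptions, not the authors' PlaneSeg implementation.

```python
# Illustrative pairwise feature fusion: upsample a coarse map and fuse it with a fine one.
# Generic sketch only; layer names and sizes are assumptions, not PlaneSeg's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseFusion(nn.Module):
    def __init__(self, coarse_ch: int, fine_ch: int, out_ch: int):
        super().__init__()
        self.proj_coarse = nn.Conv2d(coarse_ch, out_ch, kernel_size=1)  # align channel counts
        self.proj_fine = nn.Conv2d(fine_ch, out_ch, kernel_size=1)
        self.mix = nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
        # Resample the low-resolution map to the fine resolution, then fuse the pair.
        up = F.interpolate(self.proj_coarse(coarse), size=fine.shape[-2:],
                           mode="bilinear", align_corners=False)
        return torch.relu(self.mix(torch.cat([up, self.proj_fine(fine)], dim=1)))

if __name__ == "__main__":
    fuse = PairwiseFusion(coarse_ch=256, fine_ch=64, out_ch=128)
    coarse = torch.randn(1, 256, 30, 40)   # deep, low-resolution features
    fine = torch.randn(1, 64, 120, 160)    # shallow, high-resolution features
    print(fuse(coarse, fine).shape)        # torch.Size([1, 128, 120, 160])
```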
2. Guo G, Feng Y, Lv H, Zhao Y, Liu H, Bi G. Event-Guided Image Super-Resolution Reconstruction. Sensors (Basel) 2023; 23:2155. [PMID: 36850751] [PMCID: PMC9961231] [DOI: 10.3390/s23042155]
Abstract
The event camera efficiently detects scene radiance changes and produces an asynchronous event stream with low latency, high dynamic range (HDR), high temporal resolution, and low power consumption. However, the large volume of data produced by the asynchronous imaging mechanism limits increases in the event camera's spatial resolution. In this paper, we propose a novel deep-learning-based event camera super-resolution (SR) network (EFSR-Net) to address the low spatial resolution and poor visualization of event cameras. The network reconstructs high-resolution (HR) intensity images from event streams and active pixel sensor (APS) frame information. We design coupled response blocks (CRB) in the network that fuse the feature information of both data sources to recover detailed textures in the shadows of real images. We demonstrate that our method reconstructs high-resolution intensity images with more detail and less blurring on both synthetic and real datasets. The proposed EFSR-Net improves the peak signal-to-noise ratio (PSNR) by 1-2 dB compared with state-of-the-art methods.
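Because the reported gain is expressed in PSNR, a reference formula for the metric is useful context. The helper below is a generic PSNR computation in NumPy, not code from EFSR-Net; the peak value of 255 assumes 8-bit images.

```python
# Generic peak signal-to-noise ratio (PSNR) in dB; not taken from EFSR-Net.
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")      # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

if __name__ == "__main__":
    ref = np.random.randint(0, 256, (128, 128)).astype(np.float64)
    noisy = np.clip(ref + np.random.normal(0.0, 5.0, ref.shape), 0, 255)
    print(f"PSNR: {psnr(ref, noisy):.2f} dB")
```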
Affiliation(s)
- Guangsha Guo: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China
- Yang Feng: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Hengyi Lv: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yuchen Zhao: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Hailong Liu: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Guoling Bi: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
3. Zheng Y, Yu Z, Wang S, Huang T. Spike-Based Motion Estimation for Object Tracking Through Bio-Inspired Unsupervised Learning. IEEE Transactions on Image Processing 2022; 32:335-349. [PMID: 37015554] [DOI: 10.1109/tip.2022.3228168]
Abstract
Neuromorphic vision sensors, whose pixels asynchronously output events/spikes at high temporal resolution according to scene radiance changes, are naturally suited to capturing high-speed motion. However, how to use these events/spikes to smoothly track high-speed moving objects remains a challenging problem. Existing approaches either employ time-consuming iterative optimization or require large amounts of labeled data to train an object detector. To this end, we propose a bio-inspired unsupervised learning framework that exploits the spatiotemporal information of events/spikes generated by neuromorphic vision sensors to capture intrinsic motion patterns. Without offline training, our model filters redundant signals with a dynamic adaptation module based on short-term plasticity and extracts motion patterns with a motion estimation module based on spike-timing-dependent plasticity. Combined with the spatiotemporal and motion information of the filtered spike stream, the traditional DBSCAN clustering algorithm and a Kalman filter can effectively track multiple targets in extreme scenes. We evaluate the proposed unsupervised framework on object detection and tracking tasks using synthetic data, publicly available event-based datasets, and spiking camera datasets. The experimental results show that the proposed model robustly detects and smoothly tracks moving targets in various challenging scenarios and outperforms state-of-the-art approaches.
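The clustering-plus-filtering back end mentioned above (DBSCAN on the filtered events, followed by Kalman smoothing of each cluster centroid) can be sketched in a few lines. The sketch below is a generic illustration with assumed parameters and a simple constant-velocity state model, not the authors' implementation.

```python
# Assumed sketch: spatial DBSCAN clustering of events plus a constant-velocity Kalman filter.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_centroids(xy, eps=4.0, min_samples=15):
    """Centroids of dense event clusters; DBSCAN noise points (label -1) are ignored."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(xy).labels_
    return [xy[labels == k].mean(axis=0) for k in set(labels) if k != -1]

class ConstantVelocityKF:
    def __init__(self, x0, y0, dt=1e-3, q=1.0, r=2.0):
        self.x = np.array([x0, y0, 0.0, 0.0])                 # state [x, y, vx, vy]
        self.P = np.eye(4) * 10.0
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
        self.Q, self.R = np.eye(4) * q, np.eye(2) * r

    def step(self, z):
        # Predict, then update with the measured centroid z = [x, y].
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

if __name__ == "__main__":
    events = np.random.normal(loc=[64.0, 48.0], scale=2.0, size=(300, 2))
    c = cluster_centroids(events)[0]
    kf = ConstantVelocityKF(*c)
    print(kf.step(c + np.random.normal(0.0, 1.0, 2)))          # smoothed centroid
```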
4. Nunes UM, Demiris Y. Robust Event-Based Vision Model Estimation by Dispersion Minimisation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:9561-9573. [PMID: 34813470] [DOI: 10.1109/tpami.2021.3130049]
Abstract
We propose a novel Dispersion Minimisation framework for event-based vision model estimation, with applications to optical flow and high-speed motion estimation. The framework extends previous event-based motion compensation algorithms by avoiding the computation of an optimisation score based on an explicit image-based representation, which provides three main benefits: i) the framework can be extended to perform incremental estimation, i.e., on an event-by-event basis; ii) besides purely visual transformations in 2D, the framework can readily use additional information, e.g., by augmenting the events with depth, to estimate the parameters of motion models in higher-dimensional spaces; iii) the optimisation complexity depends only on the number of events. We achieve this by modelling the event alignment according to candidate parameters and minimising the resultant dispersion, which is computed by a family of suitable entropy-based measures. Data whitening is also proposed as a simple and effective pre-processing step that makes the framework's accuracy more robust, as well as that of other event-based motion-compensation methods. The framework is evaluated on several challenging motion estimation problems, including 6-DOF transformation, rotational motion, and optical flow estimation, achieving state-of-the-art performance.
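The core motion-compensation idea can be illustrated with a toy example: warp events to a common reference time under a candidate velocity and score how concentrated the warped events become. The sketch below substitutes plain per-coordinate variance and a grid search for the paper's entropy-based measures and incremental updates, so it is a simplified illustration rather than the published method.

```python
# Toy dispersion minimisation: variance of motion-compensated events over a velocity grid.
import numpy as np

def warp(xy, t, velocity):
    """Shift each event back along the candidate velocity to time t = 0."""
    return xy - t[:, None] * velocity[None, :]

def dispersion(xy_warped):
    return float(np.var(xy_warped[:, 0]) + np.var(xy_warped[:, 1]))

def estimate_velocity(xy, t, v_range=np.linspace(-200.0, 200.0, 81)):
    best_v, best_d = None, np.inf
    for vx in v_range:
        for vy in v_range:
            d = dispersion(warp(xy, t, np.array([vx, vy])))
            if d < best_d:
                best_v, best_d = np.array([vx, vy]), d
    return best_v

if __name__ == "__main__":
    true_v = np.array([60.0, -30.0])                 # pixels per second
    t = np.sort(np.random.uniform(0.0, 0.05, 2000))
    patch = np.random.uniform(0.0, 50.0, (2000, 2))  # events on a moving patch
    xy = patch + t[:, None] * true_v[None, :]
    print(estimate_velocity(xy, t))                  # close to [60, -30]
```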
5. Ralph N, Joubert D, Jolley A, Afshar S, Tothill N, van Schaik A, Cohen G. Real-Time Event-Based Unsupervised Feature Consolidation and Tracking for Space Situational Awareness. Frontiers in Neuroscience 2022; 16:821157. [PMID: 35600627] [PMCID: PMC9120364] [DOI: 10.3389/fnins.2022.821157]
Abstract
Earth orbit is a limited natural resource that hosts a vast range of vital space-based systems supporting the international community's national, commercial, and defence interests. This resource is rapidly being depleted by over-crowding of high-demand orbital slots and a growing presence of space debris. We propose the Fast Iterative Extraction of Salient targets for Tracking Asynchronously (FIESTA) algorithm as a robust, real-time, and reactive approach to optical Space Situational Awareness (SSA) that uses Event-Based Cameras (EBCs) to detect, localize, and track Resident Space Objects (RSOs) accurately and in a timely manner. We address the challenges posed by the asynchronous nature and high temporal resolution of the EBC output accurately, without supervision, and with few tunable parameters, using concepts established in the neuromorphic and conventional tracking literature. We show that the algorithm is capable of highly accurate in-frame RSO velocity estimation and average sub-pixel localization in a simulated test environment designed to distinguish the capabilities of the EBC and optical setup from those of the proposed tracking system. This work is a fundamental step toward accurate, end-to-end, real-time optical event-based SSA, and lays the foundation for robust closed-form tracking evaluated using standardized tracking metrics.
Affiliation(s)
- Nicholas Ralph (corresponding author): International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- Damien Joubert: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- Andrew Jolley: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia; Air and Space Power Development Centre, Royal Australian Air Force, Canberra, ACT, Australia
- Saeed Afshar: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- Nicholas Tothill: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- André van Schaik: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
- Gregory Cohen: International Centre for Neuromorphic Engineering, MARCS Institute for Brain Behaviour and Development, Western Sydney University, Werrington, NSW, Australia
6. Gallego G, Delbruck T, Orchard G, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison AJ, Conradt J, Daniilidis K, Scaramuzza D. Event-Based Vision: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:154-180. [PMID: 32750812] [DOI: 10.1109/tpami.2020.3008413]
Abstract
Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
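As the survey describes, each event encodes the time, pixel location, and polarity of a brightness change. A common first step when reusing frame-based tooling is to accumulate a time window of events into an image; the helper below shows that generic representation (it is not code from the survey) and assumes events arrive as (t, x, y, polarity) rows.

```python
# Generic event accumulation into a signed frame; illustrative only.
import numpy as np

def events_to_frame(events: np.ndarray, height: int, width: int) -> np.ndarray:
    """events: (N, 4) array with columns (t, x, y, polarity in {-1, +1})."""
    frame = np.zeros((height, width), dtype=np.int32)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    pol = events[:, 3].astype(int)
    np.add.at(frame, (y, x), pol)     # handles repeated pixels correctly
    return frame

if __name__ == "__main__":
    ev = np.array([[0.001, 10, 5, 1], [0.002, 10, 5, 1], [0.003, 3, 7, -1]])
    print(events_to_frame(ev, height=16, width=16)[5, 10])   # 2
```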
7. Jiang R, Wang Q, Shi S, Mou X, Chen S. Flow-assisted visual tracking using event cameras. CAAI Transactions on Intelligence Technology 2021. [DOI: 10.1049/cit2.12005]
Affiliation(s)
- Rui Jiang: CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075
- Qinyi Wang: CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
- Shunshun Shi: CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075
- Xiaozheng Mou: CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075
- Shoushun Chen: CelePixel Technology Co. Ltd, 71 Nanyang Drive, Singapore 638075; School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
8. An Asynchronous Real-Time Corner Extraction and Tracking Algorithm for Event Camera. Sensors (Basel) 2021; 21:1475. [PMID: 33672510] [PMCID: PMC7923767] [DOI: 10.3390/s21041475]
Abstract
Event cameras have many advantages over conventional frame-based cameras, such as high temporal resolution, low latency, and high dynamic range. However, state-of-the-art event-based algorithms either require too much computation time or have poor accuracy. In this paper, we propose an asynchronous real-time corner extraction and tracking algorithm for an event camera. Our primary motivation is to enhance the accuracy of corner detection and tracking while ensuring computational efficiency. Firstly, according to the polarities of the events, a simple yet effective filter is applied to construct two restrictive Surfaces of Active Events (SAEs), named RSAE+ and RSAE−, which accurately represent high-contrast patterns while filtering noise and redundant events. Afterwards, a new coarse-to-fine corner extractor is proposed to extract corner events efficiently and accurately. Finally, a space, time, and velocity-direction constrained data association method is presented to realize corner event tracking: a newly arriving corner event is associated with the latest active corner in its neighborhood that satisfies the velocity direction constraint. The experiments are run on a standard event camera dataset, and the results indicate that our method achieves excellent corner detection and tracking performance. Moreover, the proposed method can process more than 4.5 million events per second, showing promising potential for real-time computer vision applications.
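The per-polarity Surface of Active Events bookkeeping described above reduces to one timestamp map per polarity, updated as events arrive. The sketch below is an assumed minimal version; the refractory check standing in for the paper's restrictive filtering rule is an illustrative simplification.

```python
# Minimal polarity-split Surface of Active Events (SAE); the refractory filter is assumed.
import numpy as np

class PolaritySAE:
    def __init__(self, height: int, width: int, refractory_s: float = 1e-3):
        self.sae = {+1: np.full((height, width), -np.inf),
                    -1: np.full((height, width), -np.inf)}
        self.refractory_s = refractory_s

    def update(self, t: float, x: int, y: int, polarity: int) -> bool:
        """Record the event; return False if it is filtered as redundant."""
        if t - self.sae[polarity][y, x] < self.refractory_s:
            return False
        self.sae[polarity][y, x] = t
        return True

if __name__ == "__main__":
    sae = PolaritySAE(240, 320)
    print(sae.update(0.0010, x=10, y=20, polarity=+1))   # True, first event at this pixel
    print(sae.update(0.0015, x=10, y=20, polarity=+1))   # False, within the refractory window
```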
9. Savran A, Bartolozzi C. Face Pose Alignment with Event Cameras. Sensors (Basel) 2020; 20:7079. [PMID: 33321842] [PMCID: PMC7764104] [DOI: 10.3390/s20247079]
Abstract
The event camera (EC) has emerged as a bio-inspired sensor that can serve as an alternative or complementary vision modality, offering energy efficiency, high dynamic range, and high temporal resolution coupled with activity-dependent sparse sensing. In this study we investigate, with ECs, the problem of face pose alignment, an essential pre-processing stage for facial processing pipelines. EC-based alignment can unlock these benefits in facial applications, especially where motion and dynamics carry the most relevant information, since ECs sense temporal change. We specifically aim at efficient processing by developing a coarse alignment method that handles large pose variations. For this purpose, we have prepared a dataset of extreme head rotations with varying motion intensity, annotated by multiple human raters. We propose a motion-detection-based alignment approach that generates activity-dependent pose-events, preventing unnecessary computation in the absence of pose change. The alignment is realized by cascaded regression of extremely randomized trees. Since EC sensors perform temporal differentiation, we characterize alignment performance across different head movement speeds, face localization uncertainty ranges, face resolutions, and predictor complexities. Our method obtained 2.7% alignment failure on average, whereas annotator disagreement was 1%. The promising coarse alignment performance on EC sensor data, together with a comprehensive analysis, demonstrates the potential of ECs in facial applications.
Affiliation(s)
- Arman Savran: Department of Computer Engineering, Yasar University, 35100 Izmir, Turkey
- Chiara Bartolozzi: Event-Driven Perception for Robotics, Istituto Italiano di Tecnologia, 16163 Genova, Italy
10. Ramesh B, Yang H, Orchard G, Le Thi NA, Zhang S, Xiang C. DART: Distribution Aware Retinal Transform for Event-Based Cameras. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020; 42:2767-2780. [PMID: 31144625] [DOI: 10.1109/tpami.2019.2919301]
Abstract
We introduce a generic visual descriptor, termed the distribution aware retinal transform (DART), that encodes structural context using log-polar grids for event cameras. The DART descriptor is applied to four different problems, namely object classification, tracking, detection, and feature matching: (1) the DART features are directly employed as local descriptors in a bag-of-words classification framework, with testing carried out on four standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS, NCaltech-101); (2) extending the classification system, tracking is demonstrated using two key novelties: (i) statistical bootstrapping is leveraged with online learning to overcome the low-sample problem during one-shot learning of the tracker, and (ii) cyclical shifts are induced in the log-polar domain of the DART descriptor to achieve robustness to object scale and rotation variations; (3) to solve the long-term object tracking problem, an object detector is designed using the principle of cluster majority voting, and the detection scheme is combined with the tracker to yield a high intersection-over-union score with augmented ground-truth annotations on the publicly available event camera dataset; (4) finally, the event context encoded by DART greatly simplifies the feature correspondence problem, especially for spatio-temporal slices far apart in time, which has not been explicitly tackled in the event-based vision domain.
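A toy log-polar event descriptor makes the idea concrete: count events falling into log-radius by angle bins around a keypoint and normalise. The bin counts and radii below are arbitrary illustration values, not the published DART parameters.

```python
# Toy log-polar event histogram in the spirit of DART; parameters are assumptions.
import numpy as np

def log_polar_descriptor(xy, center, n_radial=4, n_angular=8, r_min=1.0, r_max=32.0):
    d = xy - center
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])
    keep = (r >= r_min) & (r < r_max)
    r_bin = np.floor(n_radial * np.log(r[keep] / r_min) / np.log(r_max / r_min)).astype(int)
    a_bin = np.floor((theta[keep] + np.pi) / (2 * np.pi) * n_angular).astype(int) % n_angular
    hist = np.zeros((n_radial, n_angular))
    np.add.at(hist, (r_bin, a_bin), 1.0)
    total = hist.sum()
    return (hist / total).ravel() if total > 0 else hist.ravel()

if __name__ == "__main__":
    events_xy = np.random.uniform(0.0, 64.0, (500, 2))
    print(log_polar_descriptor(events_xy, center=np.array([32.0, 32.0])).shape)   # (32,)
```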
11. Lenz G, Ieng SH, Benosman R. Event-Based Face Detection and Tracking Using the Dynamics of Eye Blinks. Frontiers in Neuroscience 2020; 14:587. [PMID: 32848527] [PMCID: PMC7397845] [DOI: 10.3389/fnins.2020.00587]
Abstract
We present the first purely event-based face detection method, which uses the high temporal resolution of an event-based camera to detect the presence of a face in a scene from eye blinks. Eye blinks are a unique and stable natural dynamic temporal signature of human faces across the population that can be fully captured by event-based sensors. We show that eye blinks have a unique temporal signature that can be easily detected by correlating the acquired local activity with a generic temporal model of eye blinks generated from a wide population of users. In a second stage, once a face has been located, a probabilistic framework can track its spatial location for each incoming event, while eye blinks are used to correct for drift and tracking errors. Results are shown for several indoor and outdoor experiments. We also release an annotated dataset that can be used for future work on the topic.
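The matching step described above reduces to correlating a local event-rate trace against a temporal blink template. The sketch below illustrates this with a made-up two-burst template and normalised cross-correlation; the paper's template is learned from a wide population of users, so every constant here is an assumption.

```python
# Blink detection by template correlation; template shape and constants are made up.
import numpy as np

def blink_score(activity: np.ndarray, template: np.ndarray) -> float:
    """Peak normalised cross-correlation between an activity trace and the template."""
    a = (activity - activity.mean()) / (activity.std() + 1e-9)
    t = (template - template.mean()) / (template.std() + 1e-9)
    return float(np.max(np.correlate(a, t, mode="valid")) / len(t))

if __name__ == "__main__":
    n = np.arange(40)
    template = np.exp(-((n - 10) ** 2) / 20.0) + 0.8 * np.exp(-((n - 28) ** 2) / 30.0)
    trace = np.random.rand(200)
    trace[80:120] += 5.0 * template          # an embedded "blink"
    print(blink_score(trace, template))      # high when the blink pattern is present
```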
Affiliation(s)
- Gregor Lenz: INSERM UMRI S 968, Sorbonne Université, UPMC Univ. Paris, UMRS 968, Paris, France; CNRS, UMR 7210, Institut de la Vision, Paris, France
- Sio-Hoi Ieng: INSERM UMRI S 968, Sorbonne Université, UPMC Univ. Paris, UMRS 968, Paris, France; CNRS, UMR 7210, Institut de la Vision, Paris, France
- Ryad Benosman: INSERM UMRI S 968, Sorbonne Université, UPMC Univ. Paris, UMRS 968, Paris, France; CNRS, UMR 7210, Institut de la Vision, Paris, France; Departments of Ophthalmology/ECE/BioE, University of Pittsburgh, Pittsburgh, PA, United States; Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, United States
12. Maro JM, Ieng SH, Benosman R. Event-Based Gesture Recognition With Dynamic Background Suppression Using Smartphone Computational Capabilities. Frontiers in Neuroscience 2020; 14:275. [PMID: 32327968] [PMCID: PMC7160298] [DOI: 10.3389/fnins.2020.00275]
Abstract
In this paper, we introduce a framework for dynamic gesture recognition with background suppression that operates on the output of a moving event-based camera. The system is developed to run in real time using only the computational capabilities of a mobile phone. It introduces a new development around the concept of time-surfaces, and presents a novel event-based methodology for dynamically removing backgrounds that exploits the high temporal resolution of event-based cameras. To our knowledge, this is the first Android event-based framework for vision-based recognition of dynamic gestures running on a smartphone without off-board processing. We assess performance in several indoor and outdoor scenarios, under static and dynamic conditions and uncontrolled lighting. We also introduce a new, publicly available event-based dataset for gesture recognition with static and dynamic backgrounds. The set of gestures was selected following a clinical trial to enable human-machine interaction for the visually impaired and older adults. Finally, we report comparisons with prior work on event-based gesture recognition, achieving comparable results without advanced classification techniques or power-hungry hardware.
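The time-surface representation the framework builds on maps each pixel's most recent event timestamp to an exponentially decaying value. A minimal version is sketched below; the decay constant and grid size are arbitrary illustration values rather than the paper's settings.

```python
# Minimal exponentially decaying time-surface; constants are illustrative.
import numpy as np

def time_surface(last_t: np.ndarray, now: float, tau: float = 0.05) -> np.ndarray:
    """Per-pixel exp(-(now - t)/tau) of the last event time; never-fired pixels give 0."""
    ts = np.exp(-(now - last_t) / tau)
    ts[~np.isfinite(last_t)] = 0.0
    return ts

if __name__ == "__main__":
    last_t = np.full((8, 8), -np.inf)        # no events yet
    last_t[4, 4], last_t[4, 5] = 0.100, 0.098
    print(time_surface(last_t, now=0.101).round(3))
```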
Affiliation(s)
- Sio-Hoi Ieng: Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France
- Ryad Benosman: Sorbonne Université, INSERM, CNRS, Institut de la Vision, Paris, France; CHNO des Quinze-Vingts, INSERM-DGOS CIC 1423, Paris, France; Departments of Ophthalmology/ECE/BioE, University of Pittsburgh, Pittsburgh, PA, United States; Department of Computer Science, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, United States
13. Seifozzakerini S, Yau WY, Mao K, Nejati H. Hough Transform Implementation For Event-Based Systems: Concepts and Challenges. Frontiers in Computational Neuroscience 2018; 12:103. [PMID: 30622466] [PMCID: PMC6308381] [DOI: 10.3389/fncom.2018.00103]
Abstract
The Hough transform (HT) is one of the most well-known techniques in computer vision and has been the basis of many practical image processing algorithms. HT, however, is designed for frame-based systems such as conventional digital cameras. Recently, event-based systems such as Dynamic Vision Sensor (DVS) cameras have become popular among researchers. Event-based cameras have a very high temporal resolution (1 μs), but each pixel can only detect change, not color. As such, conventional image processing algorithms cannot be readily applied to event-based output streams, and it is necessary to adapt them for event-based cameras. This paper provides a systematic explanation, starting from extending the conventional HT to a 3D HT, adapting it to event-based systems, and implementing the 3D HT using Spiking Neural Networks (SNNs). Using SNNs enables the proposed solution to be realized on hardware such as FPGAs, without requiring a CPU or additional memory. In addition, we discuss techniques for optimal SNN-based implementation using an efficient number of neurons for the required accuracy and resolution along each dimension, without increasing the overall computational complexity. We hope that this will help to reduce the gap between event-based and frame-based systems.
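The starting point, per-event Hough voting for lines, is easy to sketch: each incoming event adds votes along its sinusoid in (theta, rho) space, so the accumulator is maintained incrementally rather than per frame. The class below shows that classical formulation in NumPy, not the paper's spiking implementation; the bin counts are assumptions.

```python
# Incremental (per-event) Hough voting for lines; classical formulation, not the SNN version.
import numpy as np

class EventHoughLines:
    def __init__(self, height: int, width: int, n_theta: int = 180, n_rho: int = 200):
        self.thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        self.rho_max = float(np.hypot(height, width))
        self.n_rho = n_rho
        self.acc = np.zeros((n_theta, n_rho), dtype=np.int32)

    def vote(self, x: int, y: int) -> None:
        rho = x * np.cos(self.thetas) + y * np.sin(self.thetas)       # one rho per theta
        idx = np.round((rho + self.rho_max) / (2 * self.rho_max) * (self.n_rho - 1)).astype(int)
        self.acc[np.arange(len(self.thetas)), idx] += 1

    def best_line(self):
        t_idx, r_idx = np.unravel_index(np.argmax(self.acc), self.acc.shape)
        rho = r_idx / (self.n_rho - 1) * 2 * self.rho_max - self.rho_max
        return self.thetas[t_idx], rho

if __name__ == "__main__":
    hough = EventHoughLines(height=128, width=128)
    for x in range(100):
        hough.vote(x, 20)                     # events along the horizontal line y = 20
    theta, rho = hough.best_line()
    print(np.degrees(theta), rho)             # about 90 degrees, rho within one bin of 20
```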
Affiliation(s)
- Sajjad Seifozzakerini: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore; School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), Singapore, Singapore
- Wei-Yun Yau: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Kezhi Mao: School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), Singapore, Singapore
- Hossein Nejati: Information Systems Technology and Design (ISTD), Singapore University of Technology and Design (SUTD), Singapore, Singapore
14. Alzugaray I, Chli M. Asynchronous Corner Detection and Tracking for Event Cameras in Real Time. IEEE Robotics and Automation Letters 2018. [DOI: 10.1109/lra.2018.2849882]
15. Camunas-Mesa LA, Serrano-Gotarredona T, Ieng SH, Benosman R, Linares-Barranco B. Event-Driven Stereo Visual Tracking Algorithm to Solve Object Occlusion. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:4223-4237. [PMID: 29989974] [DOI: 10.1109/tnnls.2017.2759326]
Abstract
Object tracking is a major problem for many computer vision applications, but it continues to be computationally expensive. The use of bio-inspired neuromorphic event-driven dynamic vision sensors (DVSs) has heralded new methods for vision processing, exploiting a reduced amount of data and very precise timing. Previous studies have shown these neural spiking sensors to be well suited to implementing single-sensor object tracking systems, although they experience difficulties when resolving ambiguities caused by object occlusion. DVSs have also performed well in 3-D reconstruction, in which event matching techniques are applied in stereo setups. In this paper, we propose a new event-driven stereo object tracking algorithm that simultaneously integrates 3-D reconstruction and cluster tracking, introducing feedback between the two tasks to improve their respective performances. This algorithm, inspired by human vision, identifies objects and learns their position and size in order to resolve ambiguities. The strategy has been validated in four experiments in which the 3-D positions of two objects were tracked in a stereo setup even when occlusion occurred. The objects studied were: 1) two swinging pens, whose distance during movement was measured with an error of less than 0.5%; 2) a pen and a box, to confirm the correctness of the results obtained with a more complex object; 3) two straws attached to a fan and rotating at 6 revolutions per second, to demonstrate the high-speed capabilities of the approach; and 4) two people walking in a real-world environment.
16. Ieng SH, Lehtonen E, Benosman R. Complexity Analysis of Iterative Basis Transformations Applied to Event-Based Signals. Frontiers in Neuroscience 2018; 12:373. [PMID: 29946231] [PMCID: PMC6006676] [DOI: 10.3389/fnins.2018.00373]
Abstract
This paper introduces an event-based methodology to perform arbitrary linear basis transformations that encompass a broad range of practically important signal transforms, such as the discrete Fourier transform (DFT) and the discrete wavelet transform (DWT). We present a complexity analysis of the proposed method and show that the number of required multiply-and-accumulate operations is reduced in comparison to the frame-based method for natural video sequences, when the required temporal resolution is high enough. Experimental results on natural video sequences acquired by the asynchronous time-based neuromorphic image sensor (ATIS) are provided to support the feasibility of the method and to illustrate the gain in computational resources.
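The property this method exploits follows directly from linearity: a pixel change of magnitude delta adds delta times one basis column to the transform coefficients, so the transform can be maintained event by event. The sketch below demonstrates the update rule for a 2D DFT in NumPy; it is derived from linearity alone, not taken from the paper, and the image size is an arbitrary example.

```python
# Event-by-event update of a linear transform (2D DFT); derived from linearity, illustrative.
import numpy as np

def dft_update(coeffs: np.ndarray, x: int, y: int, delta: float) -> None:
    """Add the contribution of a change `delta` at pixel (x, y) to the 2D DFT in place."""
    h, w = coeffs.shape
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    coeffs += delta * np.exp(-2j * np.pi * (u * y / h + v * x / w))

if __name__ == "__main__":
    img = np.zeros((16, 16))
    coeffs = np.fft.fft2(img)
    img[5, 7] += 3.0                        # an "event" changes one pixel by +3
    dft_update(coeffs, x=7, y=5, delta=3.0)
    print(np.allclose(coeffs, np.fft.fft2(img)))   # True
```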
Affiliation(s)
- Sio-Hoi Ieng (corresponding author): INSERM UMRI S 968, Sorbonne Universités, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Eero Lehtonen: Department of Future Technologies, University of Turku, Turku, Finland
- Ryad Benosman: INSERM UMRI S 968, Sorbonne Universités, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
17. Marcireau A, Ieng SH, Simon-Chane C, Benosman RB. Event-Based Color Segmentation With a High Dynamic Range Sensor. Frontiers in Neuroscience 2018; 12:135. [PMID: 29695948] [PMCID: PMC5904265] [DOI: 10.3389/fnins.2018.00135]
Abstract
This paper introduces a color asynchronous neuromorphic event-based camera and a methodology to process its color output to perform color segmentation and tracking at the native temporal resolution of the sensor (down to one microsecond). Our color vision sensor prototype is a combination of three Asynchronous Time-based Image Sensors that is sensitive to absolute color information, and we devise a color processing algorithm leveraging this information. The algorithm is designed to be computationally cheap, showing how low-level processing benefits from asynchronous acquisition and high-temporal-resolution data. The resulting color segmentation and tracking performance is assessed on an indoor controlled scene and two outdoor uncontrolled scenes. The tracker's mean error relative to ground truth for the objects in the outdoor scenes ranges from two to twenty pixels.
Affiliation(s)
- Alexandre Marcireau: Institut National de la Santé et de la Recherche Médicale, UMRI S 968, Sorbonne Universités, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Sio-Hoi Ieng: Institut National de la Santé et de la Recherche Médicale, UMRI S 968, Sorbonne Universités, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Camille Simon-Chane: Institut National de la Santé et de la Recherche Médicale, UMRI S 968, Sorbonne Universités, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Ryad B Benosman: Institut National de la Santé et de la Recherche Médicale, UMRI S 968, Sorbonne Universités, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
18. Mishra A, Ghosh R, Principe JC, Thakor NV, Kukreja SL. A Saccade Based Framework for Real-Time Motion Segmentation Using Event Based Vision Sensors. Frontiers in Neuroscience 2017; 11:83. [PMID: 28316563] [PMCID: PMC5334512] [DOI: 10.3389/fnins.2017.00083]
Abstract
Motion segmentation is a critical pre-processing step for autonomous robotic systems to facilitate tracking of moving objects in cluttered environments. Event-based sensors are low-power analog devices that represent a scene by means of asynchronous information updates of only the dynamic details at high temporal resolution and, hence, require significantly fewer computations. However, motion segmentation using spatiotemporal data is a challenging task due to data asynchrony. Prior approaches to object tracking using neuromorphic sensors perform well only while the sensor is static or when a known model of the object to be followed is available. To address these limitations, in this paper we develop a technique for generalized motion segmentation based on spatial statistics across time frames. First, we create micromotion on the platform to facilitate the separation of static and dynamic elements of a scene, inspired by human saccadic eye movements. Second, we introduce the concept of spike-groups as a methodology to partition spatio-temporal event groups, which facilitates computation of scene statistics and characterization of the objects in it. Experimental results show that our algorithm is able to classify dynamic objects with a moving camera with a maximum accuracy of 92%.
Affiliation(s)
- Abhishek Mishra: Singapore Institute for Neurotechnology, National University of Singapore, Singapore
- Rohan Ghosh: Singapore Institute for Neurotechnology, National University of Singapore, Singapore
- Jose C Principe: Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
- Nitish V Thakor: Singapore Institute for Neurotechnology, National University of Singapore, Singapore; Biomedical Engineering Department, Johns Hopkins University, Baltimore, MD, USA
- Sunil L Kukreja: Singapore Institute for Neurotechnology, National University of Singapore, Singapore
19. Clady X, Maro JM, Barré S, Benosman RB. A Motion-Based Feature for Event-Based Pattern Recognition. Frontiers in Neuroscience 2017; 10:594. [PMID: 28101001] [PMCID: PMC5209354] [DOI: 10.3389/fnins.2016.00594]
Abstract
This paper introduces an event-based, luminance-free feature computed from the output of asynchronous event-based neuromorphic retinas. The feature consists of mapping the distribution of the optical flow along the contours of the moving objects in the visual scene into a matrix. Asynchronous event-based neuromorphic retinas are composed of autonomous pixels, each of them asynchronously generating "spiking" events that encode relative changes in the pixel's illumination at high temporal resolution. The optical flow is computed at each event and is integrated locally or globally into a grid defined in a speed and direction coordinate frame, using speed-tuned temporal kernels. The latter ensure that the resulting feature equitably represents the distribution of the normal motion along the current moving edges, whatever their respective dynamics. The usefulness and generality of the proposed feature are demonstrated in pattern recognition applications: local corner detection and global gesture recognition.
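The grid described above can be illustrated by binning per-event flow vectors into a speed by direction histogram. The sketch below replaces the paper's speed-tuned temporal kernels with a plain count, so it is a simplified illustration with assumed bin counts, not the published feature.

```python
# Speed x direction histogram of per-event flow vectors; simplified, assumed parameters.
import numpy as np

def speed_direction_histogram(flow, n_speed=5, n_dir=8, v_max=500.0):
    """flow: (N, 2) array of per-event velocity vectors in pixels per second."""
    speed = np.hypot(flow[:, 0], flow[:, 1])
    direction = np.arctan2(flow[:, 1], flow[:, 0])
    keep = speed > 0
    s_bin = np.minimum((speed[keep] / v_max * n_speed).astype(int), n_speed - 1)
    d_bin = ((direction[keep] + np.pi) / (2 * np.pi) * n_dir).astype(int) % n_dir
    hist = np.zeros((n_speed, n_dir))
    np.add.at(hist, (s_bin, d_bin), 1.0)
    return hist / max(hist.sum(), 1.0)

if __name__ == "__main__":
    flows = np.random.normal([120.0, 0.0], 10.0, (1000, 2))   # mostly rightward motion
    print(speed_direction_histogram(flows).round(2))
```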
Affiliation(s)
- Xavier Clady: Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC Université Paris 06, Paris, France
- Jean-Matthieu Maro: Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC Université Paris 06, Paris, France
- Sébastien Barré: Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC Université Paris 06, Paris, France
- Ryad B Benosman: Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC Université Paris 06, Paris, France
20. Wang H, Xu J, Gao Z, Lu C, Yao S, Ma J. An Event-Based Neurobiological Recognition System with Orientation Detector for Objects in Multiple Orientations. Frontiers in Neuroscience 2016; 10:498. [PMID: 27867346] [PMCID: PMC5095131] [DOI: 10.3389/fnins.2016.00498]
Abstract
This paper proposes a new multiple-orientation event-based neurobiological recognition system for asynchronous address-event representation (AER) image sensors that integrates recognition and tracking functions. The system can recognize objects in multiple orientations even though the training samples move in only a single orientation. It extracts multi-scale and multi-orientation line features inspired by models of the primate visual cortex. An orientation detector based on a modified Gaussian blob tracking algorithm is introduced for object tracking and orientation detection. The orientation detector and the feature extraction block work simultaneously, without any increase in categorization time. An address lookup table (address LUT) is also presented to adjust the feature maps by address mapping and reordering before they are categorized in the trained spiking neural network. The recognition system is evaluated on the MNIST dataset, which has played an important role in the development of computer vision, and accuracy is increased by using both ON and OFF events. AER data acquired by a dynamic vision sensor (DVS), such as moving digits, poker cards, and vehicles, are also tested on the system. The experimental results show that the proposed system can realize event-based multi-orientation recognition. The work presented in this paper makes several contributions to event-based vision processing for multi-orientation object recognition: it develops a new tracking-recognition architecture for the feedforward categorization system, introduces an address reordering approach to classify multi-orientation objects using event-based data, and provides a new way to recognize objects in multiple orientations from samples in a single orientation.
Affiliation(s)
- Hanyu Wang: School of Electronic Information Engineering, Tianjin University, Tianjin, China
- Jiangtao Xu: School of Electronic Information Engineering, Tianjin University, Tianjin, China
- Zhiyuan Gao: School of Electronic Information Engineering, Tianjin University, Tianjin, China
- Chengye Lu: School of Electronic Information Engineering, Tianjin University, Tianjin, China
- Suying Yao: School of Electronic Information Engineering, Tianjin University, Tianjin, China
- Jianguo Ma: School of Electronic Information Engineering, Tianjin University, Tianjin, China
21. Reverter Valeiras D, Kime S, Ieng SH, Benosman RB. An Event-Based Solution to the Perspective-n-Point Problem. Frontiers in Neuroscience 2016; 10:208. [PMID: 27242412] [PMCID: PMC4870282] [DOI: 10.3389/fnins.2016.00208]
Abstract
The goal of the Perspective-n-Point problem (PnP) is to find the relative pose between an object and a camera from a set of n pairings between 3D points and their corresponding 2D projections on the focal plane. Current state-of-the-art solutions, designed to operate on images, rely on computationally expensive minimization techniques. For the first time, this work introduces an event-based PnP algorithm designed to work on the output of a neuromorphic event-based vision sensor. The problem is formulated here as a least-squares minimization problem, where the error function is updated with every incoming event. The optimal translation is then computed in closed form, while the desired rotation is given by the evolution of a virtual mechanical system whose energy is proven to be equal to the error function. This allows for a simple yet robust solution of the problem, showing how event-based vision can simplify computer vision tasks. The approach takes full advantage of the high temporal resolution of the sensor, as the estimated pose is incrementally updated with every incoming event. Two approaches are proposed: the Full and the Efficient methods. They are compared against a state-of-the-art PnP algorithm on both synthetic and real data, producing similar accuracy while being faster.
Affiliation(s)
- David Reverter Valeiras: Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC Université Paris 06, Paris, France
- Sihem Kime: Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC Université Paris 06, Paris, France
- Sio-Hoi Ieng: Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC Université Paris 06, Paris, France
- Ryad Benjamin Benosman: Centre National de la Recherche Scientifique, Institut National de la Santé et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC Université Paris 06, Paris, France
22. Reverter Valeiras D, Orchard G, Ieng SH, Benosman RB. Neuromorphic Event-Based 3D Pose Estimation. Frontiers in Neuroscience 2016; 9:522. [PMID: 26834547] [PMCID: PMC4722112] [DOI: 10.3389/fnins.2015.00522]
Abstract
Pose estimation is a fundamental step in many artificial vision tasks. It consists of estimating the 3D pose of an object with respect to a camera from the object's 2D projection. Current state-of-the-art implementations operate on images and are computationally expensive, especially for real-time applications; scenes with fast dynamics exceeding 30-60 Hz can rarely be processed in real time using conventional hardware. This paper presents a new method for event-based 3D object pose estimation, making full use of the high temporal resolution (1 μs) of asynchronous visual events output from a single neuromorphic camera. Given an initial estimate of the pose, each incoming event is used to update the pose by combining both 3D and 2D criteria. We show that the asynchronous high temporal resolution of the neuromorphic camera allows us to solve the problem in an incremental manner, achieving real-time performance at an update rate of several hundred kHz on a conventional laptop, and that this high temporal resolution is a key feature for performing accurate pose estimation. Experiments on real data demonstrate the performance of the algorithm, including fast moving objects, occlusions, and cases where the neuromorphic camera and the object are both in motion.
Affiliation(s)
- Sio-Hoi Ieng: Natural Vision and Computation Team, Institut de la Vision, Paris, France
- Ryad B Benosman: Natural Vision and Computation Team, Institut de la Vision, Paris, France