1
Schmid D, Jarvers C, Neumann H. Canonical circuit computations for computer vision. Biological Cybernetics 2023; 117:299-329. [PMID: 37306782 PMCID: PMC10600314 DOI: 10.1007/s00422-023-00966-9]
Abstract
Advanced computer vision mechanisms have been inspired by neuroscientific findings. However, with the focus on improving benchmark achievements, technical solutions have been shaped by application and engineering constraints. This includes the training of neural networks which led to the development of feature detectors optimally suited to the application domain. Yet the limitations of such approaches motivate the need to identify computational principles, or motifs, in biological vision that can enable further foundational advances in machine vision. We propose to utilize structural and functional principles of neural systems that have been largely overlooked. They potentially provide new inspirations for computer vision mechanisms and models. Recurrent feedforward, lateral, and feedback interactions characterize general principles underlying processing in mammals. We derive a formal specification of core computational motifs that utilize these principles. These are combined to define model mechanisms for visual shape and motion processing. We demonstrate how such a framework can be adopted to run on neuromorphic brain-inspired hardware platforms and can be extended to automatically adapt to environment statistics. We argue that the identified principles and their formalization inspire sophisticated computational mechanisms with improved explanatory scope. These and other elaborated, biologically inspired models can be employed to design computer vision solutions for different tasks, and they can be used to advance neural network architectures for learning.
Affiliation(s)
- Daniel Schmid
- Institute for Neural Information Processing, Ulm University, James-Franck-Ring, Ulm, 89081 Germany
- Christian Jarvers
- Institute for Neural Information Processing, Ulm University, James-Franck-Ring, Ulm, 89081 Germany
- Heiko Neumann
- Institute for Neural Information Processing, Ulm University, James-Franck-Ring, Ulm, 89081 Germany
2
Xie M, Lai T, Fang Y. A New Principle toward Robust Matching in Human-like Stereovision. Biomimetics (Basel) 2023; 8:285. [PMID: 37504173 PMCID: PMC10807409 DOI: 10.3390/biomimetics8030285]
Abstract
Visual signals are the most important source for robots, vehicles, or machines to achieve human-like intelligence. Human beings depend heavily on binocular vision to understand the dynamically changing world. Similarly, intelligent robots or machines must also have the innate capability of perceiving knowledge from visual signals. To date, one of the biggest challenges faced by intelligent robots or machines is matching in stereovision. In this paper, we present the details of a new principle toward achieving a robust matching solution that leverages the integration of a top-down image sampling strategy, hybrid feature extraction, and a Restricted Coulomb Energy (RCE) neural network for incremental learning (i.e., cognition) as well as robust match-making (i.e., recognition). A preliminary version of the proposed solution has been implemented and tested with data from the Maritime RobotX Challenge. The contribution of this paper is to attract more research interest and effort toward this new direction, which may eventually lead to the development of the robust solutions expected of future stereovision systems in intelligent robots, vehicles, and machines.
Affiliation(s)
- Ming Xie
- School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore 639798, Singapore
3
Gallego G, Delbruck T, Orchard G, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison AJ, Conradt J, Daniilidis K, Scaramuzza D. Event-Based Vision: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:154-180. [PMID: 32750812 DOI: 10.1109/tpami.2020.3008413]
Abstract
Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location, and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of μs), very high dynamic range (140 dB versus 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as those demanding low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available, and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
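To make the event encoding concrete, here is a minimal Python sketch (the `Event` layout and `accumulate` helper are illustrative, not taken from the survey) showing how a stream of (time, location, sign) events can be integrated into a signed frame whenever a frame-based algorithm needs to consume them:

```python
import numpy as np
from collections import namedtuple

# Hypothetical event layout mirroring the encoding described above:
# a timestamp (microseconds), pixel coordinates, and a brightness-change sign.
Event = namedtuple("Event", ["t", "x", "y", "polarity"])

def accumulate(events, width, height, t_start, t_end):
    """Integrate events from [t_start, t_end) into a signed 2D histogram,
    a common first step when frame-based algorithms must consume events."""
    frame = np.zeros((height, width), dtype=np.int32)
    for e in events:
        if t_start <= e.t < t_end:
            frame[e.y, e.x] += 1 if e.polarity > 0 else -1
    return frame

# Three synthetic events inside a 10 ms window.
events = [Event(100, 5, 3, 1), Event(2500, 5, 3, -1), Event(9000, 2, 7, 1)]
print(accumulate(events, width=10, height=10, t_start=0, t_end=10_000))
```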
4
5
6
Ge Z, Gao Y, So HKH, Lam EY. Event-based laser speckle correlation for micro motion estimation. Optics Letters 2021; 46:3885-3888. [PMID: 34388766 DOI: 10.1364/ol.430419]
Abstract
Micro motion estimation has important applications in various fields such as microfluidic particle detection and biomedical cell imaging. Conventional methods analyze the motion from intensity images captured using frame-based imaging sensors such as the complementary metal-oxide semiconductor (CMOS) and the charge-coupled device (CCD). Recently, event-based sensors have evolved with the special capability to record asynchronous light changes with high dynamic range, high temporal resolution, low latency, and no motion blur. In this Letter, we explore the potential of using the event sensor to estimate the micro motion based on the laser speckle correlation technique.
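The correlation step at the heart of this approach can be sketched in a few lines of NumPy: accumulate events into two speckle frames, then read the displacement off the peak of their FFT-based cross-correlation. This is a generic correlation-shift estimator under assumed inputs, not the authors' implementation:

```python
import numpy as np

def shift_by_correlation(frame_a, frame_b):
    """Estimate the (dy, dx) shift between two speckle frames from the peak
    of their circular cross-correlation, computed via the FFT."""
    corr = np.fft.ifft2(np.fft.fft2(frame_a) * np.conj(np.fft.fft2(frame_b))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak coordinates to signed shifts (wrap-around convention).
    if dy > frame_a.shape[0] // 2:
        dy -= frame_a.shape[0]
    if dx > frame_a.shape[1] // 2:
        dx -= frame_a.shape[1]
    return dy, dx

rng = np.random.default_rng(0)
speckle = rng.random((64, 64))
moved = np.roll(speckle, shift=(3, -2), axis=(0, 1))  # known micro motion
print(shift_by_correlation(moved, speckle))  # -> (3, -2)
```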
7
Tayarani-Najaran MH, Schmuker M. Event-Based Sensing and Signal Processing in the Visual, Auditory, and Olfactory Domain: A Review. Front Neural Circuits 2021; 15:610446. [PMID: 34135736 PMCID: PMC8203204 DOI: 10.3389/fncir.2021.610446]
Abstract
The nervous system converts the physical quantities sensed by its primary receptors into trains of events that are then processed in the brain. This unmatched efficiency in information processing has long inspired engineers to seek brain-like approaches to sensing and signal processing. The key principle pursued in neuromorphic sensing is to shed the traditional approach of periodic sampling in favor of an event-driven scheme that mimics sampling as it occurs in the nervous system, where events are preferably emitted upon a change of the sensed stimulus. In this paper we highlight the advantages and challenges of event-based sensing and signal processing in the visual, auditory, and olfactory domains. We also provide a survey of the literature covering neuromorphic sensing and signal processing in all three modalities. Our aim is to facilitate research in event-based sensing and signal processing by providing a comprehensive overview of the research performed previously, as well as by highlighting conceptual advantages, current progress, and future challenges in the field.
Affiliation(s)
- Michael Schmuker
- School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, United Kingdom
8
Method of Using RealSense Camera to Estimate the Depth Map of Any Monocular Camera. Journal of Electrical and Computer Engineering 2021. [DOI: 10.1155/2021/9152035]
Abstract
Robot detection, recognition, positioning, and other applications require not only real-time video image information but also the distance from the target to the camera, that is, depth information. This paper proposes a method to automatically generate a depth map for any monocular camera based on RealSense camera data. With this method, any existing single-camera detection system can be upgraded online: without changing the original system, the depth information of the original monocular camera can be obtained simply, realizing the transition from 2D detection to 3D detection. To verify the effectiveness of the proposed method, a hardware system was constructed using the Micro-vision RS-A14K-GC8 industrial camera and the Intel RealSense D415 depth camera, and the depth-map fitting algorithm proposed in this paper was used to test the system. The results show that, except for a few depth-missing areas, the remaining areas are reconstructed well and adequately describe the distance between the target and the camera. In addition, to verify the scalability of the method, a new hardware system was constructed with different cameras, and images were collected in a complex farmland environment. The generated depth map was again of good quality, adequately describing the distance between the target and the camera.
9
Wang G, Zhang C, Chen X, Ji X, Xue JH, Wang H. Bi-Stream Pose-Guided Region Ensemble Network for Fingertip Localization From Stereo Images. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:5153-5165. [PMID: 32070999 DOI: 10.1109/tnnls.2020.2964037]
Abstract
In human-computer interaction, it is important to accurately estimate the hand pose, especially fingertips. However, traditional approaches to fingertip localization mainly rely on depth images and thus suffer considerably from noise and missing values. Instead of depth images, stereo images can also provide 3-D information of hands. There are nevertheless limitations on the dataset size, global viewpoints, hand articulations, and hand shapes in publicly available stereo-based hand pose datasets. To mitigate these limitations and promote further research on hand pose estimation from stereo images, we build a new large-scale binocular hand pose dataset called THU-Bi-Hand, offering a new perspective for fingertip localization. In the THU-Bi-Hand dataset, there are 447k pairs of stereo images of different hand shapes from ten subjects with accurate 3-D location annotations of the wrist and five fingertips. Captured with minimal restriction on the range of hand motion, the dataset covers a large global viewpoint space and hand articulation space. To better present the performance of fingertip localization on THU-Bi-Hand, we propose a novel scheme termed bi-stream pose-guided region ensemble network (Bi-Pose-REN). It extracts more representative feature regions around joints in the feature maps under the guidance of the previously estimated pose. The feature regions are integrated hierarchically according to the topology of hand joints to regress a refined hand pose. Bi-Pose-REN and several existing methods are evaluated on THU-Bi-Hand so that benchmarks are provided for further research. Experimental results show that our Bi-Pose-REN has achieved the best performance on THU-Bi-Hand.
10
Liu D, Bellotto N, Yue S. Deep Spiking Neural Network for Video-Based Disguise Face Recognition Based on Dynamic Facial Movements. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:1843-1855. [PMID: 31329135 DOI: 10.1109/tnnls.2019.2927274]
Abstract
With the increasing popularity of social media and smart devices, the face, as one of the key biometrics, has become vital for person identification. Among face recognition algorithms, video-based methods can make use of both temporal and spatial information, just as humans do, to achieve better classification performance. However, they cannot identify individuals when certain key facial areas, such as the eyes or nose, are disguised by heavy makeup or rubber/digital masks. To this end, we propose a novel deep spiking neural network architecture in this paper. It takes dynamic facial movements, the facial muscle changes induced by speaking or other activities, as its sole input. An event-driven continuous spike-timing-dependent plasticity learning rule with adaptive thresholding is applied to train the synaptic weights. Experiments on our proposed video-based disguise face database (MakeFace DB) demonstrate that the proposed learning method performs very well, achieving 95% to 100% correct classification rates under various realistic experimental scenarios.
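The learning rule mentioned above builds on spike-timing-dependent plasticity (STDP). A minimal pair-based STDP update (illustrative constants; the paper's event-driven continuous variant with adaptive thresholding is more elaborate) might look like this:

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_min=0.0, w_max=1.0):
    """Pair-based STDP: a presynaptic spike shortly before the postsynaptic
    spike potentiates the synapse; the reverse order depresses it, with an
    exponential dependence on the timing gap (times in ms)."""
    dt = t_post - t_pre
    if dt >= 0:
        w += a_plus * np.exp(-dt / tau_plus)    # pre before post: potentiate
    else:
        w -= a_minus * np.exp(dt / tau_minus)   # post before pre: depress
    return float(np.clip(w, w_min, w_max))

print(stdp_update(0.5, t_pre=10.0, t_post=15.0))  # weight increases
print(stdp_update(0.5, t_pre=15.0, t_post=10.0))  # weight decreases
```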
11
Towards spike-based machine intelligence with neuromorphic computing. Nature 2019; 575:607-617. [PMID: 31776490 DOI: 10.1038/s41586-019-1677-2]
Abstract
Guided by brain-like 'spiking' computational frameworks, neuromorphic computing (brain-inspired computing for machine intelligence) promises to realize artificial intelligence while reducing the energy requirements of computing platforms. This interdisciplinary field began with the implementation of silicon circuits for biological neural routines, but has evolved to encompass the hardware implementation of algorithms with spike-based encoding and event-driven representations. Here we provide an overview of the developments in neuromorphic computing for both algorithms and hardware and highlight the fundamentals of learning and hardware frameworks. We discuss the main challenges and the future prospects of neuromorphic computing, with emphasis on algorithm-hardware codesign.
12
Steffen L, Reichard D, Weinland J, Kaiser J, Roennau A, Dillmann R. Neuromorphic Stereo Vision: A Survey of Bio-Inspired Sensors and Algorithms. Front Neurorobot 2019; 13:28. [PMID: 31191287 PMCID: PMC6546825 DOI: 10.3389/fnbot.2019.00028]
Abstract
Any visual sensor, whether artificial or biological, maps the 3D world onto a 2D representation. The missing dimension is depth, and most species use stereo vision to recover it. Stereo vision implies multiple perspectives and matching; hence it obtains depth from a pair of images. Algorithms for stereo vision are also used successfully in robotics. Although biological systems seem to compute disparities effortlessly, artificial methods suffer from high energy demands and latency. The crucial part is the correspondence problem: finding the matching points between two images. The development of event-based cameras, inspired by the retina, enables the exploitation of an additional physical constraint: time. Due to their asynchronous mode of operation, considering the precise occurrence of spikes, spiking neural networks take advantage of this constraint. In this work, we investigate sensors and algorithms for event-based stereo vision leading to more biologically plausible robots. Hereby, we focus mainly on binocular stereo vision.
Affiliation(s)
- Lea Steffen
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Daniel Reichard
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Jakob Weinland
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Jacques Kaiser
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Arne Roennau
- FZI Research Center for Information Technology, Karlsruhe, Germany
- Rüdiger Dillmann
- FZI Research Center for Information Technology, Karlsruhe, Germany; Humanoids and Intelligence Systems Lab, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
13
Stereo Matching in Address-Event-Representation (AER) Bio-Inspired Binocular Systems in a Field-Programmable Gate Array (FPGA). Electronics 2019. [DOI: 10.3390/electronics8040410]
Abstract
In stereo-vision processing, the image-matching step is essential for the results, although it involves a very high computational cost. Moreover, the more information is processed, the more time is spent by the matching algorithm, and the more inefficient it becomes. Spike-based processing is a relatively new approach that implements processing methods by manipulating spikes one by one at the time they are transmitted, much as a human brain does. The mammalian nervous system solves far more complex problems, such as visual recognition, by manipulating neuronal spikes. The spike-based philosophy for visual information processing based on the neuro-inspired address-event representation (AER) currently achieves very high performance. The aim of this work was to study the viability of a matching mechanism in stereo-vision systems using AER codification and its implementation in a field-programmable gate array (FPGA). Earlier studies considered AER systems with data monitored on a computer; however, this kind of mechanism had not been implemented directly in hardware. To this end, an epipolar-geometry basis applied to AER systems was studied and implemented, together with additional restrictions, in order to achieve good results in a real-time scenario. The results and conclusions are shown, and the viability of the implementation is proven.
14
Pfeiffer M, Pfeil T. Deep Learning With Spiking Neurons: Opportunities and Challenges. Front Neurosci 2018; 12:774. [PMID: 30410432 PMCID: PMC6209684 DOI: 10.3389/fnins.2018.00774]
Abstract
Spiking neural networks (SNNs) are inspired by information processing in biology, where sparse and asynchronous binary signals are communicated and processed in a massively parallel fashion. SNNs on neuromorphic hardware exhibit favorable properties such as low power consumption, fast inference, and event-driven information processing. This makes them interesting candidates for the efficient implementation of deep neural networks, the method of choice for many machine learning tasks. In this review, we address the opportunities that deep spiking networks offer and investigate in detail the challenges associated with training SNNs in a way that makes them competitive with conventional deep learning, while simultaneously allowing for efficient mapping to hardware. A wide range of training methods for SNNs is presented, ranging from the conversion of conventional deep networks into SNNs and constrained training before conversion, to spiking variants of backpropagation and biologically motivated variants of STDP. The goal of our review is to define a categorization of SNN training methods and summarize their advantages and drawbacks. We further discuss relationships between SNNs and binary networks, which are becoming popular for efficient digital hardware implementation. Neuromorphic hardware platforms have great potential to enable deep spiking networks in real-world applications. We compare the suitability of various neuromorphic systems that have been developed over the past years and investigate potential use cases. Neuromorphic approaches and conventional machine learning should not be considered simply as two solutions to the same classes of problems; instead, it is possible to identify and exploit their task-specific advantages. Deep SNNs offer great opportunities to work with new types of event-based sensors, exploit temporal codes and local on-chip learning, and we have so far just scratched the surface of realizing these advantages in practical applications.
Affiliation(s)
- Michael Pfeiffer
- Bosch Center for Artificial Intelligence, Robert Bosch GmbH, Renningen, Germany
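For readers unfamiliar with the neuron model underlying most of the networks reviewed above, here is a minimal leaky integrate-and-fire (LIF) simulation in Python (parameter values are illustrative only):

```python
def lif_simulate(input_current, dt=1e-3, tau=20e-3, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0):
    """Minimal leaky integrate-and-fire neuron: the membrane potential leaks
    toward rest while integrating input; crossing threshold emits a spike
    and resets the potential."""
    v = v_rest
    spike_times = []
    for step, i_in in enumerate(input_current):
        v += dt / tau * (v_rest - v) + i_in  # leak plus input integration
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset
    return spike_times

# A constant drive makes the neuron fire regularly (roughly every 35 ms here).
print(lif_simulate([0.06] * 200))
```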
15
Cohen G, Afshar S, Orchard G, Tapson J, Benosman R, van Schaik A. Spatial and Temporal Downsampling in Event-Based Visual Classification. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:5030-5044. [PMID: 29994752 DOI: 10.1109/tnnls.2017.2785272]
Abstract
As the interest in event-based vision sensors for mobile and aerial applications grows, there is an increasing need for high-speed and highly robust algorithms for performing visual tasks using event-based data. As event rate and network structure have a direct impact on the power consumed by such systems, it is important to explore the efficiency of the event-based encoding used by these sensors. The work presented in this paper represents the first study solely focused on the effects of both spatial and temporal downsampling on event-based vision data and makes use of a variety of data sets chosen to fully explore and characterize the nature of downsampling operations. The results show that both spatial and temporal downsampling produce improved classification accuracy and, additionally, a lower overall data rate, a finding particularly relevant for bandwidth- and power-constrained systems. For a given network containing 1000 hidden-layer neurons, the spatially downsampled systems achieved a best-case accuracy of 89.38% on N-MNIST, as opposed to 81.03% with no downsampling at the same hidden-layer size. On the N-Caltech101 data set, the downsampled system achieved a best-case accuracy of 18.25%, compared with 7.43% with no downsampling. The results show that downsampling is an important preprocessing technique in event-based visual processing, especially for applications sensitive to power consumption and transmission bandwidth.
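A toy version of the two operations studied above, pooling pixel coordinates by an integer factor and quantizing timestamps into bins, might look as follows (hypothetical event layout; the paper's exact downsampling scheme may differ):

```python
from collections import namedtuple

Event = namedtuple("Event", ["t", "x", "y", "polarity"])  # t in microseconds

def downsample(events, spatial_factor=2, temporal_bin_us=1000):
    """Coarsen an event stream in space and time, keeping at most one event
    per (time bin, pooled pixel, polarity) cell to reduce the event rate."""
    seen, out = set(), []
    for e in events:
        key = (e.t // temporal_bin_us, e.x // spatial_factor,
               e.y // spatial_factor, e.polarity)
        if key not in seen:
            seen.add(key)
            # Coordinates and timestamps are now on the coarse grid.
            out.append(Event(key[0] * temporal_bin_us, key[1], key[2], e.polarity))
    return out

stream = [Event(10, 4, 4, 1), Event(400, 5, 5, 1), Event(1700, 4, 4, 1)]
print(downsample(stream))  # the first two events collapse into one
```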
16
Camunas-Mesa LA, Serrano-Gotarredona T, Ieng SH, Benosman R, Linares-Barranco B. Event-Driven Stereo Visual Tracking Algorithm to Solve Object Occlusion. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:4223-4237. [PMID: 29989974 DOI: 10.1109/tnnls.2017.2759326]
Abstract
Object tracking is a major problem for many computer vision applications, but it continues to be computationally expensive. The use of bio-inspired neuromorphic event-driven dynamic vision sensors (DVSs) has heralded new methods for vision processing, exploiting a reduced amount of data and very precise timing resolution. Previous studies have shown these neural spiking sensors to be well suited to implementing single-sensor object tracking systems, although they experience difficulties when solving ambiguities caused by object occlusion. DVSs have also performed well in 3-D reconstruction, in which event matching techniques are applied in stereo setups. In this paper, we propose a new event-driven stereo object tracking algorithm that simultaneously integrates 3-D reconstruction and cluster tracking, introducing feedback information in both tasks to improve their respective performances. This algorithm, inspired by human vision, identifies objects and learns their position and size in order to solve ambiguities. This strategy has been validated in four different experiments where the 3-D positions of two objects were tracked in a stereo setup even when occlusion occurred. The objects studied in the experiments were: 1) two swinging pens, the distance between which during movement was measured with an error of less than 0.5%; 2) a pen and a box, to confirm the correctness of the results obtained with a more complex object; 3) two straws attached to a fan and rotating at 6 revolutions per second, to demonstrate the high-speed capabilities of this approach; and 4) two people walking in a real-world environment.
17
Ieng SH, Carneiro J, Osswald M, Benosman R. Neuromorphic Event-Based Generalized Time-Based Stereovision. Front Neurosci 2018; 12:442. [PMID: 30013461 PMCID: PMC6036184 DOI: 10.3389/fnins.2018.00442]
Abstract
3D reconstruction from multiple viewpoints is an important problem in machine vision that allows recovering tridimensional structures from multiple two-dimensional views of a given scene. Reconstructions from multiple views are conventionally achieved through a process of pixel luminance-based matching between different views. Unlike conventional machine vision methods that solve matching ambiguities by operating only on spatial constraints and luminance, this paper introduces a fully time-based solution to stereovision using the high temporal resolution of neuromorphic asynchronous event-based cameras. These cameras output dynamic visual information in the form of what are known as "change events," which encode the time, the location, and the sign of the luminance changes. A more advanced event-based camera, the Asynchronous Time-based Image Sensor (ATIS), in addition to change events, encodes absolute luminance as time differences. The stereovision problem can then be formulated solely in the time domain as a problem of event coincidence detection. This work improves existing event-based stereovision techniques by adding luminance information, which increases the matching reliability. It also introduces a formulation that does not require building local frames from the luminances (though this is still possible), which can be costly to implement. Finally, this work introduces a methodology for time-based stereovision in binocular and trinocular configurations, using a time-based event-matching criterion that combines, for the first time, space, time, luminance, and motion.
Affiliation(s)
- Sio-Hoi Ieng
- Institut National de la Santé et de La Recherche Médicale UMRI S 968, Sorbonne Universités, UPMC Universités Paris, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Joao Carneiro
- Institut National de la Santé et de La Recherche Médicale UMRI S 968, Sorbonne Universités, UPMC Universités Paris, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Marc Osswald
- Institute of Neuroinformatics, University and ETH Zurich, Zurich, Switzerland
- Ryad Benosman
- Institut National de la Santé et de La Recherche Médicale UMRI S 968, Sorbonne Universités, UPMC Universités Paris, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
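The core idea of time-based matching, pairing events across rectified cameras by temporal coincidence along the same epipolar line, can be sketched as below. The thresholds `dt_max` and `d_max` are illustrative; the paper additionally combines luminance and motion cues:

```python
from collections import namedtuple

Event = namedtuple("Event", ["t", "x", "y", "polarity"])  # t in microseconds

def match_by_time(left_events, right_events, dt_max=200, d_max=30):
    """Toy time-based stereo matcher for rectified cameras: a left event is
    paired with the same-row, same-polarity right event closest in time,
    provided the timing gap and the disparity (x_left - x_right) are
    plausible."""
    matches = []
    for le in left_events:
        candidates = [re for re in right_events
                      if re.y == le.y and re.polarity == le.polarity
                      and abs(re.t - le.t) < dt_max
                      and 0 <= le.x - re.x <= d_max]
        if candidates:
            best = min(candidates, key=lambda re: abs(re.t - le.t))
            matches.append((le, best, le.x - best.x))  # (left, right, disparity)
    return matches

left = [Event(1000, 40, 10, 1)]
right = [Event(1050, 32, 10, 1), Event(1500, 35, 10, 1)]
print(match_by_time(left, right))  # disparity 8, from the event closer in time
```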
18
Zhu AZ, Thakur D, Ozaslan T, Pfrommer B, Kumar V, Daniilidis K. The Multivehicle Stereo Event Camera Dataset: An Event Camera Dataset for 3D Perception. IEEE Robot Autom Lett 2018. [DOI: 10.1109/lra.2018.2800793]
19
Ieng SH, Lehtonen E, Benosman R. Complexity Analysis of Iterative Basis Transformations Applied to Event-Based Signals. Front Neurosci 2018; 12:373. [PMID: 29946231 PMCID: PMC6006676 DOI: 10.3389/fnins.2018.00373]
Abstract
This paper introduces an event-based methodology to perform arbitrary linear basis transformations that encompass a broad range of practically important signal transforms, such as the discrete Fourier transform (DFT) and the discrete wavelet transform (DWT). We present a complexity analysis of the proposed method and show that the amount of required multiply-and-accumulate operations is reduced in comparison to frame-based methods on natural video sequences, when the required temporal resolution is high enough. Experimental results on natural video sequences acquired by the asynchronous time-based neuromorphic image sensor (ATIS) are provided to support the feasibility of the method and to illustrate the gain in computational resources.
Affiliation(s)
- Sio-Hoi Ieng
- INSERM UMRI S 968, Sorbonne Universites, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Correspondence: Sio-Hoi Ieng
- Eero Lehtonen
- Department of Future Technologies, University of Turku, Turku, Finland
- Ryad Benosman
- INSERM UMRI S 968, Sorbonne Universites, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
20
Marcireau A, Ieng SH, Simon-Chane C, Benosman RB. Event-Based Color Segmentation With a High Dynamic Range Sensor. Front Neurosci 2018; 12:135. [PMID: 29695948 PMCID: PMC5904265 DOI: 10.3389/fnins.2018.00135]
Abstract
This paper introduces a color asynchronous neuromorphic event-based camera and a methodology to process color output from the device to perform color segmentation and tracking at the native temporal resolution of the sensor (down to one microsecond). Our color vision sensor prototype is a combination of three Asynchronous Time-based Image Sensors, sensitive to absolute color information. We devise a color processing algorithm leveraging this information. It is designed to be computationally cheap, thus showing how low-level processing benefits from asynchronous acquisition and high-temporal-resolution data. The resulting color segmentation and tracking performance is assessed both with an indoor controlled scene and with two outdoor uncontrolled scenes. The tracking's mean error with respect to the ground truth for the objects in the outdoor scenes ranges from two to twenty pixels.
Affiliation(s)
- Alexandre Marcireau
- Institut National de la Santé et de la Recherche Médicale, UMRI S 968, Sorbonne Universites, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Sio-Hoi Ieng
- Institut National de la Santé et de la Recherche Médicale, UMRI S 968, Sorbonne Universites, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Camille Simon-Chane
- Institut National de la Santé et de la Recherche Médicale, UMRI S 968, Sorbonne Universites, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
- Ryad B Benosman
- Institut National de la Santé et de la Recherche Médicale, UMRI S 968, Sorbonne Universites, UPMC Univ Paris 06, UMR S 968, Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
21
Padala V, Basu A, Orchard G. A Noise Filtering Algorithm for Event-Based Asynchronous Change Detection Image Sensors on TrueNorth and Its Implementation on TrueNorth. Front Neurosci 2018; 12:118. [PMID: 29556172 PMCID: PMC5844986 DOI: 10.3389/fnins.2018.00118]
Abstract
Asynchronous event-based sensors, or “silicon retinae,” are a new class of vision sensors inspired by biological vision systems. The output of these sensors often contains a significant number of noise events along with the signal. Filtering these noise events is a common preprocessing step before using the data for tasks such as tracking and classification. This paper presents a novel spiking neural network-based approach to filtering noise events from data captured by an Asynchronous Time-based Image Sensor on a neuromorphic processor, the IBM TrueNorth Neurosynaptic System. The significant contribution of this work is to demonstrate that the proposed filtering algorithm outperforms the traditional nearest-neighbor noise filter, achieving a higher signal-to-noise ratio (~10 dB higher) and retaining ~3X more of the events related to the signal. In addition, for our envisioned application of object tracking and classification, under some parameter settings it can also generate some of the missing events in the spatial neighborhood of the signal for all classes of moving objects in the data, which is unattainable with the nearest-neighbor filter.
Affiliation(s)
- Vandana Padala
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
- Arindam Basu
- School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
- Garrick Orchard
- Singapore Institute for Neurotechnology (SINAPSE), National University of Singapore, Singapore, Singapore; Temasek Labs, National University of Singapore, Singapore, Singapore
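The baseline against which the TrueNorth filter is compared, a nearest-neighbor spatiotemporal filter, can be written compactly; `dt_support` is an illustrative parameter, and the paper's own SNN filter runs on neuromorphic hardware rather than in Python:

```python
import numpy as np
from collections import namedtuple

Event = namedtuple("Event", ["t", "x", "y", "polarity"])  # t in microseconds

def nearest_neighbor_filter(events, width, height, dt_support=5000):
    """Keep an event only if a pixel in its 3x3 neighborhood fired within the
    last dt_support microseconds; isolated events are treated as noise."""
    last_t = np.full((height, width), -np.inf)
    kept = []
    for e in events:
        y0, y1 = max(e.y - 1, 0), min(e.y + 2, height)
        x0, x1 = max(e.x - 1, 0), min(e.x + 2, width)
        if (e.t - last_t[y0:y1, x0:x1]).min() < dt_support:
            kept.append(e)  # some neighbor fired recently: keep as signal
        last_t[e.y, e.x] = e.t
    return kept

stream = [Event(0, 10, 10, 1), Event(100, 11, 10, 1), Event(100, 50, 50, 1)]
print(nearest_neighbor_filter(stream, 64, 64))  # only the supported event survives
```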
22
Abstract
In this article, we focus on the problem of depth estimation from a stereo pair of event-based sensors. These sensors asynchronously capture pixel-level brightness-change information (events) instead of standard intensity images at a specified frame rate, and thus provide sparse data at low latency and high temporal resolution over a wide intrascene dynamic range. However, new asynchronous, event-based processing algorithms are required to process the event streams. We propose a fully event-based stereo three-dimensional depth estimation algorithm inspired by semiglobal matching. Our algorithm uses smoothness constraints between nearby events to remove the ambiguous and wrong matches that arise when using only the properties of a single event or local features. Experimental validation and comparison with several state-of-the-art event-based stereo matching methods are provided on five different scenes of event-based stereo data sets. The results show that our method can operate well in an event-driven way and has higher estimation accuracy.
Affiliation(s)
- Zhen Xie
- College of Computer Science, Zhejiang University of Technology, Hangzhou, People’s Republic of China
- Jianhua Zhang
- College of Computer Science, Zhejiang University of Technology, Hangzhou, People’s Republic of China
- Pengfei Wang
- Temasek Laboratories, National University of Singapore, Singapore, Singapore
23
Rebecq H, Gallego G, Mueggler E, Scaramuzza D. EMVS: Event-Based Multi-View Stereo—3D Reconstruction with an Event Camera in Real-Time. Int J Comput Vis 2017. [DOI: 10.1007/s11263-017-1050-6]
24
Xie Z, Chen S, Orchard G. Event-Based Stereo Depth Estimation Using Belief Propagation. Front Neurosci 2017; 11:535. [PMID: 29051722 PMCID: PMC5633728 DOI: 10.3389/fnins.2017.00535]
Abstract
Compared to standard frame-based cameras, biologically inspired event-based sensors capture visual information with low latency and minimal redundancy. These event-based sensors are also far less prone to motion blur than traditional cameras, and still operate effectively in high-dynamic-range scenes. However, classical frame-based algorithms are typically not suitable for event-based data, and new processing algorithms are required. This paper focuses on the problem of depth estimation from a stereo pair of event-based sensors. A fully event-based stereo depth estimation algorithm which relies on message passing is proposed. The algorithm not only considers the properties of a single event but also uses a Markov random field (MRF) to impose constraints between nearby events, such as disparity uniqueness and depth continuity. The method is tested on five different scenes and compared to other state-of-the-art event-based stereo matching methods. The results show that the method detects more stereo matches than other methods, with each match having a higher accuracy. The method can operate in an event-driven manner, where depths are reported for individual events as they are received, or the network can be queried at any time to generate a sparse depth frame representing its current state.
Affiliation(s)
- Zhen Xie
- College of Computer Science, Zhejiang University of Technology, Hangzhou, China; Temasek Laboratories, National University of Singapore, Singapore, Singapore
- Shengyong Chen
- College of Computer Science, Zhejiang University of Technology, Hangzhou, China
- Garrick Orchard
- Temasek Laboratories, National University of Singapore, Singapore, Singapore; Singapore Institute for Neurotechnology (SINAPSE), National University of Singapore, Singapore, Singapore
25
Sabatier Q, Ieng SH, Benosman R. Asynchronous Event-Based Fourier Analysis. IEEE Transactions on Image Processing 2017; 26:2192-2202. [PMID: 28186889 DOI: 10.1109/tip.2017.2661702]
Abstract
This paper introduces a method to compute the FFT of a visual scene at a high temporal precision, of around 1 μs, from the output of an asynchronous event-based camera. Event-based cameras make it possible to go beyond the widespread and ingrained belief that acquiring series of images at some fixed rate is a good way to capture visual motion. Each pixel adapts its own sampling rate to the visual input it receives and defines the timing of its own sampling points in response to its visual input by reacting to changes in the amount of incident light. As a consequence, the sampling process is no longer governed by a fixed timing source but by the signal to be sampled itself, or more precisely by the variations of the signal in the amplitude domain. This acquisition paradigm allows going beyond the conventional method of computing the FFT. The event-driven FFT algorithm relies on a heuristic methodology designed to operate directly on incoming gray-level events, updating the FFT incrementally while reducing both computation and data load. We show that, for reasonable levels of approximation at equivalent frame rates beyond the millisecond, the method performs faster and more efficiently than conventional image acquisition. Several experiments are carried out on indoor and outdoor scenes where both conventional and event-driven FFT computations are shown and compared.
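The incremental update at the heart of such an event-driven transform can be illustrated as follows: when a pixel's gray level changes by some delta, every Fourier coefficient is corrected by delta times its basis term, so no full-frame transform is recomputed. This sketch shows the exact (unpruned) update, cross-checked against a direct FFT; the paper's heuristic additionally limits which coefficients are refreshed:

```python
import numpy as np

class EventDrivenDFT:
    """Maintain a 2D DFT under per-pixel gray-level change events."""

    def __init__(self, height, width):
        self.F = np.zeros((height, width), dtype=complex)
        ky = np.arange(height)[:, None]
        kx = np.arange(width)[None, :]
        self._wy = np.exp(-2j * np.pi * ky / height)  # row-frequency bases
        self._wx = np.exp(-2j * np.pi * kx / width)   # column-frequency bases

    def update(self, x, y, delta):
        # Each coefficient F[k, l] gains delta * exp(-2j*pi*(k*y/H + l*x/W)).
        self.F += delta * (self._wy ** y) * (self._wx ** x)

# Cross-check the incremental result against a direct FFT.
img = np.zeros((8, 8))
dft = EventDrivenDFT(8, 8)
for (x, y, delta) in [(1, 2, 5.0), (4, 4, -3.0), (1, 2, 1.0)]:
    img[y, x] += delta
    dft.update(x, y, delta)
print(np.allclose(dft.F, np.fft.fft2(img)))  # True
```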
26
Ieng SH, Carneiro J, Benosman RB. Event-Based 3D Motion Flow Estimation Using 4D Spatio Temporal Subspaces Properties. Front Neurosci 2017; 10:596. [PMID: 28220057 PMCID: PMC5292574 DOI: 10.3389/fnins.2016.00596]
Abstract
State-of-the-art scene flow estimation techniques are based on projections of the 3D motion onto the image, using luminance, sampled at the frame rate of the cameras, as the principal source of information. We introduce in this paper a purely time-based approach to estimate the flow from 3D point clouds, primarily output by neuromorphic event-based stereo camera rigs, or by any existing 3D depth sensor even if it does not provide or use luminance. This method formulates the scene flow problem as a local piecewise regularization of the scene flow. The formulation provides a unifying framework to estimate scene flow from synchronous and asynchronous 3D point clouds. It relies on the properties of 4D space-time, using a decomposition into its subspaces. The method naturally exploits the properties of neuromorphic asynchronous event-based vision sensors, which allow continuous-time 3D point cloud reconstruction. The approach can also handle the motion of deformable objects. Experiments using different 3D sensors are presented.
Affiliation(s)
- Sio-Hoi Ieng
- Institut National de la Santé et de la Recherche Médicale, UMRI S 968; Sorbonne Université, University of Pierre and Marie Curie, Univ Paris 06, UMR S 968; Centre National de la Recherche Scientifique, UMR 7210, Institut de la Vision, Paris, France
27
Osswald M, Ieng SH, Benosman R, Indiveri G. A spiking neural network model of 3D perception for event-based neuromorphic stereo vision systems. Sci Rep 2017; 7:40703. [PMID: 28079187 PMCID: PMC5227683 DOI: 10.1038/srep40703]
Abstract
Stereo vision is an important feature that enables machine vision systems to perceive their environment in 3D. While machine vision has spawned a variety of software algorithms to solve the stereo-correspondence problem, their implementation and integration in small, fast, and efficient hardware vision systems remains a difficult challenge. Recent advances made in neuromorphic engineering offer a possible solution to this problem, with the use of a new class of event-based vision sensors and neural processing devices inspired by the organizing principles of the brain. Here we propose a radically novel model that solves the stereo-correspondence problem with a spiking neural network that can be directly implemented with massively parallel, compact, low-latency and low-power neuromorphic engineering devices. We validate the model with experimental results, highlighting features that are in agreement with both computational neuroscience stereo vision theories and experimental findings. We demonstrate its features with a prototype neuromorphic hardware system and provide testable predictions on the role of spike-based representations and temporal dynamics in biological stereo vision processing systems.
Affiliation(s)
- Marc Osswald
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Sio-Hoi Ieng
- Université Pierre et Marie Curie, Institut de la Vision, Paris, France
- Ryad Benosman
- Université Pierre et Marie Curie, Institut de la Vision, Paris, France
- Giacomo Indiveri
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
28
Clady X, Maro JM, Barré S, Benosman RB. A Motion-Based Feature for Event-Based Pattern Recognition. Front Neurosci 2017; 10:594. [PMID: 28101001 PMCID: PMC5209354 DOI: 10.3389/fnins.2016.00594]
Abstract
This paper introduces an event-based, luminance-free feature computed from the output of asynchronous event-based neuromorphic retinas. The feature consists of mapping the distribution of the optical flow along the contours of the moving objects in the visual scene into a matrix. Asynchronous event-based neuromorphic retinas are composed of autonomous pixels, each of them asynchronously generating "spiking" events that encode relative changes in the pixels' illumination at high temporal resolutions. The optical flow is computed at each event and is integrated locally or globally on a grid defined in a speed-and-direction coordinate frame, using speed-tuned temporal kernels. The latter ensure that the resulting feature equitably represents the distribution of the normal motion along the current moving edges, whatever their respective dynamics. The usefulness and the generality of the proposed feature are demonstrated in pattern recognition applications: local corner detection and global gesture recognition.
Affiliation(s)
- Xavier Clady
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06, Paris, France
- Jean-Matthieu Maro
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06, Paris, France
- Sébastien Barré
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06, Paris, France
- Ryad B Benosman
- Centre National de la Recherche Scientifique, Institut National de la Santé Et de la Recherche Médicale, Institut de la Vision, Sorbonne Universités, UPMC University Paris 06, Paris, France
29
Hu Y, Liu H, Pfeiffer M, Delbruck T. DVS Benchmark Datasets for Object Tracking, Action Recognition, and Object Recognition. Front Neurosci 2016; 10:405. [PMID: 27630540 PMCID: PMC5006598 DOI: 10.3389/fnins.2016.00405]
Affiliation(s)
- Yuhuang Hu
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Hongjie Liu
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Michael Pfeiffer
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
- Tobi Delbruck
- Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
30
Adaptive Neuromorphic Circuit for Stereoscopic Disparity Using Ocular Dominance Map. Neuroscience Journal 2016; 2016:8751874. [PMID: 27243029 PMCID: PMC4868909 DOI: 10.1155/2016/8751874]
Abstract
Stereopsis, or depth perception, is a critical aspect of information processing in the brain and is computed from the positional shift or disparity between the images seen by the two eyes. Various algorithms and their hardware implementations that compute disparity in real time have been proposed; however, most of them compute disparity through complex mathematical calculations that are difficult to realize in hardware and are biologically unrealistic. The brain presumably uses simpler methods to extract depth information from the environment, and hence newer methodologies that could perform stereopsis with brain-like elegance need to be explored. This paper proposes an innovative aVLSI design that leverages the columnar organization of ocular dominance in the brain and uses a time-staggered Winner Take All (ts-WTA) to adaptively create disparity-tuned cells. Physiological findings support the presence of disparity cells in the visual cortex and show that these cells surface as a result of binocular stimulation received after birth. Therefore, creating in hardware cells that can learn different disparities with experience is not only novel but also biologically more realistic. These disparity cells, when allowed to interact diffusively on a larger scale, can be used to adaptively create stable topological disparity maps in silicon.
31
Asynchronous Event-based Cooperative Stereo Matching Using Neuromorphic Silicon Retinas. Neural Process Lett 2016. [DOI: 10.1007/s11063-015-9434-5]
32
Reverter Valeiras D, Lagorce X, Clady X, Bartolozzi C, Ieng SH, Benosman R. An Asynchronous Neuromorphic Event-Driven Visual Part-Based Shape Tracking. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:3045-3059. [PMID: 25794399 DOI: 10.1109/tnnls.2015.2401834]
Abstract
Object tracking is an important step in many artificial vision tasks. The current state-of-the-art implementations remain too computationally demanding for the problem to be solved in real time with high dynamics. This paper presents a novel real-time method for visual part-based tracking of complex objects from the output of an asynchronous event-based camera. This paper extends the pictorial structures model introduced by Fischler and Elschlager 40 years ago and introduces a new formulation of the problem, allowing the dynamic processing of visual input in real time at high temporal resolution using a conventional PC. It relies on the concept of representing an object as a set of basic elements linked by springs. These basic elements consist of simple trackers capable of successfully tracking a target with an ellipse-like shape at several kilohertz on a conventional computer. For each incoming event, the method updates the elastic connections established between the trackers and guarantees a desired geometric structure corresponding to the tracked object in real time. This introduces a high temporal elasticity to adapt to projective deformations of the tracked object in the focal plane. The elastic energy of this virtual mechanical system provides a quality criterion for tracking and can be used to determine whether the measured deformations are caused by the perspective projection of the perceived object or by occlusions. Experiments on real-world data show the robustness of the method in the context of dynamic face tracking.
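The elastic quality criterion described above admits a minimal sketch, with trackers as nodes and virtual springs between them (the spring constant and geometry below are illustrative):

```python
import numpy as np

def elastic_energy(positions, pairs, rest_lengths, k=1.0):
    """Elastic energy of a part-based model: energy grows as the observed
    inter-part distances deviate from the learned rest lengths, giving a
    tracking-quality score (low energy = consistent geometry)."""
    energy = 0.0
    for (i, j), rest in zip(pairs, rest_lengths):
        d = np.linalg.norm(np.asarray(positions[i]) - np.asarray(positions[j]))
        energy += 0.5 * k * (d - rest) ** 2
    return energy

# Three parts of a face-like structure; slight deformation raises the energy.
parts = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
print(elastic_energy(parts, pairs=[(0, 1), (0, 2), (1, 2)],
                     rest_lengths=[10.0, 9.4, 9.4]))
```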
33
Orchard G, Meyer C, Etienne-Cummings R, Posch C, Thakor N, Benosman R. HFirst: A Temporal Approach to Object Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2015; 37:2028-2040. [PMID: 26353184 DOI: 10.1109/tpami.2015.2392947]
Abstract
This paper introduces a spiking hierarchical model for object recognition which utilizes the precise timing information inherently present in the output of biologically inspired asynchronous address event representation (AER) vision sensors. The asynchronous nature of these systems frees computation and communication from the rigid predetermined timing enforced by system clocks in conventional systems. Freedom from rigid timing constraints opens the possibility of using true timing to our advantage in computation. We show not only how timing can be used in object recognition, but also how it can in fact simplify computation. Specifically, we rely on a simple temporal winner-take-all rather than the more computationally intensive synchronous operations typically used in biologically inspired neural networks for object recognition. This approach to visual computation represents a major paradigm shift from conventional clocked systems and can find application in other sensory modalities and computational tasks. We showcase the effectiveness of the approach by achieving the highest reported accuracy to date (97.5% ± 3.5%) on a previously published four-class card pip recognition task and an accuracy of 84.9% ± 1.9% on a new, more difficult 36-class character recognition task.
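The temporal winner-take-all at the core of this approach can be sketched in a few lines: under a time-to-first-spike code, the earliest spike signals the strongest response, so selection reduces to an argmin over spike times (the class labels below are illustrative):

```python
def temporal_wta(first_spike_times):
    """Temporal winner-take-all: among competing class neurons, the one that
    spikes first wins and the others are inhibited; None means no spike."""
    valid = {label: t for label, t in first_spike_times.items() if t is not None}
    return min(valid, key=valid.get) if valid else None

# First-spike latencies in milliseconds for a four-class card pip task.
print(temporal_wta({"heart": 4.2, "spade": 1.7, "club": None, "diamond": 3.9}))
# -> 'spade'
```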
34
Brosch T, Tschechne S, Neumann H. On event-based optical flow detection. Front Neurosci 2015; 9:137. [PMID: 25941470 PMCID: PMC4403305 DOI: 10.3389/fnins.2015.00137]
Abstract
Event-based sensing, i.e., the asynchronous detection of luminance changes, promises low-energy, high-dynamic-range, and sparse sensing. This stands in contrast to whole-image frame-wise acquisition by standard cameras. Here, we systematically investigate the implications of event-based sensing in the context of visual motion, or flow, estimation. Starting from a common theoretical foundation, we discuss different principal approaches for optical flow detection, ranging from gradient-based methods through plane-fitting to filter-based methods, and identify the strengths and weaknesses of each class. Gradient-based methods for local motion integration are shown to suffer from the sparse encoding in address-event representations (AER). Approaches exploiting the local plane-like structure of the event cloud, on the other hand, are shown to be well suited. Within this class, filter-based approaches are shown to define a proper detection scheme which can also deal with the problem of representing multiple motions at a single location (motion transparency). A novel biologically inspired efficient motion detector is proposed, analyzed, and experimentally validated. Furthermore, a stage of surround normalization is incorporated. Together with the filtering this defines a canonical circuit for motion feature detection. The theoretical analysis shows that such an integrated circuit reduces motion ambiguity in addition to decorrelating the representation of motion-related activations.
Collapse
Affiliation(s)
| | | | - Heiko Neumann
- Faculty of Engineering and Computer Science, Institute of Neural Information Processing, Ulm University, Ulm, Germany
| |
Collapse
|
35
|
Akolkar H, Meyer C, Clady X, Marre O, Bartolozzi C, Panzeri S, Benosman R. What can neuromorphic event-driven precise timing add to spike-based pattern recognition? Neural Comput 2015; 27:561-93. [PMID: 25602775 DOI: 10.1162/neco_a_00703] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
This letter introduces a study to precisely measure what an increase in spike timing precision can add to spike-driven pattern recognition algorithms. The concept of generating spikes from images by converting gray levels into spike timings is currently at the basis of almost every spike-based modeling of biological visual systems. The use of images naturally leads to generating incorrect, artificial, and redundant spike timings and, more importantly, contradicts biological findings indicating that visual processing is massively parallel and asynchronous, with high temporal resolution. A new concept for acquiring visual information through pixel-individual asynchronous level-crossing sampling has been proposed in a recent generation of asynchronous neuromorphic visual sensors. Unlike conventional cameras, these sensors acquire data not at fixed points in time for the entire array but at fixed amplitude changes of their input, resulting in output that is optimally sparse in space and time: pixel-individual and precisely timed, generated only when new (previously unknown) information is available (event based). This letter uses the high-temporal-resolution spiking output of neuromorphic event-based visual sensors to show that lowering time precision degrades performance on several recognition tasks, specifically when reaching the conventional range of machine vision acquisition frequencies (30-60 Hz). The use of information theory to characterize separability between classes for each temporal resolution shows that high-temporal-resolution acquisition provides up to 70% more information than conventional spikes generated from frame-based acquisition as used in standard artificial vision, thus drastically increasing the separability between classes of objects. Experiments on real data show that the amount of information loss is correlated with temporal precision. Our information-theoretic study highlights the potential of neuromorphic asynchronous visual sensors for both practical applications and theoretical investigations. Moreover, it suggests that representing visual information as a precise sequence of spike times, as reported in the retina, offers considerable advantages for neuro-inspired visual computations.
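The core experimental manipulation, degrading timestamp precision and measuring what survives, can be sketched in a few lines. Everything here, including the event format (t_us, x, y, polarity) and the chosen resolutions, is an illustrative assumption, not the letter's protocol.

```python
# Minimal sketch of a temporal-precision experiment: quantize event
# timestamps to coarser resolutions, emulating a slower acquisition
# clock, and observe how much timing structure survives.
def quantize_timestamps(events, resolution_us):
    """Snap each event's timestamp to the nearest multiple of resolution_us."""
    return [(round(t / resolution_us) * resolution_us, x, y, p)
            for t, x, y, p in events]

events = [(103, 0, 0, 1), (1712, 1, 0, -1), (1750, 1, 1, 1)]
for res in (1, 1000, 33000):   # 1 us, 1 ms, ~30 Hz frame period
    coarse = quantize_timestamps(events, res)
    distinct = len({t for t, *_ in coarse})
    print(f"{res} us -> {distinct} distinct timestamps")
# At frame-rate resolution all three events collapse onto one timestamp,
# discarding exactly the temporal information the study quantifies.
```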
Collapse
Affiliation(s)
- Himanshu Akolkar
- iCub Facility, Istituto Italiano di Tecnologia, Genoa 16163, Italy
| | | | | | | | | | | | | |
Collapse
|
36
|
Lee JH, Delbruck T, Pfeiffer M, Park PKJ, Shin CW, Ryu HE, Kang BC. Real-time gesture interface based on event-driven processing from stereo silicon retinas. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2014; 25:2250-2263. [PMID: 25420246 DOI: 10.1109/tnnls.2014.2308551] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
We propose a real-time hand gesture interface that combines a stereo pair of biologically inspired event-based dynamic vision sensor (DVS) silicon retinas with neuromorphic event-driven postprocessing. Compared with conventional vision or 3-D sensors, the use of DVSs, which output asynchronous and sparse events in response to motion, eliminates the need to extract movements from sequences of video frames and allows significantly faster and more energy-efficient processing. In addition, the rate of input events depends on the observed movements and thus provides an additional cue for solving the gesture-spotting problem, i.e., finding the onsets and offsets of gestures. We propose a postprocessing framework based on spiking neural networks that can process the events received from the DVSs in real time, and provides an architecture for future implementation in neuromorphic hardware devices. The motion trajectories of moving hands are detected by spatiotemporally correlating the stereoscopically verged asynchronous events from the DVSs using leaky integrate-and-fire (LIF) neurons. Adaptive thresholds of the LIF neurons achieve the segmentation of trajectories, which are then translated into discrete and finite feature vectors. The feature vectors are classified with hidden Markov models, using a separate Gaussian mixture model for spotting irrelevant transition gestures. The disparity information from stereovision is used to adapt LIF neuron parameters so that recognition is invariant to the user's distance from the sensor, and also helps to filter out movements in the background of the user. Furthermore, exploiting the high dynamic range of DVSs allows gesture recognition over a 60-dB range of scene illuminance. The system achieves recognition rates well over 90% under a variety of conditions with static and dynamic backgrounds and naïve users.
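The LIF mechanism that correlates incoming events is simple enough to sketch directly. The sketch below is a generic LIF neuron, assuming illustrative constants and method names; it is not the gesture system's implementation, but it shows why a dense event burst (a moving hand) fires the neuron while sparse noise does not.

```python
# Minimal sketch of a leaky integrate-and-fire (LIF) neuron driven by
# asynchronous events; tau, threshold, and weight are assumptions.
import math

class LIFNeuron:
    def __init__(self, tau_ms=20.0, threshold=1.0):
        self.tau = tau_ms
        self.threshold = threshold
        self.potential = 0.0
        self.last_t = 0.0

    def receive(self, t_ms, weight=0.3):
        """Decay the membrane potential since the last event, add the
        input, and fire (with reset) when the threshold is crossed."""
        self.potential *= math.exp(-(t_ms - self.last_t) / self.tau)
        self.last_t = t_ms
        self.potential += weight
        if self.potential >= self.threshold:
            self.potential = 0.0   # reset after the spike
            return True
        return False

# A dense burst of events integrates faster than it leaks away.
n = LIFNeuron()
print([n.receive(t) for t in (0, 2, 4, 6)])   # [False, False, False, True]
```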
Collapse
|
37
|
Yu J, Gao X, Tao D, Li X, Zhang K. A unified learning framework for single image super-resolution. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2014; 25:780-792. [PMID: 24807954 DOI: 10.1109/tnnls.2013.2281313] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
It has been widely acknowledged that learning- and reconstruction-based super-resolution (SR) methods are effective at generating a high-resolution (HR) image from a single low-resolution (LR) input. However, learning-based methods are prone to introducing unexpected details into the resultant HR images. Although reconstruction-based methods do not generate obvious artifacts, they tend to blur fine details and end up with unnatural results. In this paper, we propose a new SR framework that seamlessly integrates learning- and reconstruction-based methods for single-image SR to: 1) avoid the unexpected artifacts introduced by learning-based SR and 2) restore the missing high-frequency details smoothed by reconstruction-based SR. This integrated framework learns a single dictionary from the LR input, rather than from external images, to hallucinate details; embeds a nonlocal-means filter in the reconstruction-based SR to enhance edges and suppress artifacts; and gradually magnifies the LR input to the desired high-quality SR result. We demonstrate both visually and quantitatively that the proposed framework produces better results than previous methods from the literature.
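The gradual-magnification-under-a-reconstruction-constraint idea can be illustrated with plain iterative back-projection. This is a substitute technique for illustration only: the paper's dictionary-learning and nonlocal-means stages are omitted, a blur stands in for any imperfect upscaler, and all names and constants are assumptions.

```python
# Minimal sketch of gradual 2x magnification with a back-projected
# reconstruction constraint; not the paper's full framework.
import numpy as np

def upscale2x(img):
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def downscale2x(img):
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def blur(img):
    p = np.pad(img, 1, mode="edge")   # simple 5-point smoothing
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] + img) / 5.0

def super_resolve(lr, levels=2, bp_iters=20, step=0.5):
    """Magnify gradually; at each 2x level, enforce that the estimate,
    when downscaled, reproduces that level's low-resolution input."""
    hr = lr
    for _ in range(levels):
        lr_level, hr = hr, blur(upscale2x(hr))     # smooth initial magnification
        for _ in range(bp_iters):
            residual = lr_level - downscale2x(hr)  # error in the LR domain
            hr = hr + step * upscale2x(residual)   # back-project it
    return hr

lr = np.random.default_rng(1).random((8, 8))
hr = super_resolve(lr)                             # 8x8 -> 32x32
print(hr.shape, np.abs(downscale2x(downscale2x(hr)) - lr).max())  # near zero
```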
Collapse
|
38
|
Camuñas-Mesa LA, Serrano-Gotarredona T, Ieng SH, Benosman RB, Linares-Barranco B. On the use of orientation filters for 3D reconstruction in event-driven stereo vision. Front Neurosci 2014; 8:48. [PMID: 24744694 PMCID: PMC3978326 DOI: 10.3389/fnins.2014.00048] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2013] [Accepted: 02/23/2014] [Indexed: 11/13/2022] Open
Abstract
The recently developed Dynamic Vision Sensors (DVS) sense visual information asynchronously and code it into trains of events with sub-microsecond temporal resolution. This high temporal precision makes the output of these sensors especially suited for dynamic 3D visual reconstruction, by matching corresponding events generated by two different sensors in a stereo setup. This paper explores the use of Gabor filters to extract information about the orientation of the object edges that produce the events, thereby increasing the number of constraints applied to the matching algorithm. This strategy provides more reliably matched pairs of events and improves the final 3D reconstruction.
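A sketch of the orientation-labelling step: build a small Gabor bank and attach to each event the orientation of the strongest response on a local event-count patch; matched event pairs can then be required to share a label. Kernel parameters, patch construction, and names are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of orientation extraction with a Gabor filter bank.
import numpy as np

def gabor_kernel(theta, size=9, wavelength=4.0, sigma=2.0):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def dominant_orientation(patch, thetas):
    """Index of the filter with the strongest response on a local
    event-count patch centered on the event."""
    return int(np.argmax([abs(np.sum(patch * gabor_kernel(t))) for t in thetas]))

thetas = np.linspace(0, np.pi, 4, endpoint=False)   # 0, 45, 90, 135 degrees
patch = np.zeros((9, 9)); patch[4, :] = 1.0         # a horizontal edge of events
print(dominant_orientation(patch, thetas))          # 2: the 90-degree filter,
                                                    # tuned to horizontal edges
```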
Collapse
Affiliation(s)
- Luis A Camuñas-Mesa
- Instituto de Microelectrónica de Sevilla (IMSE-CNM), CSIC y Universidad de Sevilla, Sevilla, Spain
| | | | - Sio H Ieng
- UMR_S968 Inserm/UPMC/CNRS 7210, Institut de la Vision, Université de Pierre et Marie Curie, Paris, France
| | - Ryad B Benosman
- UMR_S968 Inserm/UPMC/CNRS 7210, Institut de la Vision, Université de Pierre et Marie Curie, Paris, France
| | - Bernabe Linares-Barranco
- Instituto de Microelectrónica de Sevilla (IMSE-CNM), CSIC y Universidad de Sevilla, Sevilla, Spain
| |
Collapse
|
39
|
Benosman R, Clercq C, Lagorce X, Ieng SH, Bartolozzi C. Event-based visual flow. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2014; 25:407-417. [PMID: 24807038 DOI: 10.1109/tnnls.2013.2273537] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
This paper introduces a new methodology to compute dense visual flow using the precise timings of spikes from an asynchronous event-based retina. Biological retinas, and their artificial counterparts, are totally asynchronous and data-driven, and rely on a paradigm of light acquisition radically different from most currently used frame-grabber technologies. This paper introduces a framework to estimate visual flow from the local properties of events in spatiotemporal space. We show that precise visual flow orientation and amplitude can be estimated using a local differential approach on the surface defined by coactive events. Experimental results are presented; they show that the method copes with the high data sparseness and temporal resolution of event-based acquisition, allowing the computation of motion flow with microsecond accuracy and at very low computational cost.
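The local differential idea admits a compact sketch: events from a moving edge lie near a plane t = a*x + b*y + c in (x, y, t) space, and the flow follows from the plane's gradient, since velocity = (a, b) / (a^2 + b^2). A plain least-squares fit stands in for the paper's exact estimator; all names are assumptions.

```python
# Minimal sketch of event-based flow from a local plane fit in (x, y, t).
import numpy as np

def flow_from_events(xs, ys, ts):
    """Fit t = a*x + b*y + c to a local event neighborhood. The gradient
    (a, b) has units s/px, so the velocity is (a, b) / (a^2 + b^2) px/s."""
    A = np.column_stack([xs, ys, np.ones_like(xs)])
    (a, b, _), *_ = np.linalg.lstsq(A, ts, rcond=None)
    g2 = a * a + b * b
    return np.array([a, b]) / g2 if g2 > 0 else np.zeros(2)

# Edge moving at +100 px/s along x: its events satisfy t = x / 100.
xs = np.array([0.0, 1, 2, 3, 4]); ys = np.array([0.0, 1, 0, 1, 0])
ts = xs / 100.0
print(flow_from_events(xs, ys, ts))   # ~[100, 0] px/s
```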
Collapse
|
40
|
Hasler J, Marr B. Finding a roadmap to achieve large neuromorphic hardware systems. Front Neurosci 2013; 7:118. [PMID: 24058330 PMCID: PMC3767911 DOI: 10.3389/fnins.2013.00118] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2012] [Accepted: 06/20/2013] [Indexed: 11/13/2022] Open
Abstract
Neuromorphic systems are gaining increasing importance in an era where CMOS digital computing techniques are reaching physical limits. These silicon systems mimic extremely energy-efficient neural computing structures, potentially both for solving engineering applications and for understanding neural computation. Toward this end, the authors provide a glimpse of what the technology evolution roadmap looks like for these systems, so that neuromorphic engineers may gain the same benefit of anticipation and foresight that IC designers gained from Moore's law many years ago. Scaling of energy efficiency, performance, and size is discussed, as well as how the implementation and application space of neuromorphic systems is expected to evolve over time.
Collapse
Affiliation(s)
- Jennifer Hasler
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | | |
Collapse
|
41
|
Carneiro J, Ieng SH, Posch C, Benosman R. Event-based 3D reconstruction from neuromorphic retinas. Neural Netw 2013; 45:27-38. [PMID: 23545156 DOI: 10.1016/j.neunet.2013.03.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2012] [Revised: 02/28/2013] [Accepted: 03/04/2013] [Indexed: 10/27/2022]
Abstract
This paper presents a novel N-ocular 3D reconstruction algorithm for event-based vision data from bio-inspired artificial retina sensors. Artificial retinas capture visual information asynchronously and encode it into streams of asynchronous spike-like pulse signals carrying information on, e.g., temporal contrast events in the scene. The precise times of occurrence of these visual features are implicitly encoded in the spike timings. Due to the high temporal resolution of this asynchronous visual information acquisition, the output of these sensors is ideally suited for dynamic 3D reconstruction. The presented technique takes full advantage of the event-driven operation; that is, events are processed individually at the moment they arrive. This strategy preserves the original dynamics of the scene, allowing for more robust 3D reconstructions. As opposed to existing techniques, this algorithm is based on geometric and time constraints alone, making it particularly simple to implement and largely linear.
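A two-camera special case of matching under joint time and geometric constraints can be sketched directly: pair events that are nearly coincident in time and satisfy the epipolar constraint (here simplified to equal rows in a rectified pair), then triangulate from disparity. Thresholds, the rectified-geometry simplification, and the event format (t_us, x, y) are all illustrative assumptions.

```python
# Minimal sketch of event matching by temporal coincidence plus a
# rectified epipolar constraint, followed by depth from disparity.
def match_and_triangulate(left, right, dt_us=100, focal_px=500.0, baseline_m=0.1):
    """Pair each left event with the right event on the same row that is
    closest in time (within dt_us), then compute depth = f * B / disparity."""
    points = []
    for tl, xl, yl in left:
        candidates = [(abs(tl - tr), xr) for tr, xr, yr in right
                      if yr == yl and abs(tl - tr) <= dt_us and xl > xr]
        if candidates:
            _, xr = min(candidates)          # closest in time wins
            disparity = xl - xr
            points.append((xl, yl, focal_px * baseline_m / disparity))
    return points

left = [(1000, 120, 40), (2000, 80, 10)]
right = [(1020, 100, 40), (5000, 70, 10)]    # only the first pair is coincident
print(match_and_triangulate(left, right))    # one 3D point at depth 2.5 m
```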
Collapse
Affiliation(s)
- João Carneiro
- Université de Pierre et Marie Curie - Institut de la Vision, 17 rue Moreau, 75012 Paris, France.
| | | | | | | |
Collapse
|
42
|
Cao Y, He H, Man H. SOMKE: kernel density estimation over data streams by sequences of self-organizing maps. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2012; 23:1254-1268. [PMID: 24807522 DOI: 10.1109/tnnls.2012.2201167] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
In this paper, we propose a novel method, SOMKE, for kernel density estimation (KDE) over data streams based on sequences of self-organizing maps (SOMs). In many stream data mining applications, traditional KDE methods are infeasible because of their high computational cost, processing time, and memory requirements. To reduce the time and space complexity, we propose a SOM structure to obtain well-defined data clusters for estimating the underlying probability distributions of incoming data streams. The main idea is to build a series of SOMs over the data streams via two operations: creating and merging SOM sequences. The creation phase produces SOM sequence entries for windows of the data, capturing clustering information about the incoming data streams. The size of the SOM sequences can be further reduced by combining consecutive entries in a sequence based on the Kullback-Leibler divergence between them. Finally, the probability density functions over arbitrary time periods along the data streams can be estimated using such SOM sequences. We compare SOMKE with two other KDE methods for data streams, the M-kernel approach and the cluster kernel approach, in terms of accuracy and processing time on various stationary data streams. Furthermore, we also investigate the use of SOMKE over nonstationary (evolving) data streams, including a synthetic nonstationary data stream, a real-world financial data stream, and a group of network traffic data streams. The simulation results illustrate the effectiveness and efficiency of the proposed approach.
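The core compression idea, keep SOM prototypes with hit counts instead of raw samples and estimate density as a weighted kernel mixture over the prototypes, can be sketched in one dimension. The tiny SOM update rule, the bandwidth, and all names below are illustrative assumptions; the merging of sequence entries via Kullback-Leibler divergence is omitted.

```python
# Minimal 1-D sketch of the SOM-based KDE idea: density from a handful
# of prototypes rather than from every stream sample.
import numpy as np

def train_som(data, n_units=8, epochs=5, lr=0.3, radius=1.0):
    """1-D SOM: pull the best-matching unit and its neighbors toward each
    sample; return prototypes and per-unit hit counts."""
    w = np.linspace(data.min(), data.max(), n_units)
    for _ in range(epochs):
        for x in data:
            bmu = np.argmin(np.abs(w - x))
            h = np.exp(-((np.arange(n_units) - bmu) ** 2) / (2 * radius**2))
            w += lr * h * (x - w)
    hits = np.bincount([np.argmin(np.abs(w - x)) for x in data], minlength=n_units)
    return w, hits

def som_kde(x, prototypes, hits, bandwidth=0.3):
    """Weighted Gaussian KDE over prototypes instead of raw samples."""
    weights = hits / hits.sum()
    k = np.exp(-0.5 * ((x[:, None] - prototypes[None, :]) / bandwidth) ** 2)
    return (weights * k).sum(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

stream_window = np.concatenate([np.random.default_rng(2).normal(0, 1, 500),
                                np.random.default_rng(3).normal(4, 0.5, 500)])
protos, hits = train_som(stream_window)
grid = np.linspace(-3, 6, 5)
print(som_kde(grid, protos, hits))   # bimodal density from just 8 prototypes
```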
Collapse
|